KR20240046975A

KR20240046975A - Data-adaptive tensor analysis method and tensor analysis apparatus

Info

Publication number: KR20240046975A
Application number: KR1020220126028A
Authority: KR
Inventors: 강유; 박용찬; 손상준; 조민용
Original assignee: 서울대학교산학협력단
Priority date: 2022-10-04
Filing date: 2022-10-04
Publication date: 2024-04-12

Abstract

A data-adaptive tensor analysis method and a tensor analysis apparatus are provided. The tensor analysis apparatus includes an input/output unit configured to receive data and output a result of calculating and processing the data, a storage unit configured to store a program for performing the data-adaptive tensor analysis method, and a control unit including at least one processor and configured to analyze a tensor stream received through the input/output unit by executing the program, wherein the control unit outputs a tensor decomposition result calculated through a tensor decomposition method using a complementary matrix from a new tensor slice of the tensor stream, detects a change point of the tensor stream based on the tensor decomposition result, and performs tensor decomposition in different ways according to a degree of change in the change point of the tensor stream.

Description

Data-adaptive tensor analysis method and tensor analysis apparatus {DATA-ADAPTIVE TENSOR ANALYSIS METHOD AND TENSOR ANALYSIS APPARATUS}

Embodiments disclosed herein relate to a data-adaptive tensor analysis method and apparatus, and more specifically, to a tensor analysis method and apparatus that improve both the accuracy and the computational cost of tensor decomposition by adaptively decomposing a tensor stream.

Tensor decomposition is a method for analyzing data of three or more dimensions, such as video data and sensor data. It extracts factor matrices corresponding to each dimension and a core tensor representing the strength of the interactions between the concepts.

To run fast, existing tensor decomposition techniques either skip the update along the time axis or reuse the factor matrix of the previous time step unchanged. Their accuracy drops sharply when the pattern of the data changes over time.

Therefore, a data-adaptive tensor analysis technique that improves both the accuracy and the processing speed of tensor decomposition is needed.

For reference, Patent Document 1 relates to a method of interpolating between a plurality of observed tensors, Patent Document 2 relates to a tensor factorization processing apparatus, and Patent Document 3 relates to a data analysis apparatus. Patent Documents 1 to 3 disclose only general matters concerning data decomposition and do not provide a data-adaptive tensor analysis technique that improves the accuracy and processing speed of tensor decomposition.

US Patent No. 8301379 (announced October 30, 2012)
Japanese Patent Application Publication No. 2016-173784 (published September 29, 2016)
Japanese Patent No. 6608721 (announced November 1, 2019)

An object of the embodiments disclosed herein is to provide a data-adaptive tensor analysis method and apparatus that reduce the computational cost of tensor decomposition by introducing a complementary matrix into the decomposition of a tensor stream and updating it only when there is no change along the time axis, and that improve the accuracy of the decomposition result by detecting change points in the tensor stream and re-decomposing the data accordingly.

Other objects and advantages of the present invention can be understood from the following description and will become more apparent from the embodiments. It will also be readily appreciated that the objects and advantages of the present invention can be realized by the means recited in the claims and combinations thereof.

As a technical means for achieving the above technical object, a tensor analysis apparatus includes: an input/output unit for receiving data and outputting a result of processing the data; a storage unit storing a program for performing a data-adaptive tensor analysis method; and a control unit that includes at least one processor and analyzes a tensor stream received through the input/output unit by executing the program. The control unit outputs a tensor decomposition result calculated from a new tensor slice of the tensor stream through a tensor decomposition scheme using a complementary matrix, detects a change point of the tensor stream based on the tensor decomposition result, and performs tensor decomposition in a different manner depending on the degree of change at the change point of the tensor stream.

According to another embodiment, a data-adaptive tensor analysis method performed by a tensor analysis apparatus includes: outputting a tensor decomposition result calculated from a new tensor slice of a tensor stream through a tensor decomposition scheme using a complementary matrix; detecting a change point of the tensor stream based on the tensor decomposition result; and performing tensor decomposition in a different manner depending on the degree of change at the change point of the tensor stream.

According to yet another embodiment, a recording medium is a computer-readable recording medium on which a program for performing the data-adaptive tensor analysis method is recorded.

According to yet another embodiment, a computer program is executed by a tensor analysis apparatus and is stored in a recording medium to perform the data-adaptive tensor analysis method.

According to any one of the above means, a data-adaptive tensor analysis method and apparatus can be provided that reduce the computational cost of updating the decomposition factors of a tensor stream.

In addition, according to any one of the above means, a data-adaptive tensor analysis method and apparatus can be provided that detect the themes of the data in a tensor stream and detect their change points.

In addition, according to any one of the above means, a data-adaptive tensor analysis method and apparatus can be provided that improve decomposition accuracy by re-decomposing the tensor stream when a new theme is detected.

The effects obtainable from the disclosed embodiments are not limited to those mentioned above, and other effects not mentioned will be clearly understood from the description below by those of ordinary skill in the art to which the disclosed embodiments belong.

The accompanying drawings illustrate preferred embodiments disclosed herein and, together with the detailed description, serve to aid understanding of the technical idea disclosed herein. The disclosure should therefore not be construed as limited to the matters shown in the drawings.
FIG. 1 is a block diagram of a tensor analysis apparatus according to an embodiment.
FIG. 2 is a diagram illustrating the decomposition result of a tensor stream according to a comparative example.
FIG. 3 is a diagram illustrating the decomposition result of a tensor stream by a tensor analysis apparatus according to an embodiment.
FIG. 4 is a diagram illustrating the overall tensor decomposition operation of a tensor analysis apparatus according to an embodiment.
FIG. 5 is a flowchart of a data-adaptive tensor analysis method according to another embodiment.
FIGS. 6 to 9 are diagrams illustrating the simulated decomposition performance for tensor streams according to embodiments.

Various embodiments are described in detail below with reference to the accompanying drawings. The embodiments described below may be modified and implemented in various different forms. To describe the features of the embodiments more clearly, detailed descriptions of matters widely known to those of ordinary skill in the art to which the embodiments belong are omitted. In the drawings, parts unrelated to the description of the embodiments are omitted, and similar parts are given similar reference numerals throughout the specification.

Throughout the specification, when a component is said to be "connected" to another component, this includes not only the case where it is directly connected but also the case where it is connected with another component in between. In addition, when a component is said to "include" another component, this means that, unless specifically stated otherwise, further components may be included rather than excluded.

Hereinafter, embodiments are described in detail with reference to the accompanying drawings.

Before that, the meanings of the terms used below are defined.

Table 1 below defines the terms used in an embodiment.

[Table 1]

In this specification, $\mathbf{A}$ denotes a matrix, $\mathbf{A}^{\top}$ its transpose, and $\mathbf{A}^{\dagger}$ its pseudoinverse. $I_n$ denotes the length of the $n$-th mode of a tensor, and $R$ denotes the rank of the tensor. $\mathcal{X}$ denotes an $N$-th order tensor. $\mathbf{A}^{(n)}$ denotes the factor matrix for the $n$-th mode of the tensor. $\|\mathcal{X}\|_F$ denotes the Frobenius norm of the tensor $\mathcal{X}$. $\mathbf{X}_{(n)}$ denotes the mode-$n$ unfolded matrix of the tensor $\mathcal{X}$; unfolding means matricizing a tensor. $[\![\cdot]\!]$ denotes the Kruskal operator; for example, a decomposition can be written as $\mathcal{X} \approx [\![\mathbf{A}^{(1)}, \ldots, \mathbf{A}^{(N)}]\!]$. $\otimes$ denotes the Kronecker product, $\odot$ the Khatri-Rao product, $\circledast$ the element-wise product, that is, the Hadamard product, and $\oslash$ element-wise division.

<Tensor>

A tensor is a multidimensional array that generalizes vectors (first-order tensors) and matrices (second-order tensors) to higher orders. In this specification, vectors are written in bold lowercase (e.g., $\mathbf{a}$), matrices in bold uppercase (e.g., $\mathbf{A}$), and tensors in bold script (e.g., $\mathcal{X}$). An $N$-th order tensor $\mathcal{X} \in \mathbb{R}^{I_1 \times \cdots \times I_N}$ has $N$ modes of lengths $I_1, \ldots, I_N$, where a mode is one dimension of the tensor. A tensor can be unfolded, or matricized, along any of its modes. The unfolded matrix of $\mathcal{X}$ along the $n$-th mode is written $\mathbf{X}_{(n)}$. Unfolding rearranges the elements of the tensor into a matrix. As in Equation 1 below, the mode-$n$ unfolded matrix $\mathbf{X}_{(n)}$ maps the $(i_1, \ldots, i_N)$-th element of $\mathcal{X}$ to the $(i_n, j)$-th element of $\mathbf{X}_{(n)}$.

[Equation 1]

$$j = 1 + \sum_{k=1,\, k \neq n}^{N} (i_k - 1) J_k, \qquad J_k = \prod_{m=1,\, m \neq n}^{k-1} I_m$$
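As a rough illustration of the unfolding associated with Equation 1, the following NumPy sketch matricizes a small tensor. The helper name `unfold`, the 0-based mode index, and the column ordering in which earlier modes vary fastest are assumptions made for this example and are not taken from the specification.

```python
import numpy as np

def unfold(X, n):
    """Mode-n unfolding of X (n is 0-based); earlier remaining modes vary fastest."""
    return np.reshape(np.moveaxis(X, n, 0), (X.shape[n], -1), order="F")

X = np.arange(24, dtype=float).reshape(2, 3, 4)
X1 = unfold(X, 1)  # rows follow mode 1, so the shape is (3, 8)

# Element (i0, i1, i2) of X lands in row i1, column i0 + 2 * i2 of X1.
print(X1.shape, X1[0, 5] == X[1, 0, 2])

# Unfolding only rearranges entries, so the Frobenius norm is unchanged.
print(np.isclose(np.linalg.norm(X1), np.sqrt((X ** 2).sum())))
```

Because unfolding is a pure rearrangement, any quantity defined entry by entry, such as the Frobenius norm, can be computed on either the tensor or its unfolded matrix.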

The Frobenius norm $\|\mathcal{X}\|_F$ of a tensor can be expressed as in Equation 2 below.

[Equation 2]

$$\|\mathcal{X}\|_F = \sqrt{\sum_{i_1=1}^{I_1} \cdots \sum_{i_N=1}^{I_N} x_{i_1 \cdots i_N}^{2}}$$

The following matrix products are used throughout. The Kronecker product $\mathbf{A} \otimes \mathbf{B}$ of matrices $\mathbf{A} \in \mathbb{R}^{I \times J}$ and $\mathbf{B} \in \mathbb{R}^{K \times L}$ is a matrix of size $IK \times JL$.

The Hadamard product $\mathbf{A} \circledast \mathbf{B}$ and the Khatri-Rao product $\mathbf{A} \odot \mathbf{B}$ are matrix products that tensor decomposition relies on. The Hadamard product is the element-wise product of two matrices $\mathbf{A}$ and $\mathbf{B}$ of the same size. The Khatri-Rao product is the column-wise Kronecker product of two matrices with the same number of columns $R$, as in Equation 3 below.

[Equation 3]

$$\mathbf{A} \odot \mathbf{B} = [\, \mathbf{a}_1 \otimes \mathbf{b}_1, \; \ldots, \; \mathbf{a}_R \otimes \mathbf{b}_R \,]$$

Here, $\mathbf{a}_r$ and $\mathbf{b}_r$ denote the $r$-th column vectors of $\mathbf{A}$ and $\mathbf{B}$, respectively.
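The three products can be compared side by side in a short NumPy sketch. NumPy provides `np.kron` and element-wise `*`, but no Khatri-Rao routine, so `khatri_rao` below is a hypothetical helper written for this example.

```python
import numpy as np

def khatri_rao(A, B):
    """Column-wise Kronecker product of A (I x R) and B (J x R), giving (I*J x R)."""
    if A.shape[1] != B.shape[1]:
        raise ValueError("A and B must have the same number of columns")
    # T[i, j, r] = A[i, r] * B[j, r]; flattening (i, j) gives row index i * J + j.
    return np.einsum("ir,jr->ijr", A, B).reshape(A.shape[0] * B.shape[0], A.shape[1])

rng = np.random.default_rng(0)
A = rng.standard_normal((4, 3))
B = rng.standard_normal((5, 3))

kron = np.kron(A, B)          # Kronecker product, shape (20, 9)
kr = khatri_rao(A, B)         # Khatri-Rao product, shape (20, 3)
gram = (A.T @ A) * (B.T @ B)  # Hadamard product of the two Gram matrices, shape (3, 3)

# The Gram matrix of a Khatri-Rao product equals the Hadamard product of the Gram matrices.
print(np.allclose(kr.T @ kr, gram))
```

The identity checked in the last line is the reason small R x R Hadamard products can stand in for products with the much taller Khatri-Rao matrix in ALS-style updates.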

<Tensor decomposition>

CP (CANDECOMP/PARAFAC) decomposition is a well-known tensor decomposition method and serves as a core building block of many variants. As in Equation 4 below, CP decomposition factorizes a tensor into a sum of rank-1 tensors.

[Equation 4]

$$\mathcal{X} \approx \sum_{r=1}^{R} \mathbf{a}^{(1)}_r \circ \mathbf{a}^{(2)}_r \circ \cdots \circ \mathbf{a}^{(N)}_r$$

Here, $\circ$ denotes the vector outer product, and the number $R$ of rank-1 tensors is called the rank of the decomposed tensor. The factor matrix $\mathbf{A}^{(n)}$ collects the vectors of the rank-1 components, as in Equation 5 below.

[Equation 5]

$$\mathbf{A}^{(n)} = [\, \mathbf{a}^{(n)}_1, \; \ldots, \; \mathbf{a}^{(n)}_R \,] \in \mathbb{R}^{I_n \times R}$$

The CP decomposition result of a tensor $\mathcal{X}$ can be expressed with the Kruskal operator $[\![\cdot]\!]$ and the unfolded matrix as follows. The Kruskal operator is a shorthand for the sum of the outer products of the columns of the factor matrices.

$$\mathcal{X} \approx [\![\mathbf{A}^{(1)}, \ldots, \mathbf{A}^{(N)}]\!], \qquad \mathbf{X}_{(n)} \approx \mathbf{A}^{(n)} \left( \mathbf{A}^{(N)} \odot \cdots \odot \mathbf{A}^{(n+1)} \odot \mathbf{A}^{(n-1)} \odot \cdots \odot \mathbf{A}^{(1)} \right)^{\top}$$
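A minimal sketch of the Kruskal operator, assuming the factor matrices are given as a list of NumPy arrays with one array per mode and the same number of columns. The function name `kruskal` is chosen for this example only.

```python
import numpy as np

def kruskal(factors):
    """Sum over r of the outer product of the r-th columns of all factor matrices."""
    rank = factors[0].shape[1]
    X = np.zeros(tuple(A.shape[0] for A in factors))
    for r in range(rank):
        component = factors[0][:, r]
        for A in factors[1:]:
            # Each outer product adds one mode to the rank-1 component.
            component = np.multiply.outer(component, A[:, r])
        X += component
    return X

rng = np.random.default_rng(0)
factors = [rng.standard_normal((dim, 2)) for dim in (4, 3, 5)]
X = kruskal(factors)  # rank-2 tensor of shape (4, 3, 5)
print(X.shape)
```

For a fixed number of modes the same result can be written as a single `np.einsum` call; the explicit loop is used here because it mirrors the sum of rank-1 terms directly.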

CP decomposition seeks the factor matrices that minimize the estimation error $\mathcal{L}$. The estimation error can be defined as in Equation 6 below.

[Equation 6]

CP-ALS (alternating least squares) is widely used to solve this optimization problem. The main idea of ALS is to split the original problem into $N$ subproblems, each of which updates one factor matrix while the remaining factor matrices are held fixed. It can be applied to each factor matrix as in Equation 7 below.

[Equation 7]
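The alternating scheme can be sketched in NumPy as follows. This is a textbook CP-ALS loop under stated assumptions (random initialization, a fixed iteration count, a pseudoinverse for each least-squares solve), not necessarily the exact form of Equation 7. All function names are hypothetical, and `relative_error` handles 3-way tensors only.

```python
import numpy as np

def unfold(X, n):
    """Mode-n unfolding (0-based n); earlier remaining modes vary fastest."""
    return np.reshape(np.moveaxis(X, n, 0), (X.shape[n], -1), order="F")

def khatri_rao_except(factors, skip):
    """Khatri-Rao product of all factors except `skip`, ordered to match `unfold`."""
    mats = [A for k, A in enumerate(factors) if k != skip]
    out = mats[-1]
    for A in reversed(mats[:-1]):
        out = np.einsum("ir,jr->ijr", out, A).reshape(-1, A.shape[1])
    return out

def cp_als(X, rank, n_iter=20, seed=0):
    """Plain CP-ALS: solve one least-squares subproblem per mode, repeatedly."""
    rng = np.random.default_rng(seed)
    factors = [rng.standard_normal((dim, rank)) for dim in X.shape]
    for _ in range(n_iter):
        for n in range(X.ndim):
            gram = np.ones((rank, rank))
            for k, A in enumerate(factors):
                if k != n:
                    gram *= A.T @ A  # Hadamard product of Gram matrices
            kr = khatri_rao_except(factors, n)
            factors[n] = unfold(X, n) @ kr @ np.linalg.pinv(gram)
    return factors

def relative_error(X, factors):
    """Relative Frobenius error of a 3-way CP model."""
    approx = np.einsum("ir,jr,kr->ijk", *factors)
    return np.linalg.norm(X - approx) / np.linalg.norm(X)

rng = np.random.default_rng(1)
truth = [rng.standard_normal((dim, 2)) for dim in (6, 5, 4)]
X = np.einsum("ir,jr,kr->ijk", *truth)
print(relative_error(X, cp_als(X, rank=2, n_iter=1)))
print(relative_error(X, cp_als(X, rank=2, n_iter=10)))
```

Each factor update solves its subproblem exactly, so the error cannot increase from one update to the next; how fast it decreases depends on the data and the initialization.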

Tensor data can be broadly divided into dynamic data, whose size and values change over time, and static data, which does not change.

An example of static tensor decomposition is Full-CP, which decomposes the entire tensor data without considering changes over time.

<Online tensor decomposition>

An efficient online algorithm can be applied to the decomposition of a time-evolving tensor. The tensor is assumed to arrive as a set of "slices", one given at each time step. An $N$-th order time-evolving tensor $\mathcal{X}$ can be written in the extended form $\mathcal{X} = [\mathcal{X}_{old}; \mathcal{X}_{new}]$, where $\mathcal{X}_{old}$ is the previous tensor data and $\mathcal{X}_{new}$ is the new tensor slice at one time step. Online tensor decomposition aims to efficiently decompose the tensor $\mathcal{X}$ given the previous decomposition result $[\![\tilde{\mathbf{A}}^{(1)}, \ldots, \tilde{\mathbf{A}}^{(N)}]\!]$.

[Equation 8]

Here, … is …, and … is ….

Online tensor decomposition aims to minimize the estimation error $\mathcal{L}$, as in Equation 9 below.

[Equation 9]

Existing tensor stream decomposition based on CP decomposition mainly either updates only the non-temporal factors using precomputed auxiliary matrices or updates all factors while taking the previous decomposition result into account.

Existing approaches to dynamic tensor decomposition are as follows.

<Online CP decomposition>

Online CP decomposition preserves the previous temporal factor in order to decompose new tensor slices efficiently. After updating the non-temporal factors and the partial temporal factor, it simply appends the new part of the temporal factor matrix to the previous matrix. Online CP decomposition introduces auxiliary matrices to avoid redundant computation of Khatri-Rao and Hadamard products: it computes the complementary matrices before the ALS iterations and then produces the new decomposition. Despite its low computational cost, it cannot achieve an accurate decomposition because it does not account for changes in the theme of the data.

Online CP decomposition applies the same temporal factor across all time steps, and the length of the temporal factor along the horizontal axis grows with each time step. Unlike static tensor decomposition, online CP decomposition reduces computational cost by using an approximation with additional constraints: it updates only the non-temporal modes and reuses the same temporal factor for every time step. When an incoming tensor has a different theme from the previous tensors, for example when the theme of the tensor stream changes as A->B, B'->C, or C->D, accuracy is lost significantly.

To solve this problem, the present embodiment tracks the local error of the tensor stream and detects change points in the theme of the data, which enables accurate decomposition even when the data has inconsistent temporal patterns.

<Dynamic tensor decomposition (DTD)>

Dynamic Tensor Decomposition (DTD) was introduced as part of MAST (Multi-Aspect Streaming Tensor), a low-rank tensor completion method that fills in missing entries of incomplete multi-aspect tensor streams. DTD reduces time complexity by reusing the previous decomposition, which approximates the tensor accumulated up to the arrival of a new slice. Specifically, for an $N$-th order tensor stream, DTD partitions the data at each time step into $2^N$ sub-tensors and indexes each sub-tensor with a binary tuple. It then approximates the previously accumulated sub-tensor by the previous decomposition result, so that the estimation error $\mathcal{L}$ of online tensor decomposition can be reformulated as in Equation 10 below.

[Equation 10]

Here, the forgetting factor mitigates the influence of previous decomposition errors. DTD is efficient, but when an incoming tensor has a completely different pattern from the previous tensors it still tries to reuse the previous decomposition result, so its accuracy drops.

To solve this problem, the present embodiment adapts quickly to abrupt changes in the data through a "re-decomposition" process and greatly improves decomposition accuracy.

The present embodiment performs online tensor decomposition accurately and adaptively to changes in the data, and is efficient in both time and memory. Decomposing a time-evolving tensor requires raising accuracy without sacrificing speed or memory usage. Because the themes of the data change over time, the embodiment detects the change points of the theme and applies a different strategy depending on the degree of change.

The present embodiment improves on the problem in three main respects.

First, it reduces computational cost. To reduce the arithmetic cost of updating the decomposition factors of a tensor stream, it builds an updatable framework for tensor streams. The complementary matrices and the previous decomposition result are used recursively, and the complementary matrices are updated only when the non-temporal factors change, which reduces redundant computation.

Second, it identifies themes in the data stream. To detect latent themes and their change points in the tensor stream, it tracks the error. It continuously tracks the error of the data slices arriving in the tensor stream and, based on standard-score (z-score) analysis, detects sharp drops in accuracy, which it regards as change points of the theme.
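The standard-score test described above can be sketched with the Python standard library. The threshold value, the function names, and the choice to restart the error statistics after a detected change are all assumptions made for this illustration; the text states only that sharp accuracy drops are detected from standard scores of the tracked error.

```python
from statistics import mean, stdev

def z_score(history, value):
    """Standard score of `value` relative to the errors seen so far."""
    if len(history) < 2:
        return 0.0  # not enough samples to estimate a spread
    sigma = stdev(history)
    if sigma == 0:
        return 0.0
    return (value - mean(history)) / sigma

def detect_change_points(errors, threshold=3.0):
    """Return the time steps whose local error is an upward outlier."""
    history, change_points = [], []
    for t, err in enumerate(errors):
        if z_score(history, err) > threshold:
            change_points.append(t)
            history = []  # assumption: restart the statistics for the new theme
        history.append(err)
    return change_points

local_errors = [1.0, 1.1, 0.9, 1.0, 1.05, 5.0, 5.1, 4.9, 5.0]
print(detect_change_points(local_errors))  # prints [5]
```

Only upward deviations are flagged here, since a sudden increase in error is what signals that the current factors no longer fit the incoming slices.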

Third, it increases decomposition accuracy. To exploit the detected change points of the theme, it re-decomposes the tensor stream when a new theme is detected. When an abrupt change of theme is detected, it chooses whether to reuse or to split the tensor stream depending on the degree of change. A memory rate is introduced to improve the reuse process. This element determines how much information from the previous decomposition result should be retained while balancing accuracy against speed.
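One way to picture the choice between reusing and splitting is a two-threshold rule on the standard score of the local error. The thresholds, their default values, and the strategy labels below are hypothetical; the text states only that the strategy depends on the degree of change.

```python
def choose_strategy(z, reuse_threshold=1.5, split_threshold=3.0):
    """Map the degree of change (a z-score) to a decomposition strategy."""
    if reuse_threshold > split_threshold:
        raise ValueError("reuse_threshold must not exceed split_threshold")
    if z > split_threshold:
        return "split"   # drastic change: start a fresh decomposition for the new theme
    if z > reuse_threshold:
        return "reuse"   # moderate change: re-decompose, keeping part of the old result
    return "update"      # no notable change: ordinary incremental update

for z in (0.4, 2.0, 6.0):
    print(z, choose_strategy(z))
```

Under this sketch, the memory rate mentioned in the text would only come into play in the "reuse" branch, where it would weight how much of the previous decomposition is carried over.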

FIG. 1 is a functional block diagram of a tensor analysis apparatus according to an embodiment.

Referring to FIG. 1, the tensor analysis apparatus 100 according to an embodiment includes an input/output unit 110, a storage unit 120, and a control unit 130.

The input/output unit 110 may include an input unit for receiving input from a user and an output unit for displaying information such as the result of a task or the state of the tensor analysis apparatus 100. In other words, the input/output unit 110 receives data and outputs the result of processing it. The tensor analysis apparatus 100 according to the embodiment may receive an input tensor and a request to analyze the input tensor through the input/output unit 110.

The storage unit 120 stores files and programs and may be implemented with various types of memory. In particular, the storage unit 120 may store the data and programs that allow the control unit 130, described below, to perform the computations for tensor analysis according to the algorithm presented below.

The control unit 130 includes at least one processor, such as a CPU, a GPU, or an Arduino, and can control the overall operation of the tensor analysis apparatus 100. That is, the control unit 130 can control the other components of the tensor analysis apparatus 100 so that they perform the operations for tensor analysis. By executing the program stored in the storage unit 120, the control unit 130 can perform the computations for analyzing a tensor according to the algorithm presented below.

To reduce the computational cost of the tensor decomposition algorithm, the control unit 130 introduces a complementary matrix into the tensor decomposition process and sets the complementary matrix to be updated only when there is no change along the time axis. The control unit 130 outputs a tensor decomposition result calculated from a new tensor slice of the tensor stream through a tensor decomposition scheme using the complementary matrix.

To improve the accuracy of the tensor decomposition result, the control unit 130 detects change points of the data with respect to the previous decomposition result of the tensor stream and re-decomposes the data accordingly. The control unit 130 detects a change point of the tensor stream based on the tensor decomposition result, and performs tensor decomposition in a different manner depending on the degree of change at the change point of the tensor stream.

Hereinafter, the operation in which the tensor analysis apparatus 100 adaptively updates the tensor decomposition result and the complementary matrices using the complementary matrices is described.

<Update rules>

Let $\mathcal{X} = [\mathcal{X}_{old}; \mathcal{X}_{new}]$ be an $N$-th order time-evolving tensor, where $\mathcal{X}_{old}$ is the previous tensor data and $\mathcal{X}_{new}$ is the new tensor slice. The first mode of the tensor is assumed to be the temporal mode.

The tensor analysis apparatus 100 can design update rules for efficiently decomposing the tensor $\mathcal{X}$ given the previous decomposition result $[\![\tilde{\mathbf{A}}^{(1)}, \ldots, \tilde{\mathbf{A}}^{(N)}]\!]$.

Using an old part $\mathbf{A}^{(1)}_{old}$ and a new part $\mathbf{A}^{(1)}_{new}$, each with $R$ columns, the tensor analysis apparatus 100 can split the temporal factor matrix $\mathbf{A}^{(1)}$ into its old and new parts as $\mathbf{A}^{(1)} = [\mathbf{A}^{(1)}_{old}; \mathbf{A}^{(1)}_{new}]$. Here, $R$ denotes the decomposition rank.

The tensor analysis apparatus 100 applies a memory rate so that the degree of change in the theme of the data can be taken into account. The memory rate determines the weight assigned to the decomposition of the previous tensor data.

The estimation error $\mathcal{L}$ is defined as in Equation 11 below. It is the estimation error of DTD restricted to the case in which the non-temporal modes of the tensor stream are fixed.

[Equation 11]

The tensor analysis apparatus 100 optimizes the estimation error $\mathcal{L}$ based on CP-ALS. Because changes occur only in the temporal mode, the estimation error of DTD is simplified by setting the changes in the non-temporal modes to zero. The update rules that minimize the estimation error $\mathcal{L}$ with respect to each factor matrix can be expressed as follows.

텐서 분석 장치(100)는 분해의 정확도를 더욱 높이기 위해 이전 시간 인자를 업데이트할 수 있다. 이전 시간 인자가 업데이트되지 않으면, 데이터의 이전 주제에만 최적화되어 있으므로 주제가 변경될 때마다 정확도를 떨어진다. 이러한 재귀 프로세스를 직접 적용하는 것은 연산량을 증가시킨다. 연산량 증가 문제를 해결하기 위해 아래 수학식 12와 같이 두 개의 보완 행렬 G와 H를 도입한다.The tensor analysis device 100 may update the previous time factor to further increase the accuracy of decomposition. If the previous time factor is not updated, it is only optimized for the previous topic of the data and therefore loses accuracy every time the topic changes. Applying this recursive process directly increases the amount of computation. To solve the problem of increasing the amount of computation, two complementary matrices G and H are introduced as shown in Equation 12 below.

[Equation 12]

Here, the first complementary matrix G is defined as an element-wise product formed from the transpose of the k-th mode old matrix and the k-th mode matrix, and the second complementary matrix H is defined as an element-wise product formed from the transpose of the k-th mode matrix and the k-th mode matrix itself. Because G and H are updated only when a non-temporal factor changes, redundant computation is reduced. The modified update rules can be expressed as Equations 13 to 15 below.
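The definition of G and H above can be sketched as follows. Equation 12 is not reproduced in this text, so the sketch assumes that the element-wise product runs over the non-temporal modes only; the function name and argument layout are hypothetical.

```python
import numpy as np

def complementary_matrices(old_factors, factors):
    """Return (G, H) built from non-temporal factor matrices.

    old_factors, factors: lists of (I_k x R) arrays, one per non-temporal
    mode (assumption: the temporal mode is excluded from the product).
    """
    rank = factors[0].shape[1]
    G = np.ones((rank, rank))
    H = np.ones((rank, rank))
    for A_old, A in zip(old_factors, factors):
        G *= A_old.T @ A  # transpose of the old k-th mode matrix times the k-th mode matrix
        H *= A.T @ A      # transpose of the k-th mode matrix times itself
    return G, H
```

Both matrices are only R x R, which is why caching them is cheap compared with recomputing products over the full factor matrices.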

[Equation 13]

The old part of the time factor matrix, that is, of the first-mode matrix, can be computed from the first-mode old matrix, the first complementary matrix, and the transpose of the second complementary matrix.

[Equation 14]

The new part of the time factor matrix, that is, of the first-mode matrix, can be computed from the new tensor slice, the Khatri-Rao product of the k-th mode matrices, and the transpose of the second complementary matrix.
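As an illustration of this kind of update, the sketch below computes the temporal rows for a new chunk of a three-way stream by ordinary least squares. It is not a transcription of Equation 14, which is not reproduced here. The text refers to the transpose of H, whereas this sketch applies the pseudoinverse of H, which is what the standard CP-ALS normal equations require; that substitution, the three-way shape, and all function names are assumptions.

```python
import numpy as np

def khatri_rao(B, C):
    """Column-wise Kronecker product; row index is j * K + k."""
    J, R = B.shape
    K, _ = C.shape
    return np.einsum("jr,kr->jkr", B, C).reshape(J * K, R)

def new_temporal_factor(X_new, B, C):
    """Least-squares temporal rows for a chunk X_new of shape (T_new, J, K)."""
    T_new = X_new.shape[0]
    X1 = X_new.reshape(T_new, -1)   # mode-1 unfolding, columns ordered j * K + k
    H = (B.T @ B) * (C.T @ C)       # second complementary matrix
    return X1 @ khatri_rao(B, C) @ np.linalg.pinv(H)
```

Because the Gram matrix of the Khatri-Rao product equals the element-wise product of the individual Gram matrices, H can be reused here instead of forming a (J*K) x (J*K) system.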

[Equation 15]

Each non-temporal factor matrix, that is, each i-th mode matrix with i other than 1, can be computed from a relation that involves the i-th mode old matrix, the old and new parts of the first-mode matrix, the first complementary matrix, and the second complementary matrix, and that takes into account the memory rate, i.e., the weight assigned to the decomposition of the previous tensor data.

The tensor decomposition scheme using complementary matrices updates the factors that make up the tensor decomposition result as follows. If a factor is a temporal factor, it is computed by applying the complementary matrices. If a factor is not a temporal factor, it is updated taking into account the weight of the previous tensor decomposition result, and the complementary matrices are then updated.

The algorithm for the update rules can be expressed in pseudocode as shown in Table 2.

[Table 2]

The algorithm for the update rules may be referred to as DAO (Data-Adaptive Online)-CP (CANDECOMP/PARAFAC)-ALS (Alternating Least Squares). It takes as input the factors from the previous tensor decomposition, a new tensor slice, the memory rate ρ, and the number of iterations n_iter, and it outputs the updated factors.

The first complementary matrix G and the second complementary matrix H are initialized using Equation 12.

The updates are then repeated for the given number of iterations. The new part of the time factor matrix is updated using Equation 14. For each non-temporal factor matrix, the factor matrix itself, the first complementary matrix G, and the second complementary matrix H are updated using Equation 15. The old part of the time factor matrix is updated using Equation 13.

The following describes how the tensor analysis device 100 detects change points in the data of a tensor stream.

<Change point detection>

The tensor analysis device 100 tracks the error between each new tensor slice and the tensor decomposition result, converts the tracked error into a score, and detects change points in the tensor stream based on the score.

The tensor analysis device 100 detects change points in the theme of the tensor stream so that it can adapt quickly to abrupt changes in the data. To this end, it continuously tracks the decomposition error of the tensor stream and detects sharp drops in accuracy, which it treats as change points of the theme. Such a drop in accuracy can be captured by measuring the local error norm between a new tensor slice and the decomposition result for that slice, as in Equation 16 below.

[Equation 16]
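A local error norm of this kind might be computed as in the sketch below. It assumes a three-way stream, a CP reconstruction from three factor matrices, and the Frobenius norm; these choices and the function name are illustrative assumptions, since Equation 16 itself is not reproduced in this text.

```python
import numpy as np

def local_error_norm(X_new, A_new, B, C):
    """Frobenius norm of the residual between a new chunk and its CP reconstruction.

    X_new: array of shape (T_new, J, K); A_new, B, C: factor matrices with R columns.
    """
    X_hat = np.einsum("tr,jr,kr->tjk", A_new, B, C)  # rank-R CP reconstruction
    return float(np.linalg.norm(X_new - X_hat))
```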

The local error of the decomposition result of the tensor stream can be tracked continuously, and z-score analysis can be applied to turn it into a score. Z-score analysis can process real-time data quickly and can detect fine-grained levels of change by varying the threshold.

The tensor analysis device 100 assumes that the local error norm follows a normal distribution and detects outliers by tracking its mean and variance. Because the mean and variance must be updated online, Welford's algorithm can be used, which gives accurate estimates of both without retaining the entire data. Welford's algorithm needs only a single pass over the data to compute the sample mean and variance. Using it, outliers in the current local error norm are detected through z-score analysis as in Equation 17 below.
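For reference, a minimal sketch of Welford's single-pass update combined with a z-score is given below. The class name is hypothetical, and two details are assumptions rather than statements of the source: the sample (n - 1) variance is used, and the z-score is taken to be 0 while the variance is still undefined or zero.

```python
import math

class RunningStats:
    """Welford's online mean and variance, plus a z-score for a new value."""

    def __init__(self):
        self.n = 0
        self.mean = 0.0
        self.m2 = 0.0  # running sum of squared deviations from the mean

    def update(self, x):
        self.n += 1
        delta = x - self.mean
        self.mean += delta / self.n
        self.m2 += delta * (x - self.mean)

    def variance(self):
        # Sample variance; defined as 0 until two values have been seen (assumption).
        return self.m2 / (self.n - 1) if self.n > 1 else 0.0

    def zscore(self, x):
        std = math.sqrt(self.variance())
        return (x - self.mean) / std if std > 0 else 0.0
```

A new local error norm would be scored with `zscore` before being passed to `update`, so that an outlier does not dilute the statistics it is tested against.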

[Equation 17]

Here, the threshold in Equation 17 is the anomaly threshold. Changing its value fine-tunes the criterion for deciding whether a new tensor slice is similar to the previous tensors.

The following describes how the tensor analysis device 100 decides whether to re-decompose based on the change points in the data of the tensor stream, and how it selects among a plurality of tensor decomposition strategies.

<Re-decomposition>

When z-score analysis detects an abrupt change in the data theme, the tensor analysis device 100 uses this change information to improve decomposition accuracy. As new tensor slices accumulate at each time step, the factor matrices are updated according to the optimization scheme described in the algorithm for the update rules (see Table 2). The z-score is then computed from the distribution of local error norms, and "re-decomposition" is performed according to the score.

The tensor analysis device 100 tracks the distribution of local error norms, sets z-score criteria, and automatically selects the better of a first tensor decomposition strategy and a second tensor decomposition strategy according to the degree of change. The first tensor decomposition strategy re-initializes the tensor decomposition and corresponds to the split process. The second tensor decomposition strategy reuses the previous decomposition result and corresponds to the refinement process.

FIG. 2 illustrates the decomposition result of a tensor stream according to a comparative example, and FIG. 3 illustrates the decomposition result of a tensor stream by a tensor analysis device according to an embodiment.

Referring to FIG. 2, conventional online CP decomposition reuses the same temporal factors when the theme of the tensor stream changes from A to B (210), from B to B' (220), from B' to C (230), and from C to D (240), and is therefore vulnerable to theme changes.

Referring to FIG. 3, the tensor analysis device according to the present embodiment performs the split process, which is the first tensor decomposition strategy, when the theme changes from A to B (310), from B' to C (330), and from C to D (340), and performs the refinement process, which is the second tensor decomposition strategy, when the theme changes from B to B' (320). Because the device adaptively selects among a plurality of tensor decomposition strategies, it is robust to theme changes.

Table 3 illustrates score criteria for adaptively selecting among the tensor decomposition strategies.

[Table 3]

The tensor analysis device 100 scores the degree of data change in the tensor stream and determines whether the score at a change point falls in a first threshold range or a second threshold range. The first threshold range covers scores greater than the first threshold; if it is satisfied, the split process is performed. The second threshold range covers scores greater than the second threshold and less than or equal to the first threshold; if it is satisfied, the refinement process is performed. If the score is less than or equal to the second threshold, neither the split process nor the refinement process is performed.
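The two threshold ranges described above amount to a three-way branch, which can be sketched as follows. The function name and the returned labels are hypothetical, and the threshold values in the usage note are placeholders rather than values from Table 3.

```python
def select_strategy(score, first_threshold, second_threshold):
    """Map a change score to 'split', 'refine', or 'none'.

    Assumes first_threshold >= second_threshold.
    """
    if score > first_threshold:
        return "split"    # first threshold range: re-initialize the decomposition
    if score > second_threshold:
        return "refine"   # second threshold range: reuse the previous result
    return "none"         # at or below the second threshold: keep the update as is
```

For example, with placeholder thresholds 5.0 and 2.0, a score of exactly 5.0 falls in the second range and selects refinement, because the first range is strictly greater than the first threshold.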

<Split process>

If an incoming tensor slice has a completely different theme from the previous tensors, reusing the previous decomposition result causes a large loss of accuracy. To address this, a split process is designed that uses the first threshold to divide the tensor stream into tensors with different themes. Re-initialization incurs additional cost in space and time, but the split process avoids the unexpected drop in accuracy.

<Refinement process>

The refinement process is used to update the decomposition result when the theme differs only slightly from that of the previous tensors. The second threshold, a hyperparameter, fine-tunes the refinement criterion for deciding whether an incoming tensor slice is similar to the previous tensors.

Because the refinement process must concentrate more on the new slice, it uses the memory rate (1-ρ). The range of the memory rate (1-ρ) follows from the range of ρ.

In the ALS procedure, the score used for change point detection (e.g., the z-score) is applied to the number of iterations. A higher z-score indicates a more abrupt change in the data, so the number of iterations is set so that correspondingly more iterations are performed.

As a result, these elements determine how much information from the previous decomposition should be retained, while controlling the trade-off between accuracy and running time.

FIG. 4 illustrates the overall tensor decomposition operation of a tensor analysis device according to an embodiment.

In step S410, the tensor analysis device 100 receives as input a tensor stream, a memory rate, a number of iterations, a first threshold, and a second threshold. A tensor stream is a tensor that changes over time and may correspond to a set of slices given at each time step. The memory rate is the weight assigned to the decomposition of the previous tensor data, and different values may be applied when decomposing a new tensor slice and when performing the refinement process. The number of iterations may be larger for the refinement process than for decomposing a new tensor slice. The first threshold is used to decide whether to perform the split process, and the first and second thresholds together are used to decide whether to perform the refinement process.

In step S420, the tensor analysis device 100 decomposes a new tensor slice of the tensor stream by applying the tensor decomposition scheme that uses complementary matrices.

In step S430, the tensor analysis device 100 computes the error between the new tensor slice and the tensor decomposition result and derives the degree of change. Specifically, it tracks the error between the new tensor slice and the tensor decomposition result, converts the tracked error into a score, and detects change points in the tensor stream based on the score.

In step S440, the tensor analysis device 100 checks whether the degree of change is greater than the first threshold. The range above the first threshold is called the first threshold range; if it is satisfied, the split process, which is the first tensor decomposition strategy, is performed in step S445.

In step S450, the tensor analysis device 100 checks whether the degree of change is greater than the second threshold. The range greater than the second threshold and less than or equal to the first threshold is called the second threshold range; if it is satisfied, the refinement process is performed in step S455.

In step S460, the tensor analysis device 100 stores the factors in a factor set. The stored factors are those produced along one of three paths: (i) factors computed by performing the split process when the first threshold range is satisfied, (ii) factors computed by performing the refinement process when the second threshold range is satisfied, or (iii) factors computed without the split or refinement process when neither range is satisfied.

In step S470, the tensor analysis device 100 determines whether decomposition is complete for all slices. If not all tensor slices have been decomposed, the process repeats from step S420. When all tensor slices have been decomposed, the tensor analysis device 100 outputs the stored factor set in step S480.

The algorithm for the overall tensor decomposition operation can be expressed in pseudocode as shown in Table 4.

[Table 4]

The algorithm for the overall tensor decomposition operation may be referred to as DAO (Data-Adaptive Online)-CP (CANDECOMP/PARAFAC) decomposition. It takes as input the tensor stream, the memory rate ρ, and the number of ALS iterations n_iter, and it outputs the factor set. DAO-CP-ALS is the algorithm for the update rules described with reference to Table 2.

A new tensor slice is decomposed to initialize the factors of the tensor decomposition result. The error norm between the new tensor slice and the factors is computed. The parameters of Welford's algorithm for the mean and variance are initialized based on this error norm.

Tensor decomposition then proceeds over the tensor stream. The factors are computed with the algorithm for the update rules from the previous tensor decomposition result, the new tensor slice, the memory rate ρ, and the number of iterations n_iter. The error norm between the new tensor slice and the factors is computed.

The z-score of the error norm is compared with the first threshold; if it is greater than the first threshold, the split process is performed. The previous factors are stored in the factor set. The new tensor slice is decomposed to initialize the factors that serve as the previous tensor decomposition result. The error norm between the new tensor slice and the factors is computed. The parameters of Welford's algorithm for the mean and variance are initialized based on this error norm.

The z-score of the error norm is compared with the first threshold and the second threshold; if it is less than or equal to the first threshold and greater than the second threshold, the refinement process is performed. The factors are computed with the algorithm for the update rules from the previous tensor decomposition result, the new tensor slice, the memory rate 1-ρ, and the number of iterations (1 + z-score) * n_iter. The error norm between the new tensor slice and the factors is computed.

The parameters of Welford's algorithm for the mean and variance are initialized based on the error norm. The factors of the previous decomposition result are replaced with the newly computed factors.

The previous factors are stored in the factor set, and the stored factor set is output.
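The control flow walked through above can be sketched as follows. This is an illustration, not a reproduction of Table 4: `init_fn`, `update_fn`, and `error_fn` are hypothetical caller-supplied callables standing in for the initial decomposition, the update-rule algorithm, and the error norm. Several details are assumptions: outside the split branch the running statistics are updated with each new error norm, the z-score is taken as 0 until two error norms have been seen, and the refinement iteration count is rounded up to an integer.

```python
import math

def dao_cp(stream, rho, n_iter, l1, l2, init_fn, update_fn, error_fn):
    """Return the list of stored factor sets, one per detected theme."""
    stored = []

    def reset(err):
        return [1, err, 0.0]  # Welford state: count, mean, sum of squared deviations

    def push(state, err):
        state[0] += 1
        delta = err - state[1]
        state[1] += delta / state[0]
        state[2] += delta * (err - state[1])

    def zscore(state, err):
        if state[0] < 2:
            return 0.0
        std = math.sqrt(state[2] / (state[0] - 1))
        return (err - state[1]) / std if std > 0 else 0.0

    slices = iter(stream)
    first = next(slices)
    factors = init_fn(first)
    stats = reset(error_fn(first, factors))

    for x_new in slices:
        candidate = update_fn(factors, x_new, rho, n_iter)
        err = error_fn(x_new, candidate)
        z = zscore(stats, err)
        if z > l1:
            # Split: store the previous factors and start over from the new slice.
            stored.append(factors)
            candidate = init_fn(x_new)
            stats = reset(error_fn(x_new, candidate))
        else:
            if z > l2:
                # Refinement: memory rate 1 - rho and (1 + z) * n_iter iterations.
                more_iter = math.ceil((1 + z) * n_iter)
                candidate = update_fn(factors, x_new, 1 - rho, more_iter)
                err = error_fn(x_new, candidate)
            push(stats, err)
        factors = candidate

    stored.append(factors)
    return stored
```

Passing the decomposition steps in as callables keeps the sketch independent of any particular CP implementation; with scalar stand-ins (for example, a running average as the "factor"), a stream that jumps from values near 1 to values near 10 yields two stored entries.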

FIG. 5 is a flowchart of a data-adaptive online (DAO) tensor analysis method according to another embodiment.

The data-adaptive tensor analysis method according to the embodiment shown in FIG. 5 includes steps processed in time series by the tensor analysis device shown in FIG. 2. Therefore, even where omitted below, the description given above with respect to the tensor analysis device shown in FIG. 2 also applies to the data-adaptive tensor analysis method according to the embodiment shown in FIG. 5.

In step S510, the tensor analysis device 100 outputs a tensor decomposition result computed from a new tensor slice of the tensor stream by the tensor decomposition scheme that uses complementary matrices. This scheme updates the factors that make up the tensor decomposition result: if a factor is a temporal factor, it is computed by applying the complementary matrices; if a factor is not a temporal factor, it is updated taking into account the weight of the previous tensor decomposition result, and the complementary matrices are then updated.

In step S520, the tensor analysis device 100 detects change points in the tensor stream based on the tensor decomposition result. Step S520 tracks the error between the new tensor slice and the tensor decomposition result, converts the tracked error into a score, and detects change points in the tensor stream based on the score.

In step S530, the tensor analysis device 100 performs tensor decomposition in different ways according to the degree of change at the change point of the tensor stream. If the degree of change satisfies the first threshold range, step S530 selects the first tensor decomposition strategy, which re-initializes the tensor decomposition, and performs the split process by re-decomposition. If the degree of change satisfies the second threshold range, it selects the second tensor decomposition strategy, which reuses the previous decomposition result, and performs the refinement process that updates the tensor decomposition result using the previous tensor decomposition result.

FIGS. 6 to 9 illustrate the decomposition performance for tensor streams simulated according to the embodiments.

FIG. 6 relates to the performance of identifying theme changes in a tensor stream.

The theme of a tensor stream refers to a factor that spans the stream data or is common to it. A theme change means that the dominant factor spanning a particular time interval of the stream data is replaced by another factor from some point in time. Examples include the start of object movement in CCTV data and a scene change in movie data. The experiments focus on video, but tensor data of other types is equally possible.

Reference numeral 610 denotes the original CCTV data, reference numeral 620 denotes the theme change detection result of the present embodiment on the CCTV data, and reference numerals 630, 640, and 650 denote the theme change detection results on the CCTV data of the comparative examples Full-CP, DTD, and online CP, respectively.

Reference numeral 615 denotes the sample video data, reference numeral 625 denotes the theme change detection result of the present embodiment on the sample video data, and reference numerals 635, 645, and 655 denote the theme change detection results on the sample video data of the comparative examples Full-CP, DTD, and online CP, respectively.

In the results of the present embodiment, theme changes are detected automatically and re-decomposition is performed according to the degree of change, so the resulting images are sharp.

FIG. 7 relates to reconstruction error performance for tensor streams.

Reconstruction error, as a measure of decomposition accuracy, is based on the local error norm and the global error norm, from which a local fitness score and a global fitness score are derived by the following expressions.

The local fitness score indicates the fit to the data slice arriving at each time step, and the global fitness score indicates the fit to the whole set of tensors. Both are versions of the error norm normalized by data size, designed so that decomposition accuracy can be compared across datasets of different sizes. The average local fitness score is computed at every time step.
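The expressions for the two scores are not reproduced in this text. One common way to normalize an error norm by data size, shown here purely as an assumption, is one minus the ratio of the residual norm to the data norm; the function name is hypothetical.

```python
import numpy as np

def fitness(X, X_hat):
    """1 - ||X - X_hat|| / ||X||, an assumed form of a size-normalized error.

    Applied to a single incoming slice this plays the role of a local score;
    applied to all accumulated data it plays the role of a global score.
    """
    return 1.0 - float(np.linalg.norm(X - X_hat) / np.linalg.norm(X))
```

Under this form a perfect reconstruction scores 1 and an all-zero reconstruction scores 0, regardless of the size of the tensor.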

Reference numerals 710 and 715 denote the global fitness score and the average local fitness score, respectively, measured on the sample video data. Reference numerals 720 and 725 denote the same two scores for the stock price data, 730 and 735 for the airport hall data, 740 and 745 for the domestic air quality data, and 750 and 755 for the synthetic data.

The decomposition accuracy results of the present embodiment show that, because change points of the theme are detected, decomposition accuracy remains high and increases even as the decomposition rank is varied.

FIG. 8 relates to processing-time performance for tensor streams.

Reference numerals 810, 820, 830, 840, and 850 denote the processing speed measured on the sample video data, stock price data, airport hall data, domestic air quality data, and synthetic data, respectively. Because the static decomposition method (Full-CP) is not an online method, it is assumed to decompose the entire tensor whenever a new data slice arrives.

The processing-time results show that the present embodiment exploits the characteristics of the data and detects change points to enable accurate tensor decomposition, and that the re-decomposition process makes its running time slightly longer. Its running time lies between those of the static and dynamic decomposition methods: its processing speed is comparable to the other dynamic algorithms (DTD and online CP) and much faster than the static method (Full-CP).

Figure 9 is a diagram showing the performance improvement obtained by re-decomposing the tensor stream.

Reference numerals 910, 940, and 960 each denote a split process, and reference numerals 920, 930, and 950 each denote a refinement process.

The re-decomposition results for this embodiment show that, for the scene in the video data in which an object emerges from a cave (910), the result is similar to those of the comparative examples (Full-CP, DTD, and Online CP). In the background transition scene (940) and the scene in which another object appears (960), however, the embodiment detects the theme change accurately, unlike the comparative examples. It takes slightly more processing time than the other dynamic algorithms (DTD and Online CP) but, unlike them, improves decomposition accuracy through the split process.
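The choice between the split (division) process and the refinement process can be sketched as follows. The claims state only that the tracked error is converted into a score and that the strategy depends on which threshold range the degree of change satisfies; the z-score, the running statistics (Welford's algorithm), and the threshold values 3.0 and 1.5 used here are illustrative assumptions.

```python
import math


class ErrorTracker:
    """Running mean and variance of per-slice errors (Welford's algorithm)."""

    def __init__(self):
        self.count = 0
        self.mean = 0.0
        self.m2 = 0.0  # sum of squared deviations from the running mean

    def update(self, error):
        """Fold one observed slice error into the running statistics."""
        self.count += 1
        delta = error - self.mean
        self.mean += delta / self.count
        self.m2 += delta * (error - self.mean)

    def score(self, error):
        """z-score of `error` against the history (0.0 with fewer than 2 samples)."""
        if self.count < 2:
            return 0.0
        std = math.sqrt(self.m2 / (self.count - 1))
        if std == 0.0:
            return 0.0 if error == self.mean else math.inf
        return (error - self.mean) / std


def choose_strategy(score, split_threshold=3.0, refine_threshold=1.5):
    """Map a change score to a decomposition strategy (thresholds are hypothetical)."""
    if score >= split_threshold:
        return "split"    # re-initialize and re-decompose
    if score >= refine_threshold:
        return "refine"   # reuse the previous decomposition result and update it
    return "update"       # ordinary online update


tracker = ErrorTracker()
for e in [1.0, 1.1, 0.9, 1.0]:
    tracker.update(e)
for new_error in [1.05, 1.2, 2.0]:
    print(new_error, choose_strategy(tracker.score(new_error)))
```

In this sketch a mild rise in error triggers refinement, while a large jump, such as might occur at a background transition, triggers a split; how the statistics are reset after a split is left open here.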

The term '~unit' used in the above embodiments refers to software or to a hardware component such as an FPGA (field-programmable gate array) or an ASIC, and a '~unit' performs certain roles. However, a '~unit' is not limited to software or hardware. A '~unit' may be configured to reside in an addressable storage medium or to execute on one or more processors. Thus, as an example, a '~unit' includes components such as software components, object-oriented software components, class components, and task components, as well as processes, functions, attributes, procedures, subroutines, segments of program code, drivers, firmware, microcode, circuits, data, databases, data structures, tables, arrays, and variables.

The functionality provided by the components and '~units' may be combined into a smaller number of components and '~units' or further separated into additional components and '~units'.

In addition, the components and '~units' may be implemented to run on one or more CPUs in a device or a secure multimedia card.

Meanwhile, the tensor analysis method according to an embodiment described in this specification may also be implemented in the form of a computer-readable medium that stores instructions and data executable by a computer. The instructions and data may be stored in the form of program code and, when executed by a processor, may generate a predetermined program module and perform a predetermined operation. The computer-readable medium may be any available medium that can be accessed by a computer, and includes volatile and non-volatile media as well as removable and non-removable media. The computer-readable medium may also be a computer recording medium, which may include volatile and non-volatile, removable and non-removable media implemented by any method or technology for storing information such as computer-readable instructions, data structures, program modules, or other data. For example, the computer recording medium may be a magnetic or solid-state storage medium such as an HDD or an SSD, an optical recording medium such as a CD, a DVD, or a Blu-ray disc, or a memory included in a server accessible through a network.

In addition, the tensor analysis method according to an embodiment described in this specification may be implemented as a computer program (or computer program product) including instructions executable by a computer. The computer program includes programmable machine instructions processed by a processor and may be implemented in a high-level programming language, an object-oriented programming language, an assembly language, a machine language, or the like. The computer program may also be recorded on a tangible computer-readable recording medium (for example, a memory, a hard disk, a magnetic/optical medium, or a solid-state drive (SSD)).

Accordingly, the tensor analysis method according to an embodiment described in this specification may be implemented by a computing device executing the computer program described above. The computing device may include at least some of a processor, a memory, a storage device, a high-speed interface connected to the memory and a high-speed expansion port, and a low-speed interface connected to a low-speed bus and the storage device. These components are interconnected by various buses and may be mounted on a common motherboard or in another suitable manner.

Here, the processor can process instructions within the computing device, for example instructions stored in the memory or the storage device for displaying graphical information that provides a GUI (graphical user interface) on an external input/output device such as a display connected to the high-speed interface. In other embodiments, multiple processors and/or multiple buses may be used, as appropriate, together with multiple memories and types of memory. The processor may also be implemented as a chipset made up of chips that include multiple independent analog and/or digital processors.

The memory stores information within the computing device. In one example, the memory may consist of a volatile memory unit or a set of such units. In another example, the memory may consist of a non-volatile memory unit or a set of such units. The memory may also be another form of computer-readable medium, such as a magnetic or optical disk.

The storage device can provide a large amount of storage space to the computing device. The storage device may be a computer-readable medium or a configuration including such a medium; for example, it may include devices within a SAN (storage area network) or other configurations, and may be a floppy disk device, a hard disk device, an optical disk device, a tape device, a flash memory, or another similar semiconductor memory device or device array.

The embodiments described above are illustrative, and a person of ordinary skill in the art to which they belong will understand that they can readily be modified into other specific forms without changing their technical idea or essential features. The embodiments described above should therefore be understood as illustrative in all respects and not restrictive. For example, each component described as a single unit may be implemented in a distributed manner, and components described as distributed may likewise be implemented in a combined form.

The scope of protection sought through this specification is defined by the claims that follow rather than by the detailed description above, and all changes or modifications derived from the meaning and scope of the claims and their equivalents should be construed as falling within the scope of the present invention.

100: tensor analysis device
110: input/output unit
120: storage unit
130: control unit

Claims

1. A tensor analysis device comprising:
an input/output unit configured to receive data and output a result of processing the data;
a storage unit storing a program for performing a data-adaptive tensor analysis method; and
a control unit including at least one process and configured to analyze a tensor stream received through the input/output unit by executing the program,
wherein the control unit
outputs a tensor decomposition result calculated from a new tensor slice of the tensor stream through a tensor decomposition method using a complementary matrix,
detects a change point of the tensor stream based on the tensor decomposition result, and
performs tensor decomposition in different ways according to a degree of change at the change point of the tensor stream.

2. The tensor analysis device of claim 1,
wherein the tensor decomposition method using the complementary matrix
updates the decomposition factors constituting the tensor decomposition result,
calculates a decomposition factor by applying the complementary matrix when the decomposition factor corresponds to a time factor, and
when the decomposition factor does not correspond to the time factor, updates the decomposition factor and updates the complementary matrix in consideration of a weight of the previous tensor decomposition result.

3. The tensor analysis device of claim 1,
wherein the control unit
tracks an error between the new tensor slice and the tensor decomposition result, converts the tracked error into a score, and detects the change point of the tensor stream based on the score.

4. The tensor analysis device of claim 1,
wherein the control unit,
when the degree of change satisfies a first threshold range, selects a tensor decomposition initialization strategy as a first tensor decomposition strategy and performs a split process by re-decomposition, and
when the degree of change satisfies a second threshold range, selects a previous-decomposition-result reuse strategy as a second tensor decomposition strategy and performs a refinement process of updating the tensor decomposition result using the previous tensor decomposition result.

5. A data-adaptive tensor analysis method performed by a tensor analysis device, the method comprising:
outputting a tensor decomposition result calculated from a new tensor slice of a tensor stream through a tensor decomposition method using a complementary matrix;
detecting a change point of the tensor stream based on the tensor decomposition result; and
performing tensor decomposition in different ways according to a degree of change at the change point of the tensor stream.

6. The data-adaptive tensor analysis method of claim 5,
wherein the tensor decomposition method using the complementary matrix
updates the decomposition factors constituting the tensor decomposition result,
calculates a decomposition factor by applying the complementary matrix when the decomposition factor corresponds to a time factor, and
when the decomposition factor does not correspond to the time factor, updates the decomposition factor and updates the complementary matrix in consideration of a weight of the previous tensor decomposition result.

7. The data-adaptive tensor analysis method of claim 5,
wherein the detecting of the change point of the tensor stream comprises
tracking an error between the new tensor slice and the tensor decomposition result, converting the tracked error into a score, and detecting the change point of the tensor stream based on the score.

8. The data-adaptive tensor analysis method of claim 5,
wherein the performing of tensor decomposition in different ways comprises:
when the degree of change satisfies a first threshold range, selecting a tensor decomposition initialization strategy as a first tensor decomposition strategy and performing a split process by re-decomposition; and
when the degree of change satisfies a second threshold range, selecting a previous-decomposition-result reuse strategy as a second tensor decomposition strategy and performing a refinement process of updating the tensor decomposition result using the previous tensor decomposition result.

9. A computer-readable recording medium on which a program for performing the method of claim 5 is recorded.

10. A computer program stored in a recording medium and executed by a tensor analysis device to perform the method of claim 5.