KR102367181B1

KR102367181B1 - Method for data augmentation based on matrix factorization

Info

Publication number: KR102367181B1
Application number: KR1020200022532A
Authority: KR
Inventors: 박영택; 박현규; 신원철
Original assignee: 숭실대학교산학협력단
Priority date: 2019-11-28
Filing date: 2020-02-24
Publication date: 2022-02-25
Also published as: KR20210066681A

Abstract

The present invention discloses a data augmentation method for deep learning algorithms based on a matrix factorization technique. A data augmentation method based on a matrix factorization technique according to an embodiment of the present invention includes: when data on a first item and a second item relating to a user are collected, generating a first matrix for the first item and the second item; factorizing the first matrix to generate a second matrix having the first item as rows and a third matrix having the second item as columns; and replacing at least one of an item value of the first item and an item value of the second item based on a third item, wherein each element value of the first matrix is the item value of the third item corresponding to a combination of an item value of the first item and an item value of the second item.

Description

Method for data augmentation based on matrix factorization technique

The present invention relates to a data augmentation method based on a matrix factorization technique.

Deep learning algorithms can extract features from image or text data and, based on those features, restore unclear parts of the data. In general, the more training data is fed to a deep learning algorithm, the better it can perform. Because the amount of training data is limited, however, research is under way on data augmentation, which increases the number of training samples by processing existing training data. Here, data augmentation refers to transforming data without changing its class label.

For example, in a convolutional neural network (CNN) that performs image processing, the number of image samples can be increased by transforming the original images, for instance by flipping them vertically or horizontally, rotating them, or rescaling them; training on the augmented data can then improve the performance of the learning model.
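As an illustration only, and not as part of the claimed method, the label-preserving image transformations mentioned above can be sketched with NumPy. The function name `augment_image` and the use of subsampling as a crude form of rescaling are assumptions made for this sketch.

```python
import numpy as np

def augment_image(image: np.ndarray) -> list[np.ndarray]:
    """Return label-preserving variants of one H x W (or H x W x C) image."""
    return [
        np.flipud(image),      # vertical flip (up/down)
        np.fliplr(image),      # horizontal flip (left/right)
        np.rot90(image, k=1),  # 90-degree counter-clockwise rotation
        image[::2, ::2],       # crude 1/2 rescaling by subsampling (assumption)
    ]

# Toy 4 x 4 "image" used only to show the call.
image = np.arange(16).reshape(4, 4)
variants = augment_image(image)
```

Each variant keeps the class label of the original image, which is what makes these transformations usable as augmentation.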

If the data is a sequence rather than an image, an augmentation method different from the image augmentation described above may be required. For example, in text data the characters are ordered and the words carry meaning, so image-style transformations cannot be used. For text data, data replacement and data shuffling are therefore used as augmentation methods.

Here, data replacement augments data by replacing words or phrases in the data with synonyms, and data shuffling changes the order of words and phrases. Because the order of characters matters in text data, as noted above, research on augmenting text data focuses mainly on data replacement.

Korean Registered Patent Publication No. 10-1872811 (2018.6.29)

To solve the above problems, an object of the present invention is to provide a data augmentation method based on a matrix factorization technique.

To achieve the above object, a data augmentation method based on a matrix factorization technique according to an embodiment of the present invention includes: when data on a first item and a second item relating to a user are collected, generating a first matrix for the first item and the second item; factorizing the first matrix to generate a second matrix having the first item as rows and a third matrix having the second item as columns; and replacing at least one of an item value of the first item and an item value of the second item based on a third item, wherein each element value of the first matrix is the item value of the third item corresponding to a combination of an item value of the first item and an item value of the second item.

Preferably, the data on the first item and the second item may be data collected through a sensor.

Preferably, the first item may classify the user's action information, the second item may classify the user's posture information, and the third item may classify the user's behavioral intention.

Preferably, when both the columns of the second matrix and the rows of the third matrix relate to time, the second matrix may represent the frequency of the user's actions over time, and the third matrix may represent the frequency of the user's postures over time.

Preferably, the first matrix may differ from user to user.

Preferably, the step of replacing at least one of the item value of the first item and the item value of the second item based on the third item may include, when first data that is an item value of the first item is to be replaced, determining second data based on the item value of the third item corresponding to the first data, and replacing the first data with the second data as the item value of the first item.

Preferably, the item value of the third item corresponding to the first data may be identical to the item value of the third item corresponding to the second data.

Preferably, the step of determining the second data may determine the second data by additionally considering time information and the item value of the second item.

Preferably, the step of replacing at least one of the item value of the first item and the item value of the second item based on the third item may include, when third data that is an item value of the second item is to be replaced, determining fourth data based on the item value of the third item corresponding to the third data, and replacing the third data with the fourth data as the item value of the second item.

Preferably, the item value of the third item corresponding to the third data may be identical to the item value of the third item corresponding to the fourth data.

Preferably, the step of determining the fourth data may determine the fourth data by additionally considering time information and the item value of the first item.

A data augmentation apparatus based on a matrix factorization technique according to another embodiment of the present invention includes: a preprocessing unit that, when data on a first item and a second item relating to a user are collected, generates a first matrix for the first item and the second item; a matrix factorization unit that factorizes the first matrix to generate a second matrix having the first item as rows and a third matrix having the second item as columns; and a data replacement unit that replaces at least one of an item value of the first item and an item value of the second item based on a third item, wherein each element value of the first matrix may be the item value of the third item corresponding to a combination of an item value of the first item and an item value of the second item.

A computer-readable storage medium according to yet another embodiment of the present invention may store a computer program that, when executed by a processor, performs any one of the methods described above.

The data augmentation method and data augmentation apparatus based on a matrix factorization technique according to an embodiment of the present invention can augment data from a small amount of existing data in fields where data collection is difficult, and can therefore improve the performance of model training with existing deep learning methods.

FIG. 1 is a flowchart illustrating a data augmentation method based on a matrix factorization technique according to an embodiment of the present invention.
FIG. 2 is a diagram illustrating a first matrix generated according to an embodiment of the present invention for a deep learning algorithm that learns the behavioral intention of an elderly person.
FIG. 3 is a diagram illustrating the behavioral intention labels used for a deep learning algorithm that learns the behavioral intention of an elderly person.
FIG. 4 is a diagram illustrating a method of factorizing the first matrix according to an embodiment of the present invention for a deep learning algorithm that learns the behavioral intention of an elderly person.
FIG. 5 is a diagram illustrating a method of extracting feature vectors according to an embodiment of the present invention for a deep learning algorithm that learns the behavioral intention of an elderly person.
FIG. 6 is a diagram illustrating a method of replacing data according to an embodiment of the present invention for a deep learning algorithm that learns the behavioral intention of an elderly person.
FIG. 7 is a block diagram illustrating a data augmentation apparatus according to an embodiment of the present invention.

The present invention may be modified in various ways and may have various embodiments; specific embodiments are described in detail with reference to the drawings. This is not intended to limit the present invention to particular embodiments, and the invention should be understood to cover all modifications, equivalents, and substitutes falling within its spirit and technical scope. In describing the drawings, like reference numerals are used for like elements.

Terms such as first, second, A, and B may be used to describe various elements, but the elements are not limited by these terms. The terms are used only to distinguish one element from another. For example, without departing from the scope of the present invention, a first element may be called a second element, and similarly a second element may be called a first element. The term "and/or" includes any combination of a plurality of related listed items or any one of them.

When an element is said to be "connected" or "coupled" to another element, it may be directly connected or coupled to that element, but intervening elements may also be present. In contrast, when an element is said to be "directly connected" or "directly coupled" to another element, no intervening element is present.

The terms used in this application serve only to describe specific embodiments and are not intended to limit the present invention. A singular expression includes the plural unless the context clearly indicates otherwise. In this application, terms such as "include" or "have" specify the presence of the features, numbers, steps, operations, elements, parts, or combinations thereof described in the specification, and do not preclude the presence or addition of one or more other features, numbers, steps, operations, elements, parts, or combinations thereof.

Unless otherwise defined, all terms used herein, including technical and scientific terms, have the same meaning as commonly understood by a person of ordinary skill in the art to which the present invention belongs. Terms defined in commonly used dictionaries should be interpreted as having meanings consistent with their meanings in the context of the related art, and are not to be interpreted in an idealized or excessively formal sense unless expressly so defined in this application.

Throughout the specification and claims, when a part is said to include an element, this means that it may further include other elements, not that other elements are excluded, unless specifically stated otherwise.

The generative adversarial network (GAN) algorithm is known as a data augmentation technique used with conventional deep learning algorithms, and it can be used to restore or generate unclear parts of image data. The GAN algorithm was proposed by Ian Goodfellow in 2014; it consists of a generator and a discriminator, and it learns features extracted from data in order to generate data that resembles real data. Because the GAN algorithm is a data-driven learning method, however, it learns more effectively as the amount of data grows. In other words, the GAN algorithm is efficient to use only when a large amount of data is already available. In addition, the computational complexity of a GAN can increase depending on its structure.

Accordingly, the present invention proposes a method of augmenting data based on a matrix factorization technique. Specifically, a data augmentation method according to an embodiment of the present invention uses matrix factorization to extract feature vectors from the collected data and, based on the extracted features, performs replacement consistent with each class label.

Hereinafter, the present invention is described in detail with reference to the accompanying drawings.

FIG. 1 is a flowchart illustrating a data augmentation method based on a matrix factorization technique according to an embodiment of the present invention.

In step 110, when data on a first item and a second item relating to a user are collected, the method according to an embodiment of the present invention may generate a first matrix for the first item and the second item. Each element value of the first matrix is the item value of a third item corresponding to a combination of an item value of the first item and an item value of the second item.

Here, the first matrix may differ from user to user, and the data on the first item and the second item may be data collected through a sensor.

In step 120, the method according to an embodiment may factorize the first matrix to generate a second matrix having the first item as rows and a third matrix having the second item as columns.

Here, the first item may classify the user's actions, the second item may classify the user's posture information, and the third item may classify the user's behavioral intention. If both the columns of the second matrix and the rows of the third matrix relate to time, the second matrix may represent the frequency of the user's actions over time, and the third matrix may represent the frequency of the user's postures over time.
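The shape relationship in step 120 can be illustrated with a short sketch. The description at this point does not name a particular factorization algorithm, so the example below uses a truncated singular value decomposition purely as a stand-in. The function name `factorize`, the `rank` parameter (which plays the role of the shared dimension between the second and third matrices), and the randomly filled label matrix are all assumptions.

```python
import numpy as np

def factorize(first_matrix: np.ndarray, rank: int) -> tuple[np.ndarray, np.ndarray]:
    """Split an (actions x postures) matrix into two low-rank factors.

    Truncated SVD is used here only as an illustrative stand-in.
    """
    u, s, vt = np.linalg.svd(first_matrix, full_matrices=False)
    root = np.sqrt(s[:rank])
    second = u[:, :rank] * root        # first item (actions) as rows
    third = root[:, None] * vt[:rank]  # second item (postures) as columns
    return second, third

# Hypothetical 55 x 10 matrix of intention labels 1..5.
rng = np.random.default_rng(0)
first = rng.integers(1, 6, size=(55, 10)).astype(float)
second, third = factorize(first, rank=4)
approximation = second @ third         # 55 x 10 low-rank reconstruction
```

With `rank` equal to the smaller dimension of the first matrix, the product of the two factors reproduces the first matrix up to floating-point error; smaller ranks give a coarser approximation.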

In step 130, at least one of the item value of the first item and the item value of the second item may be replaced based on the third item.

When first data that is an item value of the first item is to be replaced, step 130 may include determining second data based on the item value of the third item corresponding to the first data, and replacing the first data with the second data as the item value of the first item.

Furthermore, the item value of the third item corresponding to the first data may be identical to the item value of the third item corresponding to the second data.

In addition, the step of determining the second data may determine the second data by additionally considering time information and the item value of the second item.

Meanwhile, when third data that is an item value of the second item is to be replaced, step 130 may include determining fourth data based on the item value of the third item corresponding to the third data, and replacing the third data with the fourth data as the item value of the second item.

Furthermore, the item value of the third item corresponding to the third data may be identical to the item value of the third item corresponding to the fourth data.

In addition, the step of determining the fourth data may determine the fourth data by additionally considering time information and the item value of the first item.
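The replacement of first-item values in step 130 can be sketched as follows. This is a minimal illustration under stated assumptions, not the patented procedure: the function name `augment_by_replacement`, the tuple layout, and the toy labels are hypothetical. The sketch keeps the time index and the posture fixed and swaps the action only for another action that maps to the same intention label with that posture.

```python
import random

def augment_by_replacement(sequence, intent_of, actions, seed=0):
    """Replace each action with another action that keeps the same intention.

    sequence : list of (time, action, posture) tuples
    intent_of: dict mapping (action, posture) -> intention label
    actions  : all known action identifiers
    """
    rng = random.Random(seed)
    augmented = []
    for time, action, posture in sequence:
        label = intent_of[(action, posture)]
        # Candidates share the intention label for the same posture.
        candidates = [a for a in actions
                      if a != action and intent_of.get((a, posture)) == label]
        new_action = rng.choice(candidates) if candidates else action
        augmented.append((time, new_action, posture))  # order in time is preserved
    return augmented

# Hypothetical toy data: label 3 and label 1 are arbitrary intention classes.
intent_of = {("A1", "P1"): 3, ("A2", "P1"): 3, ("A3", "P1"): 1, ("A3", "P2"): 1}
sequence = [(0, "A1", "P1"), (1, "A3", "P2")]
augmented = augment_by_replacement(sequence, intent_of, ["A1", "A2", "A3"])
```

Replacing second-item values (postures) would follow the same pattern with the roles of action and posture exchanged.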

A specific embodiment of the method described above is presented below.

The data augmentation method based on a matrix factorization technique according to an embodiment of the present invention may be used to infer the behavioral intention of an elderly person. This requires training a deep learning algorithm on training data, and the method according to an embodiment of the present invention provides a data augmentation method that increases the number of training samples.

For example, the data augmentation method based on a matrix factorization technique according to an embodiment of the present invention may convert perceptual information data, consisting of an elderly person's action and posture information over time, into a three-dimensional matrix. In other words, the first item of FIG. 1 may be the elderly person's action information, the second item may be the elderly person's posture information, and the third item may be a label indicating the elderly person's behavioral intention. In this case, the first matrix may be a matrix whose element for each combination of action and posture information at a given point in time is the elderly person's behavioral intention.

In addition, the converted three-dimensional matrix may be factorized to extract matrices of perceptual information that characterize each intention label. Here, an intention label is a label defining the intention with which the elderly person took a particular action and posture. That is, the second and third matrices of FIG. 1 are obtained by factorizing the first matrix: the second matrix has the elderly person's action information as rows, and the third matrix has the elderly person's posture information as columns. Each element of the second matrix may represent the elderly person's behavioral intention when performing a particular action at a particular time, and each element of the third matrix may represent the behavioral intention when taking a particular posture at a particular time.

Meanwhile, the method according to an embodiment of the present invention may replace at least one of the element values of the second matrix and the third matrix to increase the number of training samples used by the deep learning algorithm. Because the data here is sequence data in which the order of intentions over time must be preserved, augmentation is performed by replacement. For example, the method may replace particular action information of the elderly person in FIG. 1 with other action information that has the same intention value as the behavioral intention corresponding to the original action information.

Everyday-behavior information that can be obtained from an elderly person's living space mostly concerns elderly people who can carry out daily life normally, and data on impaired mobility or abnormal living is difficult to collect. When a model is trained with a deep learning algorithm, abnormal data is therefore scarce compared with normal data, which can cause a data imbalance problem; a model trained on such data is biased toward one side, and the accuracy of the overall model can fall.

The data augmentation method based on a matrix factorization technique according to an embodiment of the present invention, however, extracts the features of the intention corresponding to each class for the elderly person and augments the data by replacing the extracted features, namely actions and postures. It can therefore resolve the data imbalance problem described above and, in turn, improve the accuracy of the learning model.

The following describes a data augmentation technique for a learning model that learns intention labels in daily life from the perceptual information data of elderly people. It will be apparent to those skilled in the art, however, that the data augmentation method based on a matrix factorization technique according to the present invention is not limited to extracting features of elderly people's behavior, and can be extended to any method or apparatus that needs to augment text data for use in a deep learning algorithm.

FIG. 2 is a diagram illustrating a first matrix generated according to an embodiment of the present invention for a deep learning algorithm that learns the behavioral intention of an elderly person, and FIG. 3 is a diagram illustrating the behavioral intention labels used for such a deep learning algorithm.

Referring to FIG. 2, the perceptual information data of the elderly person is information collected from the real-life environment of actual elderly people, and may be a matrix composed of 55 kinds of action information (A1 to A55) and 10 kinds of posture information (P1 to P10). The collected data consists of one item of action information and one item of posture information at each time point, and the behavioral intention of the elderly person may be determined from the combination of that action information and posture information.

Referring to FIG. 3, the "meal" intention is defined to include the process of preparing a meal, and may be determined by a combination of action information such as "picking up food with a spoon or fork", "drinking water or a beverage", and "heating food on a gas stove" with posture information such as "sitting". The "cleaning" intention, on the other hand, may be determined by a combination of action information such as "hanging laundry" and "running a vacuum cleaner" with posture information such as "standing" and "sitting".

Accordingly, as shown in FIG. 3, the data augmentation method based on a matrix factorization technique according to an embodiment of the present invention may classify the behavioral intentions an elderly person can have in daily life into five types and define, for each classified intention, the combinations of action information and posture information that characterize it; these combinations may be determined from the matrices obtained through matrix factorization. It will be apparent to those skilled in the art that the number of behavioral intentions and the numbers of action and posture categories are not limited to these values and may vary with the performance and requirements of the system in which the present invention is implemented.

Referring again to FIG. 2, when the 55 types of behavior (A1 to A55) are expressed as rows and the 10 types of posture (P1 to P10) as columns, a first matrix of size 55 x 10 may be generated. Each element value of the first matrix may be an item value of the third item. In other words, each of the 550 element values of the first matrix may be one of the integers 1 to 5, serving as a class label for the five action intentions.

For example, suppose the integer value representing the "meal" intention among the five intentions of FIG. 3 ("meal" through "leisure") is defined as 3. Since the element value at coordinates (A1, P1) of the first matrix of FIG. 2 is 3, when the behavior "A1" ("eating food with a spoon/fork") and the posture "P1" ("sitting") are sensed from the elderly person, it can be inferred that this behavior and posture carry the "meal" intention.
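The lookup described above can be illustrated with a minimal sketch. It assumes 1-based behavior and posture indices (A1 is row 1, P1 is column 1) and uses 0 as a marker for combinations that were never observed; both conventions, the function names `build_first_matrix` and `infer_intention`, and every label assignment other than "meal" = 3 are hypothetical and do not come from the specification.

```python
import numpy as np

NUM_BEHAVIORS = 55  # A1..A55, rows of the first matrix
NUM_POSTURES = 10   # P1..P10, columns of the first matrix

# Only "meal" = 3 is taken from the text; the other codes are placeholders.
INTENTIONS = {1: "cleaning", 2: "intention_2", 3: "meal", 4: "intention_4", 5: "leisure"}


def build_first_matrix(observations):
    """Build the 55 x 10 first matrix from (behavior, posture, intention) triples.

    Indices are 1-based. Cells with no observation stay 0 (an assumption of
    this sketch, not something the specification defines).
    """
    first = np.zeros((NUM_BEHAVIORS, NUM_POSTURES), dtype=int)
    for behavior, posture, intention in observations:
        first[behavior - 1, posture - 1] = intention
    return first


def infer_intention(first, behavior, posture):
    """Return the intention label stored for a sensed (behavior, posture) pair."""
    return int(first[behavior - 1, posture - 1])


# Toy observations: (A1, P1) and (A2, P1) carry "meal", (A10, P2) carries "cleaning".
observations = [(1, 1, 3), (2, 1, 3), (10, 2, 1)]
first_matrix = build_first_matrix(observations)
label = infer_intention(first_matrix, behavior=1, posture=1)
print(label, INTENTIONS[label])
```

Storing the labels in a dense array keeps the inference step a single index operation, at the cost of needing a sentinel value for unobserved cells.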

FIG. 4 is a diagram for explaining a method of factorizing the first matrix according to an embodiment of the present invention, for a deep learning algorithm that learns the action intentions of an elderly person.

The data augmentation method based on a matrix factorization technique according to an embodiment of the present invention may factorize the first matrix to generate a second matrix having the first item as rows and a third matrix having the second item as columns. For example, a second matrix and a third matrix holding intention values for behaviors and postures may be generated for each user, and feature vectors for the frequency of behaviors and postures may be generated per time unit of a day.

Referring to FIG. 4, a model that holds class-label values for the data may be trained by learning the M x N first matrix (411) through matrix factorization, according to feature vectors corresponding to time (k). Here, M denotes the number of item values of the first item and N denotes the number of item values of the second item. For example, M may be the total number of behavior information items of the elderly person and N the total number of posture information items. If the behavior and posture information of the elderly person is sensed every hour of the day, k is 24 in total, and factorization may produce a second matrix (412) and a third matrix (413) describing the hourly frequency of behaviors and postures for each day. Grouping the entries of the trained model that share the same class label then collects the behaviors and postures for each intention, as in FIG. 3.
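The specification does not name a particular factorization algorithm. As one possible illustration, the sketch below fits an M x k second matrix and a k x N third matrix to the first matrix by plain gradient descent on the squared reconstruction error. The function names, the hyperparameters (`steps`, `lr`, `reg`), the toy 6 x 4 input, and the convention that 0 marks an unobserved cell are all assumptions of this sketch.

```python
import numpy as np


def factorize(first, k, steps=2000, lr=0.01, reg=0.02, seed=0):
    """Approximate `first` (M x N) as `second @ third` with shapes M x k and k x N.

    Generic regularized gradient descent; cells equal to 0 are treated as
    unobserved and ignored (an assumption of this sketch).
    """
    rng = np.random.default_rng(seed)
    m, n = first.shape
    second = rng.normal(scale=0.1, size=(m, k))  # rows: first item (behaviors)
    third = rng.normal(scale=0.1, size=(k, n))   # columns: second item (postures)
    mask = first > 0
    for _ in range(steps):
        err = mask * (first - second @ third)    # error on observed cells only
        second += lr * (err @ third.T - reg * second)
        third += lr * (second.T @ err - reg * third)
    return second, third


def masked_rmse(first, second, third):
    """Root-mean-square reconstruction error over the observed cells."""
    mask = first > 0
    diff = (first - second @ third)[mask]
    return float(np.sqrt(np.mean(diff ** 2)))


# Toy first matrix with labels 1..5; a real one would be 55 x 10 with k = 24.
demo_first = np.random.default_rng(1).integers(1, 6, size=(6, 4))
second, third = factorize(demo_first, k=3)
baseline = float(np.sqrt(np.mean(demo_first.astype(float) ** 2)))  # error of an all-zero guess
print(second.shape, third.shape, masked_rmse(demo_first, second, third) < baseline)
```

Gradient descent is used here rather than a truncated SVD because the text allows k = 24, which exceeds the smaller dimension of a 55 x 10 matrix and therefore cannot be obtained by truncating its SVD.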

Meanwhile, in the data augmentation method based on a matrix factorization technique according to an embodiment of the present invention, the first matrix may differ from user to user. For example, since the frequency of behaviors and postures over time may differ for each elderly person, the method according to an embodiment of the present invention may generate and factorize a first matrix for each user.

FIG. 5 is a diagram for explaining a method of extracting feature vectors according to an embodiment of the present invention, for a deep learning algorithm that learns the action intentions of an elderly person.

In the method according to an embodiment of the present invention, once the behavior and posture information related to each intention has been extracted based on the first matrix and the factorization technique, data augmentation may be performed using that information. In other words, once each first matrix has been factorized to generate a second matrix and a third matrix, data augmentation may be performed based on the second matrix and the third matrix.

For example, for a deep learning algorithm that learns the action intentions of an elderly person, collected data on intentions derived from the behavior and posture information that occurred during one day may be input, and the collected data must preserve the order of intentions according to the time order in which they occurred. Accordingly, as in FIG. 5, the elderly person's action intentions must be kept in order, and each action intention must include information on behavior and posture. Therefore, for data augmentation, matrices relating to the behaviors and postures that occur under each action intention are first obtained through matrix factorization, and then, with reference to FIG. 3 as determined from those matrices, the originally collected data may be replaced with data on other related behaviors and postures.

FIG. 6 is a diagram for explaining a method of replacing data according to an embodiment of the present invention, for a deep learning algorithm that learns the action intentions of an elderly person.

FIG. 6(a) shows data collected through a sensor in order to learn the action intentions of an elderly person, and FIG. 6(b) shows several items of behavior information that correspond to the same action intention. FIG. 6(c) shows the result of replacing the data according to FIG. 6(a) and FIG. 6(b).

Referring to FIG. 6(a), when the elderly person has the action intention (620) of "meal", the sensed behavior information (610) may indicate "heating food on a gas stove".

In this case, matrix factorization may be performed according to an embodiment to extract behavior information related to the "meal" intention, as in FIG. 6(b). In other words, when the second matrix and the third matrix are generated by factorizing the first matrix composed of the information in FIG. 6(a), the item values of the first item that share the same third-item value (630) can be determined.

Thereafter, as in FIG. 6(c), the behavior information may be replaced with other behavior information having the same action intention. For example, since the behavior information "heating food on a gas stove" is related to the "meal" intention, it may be replaced with "drinking water/beverage" (640), another item of behavior information related to the "meal" intention. The data can be augmented through this replacement process.
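The replacement step of FIG. 6 can be sketched as follows, under the assumption that the groups of behaviors sharing an intention are already available as a dictionary. The function name `augment_sequence`, the random choice among candidates, and the sample data are hypothetical; the specification only requires that the replacement behavior carry the same action intention and that the order of intentions be preserved.

```python
import random

# Behaviors grouped by intention, as in FIG. 3 / FIG. 6(b) (sample values from the text).
GROUPS = {
    "meal": ["drinking water/beverage", "eating food with a spoon/fork",
             "heating food on a gas stove"],
    "cleaning": ["hanging laundry", "running the vacuum cleaner"],
}


def augment_sequence(sequence, groups, rng):
    """Return a new time-ordered sequence with each behavior swapped for another
    behavior of the same intention. Intentions and their order are unchanged."""
    augmented = []
    for behavior, intention in sequence:
        candidates = [b for b in groups.get(intention, []) if b != behavior]
        # Keep the original behavior when no alternative exists for this intention.
        new_behavior = rng.choice(candidates) if candidates else behavior
        augmented.append((new_behavior, intention))
    return augmented


# One collected day, in time order: (behavior, intention).
day = [
    ("heating food on a gas stove", "meal"),
    ("running the vacuum cleaner", "cleaning"),
    ("eating food with a spoon/fork", "meal"),
]
new_day = augment_sequence(day, GROUPS, random.Random(0))
for original, replaced in zip(day, new_day):
    print(original, "->", replaced)
```

Calling `augment_sequence` repeatedly with different random states yields multiple synthetic days from one collected day, each with the same intention sequence as the original.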

FIG. 7 is a block diagram for explaining a data augmentation apparatus according to an embodiment of the present invention.

The data augmentation apparatus (700) based on a matrix factorization technique according to an embodiment of the present invention augments the data input to a deep learning algorithm and may include a preprocessing unit (710), a matrix factorization unit (720), and a data replacement unit (730). Although FIG. 7 shows only these components, it will be apparent to those skilled in the art that other components may also be included.

When data on a first item and a second item relating to a user are collected, the preprocessing unit (710) may generate a first matrix for the first item and the second item, where each element value of the first matrix may be an item value of a third item corresponding to a combination of an item value of the first item and an item value of the second item.

Here, the first matrix may differ from user to user.

In addition, the data on the first item and the second item may be data collected through a sensor.

The matrix factorization unit (720) may factorize the first matrix to generate a second matrix having the first item as rows and a third matrix having the second item as columns.

Here, the first item may be an item that classifies the user's behavior, the second item an item that classifies the user's posture information, and the third item an item that classifies the user's action intention. If the columns of the second matrix and the rows of the third matrix both relate to time, the second matrix may represent the frequency of the user's behaviors over time and the third matrix may represent the frequency of the user's postures over time.

The data replacement unit (730) may replace at least one of the item value of the first item and the item value of the second item based on the third item.

When replacing first data that is an item value of the first item, the data replacement unit (730) may determine second data based on the item value of the third item corresponding to the first data, and use the second data in place of the first data as the item value of the first item. Here, the item value of the third item corresponding to the first data may be identical to the item value of the third item corresponding to the second data. In addition, when determining the second data, the data replacement unit (730) may additionally take into account time information and the item value of the second item.

When replacing third data that is an item value of the second item, the data replacement unit (730) may determine fourth data based on the item value of the third item corresponding to the third data, and use the fourth data in place of the third data as the item value of the second item. Here, the item value of the third item corresponding to the third data may be identical to the item value of the third item corresponding to the fourth data. In addition, when determining the fourth data, the data replacement unit (730) may additionally take into account time information and the item value of the first item.
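One way to read "additionally taking into account the item value of the other item" is to accept a replacement only if the first matrix assigns the same intention to the new (behavior, posture) pair. The sketch below illustrates that reading with 0-based indices and a toy 3 x 3 first matrix; the function names and the interpretation itself are assumptions, and time information is not modeled here.

```python
import numpy as np


def candidate_behaviors(first, behavior, posture):
    """Behaviors (rows) that yield the same intention as `behavior` when
    combined with the unchanged `posture` (column). Indices are 0-based."""
    label = first[behavior, posture]
    same = np.flatnonzero(first[:, posture] == label)
    return [int(b) for b in same if b != behavior]


def candidate_postures(first, behavior, posture):
    """Postures (columns) that yield the same intention as `posture` when
    combined with the unchanged `behavior` (row). Indices are 0-based."""
    label = first[behavior, posture]
    same = np.flatnonzero(first[behavior, :] == label)
    return [int(p) for p in same if p != posture]


# Toy first matrix: rows are behaviors, columns are postures, values are intention labels.
toy_first = np.array([
    [3, 3, 1],
    [3, 1, 1],
    [1, 3, 2],
])

print(candidate_behaviors(toy_first, behavior=0, posture=0))  # second-data candidates
print(candidate_postures(toy_first, behavior=0, posture=0))   # fourth-data candidates
```

Because each candidate is checked against the first matrix at the unchanged index, the intention label of the replaced sample is guaranteed to match that of the original sample.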

The above description merely illustrates the technical idea of the present invention, and a person of ordinary skill in the art to which the present invention pertains will be able to make various modifications and variations without departing from the essential characteristics of the invention. Accordingly, the embodiments disclosed herein are intended to explain, not to limit, the technical idea of the present invention, and the scope of that technical idea is not limited by these embodiments. The scope of protection of the present invention should be construed according to the claims below, and all technical ideas within a scope equivalent thereto should be construed as falling within the scope of the present invention.

Claims

A data augmentation method based on a matrix factorization technique, the method comprising:
when data on a first item and a second item relating to a user are collected, generating a first matrix for the first item and the second item, wherein each element value of the first matrix is an item value of a third item corresponding to a combination of an item value of the first item and an item value of the second item;
factorizing the first matrix to generate a second matrix having the first item as rows and a third matrix having the second item as columns; and
replacing at least one of the item value of the first item and the item value of the second item based on the third item,
wherein the first item is an item classifying the user's behavior, the second item is an item classifying the user's posture information, and the third item is an item classifying the user's action intention.

The method of claim 1,
wherein the data on the first item and the second item are data collected through a sensor.

(deleted)

The method of claim 1,
wherein, when the columns of the second matrix and the rows of the third matrix both relate to time,
the second matrix represents the frequency of the user's behaviors over time and the third matrix represents the frequency of the user's postures over time.

The method of claim 1,
wherein the first matrix differs from user to user.

The method of claim 1,
wherein replacing at least one of the item value of the first item and the item value of the second item based on the third item comprises:
when replacing first data that is an item value of the first item, determining second data based on the item value of the third item corresponding to the first data; and
using the second data in place of the first data as the item value of the first item.

The method of claim 6,
wherein the item value of the third item corresponding to the first data is identical to the item value of the third item corresponding to the second data.

The method of claim 6,
wherein determining the second data comprises determining the second data by additionally taking into account time information and the item value of the second item.

The method of claim 1,
wherein replacing at least one of the item value of the first item and the item value of the second item based on the third item comprises:
when replacing third data that is an item value of the second item, determining fourth data based on the item value of the third item corresponding to the third data; and
using the fourth data in place of the third data as the item value of the second item.

The method of claim 9,
wherein the item value of the third item corresponding to the third data is identical to the item value of the third item corresponding to the fourth data.

The method of claim 9,
wherein determining the fourth data comprises determining the fourth data by additionally taking into account time information and the item value of the first item.

A data augmentation apparatus based on a matrix factorization technique, the apparatus comprising:
a preprocessing unit that, when data on a first item and a second item relating to a user are collected, generates a first matrix for the first item and the second item, wherein each element value of the first matrix is an item value of a third item corresponding to a combination of an item value of the first item and an item value of the second item;
a matrix factorization unit that factorizes the first matrix to generate a second matrix having the first item as rows and a third matrix having the second item as columns; and
a data replacement unit that replaces at least one of the item value of the first item and the item value of the second item based on the third item,
wherein the first item is an item classifying the user's behavior, the second item is an item classifying the user's posture information, and the third item is an item classifying the user's action intention.

A computer-readable storage medium storing a computer program that, when executed by a processor, performs the method according to any one of claims 1, 2 and 4 to 11.