KR20200044189A

KR20200044189A - System and Method for Preprocessing and Data Set Augmentation for Training AI with 3D Data Processing

Info

Publication number: KR20200044189A
Application number: KR1020180119819A
Authority: KR
Inventors: 김현진; 김동섭; 송지원
Original assignee: 단국대학교 산학협력단
Priority date: 2018-10-08
Filing date: 2018-10-08
Publication date: 2020-04-29
Also published as: KR102194303B1

Abstract

The present invention relates to an apparatus and a method for data set augmentation and preprocessing for artificial intelligence (AI) training used in three-dimensional (3D) data processing, which improve the usability of data and the precision of output results by generating a large number of training data sets from a small number of data items. The apparatus includes: a data input unit that receives a first number of 3D data items; a sub-data selection unit that randomly selects sub-data within a predetermined range from one data item composed of multiple pieces of sub-data; a sub-data sequence generation unit that generates sequences of the sub-data, thereby expanding the input into a second number of 3D data items greater than the first number; a motion history image (MHI) two-dimensional (2D) data generation unit that converts the 3D data items generated by the sub-data sequence generation unit into 2D data through an MHI process; and a machine learning processing unit that receives the 2D data from the MHI 2D data generation unit as input, applies filters to extract features of the 2D data as feature maps, extracts representative values to output feature maps with a reduced amount of data, and connects all of the resulting features for learning.

Description

System and Method for Preprocessing and Data Set Augmentation for Training AI with 3D Data Processing

The present invention relates to the generation of sequence data for machine learning. Specifically, it relates to an apparatus and method for data set augmentation and preprocessing for AI training used in 3D data processing, which generate a large number of data sets for AI training from a small number of data items so as to improve the usability of the data and the precision of the output results.

In a machine learning environment, training data comprising training inputs and training labels can be used to determine a learned function.

The learned function can be effective in representing the relationship between the training inputs and the training labels.

The learned function can be used in a machine learning system. The machine learning system can receive a test input and apply the learned function to it to produce a test label.

In addition, many image-processing applications require flexible allocation of resources to different image regions.

For example, compression parameters may be selected based on certain characteristics of the image, or image regions may be subject to flexible error correction to achieve an optimal trade-off between transmission reliability and efficiency.

Automatically identifying the relevance levels of image regions is important for determining the amount of resources to allocate to a particular image region, but running such an algorithm can itself consume valuable CPU time.

This can cause problems when, for example, many programs compete for limited resources on an embedded platform such as a surveillance camera.

In particular, AI training must be performed using previously collected data and its classification results.

In this case, to secure a data set there is little choice but to obtain data from organizations or companies that can readily collect large amounts of data, such as public institutions and Internet portals.

Collecting data this way is expensive, and when a large amount of data has not already been collected, or cannot be collected, a sufficient number of data items for training cannot be secured.

Moreover, sufficient sequence data is hard to secure because a single data set is large and its forms are varied.

A representative example of such sequence data is video.

A video consists of multiple images, called frames, that are joined together and displayed on the screen in rapid succession so that a viewer perceives them as moving pictures.

In effect, a video is a sequence in which sub-data called frames are connected in order.

Such sequence data is characteristically difficult to collect and store.

Accordingly, there is a need for a new technique that generates a large number of data sets for AI training from a small number of data items.

Republic of Korea Patent Publication No. 10-2018-0037593
Republic of Korea Patent Publication No. 10-2018-0035633
Republic of Korea Patent Publication No. 10-2017-0006281

The present invention is intended to solve the problems of prior-art sequence data generation for machine learning. Its object is to provide an apparatus and method for data set augmentation and preprocessing for AI training used in 3D data processing that generate a large number of data sets for AI training from a small number of data items, thereby improving the usability of the data and the precision of the output results.

Another object of the present invention is to provide such an apparatus and method that sample sequence data to generate many independent data items, so that the large number of data items needed for training can be secured from a small number of sequence data items.

Another object of the present invention is to provide such an apparatus and method in which, when one data item is composed of multiple pieces of sub-data, sub-data is randomly selected within a predetermined range to generate sub-data sequences, and each generated sub-data sequence can be used as training data.

Another object of the present invention is to provide such an apparatus and method that reduce the cost of data collection by generating a large number of data sets for AI training from a small number of data items, and that secure a sufficient number of data items for training even when a large amount of data has not been collected or cannot be collected.

The objects of the present invention are not limited to those mentioned above, and other objects not mentioned will be clearly understood by those skilled in the art from the following description.

To achieve the above objects, an apparatus for data set augmentation and preprocessing for AI training used in 3D data processing according to the present invention comprises: a data input unit that receives a first number of 3D data items; a sub-data selection unit that randomly selects sub-data within a predetermined range from one data item composed of multiple pieces of sub-data; a sub-data sequence generation unit that generates sequences of the sub-data, thereby expanding the input into a second number of 3D data items greater than the first number; an MHI 2D data generation unit that converts the 3D data items generated by the sub-data sequence generation unit into 2D data through a motion history image (MHI) process; and a machine learning processing unit that receives the 2D data from the MHI 2D data generation unit as input, applies filters to extract features of the 2D data as feature maps, extracts representative values to output feature maps with a reduced amount of data, and connects all of the resulting features for learning.

Here, when one data item is composed of multiple pieces of sub-data, sub-data is randomly selected within a predetermined range to generate sub-data sequences, so that each generated sub-data sequence can be used as training data.

When 3D data composed of x, y, z is input to the sub-data selection unit, the size of the t-axis is k+1, the interval (range) is n, and one data item is randomly selected from each interval. Where A is the greatest integer less than or equal to [formula missing], the sub-data sequence generation unit generates [formula missing] sub-data sequences.

The sub-data sequence generation unit generates new 3D data items from 3D data consisting of [formula missing] by randomly extracting one item from every interval of n items, as in [formula missing].

The length of the t-axis of each generated sub 3D data item is A, and the item is composed of [formula missing], ordered sequentially along t.

For the generated data, the motion gradient of (x, y), that is, the change in motion as t changes, is measured as t advances. The result is combined into a single MHI 2D data item that encodes temporal order through pixel brightness according to the order of t.

The machine learning processing unit comprises: a C layer (convolutional layer) that receives 2D data as input and applies filters to extract features of the 2D data, producing feature maps; an S layer (subsampling layer) that extracts representative values from the feature maps produced by the C layer and outputs feature maps with a reduced amount of data; and an FC layer (fully connected layer) that connects all of the features produced by the C and S layers for learning.

To achieve another object, a method for data set augmentation and preprocessing for AI training used in 3D data processing according to the present invention comprises: a data input step of receiving a first number of 3D data items; a sub-data selection step of randomly selecting sub-data within a predetermined range from one data item composed of multiple pieces of sub-data; a sub-data sequence generation step of generating sequences of the sub-data, thereby expanding the input into a second number of 3D data items greater than the first number; and an MHI 2D data generation step of converting the 3D data items generated in the sub-data sequence generation step into 2D data through a motion history image (MHI) process.

The method further comprises a machine learning processing step of receiving the 2D data from the MHI 2D data generation step as input, applying filters to extract features of the 2D data as feature maps, extracting representative values to output feature maps with a reduced amount of data, and connecting all of the resulting features for learning.

When 3D data composed of x, y, z is input in the sub-data selection step, the size of the t-axis is k+1, the interval (range) is n, and one data item is randomly selected from each interval. Where A is the greatest integer less than or equal to [formula missing], the sub-data sequence generation step generates [formula missing] sub-data sequences.

The sub-data sequence generation step generates new 3D data items from 3D data consisting of [formula missing] by randomly extracting one item from every interval of n items, as in [formula missing].

The apparatus and method for data set augmentation and preprocessing for AI training used in 3D data processing according to the present invention, as described above, have the following effects.

First, a large number of data sets for AI training are generated from a small number of data items, which improves the usability of the data and the precision of the output results.

Second, sequence data is sampled to generate many independent data items, so that the large number of data items needed for training can be secured effectively from a small number of sequence data items.

Third, when one data item is composed of multiple pieces of sub-data, sub-data is randomly selected within a predetermined range to generate sub-data sequences, and each generated sub-data sequence can be used as training data.

Fourth, generating a large number of data sets for AI training from a small number of data items reduces the cost of data collection and secures a sufficient number of data items for training even when a large amount of data has not been collected or cannot be collected.

FIG. 1 is a diagram showing the data set augmentation and preprocessing process for AI training used in 3D data processing according to the present invention.
FIG. 2 is a diagram showing an example in which the data set augmentation and preprocessing process of the present invention is applied.
FIG. 3 is a block diagram of the apparatus for data set augmentation and preprocessing for AI training used in 3D data processing according to the present invention.
FIG. 4 is a diagram showing an example of sub-data sequence generation according to the present invention.
FIG. 5 is a diagram showing an example of the AI learning applied to the present invention.
FIG. 6 is a diagram showing an example of filter application in the AI learning process applied to the present invention.
FIG. 7 is a flowchart showing the method for data set augmentation and preprocessing for AI training used in 3D data processing according to the present invention.

Hereinafter, preferred embodiments of the apparatus and method for data set augmentation and preprocessing for AI training used in 3D data processing according to the present invention are described in detail.

The features and advantages of the apparatus and method will become apparent from the detailed description of each embodiment below.

FIG. 1 is a diagram showing the data set augmentation and preprocessing process for AI training used in 3D data processing according to the present invention, and FIG. 2 is a diagram showing an example in which that process is applied.

In the apparatus and method according to the present invention, when one data item is composed of multiple pieces of sub-data, sub-data is randomly selected within a predetermined range to generate sub-data sequences, and each generated sub-data sequence can be used as training data.

The present invention thereby reduces the cost of data collection by generating a large number of data sets for AI training from a small number of data items, and secures a sufficient number of data items for training even when a large amount of data has not been collected or cannot be collected.

To this end, as shown in FIG. 1, the present invention may include a configuration that expands a small number of 3D data items into a large number of 3D data items, and a configuration that converts the generated 3D data items into 2D data through a motion history image (MHI) process.

As shown in FIG. 2, MHI (motion history image) processing represents the location of motion and the progression of its path in a static image template.

The intensity of each MHI pixel is a function of the motion history at that location, where brighter values correspond to more recent motion.

With MHI, a motion sequence can be condensed into a single image from which the flow of motion and the action can be inferred.
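The relationship between pixel intensity and motion history can be written down explicitly. The rule below is a commonly used MHI formulation, given only as an illustration; the text does not state which update rule the invention uses. The binary motion mask D(x, y, t) and the duration parameter τ are symbols assumed for this sketch.

```latex
H_\tau(x, y, t) =
\begin{cases}
\tau, & \text{if } D(x, y, t) = 1 \\
\max\bigl(0,\; H_\tau(x, y, t-1) - 1\bigr), & \text{otherwise}
\end{cases}
```

Under this rule a pixel that moved in the latest frame holds the maximum value τ, and pixels whose last motion is older fade by one level per frame, which yields the "brighter means more recent" property described above.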

For video and other sequence data, the present invention secures a large number of data items by sampling appropriate items from the sub-data to generate further sequence data.

In practice, each piece of sub-data is a unit output captured through a sensor.

In many cases, the full set of sub-data contained in the original data is more than is needed from the standpoint of intelligence. Mechanical sensors collect and transmit data on a nanosecond scale, whereas in artificial intelligence that imitates humans, sensors such as the eyes and ears are insensitive to time and carry considerable error.

The present invention proposes a very simple and intuitive method of securing a large number of data items by deliberately introducing, through appropriate partial sampling, an error comparable to that of human senses.

It is also possible to analyze the relationships between pieces of sub-data and to add further data on that basis, processing the data into sub-data sequences.

The present invention thus addresses three problems: the difficulty of securing a sufficiently large data set, the slow processing speed that results from the data being 3D rather than 2D, and the difficulty of storing the data because 3D data is larger than 2D data.

The apparatus for data set augmentation and preprocessing for AI training used in 3D data processing according to the present invention is as follows.

FIG. 3 is a block diagram of the apparatus.

The apparatus includes: a data input unit (30) that receives a predetermined number of 3D data items; a sub-data selection unit (31) that randomly selects sub-data within a predetermined range from one data item composed of multiple pieces of sub-data; a sub-data sequence generation unit (32) that generates sequences of the sub-data, thereby expanding the input into a large number of 3D data items; an MHI 2D data generation unit (33) that converts the 3D data items generated by the sub-data sequence generation unit (32) into 2D data through a motion history image (MHI) process; and a machine learning processing unit (34) that receives the 2D data from the MHI 2D data generation unit (33) as input, applies filters to extract features of the 2D data as feature maps, extracts representative values from the feature maps to output feature maps with a reduced amount of data, and connects all of the resulting features for learning.

The apparatus obtains a large number of new data sets by extracting, on a 1-out-of-N basis, from a small number of 3D data items composed of x, y, z that are obtained externally or generated directly.

FIG. 4 is a diagram showing an example of sub-data sequence generation according to the present invention.

As shown in FIG. 4, the apparatus performs (a) a process of expanding a small number of 3D data items into a large number of 3D data items and (b) a process of converting the generated 3D data items into 2D data through a motion history image (MHI) process.

FIG. 4 is an example of sub-data sequence generation.

When one data item is composed of multiple pieces of sub-data, sub-data is randomly selected within a predetermined range to generate a sequence of sub-data.

Each sub-data sequence generated in this way is used as training data.

As shown in FIG. 4, if the size of the t-axis is k+1, the interval (range) is n, and one data item is randomly selected from each interval, then [formula missing] sub-data sequences can be generated (where A is the greatest integer less than or equal to [formula missing]).
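The two expressions in this passage are not reproduced in the text. A reading consistent with the surrounding definitions, stated here as an assumption rather than as the original formulas, is that A counts the complete intervals of length n along the t-axis and that each interval contributes n independent choices:

```latex
A = \left\lfloor \frac{k+1}{n} \right\rfloor,
\qquad
N_{\text{sequences}} = n^{A}
```

As a worked example under that assumption, a t-axis of size k+1 = 30 with n = 3 gives A = 10 and 3^{10} = 59049 distinct sub-data sequences from a single input, while k+1 = 10 with n = 3 gives A = 3 and 3^{3} = 27.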

From 3D data consisting of [formula missing], new 3D data items are generated by randomly extracting one item from every interval of n items, as in [formula missing].
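The random extraction described above can be sketched in a few lines. This is a minimal illustration, not the patented implementation: the function names `sample_sub_sequence` and `augment` are hypothetical, the number of intervals is assumed to be the sequence length divided by n and rounded down, and frames beyond the last complete interval are assumed to be dropped.

```python
import random
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def sample_sub_sequence(frames: Sequence[T], n: int, rng: random.Random) -> List[T]:
    """Pick one frame at random from each consecutive interval of n frames."""
    if n <= 0:
        raise ValueError("n must be positive")
    a = len(frames) // n  # assumed: A = number of complete intervals
    # Interval i covers indices [i*n, (i+1)*n); order along t is preserved.
    return [frames[i * n + rng.randrange(n)] for i in range(a)]


def augment(frames: Sequence[T], n: int, count: int, seed: int = 0) -> List[List[T]]:
    """Generate `count` sub-sequences from one input sequence."""
    rng = random.Random(seed)
    return [sample_sub_sequence(frames, n, rng) for _ in range(count)]


# Integers stand in for 10 frames t_0..t_9 (k = 9); with n = 3 there are
# three complete intervals, so every sub-sequence has length 3.
frames = list(range(10))
subs = augment(frames, n=3, count=5)
for s in subs:
    print(s)
```

Because each interval is sampled independently, repeated calls can return the same sub-sequence; a caller that needs distinct sequences would have to deduplicate the results.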

The length of the t-axis of each newly generated sub 3D data item is A, and the item is composed of [formula missing], ordered sequentially along t.

For these data items, the motion gradient of (x, y), that is, the change in motion as t changes, is measured as t advances. The result is combined into a single MHI 2D data item that encodes temporal order through pixel brightness according to the order of t.
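The conversion from a frame sequence to one MHI image can be sketched as follows. This is an illustration under stated assumptions, not the method fixed by the text: absolute frame differencing with a fixed `threshold` stands in for the motion gradient measurement, the decay of one level per frame is an assumed choice, and the function name `motion_history_image` is hypothetical.

```python
import numpy as np


def motion_history_image(frames: np.ndarray, threshold: float = 30.0) -> np.ndarray:
    """Collapse a (T, H, W) grayscale sequence into one (H, W) uint8 image.

    Pixels that moved most recently are brightest; older motion is darker.
    """
    frames = np.asarray(frames, dtype=np.float32)
    if frames.ndim != 3 or frames.shape[0] < 2:
        raise ValueError("expected an array of shape (T, H, W) with T >= 2")
    tau = frames.shape[0] - 1  # number of frame-to-frame transitions
    mhi = np.zeros(frames.shape[1:], dtype=np.float32)
    for t in range(1, frames.shape[0]):
        # Assumed motion test: large absolute change between adjacent frames.
        moving = np.abs(frames[t] - frames[t - 1]) > threshold
        # Moving pixels are reset to tau; all others fade by one level.
        mhi = np.where(moving, float(tau), np.maximum(mhi - 1.0, 0.0))
    # Scale the history levels 0..tau to the 0..255 brightness range.
    return np.round(mhi * (255.0 / tau)).astype(np.uint8)


# A bright dot moving left to right across a 1x4 image over four frames.
demo = np.zeros((4, 1, 4), dtype=np.float32)
for t in range(4):
    demo[t, 0, t] = 255.0
mhi_demo = motion_history_image(demo)
print(mhi_demo)  # brightness increases toward the dot's latest position
```

In the demo the leftmost pixel moved only in the first transition and the rightmost pixels moved in the last one, so brightness rises from left to right and the single image preserves the order of the motion.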

After a single MHI 2D data item has been produced in this way, the machine learning processing step is performed: filters are applied to extract features of the 2D data as feature maps, representative values are extracted from the feature maps to output feature maps with a reduced amount of data, and all of the resulting features are connected for learning.

FIG. 5 is a diagram showing an example of the AI learning applied to the present invention.

In FIG. 5, C denotes a convolutional layer, which receives 2D data as input and applies a filter to extract features of the 2D data, producing a feature map.

Applying several filters yields several feature maps, one per filter.

S denotes a subsampling layer, which selects representative values from the feature maps extracted by C and outputs feature maps with a reduced amount of data.

FC (fully connected layer) is the stage that connects all of the features produced by C and S for learning.
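The C, S, and FC stages can be illustrated with a small forward pass in NumPy. This is a sketch under assumptions rather than the network of FIG. 5: the layer sizes, the use of max pooling as the "representative value", the random weights, and all function names are hypothetical, and training (weight updates) is omitted.

```python
import numpy as np


def conv2d_valid(image: np.ndarray, kernel: np.ndarray) -> np.ndarray:
    """C layer: slide one filter over the image (no padding, stride 1).

    Implemented as cross-correlation, the usual convention for
    convolutional layers.
    """
    kh, kw = kernel.shape
    oh, ow = image.shape[0] - kh + 1, image.shape[1] - kw + 1
    out = np.empty((oh, ow), dtype=np.float64)
    for i in range(oh):
        for j in range(ow):
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out


def max_pool(fmap: np.ndarray, size: int = 2) -> np.ndarray:
    """S layer: keep the maximum of each size x size block."""
    h = fmap.shape[0] // size * size
    w = fmap.shape[1] // size * size
    blocks = fmap[:h, :w].reshape(h // size, size, w // size, size)
    return blocks.max(axis=(1, 3))


def forward(image, kernels, fc_weights, fc_bias):
    """C -> S for every filter, then FC over all flattened features."""
    pooled = [max_pool(conv2d_valid(image, k)) for k in kernels]
    features = np.concatenate([p.ravel() for p in pooled])
    return fc_weights @ features + fc_bias


rng = np.random.default_rng(0)
image = rng.random((8, 8))                       # stand-in for one MHI 2D image
kernels = [rng.standard_normal((3, 3)) for _ in range(2)]
# 8x8 -> 3x3 filter -> 6x6 -> 2x2 pooling -> 3x3; two filters give 18 features.
fc_weights = rng.standard_normal((4, 18))        # 4 hypothetical output classes
fc_bias = np.zeros(4)
scores = forward(image, kernels, fc_weights, fc_bias)
print(scores.shape)
```

Each filter produces its own feature map, pooling shrinks every map by a factor of four, and the fully connected stage sees the concatenation of all pooled maps, mirroring the C, S, FC order described above.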

FIG. 6 is a diagram showing an example of filter application in the AI learning process applied to the present invention. It illustrates how, in the convolutional layer, a filter is applied to the input data to produce a feature map that captures the features of the input data.

The method for data set augmentation and preprocessing for AI training used in 3D data processing according to the present invention is described in detail below.

FIG. 7 is a flowchart showing the method.

First, a small number of 3D data items are received through the data input unit (30). (S701)

Next, the sub-data selection unit (31) randomly selects sub-data within a predetermined range from one data item composed of multiple pieces of sub-data. (S702)

The sub-data sequence generation unit (32) then generates sequences of the sub-data, expanding the input into a large number of 3D data items. (S703)

Next, the MHI 2D data generation unit (33) converts the generated 3D data items into 2D data through a motion history image (MHI) process. (S704)

The machine learning processing unit (34) then receives the 2D data as input and applies filters to extract features of the 2D data as feature maps. (S705)

Finally, representative values are selected from the extracted feature maps to output feature maps with a reduced amount of data, and all of the resulting features are connected for learning. (S706)

As described above, in the apparatus and method for data set augmentation and preprocessing for AI training used in 3D data processing according to the present invention, when a single data item consists of several sub-data, sub-data are randomly selected within a predetermined range to generate sub-data sequences, so that each generated sub-data sequence can be used as training data.
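The random selection described above can be sketched in Python as follows. The sketch assumes one plausible reading of the description: the t-axis of one data item is divided into consecutive windows of n sub-data, and one sub-data is drawn at random from each window, so that every generated sequence keeps its order along t. The function names `generate_sub_sequence` and `augment`, the `seed` parameter, and the use of integers as stand-ins for frames are hypothetical.

```python
import random


def generate_sub_sequence(frames, n, rng):
    """Draw one sub-data at random from each consecutive window of n sub-data."""
    length = len(frames) // n  # number of complete windows along the t-axis
    return [frames[i * n + rng.randrange(n)] for i in range(length)]


def augment(frames, n, count, seed=0):
    """Generate `count` sub-data sequences from a single data item."""
    rng = random.Random(seed)  # seeded so that the sketch is reproducible
    return [generate_sub_sequence(frames, n, rng) for _ in range(count)]


# Hypothetical data item with 12 sub-data along t; integers stand in for frames.
frames = list(range(12))
sequences = augment(frames, n=3, count=5)
for seq in sequences:
    print(seq)  # each sequence has 4 entries, one from each window of 3
```

Under this assumption, a data item with 12 sub-data and n = 3 admits up to 3^4 = 81 distinct sequences, which illustrates how one input can be expanded into many training samples.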

In this way, the present invention generates many data sets for AI training from a small number of data items, which reduces data collection costs and secures a sufficient amount of training data even when a large amount of data has not been collected or cannot be collected.

As described above, it will be understood that the present invention may be implemented in modified forms without departing from its essential characteristics.

Therefore, the disclosed embodiments should be considered in an illustrative rather than a restrictive sense. The scope of the present invention is defined by the claims rather than by the foregoing description, and all differences within the equivalent scope should be construed as being included in the present invention.

30. Data input unit 31. Sub-data selection unit
32. Sub-data sequence generation unit 33. MHI 2D data generation unit
34. Machine learning processing unit

Claims

A device for data set extension generation and preprocessing for AI training used in 3D data processing, comprising:
a data input unit that receives a first number of 3D data;
a sub-data selection unit that randomly selects sub-data within a predetermined range from one data item composed of several sub-data;
a sub-data sequence generation unit that generates sequences of the sub-data to expand the input into a second number of 3D data greater than the first number;
an MHI 2D data generation unit that converts the plurality of 3D data generated by the sub-data sequence generation unit into 2D data through a motion history image (MHI) process; and
a machine learning processing unit that receives the 2D data of the MHI 2D data generation unit as input, applies a filter to extract features of the 2D data into a feature map, extracts representative values to output a feature map with a reduced amount of data, and connects all of the resulting features for learning.

The device of claim 1, wherein, when one data item consists of several sub-data, sub-data are randomly selected within a predetermined range to generate sub-data sequences, so that each generated sub-data sequence can be used as training data.

The device of claim 1, wherein, when 3D data consisting of x, y, and z is input to the sub-data selection unit, the size of the t-axis is k + 1, the range is n, and one of the data is randomly selected,
A is

If it is less than or equal to the largest integer,
In the sub data sequence generator

Apparatus for data set extension generation and pre-processing for AI training used in 3D data processing, characterized in that the sub data sequence is generated.

The device of claim 3, wherein the sub-data sequence generation unit,

3D Data consisting of

Apparatus for data set expansion generation and pre-processing for AI training used in 3D data processing, characterized in that new 3D data are generated by randomly extracting them every n intervals.

The device of claim 4, wherein the length of the t-axis of the generated sub 3D data is A, sequential according to t

A device for generating and preprocessing a data set for AI training, which is used for 3D data processing, comprising:

The device of claim 5, wherein the motion gradient of (x, y), that is, the change in motion as t advances through the generated data, is measured, and a single MHI 2D data item is then created that carries information relating pixel brightness to the progression order of t.

The device of claim 1, wherein the machine learning processing unit comprises:
a C layer (convolutional layer) that receives the 2D data as input and applies filters to extract features of the 2D data and create feature maps;
an S layer (subsampling layer) that extracts representative values from the feature maps produced by the C layer and outputs feature maps with a reduced amount of data; and
an FC layer (fully connected layer) that connects all of the features produced by the C layer and the S layer for learning.

A method for data set extension generation and preprocessing for AI training used in 3D data processing, comprising:
a data input step of receiving a first number of 3D data;
a sub-data selection step of randomly selecting sub-data within a predetermined range from one data item composed of several sub-data;
a sub-data sequence generation step of generating sequences of the sub-data to expand the input into a second number of 3D data greater than the first number; and
an MHI 2D data generation step of converting the plurality of 3D data generated in the sub-data sequence generation step into 2D data through a motion history image (MHI) process.

The method of claim 8, further comprising a machine learning processing step of receiving the 2D data of the MHI 2D data generation step as input, applying a filter to extract features of the 2D data into a feature map, extracting representative values to output a feature map with a reduced amount of data, and connecting all of the resulting features for learning.

The method of claim 8, wherein, in the sub-data selection step, when 3D data consisting of x, y, and z is input, the size of the t-axis is k + 1, the range is n, and one of the data is randomly selected,
A is

If it is less than or equal to the largest integer,
In the sub data sequence generation step

A method for generating and pre-processing a data set for AI training used for 3D data processing, characterized in that three sub data sequences are generated.

The method of claim 10, wherein the sub-data sequence generation step,

3D Data consisting of

A method for generating and preprocessing a data set for AI training used in 3D data processing, characterized in that new 3D data are generated by randomly extracting them at every n intervals as described above.

The length of the t-axis of the generated sub 3D Data is A, and the sequential order is based on t.

A method for generating and preprocessing a data set for AI training, which is used for 3D data processing.

The method of claim 12, wherein the motion gradient of (x, y), that is, the change in motion as t advances through the generated data, is measured, and a single MHI 2D data item is then created that carries information relating pixel brightness to the progression order of t.