KR102559574B1

KR102559574B1 - Data conversion method and system for improving the accuracy of artificial intelligence algorithms

Info

Publication number: KR102559574B1
Application number: KR1020200133663A
Authority: KR
Inventors: 이상엽; 이재규; 조인표
Original assignee: 한국전자기술연구원
Priority date: 2020-10-15
Filing date: 2020-10-15
Publication date: 2023-07-26
Also published as: KR20220049932A

Abstract

A data conversion method for improving the accuracy of an artificial intelligence algorithm is provided. The method includes: setting a first data set composed of features and targets from original data; applying the first data set to a previously prepared machine learning algorithm; determining whether the accuracy between actual and predicted values of the machine learning algorithm on the first data set meets or exceeds a preset accuracy; generating a second data set in image form from the original data when the accuracy is below the preset accuracy; applying the second data set to an image classification deep learning algorithm; determining whether the accuracy between actual and predicted values of the image classification deep learning algorithm on the second data set meets or exceeds the preset accuracy; and updating the first data set by adding the predicted values based on the result of that determination.

Description

Data conversion method and system for improving the accuracy of artificial intelligence algorithms

The present invention relates to a data conversion method and system for improving the accuracy of an artificial intelligence algorithm.

FIG. 1 shows a conventional flow for applying deep learning to sensor data. FIG. 2 illustrates a learning algorithm in the prior art. FIG. 3 illustrates underfitting of data.

In general, sensor data or data expressed as numbers (hereinafter, numeric data) is classified, a feature vector is extracted from it, and the result is applied to a learning model, as shown in FIG. 1.

In particular, as shown in FIG. 2, when a deep learning algorithm is applied to sensor data or numeric data, a shallow feature learning algorithm with only one or two hidden layers is mainly used, and in that case it is difficult to obtain satisfactory accuracy.

In addition, when the data set is small, even a complex algorithm can suffer from underfitting, in which the performance gap between the training set and the validation set is small but both perform poorly. Referring to FIG. 3, when the number of samples is insufficient, unwanted noise is modeled along with the data.

Solving these problems requires securing more data and finding and learning a wider variety of features, but a solution is also needed for cases where more data cannot be obtained.

Korean Patent Application Publication No. 10-2017-0079161 (2017.07.10)

Embodiments of the present invention provide a data conversion method and system for improving the accuracy of an artificial intelligence algorithm. To address underfitting caused by an insufficient number of samples in a data set, the original data is converted into an image-form data set, and the data set is reconstructed based on predicted values obtained from an image classification algorithm, thereby improving prediction accuracy.

However, the technical problems addressed by the embodiments are not limited to the above, and other technical problems may exist.

As a technical means for achieving the above, a data conversion method for improving the accuracy of an artificial intelligence algorithm according to a first aspect of the present invention includes: setting a first data set composed of features and targets from original data; applying the first data set to a previously prepared machine learning algorithm; determining whether the accuracy between actual and predicted values of the machine learning algorithm on the first data set meets or exceeds a preset accuracy; generating a second data set in image form from the original data when the accuracy is below the preset accuracy; applying the second data set to an image classification deep learning algorithm; determining whether the accuracy between actual and predicted values of the image classification deep learning algorithm on the second data set meets or exceeds the preset accuracy; and updating the first data set by adding the predicted values based on the result of that determination.

In some embodiments of the present invention, generating the second data set in image form from the original data may include: generating and mapping images corresponding to the number of features of the first data set; and storing the value assigned to the target as a classification folder name.

In some embodiments of the present invention, generating and mapping images corresponding to the number of features of the original data may generate and map the images by distinguishing color elements according to the magnitude of the value assigned to each feature.

In some embodiments of the present invention, generating and mapping images corresponding to the number of features of the original data may assign a different color to each feature, and may assign the saturation and brightness of that color progressively higher or progressively lower values according to the magnitude of the value assigned to the feature.

In some embodiments of the present invention, generating the second data set in image form from the original data may include: checking the range of feature values of the second data set; and applying an image shape type based on the category to which the range of feature values belongs.

In some embodiments of the present invention, applying the image shape type based on the category of the range of feature values may apply a bar chart type image when the feature values follow a normal distribution, and a pie chart type image when some of the feature values fall outside the range occupied by the majority.

In some embodiments of the present invention, storing the value assigned to the target as a folder name may, when a plurality of targets correspond to the features in the original data, designate the classification folder name as an array of the values assigned to those targets.

In some embodiments of the present invention, updating the first data set by adding the predicted values based on the determination result may update the first data set when the accuracy meets or exceeds the preset accuracy, and may re-perform the step of generating the second data set in image form from the original data when the accuracy is below the preset accuracy.

In some embodiments of the present invention, the first data set may be a data set generated in an underfit state because it is built from original data that is sensor data or numeric data.

In addition, a data conversion system for improving the accuracy of an artificial intelligence algorithm according to a second aspect of the present invention includes a memory and a processor. The memory stores a program for setting a first data set composed of features and targets from original data consisting of underfit sensor data or numeric data, generating a second data set in image form from the original data, and updating the first data set. The processor executes the program stored in the memory. By executing the program, the processor applies the second data set to an image classification deep learning algorithm, determines whether the accuracy between actual and predicted values of the image classification deep learning algorithm on the second data set meets or exceeds the preset accuracy, and updates the first data set by reflecting the predicted values based on the determination result.

In some embodiments of the present invention, the processor may generate and map the images by distinguishing color elements according to the magnitude of the values assigned to the features of the original data, and by storing the value assigned to the target as a classification folder name.

In some embodiments of the present invention, the processor may check the range of feature values of the second data set and apply an image shape type based on the category to which the range belongs, applying a bar chart type image when the feature values follow a normal distribution and a pie chart type image when some of the feature values fall outside the range occupied by the majority.

In addition, other methods and other systems for implementing the present invention, and a computer-readable recording medium storing a computer program for executing the method, may further be provided.

According to the present invention as described above, even when high accuracy is hard to achieve because the data set is too small, as with sensor data or numeric data, and feature extraction is difficult because of limits on preprocessing, the data set can be converted into an image-based data set from which many features are easily extracted, which resolves the problems caused by underfitting. This has the advantage of improving accuracy on the data set when it is applied to a machine learning algorithm or to a neural network algorithm with a small number of layers.

In addition, accurate prediction is difficult in regression problems, and prediction error becomes large when the data is noisy. An embodiment of the present invention applies an image mapping scheme that reflects the characteristics of the data and standardizes them into various image forms so that features are easy to extract, which mitigates the error caused by noise.

The effects of the present invention are not limited to those mentioned above, and other effects not mentioned will be clearly understood by those skilled in the art from the description below.

FIG. 1 shows a conventional flow for applying deep learning to sensor data.
FIG. 2 illustrates a learning algorithm in the prior art.
FIG. 3 illustrates underfitting of data.
FIG. 4 is a flowchart of a data conversion method for improving the accuracy of an artificial intelligence algorithm according to an embodiment of the present invention.
FIG. 5 shows an example of sensor data.
FIGS. 6A to 6D show examples of the second data set in image form according to the present invention.
FIG. 7 shows an example of an image classification deep learning algorithm.
FIG. 8 illustrates a data conversion system for improving the accuracy of an artificial intelligence algorithm according to an embodiment of the present invention.

Advantages and features of the present invention, and methods of achieving them, will become clear from the embodiments described in detail below together with the accompanying drawings. However, the present invention is not limited to the embodiments disclosed below and may be implemented in various different forms. These embodiments are provided only to make the disclosure complete and to fully convey the scope of the invention to those skilled in the art, and the present invention is defined only by the scope of the claims.

The terminology used herein is for describing the embodiments and is not intended to limit the present invention. In this specification, singular forms include plural forms unless the context specifically states otherwise. As used herein, "comprises" and/or "comprising" do not exclude the presence or addition of one or more elements other than those recited. Like reference numerals refer to like elements throughout the specification, and "and/or" includes each of the recited elements and every combination of one or more of them. Although "first", "second", and so on are used to describe various elements, these elements are not limited by these terms. The terms are used only to distinguish one element from another. Accordingly, a first element mentioned below may also be a second element within the technical spirit of the present invention.

Unless otherwise defined, all terms used in this specification (including technical and scientific terms) have the meanings commonly understood by those skilled in the art to which the present invention belongs. In addition, terms defined in commonly used dictionaries are not to be interpreted in an idealized or excessive manner unless explicitly defined otherwise.

FIG. 4 is a flowchart of a data conversion method for improving the accuracy of an artificial intelligence algorithm according to an embodiment of the present invention.

The steps shown in FIG. 4 may be understood as being performed by a server (hereinafter, the server) constituting the data conversion system 100 for improving the accuracy of an artificial intelligence algorithm, but are not limited thereto.

First, the server sets a first data set composed of features and targets from the original data (S105). The server then applies the first data set to a previously prepared machine learning algorithm (S110) and determines whether the accuracy between actual and predicted values of the machine learning algorithm on the first data set meets or exceeds a preset accuracy (S115).
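Steps S105 to S115 can be sketched as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the record layout, the function names `build_first_dataset` and `meets_preset_accuracy`, the threshold rule standing in for the machine learning algorithm, and the use of exact-match rate as the accuracy measure are all hypothetical.

```python
def build_first_dataset(rows, feature_keys, target_key):
    """S105: split raw records into feature vectors and target values."""
    features = [[row[key] for key in feature_keys] for row in rows]
    targets = [row[target_key] for row in rows]
    return features, targets


def meets_preset_accuracy(actual, predicted, preset_accuracy):
    """S115: compare predictions with actual values against the preset accuracy.

    Accuracy here is the fraction of exact matches (an assumption).
    """
    matches = sum(1 for a, p in zip(actual, predicted) if a == p)
    return matches / len(actual) >= preset_accuracy


# Hypothetical sensor records; "label" plays the role of the target.
rows = [
    {"temp": 21.0, "hum": 40.0, "label": 0},
    {"temp": 35.0, "hum": 20.0, "label": 1},
    {"temp": 33.0, "hum": 25.0, "label": 0},
]
features, targets = build_first_dataset(rows, ["temp", "hum"], "label")

# S110: a fixed threshold rule stands in for the prepared ML algorithm.
predicted = [int(temp > 30.0) for temp, _hum in features]

# If the preset accuracy is not met, the image conversion path is taken.
needs_image_conversion = not meets_preset_accuracy(targets, predicted, 0.9)
```

With these toy records two of three predictions match, so a preset accuracy of 0.9 is not met and the flow would continue to image generation.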

FIG. 5 shows an example of sensor data.

In one embodiment, the original data may be sensor data or numeric data. The first data set may be a training data set that is generated in an underfit state because it is built from the original data.

In general, when the amount of sensor data and the number of feature vectors extracted from it are small, prediction accuracy is low even if training data is generated from it and deep learning or machine learning techniques are applied.

For example, with 456 data items, if 70% or 80% are used as training data and the remaining 20% to 30% as test data, only about 100 items are available as test data, so accuracy cannot be secured even when various algorithms are applied.
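The arithmetic behind the 456-item example can be checked with a short sketch. The helper name `split_counts` and the rounding rule are assumptions made for illustration.

```python
def split_counts(n_samples, train_fraction):
    """Return (n_train, n_test) for a simple holdout split."""
    n_train = round(n_samples * train_fraction)
    return n_train, n_samples - n_train


for fraction in (0.7, 0.8):
    n_train, n_test = split_counts(456, fraction)
    print(f"train fraction {fraction}: train={n_train}, test={n_test}")
```

A 70% split leaves 137 test items and an 80% split leaves 91, which is the "about 100" test items referred to in the text.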

In particular, when making predictions with a small amount of data, there are few ways to remove the influence of noise, and accuracy drops further because the noise components are learned along with the data. As a result, test set error tends to grow even when the learning model is strengthened.

For example, Tables 1 and 2 show that the test data set has larger errors than the training data. To judge accuracy, the regression evaluation metrics MSE (Mean Squared Error), RMSE (Root Mean Squared Error), MAE (Mean Absolute Error), and R² according to [Equation 1] to [Equation 4] may be used. For MSE, RMSE, and MAE, a larger value means lower accuracy; for R², a larger value means higher accuracy.

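[Equation 1]

$$\mathrm{MSE} = \frac{1}{n}\sum_{i=1}^{n}\left(y_i - \hat{y}_i\right)^2$$

[Equation 2]

$$\mathrm{RMSE} = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(y_i - \hat{y}_i\right)^2}$$

[Equation 3]

$$\mathrm{MAE} = \frac{1}{n}\sum_{i=1}^{n}\left|y_i - \hat{y}_i\right|$$

[Equation 4]

$$R^2 = 1 - \frac{\sum_{i=1}^{n}\left(y_i - \hat{y}_i\right)^2}{\sum_{i=1}^{n}\left(y_i - \bar{y}\right)^2}$$

Here $y_i$ is the actual value, $\hat{y}_i$ is the predicted value, $\bar{y}$ is the mean of the actual values, and $n$ is the number of samples.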
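The four regression metrics named in [Equation 1] to [Equation 4] can be computed as in the sketch below, which uses their standard definitions. The function names and the sample values are illustrative assumptions and do not reproduce the data behind Tables 1 and 2.

```python
from math import sqrt


def mse(actual, predicted):
    """Mean squared error: average of squared residuals."""
    return sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)


def rmse(actual, predicted):
    """Root mean squared error: square root of the MSE."""
    return sqrt(mse(actual, predicted))


def mae(actual, predicted):
    """Mean absolute error: average of absolute residuals."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def r2(actual, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_actual = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)
    return 1 - ss_res / ss_tot


# Illustrative values only.
y_true = [1.0, 2.0, 3.0, 4.0]
y_pred = [1.0, 2.0, 3.0, 6.0]
scores = (mse(y_true, y_pred), rmse(y_true, y_pred),
          mae(y_true, y_pred), r2(y_true, y_pred))
```

Lower MSE, RMSE, and MAE indicate a better fit, while R² approaches 1 as the fit improves.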

Table 1. Training data deviation

| Learning model | MSE | RMSE | MAE | R² |
|---|---|---|---|---|
| A | 5.46 | 2.375 | 1.068 | 0.725 |
| B | 11.171 | 3.342 | 1.659 | 0.455 |
| C | 12.288 | 3.505 | 1.767 | 0.400 |

Table 2. Test data deviation

| Learning model | MSE | RMSE | MAE | R² |
|---|---|---|---|---|
| A | 29.044 | 5.389 | 1.676 | 0.277 |
| B | 30.832 | 5.553 | 1.934 | 0.232 |
| C | 31.023 | 5.570 | 1.997 | 0.228 |

When more data cannot be secured because of equipment problems or installation conditions during data acquisition, an embodiment of the present invention can resolve the underfitting problem by extracting a wider variety of features from the data already obtained.

Referring again to FIG. 4, if the determination shows that the accuracy is below the preset accuracy, the server generates a second data set in image form from the original data (S120). If the accuracy meets or exceeds the preset accuracy, the server skips the process of generating the second data set.

FIGS. 6A to 6D show examples of the second data set in image form according to the present invention. FIG. 7 shows an example of an image classification deep learning algorithm.

Specifically, the server generates and maps images corresponding to the number of features of the first data set, and stores the value assigned to the target as a classification folder name. In doing so, the server may generate and map the images by distinguishing color elements according to the magnitude of the value assigned to each feature. That is, as in FIG. 6A, the image may be mapped so that each feature value has a distinguishable color element, such as light sky blue for a feature value of '2', forsythia yellow for a feature value of '3', and dark brown for a feature value of '5'.

In this way, the server may generate and map the images with a different color for each feature. Going further, the server may assign the saturation and brightness of that color progressively higher values according to the magnitude of the feature value, or conversely progressively lower values.
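One way to realize a per-feature color whose saturation and brightness grow with the feature value is sketched below using the standard-library `colorsys` module. The hue spacing, the 0.2 to 1.0 scaling range, and the function name `feature_color` are assumptions for illustration; the source does not specify a color model or a scaling rule.

```python
import colorsys


def feature_color(feature_index, n_features, value, v_min, v_max):
    """Return an (R, G, B) tuple in 0..255 for one feature value.

    Each feature gets its own hue; saturation and brightness rise
    linearly with the value's position inside [v_min, v_max].
    """
    hue = feature_index / n_features  # distinct hue per feature
    scale = (value - v_min) / (v_max - v_min) if v_max > v_min else 0.0
    saturation = 0.2 + 0.8 * scale  # assumed range 0.2..1.0
    brightness = 0.2 + 0.8 * scale  # assumed range 0.2..1.0
    r, g, b = colorsys.hsv_to_rgb(hue, saturation, brightness)
    return tuple(round(channel * 255) for channel in (r, g, b))


# Feature 0 of 3 at its maximum value maps to fully saturated red.
strongest = feature_color(0, 3, 10, 0, 10)
# The same value on a different feature yields a different hue.
other_feature = feature_color(1, 3, 10, 0, 10)
```

Reversing the scale (for example `1.0 - scale`) would give the "progressively lower" variant described in the text.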

In addition, the server may check the range of feature values of the second data set (S125) and apply an image shape type based on the category to which the range of feature values belongs.

In one embodiment, the server may apply a pie chart type (S135) when some of the feature values fall outside the range occupied by the majority (S130), and a bar chart type when the feature values follow a normal distribution (S140). When the range of feature values cannot be characterized (S150), a separate chart of a specific form may be generated and applied (S155).
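The chart type decision in S130 to S155 can be sketched as below. The source does not say how "outside the majority range" or "normal distribution" is tested, so the 1.5 × IQR outlier rule, the skewness threshold, and all function names here are hypothetical stand-ins chosen only to make the branching concrete.

```python
from statistics import mean, pstdev, quantiles


def has_outliers(values, k=1.5):
    """Assumed test for S130: any value beyond k * IQR from the quartiles."""
    q1, _median, q3 = quantiles(values, n=4)
    iqr = q3 - q1
    return any(v < q1 - k * iqr or v > q3 + k * iqr for v in values)


def looks_normal(values, max_abs_skew=0.5):
    """Assumed test for S140: roughly symmetric spread (small skewness)."""
    mu, sigma = mean(values), pstdev(values)
    if sigma == 0:
        return False
    skew = sum(((v - mu) / sigma) ** 3 for v in values) / len(values)
    return abs(skew) <= max_abs_skew


def choose_chart_type(values):
    """Follow the order in the text: outliers, then normality, then fallback."""
    if has_outliers(values):
        return "pie"
    if looks_normal(values):
        return "bar"
    return "custom"
```

For instance, a symmetric sample such as `[1, 2, 2, 3, 3, 3, 4, 4, 5]` selects the bar chart, while replacing the last value with `50` selects the pie chart. A formal normality test could replace the skewness check where more samples are available.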

Next, the server defines classification folders according to the kinds of targets, and may define and store the value assigned to each target as the classification folder name (S160). When a plurality of targets correspond to the features in the original data, the server may designate the classification folder name as an array of the values assigned to those targets.
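The folder naming in S160 matches the common layout in which each class of an image classification data set lives in its own directory. A minimal sketch follows; the underscore separator for multi-target names, the file name pattern, and the helper names are assumptions, since the source only says the name is an array of the target values.

```python
from pathlib import Path


def class_folder_name(target):
    """Use the target value as the folder name.

    Multiple targets are joined into a single array-like name.
    """
    if isinstance(target, (list, tuple)):
        return "_".join(str(value) for value in target)
    return str(target)


def image_path(root, target, sample_id):
    """Build the path where one generated image would be stored."""
    return Path(root) / class_folder_name(target) / f"sample_{sample_id}.png"


single = class_folder_name(3)            # one target value
multiple = class_folder_name([1, 0, 2])  # several target values
```

The path helper only builds the location; creating the directory and writing the image are left to the caller.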

Once the second data set has been generated, the server applies it to the image classification deep learning algorithm (S165) and determines whether the accuracy between actual and predicted values of the image classification deep learning algorithm on the second data set meets or exceeds the preset accuracy (S170).

If the accuracy meets or exceeds the preset accuracy, the server may update the first data set based on the values predicted by the image classification deep learning algorithm (S175).
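The source does not spell out how the predicted values are added to the first data set in S175. One plausible reading, sketched here purely as an assumption, is to append each prediction to its sample as an extra feature column; `update_first_dataset` is a hypothetical name.

```python
def update_first_dataset(features, predictions):
    """Append each image-model prediction to its feature row (assumed reading of S175)."""
    if len(features) != len(predictions):
        raise ValueError("one prediction is required per sample")
    return [row + [pred] for row, pred in zip(features, predictions)]


original = [[21.0, 40.0], [35.0, 20.0]]
updated = update_first_dataset(original, [0, 1])
```

The original rows are left unchanged, so the earlier data set remains available for comparison.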

Otherwise, if the accuracy is below the preset accuracy, the server may repeat the process from the step of generating the second data set in image form from the original data, thereby further improving the accuracy of the predicted values.
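The repeat-until-accurate behavior around S120 to S170 can be sketched as a loop over caller-supplied steps. The callables `make_images`, `classify`, and `score`, and the `max_rounds` cap, are assumptions introduced for illustration; the source does not state a limit on the number of repetitions.

```python
def convert_until_accurate(make_images, classify, score,
                           preset_accuracy, max_rounds=5):
    """Regenerate the image data set until the preset accuracy is met.

    Returns (predictions, accuracy, rounds_used); predictions is None
    if the cap is reached without meeting the preset accuracy.
    """
    accuracy = 0.0
    for round_no in range(1, max_rounds + 1):
        images, actual = make_images(round_no)  # S120 to S160
        predicted = classify(images)            # S165
        accuracy = score(actual, predicted)     # S170
        if accuracy >= preset_accuracy:
            return predicted, accuracy, round_no
    return None, accuracy, max_rounds


# Stub steps: the stand-in classifier only succeeds from round 3 on.
labels = [0, 1, 1, 0]
result = convert_until_accurate(
    make_images=lambda round_no: (round_no, labels),
    classify=lambda images: labels if images >= 3 else [0, 0, 0, 0],
    score=lambda a, p: sum(x == y for x, y in zip(a, p)) / len(a),
    preset_accuracy=0.9,
)
```

In practice each round would vary the image mapping (for example the color scheme or chart type) before retraining the classifier.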

In the above description, steps S105 to S175 may be further divided into additional steps or combined into fewer steps, depending on the implementation of the present invention. Some steps may be omitted as needed, and the order of steps may be changed. In addition, even where not repeated here, the content described with reference to FIGS. 4 to 7 also applies to the data conversion system 100 for improving the accuracy of an artificial intelligence algorithm of FIG. 8.

Hereinafter, the data conversion system 100 for improving the accuracy of an artificial intelligence algorithm according to an embodiment of the present invention is described.

FIG. 8 illustrates the data conversion system 100 for improving the accuracy of an artificial intelligence algorithm according to an embodiment of the present invention.

Referring to FIG. 8, the data conversion system 100 for improving the accuracy of an artificial intelligence algorithm includes a memory 110 and a processor 120.

The memory 110 stores a program for setting a first data set composed of features and targets from original data consisting of underfit sensor data or numeric data, generating a second data set in image form from the original data, and updating the first data set.

By executing the program stored in the memory 110, the processor 120 applies the second data set to the image classification deep learning algorithm, determines whether the accuracy between actual and predicted values of the image classification deep learning algorithm on the second data set meets or exceeds the preset accuracy, and updates the first data set by reflecting the predicted values based on the determination result.

이상에서 전술한 본 발명의 일 실시예에 따른 인공지능 알고리즘의 정확도 향상을 위한 데이터 변환 방법은, 하드웨어인 서버와 결합되어 실행되기 위해 프로그램(또는 어플리케이션)으로 구현되어 매체에 저장될 수 있다.The data conversion method for improving the accuracy of an artificial intelligence algorithm according to an embodiment of the present invention described above may be implemented as a program (or application) to be executed in combination with a server, which is hardware, and stored in a medium.

상기 전술한 프로그램은, 상기 컴퓨터가 프로그램을 읽어 들여 프로그램으로 구현된 상기 방법들을 실행시키기 위하여, 상기 컴퓨터의 프로세서(CPU)가 상기 컴퓨터의 장치 인터페이스를 통해 읽힐 수 있는 C, C++, JAVA, 기계어 등의 컴퓨터 언어로 코드화된 코드(Code)를 포함할 수 있다. 이러한 코드는 상기 방법들을 실행하는 필요한 기능들을 정의한 함수 등과 관련된 기능적인 코드(Functional Code)를 포함할 수 있고, 상기 기능들을 상기 컴퓨터의 프로세서가 소정의 절차대로 실행시키는데 필요한 실행 절차 관련 제어 코드를 포함할 수 있다. 또한, 이러한 코드는 상기 기능들을 상기 컴퓨터의 프로세서가 실행시키는데 필요한 추가 정보나 미디어가 상기 컴퓨터의 내부 또는 외부 메모리의 어느 위치(주소 번지)에서 참조되어야 하는지에 대한 메모리 참조관련 코드를 더 포함할 수 있다. 또한, 상기 컴퓨터의 프로세서가 상기 기능들을 실행시키기 위하여 원격(Remote)에 있는 어떠한 다른 컴퓨터나 서버 등과 통신이 필요한 경우, 코드는 상기 컴퓨터의 통신 모듈을 이용하여 원격에 있는 어떠한 다른 컴퓨터나 서버 등과 어떻게 통신해야 하는지, 통신 시 어떠한 정보나 미디어를 송수신해야 하는지 등에 대한 통신 관련 코드를 더 포함할 수 있다.The above-described program may include a code coded in a computer language such as C, C++, JAVA, or machine language that can be read by a processor (CPU) of the computer through a device interface of the computer so that the computer reads the program and executes the methods implemented as a program. These codes may include functional codes related to functions defining necessary functions for executing the methods, and control codes related to execution procedures necessary for the processor of the computer to execute the functions according to a predetermined procedure. In addition, these codes may further include memory reference related code indicating where additional information or media necessary for the processor of the computer to execute the functions should be referenced from which location (address address) of the computer's internal or external memory. In addition, when the processor of the computer needs to communicate with any other remote computer or server in order to execute the functions, the code may further include communication-related codes for how to communicate with any other remote computer or server using a communication module of the computer, and what information or media should be transmitted and received during communication.

The storage medium refers not to a medium that stores data for a short moment, such as a register, cache, or memory, but to a medium that stores data semi-permanently and is readable by a device. Specific examples of the storage medium include, but are not limited to, ROM, RAM, CD-ROM, magnetic tape, floppy disk, and optical data storage devices. That is, the program may be stored in various recording media on various servers accessible by the computer, or in various recording media on the user's computer. In addition, the medium may be distributed across computer systems connected through a network, so that computer-readable code is stored in a distributed manner.

Steps of a method or algorithm described in connection with an embodiment of the present invention may be implemented directly in hardware, implemented in a software module executed by hardware, or implemented by a combination thereof. A software module may reside in random access memory (RAM), read only memory (ROM), erasable programmable ROM (EPROM), electrically erasable programmable ROM (EEPROM), flash memory, a hard disk, a removable disk, a CD-ROM, or any other form of computer-readable recording medium well known in the art.

Although embodiments of the present invention have been described above with reference to the accompanying drawings, those skilled in the art to which the present invention pertains will appreciate that the present invention may be embodied in other specific forms without changing its technical spirit or essential features. Therefore, the embodiments described above should be understood as illustrative in all respects and not restrictive.

100: data conversion system
110: memory
120: processor

Claims

A data conversion method for improving the accuracy of artificial intelligence algorithms, performed by a computer, the method comprising:
setting a first data set composed of features and targets from original data;
applying the first data set to a previously prepared machine learning algorithm;
determining whether the accuracy between actual values and predicted values of the machine learning algorithm based on the first data set is equal to or higher than a preset accuracy;
generating a second data set in the form of images from the original data when the accuracy is determined to be less than the preset accuracy;
applying the second data set to an image classification deep learning algorithm;
determining whether the accuracy between actual values and predicted values of the image classification deep learning algorithm based on the second data set is equal to or higher than the preset accuracy; and
determining, based on the determination result, whether to repeat the step of generating the second data set in the form of images from the original data,
wherein, in the step of determining whether to repeat, the step of generating the second data set is terminated when the accuracy is equal to or higher than the preset accuracy, and is performed again when the accuracy is less than the preset accuracy.
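
The control flow recited in the claim above can be sketched roughly as follows. This is a minimal illustration in Python, not the patented implementation: the function name `convert_until_accurate`, the three callables passed in (standing in for the machine learning algorithm, the image generator, and the image classification deep learning algorithm), and the `max_rounds` safeguard are all hypothetical and do not appear in the claims.

```python
from typing import Callable, Sequence


def convert_until_accurate(
    original_data: Sequence,
    tabular_accuracy: Callable[[Sequence], float],
    make_image_dataset: Callable[[Sequence, int], object],
    image_accuracy: Callable[[object], float],
    preset_accuracy: float,
    max_rounds: int = 10,
) -> dict:
    """Hypothetical sketch of the claimed loop; every callable is a placeholder."""
    if max_rounds < 1:
        raise ValueError("max_rounds must be at least 1")

    # Evaluate the first (feature/target) data set with the prepared ML algorithm.
    acc = tabular_accuracy(original_data)
    if acc >= preset_accuracy:
        return {"used_images": False, "rounds": 0, "accuracy": acc}

    # Accuracy fell short: generate an image-form data set and re-evaluate,
    # repeating the generation step until the preset accuracy is reached.
    # max_rounds is an added safeguard (assumption), not part of the claim.
    for round_no in range(1, max_rounds + 1):
        image_dataset = make_image_dataset(original_data, round_no)
        acc = image_accuracy(image_dataset)
        if acc >= preset_accuracy:
            break
    return {"used_images": True, "rounds": round_no, "accuracy": acc}
```

Passing the round number to the generator is one way to let each repetition produce a different image encoding; the claims do not specify how a repeated generation differs from the previous one.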

The method of claim 1, wherein the step of generating the second data set in the form of images from the original data comprises:
generating and mapping images corresponding to the number of features of the first data set; and
storing the value assigned to the target as a classification folder name.

The method of claim 2, wherein the step of generating and mapping images corresponding to the number of features of the original data comprises generating and mapping the images by distinguishing color elements according to the magnitude of the value assigned to each feature.

The method of claim 3, wherein the step of generating and mapping images corresponding to the number of features of the original data comprises generating and mapping the images by assigning a different color to each feature, and by setting the saturation and brightness of that color to a higher or lower value according to the magnitude of the value assigned to the feature.
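
One possible reading of the color mapping described above is sketched below, using only the Python standard library. It assumes each feature receives an evenly spaced hue and that saturation and brightness both grow linearly with the min-max normalized feature value; the function name `row_to_colors`, the 0.2 floor, and the linear scaling are illustrative assumptions, since the claims do not fix a particular formula.

```python
import colorsys
from typing import List, Sequence, Tuple

RGB = Tuple[int, int, int]


def row_to_colors(
    row: Sequence[float],
    mins: Sequence[float],
    maxs: Sequence[float],
) -> List[RGB]:
    """Map one record's feature values to one RGB color per feature (sketch)."""
    n = len(row)
    colors: List[RGB] = []
    for i, (value, lo, hi) in enumerate(zip(row, mins, maxs)):
        # A different hue for each feature (assumed: evenly spaced on the wheel).
        hue = i / n
        # Min-max normalize the value into [0, 1]; constant features map to 0.
        level = 0.0 if hi == lo else (value - lo) / (hi - lo)
        level = min(1.0, max(0.0, level))
        # Larger values get higher saturation and brightness (assumed linear,
        # with a floor of 0.2 so small values stay visible).
        saturation = brightness = 0.2 + 0.8 * level
        r, g, b = colorsys.hsv_to_rgb(hue, saturation, brightness)
        colors.append((round(r * 255), round(g * 255), round(b * 255)))
    return colors
```

The resulting list could then be painted as one tile or bar per feature by any imaging library; that rendering step is omitted here.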

The method of claim 2, wherein the step of generating the second data set in the form of images from the original data comprises:
checking a range of feature values of the second data set; and
applying a type of image shape based on the type to which the range of feature values belongs.

The method of claim 5, wherein the step of applying the type of image shape based on the type to which the range of feature values belongs comprises applying a bar chart type image shape when the range of the feature values has a normal distribution shape, and applying a pie chart type shape when some of the ranges of the feature values deviate from a plurality of ranges.
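
The shape selection above could be approximated as in the following sketch. The claims do not say how a "normal distribution shape" or a deviating range is detected, so this example substitutes a common outlier heuristic (Tukey fences on the interquartile range): values with no outliers are treated as roughly normal and mapped to a bar chart, and values with outliers are mapped to a pie chart. The function name `choose_image_shape` and the `fence` parameter are hypothetical.

```python
import statistics
from typing import Sequence


def choose_image_shape(values: Sequence[float], fence: float = 1.5) -> str:
    """Return "bar" or "pie" for a feature's values (heuristic sketch)."""
    # Quartiles of the feature values (requires at least two data points).
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    low, high = q1 - fence * iqr, q3 + fence * iqr
    # Assumption: any value outside the fences counts as "deviating".
    has_outliers = any(v < low or v > high for v in values)
    return "pie" if has_outliers else "bar"
```

A stricter implementation might run a formal normality test instead; the fence rule is used here only because it needs nothing beyond the standard library.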

The method of claim 2, wherein, in the step of storing the value assigned to the target as a classification folder name, when a plurality of targets correspond to the features in the original data, the classification folder name is specified as an array of the values assigned to the targets.
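
As an illustration of the folder naming above, the sketch below derives a classification folder name from a single target value or from several target values. The bracketed, comma-separated format, the file name pattern `sample_00000.png`, and the helper names `classification_folder` and `image_path` are assumptions made for this example; the claims only state that the folder name is the target value, or an array of target values when there are several.

```python
from pathlib import Path
from typing import Sequence, Union

Target = Union[int, float, str]


def classification_folder(targets: Union[Target, Sequence[Target]]) -> str:
    """Build the class folder name from one target or an array of targets."""
    if isinstance(targets, (int, float, str)):
        return str(targets)
    # Several targets: the folder name is the array of their values
    # (assumed format: comma-separated inside square brackets).
    return "[" + ",".join(str(t) for t in targets) + "]"


def image_path(root: Path, targets, index: int) -> Path:
    """Return where the image for record `index` would be saved (sketch)."""
    folder = root / classification_folder(targets)
    folder.mkdir(parents=True, exist_ok=True)
    return folder / f"sample_{index:05d}.png"
```

Storing images under one folder per class matches the directory layout that many image classification loaders read labels from, which is presumably why the target is recorded as a folder name.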

(deleted)

The method of claim 1, wherein the first data set is a data set in an underfitting state as a result of being constructed from original data such as sensor data or numeric data.

A data conversion system for improving the accuracy of artificial intelligence algorithms, the system comprising:
a memory storing a program for setting a first data set composed of features and targets from original data composed of sensor data or numeric data in an underfitting state; and
a processor that executes the program stored in the memory,
wherein, as the program is executed, the processor applies the first data set to a previously prepared machine learning algorithm, determines whether the accuracy between actual values and predicted values of the machine learning algorithm based on the first data set is equal to or higher than a preset accuracy, and generates a second data set in the form of images from the original data when the accuracy is determined to be less than the preset accuracy,
applies the second data set to an image classification deep learning algorithm, determines whether the accuracy between actual values and predicted values of the image classification deep learning algorithm based on the second data set is equal to or higher than the preset accuracy, and determines, based on the determination result, whether to repeat the process of generating the second data set in the form of images from the original data, and
wherein the process of generating the second data set is terminated when the accuracy is equal to or higher than the preset accuracy, and is performed again when the accuracy is less than the preset accuracy.

The system of claim 10, wherein the processor generates and maps the images by distinguishing color elements according to the magnitude of the values assigned to the features of the original data, and stores the value assigned to the target as a classification folder name.

The system of claim 11, wherein the processor identifies a range of feature values of the second data set and applies a type of image shape based on the type to which the range of feature values belongs, applying a bar chart type image shape when the range of the feature values has a normal distribution shape, and applying a pie chart type shape when some of the ranges of the feature values deviate from a plurality of ranges.