KR102354725B1

KR102354725B1 - Apparatus and method for classifying data for process mining

Info

Publication number: KR102354725B1
Application number: KR1020200098048A
Authority: KR
Inventors: 이상화; 원석래; 아스리아나 수트리스노와티 리스카; 심성현
Original assignee: 주식회사 아이오코드
Priority date: 2020-08-05
Filing date: 2020-08-05
Publication date: 2022-02-07

Abstract

The present invention relates to an apparatus and a method for classifying data. The apparatus receives an unclassified data table and a mapping schema data table. It extracts a preset number of unclassified sample tables from the unclassified data table and converts each of them into an image, generating a preset number of unclassified images. Likewise, it extracts a preset number of mapping sample tables from the mapping schema data table and converts each of them into an image, generating a preset number of mapping images. Using a convolutional neural network, it calculates the mapping accuracy between the unclassified images and the mapping images, and it classifies the data of the unclassified data table based on that mapping accuracy in order to identify the relation schema.

Description

Apparatus and method for classifying data for process mining

The following embodiments relate to a technique for classifying the data in an unclassified data table in order to identify its relation schema.

Interest in adopting process-aware information systems (PAIS) has recently been growing in the management and operations of various industries. Accordingly, large volumes of business process execution logs, and the process mining used to discover useful knowledge from them, are becoming increasingly important.

Process mining is a field of research that extracts useful knowledge from the event logs provided by information systems. It provides new techniques for process discovery, monitoring, and improvement, and it can be applied to processes in a wide range of fields.

Process mining can uncover useful information from the records of work carried out in business processes, and that information can be used, for example, for a company's business process innovation. As Internet and computing technology advances and the volume of data grows, the fields and the market to which process mining is applied are expected to expand steadily.

Existing process mining methodologies start from the assumption that an event log already exists. An event log is produced by converting event data extracted from an information system into an MXML or XES file through an event mapping process, and this process requires domain knowledge.

Accordingly, when event data is mapped to an event log, an expert with domain knowledge of that event data identifies the relation schema of the event data and maps the data to the event log on that basis.

As systems grow more complex and data volumes increase, there is a need for a method of mapping event data to an event log without the help of an expert with domain knowledge.

An object of the present invention is to classify the data contained in an unclassified data table so that the relation schema of the data can be identified.

A data classification apparatus according to an embodiment of the present invention includes: a receiving unit that receives an unclassified data table and a mapping schema data table; an unclassified sample extraction unit that extracts a preset number of unclassified sample tables from the unclassified data table; an unclassified image generation unit that converts each of the preset number of unclassified sample tables into an image to generate a preset number of unclassified images; a mapping sample extraction unit that extracts a preset number of mapping sample tables from the mapping schema data table; a mapping image generation unit that converts each of the preset number of mapping sample tables into an image to generate a preset number of mapping images; a mapping calculation unit that uses a convolutional neural network to calculate the mapping accuracy between the preset number of unclassified images and the preset number of mapping images; and a classification unit that classifies the data of the unclassified data table based on the mapping accuracy to identify the relation schema.

Here, the unclassified image generation unit may convert the preset number of unclassified sample tables into relation frequency matrices to generate unclassified relation frequency matrices, and may generate an unclassified image by converting an unclassified relation frequency matrix into an image whose pixel color values are the values of the matrix.

Here, the unclassified relation frequency matrix may be a matrix that counts the number of distinct data items related to the data in a specific field of the unclassified sample table.

Here, the mapping image generation unit may convert each of the preset number of mapping sample tables into a relation frequency matrix to generate a mapping relation frequency matrix, and may generate a mapping image by converting the mapping relation frequency matrix into an image whose pixel color values are the values of the matrix.

Here, the mapping relation frequency matrix may be a matrix that counts the number of distinct data items related to the data in a specific field of the mapping sample table.

Here, when the mapping accuracy, which indicates the probability that the unclassified image and the mapping image match, is greater than or equal to a preset mapping value, the classification unit may classify the data of the unclassified data table using the relation schema information of the mapping sample table corresponding to the mapping image, and thereby identify the relation schema.

Here, the relation schema information of the mapping sample table is information that distinguishes each field of the mapping sample table, and may be one of a case ID, a timestamp, a resource, and an attribute.

A method of classifying data in a data classification apparatus according to an embodiment of the present invention includes: receiving an unclassified data table and a mapping schema data table; extracting a preset number of unclassified sample tables from the unclassified data table; converting each of the preset number of unclassified sample tables into an image to generate a preset number of unclassified images; extracting a preset number of mapping sample tables from the mapping schema data table; converting each of the preset number of mapping sample tables into an image to generate a preset number of mapping images; calculating, using a convolutional neural network, the mapping accuracy between the preset number of unclassified images and the preset number of mapping images; and classifying the data of the unclassified data table based on the mapping accuracy to identify the relation schema.

Here, the step of converting each of the preset number of unclassified sample tables into an image to generate the preset number of unclassified images may include: converting the preset number of unclassified sample tables into relation frequency matrices to generate unclassified relation frequency matrices; and generating an unclassified image by converting an unclassified relation frequency matrix into an image whose pixel color values are the values of the matrix.

Here, the step of converting each of the preset number of mapping sample tables into an image to generate the preset number of mapping images may include: converting each of the preset number of mapping sample tables into a relation frequency matrix to generate a mapping relation frequency matrix; and generating a mapping image by converting the mapping relation frequency matrix into an image whose pixel color values are the values of the matrix.

Here, in the step of classifying the data of the unclassified data table based on the mapping accuracy to identify the relation schema, when the mapping accuracy, which indicates the probability that the unclassified image and the mapping image match, is greater than or equal to a preset mapping value, the data of the unclassified data table may be classified using the relation schema information of the mapping sample table corresponding to the mapping image.

The present invention extracts sample tables from both the unclassified data table and the mapping schema data table, converts the sample tables into images, and classifies the data by comparing those images, so that the relation schema of an unclassified data table can be identified.

FIG. 1 is a diagram illustrating the configuration of a data classification apparatus that classifies the data of an unclassified data table according to an embodiment of the present invention.
FIG. 2 is a flowchart illustrating a process of classifying the data of an unclassified data table in a data classification apparatus according to an embodiment of the present invention.
FIG. 3 is a diagram illustrating an example of a mapping schema data table according to an embodiment of the present invention.
FIG. 4 is a diagram illustrating an example of converting a mapping sample table into a mapping relation frequency matrix according to an embodiment of the present invention.
FIG. 5 is a diagram illustrating an example of converting a mapping relation frequency matrix into a mapping image according to an embodiment of the present invention.

Hereinafter, embodiments are described in detail with reference to the accompanying drawings. However, various changes may be made to the embodiments, so the scope of the patent application is not restricted or limited by these embodiments. All modifications, equivalents, and substitutes for the embodiments should be understood to fall within the scope of the rights.

The terms used in the embodiments are used for descriptive purposes only and should not be construed as limiting. A singular expression includes the plural unless the context clearly indicates otherwise. In this specification, terms such as "comprise" or "have" are intended to indicate that a feature, number, step, operation, component, part, or combination thereof described in the specification exists, and should be understood not to preclude the existence or possible addition of one or more other features, numbers, steps, operations, components, parts, or combinations thereof.

Unless otherwise defined, all terms used herein, including technical and scientific terms, have the same meaning as commonly understood by a person of ordinary skill in the art to which the embodiments belong. Terms such as those defined in commonly used dictionaries should be interpreted as having a meaning consistent with their meaning in the context of the related art, and are not to be interpreted in an idealized or overly formal sense unless explicitly so defined in this application.

In the description with reference to the accompanying drawings, identical components are given the same reference numerals regardless of the figure, and duplicate descriptions of them are omitted. In describing the embodiments, a detailed description of related known technology is omitted where it is judged that it could unnecessarily obscure the gist of the embodiments.

In describing the components of the embodiments, terms such as first, second, A, B, (a), and (b) may be used. These terms serve only to distinguish one component from another and do not limit the nature, sequence, or order of the components. When a component is described as being "connected", "coupled", or "joined" to another component, it may be directly connected or joined to that other component, but it should be understood that yet another component may also be "connected", "coupled", or "joined" between them.

A component included in one embodiment and a component having a common function are described using the same name in other embodiments. Unless stated otherwise, the description given for one embodiment also applies to other embodiments, and detailed descriptions are omitted where they would overlap.

Hereinafter, an apparatus and a method for classifying data for process mining according to an embodiment of the present invention are described in detail with reference to the accompanying FIGS. 1 to 5.

FIG. 1 is a diagram illustrating the configuration of a data classification apparatus that classifies the data of an unclassified data table according to an embodiment of the present invention.

Referring to FIG. 1, the data classification apparatus 100 may include a receiving unit 101, an unclassified sample extraction unit 102, a mapping sample extraction unit 103, an unclassified image generation unit 104, a mapping image generation unit 105, a mapping calculation unit 106, and a classification unit 107.

The receiving unit 101 receives the unclassified data table 110 and the mapping schema data table 120.

Here, the unclassified data table 110 is event data for which no relation schema has been defined, organized in the form of a table.

The mapping schema data table 120 consists of event logs for which the relation schema has already been defined, organized in the form of a table; the mapping from event data to event log has been completed in advance.

The unclassified data table 110 and the mapping schema data table 120 may be data generated by the same business process, or data generated by business processes that are similar but not identical.

The mapping sample extraction unit 103 extracts a preset number of mapping sample tables from the mapping schema data table 120.

FIG. 3 is a diagram illustrating an example of a mapping schema data table according to an embodiment of the present invention.

Referring to FIG. 3, the mapping sample extraction unit 103 selects a preset number of fields 310, 320, 330 from the mapping schema data table 120 and extracts, from the selected fields 310, 320, 330, a preset number of mapping sample tables 340 of a preset size.

Here, the preset number of mapping sample tables extracted from the selected fields may be set to, for example, 1000 to 2000.
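The extraction step above can be sketched as follows. This is a minimal illustration only: the patent does not say how rows are chosen for each sample table, so random sampling without replacement is an assumption here, and the function name `extract_sample_tables`, its parameters, and the toy column names are all hypothetical.

```python
import random

def extract_sample_tables(table, fields, sample_rows, num_samples, seed=0):
    """Draw `num_samples` sub-tables of `sample_rows` rows, restricted to `fields`.

    Assumption: rows are picked at random; the source only states that a
    preset number of fixed-size sample tables is extracted.
    """
    rng = random.Random(seed)
    samples = []
    for _ in range(num_samples):
        rows = rng.sample(table, sample_rows)  # rows drawn without replacement
        samples.append([[row[f] for f in fields] for row in rows])
    return samples

# Toy event table; the column names are illustrative, not taken from the patent.
table = [
    {"case": f"c{i % 4}", "activity": f"a{i % 3}", "user": f"u{i % 2}", "cost": i}
    for i in range(20)
]
samples = extract_sample_tables(
    table, ["case", "activity", "user"], sample_rows=5, num_samples=10
)
```

In practice `num_samples` would be on the order of the 1000 to 2000 mentioned above; a small value is used here to keep the example short.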

The mapping image generation unit 105 converts each of the preset number of mapping sample tables into an image to generate a preset number of mapping images.

The mapping image generation unit 105 may convert each of the preset number of mapping sample tables into a relation frequency matrix to generate a mapping relation frequency matrix, and may generate a mapping image by converting the mapping relation frequency matrix into an image whose pixel color values are the values of the matrix.

Here, the mapping relation frequency matrix is a matrix that counts the number of distinct data items related to the data in a specific field of the mapping sample table.

FIG. 4 is a diagram illustrating an example of converting a mapping sample table into a mapping relation frequency matrix according to an embodiment of the present invention.

Referring to FIG. 4, the mapping image generation unit 105 generates the mapping relation frequency matrix 420, a matrix produced by counting, for the data in a specific field (the first field) of the mapping sample table 410, the number of distinct related data items in the other fields.
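One possible reading of this construction is sketched below: for every distinct value of the first field, count how many distinct values co-occur with it in each of the other fields. The exact layout of the matrix in FIG. 4 is not recoverable from the text, so the row and column arrangement, like the function name `relation_frequency_matrix`, is an assumption.

```python
def relation_frequency_matrix(sample_table, key_index=0):
    """Count, per distinct key value, the distinct values seen in each other field.

    Rows follow the first appearance of each key value; columns follow the
    non-key fields in their original order (an assumed layout).
    """
    other_cols = [c for c in range(len(sample_table[0])) if c != key_index]
    distinct = {}  # key value -> one set of seen values per non-key column
    for row in sample_table:
        per_col = distinct.setdefault(row[key_index], {c: set() for c in other_cols})
        for c in other_cols:
            per_col[c].add(row[c])
    # dicts preserve insertion order, so rows appear in first-seen key order
    return [[len(per_col[c]) for c in other_cols] for per_col in distinct.values()]

sample = [
    ["c1", "A", "u1"],
    ["c1", "B", "u1"],
    ["c2", "A", "u2"],
    ["c2", "A", "u3"],
]
matrix = relation_frequency_matrix(sample)
# c1 co-occurs with 2 activities and 1 user; c2 with 1 activity and 2 users
```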

FIG. 5 is a diagram illustrating an example of converting a mapping relation frequency matrix into a mapping image according to an embodiment of the present invention.

Referring to FIG. 5, the mapping image generation unit 105 may generate the mapping image 520 by converting the mapping relation frequency matrix 510 into an image whose pixel color values are the values of the matrix.

More specifically, the mapping image generation unit 105 may group the values of the mapping relation frequency matrix 510 three at a time and convert each group into the color of one pixel, with the three values corresponding to the RGB color values. By converting all the values of the mapping relation frequency matrix 510 into pixel colors in this way, it can generate the mapping image 520.
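The grouping of three matrix values into one RGB pixel might look like the sketch below. The source does not say how counts larger than 255 are handled, how an incomplete final triple is filled, or how wide the image is, so the clipping, zero padding, and `width` parameter are assumptions, as is the name `matrix_to_rgb_pixels`.

```python
def matrix_to_rgb_pixels(matrix, width):
    """Turn a matrix of counts into rows of (R, G, B) pixels.

    Assumptions: counts are clipped to 255, and both the last triple and the
    last image row are padded with zeros.
    """
    flat = [min(int(v), 255) for row in matrix for v in row]
    flat += [0] * (-len(flat) % 3)  # pad so the values group into triples
    pixels = [tuple(flat[i:i + 3]) for i in range(0, len(flat), 3)]
    pixels += [(0, 0, 0)] * (-len(pixels) % width)  # pad the last image row
    return [pixels[i:i + width] for i in range(0, len(pixels), width)]

image = matrix_to_rgb_pixels([[2, 1, 4], [1, 2, 300], [7, 0, 0]], width=2)
```

The nested list can be handed to any imaging library that accepts per-pixel RGB tuples; plain Python lists are used here to keep the sketch free of dependencies.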

The unclassified sample extraction unit 102 extracts a preset number of unclassified sample tables from the unclassified data table 110. The unclassified sample tables are extracted in the same way as the mapping sample tables.

The unclassified image generation unit 104 converts each of the preset number of unclassified sample tables into an image to generate a preset number of unclassified images.

The unclassified image generation unit 104 may convert each of the preset number of unclassified sample tables into a relation frequency matrix to generate an unclassified relation frequency matrix, and may generate an unclassified image by converting the unclassified relation frequency matrix into an image whose pixel color values are the values of the matrix.

Here, the unclassified relation frequency matrix is a matrix that counts the number of distinct data items related to the data in a specific field of the unclassified sample table.

The unclassified relation frequency matrix is generated in the same way as the mapping relation frequency matrix, and the unclassified image is generated in the same way as the mapping image.

The mapping calculation unit 106 uses a convolutional neural network (CNN) to calculate the mapping accuracy between the preset number of unclassified images and the preset number of mapping images.

More specifically, the mapping calculation unit 106 inputs the preset number of unclassified images and the preset number of mapping images into the convolutional neural network, identifies the mapping images whose similarity to an unclassified image is greater than or equal to a preset probability, counts those mapping images, and calculates the probability obtained from that count as the mapping accuracy between the unclassified image and the preset number of mapping images.
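The counting step can be illustrated without a trained network. In the sketch below the CNN is replaced by a simple stand-in similarity function, and the mapping accuracy is taken to be the fraction of mapping images whose similarity reaches the preset probability; both choices are assumptions, since the patent gives neither the network architecture nor an explicit formula. The names `mapping_accuracy` and `toy_similarity` are hypothetical.

```python
def toy_similarity(a, b):
    """Stand-in for a CNN score: 1 minus the mean absolute difference
    of two equally long lists of 0-255 values. Not the patent's model."""
    diff = sum(abs(x - y) for x, y in zip(a, b)) / (255 * len(a))
    return 1.0 - diff

def mapping_accuracy(unclassified_image, mapping_images, similarity, min_similarity=0.9):
    """Fraction of mapping images at least `min_similarity` similar to the query."""
    hits = sum(
        1 for m in mapping_images
        if similarity(unclassified_image, m) >= min_similarity
    )
    return hits / len(mapping_images)

query = [10, 20, 30]
mapping_images = [[10, 20, 30], [12, 18, 30], [200, 200, 200]]
accuracy = mapping_accuracy(query, mapping_images, toy_similarity)
# two of the three mapping images clear the 0.9 threshold
```

In a real system the `similarity` argument would wrap the trained convolutional network's output for an image pair.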

The classification unit 107 may classify the data of the unclassified data table based on the mapping accuracy to identify the relation schema. Here, the classification unit 107 may be included in the convolutional neural network.

More specifically, when the mapping accuracy, which indicates the probability that the unclassified image and the mapping image match, is greater than or equal to a preset mapping value, the classification unit 107 may classify the data of the unclassified data table using the relation schema information of the mapping sample table corresponding to the mapping image, and thereby identify the relation schema.

Here, the relation schema information of the mapping sample table is information that distinguishes each field of the mapping sample table, and may be one of a case ID, a timestamp, a resource, and an attribute.
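The threshold rule can be sketched as a small decision function. It assumes that a mapping accuracy has already been computed for each candidate schema label and that the best-scoring label is the one tested against the preset mapping value; the patent states only the threshold condition, so picking the maximum is an assumption, and `classify_field` and the label strings are hypothetical names.

```python
def classify_field(accuracy_by_label, min_accuracy=0.8):
    """Return the schema label with the highest mapping accuracy,
    or None when even the best label falls below `min_accuracy`."""
    label, accuracy = max(accuracy_by_label.items(), key=lambda kv: kv[1])
    return label if accuracy >= min_accuracy else None

# Labels mirror the four kinds of schema information named in the text.
confident = classify_field(
    {"case_id": 0.93, "timestamp": 0.10, "resource": 0.41, "attribute": 0.22}
)
uncertain = classify_field(
    {"case_id": 0.50, "timestamp": 0.10, "resource": 0.41, "attribute": 0.22}
)
```

Returning `None` leaves a field unclassified rather than forcing a label when no mapping image matches well enough.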

FIG. 2 is a flowchart illustrating a process of classifying the data of an unclassified data table in a data classification apparatus according to an embodiment of the present invention.

Referring to FIG. 2, the data classification apparatus 100 receives an unclassified data table and a mapping schema data table (210).

The data classification apparatus 100 then extracts a preset number of unclassified sample tables from the unclassified data table (212).

The data classification apparatus 100 converts the preset number of unclassified sample tables into relation frequency matrices to generate unclassified relation frequency matrices (214).

The data classification apparatus 100 generates an unclassified image by converting the unclassified relation frequency matrix into an image whose pixel color values are the values of the matrix (216).

Meanwhile, the data classification apparatus 100 extracts a preset number of mapping sample tables from the mapping schema data table (218).

The data classification apparatus 100 converts each of the preset number of mapping sample tables into a relation frequency matrix to generate a mapping relation frequency matrix (220).

Here, the mapping relation frequency matrix is a matrix that counts the number of distinct data items related to the data in a specific field of the mapping sample table.

The data classification apparatus 100 generates a mapping image by converting the mapping relation frequency matrix into an image whose pixel color values are the values of the matrix (222).

Here, steps 212 to 216 and steps 218 to 220 may be processed simultaneously in parallel, or either steps 212 to 216 or steps 218 to 220 may be performed first.

The data classification apparatus 100 then uses the convolutional neural network to calculate the mapping accuracy between the preset number of unclassified images and the preset number of mapping images (224).

When the mapping accuracy, which indicates the probability that the unclassified image and the mapping image match, is greater than or equal to a preset mapping value, the data classification apparatus 100 classifies the data of the unclassified data table using the relation schema information of the mapping sample table corresponding to the mapping image (226).

실시예에 따른 방법은 다양한 컴퓨터 수단을 통하여 수행될 수 있는 프로그램 명령 형태로 구현되어 컴퓨터 판독 가능 매체에 기록될 수 있다. 상기 컴퓨터 판독 가능 매체는 프로그램 명령, 데이터 파일, 데이터 구조 등을 단독으로 또는 조합하여 포함할 수 있다. 상기 매체에 기록되는 프로그램 명령은 실시예를 위하여 특별히 설계되고 구성된 것들이거나 컴퓨터 소프트웨어 당업자에게 공지되어 사용 가능한 것일 수도 있다. 컴퓨터 판독 가능 기록 매체의 예에는 하드 디스크, 플로피 디스크 및 자기 테이프와 같은 자기 매체(magnetic media), CD-ROM, DVD와 같은 광기록 매체(optical media), 플롭티컬 디스크(floptical disk)와 같은 자기-광 매체(magneto-optical media), 및 롬(ROM), 램(RAM), 플래시 메모리 등과 같은 프로그램 명령을 저장하고 수행하도록 특별히 구성된 하드웨어 장치가 포함된다. 프로그램 명령의 예에는 컴파일러에 의해 만들어지는 것과 같은 기계어 코드뿐만 아니라 인터프리터 등을 사용해서 컴퓨터에 의해서 실행될 수 있는 고급 언어 코드를 포함한다. 상기된 하드웨어 장치는 실시예의 동작을 수행하기 위해 하나 이상의 소프트웨어 모듈로서 작동하도록 구성될 수 있으며, 그 역도 마찬가지이다.The method according to the embodiment may be implemented in the form of program instructions that can be executed through various computer means and recorded in a computer-readable medium. The computer-readable medium may include program instructions, data files, data structures, etc. alone or in combination. The program instructions recorded on the medium may be specially designed and configured for the embodiment, or may be known and available to those skilled in the art of computer software. Examples of the computer-readable recording medium include magnetic media such as hard disks, floppy disks and magnetic tapes, optical media such as CD-ROMs and DVDs, and magnetic such as floppy disks. - includes magneto-optical media, and hardware devices specially configured to store and execute program instructions, such as ROM, RAM, flash memory, and the like. Examples of program instructions include not only machine language codes such as those generated by a compiler, but also high-level language codes that can be executed by a computer using an interpreter or the like. The hardware devices described above may be configured to operate as one or more software modules to perform the operations of the embodiments, and vice versa.

Software may include a computer program, code, instructions, or a combination of one or more of these, and may configure a processing device to operate as desired or may instruct the processing device independently or collectively. Software and/or data may be embodied, permanently or temporarily, in any type of machine, component, physical device, virtual equipment, computer storage medium or device, or transmitted signal wave, in order to be interpreted by the processing device or to provide instructions or data to the processing device. The software may also be distributed over networked computer systems and stored or executed in a distributed manner. The software and data may be stored on one or more computer-readable recording media.

Although the embodiments have been described above with reference to a limited set of drawings, those of ordinary skill in the art may apply various technical modifications and variations based on the above. For example, appropriate results may be achieved even if the described techniques are performed in an order different from the described method, and/or the components of the described system, structure, apparatus, circuit, and the like are coupled or combined in a form different from the described method, or are replaced or substituted by other components or equivalents.

Therefore, other implementations, other embodiments, and equivalents to the claims also fall within the scope of the following claims.

100: data classification device
101: receiver
102: unclassified sample extraction unit
103: mapping sample extraction unit
104: unclassified image generation unit
105: mapping image generation unit
106: mapping calculator
107: classification unit
110: unclassified data table
120: mapping schema data table

Claims

1. A data classification device comprising:
a receiver configured to receive an unclassified data table and a mapping schema data table;
an unclassified sample extraction unit configured to extract a preset number of unclassified sample tables from the unclassified data table;
an unclassified image generation unit configured to generate a preset number of unclassified images by converting each of the preset number of unclassified sample tables into an image;
a mapping sample extraction unit configured to extract a preset number of mapping sample tables from the mapping schema data table;
a mapping image generation unit configured to generate a preset number of mapping images by converting each of the preset number of mapping sample tables into an image;
a mapping calculator configured to calculate mapping accuracy between the preset number of unclassified images and the preset number of mapping images using a convolutional neural network; and
a classification unit configured to classify data in the unclassified data table based on the mapping accuracy so as to check a relation schema.

2. The data classification device of claim 1,
wherein the unclassified image generation unit generates an unclassified relationship frequency matrix, which is a matrix generated by counting the number of non-duplicate data related to the data included in a specific field of the preset number of unclassified sample tables, and generates the unclassified image by converting the values of the unclassified relationship frequency matrix into an image having pixel color values.

3. (Deleted)

4. The data classification device of claim 1,
wherein the mapping image generation unit generates a mapping relationship frequency matrix for each of the preset number of mapping sample tables, each being a matrix generated by counting the number of non-duplicate data related to the data included in a specific field of that mapping sample table, and generates the mapping image by converting the values of the mapping relationship frequency matrix into an image having pixel color values.

5. (Deleted)

6. The data classification device of claim 1,
wherein, when the mapping accuracy indicating the probability that the unclassified image and the mapping image match is equal to or greater than a preset mapping value, the classification unit classifies the data of the unclassified data table using relation schema information of the mapping sample table corresponding to the mapping image, and thereby checks the relation schema.

7. The data classification device of claim 6,
wherein the relation schema information of the mapping sample table is information that classifies each field of the mapping sample table as one of a case ID, a timestamp, a resource, and an attribute.

8. A method of classifying data in a data classification device, the method comprising:
receiving an unclassified data table and a mapping schema data table;
extracting a preset number of unclassified sample tables from the unclassified data table;
generating a preset number of unclassified images by converting each of the preset number of unclassified sample tables into an image;
extracting a preset number of mapping sample tables from the mapping schema data table;
generating a preset number of mapping images by converting each of the preset number of mapping sample tables into an image;
calculating mapping accuracy between the preset number of unclassified images and the preset number of mapping images using a convolutional neural network; and
classifying the data of the unclassified data table based on the mapping accuracy so as to check a relation schema.

9. The method of claim 8,
wherein generating the preset number of unclassified images by converting each of the preset number of unclassified sample tables into an image comprises:
generating an unclassified relationship frequency matrix, which is a matrix generated by counting the number of non-duplicate data related to the data included in a specific field of the preset number of unclassified sample tables; and
generating the unclassified image by converting the values of the unclassified relationship frequency matrix into an image having pixel color values.

10. The method of claim 8,
wherein generating the preset number of mapping images by converting each of the preset number of mapping sample tables into an image comprises:
generating a mapping relationship frequency matrix, which is a matrix generated by counting the number of non-duplicate data related to the data included in a specific field of each of the preset number of mapping sample tables; and
generating the mapping image by converting the values of the mapping relationship frequency matrix into an image having pixel color values.

11. The method of claim 8,
wherein classifying the data of the unclassified data table based on the mapping accuracy so as to check the relation schema comprises:
when the mapping accuracy indicating the probability that the unclassified image and the mapping image match is equal to or greater than a preset mapping value, classifying the data of the unclassified data table using relation schema information of the mapping sample table corresponding to the mapping image.

12. The method of claim 11,
wherein the relation schema information of the mapping sample table is information that classifies each field of the mapping sample table as one of a case ID, a timestamp, a resource, and an attribute.

13. A computer-readable recording medium in which a program for executing the method of any one of claims 8 to 12 is recorded.