KR102419956B1

KR102419956B1 - Method for automatically generating metadata for structured data and apparatus for generating metadata using a machine learning/deep learning model for the same

Info

Publication number: KR102419956B1
Application number: KR1020210155464A
Authority: KR
Inventors: 마보현
Original assignee: 주식회사 스타캣
Priority date: 2021-11-12
Filing date: 2021-11-12
Publication date: 2022-07-14

Abstract

According to an embodiment of the present invention, a method by which a metadata generation apparatus automatically generates metadata for structured data by using a machine learning/deep learning model comprises the steps of: (a) receiving structured data; (b) generating metadata for the structured data by applying metadata generation rules to the received structured data; and (c) learning the generated metadata by using a machine learning/deep learning model to update the metadata generation rules, wherein the metadata generation rules include at least one of full name setting and recommendation rules for a data field name, description recommendation rules for a dataset and a field, and hash tag generation rules for the dataset.

Description

A method for automatically generating metadata for structured data and an apparatus for generating metadata using a machine learning/deep learning model for the same

The present invention relates to a method for automatically generating metadata for structured data and to a metadata generation apparatus using a machine learning/deep learning model for the same. More specifically, it relates to a method and apparatus that can easily and conveniently generate, from structured data, metadata to be applied to machine learning/deep learning models, which are artificial intelligence models.

To apply data to a wide range of artificial intelligence (AI) models, including machine learning and deep learning models, it is necessary to first grasp and identify metadata, that is, structured data that describes the data in question, rather than applying raw, unorganized data as it is.

Meanwhile, in the case of structured data, model performance can improve markedly when the data is applied to an AI model with its metadata clearly laid out, such as the data type of each field, the statistical characteristics of the values, distributions such as maximum/minimum/mean/standard deviation, the range corresponding to outliers, the version of the data, and the names of the data and its fields. This can be regarded as a kind of data preprocessing. Conventionally, the prevailing preprocessing approach was for highly paid specialists to explore the data or determine the data type of each field one by one using a high-level programming language such as Python.
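As an illustration of the kind of per-field metadata listed above, the following minimal Python sketch profiles a single numeric field. The function name `profile_field`, the sample values, and the 1.5 × IQR outlier fence are assumptions made for this example; the specification does not state how the outlier range is determined.

```python
import statistics

def profile_field(values):
    """Summarize one numeric field (needs at least two values).

    The 1.5 * IQR fence used for the outlier range is an assumed convention.
    """
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    return {
        "type": type(values[0]).__name__,
        "min": min(values),
        "max": max(values),
        "mean": statistics.mean(values),
        "stdev": statistics.stdev(values),
        # Values outside this interval would be flagged as outliers.
        "outlier_range": (q1 - 1.5 * iqr, q3 + 1.5 * iqr),
    }

# Hypothetical field values.
profile = profile_field([10, 12, 11, 13, 12, 95])
```

Writing such a profile by hand for every field is the manual step that the invention aims to remove.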

However, this conventional preprocessing approach has a problem: the preprocessing results for the same data differ depending on the skill of the specialists and on how each of them views the data, which can in turn affect the performance of the AI model. In addition, because specialists handle the work directly, it inevitably takes a great deal of time and incurs high labor costs. Startups and small and medium-sized enterprises with limited funds therefore either cannot perform the preprocessing at all or perform it over a long period using in-house staff, with the result that the performance of the released product degrades or the product launch is delayed.

In view of these problems, the present invention relates to a method and apparatus that can easily and conveniently generate metadata, which corresponds to the data preprocessing step, without processing by specialists.

Korean Registered Patent Publication No. 10-2310598 (2021.10.01)

A technical problem to be solved by the present invention is to provide a method for automatically generating metadata for structured data, and a metadata generation apparatus using a machine learning/deep learning model for the same, that can easily and conveniently generate metadata, which corresponds to the data preprocessing step for structured data, without processing by specialists.

Another technical problem to be solved by the present invention is to provide a method for automatically generating metadata for structured data, and a metadata generation apparatus using a machine learning/deep learning model for the same, that can steadily improve the performance of the machine learning/deep learning model by continuously learning the results of metadata generation for structured data.

The technical problems of the present invention are not limited to those mentioned above, and other technical problems not mentioned will be clearly understood by those skilled in the art from the following description.

To achieve the above technical problem, a method according to an embodiment of the present invention, by which a metadata generation apparatus using a machine learning/deep learning model automatically generates metadata for structured data, comprises the steps of: (a) receiving structured data; (b) generating metadata for the structured data by applying metadata generation rules to the received structured data; and (c) learning the generated metadata with a machine learning/deep learning model to update the metadata generation rules, wherein the metadata generation rules include at least one of a full name setting and recommendation rule for a data field name, a description recommendation rule for a dataset and a field, and a hashtag generation rule for the dataset.

According to an embodiment, when the metadata generation rules include the full name setting and recommendation rule for a data field name, step (b) may include: (b-1) determining whether a field name/full name mapping table contains the field name of the received structured data, the field name/full name mapping table being included in a field mapping table for the full name setting and recommendation rule and mapping one or more (full name ID, reference count) pairs to each of one or more field names; (b-2) if step (b-1) determines that the field name is contained, searching for the full name ID with the highest reference count for that field name; and (b-3) looking up the found full name ID in a full name information table included in the field mapping table and setting the full name mapped to it as the full name of the metadata field name for the received structured data.
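Steps (b-1) to (b-3) can be sketched with two in-memory dictionaries standing in for the tables. This is a minimal sketch, not the claimed implementation: the names `field_name_map`, `full_name_info`, and `set_full_name`, the dictionary layout, and all sample values are assumptions made for illustration.

```python
# Hypothetical field name/full name mapping table:
# field name -> {full name ID: reference count}
field_name_map = {
    "cust_nm": {1: 7, 2: 3},
}

# Hypothetical full name information table:
# full name ID -> full name and its own reference count
full_name_info = {
    1: {"full_name": "Customer Name", "ref_count": 12},
    2: {"full_name": "Customer Number", "ref_count": 5},
}

def set_full_name(field_name):
    """Return the most-referenced full name for field_name, or None if unmapped."""
    candidates = field_name_map.get(field_name)     # (b-1) is the field name known?
    if not candidates:
        return None
    best_id = max(candidates, key=candidates.get)   # (b-2) highest reference count
    return full_name_info[best_id]["full_name"]     # (b-3) resolve the ID to a full name

resolved = set_full_name("cust_nm")
missing = set_full_name("ord_dt")
```

Here `resolved` is "Customer Name" because full name ID 1 has the higher reference count for `cust_nm`, while `missing` is `None` and would fall through to the user-input branch.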

According to an embodiment, the method may further include, after step (b-3): (b-4) incrementing by 1, in the field name/full name mapping table, the reference count of the full name ID having the highest reference count for the field name of the received structured data; and (b-5) incrementing by 1, in the full name information table, the reference count of that full name ID.
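The two count updates in steps (b-4) and (b-5) might look as follows. The function name `record_reference`, the dictionary layout, and the sample values are hypothetical and serve only to make the bookkeeping concrete.

```python
def record_reference(field_name, field_name_map, full_name_info):
    """Bump both reference counts for the full name chosen for field_name."""
    counts = field_name_map[field_name]
    best_id = max(counts, key=counts.get)
    counts[best_id] += 1                          # (b-4) mapping table count
    full_name_info[best_id]["ref_count"] += 1     # (b-5) information table count
    return best_id

# Hypothetical table contents before the update.
field_name_map = {"cust_nm": {1: 7, 2: 3}}
full_name_info = {
    1: {"full_name": "Customer Name", "ref_count": 12},
    2: {"full_name": "Customer Number", "ref_count": 5},
}

chosen_id = record_reference("cust_nm", field_name_map, full_name_info)
```

Because the winning ID is the one that gets incremented, a mapping that is used often becomes progressively harder to displace.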

According to an embodiment, the method may include: (b-2′) if step (b-1) determines that the field name is not contained, receiving from the user one or more characters of the full name of the metadata field name for the received structured data; (b-3′) predicting the characters or words that follow the one or more characters received from the user and recommending a completed full name; and (b-4′) determining whether the user selects the recommended completed full name.
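The recommendation in step (b-3′) is described as a prediction of the following characters or words. As a deliberately simpler stand-in for a learned predictor, the sketch below completes the typed prefix against known full names and ranks the matches by reference count. The function name `recommend_full_names`, the table layout, and the sample rows are assumptions.

```python
def recommend_full_names(prefix, full_name_info, limit=3):
    """Suggest known full names starting with prefix, most referenced first.

    Prefix matching is a substitute for the learned prediction in the text.
    """
    needle = prefix.casefold()
    matches = [
        row for row in full_name_info.values()
        if row["full_name"].casefold().startswith(needle)
    ]
    matches.sort(key=lambda row: row["ref_count"], reverse=True)
    return [row["full_name"] for row in matches[:limit]]

# Hypothetical full name information table.
full_name_info = {
    1: {"full_name": "Customer Name", "ref_count": 12},
    2: {"full_name": "Customer Number", "ref_count": 5},
    3: {"full_name": "Order Date", "ref_count": 9},
}

suggestions = recommend_full_names("cus", full_name_info)
```

Step (b-4′) then amounts to checking whether the user picks one of the returned suggestions.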

According to an embodiment, the method may further include: (b-5′) if step (b-4′) determines that the user selected the recommended full name, incrementing by 1 the reference count of the selected full name in the full name information table; and (b-6′) incrementing by 1 the reference count of the field name matched to the selected full name, among the one or more field names in the field name/full name mapping table.

According to an embodiment, the method may further include: (b-5″) if step (b-4′) determines that the user did not select the recommended full name, receiving from the user the entire full name of the metadata field name for the received structured data; (b-6″) determining whether the full name information table contains the full name received in its entirety from the user; (b-7″) if step (b-6″) determines that it does, incrementing by 1 the reference count of that full name in the full name information table; (b-8″) determining whether the field name/full name mapping table contains the field name of the structured data that is mapped to the full name received in its entirety from the user; and (b-9″) if step (b-8″) determines that it does not, adding to the field name/full name mapping table the full name received in its entirety from the user or an abbreviation thereof, and adding the full name ID that the full name information table holds for that full name, with the reference count set to 1.

According to an embodiment, the method may further include: (b-7′″) if step (b-6″) determines that the full name is not contained, adding to the full name information table the full name received in its entirety from the user together with a full name ID for it, and setting the reference count to 1.
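The table updates for a full name typed in full by the user, steps (b-6″) to (b-9″) and (b-7′″), can be sketched as one function. The function name `register_typed_full_name`, the dictionary layout, the sequential ID scheme, and the use of the incoming field name as the mapping key are all assumptions; the text does not specify them.

```python
def register_typed_full_name(field_name, full_name, field_name_map, full_name_info):
    """Record a user-entered full name and link it to field_name."""
    # (b-6") is the full name already in the full name information table?
    full_id = next(
        (fid for fid, row in full_name_info.items() if row["full_name"] == full_name),
        None,
    )
    if full_id is None:
        # (b-7'") unknown full name: add it under a new ID with count 1.
        full_id = max(full_name_info, default=0) + 1   # assumed ID scheme
        full_name_info[full_id] = {"full_name": full_name, "ref_count": 1}
    else:
        # (b-7") known full name: bump its reference count.
        full_name_info[full_id]["ref_count"] += 1
    # (b-8")/(b-9") add the field name to the mapping table if it is absent.
    # An existing entry is left untouched; the text does not cover that case.
    if field_name not in field_name_map:
        field_name_map[field_name] = {full_id: 1}
    return full_id

# Hypothetical starting state.
field_name_map = {}
full_name_info = {1: {"full_name": "Customer Name", "ref_count": 12}}

new_id = register_typed_full_name("ord_dt", "Order Date", field_name_map, full_name_info)
known_id = register_typed_full_name("cust_nm", "Customer Name", field_name_map, full_name_info)
```

After these two calls, `ord_dt` points at a newly created full name with a count of 1, while "Customer Name" keeps its ID and its count rises from 12 to 13.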

According to an embodiment, when the metadata generation rules include the description recommendation rule for a dataset and a field, step (b) may include: (b-Ⅰ) recommending, using at least one of the name of the dataset and the full name of the field, the content to be entered in the field description item of the full name information table included in the field mapping table for the description recommendation rule.

According to an embodiment, when the metadata generation rules include the hashtag generation rule for the dataset, step (b) may include: (b-Ⅰ′) recommending one or more hashtags for the dataset using at least one of the name and description of the dataset and the full names of the fields constituting the dataset.
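One simple way to picture step (b-Ⅰ′) is a word-frequency heuristic over the dataset name, its description, and the field full names. This is only an illustrative substitute for whatever model the apparatus actually uses; the function name `recommend_hashtags`, the stopword list, and the sample inputs are assumptions.

```python
import re
from collections import Counter

# Assumed stopword list; a real system would use a fuller one.
STOPWORDS = {"a", "an", "and", "by", "for", "of", "per", "the", "to"}

def recommend_hashtags(name, description, field_full_names, limit=3):
    """Suggest hashtags from the most frequent non-stopword terms."""
    text = " ".join([name, description, *field_full_names]).lower()
    words = [w for w in re.findall(r"[a-z]+", text) if w not in STOPWORDS]
    return ["#" + word for word, _ in Counter(words).most_common(limit)]

# Hypothetical dataset metadata.
tags = recommend_hashtags(
    "Customer Orders",
    "Orders placed by each customer",
    ["Customer Name", "Order Date"],
)
```

With these inputs the two leading suggestions are `#customer` and `#orders`, since those words occur most often across the three sources.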

To achieve the above technical problem, a metadata generation apparatus using a machine learning/deep learning model according to an embodiment of the present invention includes one or more processors, a network interface, a memory that loads a computer program executed by the processors, and storage that stores large-capacity network data and the computer program, wherein the computer program causes the one or more processors to execute: (A) an operation of receiving structured data; (B) an operation of generating metadata for the structured data by applying metadata generation rules to the received structured data; and (C) an operation of learning the generated metadata with a machine learning/deep learning model to update the metadata generation rules, and wherein the metadata generation rules include at least one of a full name setting and recommendation rule for a data field name, a description recommendation rule for a dataset and a field, and a hashtag generation rule for the dataset.

According to the present invention as described above, even if structured data contains field names made up of abbreviations that differ from dataset to dataset, the metadata generation apparatus using a machine learning/deep learning model automatically sets the most suitable full name using the field mapping table, without specialists having to check and process each one individually. This prevents unnecessary expenditure of time and cost and allows the data preprocessing process to be performed easily and conveniently.

In addition, when the field mapping table does not contain a field name, its full name, or both, they are added so that the field mapping table is updated promptly, which improves the accuracy of subsequent metadata generation.

In addition, because the field mapping table is updated, the full name setting and recommendation rule for data field names and the generated metadata can be learned continuously by the machine learning/deep learning model. As a result, the performance of the machine learning/deep learning model, and of full name setting and recommendation, can improve steadily as the metadata generation apparatus is used.

The effects of the present invention are not limited to those mentioned above, and other effects not mentioned will be clearly understood by those skilled in the art from the following description.

FIG. 1 is a diagram showing the overall configuration of a metadata generation apparatus using a machine learning/deep learning model according to a first embodiment of the present invention.
FIG. 2 is a flowchart showing representative steps of a method for automatically generating metadata for structured data according to a second embodiment of the present invention.
FIG. 3 is a schematic diagram showing the metadata generation apparatus 100 using a machine learning/deep learning model receiving structured data, generating metadata by applying the metadata generation rules, and then updating the metadata generation rules by learning the generated metadata.
FIG. 4 is a flowchart detailing step S220, the most essential step of the method for automatically generating metadata for structured data according to the second embodiment of the present invention, specifically for the case where the metadata generation rules include the full name setting and recommendation rule for a data field name.
FIG. 5 is flowchart 1, showing a part separated from the flowchart of FIG. 4.
FIG. 6 is flowchart 2, showing a part separated from the flowchart of FIG. 4.
FIG. 7 is a diagram showing, by way of example, the field mapping table used in the method for automatically generating metadata for structured data according to the second embodiment of the present invention when the metadata generation rules include the full name setting and recommendation rule for a data field name.
FIG. 8 is a diagram showing, by way of example, how reference counts are updated in the field mapping table according to steps S220-1 to S220-5.
FIG. 9 is a diagram showing, by way of example, how reference counts are updated in the field mapping table according to steps S220-2′ to S220-6′.
FIG. 10 is a diagram showing, by way of example, how a new field name is added to the field name/full name mapping table according to steps S220-2′ to S220-6′.
FIG. 11 is a flowchart detailing the case where the metadata generation rules include the description recommendation rule for a dataset and a field in the method for automatically generating metadata for structured data according to the second embodiment of the present invention.
FIG. 12 is a flowchart detailing the case where the metadata generation rules include the hashtag generation rule for a dataset in the method for automatically generating metadata for structured data according to the second embodiment of the present invention.

Hereinafter, preferred embodiments of the present invention will be described in detail with reference to the accompanying drawings. The advantages and features of the present invention, and the methods of achieving them, will become apparent from the embodiments described in detail below together with the accompanying drawings. However, the present invention is not limited to the embodiments disclosed below and may be implemented in various different forms. These embodiments are provided only so that the disclosure of the present invention is complete and so that those of ordinary skill in the art to which the present invention pertains are fully informed of the scope of the invention, which is defined only by the scope of the claims. Like reference numerals refer to like elements throughout the specification.

Unless otherwise defined, all terms (including technical and scientific terms) used herein have the meaning commonly understood by those of ordinary skill in the art to which the present invention pertains. In addition, terms defined in commonly used dictionaries are not to be interpreted in an idealized or excessive manner unless explicitly and specifically defined.

The terms used herein are for describing the embodiments and are not intended to limit the present invention. In this specification, the singular form also includes the plural unless the phrase specifically states otherwise.

As used herein, "comprises" and/or "comprising" does not exclude the presence or addition of one or more other components, steps, operations, and/or elements in addition to the stated components, steps, operations, and/or elements.

FIG. 1 is a diagram showing the overall configuration of the metadata generation apparatus 100 using a machine learning/deep learning model according to the first embodiment of the present invention.

However, this is only a preferred embodiment for achieving the object of the present invention. Some components may be added or removed as needed, and the role of one component may of course be performed together by another component.

The metadata generation apparatus 100 using a machine learning/deep learning model according to the first embodiment of the present invention may include a processor 10, a network interface 20, a memory 30, storage 40, and a data bus 50 connecting them. It may be implemented as a standalone device, as a tangible physical service server such as an in-house system or a space-rental system, or as an intangible cloud service server.

The processor 10 controls the overall operation of each component. The processor 10 may be any one of a central processing unit (CPU), a micro processor unit (MPU), a micro controller unit (MCU), or any type of processor well known in the art to which the present invention pertains, and may be implemented as an AI model processor such as a machine learning model processor or a deep learning model processor. In addition, the processor 10 may perform operations for at least one application or program for carrying out the method for automatically generating metadata for structured data according to the second embodiment of the present invention.

The network interface 20 supports wired and wireless Internet communication for the metadata generation apparatus 100 using a machine learning/deep learning model according to the first embodiment of the present invention, and may also support other known communication methods. Accordingly, the network interface 20 may be configured to include the corresponding communication modules.

The memory 30 stores various data, commands, and/or information, and may load one or more computer programs 41 from the storage 40 in order to carry out the method for automatically generating metadata for structured data according to the second embodiment of the present invention. Although FIG. 1 shows RAM as one example of the memory 30, various other storage media can of course be used as the memory 30.

The storage 40 may non-temporarily store one or more computer programs 41 and large-capacity network information 42. The storage 40 may be any one of a non-volatile memory such as read only memory (ROM), erasable programmable ROM (EPROM), electrically erasable programmable ROM (EEPROM), or flash memory; a hard disk; a removable disk; or any type of computer-readable recording medium well known in the art to which the present invention pertains.

The computer program 41 is loaded into the memory 30 and may cause the one or more processors 10 to execute: (A) an operation of receiving structured data; (B) an operation of generating metadata for the structured data by applying metadata generation rules to the received structured data; and (C) an operation of learning the generated metadata with a machine learning/deep learning model to update the metadata generation rules. The metadata generation rules may include at least one of a full name setting and recommendation rule for a data field name, a description recommendation rule for a dataset and a field, and a hashtag generation rule for the dataset.

The operations performed by the computer program 41, briefly mentioned above, can be regarded as functions of the computer program 41. They are described in more detail below in the description of the method for automatically generating metadata for structured data according to the second embodiment of the present invention.

The data bus 50 serves as the path along which commands and/or information move between the processor 10, the network interface 20, the memory 30, and the storage 40 described above.

When the metadata generation apparatus 100 using a machine learning/deep learning model according to the first embodiment of the present invention described above is implemented as a standalone device, a user terminal (not shown) can send structured data to the apparatus over a network and receive the generated metadata. When it is implemented as a server, it can provide the user terminal (not shown) with the automatic metadata generation function offered by a dedicated application linked to the server over the network.

Hereinafter, the method for automatically generating metadata for structured data according to the second embodiment of the present invention will be described with reference to FIGS. 2 to 12.

FIG. 2 is a flowchart showing representative steps of the method for automatically generating metadata for structured data according to the second embodiment of the present invention.

However, this is only a preferred embodiment for achieving the object of the present invention. Some steps may of course be added or removed as needed, and any one step may be performed as part of another step.

Each step is assumed to be performed by the metadata generation apparatus 100 using a machine learning/deep learning model according to the first embodiment of the present invention, implemented as a server. The user terminal (not shown) may be any terminal with a network function, for example a smartphone, smart watch, smart glasses, notebook computer, tablet computer, PDA, PDP, or PMP. The following description, however, assumes that the user terminal (not shown) is a desktop PC.

First, the metadata generation apparatus 100 using a machine learning/deep learning model receives structured data (S210).

The entity that transmits the structured data to the metadata generation apparatus 100 is typically a user terminal (not shown). Alternatively, the metadata generation apparatus 100 may obtain the structured data itself, or under an administrator's operation, by crawling the Internet or by connecting to a specific database and downloading the data. The sender of the structured data is therefore not particularly limited.

Having received the structured data, the metadata generation apparatus 100 may store it in an internal storage space such as the storage 40 or in an external storage space connected over a network, or it may proceed directly to step S220 without storing it.

Once the structured data has been received, the metadata generation apparatus 100 applies the metadata generation rules to the received structured data to generate metadata for the structured data (S220), and learns the generated metadata with the machine learning/deep learning model to update the metadata generation rules (S230).

Step S220, which generates the metadata for the structured data, is the core step of the method for automatically generating metadata for structured data according to the second embodiment of the present invention. Before describing it, the information that makes up the metadata for structured data is described first.

Metadata for structured data consists broadly of ⅰ) summary information for the entire dataset, ⅱ) information for each data sample, and ⅲ) information for each field. ⅰ) The summary information for the entire dataset consists of information such as the dataset name, a description of the dataset, a list of hashtags for the dataset, the total number of data samples, the number of fields, whether the dataset is a time-series dataset, the distribution and number of clusters of the dataset, a list of key fields, and the version. ⅱ) The information for each data sample consists of information such as the probability that the sample is an outlier, the number of the cluster to which the sample belongs, and the version. ⅲ) The information for each field consists of information such as the original field name, the full name of the field name, a description of the field, the data type of the field, statistical information such as the maximum, minimum, mean, standard deviation, median, mode, and quartiles, whether the field is a key field, whether it is a time-series field, the number of null values, the normal range, and the version. Step S220 can be regarded as the step most closely related to ⅲ) the information for each field.
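The three groups of information listed above could be held in a structure along the following lines. This is only an illustrative sketch: the class names, attribute names, and types are assumptions made for this example and are not specified in the description.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class DatasetSummary:
    """i) Summary information for the entire dataset (attribute names are illustrative)."""
    name: str
    description: str = ""
    hashtags: list[str] = field(default_factory=list)
    sample_count: int = 0
    field_count: int = 0
    is_time_series: bool = False
    distribution: str = ""
    cluster_count: Optional[int] = None
    key_fields: list[str] = field(default_factory=list)
    version: int = 1


@dataclass
class SampleInfo:
    """ii) Information for one data sample."""
    outlier_probability: float
    cluster_number: int
    version: int = 1


@dataclass
class FieldInfo:
    """iii) Information for one field."""
    original_name: str
    full_name: str = ""
    description: str = ""
    data_type: str = ""
    # Statistical information such as max, min, mean, std, median, mode, quartiles.
    statistics: dict[str, float] = field(default_factory=dict)
    is_key: bool = False
    is_time_series: bool = False
    null_count: int = 0
    normal_range: Optional[tuple[float, float]] = None
    version: int = 1


@dataclass
class StructuredDataMetadata:
    """Container combining the three groups of metadata."""
    summary: DatasetSummary
    samples: list[SampleInfo] = field(default_factory=list)
    fields: list[FieldInfo] = field(default_factory=list)


# Hypothetical usage: a dataset named "demo" with one field.
metadata = StructuredDataMetadata(summary=DatasetSummary(name="demo"))
metadata.fields.append(FieldInfo(original_name="PRCS_DT", full_name="Process Date"))
```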

Meanwhile, the metadata generation rules applied to the structured data may include rules for generating metadata, namely at least one of full name setting and recommendation rules for data field names, description recommendation rules for datasets and fields, and hashtag generation rules for the dataset. The metadata generation rules may be stored in the metadata generation apparatus 100 in the form of a database or a file. When the metadata generation apparatus 100 first starts operating, preset metadata generation rules are applied. Thereafter, while metadata for structured data is continuously generated, the rules that users add or change are learned by the machine learning/deep learning model in real time or periodically, and the metadata generation rules in force up to that point are updated, so that the quality of the metadata generation rules can improve continuously. This is also tied to the performance of the metadata generation apparatus 100 itself: as the quality of the metadata generation rules improves, the performance of the apparatus can improve as well.

FIG. 3 is a schematic diagram showing the metadata generation apparatus 100 receiving structured data, generating metadata by applying the metadata generation rules, and then updating the metadata generation rules by learning the generated metadata. Step S220, the core step of the method for automatically generating metadata for structured data according to the second embodiment of the present invention, is described in detail below.

FIG. 4 is a flowchart detailing step S220, the core step of the method for automatically generating metadata for structured data according to the second embodiment of the present invention, for the case where the metadata generation rules include the full name setting and recommendation rules for data field names.

Because the size of drawings that can be attached to the specification is limited, the text in the flowchart of FIG. 4 is expected to be hard to read, so FIGS. 5 and 6 each show a part of the flowchart of FIG. 4 separately. FIG. 7 shows an example of the field mapping table used in the method for automatically generating metadata for structured data according to the second embodiment of the present invention when the metadata generation rules include the full name setting and recommendation rules for data field names. The table on the left, in which one or more (full name ID, reference count) pairs are mapped to each of one or more field names, is referred to as the field name/full name mapping table. The table on the right, which holds various information about each full name, ordered by full name ID, is referred to as the full name information table.

First, the metadata generation apparatus 100 determines whether the field name/full name mapping table, which is part of the field mapping table for the full name setting and recommendation rules for data field names, contains the field name of the structured data received in step S210 (S220-1).

As mentioned above, a field name is an abbreviation. The description continues using the example in which the field name of the structured data received in step S210 is PRCS_DT.

Referring to FIG. 7, the field name/full name mapping table on the left lists various field names. Because the field name of the structured data received in step S210 is PRCS_DT, the metadata generation apparatus 100 determines whether the field name/full name mapping table contains PRCS_DT. Since PRCS_DT appears in the third row of the first column, the apparatus determines that it is contained.

If it is determined in step S220-1 that the field name is contained, the metadata generation apparatus 100 searches for the full name ID with the highest reference count for that field name (S220-2).

Referring to FIG. 7, one or more (full name ID, reference count) pairs are mapped to each field name in the field name/full name mapping table on the left. Here, the full name ID is an ID identifying a specific full name in the full name information table on the right, and the reference count is the number of times the field name has been used with that specific full name within the metadata generation apparatus 100. The higher the reference count for a given full name ID, the more often the field name has been used with the corresponding full name. The full name ID with the highest reference count is therefore searched for in order to recommend the full name that, based on experience so far, most likely corresponds to the field name.

Applied to the example above, the field name of the structured data received in step S210 was PRCS_DT, so step S220-2 searches for the highest reference count and selects full name ID 1, whose reference count is 3.

The metadata generation apparatus 100 then looks up the found full name ID, the one with the highest reference count, in the full name information table of the field mapping table, and sets the full name mapped to it as the full name of the metadata field name for the received structured data (S220-3).

Because the full name ID with the highest reference count was found in step S220-2, looking up that full name ID in the full name information table yields the mapped full name, which can be set as the full name of the metadata field name for the structured data.

Applied to the example above, full name ID 1 was found as the full name ID with the highest reference count for the field name PRCS_DT. Full name ID 1 is therefore looked up in the full name information table shown on the right of FIG. 7, and the full name mapped to it, Process Date, can be set as the full name of the metadata field name PRCS_DT for the structured data received in step S210.
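Steps S220-1 to S220-3 can be sketched as a lookup over two small tables. The sketch below is illustrative only: the dictionary layout and the function name `recommend_full_name` are assumptions, and the second mapping entry for PRCS_DT (full name ID 2, "Processing Data") is a hypothetical value added so that the maximum is taken over more than one candidate.

```python
# Field name/full name mapping table: field name -> list of (full name ID, reference count).
# (1, 3) follows the PRCS_DT example in the text; (2, 1) is hypothetical.
field_to_full_name_ids = {"PRCS_DT": [(1, 3), (2, 1)]}

# Full name information table: full name ID -> full name.
# ID 1 follows the text; ID 2 is hypothetical.
full_name_by_id = {1: "Process Date", 2: "Processing Data"}


def recommend_full_name(field_name):
    """Return the most-referenced full name for field_name, or None if it is unmapped."""
    candidates = field_to_full_name_ids.get(field_name)      # S220-1: is the field name known?
    if not candidates:
        return None
    best_id, _ = max(candidates, key=lambda pair: pair[1])   # S220-2: highest reference count
    return full_name_by_id[best_id]                          # S220-3: mapped full name


recommended = recommend_full_name("PRCS_DT")
```

A field name that is absent from the mapping table yields `None` here, which corresponds to the NO branch of step S220-1.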

The setting here also has the sense of a recommendation: without knowing exactly which full name the abbreviated field name stands for, the metadata generation apparatus 100 sets the full name that is most likely to apply, based on the reference count. Since this is only a recommendation, the user need not accept the recommended full name as the full name of the metadata field name for the structured data, and may of course enter a full name directly.

Once the full name of the metadata field name for the received structured data has been set, the metadata generation apparatus 100 increments by 1 the reference count of the full name ID with the highest reference count under that field name in the field name/full name mapping table (S220-4), and increments by 1 the reference count of that same full name ID in the full name information table (S220-5).

Because the full name of the field name was set using the field mapping table, steps S220-4 and S220-5 can be seen as bringing the field mapping table, more specifically the field name/full name mapping table and the full name information table, up to date. Applied to the example above, as shown in FIG. 8, in the field name/full name mapping table on the left the reference count for full name ID 1, the full name ID with the highest reference count found earlier for the field name PRCS_DT, is increased by 1 from 3 to 4, and in the full name information table on the right the reference count for full name ID 1 is increased by 1 from 11 to 12.
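The two counter updates in this example can be written out as follows. This is a minimal sketch using the values from the text (3 to 4, and 11 to 12); the nested-dictionary layout and the function name `record_use` are assumptions for illustration.

```python
# Field name/full name mapping table: field name -> {full name ID: reference count}.
field_refs = {"PRCS_DT": {1: 3}}

# Full name information table, reduced to: full name ID -> reference count.
full_name_refs = {1: 11}


def record_use(field_name, full_name_id):
    """Update both tables after a full name has been set for a field name."""
    field_refs[field_name][full_name_id] += 1   # S220-4: mapping-table count
    full_name_refs[full_name_id] += 1           # S220-5: information-table count


record_use("PRCS_DT", 1)
```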

Steps S220-1 to S220-5 described above concern the case where the field name of the received structured data is contained in the field name/full name mapping table. The case where it is not contained is described below.

Returning to the flowchart of FIG. 4 and following the NO branch of step S220-1, that is, if it is determined in step S220-1 that the field name is not contained, the metadata generation apparatus 100 receives from the user one or more characters of the full name of the metadata field name for the received structured data (S220-2′).

Because the preset field name/full name mapping table does not contain the abbreviation corresponding to the field name, one or more characters are received directly from the user as a clue to the full name, so that a matching full name can be recommended from the full name information table. The description continues using the example in which the characters received from the user are the two characters Cu.

The metadata generation apparatus 100 then predicts the characters or words that follow the one or more characters of the full name of the metadata field name received from the user, and recommends completed full names (S220-3′).

Because the characters received from the user in step S220-2′ are Cu, the metadata generation apparatus 100 can recommend Customer Number and Customer Grade, the full names beginning with Cu among those contained in the full name information table.
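For the Cu example, the recommendation reduces to finding the full names that begin with the typed characters. The sketch below uses a plain case-insensitive prefix match as a stand-in; the description attributes the prediction to a machine learning/deep learning model, which this example does not attempt to reproduce. The function name and the order of the list are assumptions.

```python
# Full names held in the full name information table. All three names appear in
# the text; their order in this list is illustrative.
full_names = ["Process Date", "Customer Number", "Customer Grade"]


def complete_full_name(typed, candidates):
    """Return the candidate full names that start with the typed characters."""
    prefix = typed.casefold()
    return [name for name in candidates if name.casefold().startswith(prefix)]


suggestions = complete_full_name("Cu", full_names)
```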

Having recommended completed full names, the metadata generation apparatus 100 determines whether the user selects one of them (S220-4′). If so, it increments by 1 the reference count for the selected full name in the full name information table (S220-5′), and increments by 1 the reference count for the field name that, among the one or more field names in the field name/full name mapping table, is mapped to the selected full name (S220-6′).

That the user selected a recommended completed full name in step S220-4′ means the full name is contained in the full name information table, which in turn can be taken to mean that a field name mapped to that full name is also contained in the field name/full name mapping table. Incrementing each reference count by 1 therefore brings the field mapping table up to date.

Applied to the example above, if the user selects Customer Grade from the recommended full names Customer Number and Customer Grade, then, as shown in FIG. 9, the reference count for the full name Customer Grade in the full name information table on the right is increased by 1 from 2 to 3, and in the field name/full name mapping table on the left the reference count under the field name CSTMRGRAD for full name ID 5, the full name ID of Customer Grade, is increased by 1 from 2 to 3.
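The Customer Grade example can be traced with the sketch below, which uses the values from the text (full name ID 5, counts going from 2 to 3 in both tables). The table layout and the function name are assumptions. Where several field names map to the selected full name, this sketch increments all of them; the description does not say how that case is handled.

```python
# Full name information table: full name ID -> full name and reference count.
full_name_table = {5: {"full_name": "Customer Grade", "refs": 2}}

# Field name/full name mapping table: field name -> {full name ID: reference count}.
field_table = {"CSTMRGRAD": {5: 2}}


def record_selection(selected_full_name):
    """Update both tables after the user picks a recommended full name."""
    for full_name_id, row in full_name_table.items():
        if row["full_name"] == selected_full_name:
            row["refs"] += 1                         # S220-5′: information-table count
            for id_refs in field_table.values():     # S220-6′: mapped field name(s)
                if full_name_id in id_refs:
                    id_refs[full_name_id] += 1
            return full_name_id
    raise KeyError(selected_full_name)


selected_id = record_selection("Customer Grade")
```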

Steps S220-2′ to S220-6′ described above concern the case where no field name exactly matching the field name of the structured data is contained in the field name/full name mapping table, but the full name for that field name is contained in the full name information table, so that the field name/full name mapping table contains a field name with substantially the same meaning. For example, if the field name of the structured data is CUGRAD, it is not contained in the field name/full name mapping table, so step S220-2′ is performed. If the characters received from the user for the full name of CUGRAD are CU and Customer Grade is selected from the recommended full names, then CSTMRGRAD, which is contained in the field name/full name mapping table, is regarded as having effectively been referenced.

Separately, as shown in FIG. 10, the metadata generation apparatus 100 may also register a new field name by adding a field name not contained in the field name/full name mapping table, for example CUGRAD, entering the matching full name ID, 5, and setting its reference count to 1. This presupposes that one full name can be mapped to a plurality of field names. Conversely, one field name may of course be mapped to a plurality of full names. Whether to increment by 1 the reference count of an existing field name with substantially the same meaning as the newly added one, for example CSTMRGRAD, can be left to the administrator's settings. FIG. 10 shows the case where it is not incremented.
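Registering the new field name CUGRAD might look like the sketch below. The function name, its parameters, and the `bump_equivalent` flag (standing in for the administrator's setting) are hypothetical. The default leaves CSTMRGRAD unchanged, matching the state the text says FIG. 10 shows.

```python
# Field name/full name mapping table: field name -> {full name ID: reference count}.
field_table = {"CSTMRGRAD": {5: 2}}


def add_field_name(new_field, full_name_id, equivalent_field=None, bump_equivalent=False):
    """Register a new field name for an existing full name ID with reference count 1."""
    field_table.setdefault(new_field, {})[full_name_id] = 1
    # Optionally credit an existing field name with substantially the same meaning.
    if bump_equivalent and equivalent_field is not None:
        field_table[equivalent_field][full_name_id] += 1


add_field_name("CUGRAD", 5, equivalent_field="CSTMRGRAD")
```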

Next, the case where the user does not select a recommended completed full name in step S220-4′ is described.

If it is determined in step S220-4′ that the user did not select a recommended completed full name, the metadata generation apparatus 100 receives from the user the entire full name of the metadata field name for the received structured data (S220-5″), and determines whether the full name information table contains the full name received in its entirety from the user (S220-6″).

That the user did not select a recommended completed full name most likely means that the full name information table does not contain the full name the user intends for the field name. Some users, however, may deliberately decline the recommendation even when the table does contain it, so for accuracy the full name received from the user is searched for in the full name information table once more. If it is determined to be contained, the metadata generation apparatus 100 increments by 1 the reference count in the full name information table for the full name received in its entirety from the user (S220-7″). This is the same as described for step S220-5′, so a detailed description is omitted to avoid repetition. Step S220-8″, described below, is then performed.

If, on the other hand, it is determined in step S220-6″ that the full name is not contained, the metadata generation apparatus 100 adds to the full name information table the full name received in its entirety from the user together with a full name ID for it, and sets the reference count to 1 (S220-7′″).

The metadata generation apparatus 100 then determines whether the field name/full name mapping table contains the field name of the structured data that is mapped to the full name received in its entirety from the user (S220-8″). If it is determined not to be contained, the apparatus adds to the field name/full name mapping table the full name received in its entirety from the user, or an abbreviation of it, adds the full name ID that the full name information table holds for that full name, and sets the reference count to 1 (S220-9″). If it is determined to be contained, the apparatus increments by 1, for that field name, the reference count of the full name ID whose reference count was incremented in step S220-7″ (S220-10″).
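One possible reading of steps S220-6″ to S220-10″ is sketched below. Several points are assumptions rather than statements from the description: new full name IDs are taken as the current maximum plus 1, the key added in step S220-9″ is the field name of the structured data, and a known field name with no entry yet for the full name ID starts from 0 before being incremented. The field names ORD_AMT and PRC_DT and the full name Order Amount are hypothetical.

```python
def register_typed_full_name(field_name, full_name, full_name_table, field_table):
    """Update both tables after the user types an entire full name."""
    # S220-6″: look the typed full name up in the full name information table.
    full_name_id = next(
        (fid for fid, row in full_name_table.items() if row["full_name"] == full_name),
        None,
    )
    if full_name_id is not None:
        full_name_table[full_name_id]["refs"] += 1                  # S220-7″
    else:
        full_name_id = max(full_name_table, default=0) + 1          # assumed ID scheme
        full_name_table[full_name_id] = {"full_name": full_name, "refs": 1}  # S220-7′″

    # S220-8″: is the field name already in the field name/full name mapping table?
    if field_name not in field_table:
        field_table[field_name] = {full_name_id: 1}                 # S220-9″
    else:
        id_refs = field_table[field_name]
        id_refs[full_name_id] = id_refs.get(full_name_id, 0) + 1    # S220-10″
    return full_name_id


# Tables seeded with the PRCS_DT values from the text.
full_name_table = {1: {"full_name": "Process Date", "refs": 11}}
field_table = {"PRCS_DT": {1: 3}}

new_id = register_typed_full_name("ORD_AMT", "Order Amount", full_name_table, field_table)
known_id = register_typed_full_name("PRC_DT", "Process Date", full_name_table, field_table)
```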

Step S220-9″ above brings the field name/full name mapping table up to date when the full name is contained in the full name information table but the corresponding field name is not contained in the field name/full name mapping table. Step S220-7′″ applies when the entry is contained in neither the full name information table nor the field name/full name mapping table, in which case both tables are brought up to date. Updating to the latest state is the same as described above, so a detailed description is omitted to avoid repetition.

After any of the processes described above is complete, that is, steps S220-1 to S220-5, steps S220-2′ to S220-6′, or steps S220-5″ to S220-10″, the metadata generation apparatus 100 reconstructs the field mapping table based on the field names and their reference counts and the full names and their reference counts. For the field name/full name mapping table, it reorders the mapping list for each field name according to the updated reference counts.
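Reordering the mapping list for each field name can be sketched as a sort by reference count. Descending order is an assumption made here so that the most-referenced full name ID comes first; the description states only that the list is reordered according to the updated counts. The second entry for PRCS_DT is hypothetical.

```python
def reorder_mappings(field_table):
    """Return a copy of the mapping table with each list sorted by reference count, highest first."""
    return {
        field_name: sorted(pairs, key=lambda pair: pair[1], reverse=True)
        for field_name, pairs in field_table.items()
    }


# field name -> list of (full name ID, reference count); (2, 1) is hypothetical.
table = {"PRCS_DT": [(2, 1), (1, 4)]}
reordered = reorder_mappings(table)
```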

The description so far has covered the case where, in the method for automatically generating metadata for structured data according to the second embodiment of the present invention, the metadata generation rules include the full name setting and recommendation rules for data field names. According to the present invention, even when structured data contains field names made up of abbreviations that differ from one dataset to another, the metadata generation apparatus 100 automatically sets the most suitable full name using the field mapping table, without specialist personnel having to check and process each one individually. Unnecessary expenditure of time and cost can thus be avoided, and the data preprocessing can be carried out easily and conveniently. In addition, when the field mapping table lacks a field name, its full name, or both, the missing entry is added and the field mapping table is updated promptly, which can improve the accuracy of subsequent metadata generation. Furthermore, as the field mapping table is updated, the full name setting and recommendation rules for data field names and the generated metadata can be learned continuously by the machine learning/deep learning model, so that the performance of the machine learning/deep learning model and the full name setting and recommendation performance can improve steadily as the metadata generation apparatus 100 is used.

Although not described separately, for the metadata generation rules to include the full name setting and recommendation rules for data field names in the method for automatically generating metadata for structured data according to the second embodiment of the present invention described above, the artificial intelligence model needs to be a machine learning/deep learning based NLP model that has been trained in advance on natural language sentences.

In addition, when the metadata generation rules include the description recommendation rule for datasets and fields, step S220 may include, as shown in FIG. 11, a step (S220-Ⅰ) in which the metadata generation apparatus 100 using a machine learning/deep learning model uses at least one of the name of the dataset and the full name of a field to recommend the content to be entered in the field description item of the full name information table included in the field mapping table for the description recommendation rule for datasets and fields. When the metadata generation rules include the hashtag generation rule for datasets, step S220 may include, as shown in FIG. 12, a step (S220-Ⅰ′) in which the metadata generation apparatus 100 using a machine learning/deep learning model uses at least one of the name and description of the dataset and the full names of the fields constituting the dataset to recommend one or more hashtags for the dataset.
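The inputs and output of the hashtag recommendation step can be illustrated with a small Python sketch. The specification leaves the recommendation to the machine learning/deep learning model; the word-frequency heuristic used here is a stand-in chosen only to keep the example self-contained, and the function name `recommend_hashtags`, the stopword list, and the sample dataset are all hypothetical.

```python
import re
from collections import Counter
from typing import Iterable, List

STOPWORDS = {"a", "an", "and", "by", "for", "in", "of", "per", "the", "to"}  # illustrative only


def recommend_hashtags(
    name: str,
    description: str,
    field_full_names: Iterable[str],
    limit: int = 3,
) -> List[str]:
    """Recommend hashtags from the dataset name, its description and its field full names."""
    text = " ".join([name, description, *field_full_names]).lower()
    words = [w for w in re.findall(r"[a-z0-9]+", text) if w not in STOPWORDS]
    counts = Counter(words)
    # Most frequent words first; ties are broken alphabetically so the result is deterministic.
    ranked = sorted(counts, key=lambda w: (-counts[w], w))
    return ["#" + w for w in ranked[:limit]]


tags = recommend_hashtags(
    "customer_order",
    "Monthly order totals per customer",
    ["customer name", "order date", "order amount"],
)
print(tags)  # ['#order', '#customer', '#amount']
```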

The method for automatically generating metadata for structured data according to the second embodiment of the present invention described above can be implemented as a computer program stored in a medium according to a third embodiment of the present invention, which includes all of the same technical features. Although a detailed description is omitted to avoid repetition, all of the technical features applied to the method for automatically generating metadata for structured data according to the second embodiment apply equally to the computer program stored in a medium according to the third embodiment.

Although embodiments of the present invention have been described above with reference to the accompanying drawings, those of ordinary skill in the art to which the present invention pertains will understand that the present invention may be implemented in other specific forms without changing its technical spirit or essential features. The embodiments described above should therefore be understood as illustrative in all respects and not restrictive.

100: Metadata generation apparatus using a machine learning/deep learning model
10: Processor
20: Network interface
30: Memory
40: Storage
41: Computer program
50: Data bus

Claims

1. A method by which a metadata generation apparatus using a machine learning/deep learning model automatically generates metadata for structured data, the method comprising:
(a) receiving structured data;
(b) generating metadata for the structured data by applying a metadata generation rule to the received structured data; and
(c) updating the metadata generation rule by learning the generated metadata with a machine learning/deep learning model,
wherein the metadata generation rule includes at least one of a full name setting and recommendation rule for data field names, a description recommendation rule for datasets and fields, and a hashtag generation rule for datasets, and
wherein, when the metadata generation rule includes the full name setting and recommendation rule for data field names, step (b) includes:
(b-1) determining whether the field name of the received structured data is included in a field name/full name mapping table included in a field mapping table for the full name setting and recommendation rule for data field names, the field name/full name mapping table including one or more field names and one or more (full name ID, reference count) pairs mapped to each of the field names;
(b-2) if the determination in step (b-1) is that the field name is included, searching for the full name ID having the highest reference count for the field name; and
(b-3) looking up the found full name ID having the highest reference count in a full name information table included in the field mapping table, and setting the full name mapped to it as the full name of the metadata field name for the received structured data.

2. (Deleted)

3. The method of claim 1, further comprising, after step (b-3):
(b-4) increasing by 1 the reference count of the full name ID having the highest reference count for the field name of the received structured data in the field name/full name mapping table; and
(b-5) increasing by 1 the reference count of the found full name ID having the highest reference count in the full name information table.

4. The method of claim 1, including:
(b-2') if the determination in step (b-1) is that the field name is not included, receiving from a user one or more characters of the full name of the metadata field name for the received structured data;
(b-3') predicting the character or word that follows the one or more characters of the full name received from the user, and recommending a complete full name; and
(b-4') determining whether the recommended complete full name is selected by the user.

5. The method of claim 4, further comprising:
(b-5') if the determination in step (b-4') is that the user selected the recommended full name, increasing by 1 the reference count of the selected full name in the full name information table; and
(b-6') increasing by 1 the reference count of the field name that matches the selected full name among the one or more field names included in the field name/full name mapping table.

6. The method of claim 4, further comprising:
(b-5″) if the determination in step (b-4') is that the user did not select the recommended full name, receiving from the user the entire full name of the metadata field name for the received structured data;
(b-6″) determining whether the full name information table includes the full name of the metadata field name received in its entirety from the user;
(b-7″) if the determination in step (b-6″) is that it is included, increasing by 1 the reference count, in the full name information table, of the full name of the metadata field name received from the user;
(b-8″) determining whether the field name/full name mapping table includes a mapping between the full name of the metadata field name received from the user and the field name of the structured data; and
(b-9″) if the determination in step (b-8″) is that it is not included, adding to the field name/full name mapping table the full name of the metadata field name received from the user, or the abbreviation thereof, together with the full name ID that the full name information table holds for the full name received in its entirety from the user, and setting the reference count to 1.

7. The method of claim 6, further comprising:
(b-7′″) if the determination in step (b-6″) is that it is not included, adding to the full name information table the full name of the metadata field name received from the user together with a full name ID for it, and setting the reference count to 1.
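
As a reading aid, and not as part of the claims, the table updates recited in claims 6 and 7 can be sketched in Python as follows. The function name `register_user_full_name`, the dictionary-based tables, and the way a new full name ID is assigned are assumptions made for illustration. The sketch leaves an existing field name mapping unchanged, since the claims only state what happens when the mapping is missing.

```python
from typing import Dict


def register_user_full_name(
    field_name: str,
    full_name: str,
    name_to_id: Dict[str, int],          # full name information table: full name -> full name ID
    info_counts: Dict[int, int],         # full name information table: full name ID -> reference count
    mapping: Dict[str, Dict[int, int]],  # mapping table: field name -> {full name ID: reference count}
) -> int:
    """Record a full name typed in full by the user and return its full name ID."""
    full_name_id = name_to_id.get(full_name)
    if full_name_id is None:
        # Claim 7: unknown full name, so add it with a new ID and a reference count of 1.
        full_name_id = max(name_to_id.values(), default=0) + 1  # ID scheme is an assumption
        name_to_id[full_name] = full_name_id
        info_counts[full_name_id] = 1
    else:
        # Claim 6, step (b-7″): known full name, so increase its reference count by 1.
        info_counts[full_name_id] += 1
    # Claim 6, steps (b-8″) and (b-9″): add the field name mapping only if it is missing.
    per_field = mapping.setdefault(field_name, {})
    if full_name_id not in per_field:
        per_field[full_name_id] = 1
    return full_name_id


name_to_id = {"customer name": 1}
info_counts = {1: 6}
mapping = {"cust_nm": {1: 6}}

known_id = register_user_full_name("c_name", "customer name", name_to_id, info_counts, mapping)
new_id = register_user_full_name("ord_dt", "order date", name_to_id, info_counts, mapping)
print(known_id, new_id)  # 1 2
print(mapping)
```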

8. The method of claim 1, wherein, when the metadata generation rule includes the description recommendation rule for datasets and fields, step (b) includes:
(b-I) recommending, by using at least one of the name of the dataset and the full name of a field, the content to be entered in the field description item of the full name information table included in the field mapping table for the description recommendation rule for datasets and fields.

9. The method of claim 1, wherein, when the metadata generation rule includes the hashtag generation rule for datasets, step (b) includes:
(b-I') recommending one or more hashtags for the dataset by using at least one of the name and description of the dataset and the full names of the fields constituting the dataset.

10. A metadata generation apparatus using a machine learning/deep learning model, comprising:
one or more processors;
a network interface;
a memory into which a computer program executed by the processors is loaded; and
a storage that stores large-capacity network data and the computer program,
wherein the computer program, when executed by the one or more processors, performs:
(A) an operation of receiving structured data;
(B) an operation of generating metadata for the structured data by applying a metadata generation rule to the received structured data; and
(C) an operation of updating the metadata generation rule by learning the generated metadata with a machine learning/deep learning model,
wherein the metadata generation rule includes at least one of a full name setting and recommendation rule for data field names, a description recommendation rule for datasets and fields, and a hashtag generation rule for datasets, and
wherein, when the metadata generation rule includes the full name setting and recommendation rule for data field names, operation (B) includes:
(B-1) an operation of determining whether the field name of the received structured data is included in a field name/full name mapping table included in a field mapping table for the full name setting and recommendation rule for data field names, the field name/full name mapping table including one or more field names and one or more (full name ID, reference count) pairs mapped to each of the field names;
(B-2) an operation of searching, if the determination in operation (B-1) is that the field name is included, for the full name ID having the highest reference count for the field name; and
(B-3) an operation of looking up the found full name ID having the highest reference count in a full name information table included in the field mapping table, and setting the full name mapped to it as the full name of the metadata field name for the received structured data.