KR20220037608A

KR20220037608A - SYSTEM AND METHOD FOR CLASSIFICATION OF LIST AND COMPUTER-READABLE RECORDING MEDIUM THEREOF

Info

Publication number: KR20220037608A
Application number: KR1020200120286A
Authority: KR
Inventors: 김지윤; 홍성민; 허철은
Original assignee: 대우조선해양 주식회사
Priority date: 2020-09-18
Filing date: 2020-09-18
Publication date: 2022-03-25

Abstract

According to an embodiment of the present invention, a list classification system receives learning data including names and corresponding attribute information, encodes the learning data into ASCII codes, and applies a machine learning algorithm to the encoded learning data to extract the correlation between each name and its corresponding attribute information as output data. Accurate and fast machine learning can therefore be performed.

Description

List classification system and method, and computer-readable recording medium on which a computer program for executing the method on a computer is recorded

The present invention relates to a list classification system and method, and to a computer-readable recording medium on which a computer program for executing the method on a computer is recorded. More specifically, it relates to a list classification system and method that convert names entered as a list, together with their corresponding attribute information, into ASCII codes and use a machine learning algorithm to form the relationship between each name and its corresponding attribute as output data, and to a computer-readable recording medium on which a computer program for executing the method on a computer is recorded.

An artificial intelligence (AI) system is a computer system that implements human-level intelligence. Unlike existing rule-based smart systems, it is a system in which the machine learns and makes decisions on its own. The more an AI system is used, the higher its recognition rate becomes and the more accurately it understands user preferences, so existing rule-based smart systems are gradually being replaced by deep learning-based AI systems.

Artificial intelligence technology consists of machine learning and element technologies that make use of machine learning. Machine learning is an algorithmic technology that classifies and learns the features of input data by itself. Element technologies use machine learning algorithms such as deep learning to simulate functions of the human brain such as cognition and judgment, and span technical fields such as linguistic understanding, visual understanding, inference/prediction, knowledge representation, and motion control.

The fields in which artificial intelligence technology is applied are as follows. Linguistic understanding is a technology for recognizing, applying, and processing human language and text, and includes natural language processing, machine translation, dialogue systems, question answering, and speech recognition/synthesis. Visual understanding is a technology for recognizing and processing objects in the way human vision does, and includes object recognition, object tracking, image search, person recognition, scene understanding, spatial understanding, and image enhancement. Inference/prediction is a technology for judging information and logically inferring and predicting from it, and includes knowledge/probability-based inference, optimization prediction, preference-based planning, and recommendation. Knowledge representation is a technology for automatically processing human experience into knowledge data, and includes knowledge construction (data generation/classification) and knowledge management (data utilization). Motion control is a technology for controlling the autonomous driving of vehicles and the movement of robots, and includes movement control (navigation, collision, driving) and manipulation control (behavior control).

In general, applying a machine learning algorithm to real-world problems involves learning by trial and error, owing to the nature of the basic methodology of machine learning. Deep learning in particular requires hundreds of thousands of iterations. Because running these in a real physical environment is impossible, the physical environment is instead implemented virtually on a computer and learning is performed through simulation.

Meanwhile, in the ship manufacturing process, the material details for each hull are generally classified into attribute groups according to their function in ship construction and managed accordingly. Because people classify and record them by hand, however, an item may be mistakenly assigned to the wrong attribute group. As a result, errors occur in later processes in which material is placed in the wrong location, and man-hours are wasted correcting them.

Accordingly, the present invention provides a list classification system and method that convert names entered as a list, together with their corresponding attribute information, into ASCII codes and use a machine learning algorithm to form the relationship between each name and its corresponding attribute as output data, and a computer-readable recording medium on which a computer program for executing the method on a computer is recorded.

A list classification method according to an embodiment of the present invention includes: receiving learning data including names and corresponding attribute information; encoding the learning data into ASCII codes; and applying a machine learning algorithm to the encoded learning data to extract the correlation between each name and its corresponding attribute information as output data.

The method may further include: receiving a given name as input; and extracting the attribute information corresponding to the input name using the output data.

The method may also include: receiving a list including given names and corresponding attribute information; and using the output data to detect errors in the list, that is, entries in which a name and its corresponding attribute information are incorrectly matched.

In addition, the step of extracting the output data may include receiving the ASCII code array of the encoded learning data as if it were time-series signal data and extracting the correlation between each name and its corresponding attribute information as output data.

In addition, the names may include the material details for each hull used in the shipbuilding industry, and the corresponding attribute information may include the attribute group to which each material detail belongs.

In addition, the machine learning algorithm may include at least one of a dense KNN (K-Nearest Neighbor) algorithm, a weighted KNN algorithm, an ensemble bagging tree algorithm, and an ensemble subspace KNN algorithm.
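The error-detection step described above (checking each name in a list against the learned name-to-attribute relationship) can be sketched as follows. This is a minimal illustration, not the patented implementation: it stands in a plain 1-nearest-neighbor lookup for the trained model, zero-pads names to a fixed length of 32 (an assumption), and uses a hypothetical mislabeled row. The names and attribute groups are taken from Table 2.

```python
def encode(name, length=32):
    """ASCII codes of `name`, truncated or zero-padded to a fixed length (assumed)."""
    codes = [ord(ch) for ch in name[:length]]
    return codes + [0] * (length - len(codes))


def nearest_label(train, name):
    """Return the attribute group of the training name closest in squared distance."""
    query = encode(name)

    def dist(entry):
        return sum((a - b) ** 2 for a, b in zip(encode(entry[0]), query))

    return min(train, key=dist)[1]


def find_mismatches(train, rows):
    """Return (name, recorded, predicted) for rows whose recorded group differs."""
    flagged = []
    for name, recorded in rows:
        predicted = nearest_label(train, name)
        if predicted != recorded:
            flagged.append((name, recorded, predicted))
    return flagged


# Reference pairs (names and attribute groups as listed in Table 2).
train = [
    ("STANDARD DOOR CUSHION", "AJDC"),
    ("COSMO STANDARD STOPPER SET", "AJDS"),
    ("BOTTOM PLUG SOCKET D-GRADE 35T DSE-D2074S", "FRBP"),
]

# List to be checked; the second row is a hypothetical mislabeled entry.
rows = [
    ("STANDARD DOOR CUSHION", "AJDC"),
    ("BOTTOM PLUG SOCKET D-GRADE 35T", "AJDS"),
]

flagged = find_mismatches(train, rows)
for name, recorded, predicted in flagged:
    print(f"{name}: recorded {recorded}, predicted {predicted}")
```

In practice the lookup would be replaced by whichever trained classifier is used, and a flagged row would be a candidate for manual review rather than an automatic correction.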

A list classification system according to an embodiment of the present invention may include: a database that stores learning data including names and corresponding attribute information; and a server that receives the learning data, encodes the names contained in the learning data into ASCII codes, and applies a machine learning algorithm to the encoded names to extract the correlation between each name and its corresponding attribute information as output data.

In addition, the server may operate to receive a given name as input and to extract the attribute information corresponding to the input name using the output data.

In addition, the server may operate to receive a list including given names and corresponding attribute information and to use the output data to detect errors in the list, that is, entries in which a name and its corresponding attribute information are incorrectly matched.

In addition, the server may operate to receive the ASCII code array of the encoded learning data as if it were time-series signal data and to extract the correlation between each name and its corresponding attribute information as output data.

Meanwhile, a computer-readable recording medium according to an embodiment of the present invention may have recorded on it a computer program for executing the above-described list classification method on a computer.

According to the present invention, ASCII codes are used as a time-series signal that is easy to classify, so machine learning can be performed more easily and the correlation between the names in a list and their corresponding attributes can be readily identified. In addition, more accurate and faster machine learning can be performed using relatively simple machine learning algorithms such as KNN (K-Nearest Neighbor) or ensemble methods, rather than complex approaches such as natural language processing or deep learning.

FIG. 1 is an exemplary diagram showing the configuration of a list classification system according to an embodiment of the present invention.
FIG. 2 is an exemplary diagram showing the configuration of a server according to an embodiment of the present invention.
FIG. 3 is a flowchart showing the procedure of a list classification method according to an embodiment of the present invention.
FIG. 4 is a diagram showing the probability distribution of the learning data over features when material details are expressed as signals converted into ASCII codes without natural language processing.

Hereinafter, embodiments of the present invention are described in detail with reference to the accompanying drawings so that a person of ordinary skill in the art to which the invention pertains can readily practice them. The present invention may be implemented in many different forms and is not limited to the embodiments described here.

Embodiments of the present invention are described in detail below with reference to the drawings.

FIG. 1 is an exemplary diagram showing the configuration of a list classification system according to an embodiment of the present invention.

As shown in FIG. 1, the list classification system 100 may include a plurality of user terminals 110-1, ..., 110-n, a server 120, a database 130, and a network N. According to one embodiment, the database 130 may be implemented in a cloud environment separately from the server 120, but is not limited thereto; the database 130 may instead be provided within the server 120. For example, the user terminals 110-1, ..., 110-n, the server 120, and the database 130 may be connected through the network N so that they can communicate with one another. According to one embodiment, the list classification system 100 may receive, as a list, the material details (names) for each hull and the corresponding attribute group information, and use a machine learning algorithm to extract the correlation between each name and its corresponding attribute.

For example, the proportion of abnormal entries by material category in per-hull field data may be as shown in Table 1.

Table 1

| Material category | Total | Normal | Abnormal | Ratio (%) |
| --- | --- | --- | --- | --- |
| Steel outfitting materials | 27,525 | 25,533 | 1,992 | 7.8 |
| Electrical outfitting materials | 713 | 491 | 222 | 45.2 |
| Piping materials | 29,107 | 24,273 | 4,834 | 19.9 |
| Cabin materials | 608 | 549 | 59 | 10.7 |
| Insulation materials | 187 | 140 | 47 | 33.6 |
| Fabricated items | 1,700 | 1,685 | 15 | 0.9 |
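The ratio column of Table 1 is not defined in the text. A quick check suggests it is the abnormal count divided by the normal count (not by the total), rounded to one decimal place. The sketch below recomputes it from the table's own figures; the function name and the English category labels are illustrative choices.

```python
# Counts copied from Table 1: (total, normal, abnormal, reported ratio in %).
TABLE_1 = {
    "steel outfitting": (27525, 25533, 1992, 7.8),
    "electrical outfitting": (713, 491, 222, 45.2),
    "piping": (29107, 24273, 4834, 19.9),
    "cabin": (608, 549, 59, 10.7),
    "insulation": (187, 140, 47, 33.6),
    "fabricated items": (1700, 1685, 15, 0.9),
}


def abnormal_to_normal_ratio(normal: int, abnormal: int) -> float:
    """Return abnormal/normal as a percentage rounded to one decimal place."""
    return round(100.0 * abnormal / normal, 1)


for category, (total, normal, abnormal, reported) in TABLE_1.items():
    # Each row's total is the sum of its normal and abnormal counts.
    assert total == normal + abnormal
    computed = abnormal_to_normal_ratio(normal, abnormal)
    print(f"{category:22s} computed={computed:5.1f}  reported={reported:5.1f}")
```

Dividing by the total instead would give, for example, about 7.2% rather than 7.8% for the first row, so the abnormal-to-normal reading is the one that matches the reported values.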

The user terminals 110-1, ..., 110-n may receive from users, at a shipyard or similar site, the material details (names) for each hull and information on the corresponding attribute groups, and form a material detail list. The user terminals 110-1, ..., 110-n may also transmit the material detail list, in real time or non-real time, to the server 120 and/or the database 130 through the network N.

The server 120 may extract the material detail lists received from the user terminals 110-1, ..., 110-n as learning data. According to one embodiment, the server 120 may extract, as learning data, the material detail list for the steel outfitting materials of a particular hull as shown in Table 2, but is not limited thereto.

Table 2

|  | A | B |
| --- | --- | --- |
| 1 | STANDARD DOOR CUSHION | AJDC |
| 2 | COSMO STANDARD STOPPER SET | AJDS |
| ... | ... | ... |
| 27524 | BOTTOM PLUG PACKING LEAD 4.0T(+0.2~-0.2) OD63XID55 | FRBP |
| 27525 | BOTTOM PLUG SOCKET D-GRADE 35T DSE-D2074S | FRBP |

For example, the learning data of Table 2 may comprise 27,520 records, obtained by deleting 5 noise records from a total of 27,525, and 219 classification attribute groups. The server 120 may also encode the material details (names) and the corresponding attribute groups contained in the extracted learning data into ASCII codes. An existing encoding approach, which extracts keywords from the material details and attribute groups and uses their index values, was initially attempted, but ASCII-code encoding is used for the following reasons.

First, there is no need to update an index when new words are added. Second, although natural language processing was excluded in order to minimize development resources, the encoding preserves the sequential (time-series) character of language. Third, it is advantageous to apply machine learning to the data attributes of the problem as a time-series signal that is intuitive and easy to classify.
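The first reason can be illustrated with a small sketch. The keyword-index encoder below is a hypothetical stand-in for the index-based approach mentioned above (the patent does not describe its details); it fails on a name containing unseen words, whereas character-level ASCII encoding needs no vocabulary at all.

```python
def build_keyword_index(names):
    """Assign each distinct word an integer index in order of first appearance."""
    index = {}
    for name in names:
        for word in name.split():
            index.setdefault(word, len(index))
    return index


def encode_with_index(name, index):
    """Encode a name word by word; raises KeyError for a word not in the index."""
    return [index[word] for word in name.split()]


def encode_with_ascii(name):
    """Encode a name character by character; no vocabulary is required."""
    return [ord(ch) for ch in name]


known = ["STANDARD DOOR CUSHION", "COSMO STANDARD STOPPER SET"]
index = build_keyword_index(known)

new_name = "BOTTOM PLUG SOCKET"  # contains only words absent from the index
try:
    encode_with_index(new_name, index)
    index_ok = True
except KeyError:
    index_ok = False  # the index would have to be extended before encoding

ascii_codes = encode_with_ascii(new_name)
print(index_ok, ascii_codes[:6])
```

The ASCII form also keeps the left-to-right order of characters, which is what allows the code array to be treated like a sampled signal.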

In addition, as shown in FIG. 4, when the learning data is expressed as a signal converted into ASCII codes without natural language processing, the feature map can be seen to be distributed in a form that is favorable for classification by machine learning.

Table 3 is an example showing the result of encoding material details and their corresponding attribute groups into ASCII codes.

Table 3

|  | A | B | C | D | E | ... | AA | AB | AC | AD | AE |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 1 | 89 | 94 | 65 | 77 | 43 | ... | 54 |  |  |  |  |
| 2 | 47 | 78 | 83 | 74 | 56 | ... | 96 | 87 |  |  |  |
| 3 | 32 | 78 | 45 | 58 | 97 | ... | 64 | 58 | 72 |  |  |
| 4 | 68 | 32 | 79 | 42 | 52 | ... | 23 | 37 | 94 | 45 |  |
| 5 | 45 | 24 | 76 | 85 | 34 | ... | 65 | 87 | 92 | 23 | 38 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
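The encoding step that produces a table like Table 3 can be sketched as follows. Names have different lengths, so the rows are ragged; the sketch right-pads each row with zeros to a common width so that every row has the same number of features. The padding value and the dropping of non-ASCII characters are assumptions, since the text does not say how variable-length names are handled.

```python
def encode_name(name: str, length: int, pad: int = 0) -> list[int]:
    """Convert a name to ASCII codes, truncated or right-padded to `length`."""
    codes = [ord(ch) for ch in name if ord(ch) < 128]  # drop non-ASCII characters
    codes = codes[:length]
    return codes + [pad] * (length - len(codes))


# Two names from Table 2; the matrix width is the length of the longest name.
names = ["STANDARD DOOR CUSHION", "COSMO STANDARD STOPPER SET"]
width = max(len(n) for n in names)
matrix = [encode_name(n, width) for n in names]

for row in matrix:
    print(row)
```

Each row can then be read left to right like a short sampled signal, which is the sense in which the code array is treated as time-series data.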

In addition, the server 120 may apply a machine learning algorithm to the encoded learning data to extract the correlation between the material details (names) and their corresponding attribute information as output data. According to one embodiment, the server 120 may receive the ASCII code array of the encoded learning data as if it were time-series signal data and extract the correlation between the material details (names) and their corresponding attribute information as output data, but is not limited thereto. Table 4 shows, for each machine learning algorithm used, the training time required to complete learning and the prediction accuracy.

Table 4

| Algorithm | Training time | Prediction accuracy |
| --- | --- | --- |
| Dense KNN | 20 min (1,202 s) | 96.3% |
| Weighted KNN | 21 min (1,265 s) | 96.1% |
| Ensemble (bagging tree) | 27 min (1,649 s) | 95.9% |
| Ensemble (subspace KNN) | 28.6 min (1,717 s) | 95.9% |

The results in Table 4 confirm that, among practical field data, there are many problems for which a machine learning model with a prediction accuracy of 95% or higher can be developed through data encoding alone, without deep learning.

The database 130 may store various data for implementing the list classification environment. The data stored in the database 130 is data acquired, processed, or used by at least one component of the user terminals 110-1, ..., 110-n and the server 120, and may include software (for example, a program). The database 130 may include volatile and/or non-volatile memory. The database 130 may also be implemented in a cloud environment, which addresses the limit that the capacity of the server 120 would otherwise impose as data accumulates; however, the database 130 is not limited thereto and may include a variety of highly available, highly scalable systems based on cloud infrastructure and managed services. As one embodiment, the database 130 may map the material details for each hull to information on the corresponding attribute groups and store the result in the form of a mapping table, but is not limited thereto.

In the present invention, a program is software stored in the database 130 and may include an operating system for controlling the resources of the user terminals 110-1, ..., 110-n and the server 120, applications, and/or middleware that provides various functions to the applications so that they can use the resources of the user terminals 110-1, ..., 110-n and the server 120.

The network N may carry wireless or wired communication among the user terminals 110-1, ..., 110-n, the server 120, and the database 130. For example, the network N may perform wireless communication using a scheme such as LTE (long-term evolution), LTE-A (LTE Advanced), CDMA (code division multiple access), WCDMA (wideband CDMA), WiBro (Wireless BroadBand), WiFi (wireless fidelity), Bluetooth, NFC (near field communication), GPS (Global Positioning System), or GNSS (global navigation satellite system). The network N may also perform wired communication using a scheme such as USB (universal serial bus), HDMI (high definition multimedia interface), RS-122 (recommended standard 122), or POTS (plain old telephone service).

In the present invention, artificial intelligence (AI) refers to technology that imitates human learning, reasoning, and perceptual abilities and implements them on a computer, and may include concepts such as machine learning and symbolic logic. Machine learning is an algorithmic technology that classifies or learns the features of input data by itself. Using machine learning algorithms, artificial intelligence technology can analyze input data, learn from the results of that analysis, and make judgments or predictions based on what it has learned. Technologies that use machine learning algorithms to simulate functions of the human brain such as cognition and judgment can also be understood as falling within the scope of artificial intelligence. Examples include technical fields such as linguistic understanding, visual understanding, inference/prediction, knowledge representation, and motion control.

Machine learning may refer to the process of training a neural network model using experience gained from processing data. Through machine learning, computer software can improve its own data processing capability. A neural network model is built by modeling the correlations between data, and those correlations may be expressed by a plurality of parameters. The neural network model extracts and analyzes features from the given data to derive correlations between the data, and machine learning can be described as repeating this process to optimize the parameters of the model. For example, for data given as input/output pairs, a neural network model may learn the mapping (correlation) between input and output. Alternatively, even when only input data is given, the neural network model may derive regularities in the data and learn the relationships among them.

An artificial intelligence learning model, or neural network model, may be designed to implement the structure of the human brain on a computer and may include a plurality of weighted network nodes that simulate the neurons of a human neural network. The network nodes may be connected to one another in a way that simulates the synaptic activity of neurons, which send and receive signals through synapses. In an artificial intelligence learning model, the network nodes may be located in layers of different depths and exchange data according to convolutional connection relationships. The artificial intelligence learning model may be, for example, an artificial neural network model or a convolutional neural network (CNN) model. As one embodiment, the artificial intelligence learning model may be trained according to methods such as supervised learning, unsupervised learning, reinforcement learning, or ensemble learning. Machine learning algorithms that may be used include the K-Nearest Neighbor (KNN) algorithm, decision trees, Bayesian networks, support vector machines, artificial neural networks, AdaBoost, perceptrons, genetic programming, and clustering.

Among these, the KNN algorithm is a type of instance-based learning, or lazy learning, in which the function is approximated only locally and all computation is deferred until function evaluation. Because the KNN algorithm relies on distance for classification, normalizing the training data can greatly improve accuracy. A useful technique for both classification and regression is to assign weights to the contributions of the neighbors so that nearer neighbors contribute more to the average than more distant ones. For example, a common weighting scheme gives each neighbor a weight of 1/d, where d is the distance to the neighbor. The neighbors are taken from a set of objects for which the class (for KNN classification) or the object property value (for KNN regression) is known. This set can be thought of as the training set for the algorithm, although no explicit training step is required. A characteristic of the KNN algorithm is that it is sensitive to the local structure of the data.
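The 1/d weighting scheme described above can be sketched in a few lines. This is a minimal pure-Python illustration using Euclidean distance and toy two-dimensional points; the function names, the small `eps` term that guards against division by zero, and the sample data are all assumptions made for the example.

```python
import math
from collections import defaultdict


def euclidean(a, b):
    """Euclidean distance between two equal-length numeric vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def weighted_knn_predict(train_x, train_y, query, k=3, eps=1e-9):
    """Predict a label by 1/d-weighted voting among the k nearest neighbors."""
    nearest = sorted(
        ((euclidean(x, query), y) for x, y in zip(train_x, train_y)),
        key=lambda pair: pair[0],
    )[:k]
    votes = defaultdict(float)
    for dist, label in nearest:
        votes[label] += 1.0 / (dist + eps)  # eps avoids division by zero
    return max(votes, key=votes.get)


# Toy data: two points of class "A" near the origin, three of class "B" near (1, 1).
train_x = [[0.0, 0.0], [0.1, 0.0], [1.0, 1.0], [1.1, 1.0], [0.9, 1.0]]
train_y = ["A", "A", "B", "B", "B"]

# With k=5 an unweighted majority vote would pick "B" (3 votes to 2),
# but the two nearby "A" points outweigh the three distant "B" points.
print(weighted_knn_predict(train_x, train_y, [0.3, 0.3], k=5))
```

Because the prediction depends directly on distances, features on very different scales would normally be normalized first, as the paragraph above notes.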

A CNN is a type of multilayer perceptron designed to require minimal preprocessing. A CNN consists of one or more convolutional layers with ordinary artificial neural network layers on top, and additionally makes use of weights and pooling layers. Thanks to this structure, a CNN can take full advantage of input data with a two-dimensional structure. Compared with other deep learning architectures, CNNs perform well in both image and speech applications. CNNs can also be trained with standard backpropagation. CNNs tend to be easier to train than other feed-forward neural network techniques and have the advantage of using fewer parameters.

Convolutional networks are neural networks that contain sets of nodes with tied parameters. Growth in the size of available training data and in the availability of computing power, combined with algorithmic advances such as piecewise linear units and dropout training, has greatly improved many computer vision tasks. With very large data sets, such as those available for many tasks today, overfitting is not a major concern, and increasing the size of the network improves test accuracy. The optimal use of computing resources then becomes the limiting factor. To this end, a distributed, scalable implementation of deep neural networks may be used.

FIG. 2 is an exemplary diagram showing the configuration of a server according to an embodiment of the present invention.

As shown in FIG. 2, the server 120 may include a receiver 121, a processor 122, a transmitter 123, a system bus 124, a digital packet 125, and a database 130. In one embodiment, the receiver 121, the processor 122, the transmitter 123, the digital packet 125, and the database 130 may be communicably connected to one another through the system bus 124. At least one of these components may be omitted from the server 120, or other components may be added to it. Additionally or alternatively, some of the components may be implemented in integrated form, or as a single entity or a plurality of entities.

The receiver 121 may receive, from a plurality of user terminals 110-1, ..., 110-n through a network N, a material detail list containing material details (names) and information on the corresponding attribute groups, in the form of a digital packet 125, and may forward it to the processor 122 and the database 130.

The processor 122 may operate to encode the received material detail list into ASCII codes for use as learning data. According to one embodiment, the processor 122 may receive the ASCII code array of the encoded learning data as input in the manner of time-series signal data, but is not limited thereto.
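The encoding step can be sketched as below. The description does not specify a sequence length, a padding value, or how characters outside the ASCII range are treated, so the fixed length, the zero padding, and the mapping of non-ASCII characters to the padding value are assumptions made only so that every name yields an array of equal length.

```python
def encode_ascii(name, length=16, pad=0):
    """Map a name to a fixed-length list of ASCII code values.

    Assumptions (not stated in the source): names are truncated to `length`,
    short names are right-padded with `pad`, and non-ASCII characters are
    replaced by `pad`.
    """
    codes = [ord(ch) if ord(ch) < 128 else pad for ch in name[:length]]
    return codes + [pad] * (length - len(codes))

# Hypothetical material name.
encoded = encode_ascii("PIPE 50A", length=10)
# encoded == [80, 73, 80, 69, 32, 53, 48, 65, 0, 0]
```

Read from left to right, the resulting array can be treated like a short sampled signal, one value per character position, which is the sense in which the text compares it to time-series data.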

In addition, the processor 122 may operate to apply a machine learning algorithm to the encoded learning data and thereby extract, as output data, the correlation between the material details (names) and the corresponding attribute group information. According to one embodiment, the processor 122 may receive the ASCII code array of the encoded learning data as input in the manner of time-series signal data and extract, as output data, the correlation between the material details (names) and the corresponding attribute group information, but is not limited thereto. Here, the machine learning algorithm may include, but is not limited to, a dense KNN (K-Nearest Neighbor) algorithm, a weighted KNN algorithm, an ensemble bagging tree algorithm, an ensemble subspace KNN algorithm, and the like. An ensemble machine learning algorithm may refer to an algorithm that performs machine learning by combining various algorithms from among supervised learning algorithms, such as support vector machines, hidden Markov models, regression, neural networks, and naive Bayes classification, and unsupervised learning algorithms, such as k-means clustering, hierarchical clustering, Bayesian networks, the Linde-Buzo-Gray algorithm, principal component analysis, and autoencoders.
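As an illustration of the ensemble subspace KNN option named above, the sketch below combines several 1-nearest-neighbor learners, each restricted to a random subset of character positions, by majority vote. Reading "subspace" as the random subspace method is an interpretation, and the number of learners, the subspace size, the seed, and the toy data are assumptions.

```python
import random
from collections import Counter

def nearest_label(train_x, train_y, query, features):
    """1-NN restricted to the given feature indices (squared Euclidean distance)."""
    def dist(x):
        return sum((x[f] - query[f]) ** 2 for f in features)
    best = min(range(len(train_x)), key=lambda i: dist(train_x[i]))
    return train_y[best]

def subspace_knn_predict(train_x, train_y, query,
                         n_learners=15, subspace_size=2, seed=0):
    """Majority vote over 1-NN learners, each seeing a random subset of features."""
    rng = random.Random(seed)  # fixed seed (an assumption) for repeatable votes
    votes = Counter()
    for _ in range(n_learners):
        features = rng.sample(range(len(query)), subspace_size)
        votes[nearest_label(train_x, train_y, query, features)] += 1
    return votes.most_common(1)[0][0]

# Hypothetical 4-value ASCII code arrays and attribute groups.
train_x = [[80, 73, 80, 69], [80, 73, 80, 70], [86, 65, 76, 86], [86, 65, 76, 87]]
train_y = ["PIPE", "PIPE", "VALVE", "VALVE"]
group = subspace_knn_predict(train_x, train_y, [80, 73, 80, 68])
```

Each learner sees only part of the code array, so a single noisy character position affects only some of the votes.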

The processor 122 may operate to receive a given name from any one of the plurality of user terminals 110-1, ..., 110-n (for example, 110-1) and to extract the attribute information corresponding to the input name by using the output data that represents the correlation between names and their corresponding attribute information.

The processor 122 may also operate to receive a list containing names and corresponding attribute information and, by using the output data that represents the correlation between names and their corresponding attribute information, to detect errors in the list where a name and its attribute information are incorrectly matched.
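The two uses of the output data described above, looking up the attribute for a new name and flagging mismatched rows in a list, can be sketched together. Here the "output data" is represented simply as a stored set of (name, attribute) pairs searched with a 1-nearest-neighbor rule; that representation, the function names, the fixed length of 12, and the sample data are all assumptions for illustration.

```python
def encode_ascii(name, length=12):
    """Fixed-length ASCII code array; non-ASCII characters dropped, short names zero-padded."""
    codes = [ord(ch) for ch in name if ord(ch) < 128][:length]
    return codes + [0] * (length - len(codes))

def predict_attribute(name, learned_pairs, length=12):
    """Return the attribute group of the stored name closest to `name`."""
    query = encode_ascii(name, length)
    def distance(pair):
        stored = encode_ascii(pair[0], length)
        return sum((a - b) ** 2 for a, b in zip(stored, query))
    return min(learned_pairs, key=distance)[1]

def find_mismatches(rows, learned_pairs):
    """Return (name, listed, predicted) for rows whose listed attribute disagrees."""
    flagged = []
    for name, listed in rows:
        predicted = predict_attribute(name, learned_pairs)
        if predicted != listed:
            flagged.append((name, listed, predicted))
    return flagged

# Hypothetical learned pairs and incoming list.
learned_pairs = [
    ("PIPE 50A", "PIPING"),
    ("PIPE 80A", "PIPING"),
    ("VALVE GATE", "VALVE"),
    ("CABLE 3C", "ELECTRIC"),
]
attribute = predict_attribute("PIPE 65A", learned_pairs)
flagged = find_mismatches(
    [("PIPE 65A", "PIPING"), ("VALVE GLOBE", "ELECTRIC")], learned_pairs
)
```

A flagged row is only a candidate error: with zero padding, names of different lengths can be far apart even when they are related, so a reviewer or a stricter threshold would normally confirm the result.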

The transmitter 123 may transmit, in the form of a digital packet 125 formed by the processor 122, the attribute information corresponding to the input name and the errors in the list where a name and its attribute information are incorrectly matched, through the network N to the database 130 and/or the plurality of user terminals 110-1, ..., 110-n.

FIG. 3 is a flowchart showing the procedure of a list classification method according to an embodiment of the present invention. Although the process steps, method steps, algorithms, and the like are described in sequential order in the flowchart of FIG. 3, such processes, methods, and algorithms may be configured to operate in any suitable order. In other words, the steps of the processes, methods, and algorithms described in the various embodiments of the present invention need not be performed in the order described. Also, although some steps are described as being performed non-simultaneously, in other embodiments such steps may be performed simultaneously. Further, the illustration of a process in a drawing does not mean that the illustrated process excludes other changes and modifications, does not mean that the illustrated process or any of its steps is essential to one or more of the various embodiments of the present invention, and does not mean that the illustrated process is preferred.

As shown in FIG. 3, in step S310, learning data including names and corresponding attribute information is received. For example, referring to FIGS. 1 and 2, the plurality of user terminals 110-1, ..., 110-n of the list classification system 100 may receive, as learning data, a material detail list that includes the material details for each ship and the corresponding attribute group information, but this is not limiting.

In step S320, the learning data is encoded into ASCII codes. For example, referring to FIGS. 1 and 2, the server 120 of the list classification system 100 may encode the names and the corresponding attribute information included in the learning data into ASCII codes.

In step S330, a machine learning algorithm is applied to the encoded learning data, and the correlation between the names and the corresponding attribute information is extracted as output data. For example, referring to FIGS. 1 and 2, the server 120 of the list classification system 100 may apply a machine learning algorithm to the encoded names and corresponding attribute information to extract, as output data, the correlation between the names and the corresponding attribute information. According to one embodiment, the server 120 may receive the ASCII code array of the encoded learning data as input in the manner of time-series signal data and extract, as output data, the correlation between the material details (names) and the corresponding attribute group information, but is not limited thereto.

With the list classification method and system according to an embodiment of the present invention, machine learning can proceed by taking the array of ASCII codes as input in the manner of time-series signal data, so that information about the sequential continuity of language can be learned without a complex natural language processing algorithm. When material details are expressed as a time-series signal of ASCII code values, the result can be a signal that is intuitively easy to classify, which is advantageous for machine learning. In addition, the KNN and ensemble algorithms, which can be regarded as basic machine learning algorithms, can achieve a high level of prediction accuracy without any particular feature engineering.

The present invention has been described above with reference to the embodiments shown in the drawings. However, the present invention is not limited to these embodiments, and a person of ordinary skill in the art to which the present invention pertains may devise various modifications or other embodiments within a scope equivalent to the present invention. Accordingly, the true scope of protection of the present invention should be determined by the claims that follow.

100: list classification system 110-1, ..., 110-n: user terminals
120: server 130: database
121: receiver 122: processor
123: transmitter 124: system bus
125: digital packet N: network

Claims

A list classification method comprising:
receiving learning data including a name and corresponding attribute information;
encoding the learning data into ASCII codes; and
extracting, as output data, a correlation between the name and the corresponding attribute information by applying a machine learning algorithm to the encoded learning data.

The method of claim 1, further comprising:
receiving a predetermined name as input; and
extracting attribute information corresponding to the input name by using the output data.

The method of claim 1, further comprising:
receiving a list including a predetermined name and corresponding attribute information; and
detecting, by using the output data, an error in the list in which the predetermined name and the corresponding attribute information are incorrectly matched.

The method of claim 1, wherein extracting the output data comprises:
receiving the ASCII code array of the encoded learning data as input in the form of time-series signal data and extracting the correlation between the name and the corresponding attribute information as the output data.

The method of claim 1, wherein:
the name includes material details for each ship used in the shipbuilding industry; and
the corresponding attribute information includes an attribute group corresponding to the material details.

The method of claim 1, wherein the machine learning algorithm includes any one of a dense KNN (K-Nearest Neighbor) algorithm, a weighted KNN algorithm, an ensemble bagging tree algorithm, and an ensemble subspace KNN algorithm.

A list classification system comprising:
a database that stores learning data including names and corresponding attribute information; and
a server that receives the learning data, encodes a name included in the learning data into ASCII codes, and extracts, as output data, a correlation between the name and the corresponding attribute information by applying a machine learning algorithm to the encoded learning data.

The system of claim 7, wherein the server operates to receive a predetermined name as input and to extract attribute information corresponding to the input name by using the output data.

The system of claim 7, wherein the server operates to receive a list including a predetermined name and corresponding attribute information and to detect, by using the output data, an error in the list in which the predetermined name and the corresponding attribute information are incorrectly matched.

The system of claim 7, wherein the server operates to receive the ASCII code array of the encoded learning data as input in the form of time-series signal data and to extract the correlation between the name and the corresponding attribute information as the output data.

The system of claim 7, wherein:
the name includes material details for each ship used in the shipbuilding industry; and
the corresponding attribute information includes an attribute group corresponding to the material details.

The system of claim 7, wherein the machine learning algorithm includes any one of a dense KNN (K-Nearest Neighbor) algorithm, a weighted KNN algorithm, an ensemble bagging tree algorithm, and an ensemble subspace KNN algorithm.

A computer-readable recording medium on which is recorded a computer program for executing, on a computer, the list classification method according to any one of claims 1 to 6.