KR102307132B1

KR102307132B1 - Machine learning automation platform device for decision support in plant engineering

Info

Publication number: KR102307132B1
Application number: KR1020200022837A
Authority: KR
Inventors: 서정호; 이현경
Original assignee: (주)위세아이텍
Priority date: 2020-02-25
Filing date: 2020-02-25
Publication date: 2021-09-30
Also published as: KR20210108074A

Abstract

The invention relates to a data analysis device that uses full-cycle engineering data. The device may include a data collection unit that collects project data generated over the full cycle of the engineering industry, a data input unit that receives new project data, a machine learning unit that performs data analysis by applying the new project data received by the data input unit to a learning module trained with machine learning, and a visualization unit that visualizes and provides the analysis results of the machine learning unit.

Description

Machine learning automation platform device for decision support for each stage of plant engineering work

The present application relates to a data management and data analysis device that uses full-cycle engineering data.

In the engineering industry, a wide range of data is generated at every stage of the business cycle, from planning through design, procurement, construction, and operation to predictive maintenance of plant facilities in O&M. This data is used to calculate indicators such as MH (man-hours) and cost in order to minimize the risks that may arise in a project.

However, the engineering industry currently lacks a systematic means of collecting and using data. As a result, calculating indicators from the data can introduce problems such as human error, and it takes a long time to grasp the status of a project midway through. These problems can also lead to large financial losses in carrying out an engineering project.

Research is therefore needed on engineering data management and data analysis devices that analyze risk factors through efficient management of the data generated at each stage of the engineering industry, so that users can respond in advance and their decision-making is supported.

The background art of the present application is disclosed in Korean Registered Patent Publication No. 10-1229274.

The present application is intended to solve the problems of the prior art described above. One object is to provide a data management and data analysis device using full-cycle engineering data that accepts project data generated over the full cycle of the engineering industry, supports searching for the project data a user wants, and performs ITB analysis, design cost prediction, design error analysis, design change analysis, and predictive maintenance of plant facilities using an engineering machine learning platform.

Another object of the present application is to provide a data management and data analysis device using full-cycle engineering data that can visualize the analysis results and present them in the form of a dashboard.

Another object of the present application is to provide a data management and data analysis device using full-cycle engineering data that takes extensibility into account by allowing the analysis results to be downloaded in a specific standardized format and by providing an API so that other applications can use the results.

Another object of the present application is to provide a data management and data analysis device using full-cycle engineering data that, when a user enters a new project, compares it with previously entered projects and outputs the most similar ones, and that visualizes and provides statistical analysis results for the retrieved projects.

However, the technical problems to be solved by the embodiments of the present application are not limited to those described above, and other technical problems may exist.

As a technical means of achieving the above objects, a data analysis device using full-cycle engineering data according to an embodiment of the present application may include a data collection unit that collects project data generated over the full cycle of the engineering industry, a data input unit that receives new project data, a machine learning unit that performs data analysis by applying the new project data received by the data input unit to a learning module trained with machine learning, and a visualization unit that visualizes and provides the analysis results of the machine learning unit.

In addition, the data input unit may receive user input information that selects, from among a plurality of analysis step items of the engineering industry, one analysis step item on which analysis is to be performed.

In addition, the plurality of analysis step items of the engineering industry may include at least one of a Bidding analysis item, an Engineering analysis item, a Construction & Commissioning analysis item, and an O&M analysis item.

In addition, the learning module trained with machine learning may include at least one of a regression model, a classification model, a clustering model, and a deep learning model.

In addition, for the ITB analysis item among the plurality of analysis step items of the engineering industry, the machine learning unit may perform toxic clause analysis by applying the new project data to a learning module trained with machine learning to detect toxic clauses contained in ITB document data.

In addition, for the design cost prediction item among the plurality of analysis step items of the engineering industry, the machine learning unit may perform MH (man-hour) prediction analysis by applying the new project data to the learning module trained with machine learning.

In addition, for the design error analysis item among the plurality of analysis step items of the engineering industry, the machine learning unit may perform project delay-days analysis by applying the new project data to the learning module trained with machine learning.

In addition, for the design change analysis item among the plurality of analysis step items of the engineering industry, the machine learning unit may perform change amount analysis by applying the new project data to the learning module trained with machine learning.

In addition, for the predictive maintenance analysis item among the plurality of analysis step items of the engineering industry, the machine learning unit may perform maintenance item detection analysis by applying the new project data to the learning module trained with machine learning.

In addition, for each of the plurality of analysis step items of the engineering industry, the machine learning unit may collect the learning results obtained by applying a plurality of learning models, and may provide the result of the learning model showing the highest accuracy as the analysis result for that analysis step item.
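The "highest accuracy wins" selection described above can be sketched as follows. This is a minimal illustration, not the patented implementation: the candidate models are simple threshold rules standing in for trained models, and the names `accuracy`, `select_best_model`, `candidates`, and `validation_set` are assumptions introduced for this example.

```python
from typing import Callable, Dict, List, Sequence, Tuple

# A "model" here is any callable mapping a feature vector to a class label.
Model = Callable[[Sequence[float]], int]
Sample = Tuple[Sequence[float], int]


def accuracy(model: Model, samples: List[Sample]) -> float:
    """Fraction of validation samples the model labels correctly."""
    if not samples:
        raise ValueError("validation set is empty")
    hits = sum(1 for features, label in samples if model(features) == label)
    return hits / len(samples)


def select_best_model(models: Dict[str, Model],
                      samples: List[Sample]) -> Tuple[str, float]:
    """Score every candidate and return (name, accuracy) of the best one."""
    scores = {name: accuracy(model, samples) for name, model in models.items()}
    best = max(scores, key=scores.get)
    return best, scores[best]


# Hypothetical candidates: threshold rules in place of trained models.
candidates: Dict[str, Model] = {
    "threshold_0.5": lambda x: int(x[0] > 0.5),
    "threshold_0.8": lambda x: int(x[0] > 0.8),
    "always_zero": lambda x: 0,
}
# Hypothetical labelled validation data (feature vector, true label).
validation_set: List[Sample] = [
    ([0.2], 0), ([0.6], 1), ([0.7], 1), ([0.9], 1), ([0.4], 0),
]

best_name, best_accuracy = select_best_model(candidates, validation_set)
print(best_name, best_accuracy)
```

In practice the candidates would be the regression, classification, clustering, or deep learning models mentioned in the text, and the metric would be chosen per analysis item; accuracy is used here only because the text names it.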

In addition, the data input unit may receive user input information that selects a plurality of search items for the project to be searched from among the plurality of project data collected by the data collection unit, and the machine learning unit may perform a similarity analysis between the user input information and the plurality of project data and provide a list of at least one project based on the similarity analysis result.
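One way to picture this similarity search is the sketch below. The text does not specify a similarity measure, so a simple matching ratio over the user's selected search items is assumed here; the field names (`plant_type`, `region`, `contract_type`) and project names are hypothetical.

```python
from typing import Dict, List, Tuple

Project = Dict[str, str]


def similarity(query: Dict[str, str], project: Project) -> float:
    """Share of the user's selected search items that the project matches."""
    if not query:
        return 0.0
    matches = sum(1 for key, value in query.items() if project.get(key) == value)
    return matches / len(query)


def find_similar(query: Dict[str, str], projects: List[Project],
                 top_k: int = 3) -> List[Tuple[str, float]]:
    """Rank stored projects by similarity to the query and keep the top_k."""
    scored = [(p["name"], similarity(query, p)) for p in projects]
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:top_k]


# Hypothetical stored project data.
stored_projects: List[Project] = [
    {"name": "P-001", "plant_type": "offshore", "region": "Middle East", "contract_type": "EPC"},
    {"name": "P-002", "plant_type": "onshore", "region": "Middle East", "contract_type": "EPC"},
    {"name": "P-003", "plant_type": "offshore", "region": "Asia", "contract_type": "FEED"},
]
# Hypothetical search items selected by the user.
user_query = {"plant_type": "offshore", "region": "Middle East", "contract_type": "EPC"}

result = find_similar(user_query, stored_projects, top_k=2)
print(result)
```

A production system would more likely embed numeric and text attributes and use a distance such as cosine similarity; the matching ratio is used here only to keep the example self-contained.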

According to an embodiment of the present application, a data analysis method using full-cycle engineering data may include collecting project data generated over the full cycle of the engineering industry, receiving new project data, performing data analysis by applying the received new project data to a learning module trained with machine learning, and visualizing and providing the analysis results.

The means of solving the problems described above are merely exemplary and should not be construed as limiting the present application. In addition to the exemplary embodiments described above, further embodiments may exist in the drawings and in the detailed description of the invention.

According to the means described above, risk can be minimized by having the machine learning unit analyze and visualize the data generated over the full cycle of the engineering industry.

According to the means described above, risk factors can be analyzed through efficient management of the data generated at each stage of the engineering industry, so that users can respond in advance and their decision-making is supported.

However, the effects obtainable from the present application are not limited to those described above, and other effects may exist.

FIG. 1 is a schematic block diagram of a data analysis device using full-cycle engineering data according to an embodiment of the present application.
FIG. 2 schematically shows a screen of the data analysis device for selecting among a plurality of analysis step items according to an embodiment of the present application.
FIG. 3 schematically shows an analysis result for an ITB document produced by the data analysis device according to an embodiment of the present application.
FIG. 4 schematically shows a design cost prediction result of the data analysis device according to an embodiment of the present application.
FIG. 5 schematically shows a design error analysis result of the data analysis device according to an embodiment of the present application.
FIG. 6 schematically shows a design change analysis result of the data analysis device according to an embodiment of the present application.
FIG. 7 schematically shows a predictive maintenance analysis result of the data analysis device according to an embodiment of the present application.
FIG. 8 schematically shows a screen of the data analysis device that provides an API according to an embodiment of the present application.
FIG. 9 schematically shows a data search screen and search results of the data analysis device according to an embodiment of the present application.
FIG. 10 schematically shows statistical visualization results for similar projects produced by the data analysis device according to an embodiment of the present application.
FIG. 11 is an operational flowchart of a data analysis method using full-cycle engineering data according to an embodiment of the present application.

Hereinafter, embodiments of the present application are described in detail with reference to the accompanying drawings so that those of ordinary skill in the art to which the present application pertains can readily practice them. The present application may, however, be embodied in many different forms and is not limited to the embodiments described here. To describe the present application clearly, parts unrelated to the description are omitted from the drawings, and similar parts are given similar reference numerals throughout the specification.

Throughout this specification, when a part is said to be "connected" to another part, this includes not only the case where it is "directly connected" but also the case where it is "electrically connected" or "indirectly connected" with another element in between.

Throughout this specification, when a member is said to be located "on", "above", "on top of", "under", "below", or "at the bottom of" another member, this includes not only the case where the member is in contact with the other member but also the case where a further member is present between the two.

Throughout this specification, when a part is said to "include" a component, this means that it may further include other components, rather than excluding them, unless specifically stated otherwise.

"Machine learning automation platform device for decision support at each stage of plant engineering work" may be the title of the present application, and the device may also be referred to by other names, such as the data analysis device (as at the end of the claims) or the data analysis device using full-cycle engineering data.

The present application relates to an automated, machine learning-based engineering platform that supports optimal decision-making using the data generated over the full cycle of the engineering industry. It collects the data generated at each stage of the engineering industry and analyzes that data with a single click so that the risks that may arise over the entire machine learning cycle can be minimized, and it relates to a method and system for searching for and making use of projects similar to the input data.

Engineering provides the engineering and technical services needed to plan, design, develop, build, and operate engineered systems by applying scientific and technical expertise in an integrated way. The engineering industry is characterized by turnkey ordering by a small number of clients and by the need for integrated technology development spanning overall project management, design, procurement of equipment and materials, and construction; it is a knowledge-based service industry that combines experiential knowledge with applied technology. Plant engineering, in turn, concerns production facilities and related systems in which a series of mechanical devices are linked to produce, continuously and under normal operating conditions, intermediate goods or final products from raw materials. It is a field that directly affects productivity, performance, and quality over the full life cycle of a plant, from planning, design, and construction to operation and decommissioning.

According to an embodiment of the present application, the data analysis device (10) may collect project data generated over the full cycle of the engineering industry and support searching for the project data a user wants. The data analysis device (10) may also perform ITB analysis, design cost prediction, design error analysis, design change analysis, predictive maintenance, and the like using a machine learning platform and provide the results. The data analysis device (10) may provide the analysis results through a user terminal (not shown). The data analysis device (10) may also visualize the results in the form of a dashboard so that a user (administrator) can review them and make quick, effective decisions.

In addition, the data analysis device (10) may allow the analysis results to be downloaded in a specific standardized format. The data analysis device (10) may also take extensibility into account, for example by providing an API so that other applications can use the results.

In addition, because engineering data continually changes and grows, the data analysis device (10) allows new data to be added separately for each stage, and it can readily analyze the input data with an automated, machine learning-based learning model and provide the results.

According to an embodiment of the present application, the data analysis device (10) may provide a data input menu, a data management menu, a data analysis menu, and the like to a user terminal (not shown). For example, the user terminal (not shown) may download and install an application program provided by the data analysis device (10), and the data input menu, data management menu, and data analysis menu may be provided through the installed application.

The data analysis device (10) may include any type of server, terminal, or device that exchanges data, content, and various communication signals with the user terminal (not shown) over a network and that has data storage and processing functions.

The user terminal (not shown) is a device that interworks with the data analysis device (10) over a network. It may be any type of wireless communication device, for example a smartphone, smart pad, tablet PC, or wearable device, or a PCS (Personal Communication System), GSM (Global System for Mobile communication), PDC (Personal Digital Cellular), PHS (Personal Handyphone System), PDA (Personal Digital Assistant), IMT (International Mobile Telecommunication)-2000, CDMA (Code Division Multiple Access)-2000, W-CDMA (W-Code Division Multiple Access), or WiBro (Wireless Broadband Internet) terminal, or it may be a fixed terminal such as a desktop computer or smart TV.

Examples of networks for sharing information between the data analysis device (10) and the user terminal (not shown) include, but are not limited to, a 3GPP (3rd Generation Partnership Project) network, an LTE (Long Term Evolution) network, a 5G network, a WiMAX (Worldwide Interoperability for Microwave Access) network, the wired and wireless Internet, a LAN (Local Area Network), a wireless LAN (Wireless Local Area Network), a WAN (Wide Area Network), a PAN (Personal Area Network), a Bluetooth network, a Wi-Fi network, an NFC (Near Field Communication) network, a satellite broadcasting network, an analog broadcasting network, and a DMB (Digital Multimedia Broadcasting) network.

FIG. 1 is a schematic block diagram of a data analysis device using full-cycle engineering data according to an embodiment of the present application.

Referring to FIG. 1, the data analysis device (10) may include a data collection unit (11), a data input unit (12), a machine learning unit (13), and a visualization unit (14). However, the configuration of the data analysis device (10) is not limited to this. For example, the data analysis device (10) may include a database (not shown) for storing the plurality of project data collected by the data collection unit (11). The data analysis device (10) may also include a data providing unit (not shown) for providing data analysis results to the user terminal (not shown). The database (not shown) may include a toxic clause dictionary for toxic clause analysis, delay severity classification items for design errors, project cost severity classification items for design changes, a dictionary of key terms for design changes, and the like.
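The four-unit structure of FIG. 1 can be sketched as a small pipeline. This is only an illustrative sketch under stated assumptions: the class names mirror the units (11) to (14), while the method names, the in-memory list standing in for the database, and the placeholder model (120 man-hours per drawing) are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class DataCollectionUnit:  # unit (11): collects and stores project data
    database: List[dict] = field(default_factory=list)  # stands in for the database

    def collect(self, records: List[dict]) -> None:
        self.database.extend(records)


class DataInputUnit:  # unit (12): receives new project data
    def receive(self, new_project: dict) -> dict:
        if "name" not in new_project:
            raise ValueError("new project data needs a 'name'")
        return dict(new_project)


@dataclass
class MachineLearningUnit:  # unit (13): applies a trained module
    model: Callable[[dict], float]

    def analyze(self, project: dict) -> dict:
        return {"project": project["name"], "prediction": self.model(project)}


class VisualizationUnit:  # unit (14): presents the result
    def render(self, result: dict) -> str:
        return f"{result['project']}: predicted value {result['prediction']:.1f}"


@dataclass
class DataAnalysisDevice:  # device (10): wires the four units together
    collector: DataCollectionUnit
    data_input: DataInputUnit
    learner: MachineLearningUnit
    visualizer: VisualizationUnit

    def run(self, new_project: dict) -> str:
        project = self.data_input.receive(new_project)
        self.collector.collect([project])  # keep the new project for later use
        return self.visualizer.render(self.learner.analyze(project))


# Placeholder "trained" model: 120 man-hours per drawing (hypothetical figure).
device = DataAnalysisDevice(
    collector=DataCollectionUnit(),
    data_input=DataInputUnit(),
    learner=MachineLearningUnit(model=lambda p: 120.0 * p["drawings"]),
    visualizer=VisualizationUnit(),
)
report = device.run({"name": "Plant-A", "drawings": 10})
print(report)
```

Keeping each unit behind a narrow interface makes it straightforward to swap the placeholder model for a real trained module or to replace the text rendering with a dashboard.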

According to an embodiment of the present application, the data collection unit (11) may collect project data generated over the full cycle of the engineering industry. For example, the data collection unit (11) may collect contracts, specifications, ITBs, offshore and onshore plant data, ERP (enterprise resource planning) data, PMIS (project management information system) data, commercial data, public data, integrated big data information, Open API data, and the like. The data collection unit (11) may collect the various data generated over the full cycle of the engineering industry. The data collection unit (11) may also collect project data generated over the full cycle of industries in various other fields, not only project data of the engineering industry. The project data may be time series data generated in the engineering industry. The project data may also be data stored in the database (not shown). In addition, the data collection unit (11) may collect project data generated over the full cycle of the engineering industry from an external server over a network. As an example, the external server may be a server of an organization that carries out engineering business.

According to another embodiment of the present application, the data collection unit (11) may manage the data analyzed by the data analysis device (10). In other words, the data collection unit (11) may store the new project data received by the data input unit (12) in the database (not shown). The data collection unit (11) may also store the analysis results for the various data analyzed by the machine learning unit (13) in the database (not shown). The data collection unit (11) may thus not only collect the various data generated in the engineering industry but also store and manage, in the database (not shown), the analysis data obtained by analyzing that data.

According to an embodiment of the present application, the data input unit (12) may receive new project data. The new project data may include documents generated in the engineering industry, such as contracts, specifications, and ITBs. For example, the data input unit (12) may receive new project data from the user terminal (not shown). The data input unit (12) may also receive new project data and data generated at each of a plurality of stages of the engineering industry separately. In addition, the data input unit (12) may receive newly created project data rather than existing engineering project data. In other words, the data input unit (12) may receive data relating to a plurality of projects from a data set stored in the database (not shown). The data input unit (12) may also receive the project data collected by the data collection unit (11).

FIG. 2 schematically shows a screen of the data analysis device for selecting among a plurality of analysis step items according to an embodiment of the present application.

According to an embodiment of the present application, the data input unit (12) may receive user input information that selects, from among a plurality of analysis step items of the engineering industry, one analysis step item on which analysis is to be performed. The plurality of analysis step items of the engineering industry may include at least one of a Bidding analysis item, an Engineering analysis item, a Construction & Commissioning analysis item, and an O&M analysis item. Referring to FIG. 2, the Bidding analysis item may include an ITB analysis item. The Engineering analysis item may include a design cost analysis item, a design error analysis item, and a design change analysis item. The Construction & Commissioning analysis item may include an additional item. The O&M analysis item may include a predictive maintenance analysis item. Each of the plurality of analysis step items of the engineering industry may include an additional item, which may be an item for entering data on a new subject. The data input unit (12) may receive an entire new project comprising a plurality of data, and it may also receive each piece of project data separately. In other words, when the user wishes to perform only the Bidding analysis, the data input unit (12) may receive the data for performing the Bidding analysis based on the user's selection of the additional item included in the Bidding analysis item.

For example, the data input unit 12 may receive, from a user terminal (not shown), user input information selecting at least one of a plurality of analysis items. Referring to FIG. 2, the data input unit 12 may receive from the user terminal (not shown) user input information selecting the design cost analysis item included in the Engineering analysis item. The data input unit 12 may pass the user input information received from the user terminal (not shown) to the machine learning unit 13.

In other words, the visualization unit 14 may provide the plurality of analysis step items of the engineering industry to the user terminal (not shown), the data input unit 12 may receive from the user terminal (not shown) at least one analysis item to be analyzed from among those items, and the machine learning unit 13 may perform learning on the new project data in accordance with the received analysis item.

According to an embodiment of the present application, the machine learning unit 13 may perform data analysis by applying the new project data received through the data input unit 12 to a learning module trained by machine learning. Specifically, the machine learning unit 13 may apply the new data to the trained learning module based on both the new project data received through the data input unit 12 and the user's analysis item selection information.

According to another embodiment of the present application, the machine learning unit 13 may perform data analysis by applying the project data collected by the data collection unit 11 to a learning module trained by machine learning. In other words, the machine learning unit 13 may apply to the trained learning module the existing project data stored in a database (not shown), rather than the data received through the data input unit 12. Through the trained learning module, the machine learning unit 13 may perform learning to detect toxic clauses in ITB document data. In addition, the machine learning unit 13 may perform learning to predict MH (Man Hour) in the design cost analysis, learning to predict delay days in the design error analysis, learning to analyze the change amount in the design change analysis, and learning to detect maintenance items in the predictive maintenance analysis.

In addition, the learning module trained by machine learning may include at least one of a regression model, a classification model, a clustering model, and a deep learning model. The machine learning unit 13 may also include a feature engineering function, which may include data exploration, data transformation, and feature selection.

For example, the machine learning unit 13 may improve model performance by applying hyperparameters for each algorithm (that is, for each learning model). Hyperparameters are values that govern model performance, such as the number of neurons in each layer, the batch size, the learning rate used when updating parameters, and the weight decay. To optimize the hyperparameters, the machine learning unit 13 may set a range of values for each hyperparameter. The machine learning unit 13 may then randomly sample hyperparameter values from the set range (step 1). Next, the machine learning unit 13 may train the model using the sampled hyperparameter values and evaluate its accuracy on validation data (step 2). The validation data may be data for evaluating the suitability of the hyperparameters. The machine learning unit 13 may repeat steps 1 and 2 a preset number of times, judge the resulting accuracy, and reset the hyperparameter range, narrowing it each time. The machine learning unit 13 may then select the hyperparameter values from the narrowed range. This way of improving model performance through hyperparameters is only one embodiment; the present application is not limited to it, and various other embodiments may be applied.
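The sampling-and-narrowing procedure described above can be sketched as follows. This is a minimal illustration, not the device's implementation: the function names, the two hyperparameters, the number of rounds, and the rule of shrinking each range to the span of the best trials are all assumptions, and `validation_accuracy` is a hypothetical stand-in for training a model and scoring it on validation data.

```python
import random


def validation_accuracy(learning_rate, weight_decay):
    # Hypothetical stand-in for "train the model, then score it on validation data".
    # The score peaks at learning_rate=0.01 and weight_decay=0.001.
    return 1.0 / (1.0 + abs(learning_rate - 0.01) * 50 + abs(weight_decay - 0.001) * 200)


def narrow_search(ranges, evaluate, trials=30, rounds=3, keep=5, seed=0):
    """Random search that narrows each hyperparameter range after every round."""
    rng = random.Random(seed)
    best = None
    for _ in range(rounds):
        results = []
        for _ in range(trials):
            # Step 1: sample every hyperparameter uniformly from its current range.
            params = {name: rng.uniform(lo, hi) for name, (lo, hi) in ranges.items()}
            # Step 2: train with the sampled values and evaluate on validation data.
            results.append((evaluate(**params), params))
        results.sort(key=lambda r: r[0], reverse=True)
        if best is None or results[0][0] > best[0]:
            best = results[0]
        # Assumed narrowing rule: shrink each range to the span of the best trials.
        top = [params for _, params in results[:keep]]
        ranges = {
            name: (min(p[name] for p in top), max(p[name] for p in top))
            for name in ranges
        }
    return best, ranges


initial = {"learning_rate": (0.0001, 0.1), "weight_decay": (0.0, 0.01)}
(best_score, best_params), final_ranges = narrow_search(initial, validation_accuracy)
print(best_score, best_params, final_ranges)
```

Because each new range is built from values sampled inside the previous one, the ranges can only shrink or stay the same from round to round.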

FIG. 3 is a diagram schematically showing an analysis result for an ITB document produced by the data analysis device according to an embodiment of the present application.

According to an embodiment of the present application, for the ITB analysis item among the plurality of analysis step items of the engineering industry, the machine learning unit 13 may perform a toxic clause analysis by applying the new project data to a learning module trained by machine learning to detect toxic clauses contained in ITB document data. For example, the ITB analysis detects toxic clauses within an ITB (Invitation To Bid), and the machine learning unit 13 may analyze toxic clauses by applying the ITB document included in the new project data to the learning module. Here, the learning module may include at least one of a regression model, a classification model, a clustering model, and a deep learning model. A toxic clause generally refers to content in a law or official document that subtly restricts what the document originally intends. In the case of a law, for instance, the law has an intended purpose, but wording has been inserted that blocks that purpose in theory or in practice.

By way of example, the machine learning unit 13 may analyze the toxic clauses contained in ITB document data by comparison with a pre-built toxic clause dictionary. The machine learning unit 13 may perform the analysis by matching the received ITB document data against the toxic clause words contained in the pre-stored dictionary. The toxic clause dictionary may contain words preset by the user. For example, the machine learning unit 13 may receive an ITB document from the data input unit 12. To standardize the text for use with the toxic clause dictionary, the machine learning unit 13 may split the terms in the received ITB document into morphemes. The machine learning unit 13 may then compute similarity using an artificial-intelligence-based algorithm in order to compare the words in the separated terms with the toxic clause words in the dictionary. Here, the algorithm may be a Fuzzy Data Matching algorithm, but is not limited thereto. The Fuzzy Data Matching algorithm matches data using a score computed from the edit distance (Levenshtein distance). The algorithm described above is only one embodiment, however; the present application is not limited to it, and various neural network schemes, whether already developed or developed in the future, may be applied.
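The edit-distance matching described above can be illustrated with a short sketch. It assumes the document has already been split into tokens (the morpheme separation step is not shown), and the dictionary entries, the length-normalized similarity score, and the 0.8 threshold are all hypothetical choices made for the example.

```python
def levenshtein(a, b):
    """Edit distance: minimum number of insertions, deletions, and substitutions."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute (or keep)
        prev = curr
    return prev[-1]


def similarity(a, b):
    """Assumed score: 1.0 for identical strings, falling toward 0.0 as the distance grows."""
    longest = max(len(a), len(b))
    return 1.0 if longest == 0 else 1.0 - levenshtein(a, b) / longest


def match_terms(tokens, dictionary, threshold=0.8):
    """Return (token, dictionary_term, score) for tokens close to a dictionary term."""
    hits = []
    for token in tokens:
        best = max(dictionary, key=lambda term: similarity(token.lower(), term.lower()))
        score = similarity(token.lower(), best.lower())
        if score >= threshold:
            hits.append((token, best, score))
    return hits


# Hypothetical dictionary and tokens, for illustration only.
toxic_terms = ["liability", "penalty"]
print(match_terms(["liabilty", "pump"], toxic_terms))
```

With this scoring, the misspelled token `liabilty` is one edit away from `liability` and is matched, while `pump` falls below the threshold and is ignored.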

According to an embodiment of the present application, and referring to FIG. 3(a), the machine learning unit 13 may provide a plurality of toxic clause analysis selection items, grouped by category, to the user terminal (not shown) through the visualization unit 14. Through the ITB analysis item presented on the user terminal (not shown), the user may select the ITB document on which the ITB analysis is to be performed. In FIG. 3(a), for example, the ITB_AA document is selected. The user may select the category to be analyzed from among a plurality of categories. Each category may contain different toxic clause analysis selection items. Referring to FIG. 3(a), the machine learning unit 13 may perform the ITB analysis based on the selection input received from the user terminal (not shown) for a first toxic clause analysis selection item (Fit for purpose), a second toxic clause analysis selection item (Open-ended cluse), a third toxic clause analysis selection item (LD execution procedure), a fourth toxic clause analysis selection item (Payment options: pay-when-paid vs Pay-if-paid), a fifth toxic clause analysis selection item (Pre-payment: LOC with BL; SB LoC; LoC at-sight; Usance; Document against Payment DA/DP), and a sixth toxic clause analysis selection item (Liability for EPC Corporate; Joint liability, Several, Joint & Several). Referring to FIG. 3(b), from the sentences of the data in the ITB document, the machine learning unit 13 may extract the toxic clause corresponding to each of the subject, verb, and object for the first toxic clause analysis selection item (Fit for purpose). Likewise, the machine learning unit 13 may extract the toxic clause corresponding to each of the subject, verb, and object for the fifth toxic clause analysis selection item (Pre-payment). The visualization unit 14 may provide the analysis content that the machine learning unit 13 extracted for the selected items to the user terminal (not shown).

FIG. 4 is a diagram schematically showing a design cost prediction result of the data analysis device according to an embodiment of the present application.

According to an embodiment of the present application, for the design cost prediction analysis item among the plurality of analysis step items of the engineering industry, the machine learning unit 13 may perform a data analysis that predicts the MH (Man Hour) of the new project data by applying that data to a learning module trained by machine learning. In addition, based on the predicted MH and on input information in which an engineering unit price is entered for each country, the machine learning unit 13 may perform an analysis that predicts the design hours.

By way of example, and referring to FIG. 4, the machine learning unit 13 may input the data needed for design cost prediction, taken from the new project data, into the learning module and predict the MH (Man Hour). The machine learning unit 13 may predict the MH for each discipline. The visualization unit 14 may provide the predicted MH to the user terminal (not shown). The data input unit 12 may receive from the user terminal (not shown) user input information selecting at least one of a plurality of countries. The machine learning unit 13 may predict the design hours based on the user's country selection. The design hours may be the working hours of engineers.

FIG. 5 is a diagram schematically showing a design error analysis result of the data analysis device according to an embodiment of the present application.

According to an embodiment of the present application, for the design error analysis item among the plurality of analysis step items of the engineering industry, the machine learning unit 13 may perform a project delay analysis by applying the new project data to a learning module trained by machine learning. For the project delay days that may occur in a new project, the machine learning unit 13 may analyze the design error delay severity of the new project based on a design error delay severity classification item. The delay severity classification item may include items that quantify, for each of a first construction period and a second construction period, the levels safe, caution, danger, and severe. The learning module may be a module built by training with the project period as input and the project delay severity as output.

By way of example, and referring to FIG. 5, the machine learning unit 13 may extract the construction period from the new project data. The machine learning unit 13 may determine whether the extracted construction period (project period, PJT Months) falls under the first construction period (for example, 26 months or more) or the second construction period (for example, 26 months or less). Using the new project data, the machine learning unit 13 may classify an error into at least one of a plurality of error reasons and map the classified error reason to a specific error type. In other words, the machine learning unit 13 may analyze the error reasons in the new project data and perform the project delay analysis. For example, when the error reason is a first error reason (valve access to be considered), the machine learning unit 13 may classify the error type as M1. When the error reason is a second error reason (drain to be provide), it may classify the error type as D1. When the error reason is a third error reason (clear clash between popings), it may classify the error type as H1. When the error reason is a fourth error reason (low pocket remove on tagged piping to be considered), it may classify the error type as O1. When the error reason is a fifth error reason (duplicated line number with P-101113 suction line to be corrected), it may classify the error type as C1. The machine learning unit 13 may perform the design error delay severity analysis based on the error type, the error reason, the delay days (Delay), and the project period (PJT Months), and may rate the severity as at least one of safe, caution, danger, and severe. For example, the machine learning unit 13 may extract the project period (PJT Months) from the new project data and apply it to a learning model that takes the project period as input and outputs at least one of safe, caution, danger, and severe.
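The mapping from error reason to error type and the four-level severity rating can be sketched as a simple lookup. The reason-to-type pairs below are the ones listed above, copied verbatim; everything else is an assumption made for illustration. In particular, the delay-day thresholds are hypothetical, a period of exactly 26 months is arbitrarily assigned to the first construction period, and a fixed rule table stands in for the trained learning module that the description actually uses.

```python
# Error reason -> error type, as listed in the description (strings copied verbatim).
ERROR_TYPES = {
    "valve access to be considered": "M1",
    "drain to be provide": "D1",
    "clear clash between popings": "H1",
    "low pocket remove on tagged piping to be considered": "O1",
    "duplicated line number with P-101113 suction line to be corrected": "C1",
}

# Hypothetical upper bounds on delay days for each severity level.
# The description does not give these numbers.
THRESHOLDS = {
    "first": [(10, "safe"), (20, "caution"), (30, "danger")],
    "second": [(5, "safe"), (10, "caution"), (15, "danger")],
}


def construction_period(project_months, boundary=26):
    """Assign the project to the first (longer) or second (shorter) construction period."""
    return "first" if project_months >= boundary else "second"


def delay_severity(project_months, delay_days):
    """Rule-based stand-in for the trained model: returns one of four severity levels."""
    for upper_bound, label in THRESHOLDS[construction_period(project_months)]:
        if delay_days <= upper_bound:
            return label
    return "severe"


reason = "valve access to be considered"
print(ERROR_TYPES[reason], delay_severity(project_months=30, delay_days=5))
```

A trained classifier would replace `delay_severity`, but the inputs (project period, delay days) and the four output labels would stay the same.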

FIG. 6 is a diagram schematically showing a design change analysis result of the data analysis device according to an embodiment of the present application.

According to an embodiment of the present application, for the design change analysis item among the plurality of analysis step items of the engineering industry, the machine learning unit 13 may perform a change amount analysis by applying the new project data to a learning module trained by machine learning. For the design change amount that a design change may cause in a new project, the machine learning unit 13 may analyze the change amount of the new project based on a design change project cost severity classification item. In other words, the machine learning unit 13 may apply the new project data to the learning module trained for the design change analysis item and analyze the text contained in the new project data. The machine learning unit 13 may compute the similarity between a design change keyword dictionary and the text contained in the new project data, and may analyze the change amount of the new project based on the design change severity classification item.

Referring to FIG. 6, the machine learning unit 13 may extract design error words by analyzing the text contained in the new project data. The machine learning unit 13 may extract the design change error type by comparing the pre-stored design change keyword dictionary with the design error words extracted from that text. The machine learning unit 13 may perform the design change severity analysis using the design change keyword dictionary together with a severity classification for each range of project cost incurred by the design change. The design change keyword dictionary may be a stored model organized by work type, design error type, and design error keyword.

For example, the machine learning unit 13 may split terms into morphemes by analyzing the text contained in the new project data. From the separated terms, the machine learning unit 13 may extract design error keywords by computing similarity against the design change keyword dictionary. The machine learning unit 13 may classify the design error type based on the extracted design error keywords. Through this text analysis, the machine learning unit 13 may build a data set organized by work type, design error type, and design error keyword. The machine learning unit 13 may analyze the change amount incurred by a design change in accordance with the design change project cost severity classification item.

FIG. 7 is a diagram schematically showing a predictive maintenance analysis result of the data analysis device according to an embodiment of the present application.

According to an embodiment of the present application, for the predictive maintenance analysis item among the plurality of analysis step items of the engineering industry, the machine learning unit 13 may perform a maintenance item detection analysis by applying the new project data to a learning module trained by machine learning. By way of example, the machine learning unit 13 may generate a learning model based on at least one of a Decision Tree algorithm, a Random Forest algorithm, an SVM algorithm, and a KNN algorithm. The Decision Tree algorithm splits the variable space in two at each branch, and the Random Forest algorithm builds a forest of many decision trees and averages their individual predictions into a single result variable. The SVM algorithm is a non-probabilistic algorithm that determines the class to which data belongs by finding the boundary with the widest margin in the space in which the data is distributed, and the KNN algorithm groups the training data and assigns new data to the group to which most of its neighboring data belongs. By way of example, the maintenance item detection analysis using the SVM algorithm in the machine learning unit 13 achieved an accuracy of 0.896.

Referring to FIG. 7, the machine learning unit 13 may perform the maintenance item detection analysis by applying various algorithms. The machine learning unit 13 may perform the maintenance item detection analysis for the predictive maintenance analysis item using the SVM algorithm. The machine learning unit 13 may apply the predictive maintenance analysis item included in the O&M analysis item to the SVM algorithm, perform the maintenance item detection analysis, and express the result as a confusion matrix. The visualization unit 14 may provide the analysis result expressed as a confusion matrix to the user terminal (not shown). Based on the results of running the various algorithms, the machine learning unit 13 may determine the final model for each of the plurality of analysis items.
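As a small illustration of how a classification result can be expressed as a confusion matrix and summarized as an accuracy, consider the sketch below. The maintenance item labels and the predictions are made up for the example and do not come from the description; the classifier itself (for instance an SVM) is not shown.

```python
from collections import Counter


def confusion_matrix(actual, predicted, labels):
    """Rows are actual classes, columns are predicted classes, both in `labels` order."""
    counts = Counter(zip(actual, predicted))
    return [[counts[(a, p)] for p in labels] for a in labels]


def accuracy(matrix):
    """Share of samples on the diagonal, that is, predicted correctly."""
    total = sum(sum(row) for row in matrix)
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    return correct / total if total else 0.0


# Hypothetical maintenance items and classifier outputs.
labels = ["bearing", "filter", "seal"]
actual = ["bearing", "seal", "seal", "bearing", "filter"]
predicted = ["bearing", "seal", "bearing", "bearing", "filter"]

matrix = confusion_matrix(actual, predicted, labels)
for label, row in zip(labels, matrix):
    print(label, row)
print("accuracy:", accuracy(matrix))
```

The off-diagonal cells show which maintenance items are confused with each other, which a single accuracy figure does not reveal.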

According to an embodiment of the present application, the machine learning unit 13 may collect the learning results obtained by applying a plurality of learning models to each of the plurality of analysis step items of the engineering industry. The machine learning unit 13 may then provide the result of the learning model with the highest accuracy as the analysis result for each analysis step item. For example, the machine learning unit 13 may perform the analysis by applying a plurality of machine learning models to each of the ITB analysis item, the design cost analysis item, the design error analysis item, the design change analysis item, and the predictive maintenance analysis item, and may select the model with the highest accuracy as the learning model for that analysis item. In other words, the machine learning unit 13 may analyze toxic clauses by applying a first learning model to the ITB analysis item. It may predict MH (Man Hour) by applying a second learning model to the design cost analysis item. It may perform the project delay analysis by applying a third learning model to the design error analysis item. It may perform the change amount analysis by applying a fourth learning model to the design change analysis item. It may perform the maintenance item detection analysis by applying a fifth learning model to the predictive maintenance analysis item.

By way of example, the machine learning unit 13 may calculate an evaluation index for each of a plurality of machine learning algorithms based on the analysis result data. For the algorithms of the regression type, the machine learning unit 13 may calculate the evaluation index from the root mean squared error (RMSE) of each algorithm's results. Because a lower RMSE indicates a more accurate and reliable regression algorithm, the machine learning unit 13 may calculate the evaluation index of each regression-type algorithm and rank the algorithms in ascending order of RMSE.
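The RMSE-based ranking can be written in a few lines. For actual values $y_i$ and predictions $\hat{y}_i$ over $n$ samples, $\mathrm{RMSE} = \sqrt{\frac{1}{n}\sum_{i=1}^{n}(y_i - \hat{y}_i)^2}$. In the sketch below, the model names and the numbers are hypothetical placeholders, not results from the description.

```python
import math


def rmse(actual, predicted):
    """Root mean squared error between two equally long sequences."""
    squared_errors = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


def rank_by_rmse(actual, predictions):
    """Return (model_name, rmse) pairs sorted from lowest to highest RMSE."""
    scores = {name: rmse(actual, predicted) for name, predicted in predictions.items()}
    return sorted(scores.items(), key=lambda item: item[1])


# Hypothetical man-hour values and predictions from two regression models.
actual_mh = [100, 200, 300]
predictions = {
    "model_a": [110, 190, 310],
    "model_b": [100, 200, 330],
}
for rank, (name, score) in enumerate(rank_by_rmse(actual_mh, predictions), start=1):
    print(rank, name, round(score, 2))
```

Here `model_a` ranks first: its errors are spread evenly, whereas `model_b` is exact on two samples but pays heavily for one large miss, which RMSE penalizes more than an average of absolute errors would.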

For example, the plurality of machine learning models may be generated based on a regression algorithm, a classification algorithm, a clustering algorithm, and a deep learning algorithm. Unsupervised learning refers to algorithms that learn by analyzing or clustering the data itself rather than by constructing separate training data. The machine learning unit 13 may cluster analysis patterns based on a clustering algorithm and may detect a new analysis pattern based on the degree of separation between the clusters of the analysis patterns. For example, a logistic regression algorithm, a random forest algorithm, a support vector machine (SVM) algorithm, a decision-making algorithm, and a clustering algorithm may be used as the clustering algorithm for unsupervised learning.
In addition to the algorithms described above, the machine learning unit 13 may perform unsupervised learning through clustering algorithms such as the Extra Tree algorithm, the XGBoost algorithm, a deep learning algorithm, the K-means clustering algorithm, the self-organizing map (SOM) algorithm, and the EM & Canopy algorithm. The random forest algorithm builds a forest from many decision trees and averages their individual predictions into a single result variable. The SVM algorithm is a non-probabilistic algorithm that determines the class to which data belongs by finding the boundary with the widest margin in the data distribution space. The Extra Tree algorithm is similar to random forest but faster. Whereas the trees of a random forest are independent, XGBoost is a boosting algorithm in which the result of each tree is applied to the next tree. The deep learning algorithm, based on a multi-layer neural network, learns by using weights to adjust the influence that patterns in the variables have on the result. The K-means clustering algorithm is a traditional classification technique that iteratively partitions the target population into K clusters based on the mean distance (similarity), and the SOM algorithm is a technique, based on an artificial neural network, that clusters by learning the input patterns of a training set as weights. The EM & Canopy algorithm refers to a technique that clusters by starting from given initial values, beginning with the maximum likelihood, and updating the parameter values through an iterative process.
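As an illustration of the K-means clustering mentioned above, the sketch below implements the standard assign-and-update loop (Lloyd's algorithm) in plain Python. The patent does not specify initialization or stopping rules, so the fixed initial centroids, the iteration count, and the sample points are assumptions made for this example.

```python
import math

def kmeans(points, centroids, iterations=10):
    """Plain k-means: assign each point to its nearest centroid, then move
    each centroid to the mean of the points assigned to it."""
    centroids = [tuple(c) for c in centroids]
    labels = []
    for _ in range(iterations):
        # Assignment step: index of the closest centroid for every point.
        labels = [
            min(range(len(centroids)), key=lambda k: math.dist(p, centroids[k]))
            for p in points
        ]
        # Update step: recompute each centroid as the mean of its members.
        for k in range(len(centroids)):
            members = [p for p, label in zip(points, labels) if label == k]
            if members:  # keep the previous centroid if a cluster is empty
                centroids[k] = tuple(sum(axis) / len(members) for axis in zip(*members))
    return labels, centroids

# Hypothetical two-dimensional feature vectors forming two obvious groups.
points = [(1.0, 1.0), (1.0, 2.0), (9.0, 9.0), (10.0, 9.0)]
labels, centroids = kmeans(points, centroids=[(0.0, 0.0), (10.0, 10.0)])
```

In practice the initial centroids are usually chosen at random or with a seeding heuristic; fixed values are used here only to keep the example deterministic.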

According to another embodiment of the present application, the machine learning unit 13 may preprocess the new project data received from the data input unit 12. For example, the machine learning unit 13 may perform data cleansing (outlier detection, data correction) and ETL (extraction, transformation, loading) on the project data. When the new project data is unstructured data, the machine learning unit 13 may also perform data normalization, which may consist of unifying the variable values contained in the new project data according to a common standard. The machine learning unit 13 may further build a learning model, evaluate the built model, and optimize it.
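A minimal sketch of the cleansing and normalization steps follows. The description does not name a specific outlier rule or normalization formula, so the interquartile-range rule and min-max scaling used here are assumptions, as are the function names and sample values.

```python
import statistics

def flag_outliers_iqr(values, k=1.5):
    """Return one boolean per value; True marks an outlier under the k * IQR rule."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    low, high = q1 - k * (q3 - q1), q3 + k * (q3 - q1)
    return [v < low or v > high for v in values]

def min_max_normalize(values):
    """Rescale values to the range [0, 1]; a constant column maps to all zeros."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]

# Hypothetical raw values with one obvious outlier.
raw_values = [10.0, 12.0, 11.0, 13.0, 12.0, 95.0]
flags = flag_outliers_iqr(raw_values)
cleaned = [v for v, is_outlier in zip(raw_values, flags) if not is_outlier]
normalized = min_max_normalize(cleaned)
```

Dropping flagged values is only one possible "data correction"; replacing them with a median or a capped value would fit the description equally well.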

For example, the machine learning unit 13 may receive analysis target data from a database (not shown) and from the data input unit 12. The machine learning unit 13 may receive project data stored in advance in the database (not shown) as analysis target data, and may receive new project data from the data input unit 12 as analysis target data. The machine learning unit 13 may extract feature variables from the analysis target data based on feature engineering. The analysis target data may include not only structured data containing numeric variables, but also text-based unstructured data such as symbols, words, and sentences. The machine learning unit 13 may extract feature variables from the structured or unstructured data included in the analysis target data.

According to an embodiment of the present application, the data input unit 12 may provide a function for registering a word dictionary so that feature variables optimized for the user can be calculated from unstructured data. For unstructured data among the analysis target data, the machine learning unit 13 may extract feature variables based on an unsupervised natural language processing algorithm. More specifically, when the analysis target data includes unstructured data, the machine learning unit 13 may extract optimal vector values from the user-optimized word dictionary. The machine learning unit 13 may also extract nouns from the text contained in the unstructured data through principal component analysis (PCA) of the optimal vector values. The Soynlp algorithm may be used as the algorithm for the principal component analysis, but the present application is not limited thereto. The Soynlp algorithm is an unsupervised learning-based algorithm that can extract the words present in the analysis target data without requiring separate training data, and can also split sentences into word sequences and determine parts of speech. Based on the word dictionary, the machine learning unit 13 may calculate the feature variables from scores assigned according to the frequency of the words containing the extracted nouns. For example, the machine learning unit 13 may assign a higher score to a noun (that is, a word) extracted from the text the more frequently it occurs. Scores may be assigned by an absolute method, in which scores are graded according to preset frequency levels, or by a relative method, in which scores are assigned according to the relative proportion of each word in the analysis target data. (For example, a word whose frequency is in the top 10% may be given a higher score than a word in the top 30%.)
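The absolute and relative scoring methods described above can be sketched as follows. The sketch assumes that noun extraction has already produced a token list; the threshold table, the dictionary contents, and the function names are hypothetical and are not taken from the patent.

```python
from collections import Counter

def absolute_scores(tokens, dictionary, thresholds=((5, 3), (2, 2), (1, 1))):
    """Absolute method: grade each dictionary word by preset (min_count, score)
    levels, listed from the highest frequency level to the lowest."""
    counts = Counter(t for t in tokens if t in dictionary)
    scores = {}
    for word, count in counts.items():
        for min_count, score in thresholds:
            if count >= min_count:
                scores[word] = score
                break
    return scores

def relative_scores(tokens, dictionary):
    """Relative method: score each dictionary word by its share of all
    dictionary-word occurrences in the analyzed text."""
    counts = Counter(t for t in tokens if t in dictionary)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {word: count / total for word, count in counts.items()}

# Hypothetical extracted tokens and user-registered word dictionary.
tokens = ["penalty"] * 5 + ["delay"] * 2 + ["warranty"] + ["the"] * 4
dictionary = {"penalty", "delay", "warranty"}
abs_result = absolute_scores(tokens, dictionary)
rel_result = relative_scores(tokens, dictionary)
```

Words outside the registered dictionary (here `"the"`) are ignored by both methods, which reflects the role of the user-registered dictionary in the description.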

According to an embodiment of the present application, the visualization unit 14 may visualize and provide the analysis results of the machine learning unit 13. The visualization unit 14 may visualize the results analyzed by the machine learning unit 13 by applying a different GUI for each analysis item.

For example, referring to FIG. 2, the visualization unit 14 may provide a plurality of analysis step items to a user terminal (not shown). Referring to FIG. 3(a), the visualization unit 14 may provide an ITB document selection item, a category selection item, and a toxic clause analysis selection item for ITB analysis to the user terminal (not shown). Referring to FIG. 3(b), the visualization unit 14 may provide the ITB analysis result produced by the machine learning unit 13 to the user terminal (not shown). Likewise, the visualization unit 14 may provide a design cost prediction result (FIG. 4), a design error analysis result (FIG. 5), a design change analysis result (FIG. 6), and a predictive maintenance analysis result (FIG. 7) to the user terminal (not shown).

In addition, the visualization unit 14 may visualize and provide the variable attributes and value distributions of the extracted feature variables, taking into account the characteristics of the analysis target data extracted by the machine learning unit 13. The variable attributes and value distributions may be calculated statistically and provided in key-value form. When a variable in the analysis target data is numeric, the machine learning unit 13 may calculate statistics suited to its characteristics. When a variable is categorical, the machine learning unit 13 may calculate the count and count ratio of the feature variable for each category. The visualization unit 14 may provide the variable attributes, value distributions, and variable descriptions of the analysis target data to the user terminal (not shown), and may also provide the statistics, per-category counts, and count ratios. The user terminal (not shown) may output the data received from the visualization unit 14.
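The per-variable summaries described above can be sketched as a small profiling function that returns key-value results. Which statistics the device actually reports is not stated, so the chosen fields (count, mean, min, max, standard deviation, per-category counts and ratios) and the name `profile_variable` are assumptions.

```python
import statistics
from collections import Counter

def profile_variable(values):
    """Summarize one variable as a key-value dictionary.
    Numeric variables get descriptive statistics; all other variables are
    treated as categorical and get per-category counts and ratios."""
    if not values:
        raise ValueError("values must not be empty")
    is_numeric = all(
        isinstance(v, (int, float)) and not isinstance(v, bool) for v in values
    )
    if is_numeric:
        return {
            "type": "numeric",
            "count": len(values),
            "mean": statistics.mean(values),
            "min": min(values),
            "max": max(values),
            "stdev": statistics.stdev(values) if len(values) > 1 else 0.0,
        }
    counts = Counter(values)
    return {
        "type": "categorical",
        "counts": dict(counts),
        "ratios": {category: n / len(values) for category, n in counts.items()},
    }

# Hypothetical columns from a project data set.
numeric_profile = profile_variable([1, 2, 3, 4])
category_profile = profile_variable(["LNG", "Refinery", "LNG", "LNG"])
```

Returning plain dictionaries keeps the output in the key-value form that the description says is passed on for visualization.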

According to an embodiment of the present application, calculating the statistics of structured data (that is, numeric variables) and the per-category counts and ratios of unstructured data can improve the understanding of the variables. Using these results when calculating feature variables can yield more reliable feature variables as input data for the machine learning algorithms. In addition, the user can easily grasp the characteristics of the analysis target data during the feature variable calculation process.

FIG. 8 is a diagram schematically illustrating a screen that provides an API of the data analysis device according to an embodiment of the present application. Referring to FIG. 8, the data analysis device 10 may present the results of calling an API, provided with extensibility in mind, for the analysis results of each of the Bidding, Engineering, Construction & Commissioning, and O&M analysis steps.

FIG. 9 is a diagram schematically illustrating a data search screen and search results of the data analysis device according to an embodiment of the present application, and FIG. 10 is a diagram schematically illustrating statistical visualization results for similar projects of the data analysis device according to an embodiment of the present application.

According to an embodiment of the present application, the data input unit 12 may receive user input information that selects a plurality of search items relating to a project to be searched for among the plurality of project data collected by the data collection unit 11. The machine learning unit 13 may perform a similarity analysis between the user input information and the plurality of project data, and may provide a list of at least one project based on the similarity analysis result.

For example, referring to FIG. 9, the visualization unit 14 may provide the user terminal (not shown) with a plurality of search items for performing a project search. The data input unit 12 may receive, from the user terminal (not shown), user input information selecting at least one of the plurality of search items. The search items may include plant type, project code, project name, project type, site location, business field, scale (capacity), project period, client, and the like. Based on the user input information, the machine learning unit 13 may perform a similarity analysis against the project data stored in the database (not shown). As an example, the similarity analysis may be performed using a fuzzy matching algorithm, which may be an algorithm that calculates the similarity between data items from a value computed from the edit distance (Levenshtein distance). The machine learning unit 13 may provide a list of similar projects based on the similarity analysis result. The data input unit 12 may then receive user input information selecting at least one project from the provided list, and the visualization unit 14 may provide the selected similar project based on that user input information.
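The edit-distance-based similarity described above can be sketched as follows. The Levenshtein distance is the standard dynamic-programming formulation; normalizing by the longer string length, the 0.6 threshold, and the project names are assumptions for illustration, since the patent does not state how the distance is converted into a similarity score.

```python
def levenshtein(a, b):
    """Edit distance: minimum number of insertions, deletions, and
    substitutions needed to turn string a into string b."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost))
        prev = curr
    return prev[-1]

def similarity(a, b):
    """Map the edit distance to a score in [0, 1]; 1.0 means identical."""
    longest = max(len(a), len(b))
    return 1.0 if longest == 0 else 1.0 - levenshtein(a, b) / longest

def similar_projects(query, names, threshold=0.6):
    """Return (name, score) pairs at or above the threshold, best first."""
    scored = [(name, similarity(query.lower(), name.lower())) for name in names]
    matches = [item for item in scored if item[1] >= threshold]
    return sorted(matches, key=lambda item: item[1], reverse=True)

# Hypothetical stored project names and a user query.
stored = ["LNG Plant B", "Refinery Unit 2", "LNG Plnt A"]
matches = similar_projects("LNG Plant A", stored)
```

A real search over several items (plant type, site location, and so on) would need to combine per-field scores; this sketch compares a single text field only.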

Referring to FIG. 10, the visualization unit 14 may also provide statistical visualization results for similar projects. For the search results, the visualization unit 14 may derive values such as project delay days and MM using statistical techniques, and may visualize and provide them.

Hereinafter, the operation flow of the present application is briefly described based on the details given above.

FIG. 11 is an operation flowchart of a data analysis method using engineering full-cycle data according to an embodiment of the present application.

The data analysis method using engineering full-cycle data shown in FIG. 11 may be performed by the data analysis device 10 described above. Therefore, even where details are omitted below, the description given for the data analysis device 10 applies equally to the data analysis method using engineering full-cycle data.

In step S101, the data analysis device 10 may collect project data generated throughout the full cycle of the engineering industry.

In step S102, the data analysis device 10 may receive new project data.

In step S103, the data analysis device 10 may perform data analysis by applying the received new project data to a learning module trained based on machine learning.

In step S104, the data analysis device 10 may visualize and provide the data analysis result.

In the above description, steps S101 to S104 may be further divided into additional steps or combined into fewer steps, depending on the implementation of the present application. In addition, some steps may be omitted as necessary, and the order of the steps may be changed.
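The flow of steps S101 to S104 can be sketched as four chained functions. Everything in this sketch is a stand-in: the function names, the sample records, and the trivial mean-based "learning module" are hypothetical and serve only to show how the output of each step feeds the next.

```python
from statistics import mean

def collect_project_data():
    """S101: collect historical project data (hypothetical records)."""
    return [
        {"name": "P1", "man_hours": 1200.0},
        {"name": "P2", "man_hours": 1800.0},
    ]

def receive_new_project():
    """S102: receive new project data (hypothetical record)."""
    return {"name": "P3"}

def analyze(history, new_project):
    """S103: apply a learning module; here a trivial stand-in that predicts
    the mean man-hours of the historical projects."""
    predicted = mean(p["man_hours"] for p in history)
    return {**new_project, "predicted_man_hours": predicted}

def visualize(result):
    """S104: present the analysis result; here as a plain text line."""
    return f"{result['name']}: predicted MH = {result['predicted_man_hours']:.0f}"

report = visualize(analyze(collect_project_data(), receive_new_project()))
```

Because each step is a separate function, steps can be split, merged, or reordered as the description allows without changing the others.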

The data analysis method using engineering full-cycle data according to an embodiment of the present application may be implemented in the form of program instructions executable by various computer means and recorded on a computer-readable medium. The computer-readable medium may include program instructions, data files, data structures, and the like, alone or in combination. The program instructions recorded on the medium may be specially designed and configured for the present invention, or may be known and available to those skilled in the art of computer software. Examples of computer-readable recording media include magnetic media such as hard disks, floppy disks, and magnetic tape; optical media such as CD-ROMs and DVDs; magneto-optical media such as floptical disks; and hardware devices specially configured to store and execute program instructions, such as ROM, RAM, and flash memory. Examples of program instructions include not only machine code such as that produced by a compiler, but also high-level language code that can be executed by a computer using an interpreter or the like. The hardware devices described above may be configured to operate as one or more software modules in order to perform the operations of the present invention, and vice versa.

In addition, the data analysis method using engineering full-cycle data described above may also be implemented in the form of a computer program or application that is stored on a recording medium and executed by a computer.

The foregoing description of the present application is illustrative, and those of ordinary skill in the art to which the present application pertains will understand that it can easily be modified into other specific forms without changing its technical spirit or essential features. The embodiments described above should therefore be understood as illustrative in all respects and not restrictive. For example, each component described as a single unit may be implemented in a distributed manner, and likewise components described as distributed may be implemented in a combined form.

The scope of the present application is indicated by the claims below rather than by the above detailed description, and all changes or modifications derived from the meaning and scope of the claims and their equivalents should be construed as falling within the scope of the present application.

10: data analysis device
11: data collection unit
12: data input unit
13: machine learning unit
14: visualization unit

Claims

A data analysis device using engineering full-cycle data, comprising:
a data collection unit that collects project data generated throughout the full cycle of the engineering industry;
a data input unit that receives new project data;
a machine learning unit that performs data analysis by applying the new project data received from the data input unit to a learning module trained based on machine learning; and
a visualization unit that visualizes and provides the analysis result of the machine learning unit,
wherein the machine learning unit:
performs, for an ITB analysis item among a plurality of analysis step items of the engineering industry, toxic clause analysis by applying the new project data to a first learning module trained based on machine learning, in order to detect toxic clauses contained in ITB document data, wherein the toxic clause analysis uses a toxic clause dictionary to separate the terms contained in the input ITB document into morphemes and standardize them, and compares the words contained in the separated terms with the words of a plurality of toxic clauses contained in the pre-stored toxic clause dictionary, the comparison being performed by a Fuzzy Data Matching algorithm, which is an artificial intelligence-based algorithm;
performs, for a design cost prediction analysis item among the plurality of analysis step items, predictive analysis of MH (Man Hour) by applying the new project data to a second learning module trained based on machine learning;
performs, for a design error analysis item among the plurality of analysis step items, project delay days analysis by applying the new project data to a third learning module trained based on machine learning;
performs, for a design change analysis item among the plurality of analysis step items, change amount analysis by applying the new project data to a fourth learning module trained based on machine learning;
performs, for a predictive maintenance analysis item among the plurality of analysis step items, maintenance item detection analysis by applying the new project data to a fifth learning module trained based on machine learning;
resets hyperparameter values to improve the performance of the first to fifth learning modules;
randomly samples a hyperparameter value from a set range, trains the module using the sampled hyperparameter value, and evaluates accuracy with validation data, which is data for evaluating the adequacy of the hyperparameter;
repeats, a preset number of times, the process of randomly sampling the hyperparameter value, training the module, and evaluating accuracy with the validation data, then assesses the accuracy and resets the range of the hyperparameter, the reset narrowing the range of the hyperparameter;
collects, for each of the plurality of analysis step items, the learning results obtained by applying a plurality of learning models, and provides the learning result of the learning model showing the highest accuracy as the analysis result for the plurality of analysis step items; and
when the plurality of learning models are of the regression algorithm type, calculates an evaluation index for each of the plurality of machine learning algorithms corresponding to the regression algorithm type, ranks them in ascending order of root mean square error, and provides the ranking as the analysis result for the plurality of analysis step items.

The data analysis device of claim 1,
wherein the data input unit
receives user input information selecting any one analysis step item to be analyzed from among the plurality of analysis step items of the engineering industry.

The data analysis device of claim 2,
wherein the plurality of analysis step items of the engineering industry
comprise at least one of a Bidding analysis item, an Engineering analysis item, a Construction & Commissioning analysis item, and an O&M analysis item.

The data analysis device of claim 3,
wherein the learning module trained based on machine learning
comprises at least one learning module among a regression model, a classification model, a clustering model, and a deep learning model.

(Deleted)

The data analysis device of claim 4,
wherein the machine learning unit
collects, for each of the plurality of analysis step items of the engineering industry, the learning results obtained by applying a plurality of learning models, and provides the learning result of the learning model showing the highest accuracy as the analysis result for the plurality of analysis step items.

The data analysis device of claim 4,
wherein the data input unit
receives user input information selecting a plurality of search items relating to a project to be searched for among the plurality of project data collected by the data collection unit, and
the machine learning unit
performs a similarity analysis between the user input information and the plurality of project data, and provides a list of at least one project based on the similarity analysis result.

A data analysis method using engineering full-cycle data, in which each step is performed by a computer-implemented data analysis device, the method comprising:
collecting project data generated throughout the full cycle of the engineering industry;
receiving new project data;
performing data analysis by applying the received new project data to a learning module trained based on machine learning; and
visualizing and providing the analysis result,
wherein performing the data analysis comprises:
performing, for an ITB analysis item among a plurality of analysis step items of the engineering industry, toxic clause analysis by applying the new project data to a first learning module trained based on machine learning, in order to detect toxic clauses contained in ITB document data, wherein the toxic clause analysis uses a toxic clause dictionary to separate the terms contained in the input ITB document into morphemes and standardize them, and compares the words contained in the separated terms with the words of a plurality of toxic clauses contained in the pre-stored toxic clause dictionary, the comparison being performed by a Fuzzy Data Matching algorithm, which is an artificial intelligence-based algorithm;
performing, for a design cost prediction analysis item among the plurality of analysis step items, predictive analysis of MH (Man Hour) by applying the new project data to a second learning module trained based on machine learning;
performing, for a design error analysis item among the plurality of analysis step items, project delay days analysis by applying the new project data to a third learning module trained based on machine learning;
performing, for a design change analysis item among the plurality of analysis step items, change amount analysis by applying the new project data to a fourth learning module trained based on machine learning;
performing, for a predictive maintenance analysis item among the plurality of analysis step items, maintenance item detection analysis by applying the new project data to a fifth learning module trained based on machine learning;
resetting hyperparameter values to improve the performance of the first to fifth learning modules;
randomly sampling a hyperparameter value from a set range, training the module using the sampled hyperparameter value, and evaluating accuracy with validation data, which is data for evaluating the adequacy of the hyperparameter;
repeating, a preset number of times, the process of randomly sampling the hyperparameter value, training the module, and evaluating accuracy with the validation data, then assessing the accuracy and resetting the range of the hyperparameter, the reset narrowing the range of the hyperparameter;
collecting, for each of the plurality of analysis step items, the learning results obtained by applying a plurality of learning models, and providing the learning result of the learning model showing the highest accuracy as the analysis result for the plurality of analysis step items; and
when the plurality of learning models are of the regression algorithm type, calculating an evaluation index for each of the plurality of machine learning algorithms corresponding to the regression algorithm type, ranking them in ascending order of root mean square error, and providing the ranking as the analysis result for the plurality of analysis step items.