KR20220056288A - A crowdsourcing system for artificial intelligence data composition - Google Patents
- Publication number
- KR20220056288A (application number KR1020200140357A)
- Authority
- KR
- South Korea
- Prior art keywords
- data
- artificial intelligence
- deep learning
- crowdsourcing system
- ontology
- Prior art date
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06Q—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
- G06Q10/00—Administration; Management
- G06Q10/10—Office automation; Time management
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/30—Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
- G06F16/33—Querying
- G06F16/3331—Query processing
- G06F16/3332—Query translation
- G06F16/3334—Selection or weighting of terms from queries, including natural language queries
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/30—Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
- G06F16/36—Creation of semantic tools, e.g. ontology or thesauri
- G06F16/367—Ontology
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06Q—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
- G06Q10/00—Administration; Management
- G06Q10/06—Resources, workflows, human or project management; Enterprise or organisation planning; Enterprise or organisation modelling
- G06Q10/063—Operations research, analysis or management
- G06Q10/0631—Resource planning, allocation, distributing or scheduling for enterprises or organisations
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06Q—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
- G06Q10/00—Administration; Management
- G06Q10/06—Resources, workflows, human or project management; Enterprise or organisation planning; Enterprise or organisation modelling
- G06Q10/063—Operations research, analysis or management
- G06Q10/0637—Strategic management or analysis, e.g. setting a goal or target of an organisation; Planning actions based on goals; Analysis or evaluation of effectiveness of goals
Landscapes
- Engineering & Computer Science (AREA)
- Business, Economics & Management (AREA)
- Theoretical Computer Science (AREA)
- Human Resources & Organizations (AREA)
- Physics & Mathematics (AREA)
- General Physics & Mathematics (AREA)
- Strategic Management (AREA)
- Entrepreneurship & Innovation (AREA)
- Economics (AREA)
- Data Mining & Analysis (AREA)
- General Engineering & Computer Science (AREA)
- Computational Linguistics (AREA)
- Life Sciences & Earth Sciences (AREA)
- Marketing (AREA)
- General Business, Economics & Management (AREA)
- Tourism & Hospitality (AREA)
- Educational Administration (AREA)
- Quality & Reliability (AREA)
- Operations Research (AREA)
- Artificial Intelligence (AREA)
- Molecular Biology (AREA)
- Game Theory and Decision Science (AREA)
- Biophysics (AREA)
- Health & Medical Sciences (AREA)
- Evolutionary Computation (AREA)
- General Health & Medical Sciences (AREA)
- Biomedical Technology (AREA)
- Computing Systems (AREA)
- Software Systems (AREA)
- Mathematical Physics (AREA)
- Development Economics (AREA)
- Databases & Information Systems (AREA)
- Animal Behavior & Ethology (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Abstract
Description
The present invention relates to a crowdsourcing system for constructing artificial intelligence data. The system collects raw data, extracts keywords through morphological analysis, extracts knowledge data through deep learning, and builds the result into an ontology knowledge base.
Constructing artificial intelligence (AI) data consumes considerable cost and time. In a contact-free ("untact") society in particular, the cost and time needed to create such data are multiplying.
Accordingly, under these social conditions, costs can be reduced only if AI data can be created regardless of place and time.
An object of the present invention is to solve the problems described above by providing a crowdsourcing system for constructing artificial intelligence data that collects raw data, extracts keywords through morphological analysis, extracts knowledge data through deep learning, and builds the result into an ontology knowledge base.
To achieve this object, the present invention provides a crowdsourcing system for constructing artificial intelligence data, comprising: a data pre-processing unit that pre-processes data; a morpheme analysis unit that performs morphological analysis on the pre-processed data; a keyword extraction unit that extracts keywords from input data; a deep learning processing unit that performs deep learning on input data; an ontology configuration unit that builds the deep-learning-processed data into an ontology knowledge base; and a search unit that receives a query and performs a search for the query.
In addition, in the crowdsourcing system for constructing artificial intelligence data according to the present invention, the similar-image derivation unit derives similar frames by performing AI-based similarity learning for each pattern.
As described above, with the crowdsourcing system for constructing artificial intelligence data according to the present invention, AI data can be generated regardless of place and time, which reduces the cost of building data for artificial intelligence.
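As a rough illustration of how the units named in the summary above could be chained, the following Python sketch wires a pre-processing step, a tokenizing step, a keyword step, and a searchable store into one pipeline. Everything in it is an assumption made for illustration: the names `Document`, `KnowledgeBase`, and `run_pipeline` are hypothetical, a regular-expression tokenizer stands in for a real morphological analyzer, frequency ranking stands in for keyword weighting, a keyword-to-document index stands in for the ontology knowledge base, and the deep learning unit is omitted entirely.

```python
import re
from dataclasses import dataclass, field
from typing import Dict, List, Set


@dataclass
class Document:
    """Hypothetical record passed between the pipeline units."""
    doc_id: str
    raw_text: str
    clean_text: str = ""
    tokens: List[str] = field(default_factory=list)
    keywords: List[str] = field(default_factory=list)


def preprocess(doc: Document) -> Document:
    """Pre-processing unit: drop tag-like markup and normalize spacing."""
    text = re.sub(r"<[^>]+>", " ", doc.raw_text)
    doc.clean_text = re.sub(r"\s+", " ", text).strip()
    return doc


def analyze_morphemes(doc: Document) -> Document:
    """Morpheme analysis unit (placeholder): a real system would call a
    morphological analyzer; this only splits into lowercase word tokens."""
    doc.tokens = re.findall(r"\w+", doc.clean_text.lower())
    return doc


def extract_keywords(doc: Document, top_n: int = 3) -> Document:
    """Keyword extraction unit: rank tokens by frequency.
    Ties keep first-seen order because sorted() is stable."""
    counts: Dict[str, int] = {}
    for tok in doc.tokens:
        counts[tok] = counts.get(tok, 0) + 1
    doc.keywords = sorted(counts, key=lambda t: -counts[t])[:top_n]
    return doc


class KnowledgeBase:
    """Stand-in for the ontology knowledge base and the search unit:
    an inverted index from keyword to document ids."""

    def __init__(self) -> None:
        self.index: Dict[str, Set[str]] = {}

    def add(self, doc: Document) -> None:
        for kw in doc.keywords:
            self.index.setdefault(kw, set()).add(doc.doc_id)

    def search(self, query: str) -> Set[str]:
        hits: Set[str] = set()
        for term in re.findall(r"\w+", query.lower()):
            hits |= self.index.get(term, set())
        return hits


def run_pipeline(raw_docs: Dict[str, str]) -> KnowledgeBase:
    """Run every raw document through the units in order and index it."""
    kb = KnowledgeBase()
    for doc_id, raw in raw_docs.items():
        doc = extract_keywords(analyze_morphemes(preprocess(Document(doc_id, raw))))
        kb.add(doc)
    return kb
```

Calling `run_pipeline({"d1": "<p>ontology data</p>"})` and then `search("ontology")` on the result returns the ids of documents whose keywords match the query terms. Keeping each unit as a separate function mirrors the separation of units in the summary and would let any one of them be swapped for a real component.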
FIG. 1 is a diagram showing the overall process for AI data based on unstructured data.
FIG. 2 is a diagram showing the HWP and PDF parsing process (data collector).
FIG. 3 is a diagram showing the pre-processing process.
FIG. 4 is a diagram showing data pre-processing monitoring.
FIG. 5 is a diagram showing the morphological analysis process.
FIG. 6 is a diagram showing morphological analysis monitoring.
FIGS. 7 and 8 are diagrams showing the keyword extraction process.
FIG. 9 is a diagram showing the deep learning process.
FIG. 10 is a diagram showing deep learning monitoring.
Hereinafter, specific details for carrying out the present invention are described with reference to the drawings.
In describing the present invention, identical parts are given identical reference numerals, and repeated description of them is omitted.
First, the overall process according to an embodiment of the present invention is described.
○ Constructing AI data consumes considerable cost and time, and in a contact-free ("untact") society the cost and time needed to create such data are multiplying.
○ Under these social conditions, costs can be reduced only if AI data can be created regardless of place and time.
○ This method is a process for constructing unstructured AI data, and [Figure 23] is a schematic of the overall process.
○ The data collector refers to the process by which a machine automatically collects data in the domain the user wants.
○ Data preprocessing refers to the process of cleaning the data to be used in an analysis before the analysis is performed.
○ Preprocessing corrects errors in the natural-language text or converts it into a form suited to the process that follows.
○ Unstructured data collected by the data collector (HWP, PDF, and similar files) is automatically stored in a database according to the meta structure designed by each administrator. During preprocessing, the machine automatically corrects spacing errors, handles special tags, and fixes parsing errors so that the text matches the meaning of the original natural language.
○ Data preprocessing monitoring lets an administrator compare the results before and after preprocessing and raise issues about parts the machine processed incorrectly. These issues are turned into patterns for future preprocessing runs, which improves the reliability of the program.
○ Morphological analysis refers to the process of dividing natural-language text into small units of meaning (morphemes) and assigning a tag to each.
○ Through morphological analysis, words are classified as nouns, verbs, adjectives, and so on, and functions such as keyword extraction are performed.
○ The morphological analysis process can extract the words classified as nouns to build a set of meaningful keyword candidates.
○ Morphological analysis monitoring is a scheme in which an administrator compares the original natural-language sentences with the morphological analysis results and raises issues about parts the machine split incorrectly, so that the analysis patterns can be corrected. Correcting and adding patterns in this way increases the accuracy of later morphological analysis.
○ Keyword extraction automatically extracts the representative keywords of each document and computes an importance weight for each keyword.
○ Keyword extraction monitoring lets an administrator view the original text of a document alongside the keywords extracted from it, mark keywords as suitable, unsuitable, or core terms, and raise issues about words that the morphological analysis split incorrectly.
○ Document similarity (deep learning) extraction vectorizes each word of the morphologically analyzed text based on the words positioned before and after it, then uses cosine similarity over the vectorized words to extract the similarity between documents automatically.
○ Document similarity (deep learning) monitoring lets an administrator review cases where deep learning reports similarity between documents from different domains and mark the association between those documents as suitable or unsuitable. This choice is reflected in the live service and used for inferring related documents.
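The document similarity step described above can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions, not the method of the patent: the source does not name a model, so simple counts of neighboring words stand in for a learned word embedding, a document vector is taken to be the sum of its word vectors, and all function names (`context_vectors`, `document_vector`, `cosine`, `similarity_matrix`) are hypothetical. Input documents are assumed to be already tokenized by the morphological analysis step.

```python
import math
from collections import Counter, defaultdict
from typing import Dict, List, Tuple


def context_vectors(docs: Dict[str, List[str]], window: int = 1) -> Dict[str, Counter]:
    """Represent each word by counts of the words found within `window`
    positions before or after it, across all documents (an assumed,
    count-based substitute for a trained embedding)."""
    vectors: Dict[str, Counter] = defaultdict(Counter)
    for tokens in docs.values():
        for i, word in enumerate(tokens):
            lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
            for j in range(lo, hi):
                if j != i:
                    vectors[word][tokens[j]] += 1
    return vectors


def document_vector(tokens: List[str], vectors: Dict[str, Counter]) -> Counter:
    """Sum the context vectors of every token in the document."""
    total: Counter = Counter()
    for word in tokens:
        total.update(vectors.get(word, Counter()))
    return total


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity of two sparse vectors; 0.0 if either is empty."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def similarity_matrix(docs: Dict[str, List[str]]) -> Dict[Tuple[str, str], float]:
    """Cosine similarity for every unordered pair of documents."""
    vectors = context_vectors(docs)
    doc_vecs = {d: document_vector(t, vectors) for d, t in docs.items()}
    return {
        (x, y): cosine(doc_vecs[x], doc_vecs[y])
        for x in docs
        for y in docs
        if x < y
    }
```

In a monitoring setup like the one described, the pairs returned by `similarity_matrix` with the highest scores would be the ones shown to an administrator for a suitable or unsuitable decision.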
The invention made by the present inventors has been described above in detail with reference to embodiments, but the present invention is not limited to those embodiments and can of course be modified in various ways without departing from its gist.
10: user terminal
11: client
30: crowdsourcing server
40: database
80: network
Claims (2)
1. A crowdsourcing system for constructing artificial intelligence data, comprising:
a data pre-processing unit that pre-processes data;
a morpheme analysis unit that performs morphological analysis on the pre-processed data;
a keyword extraction unit that extracts keywords from input data;
a deep learning processing unit that performs deep learning on input data;
an ontology configuration unit that builds the deep-learning-processed data into an ontology knowledge base; and
a search unit that receives a query and performs a search for the query.
2. The crowdsourcing system for constructing artificial intelligence data according to claim 1, wherein the pre-processing unit automatically stores unstructured data among the collected data in a database according to a pre-designed meta structure.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
KR1020200140357A KR20220056288A (en) | 2020-10-27 | 2020-10-27 | A crowdsourcing system for artificial intelligence data composition |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
KR1020200140357A KR20220056288A (en) | 2020-10-27 | 2020-10-27 | A crowdsourcing system for artificial intelligence data composition |
Publications (1)
Publication Number | Publication Date |
---|---|
KR20220056288A true KR20220056288A (en) | 2022-05-06 |
Family
ID=81584450
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
KR1020200140357A KR20220056288A (en) | 2020-10-27 | 2020-10-27 | A crowdsourcing system for artificial intelligence data composition |
Country Status (1)
Country | Link |
---|---|
KR (1) | KR20220056288A (en) |
Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
KR20190103951A (en) | 2019-02-14 | 2019-09-05 | 주식회사 머니브레인 | Method, computer device and computer readable recording medium for building or updating knowledgebase models for interactive ai agent systen, by labeling identifiable but not-learnable data in training data set |
KR102091240B1 (en) | 2016-11-23 | 2020-03-20 | 한국전자통신연구원 | Data processing apparatus and method for merging deterministic and non-deterministic knowledge information processing |
KR20200033707A (en) | 2018-09-20 | 2020-03-30 | 삼성전자주식회사 | Electronic device, and Method of providing or obtaining data for training thereof |
-
2020
- 2020-10-27 KR KR1020200140357A patent/KR20220056288A/en unknown
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN111723215B (en) | Device and method for establishing biotechnological information knowledge graph based on text mining | |
Amjad et al. | “Bend the truth”: Benchmark dataset for fake news detection in Urdu language and its evaluation | |
US10339453B2 (en) | Automatically generating test/training questions and answers through pattern based analysis and natural language processing techniques on the given corpus for quick domain adaptation | |
US8027948B2 (en) | Method and system for generating an ontology | |
US11210468B2 (en) | System and method for comparing plurality of documents | |
CN109635297B (en) | Entity disambiguation method and device, computer device and computer storage medium | |
CN106570171A (en) | Semantics-based sci-tech information processing method and system | |
Benabdallah et al. | Extraction of terms and semantic relationships from Arabic texts for automatic construction of an ontology | |
CN110096599B (en) | Knowledge graph generation method and device | |
CN102214189A (en) | Data mining-based word usage knowledge acquisition system and method | |
US20220358379A1 (en) | System, apparatus and method of managing knowledge generated from technical data | |
CN111966792B (en) | Text processing method and device, electronic equipment and readable storage medium | |
CN114266256A (en) | Method and system for extracting new words in field | |
Tahir et al. | Corpulyzer: A novel framework for building low resource language corpora | |
CN106372232B (en) | Information mining method and device based on artificial intelligence | |
CN107577713A (en) | Text handling method based on electric power dictionary | |
KR20200136636A (en) | Morphology-Based AI Chatbot and Method How to determine the degree of sentence | |
CN112560425A (en) | Template generation method and device, electronic equipment and storage medium | |
WO2012091541A1 (en) | A semantic web constructor system and a method thereof | |
CN110309258B (en) | Input checking method, server and computer readable storage medium | |
KR20220056288A (en) | A crowdsourcing system for artificial intelligence data composition | |
CN115759037A (en) | Intelligent auditing frame and auditing method for building construction scheme | |
CN109597879B (en) | Service behavior relation extraction method and device based on 'citation relation' data | |
Mahajani et al. | Ranking-based sentence retrieval for text summarization | |
Zhang et al. | Topic level disambiguation for weak queries |