KR20200083404A

KR20200083404A - Method, system and computer program for artificial intelligence answer

Info

Publication number: KR20200083404A
Application number: KR1020200076781A
Authority: KR
Inventors: 김동환
Original assignee: 주식회사 포티투마루
Priority date: 2018-09-19
Filing date: 2020-06-23
Publication date: 2020-07-08
Also published as: KR102261199B1

Abstract

According to an embodiment of the present invention, provided is a system for artificial intelligence query and answer which comprises: a user query receiving unit for receiving a user query from a user terminal; a first query expansion unit for analyzing a user query to generate a question template and determining whether the user query matches the generated question template; a second query expansion unit for generating a similar question template by using natural language processing and deep learning models when the user query and the generated question template do not match; a training data construction unit for generating training data for training the second query expansion unit by using a neural machine translation (NMT) engine; and a query response unit for transmitting a user query result derived through the first or second query expansion unit to the user terminal. Accordingly, a natural language-based sentence can be accurately understood and a search result suitable for the intent can be provided.

Description

Artificial Intelligence Question and Answer System, Method and Computer Program {METHOD, SYSTEM AND COMPUTER PROGRAM FOR ARTIFICIAL INTELLIGENCE ANSWER}

The present invention relates to an artificial intelligence query response system, method, and computer program, and more specifically to an artificial intelligence query response system, method, and computer program that build training data using an NMT engine and train a paraphrasing engine in order to accurately understand natural-language sentences and provide search results that match the user's intent.

Implementing application services that handle data expressed in natural language requires understanding and engineering linguistic knowledge, language-specific structural knowledge, and the complex features of a language. This creates a barrier to entry for tasks such as adding a new language or domain.

In particular, traditional natural language understanding (NLU) methods depend heavily on hand-crafted features. As a result, feature extraction is time-consuming, and such methods cannot cope with cases such as new patterns, typos, and spelling errors.

To address this problem, deep learning-based NLU has recently been considered. Deep learning-based NLU learns features automatically from data and can therefore process broader contextual information than earlier methods. It is consequently more robust than traditional rule-based or statistical NLU to newly coined words and typos that were not seen during training, and it can partly compensate for the shortcomings of traditional NLU.

Meanwhile, with the spread of smart machines, led by voice-recognition speakers, and with advances in artificial intelligence technology, the search trend is shifting from the conventional approach of entering keywords and reviewing a list of documents to entering natural-language sentences and receiving a specific answer.

KR 10-1851787 B1

One object of the present invention is to accurately understand natural-language sentences and provide search results that match the user's intent.

Another object of the present invention is to improve search accuracy by building training data using an NMT engine and training a paraphrasing engine.

According to an embodiment of the present invention, there is provided an artificial intelligence query response system comprising: a user query receiving unit that receives a user query from a user terminal; a first query expansion unit that analyzes the user query to generate a question template and determines whether the user query matches the generated question template; a second query expansion unit that generates a similar question template using natural language processing and a deep learning model when the user query and the generated question template do not match; a training data construction unit that generates, using a neural machine translation (NMT) engine, training data for training the second query expansion unit; and a query response unit that delivers a user query result derived through the first query expansion unit or the second query expansion unit to the user terminal.

In the present invention, the question template and the similar question template may be question templates based on a semantic triple consisting of an entity, an attribute, and an instant answer.
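A semantic triple of this kind can be pictured as a small record plus a lookup keyed by entity and attribute. The following is a minimal sketch only: the names `SemanticTriple` and `TripleIndex` and the sample record are assumptions made for illustration and do not come from the specification.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Tuple


@dataclass(frozen=True)
class SemanticTriple:
    """One (entity, attribute, instant answer) record."""
    entity: str
    attribute: str
    instant_answer: str


class TripleIndex:
    """Hypothetical in-memory lookup keyed by (entity, attribute)."""

    def __init__(self) -> None:
        self._answers: Dict[Tuple[str, str], str] = {}

    def add(self, triple: SemanticTriple) -> None:
        self._answers[(triple.entity, triple.attribute)] = triple.instant_answer

    def instant_answer(self, entity: str, attribute: str) -> Optional[str]:
        # Returns None when no stored triple covers the requested pair.
        return self._answers.get((entity, attribute))


# Illustrative record, not taken from the patent.
index = TripleIndex()
index.add(SemanticTriple("Eiffel Tower", "location", "Paris"))
```

Keying on the (entity, attribute) pair means a question template only has to resolve to those two values for the instant answer to be retrievable.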

In the present invention, the training data construction unit may use the NMT engine to translate a first sentence in Korean into a specific foreign language, translate the result back into Korean to obtain a second sentence, and build the second sentence into the training data.
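The round trip described here (Korean to a foreign language and back to Korean) is commonly called back-translation. A minimal sketch follows, assuming a caller-supplied `translate(text, source_lang, target_lang)` function that stands in for the NMT engine; that signature, the language codes, and the decision to drop unchanged round trips are assumptions made for illustration, not details from the specification.

```python
from typing import Callable, Iterable, List, Tuple

# Hypothetical stand-in for the NMT engine: (text, source_lang, target_lang) -> text
Translate = Callable[[str, str, str], str]


def build_paraphrase_pairs(
    sentences: Iterable[str],
    translate: Translate,
    source_lang: str = "ko",
    pivot_lang: str = "en",
) -> List[Tuple[str, str]]:
    """Return (first sentence, second sentence) pairs produced by a round trip."""
    pairs: List[Tuple[str, str]] = []
    for first in sentences:
        pivot = translate(first, source_lang, pivot_lang)
        second = translate(pivot, pivot_lang, source_lang)
        # Assumption: a round trip that returns the input unchanged (or nothing)
        # adds no paraphrase signal, so it is skipped.
        if second.strip() and second != first:
            pairs.append((first, second))
    return pairs
```

Each resulting pair holds an original sentence and a reworded sentence with the same intent, which is the kind of example a paraphrasing model can be trained on.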

In the present invention, the second query expansion unit may include: a natural language expansion module that applies natural language processing to the user query; and a paraphrasing engine that generates a similar question template by paraphrasing the processed user query.

In the present invention, when the user query and the generated question template do not match, an instant answer corresponding to the generated question template may be provided to the user terminal.

According to an embodiment of the present invention, there is provided an artificial intelligence query response method comprising: a user query receiving step of receiving a user query from a user terminal; a first query expansion step of analyzing the user query to generate a question template and determining whether the user query matches the generated question template; a second query expansion step of generating a similar question template using natural language processing and a deep learning model when the user query and the generated question template do not match; a training data construction step of generating, using a neural machine translation (NMT) engine, training data for training the second query expansion unit; and a query response step of delivering a user query result derived through the first query expansion step or the second query expansion step to the user terminal.
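The two-stage flow of these steps can be sketched as a direct template lookup followed by a paraphrase-based fallback. This is a simplified illustration under stated assumptions: templates are modeled as normalized question strings mapped to answers, and `paraphrase` is a hypothetical callable standing in for the trained paraphrasing engine. Neither the data model nor the function names are taken from the specification.

```python
from typing import Callable, Dict, Iterable, Optional


def normalize(question: str) -> str:
    """Lowercase, trim edge punctuation, and collapse whitespace."""
    return " ".join(question.lower().strip(" ?!.").split())


def answer_query(
    query: str,
    templates: Dict[str, str],
    paraphrase: Callable[[str], Iterable[str]],
) -> Optional[str]:
    # First query expansion: check the query against known question templates.
    key = normalize(query)
    if key in templates:
        return templates[key]
    # Second query expansion: try similar questions from the paraphrasing engine.
    for candidate in paraphrase(query):
        key = normalize(candidate)
        if key in templates:
            return templates[key]
    # No template matched; the caller decides how to respond.
    return None
```

Trying the direct lookup first keeps the common case cheap, and the paraphrasing model is consulted only for queries whose wording is not already covered.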

In the present invention, the training data construction step may use the NMT engine to translate a first sentence in Korean into a specific foreign language, translate the result back into Korean to obtain a second sentence, and build the second sentence into the training data.

In the present invention, the second query expansion step may include: a natural language expansion module that applies natural language processing to the user query; and a paraphrasing engine that generates a similar question template by paraphrasing the processed user query.

According to an embodiment of the present invention, there is provided an artificial intelligence query response system comprising: a user query receiving unit that receives a user query from a user terminal; a query expansion unit that generates a similar question template for the user query using a paraphrasing engine; and a training data construction unit that generates, using a neural machine translation (NMT) engine, training data for training the query expansion unit, wherein the training data construction unit uses the NMT engine to translate a first sentence in Korean into a specific foreign language, translates the result back into Korean to obtain a second sentence, and builds the second sentence into the training data.

In the present invention, the similar question template may be a question template based on a semantic triple consisting of an entity, an attribute, and an instant answer.

According to an embodiment of the present invention, there may be provided an artificial intelligence query response method comprising: a user query receiving step of receiving a user query from a user terminal; a query expansion step of generating a similar question template for the user query using a paraphrasing engine; and a training data construction step of generating, using a neural machine translation (NMT) engine, training data for training the query expansion unit, wherein the training data construction step uses the NMT engine to translate a first sentence in Korean into a specific foreign language, translates the result back into Korean to obtain a second sentence, and builds the second sentence into the training data.

In the present invention, the similar question template may be a question template based on a semantic triple consisting of an entity, an attribute, and an instant answer.

According to an embodiment of the present invention, there is provided an artificial intelligence query response system comprising: a user query receiving unit that receives a user query from a user terminal; a query expansion unit that generates a similar question template for the user query using a paraphrasing engine; and a training data construction unit that generates, using a neural machine translation (NMT) engine, training data for training the query expansion unit, wherein the training data construction unit may include: an NMT engine management unit that generates training data by translating and retranslating user log data with the neural network-based NMT engine; and a training data management unit that stores the training data generated by the NMT engine management unit, uses the generated training data to train a paraphrasing model that can be applied to the paraphrasing engine, and tests and validates the paraphrasing model.
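Training, testing, and validating a model normally calls for disjoint subsets of the stored data. The sketch below shows one conventional way a training data management unit could partition generated sentence pairs. The 80/10/10 ratio, the fixed seed, and the function name `split_pairs` are assumptions made for illustration; the specification does not state how the data is divided.

```python
import random
from typing import List, Sequence, Tuple

Pair = Tuple[str, str]  # (original sentence, back-translated sentence)


def split_pairs(
    pairs: Sequence[Pair],
    train_ratio: float = 0.8,
    valid_ratio: float = 0.1,
    seed: int = 0,
) -> Tuple[List[Pair], List[Pair], List[Pair]]:
    """Shuffle deterministically and split into train, validation, and test sets."""
    items = list(pairs)
    random.Random(seed).shuffle(items)  # fixed seed keeps the split reproducible
    n_train = int(len(items) * train_ratio)
    n_valid = int(len(items) * valid_ratio)
    train = items[:n_train]
    valid = items[n_train:n_train + n_valid]
    test = items[n_train + n_valid:]  # whatever remains is held out for testing
    return train, valid, test
```

Keeping the held-out sets disjoint from the training set is what makes the later test and validation results meaningful.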

According to the present invention, natural-language sentences can be accurately understood and search results that match the user's intent can be provided.

According to the present invention, a large amount of accurate training data can be built automatically using an existing NMT engine.

FIG. 1 is a diagram illustrating an example of a network environment according to an embodiment of the present invention.
FIG. 2 is a block diagram illustrating the internal configuration of a user terminal and a server in an embodiment of the present invention.
FIG. 3 illustrates semantic triple-based search results.
FIG. 4 shows an example of performing a semantic triple-based search.
FIG. 5 shows the internal configuration of a processor according to an embodiment of the present invention.
FIG. 6 is a diagram showing, in time sequence, an artificial intelligence query response method according to an embodiment of the present invention.
FIG. 7 is a diagram showing the overall structure of an artificial intelligence query response system according to an embodiment of the present invention.
FIG. 8 is a diagram illustrating the construction of training data and a paraphrasing model according to an embodiment of the present invention.

The detailed description of the present invention that follows refers to the accompanying drawings, which show, by way of example, specific embodiments in which the invention may be practiced. These embodiments are described in sufficient detail to enable those skilled in the art to practice the invention. It should be understood that the various embodiments of the invention differ from one another but need not be mutually exclusive. For example, specific shapes, structures, and characteristics described herein may be changed from one embodiment to another without departing from the spirit and scope of the invention. It should also be understood that the position or arrangement of individual components within each embodiment may be changed without departing from the spirit and scope of the invention. Accordingly, the following detailed description is not to be taken in a limiting sense, and the scope of the invention should be understood to encompass the scope of the appended claims and all equivalents thereof. In the drawings, like reference numerals denote the same or similar components throughout the several aspects.

FIG. 1 is a diagram illustrating an example of a network environment according to an embodiment of the present invention.

The network environment of FIG. 1 is an example that includes a plurality of user terminals (110, 120, 130, 140), a server (150), and a network (160). FIG. 1 is only an example for describing the invention, and the number of user terminals or servers is not limited to what FIG. 1 shows.

The plurality of user terminals (110, 120, 130, 140) may be fixed or mobile terminals implemented as computer devices. Examples include smartphones, mobile phones, navigation devices, computers, laptops, digital broadcasting terminals, personal digital assistants (PDAs), portable multimedia players (PMPs), and tablet PCs. For example, user terminal 1 (110) may communicate with the other user terminals (120, 130, 140) and/or the server (150) through the network (160) using a wireless or wired communication method.

The communication method is not limited. It may include not only methods that use a communication network that the network (160) may include (for example, a mobile communication network, the wired Internet, the wireless Internet, or a broadcast network) but also short-range wireless communication between devices. For example, the network (160) may include any one or more of a personal area network (PAN), a local area network (LAN), a campus area network (CAN), a metropolitan area network (MAN), a wide area network (WAN), a broadband network (BBN), and the Internet. The network (160) may also include any one or more network topologies, including but not limited to a bus network, a star network, a ring network, a mesh network, a star-bus network, and a tree or hierarchical network.

The server (150) may be implemented as a computer device or a plurality of computer devices that communicate with the plurality of user terminals (110, 120, 130, 140) through the network (160) to provide commands, code, files, content, services, and the like.

For example, the server (150) may provide a file for installing an application to user terminal 1 (110), which has connected through the network (160). In this case, user terminal 1 (110) may install the application using the file provided by the server (150). User terminal 1 (110) may also connect to the server (150) under the control of its operating system (OS) and at least one program (for example, a browser or the installed application) and receive the services or content that the server (150) provides. For example, when user terminal 1 (110), under the control of the application, sends a content viewing request to the server (150) through the network (160), the server (150) may send a unique instant answer, produced using the semantic triple-based knowledge expansion system, to user terminal 1 (110), and user terminal 1 (110) may display the unique instant answer under the control of the application. As another example, the server (150) may establish a communication session for data transmission and reception and route data between the plurality of user terminals (110, 120, 130, 140) through the established session.

FIG. 2 is a block diagram illustrating the internal configuration of a user terminal and a server in an embodiment of the present invention.

FIG. 2 describes the internal configuration of user terminal 1 (110) as an example of a user terminal and of the server (150) as an example of a server. The other user terminals (120, 130, 140) may have the same or a similar internal configuration.

User terminal 1 (110) and the server (150) may include memories (211, 221), processors (212, 222), communication modules (213, 223), and input/output interfaces (214, 224). The memories (211, 221) are computer-readable recording media and may include random access memory (RAM), read-only memory (ROM), and a permanent mass storage device such as a disk drive. The memories (211, 221) may also store an operating system and at least one piece of program code (for example, code for a browser installed and run on user terminal 1 (110), or for the application described above). These software components may be loaded from a computer-readable recording medium separate from the memories (211, 221) using a drive mechanism. Such a separate computer-readable recording medium may include a floppy drive, disk, tape, DVD/CD-ROM drive, or memory card. In other embodiments, the software components may be loaded into the memories (211, 221) through the communication modules (213, 223) rather than from a computer-readable recording medium. For example, at least one program may be loaded into the memories (211, 221) based on a program (for example, the application described above) that is installed from files provided through the network (160) by developers or by a file distribution system that distributes application installation files (for example, the server (150) described above).

The processors (212, 222) may be configured to process the instructions of a computer program by performing basic arithmetic, logic, and input/output operations. Instructions may be provided to the processors (212, 222) by the memories (211, 221) or the communication modules (213, 223). For example, the processors (212, 222) may be configured to execute received instructions according to program code stored in a recording device such as the memories (211, 221).

The communication modules (213, 223) may provide a function for user terminal 1 (110) and the server (150) to communicate with each other through the network (160), and may provide a function for communicating with another user terminal (for example, user terminal 2 (120)) or another server (for example, the server (150)). For example, a request that the processor (212) of user terminal 1 (110) generates according to program code stored in a recording device such as the memory (211) may be delivered to the server (150) through the network (160) under the control of the communication module (213). Conversely, control signals, commands, content, files, and the like provided under the control of the processor (222) of the server (150) may pass through the communication module (223) and the network (160) and be received by user terminal 1 (110) through its communication module (213). For example, control signals or commands from the server (150) received through the communication module (213) may be passed to the processor (212) or the memory (211), and content or files may be stored in a storage medium that user terminal 1 (110) may further include.

The input/output interfaces (214, 224) may be means for interfacing with an input/output device (215). For example, an input device may include a keyboard or a mouse, and an output device may include a display for showing a communication session of an application. As another example, the input/output interface (214) may be a means for interfacing with a device in which input and output functions are integrated, such as a touchscreen. As a more specific example, when the processor (212) of user terminal 1 (110) processes the instructions of a computer program loaded in the memory (211), a service screen or content constructed using data provided by the server (150) or user terminal 2 (120) may be displayed on the display through the input/output interface (214).

In other embodiments, user terminal 1 (110) and the server (150) may include more components than those shown in FIG. 2. However, most prior-art components need not be explicitly illustrated. For example, user terminal 1 (110) may be implemented to include at least some of the input/output devices (215) described above, or may further include other components such as a transceiver, a Global Positioning System (GPS) module, a camera, various sensors, and a database.

The artificial intelligence query response method and system of the present invention may be implemented by the server (150). More specifically, the artificial intelligence query response method may be implemented by instructions processed by the processor (222) of the server (150).

The problem to be solved by the present invention is to build a system that can respond to a wide variety of user queries by providing an artificial intelligence query response method, apparatus, and program.

With the rise of AI-based smart machines, led by AI speakers, question answering (QA) search, which differs from conventional portal search, is emerging. As the means of searching for and entering information shifts from touch and keyword input to voice, the need to understand natural-language sentences has grown, in contrast to the keyword-based search of conventional portals.

Accordingly, in order to understand natural-language sentences accurately and deliver search results that match the user's intent, there is a need for an artificial intelligence question answering method, apparatus, and program that can cope with a variety of cases such as new patterns, typos, and spelling errors.

Beyond the QA search described above, the present invention is applicable to any system that identifies a user's intent and provides the desired result, and can therefore be applied in many forms. For example, if a slot filling approach is used in place of the instant answer returned as the correct answer in knowledge QA, the invention can be applied so that the required information is passed to an API that provides a specific function according to the user's intent. The present invention can thus be used across a wide range of fields such as home IoT, smart toys/home robots, and connected cars. Accordingly, although the following description focuses on QA search, the present invention is not limited thereto and is applicable to any suitable system.

Before the present invention is described in detail, the differences between its artificial intelligence question answering method and existing search engines are examined. The artificial intelligence question answering system according to an embodiment of the present invention may provide a unique instant answer using a semantic-triple-based question template. The method differs from existing search engines in that it provides the search result as a unique instant answer, that is, a direct answer, rather than as documents. In addition, the method can construct training data for providing semantic-triple-based search results.

FIG. 3 illustrates semantic-triple-based search results.

Referring to FIG. 3, an existing search engine (As-Is, Search) takes keywords as input, provides a list of documents as the search result, and runs on a PC or mobile platform.

In contrast, in the artificial intelligence question answering method of the present invention (To-Be, Question-Answering), the input is a natural-language sentence, a specific answer, that is, a unique instant answer, can be provided as the search result, and the platform is not limited to PC or mobile and can be implemented anywhere.

More specifically, whereas existing search engines take keywords as input, the artificial intelligence question answering method of the present invention accepts natural-language sentences, allowing the user to look for information as naturally as asking another person. In addition, by providing a specific answer as the search result, the method relieves the user of having to find the result in the document list returned by an existing search engine and can provide an optimal search result. Furthermore, the method is not limited to a PC or mobile platform and has the advantage that information can be retrieved immediately, anywhere, on smart machines.

FIG. 4 shows an example of performing a semantic-triple-based search.

The knowledge DB (400) shown in FIG. 4 is a special form of knowledge base that stores data as semantic triples modeled on real users' queries, so that a unique instant answer can be retrieved without a separate inference process. The knowledge DB (400) has the form entity (432) - attribute (434) - instant answer (438). In the embodiments described below, the knowledge DB (400) may be a database that resides inside or outside the server (150) and provides data by communicating with the processor (222).

Referring to FIG. 4, when a user query (410) such as "What is the height of Baekdusan?" is received, the user query is analyzed (420): the key words 'Baekdusan' and 'height' are extracted, and Baekdusan is identified as the subject of the question and height as its intent. Data with entity = "Baekdusan" and attribute = "height" is then searched, the instant answer of the matching item is taken as the result, and the answer, 2,744 m, is provided to the user (450). The knowledge DB (400) described above can thus provide the best answer without a separate inference process. The artificial intelligence question answering method and system of the present invention, based on the semantic triples described with reference to FIGS. 3 and 4, are described in more detail below.
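The lookup described for FIG. 4 can be sketched as a keyed search over stored triples. The following is a minimal illustration only: the in-memory dictionary `KNOWLEDGE_DB` and the function `lookup_instant_answer` are hypothetical names standing in for the knowledge DB (400), and the single stored triple is the Baekdusan example from the text.

```python
from typing import Optional

# Hypothetical in-memory stand-in for the knowledge DB (400).
# Each key is an (entity, attribute) pair; each value is the instant answer.
KNOWLEDGE_DB = {
    ("Baekdusan", "height"): "2,744 m",
}


def lookup_instant_answer(entity: str, attribute: str) -> Optional[str]:
    """Return the instant answer stored for the pair, or None if absent.

    The answer is read directly from the stored triple; no inference
    step is performed.
    """
    return KNOWLEDGE_DB.get((entity, attribute))
```

A real deployment would presumably back this with a database inside or outside the server rather than a dictionary; the sketch only shows that the answer is a direct read of the triple.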

FIG. 5 shows the internal configuration of a processor according to an embodiment of the present invention.

The processor (212) may include a web browser or an application capable of receiving web pages online and outputting them. As shown in FIG. 3, the semantic-triple-based knowledge expansion system according to an embodiment of the present invention may comprise, within the processor (212), a query receiving unit (510), a first query expansion unit (520), a second query expansion unit (530), a training data construction unit (540), and a query response unit (550). In addition, the second query expansion unit (530) may include a natural language expansion module (531) and a paraphrasing engine (532), and the training data construction unit (540) may include an NMT engine management unit (541), a training data management unit (542), and a model distribution unit (543). Depending on the embodiment, components of the processor (212) may be selectively included in or excluded from the processor (212). Also, depending on the embodiment, the components of the processor (212) may be separated or merged to express the functions of the processor (212).

Here, the components of the processor (212) may be representations of the different functions that the processor (212) performs according to instructions provided by program code stored in the user terminal 1 (110) (for example, instructions provided by a web browser running on the user terminal 1 (110)).

The processor (212) and its components may control the user terminal 1 (110) to perform the steps (S1 to S5) included in the artificial intelligence question answering method of FIG. 4. For example, the processor (212) and its components may be implemented to execute instructions according to the code of an operating system and the code of at least one program contained in the memory (211).

FIG. 6 shows, in chronological order, an artificial intelligence question answering method according to an embodiment of the present invention, and FIGS. 7A and 7B show the overall structure of an artificial intelligence question answering system according to an embodiment of the present invention. More specifically, FIG. 7A shows the operation of the system when analysis of the user query succeeds, and FIG. 7B shows its operation when analysis of the user query does not succeed. The present invention is described below with reference to FIGS. 6, 7A, and 7B together.

First, the query receiving unit (510) receives a user's query from the user terminal (110) (S61). More specifically, the query receiving unit (510) receives the user query (S71) and passes it to the first query expansion unit (520) (S72). The user's query may be received in various forms such as voice or text. The query receiving unit (510) may convert the received query into a suitable form through an appropriate conversion process.

According to an embodiment of the present invention, the user may enter a natural-language query into the user terminal (100), which is an AI-based smart machine to which the artificial intelligence question answering method, apparatus, and program of the present invention are applied. Such smart machines may include not only conventional smartphones and computers but also artificial intelligence speakers, connected cars, home IoT, AI appliances, personal assistants, home robots/smart toys, chatbot applications, intranets, and the like.

Next, the first query expansion unit (520) analyzes the user query and checks whether an entity and an attribute can be recognized (S62). If they can, it generates question templates to perform the first query expansion and checks whether the user query matches a generated question template (S64). More specifically, the first query expansion unit (520) analyzes the user's query, looks up answer information for the query in the knowledge DB (400), generates a number of question templates similar to the query, and performs a primary search by comparing the user query with the question templates.

More specifically, the first query expansion unit (520) may generate question templates that fit the user query based on the result of analyzing it. When a user query is received, the first query expansion unit (520) analyzes it, generates question templates based on the analysis, and performs the first query expansion. The question templates may be generated from the knowledge DB (590) described above. The specific manner in which the first query expansion unit (520) generates question templates is described below.

First, to analyze the user query in the semantic triple format described above, the first query expansion unit (520) may use an NLP (Natural Language Processing) engine. Techniques such as morphological analysis are used to find the entity and attribute in the user query. Because users typically ask questions in entity + attribute form, the sentence is analyzed sequentially to find entity and attribute candidates in the user query.

In an embodiment of the present invention, if analysis of the user query yields only one of the entity and the attribute, the first query expansion unit (520) generates question templates from the content corresponding to that entity or attribute. For example, for the query 'Which Overwatch hero is an agent mainly active in London?', if the entity is analyzed as 'Overwatch hero' and no attribute is detected, question templates can be generated using every attribute in the knowledge DB that has 'Overwatch hero' as its entity.

Next, the first query expansion unit (520) may generate question templates from the knowledge DB (400) based on the query analysis result. Specifically, it searches for entity and attribute candidates in each category matching the user query and generates question templates based on entity synonyms and attribute synonyms. A question template may carry not only the entity and attribute but also the instant answer as additional information. This allows an instant answer to be provided to the user (S65) when, as described later, the user query matches a question template generated by the first query expansion unit (520). If an instant answer exists, the question template may also be turned into training data for training the second query expansion step described later (S68).
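One way to picture template generation from synonyms is as a cross product of entity synonyms and attribute synonyms, with the instant answer attached to every resulting template. The sketch below is an assumption about how this could look, not the patented implementation: `build_question_templates` and the dictionary layout are hypothetical, and in the usage note the synonyms 'Mt. Baekdu' and 'elevation' are invented purely for illustration.

```python
from itertools import product
from typing import Dict, List


def build_question_templates(
    entity_synonyms: List[str],
    attribute_synonyms: List[str],
    instant_answer: str,
) -> List[Dict[str, str]]:
    """Combine every entity synonym with every attribute synonym.

    Each template carries the instant answer as additional information,
    so a later exact match can return the answer immediately.
    """
    return [
        {
            "entity": entity,
            "attribute": attribute,
            "text": f"{entity} {attribute}",  # simplified surface form
            "instant_answer": instant_answer,
        }
        for entity, attribute in product(entity_synonyms, attribute_synonyms)
    ]
```

For instance, calling it with `["Baekdusan", "Mt. Baekdu"]`, `["height", "elevation"]`, and `"2,744 m"` yields four templates, each carrying the same instant answer.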

Next, the first query expansion unit (520) compares the generated question templates with the user query and checks whether they match (S73). If a question template generated by the first query expansion unit (520) is judged to be the same as the user query, the instant answer already attached to that question template is provided as the answer (S65). The first query expansion unit (520) may remove meaningless characters or words from the user query before comparing it with the generated question templates. For example, for the question 'What is the elevation of Geumgangsan?' ('금강산의 해발고도는?'), natural language processing can remove the particles '의' and '는', which do not contribute to the meaning. It is then determined whether the user query, with meaningless characters or words removed, exactly matches a generated question template. If the generated question templates and the user query are judged not to be the same, the user query may be expanded using the second query expansion unit (530).
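The comparison step can be illustrated with a deliberately naive normalizer that strips the two particles named in the example and then tests for exact equality. This is only a sketch: the text describes natural language processing (for example, morphological analysis), whereas the suffix stripping here is an assumed simplification that would mis-handle words that merely end in those characters. The names `PARTICLES`, `normalize`, and `matches_template` are hypothetical.

```python
# Particles taken from the example in the text; a real system would rely on
# a morphological analyzer instead of this fixed list (assumption).
PARTICLES = ("의", "는")


def normalize(query: str) -> str:
    """Strip trailing punctuation and a trailing particle from each token."""
    tokens = []
    for token in query.split():
        token = token.rstrip("?!.")
        for particle in PARTICLES:
            # Keep at least one character so a bare particle is not erased.
            if token.endswith(particle) and len(token) > len(particle):
                token = token[: -len(particle)]
                break
        if token:
            tokens.append(token)
    return " ".join(tokens)


def matches_template(query: str, template_text: str) -> bool:
    """Exact match after both sides are normalized the same way."""
    return normalize(query) == normalize(template_text)
```

With this sketch, `'금강산의 해발고도는?'` normalizes to `'금강산 해발고도'`, which matches a template whose text is `'금강산 해발고도'`.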

Next, if the generated question templates and the user query do not match, the second query expansion unit (530) generates similar question templates. More specifically, it uses natural language processing and a deep learning model to generate semantically similar question templates, thereby performing a second expansion of the question templates, and compares the results (S66). That is, when no question template generated by the first query expansion unit (520) matches the user query, the second query expansion unit (530), which is a similar-query engine, performs query expansion based on the previously generated question templates and additionally generates similar question templates (S74).

Alternatively, if the first query expansion unit (520) determines in step S72 described above that the entity and attribute cannot be recognized from the user query, the second query expansion unit (530) expands the query using previously derived user query results as similar question templates and compares the results (S63). That is, the user query is checked against previously generated question templates using the similar-query engine (S78). In other words, when neither an entity nor an attribute is found during analysis of the user query, the paraphrasing engine of the second query expansion unit can perform the search by comparing the similarity of the user query with the answers recorded in the existing system log. This corresponds to steps S63 and S78.
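The branching among steps S62 to S66 can be summarized as a small dispatch function. This is a sketch under assumptions: `answer_query` and its four callable parameters are hypothetical placeholders for the analysis step, the first query expansion unit (520), the exact-match check, and the second query expansion unit (530). Passing `templates=None` to signal the system-log path is an illustrative convention, not something stated in the text.

```python
from typing import Any, Callable, Dict, List, Optional, Tuple

Template = Dict[str, Any]


def answer_query(
    query: str,
    analyze: Callable[[str], Tuple[Optional[str], Optional[str]]],
    first_expand: Callable[[Optional[str], Optional[str]], List[Template]],
    exact_match: Callable[[str, Template], bool],
    second_expand: Callable[..., Any],
) -> Any:
    """Route a query through the first or second query expansion."""
    entity, attribute = analyze(query)  # S62: try to recognize entity/attribute

    if entity is None and attribute is None:
        # S63: nothing recognized, so fall back to similarity search
        # against previously derived results (for example, the system log).
        return second_expand(query, templates=None)

    templates = first_expand(entity, attribute)  # first query expansion
    for template in templates:  # S64: compare query with each template
        if exact_match(query, template):
            return template["instant_answer"]  # S65: instant answer

    # S66: no exact match, so expand the existing templates semantically.
    return second_expand(query, templates=templates)
```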

The second query expansion unit (530) generates similar question templates using a semantic similar-question generation approach based on natural language processing and a deep learning model, so that the semantically similar question templates can be expanded and the results compared. As shown in FIG. 3, the second query expansion unit (530) may include a natural language expansion module (531) and a paraphrasing engine (532). That is, the second query expansion unit (530) performs predicate expansion through the natural language expansion module (531) and determines the similarity between the user query and the question templates through the deep-learning-based paraphrasing engine (532), so that an answer matching the user's intent can be provided.

First, the natural language expansion module (531) of the second query expansion unit (530) may use natural language processing to provide a variety of patterns for a query on a particular topic and thereby generate similar question templates. In an embodiment, the question 'What is the birthplace of [Person]?' has the same meaning as 'Where was [Person] born?' and 'Where is the place [Person] was born?'. As in this example, the second query expansion unit (530) may expand a question template built from an entity-attribute combination through natural language processing to generate similar question templates.

More specifically, the natural language processing of the natural language expansion module (531) may be implemented by using a separately built similar-query DB to expand a given attribute into patterns. For example, for the 'birthplace' attribute, the second query expansion unit (530) may generate similar question templates by expanding the question template through various predicate expansions such as 'Where was ... born?', 'What is the place of birth?', and 'Where is the place of birth?'.
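The pattern-based expansion can be sketched as a lookup from an attribute to a list of predicate patterns with a slot for the entity. The dictionary `SIMILAR_QUERY_DB` and the function `expand_templates` are hypothetical stand-ins for the separately built similar-query DB, and the English patterns are loose renderings of the Korean examples in the text.

```python
from typing import Dict, List

# Hypothetical stand-in for the similar-query DB:
# attribute -> predicate patterns, with {entity} as the slot to fill.
SIMILAR_QUERY_DB: Dict[str, List[str]] = {
    "birthplace": [
        "What is the birthplace of {entity}?",
        "Where was {entity} born?",
        "Where is the place {entity} was born?",
    ],
}


def expand_templates(entity: str, attribute: str) -> List[str]:
    """Expand one entity-attribute pair into similar question templates.

    Attributes with no registered patterns yield an empty list.
    """
    patterns = SIMILAR_QUERY_DB.get(attribute, [])
    return [pattern.format(entity=entity) for pattern in patterns]
```

Calling `expand_templates("[Person]", "birthplace")` returns the three variants listed in the text with `[Person]` filled into the slot.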

In addition, the paraphrasing engine (532) of the second query expansion unit (530) generates semantically similar question templates for the user query through paraphrasing. The paraphrasing engine (532) also compares the user query with the similar question templates and checks the result (S75). More specifically, the second query expansion unit (530) combines the previously generated question templates with the expanded similar question templates and uses the deep-learning-based paraphrasing engine to compare the similarity of the user query with the question templates and the similar question templates. The similarity comparison proceeds in two stages, detailed below.

In the first stage, the paraphrasing engine (532) measures the similarity between the user query and both the existing question templates generated by the first query expansion unit (520) and the similar question templates, and selects the top N candidates from among all question templates and similar question templates. The value of N may be changed through an administrator page or by a statistics-based feedback program. The paraphrasing engine (532) compares the user query with the top N existing and similar question templates and returns the single question template it considers most similar, together with its similarity.

In the second stage, the paraphrasing engine (532) determines whether the top-1 question template selected on the basis of similarity ultimately has the same meaning as the user query. The criterion is initially chosen by the administrator but may later be adjusted statistically through feedback on actual results. For example, even if the initial similarity threshold is set to 90%, if the history of actual answers shows that answers were correct at similarities of 85% or more, the threshold can be changed automatically from 90% to 85% to widen answer coverage. If the similarity is below the threshold, a message indicating that there are no search results may be output.
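The two-stage comparison and the threshold adjustment can be sketched as follows. Several things here are assumptions: the character-level ratio from Python's standard `difflib.SequenceMatcher` is only a stand-in for the deep-learning paraphrasing engine, and `select_answer`, `adjust_threshold`, and the rule "lower the threshold only if every logged answer in the band was correct" are hypothetical simplifications of the statistics-based feedback described above.

```python
from difflib import SequenceMatcher
from typing import List, Optional, Tuple


def similarity(a: str, b: str) -> float:
    """Stand-in score in [0, 1]; the text uses a deep learning model."""
    return SequenceMatcher(None, a, b).ratio()


def select_answer(
    query: str, templates: List[str], top_n: int = 3, threshold: float = 0.9
) -> Optional[Tuple[str, float]]:
    """Stage 1: keep the top-N templates. Stage 2: accept top-1 if similar enough."""
    scored = sorted(
        ((similarity(query, t), t) for t in templates),
        key=lambda pair: pair[0],
        reverse=True,
    )
    candidates = scored[:top_n]  # stage 1: top-N candidates
    if not candidates:
        return None
    best_score, best_template = candidates[0]  # stage 2: judge the top-1
    if best_score < threshold:
        return None  # caller may report "no search results"
    return best_template, best_score


def adjust_threshold(
    current: float, floor: float, history: List[Tuple[float, bool]]
) -> float:
    """Lower the threshold to `floor` if all logged answers in
    [floor, current) were correct (assumed rule)."""
    band = [correct for score, correct in history if floor <= score < current]
    return floor if band and all(band) else current
```

Under this assumed rule, a history in which every answer between 85% and 90% similarity was correct lowers the threshold from 0.90 to 0.85, which mirrors the example in the text.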

Meanwhile, when the above stages are complete, the second query expansion unit (530) may store in a separate DB the user query together with general system information such as the question template selected as the top 1, the question templates selected as the top N, the search time and stage, and the search speed.

Next, the query response unit (550) delivers the user query result derived through the first query expansion unit (520) and the second query expansion unit (530) to the user terminal (110) (S65). The result is delivered to the AI-based smart machine and presented through an interface suited to its characteristics. Along with this, detailed information such as 'user query', 'whether answered', 'time', and 'device' is stored in the system log and can later be used for purposes such as managing the paraphrasing model.

Next, the query response unit (550) delivers the derived user query result to the user terminal (S67).

In addition, the training data construction unit (540) according to an embodiment of the present invention may use the derived user query results to generate, by means of an NMT (Neural Machine Translation) engine, training data for training the second query expansion unit (530) (S68). That is, the second query expansion unit (530) may be trained using the training data generated by the training data construction unit (540). In particular, the paraphrasing engine (532) uses a semantic similar-question approach based on a deep learning model, which requires training the deep learning model with abundant training data. Accordingly, the training data construction unit (540) may train the paraphrasing engine and generate the training data for that training.

More specifically, according to an embodiment of the present invention, the deep-learning-based paraphrasing engine (532) builds its model through training rather than having the model structure specified in advance, which minimizes operator intervention. Because complex and deep structures can be built, its accuracy is higher than that of earlier approaches. However, achieving performance that can replace human work requires a large amount of training data, on the order of tens of thousands of examples or more. Accordingly, the artificial intelligence question answering apparatus according to an embodiment of the present invention presents a method for constructing training data automatically.

First, the training data construction unit (540) generates training data for paraphrasing. More specifically, it constructs training data with which the artificial intelligence answering method of the present invention can be trained, so that the second query expansion unit (530) described above can form semantically similar question templates. To this end, the training data construction unit (540) may include an NMT engine management unit (541), a training data management unit (542), and a model distribution unit (543).

To continuously ensure the quality of the paraphrasing engine built in the present invention, the second query expansion unit (330) may include an NMT engine management unit (541) for using a plurality of NMT (Neural Machine Translation) engines in constructing training data, and a training data management unit (542) that uses a separate statistics-based management program to manage the translation quality of those NMT engines. It may further include a model distribution unit (543) that trains on the generated training data and distributes and applies the paraphrasing model. The construction and quality management of training data, comprising this series of processes, may as a whole form part of the present invention.

FIG. 8 is a diagram illustrating the construction of training data and a paraphrasing model according to an embodiment of the present invention.

First, to create training data, the training data construction unit 540 sets actual user queries as the raw data and transmits user log data based on those queries to the NMT engine management unit 541 (S81). The actual queries may be taken from data stored in the server's Log DB. The training data construction unit 540 passes this data to the NMT engine management unit 541 to prepare for training data generation.

Next, the NMT engine management unit 541 uses a plurality of external neural network-based NMT engines to translate the user log data or user queries into another language and then retranslate them into Korean, thereby generating training data (S82). According to an embodiment of the present invention, the NMT engine management unit 541 may translate a first sentence written in Korean into a specific foreign language, and translate that translated sentence back into Korean to obtain a second sentence. That is, the NMT engine management unit 541 may use the NMT engines so that the paraphrasing engine 532 can collect, as training data, natural language expressions similar to a given query or sentence.

The NMT engines used in the present invention translate with a trained neural network rather than with predefined patterns and rules. Translating the first sentence into a foreign language and then back into Korean therefore yields natural language sentences that have the same or a similar meaning as the first sentence but a different wording. In addition, because each external NMT engine uses different neural network rules and different training data, translating the same sentence into a specific foreign language and back into Korean with a different engine yields further natural language sentences with a similar meaning but a different wording.

Furthermore, when the Korean first sentence is translated into a first foreign language, the result is translated into a second foreign language, and that result is translated into Korean, yet another natural language sentence with a similar meaning but a different wording can be obtained.
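The round-trip translation described above (Korean to one or more pivot languages and back, repeated across several engines) can be sketched as follows. This is a minimal illustration, not the patented implementation: the `Translator` interface, `round_trip`, `generate_candidates`, and the lookup-table engines are all hypothetical, and the example translations are invented toy data rather than output from any real NMT service.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical engine interface: (text, source_lang, target_lang) -> translated text.
Translator = Callable[[str, str, str], str]


def round_trip(sentence: str, engine: Translator, pivots: List[str], home: str = "ko") -> str:
    """Translate through each pivot language in order, then back to `home`."""
    route = [home, *pivots, home]
    text = sentence
    for src, dst in zip(route, route[1:]):
        text = engine(text, src, dst)
    return text


def generate_candidates(
    sentence: str, engines: Dict[str, Translator], routes: List[List[str]]
) -> List[Tuple[str, str, str]]:
    """Run every engine over every pivot route; return (engine, route, output) triples."""
    out = []
    for name, engine in engines.items():
        for pivots in routes:
            label = "->".join(["ko", *pivots, "ko"])
            out.append((name, label, round_trip(sentence, engine, pivots)))
    return out


def table_engine(table: Dict[Tuple[str, str, str], str]) -> Translator:
    """Toy stand-in for an external NMT service, backed by a lookup table."""
    return lambda text, src, dst: table.get((text, src, dst), text)


# Toy data only: these outputs are invented for illustration.
engine_a = table_engine({
    ("서울 날씨 알려줘", "ko", "en"): "Tell me the weather in Seoul",
    ("Tell me the weather in Seoul", "en", "ko"): "서울의 날씨를 알려주세요",
})
engine_b = table_engine({
    ("서울 날씨 알려줘", "ko", "en"): "How is the weather in Seoul",
    ("How is the weather in Seoul", "en", "ko"): "서울 날씨가 어때요",
})

candidates = generate_candidates(
    "서울 날씨 알려줘", {"engine_a": engine_a, "engine_b": engine_b}, [["en"]]
)
for row in candidates:
    print(row)
```

Passing a longer pivot list such as `["en", "ja"]` to `round_trip` gives the two-pivot route (Korean, first foreign language, second foreign language, Korean) without any change to the function.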

The training data formed in this way is passed to the training data management unit 542 together with the actual user query, information on the external NMT engine used, the translation steps, and the languages used for translation. Specifically, because the generated training data is derived from an actual user query, the two may be matched and passed to the training data management unit 542 in the form [actual user query - generated training data]. If the user query and the generated training data are identical, that pair is not passed on. Related information such as the creation date and time, the NMT model, and the translation languages may also be passed to the training data management unit 542. The languages used for translation, the translation steps, and so on may subsequently be adjusted automatically by the training data management unit 542 based on the actual training results of the paraphrasing engine.
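One way to represent the [actual user query - generated training data] pair and its accompanying metadata is sketched below. The `TrainingPair` record, its field names, and `make_pair` are assumptions made for illustration; the source only states which pieces of information are passed along and that identical pairs are not transmitted.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Optional


@dataclass(frozen=True)
class TrainingPair:
    """Hypothetical record handed to the training data management unit."""
    user_query: str       # actual user query (raw data)
    generated: str        # sentence obtained by round-trip translation
    engine: str           # which external NMT engine produced it
    route: str            # translation steps and languages, e.g. "ko->en->ko"
    created_at: datetime  # creation date and time


def make_pair(
    user_query: str,
    generated: str,
    engine: str,
    route: str,
    now: Optional[datetime] = None,
) -> Optional[TrainingPair]:
    """Return a pair, or None when the round trip reproduced the query verbatim."""
    if generated.strip() == user_query.strip():
        return None  # identical pairs carry no paraphrase signal and are dropped
    return TrainingPair(user_query, generated, engine, route, now or datetime.now(timezone.utc))


kept = make_pair("서울 날씨 알려줘", "서울의 날씨를 알려주세요", "engine_a", "ko->en->ko")
dropped = make_pair("서울 날씨 알려줘", "서울 날씨 알려줘", "engine_b", "ko->en->ko")
print(kept)
print(dropped)
```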

Next, the training data management unit 542 may store the training data generated by the NMT engine management unit 541, train the actual paraphrasing model, and test and verify it (S83). In more detail, the training data management unit 542 stores the training data generated by the NMT engine management unit 541, uses it to train a paraphrasing model that can be applied to the paraphrasing engine, and tests and verifies that paraphrasing model. The paraphrasing training processor shown in FIG. 8 is the processor that actually trains the paraphrasing engine, and may be one function included in the training data construction unit 540.

Specifically, the training data management unit 542 creates and trains paraphrasing models under various conditions using the training data generated by the NMT engine management unit 541, and evaluates their quality against a pre-built test set. NMT models and translation languages that produce good results continue to be used, while those that do not are used less often or excluded altogether; in this way the unit also performs quality management of the training data. In addition, the NMT engine management unit 541 may evaluate the quality of each NMT engine through verification with the paraphrasing engine, and reduce the weight of, or exclude, an NMT engine that has generated low-quality training data.

More specifically, the training data management unit 542 may organize the training data as pairs, each consisting of an actual user query and a query generated by an NMT engine.

The training data management unit 542 also classifies the training data generated by the NMT engine management unit 541 according to fixed rules and secures a fixed quantity for each stage, so that the paraphrasing engine can be trained accurately and quality can be compared fairly. For example, for the same user queries, it secures a fixed quantity of training data translated Korean → English → Korean with the Google NMT engine and a fixed quantity translated Korean → English → Korean with the Naver NMT engine, and the quantity secured may be the same for each engine.
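A simple way to secure the same quantity of training data per engine and route is to group the records and down-sample every group to a common size. The sketch below assumes records are `(engine, route, sentence)` tuples and uses a hypothetical helper `balance_by_group`; the source does not specify how the quantities are equalized.

```python
import random
from collections import defaultdict
from typing import Callable, Dict, Hashable, List, Optional, Sequence, TypeVar

T = TypeVar("T")


def balance_by_group(
    records: Sequence[T],
    key: Callable[[T], Hashable],
    per_group: Optional[int] = None,
    seed: int = 0,
) -> Dict[Hashable, List[T]]:
    """Group records by `key` and sample the same number from every group."""
    groups: Dict[Hashable, List[T]] = defaultdict(list)
    for record in records:
        groups[key(record)].append(record)
    if not groups:
        return {}
    # The common size is the smallest group, optionally capped by `per_group`.
    size = min(len(items) for items in groups.values())
    if per_group is not None:
        size = min(size, per_group)
    rng = random.Random(seed)  # fixed seed keeps the sample reproducible
    return {name: rng.sample(items, size) for name, items in groups.items()}


# Toy records: (engine, route, generated sentence).
records = [
    ("google", "ko->en->ko", "s1"),
    ("google", "ko->en->ko", "s2"),
    ("google", "ko->en->ko", "s3"),
    ("naver", "ko->en->ko", "s4"),
    ("naver", "ko->en->ko", "s5"),
]
balanced = balance_by_group(records, key=lambda r: (r[0], r[1]))
for group, items in balanced.items():
    print(group, len(items))
```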

The training data management unit 542 also trains at least a certain number of paraphrasing models for each NMT engine used and for each translation language sequence and type, and compares the accuracy of the paraphrasing models trained per engine and language using a pre-built test set. The test set consists of test queries paired with actual user queries that were not used to train the paraphrasing models. A model is evaluated on whether feeding a test query into it correctly yields the corresponding actual user query.
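The test-set comparison can be sketched as a plain accuracy measurement. The sketch assumes a paraphrasing model is a callable that maps a test query to a list of candidate sentences, and counts a hit when the held-out user query appears among them; both the `ParaphraseModel` interface and the exact-match criterion are assumptions, since the source only says the model is judged on whether the actual user query is "correctly derived".

```python
from typing import Callable, Dict, List, Sequence, Tuple

# Hypothetical model interface: test query -> list of candidate paraphrases.
ParaphraseModel = Callable[[str], List[str]]


def evaluate_on_test_set(model: ParaphraseModel, test_set: Sequence[Tuple[str, str]]) -> float:
    """Fraction of (test_query, user_query) pairs where the model recovers user_query."""
    if not test_set:
        raise ValueError("test set must not be empty")
    hits = sum(1 for test_query, user_query in test_set if user_query in model(test_query))
    return hits / len(test_set)


def lookup_model(table: Dict[str, List[str]]) -> ParaphraseModel:
    """Toy stand-in for a trained paraphrasing model."""
    return lambda query: table.get(query, [])


# Toy test set: (test query, held-out actual user query).
test_set = [
    ("how is the weather in seoul", "seoul weather"),
    ("what time is it in busan", "busan time"),
]

# One toy model per engine/route configuration.
models = {
    "google ko->en->ko": lookup_model({
        "how is the weather in seoul": ["seoul weather", "weather seoul"],
        "what time is it in busan": ["busan time"],
    }),
    "naver ko->en->ko": lookup_model({
        "how is the weather in seoul": ["seoul weather"],
    }),
}

scores = {name: evaluate_on_test_set(model, test_set) for name, model in models.items()}
print(scores)
```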

The training data management unit 542 also aggregates the test set results for each NMT model and for each translation language sequence and type. Based on these results, it automatically feeds back which engines the NMT engine management should use more, and which translation route should mainly be used for each NMT engine, so that this is reflected in the training data generation process. The training data management unit 542 may determine the quantity of training data to generate subsequently on the basis of the performance evaluation. The performance evaluation is given by the formula {(evaluation result) - (base model performance)} / (base model performance), and the training data management unit 542 adjusts the total quantity of training data according to the result of this formula.

As an example, suppose the results shown in [Table 1] below are obtained. In that case, 20% more training data is generated with Google NMT along the Korean → English → Korean route, and 20% more training data is likewise generated with Naver NMT along the Korean → English → Japanese → Korean route. In this way, better training data can be generated automatically on the basis of the translation language sequence and quality of each translation engine.

[Table 1]

| Translation model | Google NMT | Google NMT | Naver NMT | Naver NMT |
|---|---|---|---|---|
| Translation language | English | English → Japanese | English | English → Japanese |
| Training data quantity | 10,000 | 10,000 | 10,000 | 10,000 |
| Base model performance | 50 | 50 | 50 | 50 |
| Evaluation result | 60 | 50 | 50 | 60 |
| Performance evaluation | 20% | - | - | 20% |
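The performance formula and the resulting quantity adjustment can be checked against the figures in [Table 1] with a short calculation. The function names are hypothetical, and scaling the quantity by `1 + gain` is an assumption that matches the "20% more" example in the text; the source does not state how a negative result should be handled beyond reducing or excluding the engine.

```python
from typing import Dict, Tuple


def performance_gain(evaluation: float, baseline: float) -> float:
    """{(evaluation result) - (base model performance)} / (base model performance)."""
    if baseline <= 0:
        raise ValueError("baseline performance must be positive")
    return (evaluation - baseline) / baseline


def next_quantity(current: int, gain: float) -> int:
    """Assumed rule: scale the next batch by (1 + gain), never below zero."""
    return max(0, round(current * (1 + gain)))


# (engine, route) -> (training data quantity, base model performance, evaluation result)
table_1: Dict[Tuple[str, str], Tuple[int, float, float]] = {
    ("Google NMT", "English"): (10_000, 50, 60),
    ("Google NMT", "English->Japanese"): (10_000, 50, 50),
    ("Naver NMT", "English"): (10_000, 50, 50),
    ("Naver NMT", "English->Japanese"): (10_000, 50, 60),
}

plan = {}
for config, (quantity, baseline, evaluation) in table_1.items():
    gain = performance_gain(evaluation, baseline)
    plan[config] = next_quantity(quantity, gain)
    print(config, f"gain={gain:.0%}", f"next={plan[config]}")
```

With the table's values, the two configurations that scored 60 against a baseline of 50 have a gain of 20% and move from 10,000 to 12,000 examples, while the other two stay at 10,000.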

Next, the model distribution unit 543 distributes the deep learning-based paraphrasing models trained on the training data so that they can actually be used, and creates an ensemble of paraphrasing models (S84). To improve performance in actual operation, the model distribution unit 543 bundles a plurality of paraphrasing models so that the paraphrasing engine can use them in ensemble form.
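The source does not say how the bundled models are combined. As one possible reading, the sketch below ranks candidate paraphrases by how many models in the ensemble propose them; this voting scheme, the `ParaphraseModel` interface, and the function name are all assumptions for illustration.

```python
from collections import Counter
from typing import Callable, List, Sequence

# Hypothetical model interface: query -> list of candidate paraphrases.
ParaphraseModel = Callable[[str], List[str]]


def ensemble_paraphrase(models: Sequence[ParaphraseModel], query: str, top_k: int = 3) -> List[str]:
    """Return up to `top_k` candidates, ordered by the number of models proposing each."""
    votes: Counter = Counter()
    for model in models:
        # set() ensures one vote per model even if it repeats a candidate.
        for candidate in set(model(query)):
            votes[candidate] += 1
    # Sort by vote count (descending), then alphabetically for a stable order.
    ranked = sorted(votes.items(), key=lambda item: (-item[1], item[0]))
    return [candidate for candidate, _ in ranked[:top_k]]


# Toy models that ignore the query and return fixed candidates.
model_1 = lambda query: ["a", "b"]
model_2 = lambda query: ["b", "c"]
model_3 = lambda query: ["b", "a"]

best = ensemble_paraphrase([model_1, model_2, model_3], "any query", top_k=2)
print(best)
```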

Finally, the training data construction unit 540 periodically applies the ensemble-form paraphrasing engine to the service (S85), so that the latest high-quality engine is always the one in service.

The embodiments of the present invention described above may be implemented in the form of a computer program that can be executed through various components on a computer, and such a computer program may be recorded on a computer-readable medium. The medium may store a computer-executable program permanently, or store it temporarily for execution or download. The medium may be any of various recording or storage means in the form of a single piece of hardware or several pieces combined; it is not limited to a medium directly connected to a particular computer system and may be distributed over a network. Examples of the medium include magnetic media such as hard disks, floppy disks, and magnetic tape; optical recording media such as CD-ROMs and DVDs; magneto-optical media such as floptical disks; and ROM, RAM, flash memory, and the like configured to store program instructions. Other examples include recording or storage media managed by app stores that distribute applications, or by sites and servers that supply or distribute various other software.

Although the present invention has been described above with reference to specific matters such as particular components, and to limited embodiments and drawings, these are provided only to aid a more general understanding of the invention. The present invention is not limited to the above embodiments, and a person of ordinary skill in the art to which the invention pertains may make various modifications and changes from this description.

Accordingly, the spirit of the present invention should not be limited to the embodiments described above. Not only the claims set out below but also everything that is equal to, or equivalently modified from, those claims falls within the scope of the spirit of the present invention.

Claims

An artificial intelligence question answering system comprising:
a user query receiving unit that receives a user query from a user terminal;
a query expansion unit that analyzes the user query to generate a question template, and generates a similar question template using natural language processing and a deep learning model; and
a training data construction unit that generates training data for training the query expansion unit.

The system of claim 1,
wherein the question template and the similar question template are semantic triple-based question templates consisting of an entity, an attribute, and an instant answer.

The system of claim 1,
wherein the training data construction unit
translates a first sentence in Korean into a specific foreign language, translates the first sentence thus translated back into Korean to obtain a second sentence, and builds the obtained second sentence into training data.

The system of claim 1,
wherein the query expansion unit comprises:
a natural language expansion module that performs natural language processing on the user query; and
a paraphrasing engine that generates the similar question template by paraphrasing the natural-language-processed user query.

The system of claim 1,
wherein, when the user query and the generated question template do not match, the system provides an instant answer corresponding to the generated question template to the user terminal.

An artificial intelligence question answering method comprising:
a user query receiving step of receiving a user query from a user terminal;
a query expansion step of analyzing the user query to generate a question template, and generating a similar question template using natural language processing and a deep learning model; and
a training data construction step of generating training data for training the second query expansion unit.

The method of claim 6,
wherein the question template and the similar question template are semantic triple-based question templates consisting of an entity, an attribute, and an instant answer.

The method of claim 6,
wherein the training data construction step comprises
translating a first sentence in Korean into a specific foreign language, translating the first sentence thus translated back into Korean to obtain a second sentence, and building the obtained second sentence into training data.

The method of claim 6,
wherein the second query expansion step comprises:
performing natural language processing on the user query; and
generating the similar question template by paraphrasing the natural-language-processed user query.

The method of claim 6,
further comprising, when the user query and the generated question template do not match, providing an instant answer corresponding to the generated question template to the user terminal.

An artificial intelligence question answering system comprising:
a user query receiving unit that receives a user query from a user terminal;
a query expansion unit that generates a similar question template of the user query using a paraphrasing engine; and
a training data construction unit that generates training data for training the query expansion unit using a neural machine translation (NMT) engine,
wherein the training data construction unit,
using the NMT engine, translates a first sentence in Korean into a specific foreign language, translates the first sentence thus translated back into Korean to obtain a second sentence, and builds the obtained second sentence into training data.

A user query receiving step of receiving a user query from a user terminal;
A query expansion step of generating a similar question template of the user query using a paraphrase engine;
A training data construction step of generating training data for training the query extension using a neural machine translation (NMT) engine; Including,
The learning data construction step,
Using the NMT engine, the first sentence in Korean is translated into a specific foreign language, the first sentence translated into a specific foreign language is translated back into Korean to obtain a second sentence, and the generated second sentence is constructed as learning data. ,
Artificial intelligence question and answer system comprising a.

The system of claim 12,
wherein the similar question template is a semantic triple-based question template consisting of an entity, an attribute, and an instant answer.

An artificial intelligence question answering method comprising:
a user query receiving step of receiving a user query from a user terminal;
a query expansion step of generating a similar question template of the user query using a paraphrasing engine; and
a training data construction step of generating training data for training the query expansion unit using a neural machine translation (NMT) engine,
wherein the training data construction step comprises,
using the NMT engine, translating a first sentence in Korean into a specific foreign language, translating the first sentence thus translated back into Korean to obtain a second sentence, and building the obtained second sentence into training data.

The method of claim 14,
wherein the similar question template is a semantic triple-based question template consisting of an entity, an attribute, and an instant answer.

An artificial intelligence question answering system comprising:
a user query receiving unit that receives a user query from a user terminal;
a query expansion unit that generates a similar question template of the user query using a paraphrasing engine; and
a training data construction unit that generates training data for training the query expansion unit using a neural machine translation (NMT) engine,
wherein the training data construction unit comprises:
an NMT engine management unit that generates training data by translating and retranslating user log data using a neural network-based NMT engine; and
a training data management unit that stores the training data generated by the NMT engine management unit, trains a paraphrasing model applicable to the paraphrasing engine using the generated training data, and tests and verifies the paraphrasing model.