KR20190046062A

KR20190046062A - Method and apparatus of dialog scenario database constructing for dialog system

Info

Publication number: KR20190046062A
Application number: KR1020170139164A
Authority: KR
Inventors: 윤재민
Original assignee: 얄리주식회사
Priority date: 2017-10-25
Filing date: 2017-10-25
Publication date: 2019-05-07

Abstract

The present invention relates to the construction of a dialog scenario database and, more particularly, to a method of constructing a dialog scenario database for a dialog system applied to an ontology dialog network. In this method, dialog scenarios are automatically searched for, collected, refined, and learned from SNS, radio, and broadcast sources so that the dialog domain is expanded, or dialog quality is improved, automatically.
To achieve this object, the present invention provides a method of constructing a dialog scenario database for a dialog system applied to an ontology dialog network, the method comprising: (a) extracting sentences from a conversational voice file or a conversational post; (b) classifying the sentences extracted in step (a) as questions and answers; (c) screening the question and answer sentences classified in step (b) on the basis of emotion and intention and of affirmation and negation to determine whether they correspond to and connect with each other, and generating the selected question and answer sentences as a list-type scenario; (d) converting the question and answer sentences of the list-type scenario generated in step (c) into semantic vectors and learning them; and (e) storing the scenario learned in step (d) in a database as a sequence of scenario semantic vectors.

Description

FIELD OF THE INVENTION

[0001] The present invention relates to a method and apparatus for constructing a dialog scenario database for a dialog system.

The present invention relates to the construction of a dialog scenario database and, more particularly, to a method of constructing a dialog scenario database for a dialog system applied to an ontology dialog network, in which dialog scenarios are automatically searched for, collected, refined, and learned from SNS, radio, and broadcast sources so that the dialog domain is expanded, or dialog quality is improved, automatically.

Humans have long dreamed of conversing naturally with non-human entities. The Fourth Industrial Revolution, driven by artificial intelligence (AI) and big data, is now under way, and at the core of AI lies natural conversational communication between humans and things.

Until now, however, dialog scenarios have been built by people who refine them by hand, which has required considerable cost, time, and manpower. Because collecting, refining, and building scenarios manually takes a relatively long time, such a system can only hold conversations about the past and cannot immediately converse about current events, incidents, trends, or issues.

In other words, because only limited conversations are possible within a limited domain, there has been a limit to expanding the dialog domain or improving dialog quality.

The present invention has been devised to solve these problems. Its object is to provide a method of constructing a dialog scenario database for a dialog system applied to an ontology dialog network, in which dialog scenarios are automatically searched for, collected, refined, and learned from SNS, radio, and broadcast sources so that the dialog domain is expanded, or dialog quality is improved, automatically.

To achieve this object, one feature of the present invention is a method of constructing a dialog scenario database for a dialog system applied to an ontology dialog network, the method comprising: (a) extracting sentences from a conversational voice file or a conversational post; (b) classifying the sentences extracted in step (a) as questions and answers; (c) screening the question and answer sentences classified in step (b) on the basis of emotion and intention and of affirmation and negation to determine whether they correspond to and connect with each other, and generating the selected question and answer sentences as a list-type scenario; (d) converting the question and answer sentences of the list-type scenario generated in step (c) into semantic vectors and learning them; and (e) storing the scenario learned in step (d) in a database as a sequence of scenario semantic vectors.

Preferably, the method further comprises, after step (e), displaying the sequence of semantic vectors stored in the database in the ontology multidimensional space as one of a semantic word, a semantic node, and a semantic cube.

Preferably, the conversational voice file of step (a) is converted into text through speech recognition before sentences are extracted from it.

Preferably, extraction from conversational posts in step (a) is performed by designating a specific word as a search term and then parsing the posts and comments retrieved with that search term to extract sentences from them.

Preferably, the sentences extracted in step (a) are automatically converted into semantic vectors.

To achieve this object, another feature of the present invention is an apparatus comprising: a sentence extraction unit that extracts sentences; a sentence analysis unit that classifies the sentences extracted by the sentence extraction unit as questions and answers; a dialog scenario generation unit that generates a list-type scenario in which the sentences classified by the sentence analysis unit correspond to each other so that the dialog is connected; a dialog scenario learning unit that learns the sentences of the dialog scenario generated by the dialog scenario generation unit; a dialog scenario database that stores the dialog scenarios learned by the dialog scenario learning unit as sequences of semantic vectors; and an ontology relation mapping unit that displays the sequences of semantic vectors of the dialog scenarios stored in the dialog scenario database in the ontology multidimensional space as one of a semantic word, a semantic node, and a semantic cube.

According to the present invention, automating the collection, refinement, and construction of dialog scenarios reduces the cost of building them.

In addition, when people collect scenarios by hand, refining and building them takes so long that the system can only converse about the past. Because the present invention collects dialog scenarios automatically and reflects them in conversation, it can immediately converse about current events, incidents, trends, and issues, which improves dialog quality.

In addition, because dialog scenarios are continuously collected on a wide range of topics, the system can converse on topics from a variety of perspectives.

FIG. 1 is a flowchart illustrating a method of constructing a dialog scenario database for a dialog system according to the present invention.
FIG. 2 schematically shows an apparatus for constructing a dialog scenario database for a dialog system according to the present invention.
FIG. 3 shows the apparatus of FIG. 2 in detail.
FIGS. 4 and 5 are screens showing an example of sentence extraction during construction of the dialog scenario database for the dialog system according to the present invention.
FIG. 6 is a screen showing an example of sentence analysis during construction of the dialog scenario database for the dialog system according to the present invention.
FIG. 7 is a screen showing an example of dialog scenario generation during construction of the dialog scenario database for the dialog system according to the present invention.
FIG. 8 is a screen showing an example of dialog scenario learning during construction of the dialog scenario database for the dialog system according to the present invention.
FIG. 9 is a diagram showing an example in which the dialog scenario database constructed according to the present invention is displayed on an ontology dialog network.
FIG. 10 shows an example representation of a dialog scenario stored in the dialog scenario database according to the present invention.
FIG. 11 is a screen in which a dialog scenario stored in the dialog scenario database according to the present invention is represented with a three-dimensional authoring tool.
FIG. 12 is an example of the scenario input screen of the three-dimensional authoring tool for dialog scenarios stored in the dialog scenario database according to the present invention.
FIG. 13 is an example of the scenario modification screen of the three-dimensional authoring tool for dialog scenarios stored in the dialog scenario database according to the present invention.
FIG. 14 is an example of the scenario deletion screen of the three-dimensional authoring tool for dialog scenarios stored in the dialog scenario database according to the present invention.
FIG. 15 is a block diagram showing the sequence in which continuous conversation is carried out using automatic dialog scenario collection and the ontology dialog network according to the present invention.
FIG. 16 is a flowchart showing the sequence in which continuous conversation is carried out using automatic dialog scenario collection and the ontology dialog network according to the present invention.
FIG. 17 is a flowchart of a method of mapping a new dialog sentence onto the ontology dialog network when that sentence is input during continuous conversation using the ontology dialog network according to the present invention.
FIG. 18 is a block diagram showing a sequence for improving dialog quality through continuous conversation using the ontology dialog network according to the present invention.
FIG. 19 shows the configuration of a continuous conversation system using the ontology dialog network according to the present invention.
FIG. 20 shows an embodiment of the structure of the ontology dialog network according to the present invention.
FIG. 21 shows an embodiment of the general conversation classification structure in the ontology dialog network according to the present invention.
FIG. 22 shows an embodiment of the professional conversation classification structure in the ontology dialog network according to the present invention.
FIG. 23 shows an embodiment of the classification structure for conversations with an agent at a hospital call center, one type of professional conversation, in the ontology dialog network according to the present invention.
FIG. 24 shows an embodiment of a classification structure in which everyday conversation, emotional conversation, and professional conversation are connected in the ontology dialog network according to the present invention.

Hereinafter, preferred embodiments of the present invention will be described in detail with reference to the accompanying drawings. Terms and words used in this specification and the claims should not be construed as limited to their ordinary or dictionary meanings; they should be interpreted with meanings and concepts consistent with the technical idea of the present invention, based on the principle that an inventor may appropriately define the concepts of terms to describe his or her invention in the best way. Accordingly, the embodiments described in this specification and the configurations shown in the drawings are merely the most preferred embodiments of the present invention and do not represent all of its technical ideas, and it should be understood that various equivalents and modifications capable of replacing them may exist as of the filing date of this application.

FIG. 1 is a flowchart illustrating a method of constructing a dialog scenario database for a dialog system according to the present invention.

In the method of FIG. 1, sentences are first extracted (S110). The sentences are extracted from conversational voice files or conversational posts.

Once sentences have been extracted in step S110, they are classified as questions and answers (S120).

Next, the question and answer sentences classified in step S120 are screened on the basis of emotion and intention and of affirmation and negation to determine whether they correspond to and connect with each other (S130), and the selected question and answer sentences are generated as a list-type scenario (S140).

The question and answer sentences of the generated list-type scenario are then converted into semantic vectors to learn the scenario (S150), and the learned scenario is stored in a database as a sequence of scenario semantic vectors (S160).

The sequence of semantic vectors of each scenario stored in this way is displayed in the ontology multidimensional space as one of a semantic word, a semantic node, and a semantic cube (S160).

The method of FIG. 1 is described below with reference to the apparatus for constructing a dialog scenario database for a dialog system shown in FIGS. 2 and 3.

FIG. 2 schematically shows an apparatus for constructing a dialog scenario database for a dialog system according to the present invention, and FIG. 3 shows the apparatus of FIG. 2 in detail.

As shown in FIGS. 2 and 3, the apparatus 100 for constructing a dialog scenario database for a dialog system according to the present invention includes: a sentence extraction unit 110 that extracts sentences; a sentence analysis unit 120 that classifies the sentences extracted by the sentence extraction unit 110 as questions and answers; a dialog scenario generation unit 130 that generates a list-type scenario in which the sentences classified by the sentence analysis unit 120 correspond to each other so that the dialog is connected; a dialog scenario learning unit 140 that learns the sentences of the dialog scenario generated by the dialog scenario generation unit 130; a dialog scenario database 150 that stores the dialog scenarios learned by the dialog scenario learning unit 140 as sequences of semantic vectors; and an ontology relation mapping unit 160 that displays the sequences of semantic vectors of the dialog scenarios stored in the dialog scenario database 150 in the ontology multidimensional space as one of a semantic word, a semantic node, and a semantic cube.

The sentence extraction unit 110 extracts sentences from conversational voice files or conversational posts. It includes a voice file extraction module 111 that handles conversational voice files and a post extraction module 112 that extracts sentences from conversational posts. As shown in FIG. 3, sentences are extracted from conversational voice files, such as call center, radio, and TV broadcast recordings, through the voice file extraction module 111. For conversational text posted on SNS and similar services, once a search term is selected in the search term generator 113, the post extraction module 112 searches for posts and parses them to extract sentences. For example, sentences may be extracted from text obtained by speech recognition of existing voice recordings provided by a call center, or from text obtained by real-time speech recognition of consultations between customers and agents in a call center.
For SNS, sentences are extracted from posts in which users converse on various topics on Twitter, Facebook, and similar services. The SNS is first searched with a prepared search term (a specific word), and the link of each post retrieved with that term is extracted. Then, for each link, everything from the original first post it points to through the last post of the discussion that started from it is treated as one conversation topic, and all posts on that topic are extracted.
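The thread rule above (everything from the first post to the last post of the discussion it started counts as one conversation topic) can be sketched as grouping posts by their root. This is a minimal sketch under assumptions: the `Post` record and its `post_id`, `parent_id`, and `author` fields are hypothetical stand-ins for whatever the post parser actually yields, and no real Twitter or Facebook API is called.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Post:
    # Hypothetical record produced by the post parser (not a real SNS API type).
    post_id: str
    parent_id: Optional[str]  # None for the first post that opens a discussion
    author: str
    text: str


def find_root(post_id: str, by_id: Dict[str, Post]) -> str:
    """Follow parent links up to the first post of the discussion."""
    seen = set()
    current = by_id[post_id]
    while current.parent_id is not None and current.parent_id in by_id:
        if current.post_id in seen:  # guard against malformed, cyclic data
            break
        seen.add(current.post_id)
        current = by_id[current.parent_id]
    return current.post_id


def group_into_topics(posts: List[Post]) -> Dict[str, List[Post]]:
    """Group posts into conversation topics keyed by root post ID,
    keeping the original crawl order within each topic."""
    by_id = {p.post_id: p for p in posts}
    topics: Dict[str, List[Post]] = defaultdict(list)
    for p in posts:
        topics[find_root(p.post_id, by_id)].append(p)
    return dict(topics)
```

Replies whose parent was never crawled are treated as roots of their own topic, which seems the safer default for partial crawls.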

The sentence analysis unit 120 classifies the sentences extracted by the sentence extraction unit 110 as questions and answers. Before classification, if an erroneous word appears in a sentence obtained as text through speech recognition, the unit performs speech recognition error analysis and restores the affected sentence, or filters out inappropriate conversation; it then analyzes and classifies whether each sentence is a question or an answer. The sentence analysis unit 120 may be rule-based, machine-learning-based, or statistics-based, or may combine these approaches. When it is machine-learning-based, it is trained by supervised learning, and it uses the language model data built as a result of that training to determine whether an input sentence is a question or an answer.
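Of the options listed (rule-based, machine-learning-based, statistical), the rule-based one is the easiest to illustrate. The sketch below is hypothetical: it handles English text only, treats a trailing question mark or a leading interrogative word as marking a question, and drops sentences containing words from a placeholder blocklist as "inappropriate". The patent does not disclose its rules, and a real classifier, especially for Korean, would need far richer rules or a trained model.

```python
import re
from typing import Optional

# Placeholder rules (hypothetical); the patent does not specify its rule set.
INTERROGATIVES = {"who", "what", "when", "where", "why", "how", "which",
                  "do", "does", "did", "are", "is", "can", "could", "would"}
BLOCKLIST = {"spamword"}  # hypothetical stand-in for an inappropriate-word list


def classify_sentence(sentence: str) -> Optional[str]:
    """Return 'question', 'answer', or None if the sentence is filtered out."""
    words = re.findall(r"[a-z']+", sentence.lower())
    if not words or BLOCKLIST.intersection(words):
        return None  # empty or inappropriate: exclude from scenario building
    if sentence.rstrip().endswith("?") or words[0] in INTERROGATIVES:
        return "question"
    return "answer"
```

The interrogative-word rule matters for SNS text, where question marks are often omitted.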

The dialog scenario generation unit 130 screens the question and answer sentences analyzed by the sentence analysis unit 120 on the basis of emotion and intention and of affirmation and negation to determine whether they correspond to and connect with each other, and generates the selected question and answer sentences as a list-type scenario in which they are linked. To generate the scenario, it uses the semantic vector data of the scenario sentences and applies machine-learning-based supervised learning.
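The pairing step can be pictured as follows. This is a sketch under a stated assumption: the patent screens pairs by emotion, intention, and affirmation/negation with a trained model, but it does not give the criterion in detail, so here a cosine-similarity threshold on the sentences' semantic vectors stands in for that screening. Each question is paired with the best-matching answer that follows it, and questions with no adequate answer are dropped.

```python
import math
from typing import List, Sequence, Tuple

Turn = Tuple[str, str, Sequence[float]]  # (role, text, semantic vector)


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def build_scenario(turns: List[Turn], threshold: float = 0.5) -> List[Tuple[str, str]]:
    """Pair each question with the most similar answer that follows it
    (before the next question); drop questions with no answer above threshold."""
    scenario = []
    i = 0
    while i < len(turns):
        role, q_text, q_vec = turns[i]
        if role != "question":
            i += 1
            continue
        j = i + 1
        best, best_score = None, threshold
        while j < len(turns) and turns[j][0] == "answer":
            score = cosine(q_vec, turns[j][2])
            if score >= best_score:
                best, best_score = turns[j][1], score
            j += 1
        if best is not None:
            scenario.append((q_text, best))  # one link in the list-type scenario
        i = j
    return scenario
```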

The dialog scenario learning unit 140 converts the question and answer sentences of the scenario generated by the dialog scenario generation unit 130 into semantic vectors and learns them by machine-learning-based supervised learning.

In general, supervised learning is a form of machine learning that infers a function from training data. The training data consist of pairs of an input object (typically a vector) and a desired output. The output of the function may be a continuous value, or it may be a predicted class label for the input object. The task of supervised learning is to predict the value of the function for any valid input after seeing only a small number of training examples, that is, pairs of inputs and target outputs. To do so, the learner must generalize from the available data to unseen situations in a reasonable way.
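To make the definition concrete, here is a minimal supervised learner: a perceptron trained on (input vector, label) pairs and then used to predict the label of inputs it has not seen. It illustrates supervised learning in general and is not the model used by the learning unit 140, which the patent does not specify; the two-dimensional toy vectors stand in for sentence semantic vectors.

```python
from typing import List, Sequence, Tuple


def train_perceptron(data: List[Tuple[Sequence[float], int]], epochs: int = 20,
                     lr: float = 1.0) -> Tuple[List[float], float]:
    """Learn weights w and bias b so that sign(w.x + b) matches labels in {-1, +1}."""
    dim = len(data[0][0])
    w, b = [0.0] * dim, 0.0
    for _ in range(epochs):
        errors = 0
        for x, y in data:
            activation = sum(wi * xi for wi, xi in zip(w, x)) + b
            if y * activation <= 0:  # misclassified or on the boundary: update
                w = [wi + lr * y * xi for wi, xi in zip(w, x)]
                b += lr * y
                errors += 1
        if errors == 0:  # every training pair is classified correctly
            break
    return w, b


def predict(w: Sequence[float], b: float, x: Sequence[float]) -> int:
    return 1 if sum(wi * xi for wi, xi in zip(w, x)) + b > 0 else -1


# Toy training pairs: label +1 when the first component dominates.
data = [([2.0, 0.5], 1), ([1.5, 0.2], 1), ([0.3, 1.8], -1), ([0.1, 1.2], -1)]
w, b = train_perceptron(data)
print(predict(w, b, [3.0, 1.0]), predict(w, b, [0.5, 2.0]))  # unseen inputs
```

The last line is the "generalize to unseen situations" step: neither input appears in the training pairs.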

The dialog scenario database 150 stores dialog scenarios as semantic vectors, including the scenario semantic vectors learned by the dialog scenario learning unit 140. Each dialog scenario is stored as a consecutive sequence of the semantic vectors of its sentence-level questions and answers. To help humans grasp them intuitively, the semantic vectors of sentences and scenarios are also stored as one or more semantic words.
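One way to picture a scenario stored as a consecutive sequence of sentence-level vectors is a table keyed by scenario and turn position. The sketch below is hypothetical: it uses Python's built-in `sqlite3` module with an in-memory database, stores each vector as JSON text, and attaches a representative semantic word to each turn; the table and column names are invented for illustration.

```python
import json
import sqlite3
from typing import List, Sequence, Tuple

# Hypothetical schema: one row per question or answer, ordered by position.
SCHEMA = """
CREATE TABLE IF NOT EXISTS scenario_turn (
    scenario_id   INTEGER NOT NULL,
    position      INTEGER NOT NULL,  -- order of the turn within the scenario
    role          TEXT    NOT NULL,  -- 'question' or 'answer'
    sentence      TEXT    NOT NULL,
    semantic_word TEXT,              -- human-readable label for the vector
    vector_json   TEXT    NOT NULL,  -- semantic vector serialized as JSON
    PRIMARY KEY (scenario_id, position)
)
"""


def save_scenario(conn: sqlite3.Connection, scenario_id: int,
                  turns: List[Tuple[str, str, str, Sequence[float]]]) -> None:
    """Store (role, sentence, semantic_word, vector) turns in order."""
    conn.executemany(
        "INSERT INTO scenario_turn VALUES (?, ?, ?, ?, ?, ?)",
        [(scenario_id, pos, role, text, word, json.dumps(list(vec)))
         for pos, (role, text, word, vec) in enumerate(turns)],
    )
    conn.commit()


def load_vector_sequence(conn: sqlite3.Connection, scenario_id: int) -> List[List[float]]:
    """Return the scenario's semantic vectors in turn order."""
    rows = conn.execute(
        "SELECT vector_json FROM scenario_turn WHERE scenario_id = ? ORDER BY position",
        (scenario_id,),
    )
    return [json.loads(v) for (v,) in rows]


conn = sqlite3.connect(":memory:")
conn.execute(SCHEMA)
```

A production system holding 300-dimensional vectors would more likely use a vector index, but the ordered (scenario, position) key is what preserves the "consecutive sequence" the text describes.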

Once the dialog scenario database 150 storing the sequences of semantic vectors has been built, the ontology relation mapping unit 160 maps the dialog scenarios represented by those sequences onto the ontology dialog network and displays them. The ontology dialog network is a multidimensional space of 300 to 600 dimensions; because vectors of that many dimensions cannot be depicted physically, the space is compressed to three dimensions for display. The ontology dialog network represents each point in this three-dimensional space as a node, and each node can represent a word semantic vector, a sentence semantic vector, or a scenario semantic vector stored in the database 150. Word and sentence semantic vectors are produced with the publicly available word2vec and sent2vec machine learning algorithms. Likewise, a scenario can be represented as a single scenario semantic vector in a scenario2vec manner, or as the sequence of semantic vectors of its question and answer sentences.
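The text says only that the 300-to-600-dimensional space is compressed to three dimensions for display; it does not name a method. As one assumed option, the sketch below uses a fixed Gaussian random projection (PCA and t-SNE are common alternatives). Each resulting 3-D point would then be drawn as a node.

```python
import random
from typing import List, Sequence


def make_projection(in_dim: int, out_dim: int = 3, seed: int = 0) -> List[List[float]]:
    """Random projection matrix (out_dim x in_dim) with N(0, 1/out_dim) entries."""
    rng = random.Random(seed)  # fixed seed so node positions are reproducible
    scale = (1.0 / out_dim) ** 0.5
    return [[rng.gauss(0.0, scale) for _ in range(in_dim)] for _ in range(out_dim)]


def project(matrix: List[List[float]], vec: Sequence[float]) -> List[float]:
    """Map one high-dimensional semantic vector to a 3-D node position."""
    if len(vec) != len(matrix[0]):
        raise ValueError("vector dimension does not match projection matrix")
    return [sum(m * v for m, v in zip(row, vec)) for row in matrix]
```

Usage: build the matrix once, e.g. `P = make_projection(300)`, and reuse it for every word, sentence, and scenario vector so that all nodes share one coordinate system. Because the projection is linear, vectors that are close in the original space tend to stay close in 3-D, though a random projection to so few dimensions preserves distances only roughly.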

Through the ontology relation mapping unit 160, the sequences of semantic vectors of dialog scenarios stored in the dialog scenario database 150 can be uploaded to the ontology dialog network, or each scenario can be converted into a single semantic vector and uploaded. When searching for similar sentences or scenarios, the scenario closest to an input scenario can be found by computing distances in the semantic vector space. Words, sentences, and scenarios can all be represented in a semantic vector space, but each has its own separate semantic space. Alternatively, the semantic spaces of words, sentences, and scenarios can be mapped into a single semantic space so that all three are represented in it together. In such a mapping, a sentence may, for example, be represented as the sum or product of its word vectors, and a scenario as the sum or product of its sentence vectors. To help humans grasp them intuitively, the semantic vectors of sentences and scenarios are displayed as one or more semantic words.
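The composition and lookup just described can be sketched as follows. Sum and elementwise product are the two compositions the text mentions; cosine similarity is an assumed choice of distance, since the text says only that the closest scenario is found by distance calculation. The toy vectors and scenario names are hypothetical.

```python
import math
from typing import Dict, List, Sequence


def compose(vectors: List[Sequence[float]], mode: str = "sum") -> List[float]:
    """Combine lower-level vectors (words -> sentence, sentences -> scenario)."""
    result = list(vectors[0])
    for v in vectors[1:]:
        if mode == "sum":
            result = [a + b for a, b in zip(result, v)]
        elif mode == "product":  # elementwise product
            result = [a * b for a, b in zip(result, v)]
        else:
            raise ValueError(f"unknown mode: {mode}")
    return result


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def nearest_scenario(query: Sequence[float], scenarios: Dict[str, Sequence[float]]) -> str:
    """Return the name of the stored scenario vector most similar to the query."""
    return max(scenarios, key=lambda name: cosine(query, scenarios[name]))


# Hypothetical stored scenario vectors and an input scenario built from two sentences.
store = {"greeting": [1.0, 0.0], "meeting_place": [0.0, 1.0]}
query = compose([[0.1, 0.9], [0.2, 0.7]])
print(nearest_scenario(query, store))
```

Cosine similarity ignores vector length, which suits summed vectors: a long scenario produces a longer sum without changing its direction.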

An administrator can also enter dialog scenarios directly in the ontology dialog network through the ontology relation mapping unit 160, and these are stored in the dialog scenario database 150. The relationships among the sentences that make up a scenario are displayed as a sequence of semantic words, as in Example 1 below.

[Example 1]

Scenario 1 = (Question 1) - (Answer 1) - (Question 2) - (Answer 2) ... (Question N) - (Answer N)

Scenario 1 = (Where have I seen you?) - (Are you asking my name?) - (No, I mean the place where I saw you.) - (You mean the place where we last met?) ... (Yes.) - (Probably. I think the last time we saw each other was at the last seminar hosted by Yally.)

Scenario 1 = (name + ambiguous) - (name + confirmation) - (place + ambiguous) - (place + confirmation) ... (nickname + acceptance) - (nickname + answer)

Here, the semantic word "name-ambiguous" is a new semantic vector that combines the semantic vectors of the semantic words "name" and "ambiguous", and it is the semantic word that represents sentences such as "Where have I seen you?".

A single semantic word semantically expresses one or more sentences and can be regarded as a representative word for the many sentences that share the same meaning.

Semantic words such as "name-ambiguous" and "name-confirmation" also represent the sequence of individual questions and answers, such as "(Question 1)" and "(Answer 1)", that make up a scenario.
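Example 1 and the notion of composite semantic words can be illustrated with toy vectors: a composite label such as name+ambiguous is built from the vectors for "name" and "ambiguous", and each sentence vector in a scenario is replaced by its closest composite label, turning the scenario into a sequence of semantic words. The three-dimensional vectors below are invented, not real word2vec output, and vector addition is an assumption, since the text says only that the vectors are "combined".

```python
import math
from typing import Dict, List, Sequence

# Hypothetical toy semantic vectors (real ones would have 300 or more dimensions).
WORDS: Dict[str, List[float]] = {
    "name": [1.0, 0.0, 0.0],
    "place": [0.0, 1.0, 0.0],
    "ambiguous": [0.0, 0.0, 1.0],
    "confirmation": [0.0, 0.0, -1.0],
}


def composite(*words: str) -> List[float]:
    """Combine semantic words by vector addition, e.g. name + ambiguous."""
    return [sum(vals) for vals in zip(*(WORDS[w] for w in words))]


# Composite semantic words available as labels, e.g. "name+ambiguous".
LABELS = {
    f"{a}+{b}": composite(a, b)
    for a in ("name", "place")
    for b in ("ambiguous", "confirmation")
}


def _cos(u: Sequence[float], v: Sequence[float]) -> float:
    dot = sum(x * y for x, y in zip(u, v))
    norm = math.sqrt(sum(x * x for x in u)) * math.sqrt(sum(y * y for y in v))
    return dot / norm if norm else 0.0


def to_semantic_words(sentence_vectors: List[Sequence[float]]) -> List[str]:
    """Replace each sentence vector by its closest composite semantic word."""
    return [max(LABELS, key=lambda k: _cos(v, LABELS[k])) for v in sentence_vectors]
```

Fed the (hypothetical) vectors of the first four sentences of Example 1, `to_semantic_words` would return a list such as `["name+ambiguous", "name+confirmation", "place+ambiguous", "place+confirmation"]`, which is the third form of Scenario 1.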

FIGS. 4 and 5 are screens showing an example of sentence extraction during construction of the dialog scenario database for the dialog system according to the present invention.

As shown in FIG. 4, sentence extraction takes words from DBpedia or WordNet, uses them as search terms, and parses from the web the SNS posts and comment data retrieved with those terms, for example by crawling. This works because, when a post is published on Twitter, the replies that accumulate as users retweet it resemble an exchange of conversation on a specific topic.
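The parsing step can be sketched with Python's standard `html.parser` module. The markup is an assumption: real SNS pages differ and change often, so the `post` and `comment` class names are hypothetical placeholders for whatever selectors a real crawler would target, and the sketch assumes posts and comments are sibling elements rather than nested ones.

```python
from html.parser import HTMLParser
from typing import List, Optional, Tuple

TARGET_CLASSES = {"post", "comment"}  # hypothetical class names


class PostParser(HTMLParser):
    """Collect (kind, text) pairs from <div class="post"> and <div class="comment">."""

    def __init__(self) -> None:
        super().__init__()
        self.items: List[Tuple[str, str]] = []
        self._kind: Optional[str] = None
        self._buffer: List[str] = []
        self._depth = 0  # nested <div> depth inside the current target element

    def handle_starttag(self, tag, attrs):
        if tag != "div":
            return
        if self._kind is not None:
            self._depth += 1  # a div nested inside a post or comment
            return
        classes = (dict(attrs).get("class") or "").split()
        kind = next((c for c in classes if c in TARGET_CLASSES), None)
        if kind:
            self._kind, self._buffer, self._depth = kind, [], 0

    def handle_endtag(self, tag):
        if tag != "div" or self._kind is None:
            return
        if self._depth:
            self._depth -= 1
            return
        text = " ".join("".join(self._buffer).split())  # normalize whitespace
        if text:
            self.items.append((self._kind, text))
        self._kind = None

    def handle_data(self, data):
        if self._kind is not None:
            self._buffer.append(data)


def parse_posts(html: str) -> List[Tuple[str, str]]:
    """Return the posts and comments found in one retrieved page, in page order."""
    parser = PostParser()
    parser.feed(html)
    parser.close()
    return parser.items
```

Inline markup such as `<b>` inside a comment is ignored, so only the sentence text reaches the sentence analysis step.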

FIG. 5 shows other users continuing to comment while retweeting the post of the user who published it first; the resulting flow of conversation resembles a tree. Each path through the tree, from the original post to a final reply, is treated as one dialog scenario. For clarity, the dialog scenarios are written out by commenter ID in [Example 2] below.

[Example 2]

Scenario 1: ffebreze - hatter365 - fffebreze

Scenario 2: ffebreze - ffebreze - Teahya - ffebreze - Teahya
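The tree-to-scenario step can be sketched as an enumeration of root-to-leaf paths. This is a minimal illustration, assuming a hypothetical `Comment` structure (author, text, replies) rather than any real SNS API; the thread below is a hypothetical reconstruction whose two paths reproduce the author sequences of Example 2, and the text fields are placeholders.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Comment:
    """One post or comment in a thread (hypothetical structure, not a real SNS API)."""
    author: str
    text: str
    replies: list[Comment] = field(default_factory=list)


def thread_to_scenarios(root: Comment) -> list[list[Comment]]:
    """Return every root-to-leaf path of the comment tree; each path is one dialog scenario."""
    if not root.replies:
        return [[root]]
    scenarios = []
    for reply in root.replies:
        for path in thread_to_scenarios(reply):
            scenarios.append([root] + path)
    return scenarios


# Hypothetical thread shaped so that its two branches match Example 2.
thread = Comment("ffebreze", "original post", [
    Comment("hatter365", "reply", [Comment("fffebreze", "reply")]),
    Comment("ffebreze", "follow-up", [
        Comment("Teahya", "reply", [
            Comment("ffebreze", "reply", [Comment("Teahya", "reply")]),
        ]),
    ]),
])

for scenario in thread_to_scenarios(thread):
    print(" - ".join(c.author for c in scenario))
```

Recursion keeps the sketch short; a very deep thread might call for an explicit stack instead.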

FIG. 6 is a screen showing an example of sentence analysis during construction of the dialog scenario database for the dialog system according to the present invention. As shown, once a post has been parsed, it is not yet known whether it plays the role of a question or an answer in the dialog scenario. The question and answer types are therefore classified by supervised machine learning.
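The text does not name the supervised model, so the sketch below uses a multinomial naive Bayes classifier with add-one smoothing purely as a stand-in. The tokenizer, the labels "Q"/"A", and the English toy training sentences are all hypothetical; a real system would train on the parsed Korean posts.

```python
import math
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    # Lowercase words plus "?" kept as its own token, a strong cue for questions.
    return re.findall(r"[a-z]+|\?", text.lower())


class NaiveBayesQA:
    """Multinomial naive Bayes with add-one smoothing over the labels 'Q' and 'A'."""

    def fit(self, sentences, labels):
        self.word_counts = {label: Counter() for label in set(labels)}
        self.doc_counts = Counter(labels)
        for sentence, label in zip(sentences, labels):
            self.word_counts[label].update(tokenize(sentence))
        self.vocab = set().union(*self.word_counts.values())
        return self

    def predict(self, sentence):
        tokens = tokenize(sentence)
        total_docs = sum(self.doc_counts.values())
        best_label, best_score = None, -math.inf
        for label, counts in self.word_counts.items():
            total_words = sum(counts.values())
            # Log prior plus smoothed log likelihood of every token.
            score = math.log(self.doc_counts[label] / total_docs)
            for tok in tokens:
                score += math.log((counts[tok] + 1) / (total_words + len(self.vocab)))
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Hypothetical labeled training data.
train = [
    ("Where did we meet?", "Q"), ("What is your name?", "Q"),
    ("What is your weekend schedule?", "Q"), ("Do you have plans?", "Q"),
    ("We met at the seminar.", "A"), ("My name is Yali.", "A"),
    ("I will rest this weekend.", "A"), ("I have no plans.", "A"),
]
model = NaiveBayesQA().fit([s for s, _ in train], [l for _, l in train])
print(model.predict("What is your hobby?"))  # Q
print(model.predict("I have a hobby."))      # A
```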

FIG. 7 is a screen showing an example of scenario generation, and FIG. 8 is a screen showing an example of dialog scenario learning, both during construction of the dialog scenario database for the dialog system according to the present invention. As shown in FIG. 7, questions and answers on one topic do not necessarily alternate; questions or answers may be repeated (e.g., Question 1, Answer 1, Question 2, Question 2, Answer 2, Question 3, Answer 3, Answer 3, Answer 3). From these, the questions and answers suitable for the scenario are selected, and, as shown in FIG. 8, the selected dialog scenario is learned repeatedly as pairs of one question and one answer to that question.
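The selection step is described in terms of emotion, intent, and affirmation/negation, without a concrete rule. As a simpler hypothetical substitute, the sketch below only shows how a run with repeated questions or answers could be reduced to alternating question-answer pairs. It keeps the latest question before an answer and the first answer after it. The turn labels and texts are placeholders.

```python
def select_qa_pairs(turns):
    """Reduce a thread of ('Q' | 'A', text) turns to alternating question-answer pairs.

    Hypothetical heuristic: a repeated question replaces the earlier one, and
    only the first answer after a question is paired with it.
    """
    pairs = []
    pending_question = None
    for kind, text in turns:
        if kind == "Q":
            pending_question = text  # the latest question wins
        elif kind == "A" and pending_question is not None:
            pairs.append((pending_question, text))
            pending_question = None  # extra answers are dropped until a new question
    return pairs


# Mirrors the pattern Q1, A1, Q2, Q2, A2, Q3, A3, A3, A3 from the text.
turns = [("Q", "Q1"), ("A", "A1"), ("Q", "Q2a"), ("Q", "Q2b"), ("A", "A2"),
         ("Q", "Q3"), ("A", "A3a"), ("A", "A3b"), ("A", "A3c")]
print(select_qa_pairs(turns))
# [('Q1', 'A1'), ('Q2b', 'A2'), ('Q3', 'A3a')]
```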

FIG. 9 is a screen showing an example in which the dialog scenario database constructed for the dialog system according to the present invention is mapped onto the ontology dialog network.

A scenario is represented as a sequence of semantic vectors made up of sentence-level questions and answers. Each question and answer in a scenario has a multidimensional semantic vector, and each question or answer sentence is automatically mapped to one dialog intent (semantic word).

For the dialog intents (semantic words), word2vec or a similar tool is used to represent currently used words and their semantic vector values in a three-dimensional space in advance; a new dialog intent (semantic word) is then represented in that space as a combination (sum or product) of one or more word vectors.
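The two combination rules can be shown in a few lines. This is a sketch under stated assumptions: "product" is read here as the element-wise (Hadamard) product, and the three-dimensional vectors for "name" and "ambiguous" are made-up values, not word2vec output.

```python
import math


def combine_sum(*vectors):
    """Element-wise sum of word vectors of equal length."""
    return [sum(components) for components in zip(*vectors)]


def combine_product(*vectors):
    """Element-wise (Hadamard) product of word vectors of equal length."""
    return [math.prod(components) for components in zip(*vectors)]


# Hypothetical 3-D vectors for the semantic words "name" and "ambiguous".
name = [0.5, 1.0, -2.0]
ambiguous = [1.5, -0.5, 0.25]

print(combine_sum(name, ambiguous))      # [2.0, 0.5, -1.75]
print(combine_product(name, ambiguous))  # [0.75, -0.5, -0.5]
```

The sum keeps contributions from either word, while the product emphasizes dimensions where both words are strong; which suits a given intent is an empirical choice.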

When a large number of scenarios are input as multidimensional semantic vector values, the distance between each input semantic vector and the semantic vectors of the existing dialog intents (semantic words) is compared. When the distance falls within a set value, the existing dialog intent (semantic word, semantic node, or semantic cube) is assigned to the input semantic vector; that is, the input semantic vector is named after the existing dialog intent.
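A minimal sketch of the threshold test, assuming Euclidean distance (the text does not say which metric is used) and a hypothetical intent table and threshold:

```python
import math


def assign_intent(vector, intents, threshold):
    """Return the name of the nearest existing intent if it lies within `threshold`, else None."""
    best_name, best_dist = None, math.inf
    for name, intent_vec in intents.items():
        d = math.dist(vector, intent_vec)  # Euclidean distance (an assumption)
        if d < best_dist:
            best_name, best_dist = name, d
    return best_name if best_dist <= threshold else None


# Hypothetical existing dialog intents and their vectors.
intents = {
    "name-ambiguous": [2.0, 0.5, -1.75],
    "weekend-schedule": [-1.0, 3.0, 0.0],
}
print(assign_intent([1.9, 0.6, -1.7], intents, threshold=0.5))  # name-ambiguous
print(assign_intent([10.0, 10.0, 10.0], intents, threshold=0.5))  # None
```

Returning `None` leaves the caller free to decide what to do with vectors that match no existing intent.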

When a large number of scenarios arrive together with their dialog intents (semantic words, semantic nodes, or semantic cubes) and dialog sentences, in addition to the multidimensional semantic vector values, each dialog scenario (each question and answer) only needs to be mapped into the space of its dialog intent. In this case, the semantic word may be assigned automatically by analyzing the scenario's "dialog sentence", or a person may manually attach a semantic word that fits the "dialog sentence".

[Example 3]

Scenario 1 = [semantic word][dialog sentence][semantic vector], [semantic word][dialog sentence][semantic vector] ....

Scenario 1 = [name-ambiguous][I think I've seen you somewhere][2.382, 6.108, ...], [name-confirm][Are you curious about my name?][8.730, 1.383, ....] ....
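One way to hold the [semantic word][dialog sentence][semantic vector] triples of Example 3 in code is a small record type. This is a sketch, not the patent's storage format. `ScenarioTurn` is a hypothetical name, and the vectors are cut down to the leading components listed in Example 3, whose remaining components are elided.

```python
from dataclasses import dataclass


@dataclass
class ScenarioTurn:
    semantic_word: str   # dialog intent, e.g. "name-ambiguous"
    sentence: str        # the dialog sentence itself
    vector: list[float]  # its semantic vector (truncated to two components here)


# Scenario 1 from Example 3, one record per question or answer.
scenario_1 = [
    ScenarioTurn("name-ambiguous", "I think I've seen you somewhere", [2.382, 6.108]),
    ScenarioTurn("name-confirm", "Are you curious about my name?", [8.730, 1.383]),
]
print(" - ".join(turn.semantic_word for turn in scenario_1))
# name-ambiguous - name-confirm
```

A scenario is then just an ordered list of such records, which matches the "consecutive list" wording used later for FIG. 17.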

[Semantic words] can be analyzed automatically with natural language processing methods that are rule-based, statistics-based, or machine-learning-based. In the machine-learning case, the dialog intent (intent word) of an input sentence is classified using a model trained by supervised learning.
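The rule-based option can be as simple as keyword sets. The rule table below is hypothetical: an intent fires when all of its keywords appear, the first matching rule wins, and `None` signals that a statistical or machine-learning classifier should take over.

```python
import re

# Hypothetical rule table: intent -> keywords that must all be present.
RULES = {
    "weekend-schedule": {"weekend", "schedule"},
    "name-ambiguous": {"where", "seen"},
}


def classify_intent(sentence):
    """Return the first intent whose keywords all occur in the sentence, else None."""
    words = set(re.findall(r"[a-z]+", sentence.lower()))
    for intent, keywords in RULES.items():
        if keywords <= words:  # subset test: every keyword is present
            return intent
    return None  # no rule fired; fall back to a statistical or ML model


print(classify_intent("Please tell me your weekend schedule."))  # weekend-schedule
print(classify_intent("Where have I seen you?"))                 # name-ambiguous
print(classify_intent("Nice weather today."))                    # None
```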

FIG. 10 shows an example of how dialog scenarios stored in the dialog scenario database according to the present invention are displayed. A dialog scenario usually spans two or more turns (Question 1 - Answer 1 - Question 2 - Answer 2). When scenarios are laid out flat, only a limited number fit in one view, and there is no way to tell which scenarios are spatially close to the current one. Moreover, when a scenario is entered or modified, its correlation with other scenarios in the semantic space cannot be seen at all.

A dialog scenario is displayed as a consecutive sequence of semantic words. As in the first scenario at the top of the figure, semantic words may be combined with specific characters to avoid duplication, for example "name-ambiguous", "C-name-confirm", "C-accept", and "KE-name-IU".

FIG. 11 is a screen in which the dialog scenarios stored in the dialog scenario database according to the present invention are displayed with a three-dimensional scenario authoring tool; as shown in FIG. 11, scenarios can be displayed in three-dimensional space. Each question and answer of a scenario is displayed as one semantic node (semantic cube or semantic word). Because the semantic nodes of each scenario are placed near to or far from one another according to how closely related their meanings are, the semantic relationships between the nodes that make up a scenario are much easier to grasp.

FIG. 12 is an example of a scenario input screen in which the dialog scenarios stored in the dialog scenario database according to the present invention are displayed with the three-dimensional scenario authoring tool. As shown in FIG. 12, a scenario is built by dragging semantic words (semantic nodes or semantic cubes) from the right-hand panel into the empty space on the left, producing a sequence of semantic words (e.g., weekend-schedule - schedule-answer - hobby-question - hobby-answer). Once the scenario has been entered, it is automatically mapped into the three-dimensional dialog network. Every semantic word pattern on the right has a multidimensional semantic vector value; the vector for a combination of semantic words (weekend + schedule) can be extracted by training the combination separately, or formed as the sum or product of the semantic vectors of the existing semantic words (weekend, schedule).

A semantic word on the right (e.g., weekend-schedule) represents many sentences; it covers a variety of sentence forms that imply "weekend schedule", such as the following:

| Semantic word pattern | Example sentences |
|---|---|
| weekend-schedule | What is your weekend schedule? |
| | Please tell me your weekend schedule. |
| | I'm curious about your weekend schedule. |
| | Tell me your weekend schedule. |
| | Do you have a weekend schedule? |

FIG. 13 is an example of a scenario modification screen in which the dialog scenarios stored in the dialog scenario database according to the present invention are displayed with the three-dimensional scenario authoring tool. Scenarios can be found with a search function, and the tool provides a function for modifying a retrieved scenario. When a scenario is modified, the change is immediately reflected in the three-dimensional space.

FIG. 14 is an example of a scenario deletion screen in which the dialog scenarios stored in the dialog scenario database according to the present invention are displayed with the three-dimensional scenario authoring tool. Scenarios can be deleted; a deleted scenario is also removed completely from the three-dimensional space and can no longer be seen.

FIG. 15 is a block diagram showing the sequence in which continuous dialog is carried out using automatic dialog scenario collection and the ontology dialog network according to the present invention, and FIG. 16 is a flowchart showing the same sequence.

As described above, the ontology dialog network is a multidimensional vector space of 300 to 600 dimensions. Because such high-dimensional vectors cannot be displayed physically, they are compressed to three dimensions with a dimensionality-reduction algorithm such as PCA for display. The ontology dialog network represents each point in the three-dimensional space as a node, and each node can represent a word semantic vector, a sentence semantic vector, or a scenario semantic vector. The term "ontology dialog network" refers to the whole structure, including the nodes and the scenarios formed by connecting them, while "ontology dialog network database" refers to the storage unit that holds this data and these structural relationships; below, the two terms are used interchangeably.
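The compression to three dimensions for display can be sketched with PCA computed via a singular value decomposition in NumPy. The 50 random 300-dimensional vectors below are hypothetical stand-ins for semantic vectors; the patent names PCA only as one possible algorithm.

```python
import numpy as np


def pca_to_3d(vectors: np.ndarray) -> np.ndarray:
    """Project row vectors (n_samples x n_dims) onto their top three principal components."""
    centered = vectors - vectors.mean(axis=0)
    # Rows of vt are the principal directions, ordered by decreasing singular value.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return centered @ vt[:3].T


rng = np.random.default_rng(0)
embeddings = rng.normal(size=(50, 300))  # 50 hypothetical 300-dimensional semantic vectors
coords = pca_to_3d(embeddings)
print(coords.shape)  # (50, 3)
```

Only the display is compressed in this sketch; distance comparisons between intents would still use the full-dimensional vectors.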

Also, as described above, a semantic word is a word that represents several sentences with a similar meaning; it may consist of a single word or a combination of several words. For example, as explained earlier, the semantic word "name-ambiguous" is a new semantic vector formed by combining the semantic vectors of the semantic words "name" and "ambiguous", and it represents sentences such as "Where have I seen you?". In other words, a semantic word can likewise be represented by a single semantic vector in the ontology dialog network space and occupies one node.

Scenarios can be displayed in the three-dimensional ontology dialog network space, where each question and answer of a scenario is displayed as one semantic node (semantic cube or semantic word). Because the semantic nodes of each scenario are placed near to or far from one another according to how closely related their meanings are, the semantic relationships between the nodes that make up a scenario are much easier to grasp. Below, a semantic node is also referred to simply as a "node".

Below, the process of carrying out continuous dialog using the ontology dialog network is described with reference to the block diagram of FIG. 15 and the flowchart of FIG. 16.

First, when the user speaks, the user's voice is input to the continuous dialog system 200 using the ontology dialog network according to the present invention (see FIG. 19) (S210). The continuous dialog system 200 analyzes the sentence in order to understand the input speech (S220). The continuous dialog system 200 then checks the ontology dialog network database to determine whether the analyzed dialog sentence is matched with a semantic word, that is, a word or combination of words that represents the dialog sentence (S230).

If no semantic word is matched (S240), a semantic word is determined for the dialog sentence, matched to it, and stored in the ontology dialog network database (S250); this is described in detail later with reference to FIG. 17.

In either case, whether a new semantic word had to be determined and matched because none existed, or a semantic word was already matched, once the dialog sentence has a matched semantic word stored in the ontology dialog network database, the semantic word (hereinafter the "second semantic word") that follows the semantic word matched to the dialog sentence (hereinafter the "first semantic word") according to the dialog scenarios in the database is extracted (S260). One of the dialog sentences belonging to that semantic word is selected (S270), and the selected dialog sentence is output to the user as speech (S280). For example, the first semantic word may represent the question the user has just spoken, and the second semantic word may represent the answer sentence that the continuous dialog system 200 using the ontology dialog network has decided to reply with.

According to the dialog scenarios in the ontology dialog network database, several semantic words may follow the first semantic word matched to the dialog sentence. In that case, the semantic word whose scenario carries the highest weight can be extracted as the second semantic word. For example, if the scenarios following the first semantic word are first semantic word -> "A (semantic word)" (weight 0.7), first semantic word -> "B (semantic word)" (weight 0.2), and first semantic word -> "C (semantic word)" (weight 0.1), the path first semantic word -> "A" (weight 0.7) is chosen and semantic word A becomes the second semantic word. Such weights can be set in advance through dialog quality management, which diagnoses and analyzes everyday conversations with the user; this is described later with reference to FIG. 18.
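Steps S260 to S270 can be sketched with two lookup tables: weighted scenario edges between semantic words, and the sentences grouped under each semantic word. Both tables are hypothetical, using the A/B/C weights from the example above. Picking the sentence at random is an assumption; the text only says that one is selected.

```python
import random

# Hypothetical scenario graph: semantic word -> [(next semantic word, weight), ...]
SCENARIO_EDGES = {
    "first": [("A", 0.7), ("B", 0.2), ("C", 0.1)],
}
# Hypothetical sentences grouped under each semantic word.
SENTENCES = {
    "A": ["sentence A-1", "sentence A-2"],
    "B": ["sentence B-1"],
    "C": ["sentence C-1"],
}


def next_semantic_word(first_word):
    """S260: pick the follow-up semantic word whose scenario edge has the highest weight."""
    candidates = SCENARIO_EDGES.get(first_word, [])
    if not candidates:
        return None
    return max(candidates, key=lambda edge: edge[1])[0]


def choose_reply(first_word, rng=random):
    """S260-S270: follow the best edge, then pick one sentence under that semantic word."""
    second = next_semantic_word(first_word)
    if second is None:
        return None
    return rng.choice(SENTENCES[second])


print(next_semantic_word("first"))                  # A
print(choose_reply("first", rng=random.Random(1)))  # one of the sentences filed under A
```

Passing a seeded `random.Random` keeps replies reproducible in tests while still varying the wording in normal use.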

FIG. 17 is a flowchart of a method for mapping a new dialog sentence into the ontology dialog network when the sentence is input during continuous dialog using the ontology dialog network according to the present invention.

When a new scenario is input, the semantic vector values of its question and answer sentences are extracted, the dialog intent of each question and answer semantic vector is classified, and the input scenario (a consecutive list of "dialog intent (semantic word), dialog sentence, semantic vector") is mapped into the dialog network.

That is, if no semantic word is matched to the dialog sentence input by the user, i.e., a new scenario has been input (S240), the semantic vector of the dialog sentence is derived (S251), and the semantic word corresponding to the dialog sentence is classified from that semantic vector (S252). One or more of rule-based, statistics-based, and machine-learning-based methods can be used to classify the semantic word; machine learning here corresponds to the "dialog scenario learning" block in FIG. 15.

If a semantic word corresponding to the dialog sentence is classified (S252), that semantic word is determined as the semantic word of the dialog sentence, matched to the dialog sentence, and stored in the ontology dialog network database (S253). At this point, as described above, the consecutive list of the classified semantic word, dialog sentence, and semantic vector can be stored in the ontology dialog network database.

If no semantic word corresponding to the dialog sentence is classified (S252), the semantic word mapped to the node closest to the node indicated by the semantic vector is looked up in the ontology dialog network database; that semantic word is determined as the semantic word of the dialog sentence, matched to it, and stored in the ontology dialog network database (S254). Here too, the consecutive list of the semantic word, dialog sentence, and semantic vector can be stored in the ontology dialog network database.
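The S251 to S254 branch can be sketched as one function that takes the embedding and classification steps as parameters. `embed` and `classify` are hypothetical callables standing in for whatever vectorizer and classifier are used. The nearest node is found by Euclidean distance, which is an assumption, and a plain list stands in for the ontology dialog network database.

```python
import math


def map_new_sentence(sentence, embed, classify, intent_vectors, database):
    """Hypothetical sketch of S251-S254 for a sentence with no matched semantic word.

    embed(sentence) -> vector            (S251)
    classify(vector) -> intent or None   (S252: rule, statistical, or ML classifier)
    intent_vectors: {semantic word: vector} for existing nodes
    database: list collecting (semantic word, sentence, vector) records
    """
    vector = embed(sentence)
    intent = classify(vector)
    if intent is None:
        # S254: fall back to the semantic word of the nearest existing node.
        intent = min(intent_vectors, key=lambda w: math.dist(vector, intent_vectors[w]))
    database.append((intent, sentence, vector))  # S253 / S254: store the triple
    return intent


# Usage with toy stand-ins: the classifier gives up, so the nearest node decides.
intent_vectors = {"name-ambiguous": [2.0, 0.5], "weekend-schedule": [-1.0, 3.0]}
db = []
result = map_new_sentence("Haven't we met before?",
                          embed=lambda s: [1.8, 0.4],
                          classify=lambda v: None,
                          intent_vectors=intent_vectors,
                          database=db)
print(result, db)
```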

Here, the semantic word mapped to the node closest to the node indicated by the semantic vector may be a single semantic word, or a new semantic word indicated by a semantic vector generated from a combination of semantic words.

A scenario can be expressed as a consecutive series of question-answer pairs. Just as the question-answer pairs together form one scenario, each scenario carries specific dialog domain information (education, literature, common sense, everyday conversation, sports, movies, etc.), and a scenario can also be represented as a combination of semantic words.

Scenarios can likewise be classified using one or more of rule-based, statistics-based, and machine-learning-based methods. Therefore, before the dialog intent of each sentence is classified, the scenario category can be selected first and the dialog intents (semantic words) belonging to that category searched first, which can reduce search time.
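The two-stage search can be pictured as an index keyed by dialog domain. The index contents are hypothetical, and the domain is assumed to have been classified already. Only that domain's intents are compared, so the nearest-intent search scans a small subset instead of the whole network.

```python
import math

# Hypothetical index: dialog domain -> {semantic word: vector}
INTENTS_BY_DOMAIN = {
    "everyday": {"weekend-schedule": [-1.0, 3.0], "hobby-question": [0.0, 2.0]},
    "movies": {"favorite-movie": [4.0, 4.0]},
}


def find_intent(vector, domain):
    """Search only the intents of the already-classified domain, not the whole network."""
    candidates = INTENTS_BY_DOMAIN[domain]
    return min(candidates, key=lambda w: math.dist(vector, candidates[w]))


print(find_intent([-0.9, 2.9], "everyday"))  # weekend-schedule
```

The trade-off is that a wrong domain guess restricts the search to the wrong subset, so a fallback to the full search may be worth keeping.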

FIG. 18 is a block diagram showing a sequence for improving dialog quality through continuous dialog using the ontology dialog network according to the present invention.

As the user converses with the system, the system can immediately tell whether the user's reaction is positive or negative. Reactions such as a positive reaction, a negative reaction, ignoring, or changing the topic can be identified by analyzing the user's dialog sentences, using various analysis methods such as rule-based, statistics-based, and machine-learning-based methods.

When the user's reaction is positive, the current scenario can be recognized as a preferred dialog scenario, and its probability of being adopted is raised through weight adjustment or similar means, so that preferred scenarios keep being selected while non-preferred scenarios are pushed down in priority. By continuously managing dialog scenarios through real-time user reactions in this way, the ultimate aim is for dialog quality to improve the longer the continuous dialog goes on.

Describing this process with reference to FIG. 18, ongoing dialog quality management relies on the history of dialog scenarios between the user and the continuous dialog system 200 using the ontology dialog network according to the present invention. That is, the sentences the user inputs in everyday use are analyzed, and the user's reaction is analyzed from the analyzed dialog sentences. If the reaction is analyzed as positive, the likelihood of selecting the dialog scenario that leads to that dialog sentence or its semantic word is set higher; if it is analyzed as negative, that likelihood is set lower.

Here, "positive" and "negative" are used in a very broad sense. As illustrated in FIG. 18, reactions can be subdivided into praise, positive, negative, ignoring, topic change, and so on, but all of them can be described as positive or negative reactions. More precisely, the reaction is not sorted into two categories; rather, it is assigned a degree of positivity or negativity. The more receptive and favorable the user's reaction found by sentence analysis, the more the feedback raises the chance that the dialog scenario that produced it will be selected in the future; the less favorable the reaction, the more it lowers that chance.

The likelihood that a dialog scenario will be selected in the future can be set by assigning to the scenario that produced the user reaction a weight, related to preferred or non-preferred dialog scenarios, according to preset criteria. Dialog scenarios with higher weights thereby become more likely to be selected in the future.
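The text leaves the "preset criteria" open, so the following is only one possible scheme. Each analyzed reaction type gets a hypothetical degree in [-1, 1]. The weight of the edge that was just used is scaled by that degree, and the outgoing weights are renormalized. The degree table, learning rate, and floor are illustrative values; the floor keeps non-preferred scenarios selectable rather than removing them.

```python
# Hypothetical degrees of positivity/negativity per analyzed reaction type.
REACTION_DEGREE = {"praise": 1.0, "positive": 0.5, "ignore": -0.3,
                   "topic_change": -0.5, "negative": -1.0}


def update_weights(edges, chosen, reaction, learning_rate=0.2, floor=0.01):
    """Scale the weight of the scenario edge just used by the reaction degree, then renormalize.

    edges: {next semantic word: weight} for one semantic word (modified in place)
    chosen: the next semantic word the system actually used
    """
    degree = REACTION_DEGREE.get(reaction, 0.0)  # unknown reactions leave the weight unchanged
    edges[chosen] = max(floor, edges[chosen] * (1 + learning_rate * degree))
    total = sum(edges.values())
    for word in edges:
        edges[word] /= total
    return edges


edges = {"A": 0.7, "B": 0.2, "C": 0.1}
update_weights(edges, "A", "negative")
print(edges)  # A drops to about 0.65; B and C rise slightly
```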

The user reaction may be analyzed by applying rules to the user's dialog sentence (rule-based), on the basis of dialog sentence statistics (statistics-based), or by machine learning (machine-learning-based).

Alternatively, the user reaction may be analyzed using user preferences that have previously been analyzed and mapped to the semantic word corresponding to the user's dialog sentence in the ontology dialog network database. That is, the ontology dialog network database may contain a semantic word space and a separate user preference space, with the user preference at a node of the user preference space mapped to the corresponding semantic word node in the semantic word space. Because the user preference analyzed from the user's reactions is already mapped to that semantic word, the weight for the dialog can be set from it.

FIG. 19 is a diagram showing the configuration of the continuous dialog system using the ontology dialog network according to the present invention.

The control unit 201 controls each component of the continuous dialog system using the ontology dialog network to carry out the series of processes involved in continuous dialog using the ontology dialog network.

The voice input unit 202 receives the user's voice.

The sentence analysis unit 203 performs sentence analysis on the input voice.

The dialog management unit 204 extracts from the ontology dialog network database the semantic word (hereinafter the "second semantic word") that follows the semantic word representing the analyzed dialog sentence, i.e., the word or combination of words that stands for that sentence (hereinafter the "first semantic word"), and selects one of the dialog sentences belonging to the second semantic word as the output dialog sentence. In addition, when several semantic words follow the first semantic word matched to the dialog sentence according to the dialog scenarios in the ontology dialog network database, the dialog management unit 204 may extract the semantic word whose scenario carries the highest weight as the second semantic word.

When no semantic word is matched to the analyzed dialog sentence, the new scenario processing unit 206 determines a new semantic word for the sentence, matches it to the sentence, stores it in the ontology dialog network database, and passes the determined semantic word to the dialog management unit.

When no semantic word is matched to the input dialog sentence, the new scenario processing unit can derive the semantic vector of the dialog sentence and classify, from that semantic vector, the semantic word corresponding to the dialog sentence. If a corresponding semantic word is classified, it is determined as the semantic word of the dialog sentence, matched to the sentence, and stored in the ontology dialog network database. If no corresponding semantic word is classified, the unit looks up in the ontology dialog network database the semantic word mapped to the node closest to the node indicated by the semantic vector, determines it as the semantic word of the dialog sentence, matches it to the sentence, and stores it in the ontology dialog network database. In both cases, the determined semantic word is passed to the dialog management unit.

도 17을 참조하여 전술한 바와 같이, 상기 의미벡터가 가리키는 노드와 가장 가까운 노드에 매핑되어 있는 의미단어는, 단일 의미단어 또는, 의미단어의 조합으로 생성된 의미벡터가 가리키는 의미단어일 수 있다.As described above with reference to FIG. 17, the semantic word mapped to the node closest to the node indicated by the semantic vector may be a single semantic word or a semantic word indicated by a semantic vector generated by a combination of semantic words.

대화품질 관리부(205)는, 상기 분석된 대화문장으로부터, 사용자의 반응을 분석하여, 사용자 반응이 긍정적인 것으로 분석된 경우 해당 대화문장 또는 해당 대화문장의 의미단어로 이어지는 대화 시나리오가 선택될 가능성을 높게 설정하고, 사용자 반응이 부정적인 것으로 분석된 경우 해당 대화문장 또는 해당 대화문장의 의미단어로 이어지는 대화 시나리오가 선택될 가능성을 낮게 설정함으로써 대화 품질관리를 수행한다. 이때 도 18을 참조하여 전술한 바와 같이, 상기 대화 시나리오가 선택될 가능성의 설정은, 기 설정된 기준에 따라 가중치를 부여함으로써 이루어질 수 있다. 또한 사용자 반응의 분석은, 상기 대화문장을 규칙에 따라 분석함을 통하여 수행하거나(규칙기반), 대화문장의 통계를 기반하여 수행하거나(통계기반), 머신러닝(machine learning) 방식으로 수행(머신러닝기반)할 수 있다. 또는 사용자 반응의 분석은, 온톨로지 대화 관계망 데이터베이스에서 상기 사용자의 대화문장이 해당하는 의미단어에 대한, 기 분석되어 매핑되어 있는 사용자 선호도에 의해 이루어질 수도 있는데, 이에 대하여는 도 18을 참조하여 상세히 설명한 바 있다.The dialog quality management unit 205 analyzes the response of the user from the analyzed conversation sentence to determine the possibility that the conversation scenario leading to the corresponding sentence sentence or the semantic word of the corresponding sentence is selected if the user response is analyzed as being positive And when the user reaction is analyzed as negative, the conversation quality management is performed by setting a possibility that the conversation scenario leading to the conversation sentence or the meaning word of the conversation sentence is selected to be low. At this time, as described above with reference to FIG. 18, the setting of the possibility that the conversation scenario is selected may be made by weighting according to predetermined criteria. Also, analysis of the user response may be performed by analyzing the conversation sentence according to a rule (rule-based), based on statistics of a conversation sentence (statistical based), or by machine learning Based). Analysis of the user response may be performed by the user preference that is analyzed and mapped on the corresponding semantic word of the conversation sentence of the user in the ontology conversation network database. This has been described in detail with reference to FIG. 18 .

대화출력부(207)는상기 대화관리부에서 선택된 대화문장을 사용자에게 음성 등으로 출력하는 역할을 수행한다.The dialogue output unit 207 outputs the dialogue sentence selected by the dialogue management unit to the user through voice or the like.

FIG. 20 is a diagram showing an embodiment of the ontology dialog network structure according to the present invention.

Referring to the embodiment of FIG. 20, conversations in the ontology dialog network can be classified into a general conversation domain, a professional conversation domain, or other new domains set up as needed. The many nodes mapped in the ontology dialog network space, such as semantic words, may already be assigned to regions such as general conversation and professional conversation. Accordingly, when a semantic vector is obtained for a sentence, the node it points to depends on the meaning of the sentence, and that node may lie within a predetermined classification region of the ontology dialog network space. The same applies when general or professional conversation is further subdivided.

General conversations (everyday, common-sense, and topic conversations, etc.) develop on various topics by following the dialog scenarios in the conversation network.

Professional conversations (ticketing, reservations, purchases, etc.) follow a purpose-specific dialog strategy. They obtain specific information (arguments) from the user and carry out the dialog needed to perform a task. For example, a KTX reservation asks the user for the information the reservation requires, such as the travel time, the number of companions, and the seat type. The reservation is completed only after all of that information has been supplied. A task-specific dialog flow must therefore be followed.
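The slot-filling behaviour described for the KTX reservation can be sketched as below. This is a minimal illustration, not the patent's dialog strategy. The slot names, the prompt wording, and the `first_missing_slot` and `run_reservation_dialog` helpers are hypothetical. User answers are taken from a fixed list instead of from speech recognition.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical slots for a KTX reservation; the document names the
# information (time, number of companions, seat type) but no schema.
REQUIRED_SLOTS: Dict[str, str] = {
    "departure_time": "What time would you like to travel?",
    "companions": "How many people are travelling with you?",
    "seat_type": "Which seat type would you like?",
}


def first_missing_slot(filled: Dict[str, str]) -> Optional[str]:
    """Return the first required slot that is still empty, or None."""
    for slot in REQUIRED_SLOTS:  # dicts preserve insertion order
        if not filled.get(slot):
            return slot
    return None


def run_reservation_dialog(user_answers: List[str]) -> Tuple[Dict[str, str], List[str]]:
    """Ask for each missing slot in turn until every argument is filled."""
    filled: Dict[str, str] = {}
    asked: List[str] = []
    answers = iter(user_answers)
    while (slot := first_missing_slot(filled)) is not None:
        asked.append(REQUIRED_SLOTS[slot])
        answer = next(answers, None)  # stands in for speech recognition
        if answer is None:
            raise ValueError("dialog ended before every slot was filled")
        filled[slot] = answer
    return filled, asked


filled, asked = run_reservation_dialog(["10:30", "2", "window"])
print(filled)
```

The reservation is finalized only when `first_missing_slot` returns `None`. This mirrors the requirement that every argument be supplied first.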

In this way, both general and professional conversations can take place freely in the ontology dialog network.

FIGS. 21 to 24 show examples of conversations classified in the ontology dialog network in this way.

FIG. 21 is a diagram illustrating an embodiment of a general conversation classification structure in the ontology dialog network according to the present invention.

General conversation can include everyday, common-sense, topic, and emotional conversations. Conversation remains possible by continually shifting the conversational focus across various topics.

FIG. 22 is a diagram illustrating an embodiment of a professional conversation classification structure in the ontology dialog network according to the present invention.

Professional conversation refers to a purpose-driven conversation such as ticketing, reservation, or purchase.

FIG. 23 is a diagram illustrating an embodiment of the classification structure of a professional conversation exchanged with an agent at a hospital call center, in the ontology dialog network according to the present invention.

FIG. 24 is a diagram illustrating an embodiment of a classification structure in which everyday, emotional, and professional conversations are connected, in the ontology dialog network according to the present invention.

Conversation can proceed continuously while moving freely among everyday, emotional, and professional conversation.

As described above, the present invention has been described with reference to limited embodiments and drawings, but it is not limited to them. Those of ordinary skill in the art to which the invention pertains can make various modifications and variations within the technical spirit of the invention and the scope of equivalents of the claims set forth below.

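Hereinafter, preferred embodiments of the present invention will be described in detail with reference to the accompanying drawings. The terms and words used in this specification and the claims should not be interpreted as limited to their ordinary or dictionary meanings. An inventor may define the concepts of terms appropriately in order to describe the invention in the best way. On that principle, these terms should be interpreted with meanings and concepts consistent with the technical idea of the present invention. The embodiments described in this specification and the configurations shown in the drawings are merely the most preferred embodiments of the present invention and do not represent all of its technical ideas. It should therefore be understood that, at the time of filing, various equivalents and modifications could replace them.
FIG. 1 is a flowchart explaining a method of constructing a dialog scenario database for a dialog system according to the present invention.
In the method of FIG. 1, sentences are first extracted (S110). Sentences are extracted from conversational voice files or conversational posts.
Once sentences are extracted in step S110, they are classified into questions and answers (S120).
Next, the question and answer sentences classified in step S120 are screened by emotion and intention, and by affirmation and negation, to determine whether they correspond to and connect with each other (S130). The screened question and answer sentences are then generated as a list-type scenario (S140).
The question and answer sentences of the generated list-type scenario are converted into semantic vectors to learn the scenario (S150). The learned scenario is stored in the database as a sequence of scenario semantic vectors (S160).
The sequence of semantic vectors of each scenario stored in the database is displayed in the multidimensional ontology space as semantic words, semantic nodes, or semantic cubes (S160).
The method of FIG. 1 is described below by way of the apparatus for constructing a dialog scenario database for a dialog system shown in FIGS. 2 and 3.
FIG. 2 schematically shows an apparatus for constructing a dialog scenario database for a dialog system according to the present invention. FIG. 3 shows the apparatus of FIG. 2 in detail.
As shown in FIGS. 2 and 3, the apparatus 100 for constructing a dialog scenario database for a dialog system according to the present invention includes the following units. The sentence extraction unit 110 extracts sentences. The sentence analysis unit 120 classifies the extracted sentences into questions and answers. The dialog scenario generation unit 130 generates a list-type scenario in which the classified sentences correspond to one another so that the conversation connects. The dialog scenario learning unit 140 learns the sentences of the generated dialog scenarios. The dialog scenario database 150 stores the learned dialog scenarios as sequences of semantic vectors. The ontology relationship mapping unit 160 displays those stored sequences in the multidimensional ontology space as semantic words, semantic nodes, or semantic cubes.
The sentence extraction unit 110 extracts sentences from conversational voice files or conversational posts. It includes a voice file extraction module 111, which extracts conversational voice files, and a post extraction module 112, which extracts sentences from conversational posts. As shown in FIG. 3, sentences are extracted through the voice file extraction module 111 from conversational voice files such as call center, radio, and TV broadcast recordings. For conversational sentences posted on SNS and similar services, a search term is first selected in the search term generator 113. The post extraction module 112 then searches for posts and parses them to extract sentences. For example, sentences may be extracted from the text produced by speech recognition of existing voice recordings supplied by a call center. They may also be extracted from text produced by real-time speech recognition of consultations between customers and agents at a call center. For SNS, sentences are extracted from posts in which users converse about various topics on Twitter, Facebook, and the like. The SNS is first searched with prepared search terms, and each post link found with a search term (a specific word) is extracted. Everything from the very first post that a link points to, through the last post of the discussion that started from it, is treated as one conversation topic. All posts for that topic are then extracted.
The sentence analysis unit 120 classifies the sentences extracted by the sentence extraction unit 110 into questions and answers. Before classification, an erroneous word may appear in a sentence that was converted to text by speech recognition. In that case, the unit performs speech recognition error analysis and restores the sentence, or filters out inappropriate conversation. It then analyzes and classifies whether each sentence is a question or an answer. The sentence analysis unit 120 may be rule-based, machine-learning-based, or statistics-based, or it may combine one or more of these approaches. When it is machine-learning-based, it is trained by supervised learning, and the resulting language model data is used to decide whether an input sentence is a question or an answer.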
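A rule-based variant of the question/answer classification could look like the following sketch. The cue list and the `classify_sentence` function are hypothetical, and they operate on English text for readability. The document does not say which rules are used. Its machine-learning variant would replace these rules with a supervised model.

```python
import re

# Hypothetical cue words; the document does not list its rules.
INTERROGATIVE_STARTS = {
    "who", "what", "where", "when", "why", "how", "which",
    "do", "does", "did", "is", "are", "can", "could", "will", "would",
}


def classify_sentence(sentence: str) -> str:
    """Label a sentence 'question' or 'answer' using two surface cues."""
    text = sentence.strip()
    if text.endswith("?"):
        return "question"
    # Only the first word is checked, so "where" inside a statement is ignored.
    words = re.findall(r"[a-z']+", text.lower())
    if words and words[0] in INTERROGATIVE_STARTS:
        return "question"
    return "answer"


for s in ["Where have I seen you", "Are you curious about my name?", "Yes"]:
    print(s, "->", classify_sentence(s))
```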
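The dialog scenario generation unit 130 screens the question and answer sentences analyzed by the sentence analysis unit 120 by emotion and intention, and by affirmation and negation, to determine whether they correspond to and connect with each other. It then generates a list-type scenario in which the screened question and answer sentences are linked. To generate scenarios, it uses the semantic vector data of the scenario sentences and applies machine-learning-based supervised learning.
The dialog scenario learning unit 140 converts the question and answer sentences of the scenarios generated by the dialog scenario generation unit 130 into semantic vectors. It learns them by machine-learning-based supervised learning.
In general, supervised learning is a type of machine learning that infers a function from training data. The training data consist of pairs, each made up of an input object (typically a vector) and a desired output. The output of the function may be a continuous value, or it may be a predicted class label for the input object. The task of supervised learning is to predict the function's value for any valid input after seeing only a limited number of training examples, that is, input objects paired with target outputs. To do this, the learner must generalize from the available data to unseen situations in a reasonable way.
The dialog scenario database 150 stores dialog scenarios as semantic vectors, including the scenario semantic vectors learned by the dialog scenario learning unit 140. Each dialog scenario is stored as a consecutive sequence of semantic vectors for its sentence-level questions and answers. To help humans grasp them intuitively, the semantic vectors of sentences and scenarios are stored with one or more semantic words as labels.
The ontology relationship mapping unit 160 comes into play once the dialog scenario database 150, holding the sequences of semantic vectors of the dialog scenarios, has been built. It maps those dialog scenarios into the ontology dialog network and displays them there. The ontology dialog network is a multidimensional space of 300 to 600 dimensions. Because such a space cannot be depicted physically, it is compressed to three dimensions for display. The ontology dialog network represents each point in the three-dimensional space as a node. Each node can represent a word semantic vector, a sentence semantic vector, or a scenario semantic vector stored in the database 150. The word and sentence semantic vectors are obtained with the publicly available word2vec and sent2vec machine learning algorithms. Likewise, a scenario can be represented as a single scenario semantic vector in the manner of a "scenario2vec", or as the sequence of semantic vectors of its question and answer sentences.
The ontology relationship mapping unit 160 can upload the sequence of semantic vectors of a dialog scenario stored in the dialog scenario database 150 to the ontology dialog network. It can also turn the scenario itself into a single semantic vector and upload that. When searching for similar sentences or scenarios, the scenario closest to the input scenario can be found by computing distances in the semantic vector space. Words, sentences, and scenarios can all be represented in a semantic vector space, but each has its own separate semantic space. These three semantic spaces can also be mapped into a single semantic space, so that words, sentences, and scenarios are represented together. In this mapping, a sentence may be expressed as the sum or product of its word vectors, and a scenario as the sum or product of its sentence vectors. To help humans grasp them intuitively, the semantic vectors of sentences and scenarios are labeled with one or more semantic words.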
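The sum/product composition and the nearest-scenario search can be sketched as follows. The sketch assumes plain Python lists as vectors and Euclidean distance; the document says only "distance calculation". The helper names and the toy vectors are hypothetical. Real word vectors would come from a model such as word2vec.

```python
import math
from typing import Dict, List, Sequence

Vector = List[float]


def vector_sum(vectors: Sequence[Vector]) -> Vector:
    """Element-wise sum, one of the compositions named in the text."""
    return [sum(parts) for parts in zip(*vectors)]


def vector_product(vectors: Sequence[Vector]) -> Vector:
    """Element-wise product, the other composition named in the text."""
    result = [1.0] * len(vectors[0])
    for vector in vectors:
        result = [r * x for r, x in zip(result, vector)]
    return result


def sentence_vector(words: Sequence[str], word_vectors: Dict[str, Vector]) -> Vector:
    """Compose a sentence vector from its word vectors (sum variant)."""
    return vector_sum([word_vectors[w] for w in words])


def scenario_vector(sentence_vectors: Sequence[Vector]) -> Vector:
    """Compose a scenario vector from its sentence vectors (sum variant)."""
    return vector_sum(sentence_vectors)


def nearest_scenario(query: Vector, scenarios: Dict[str, Vector]) -> str:
    """Return the name of the stored scenario closest to the query."""
    return min(scenarios, key=lambda name: math.dist(query, scenarios[name]))


# Toy 2-D vectors; real semantic vectors have hundreds of dimensions.
word_vectors = {"where": [1.0, 0.0], "seen": [0.0, 1.0], "name": [1.0, 1.0]}
q1 = sentence_vector(["where", "seen"], word_vectors)  # [1.0, 1.0]
a1 = sentence_vector(["name"], word_vectors)           # [1.0, 1.0]
query = scenario_vector([q1, a1])                      # [2.0, 2.0]
print(nearest_scenario(query, {"s1": [2.0, 1.5], "s2": [0.0, 0.0]}))  # s1
```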
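An administrator can also enter dialog scenarios for the dialog scenario database 150 directly in the ontology dialog network, through the ontology relationship mapping unit 160. The relationship among the sentences that make up a scenario is expressed as a sequence of semantic words, as in [Example 1] below.
[Example 1]
Scenario 1 = (Question 1) - (Answer 1) - (Question 2) - (Answer 2) .... (Question N) - (Answer N)
Scenario 1 = (Where have I seen you?) - (Are you curious about my name?) - (No, I mean the place where I saw you) - (You mean the place where we last met?) .... (Yes) - (I think the last time we saw each other was probably at the last seminar hosted by Yally.)
Scenario 1 = (Name+Ambiguous) - (Name+Confirm) - (Place+Ambiguous) - (Place+Confirm) .... (Nickname+Accept) - (Nickname+Answer)
Here, the semantic word "이름모호" (name-ambiguous) is a new semantic vector formed by combining the semantic vectors of the semantic words "이름" (name) and "모호" (ambiguous). It represents sentences such as "Where have I seen you?".
A single semantic word expresses the meaning of one or more sentences. It can be regarded as a representative word that stands for the many sentences sharing the same meaning.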
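Semantic words such as "이름모호" (name-ambiguous) and "이름확인" (name-confirm) also represent the successive questions and answers that make up a scenario, such as "(Question 1)" and "(Answer 1)".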
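FIGS. 4 and 5 are screens showing an example of sentence extraction during construction of the dialog scenario database for a dialog system according to the present invention.
As shown in FIG. 4, sentence extraction takes words from DBpedia or WordNet and uses them as search terms. With these search terms, SNS posts and comment data are parsed from the web, for example by crawling. This works because, when a post goes up on Twitter, comments accumulate through retweets in a form similar to an exchange of dialog on a particular topic.
In FIG. 5, other users keep commenting while retweeting the original poster's post, and the conversation unfolds in a tree-like shape. Each branch of the tree is treated as a dialog scenario. For ease of understanding, [Example 2] below shows these dialog scenarios written as sequences of commenter IDs.
[Example 2]
Scenario 1: ffebreze - hatter365 - fffebreze
Scenario 2: ffebreze - ffebreze - Teahya - ffebreze - Teahya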
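The reply tree of FIG. 5 can be turned into scenarios by enumerating every root-to-leaf path, as sketched below. The post IDs and the exact tree shape are assumptions, chosen so that the output reproduces [Example 2]. The `thread_scenarios` helper is hypothetical. A real system would keep the post texts rather than only the author IDs.

```python
from collections import defaultdict
from typing import Dict, List, Optional, Tuple

# (post_id, author_id, parent_post_id); parent None marks the original post.
Post = Tuple[str, str, Optional[str]]


def thread_scenarios(posts: List[Post]) -> List[List[str]]:
    """Return the author sequence of every root-to-leaf path in a reply tree."""
    children: Dict[str, List[str]] = defaultdict(list)
    authors: Dict[str, str] = {}
    roots: List[str] = []
    for post_id, author, parent in posts:
        authors[post_id] = author
        if parent is None:
            roots.append(post_id)
        else:
            children[parent].append(post_id)

    scenarios: List[List[str]] = []

    def walk(post_id: str, path: List[str]) -> None:
        path = path + [post_id]
        if not children[post_id]:  # a leaf closes one scenario
            scenarios.append([authors[p] for p in path])
        for child in children[post_id]:
            walk(child, path)

    for root in roots:
        walk(root, [])
    return scenarios


# Assumed tree shape that reproduces [Example 2].
posts = [
    ("p0", "ffebreze", None),
    ("p1", "hatter365", "p0"),
    ("p2", "fffebreze", "p1"),
    ("p3", "ffebreze", "p0"),
    ("p4", "Teahya", "p3"),
    ("p5", "ffebreze", "p4"),
    ("p6", "Teahya", "p5"),
]
for scenario in thread_scenarios(posts):
    print(" - ".join(scenario))
```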
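FIG. 6 is a screen showing an example of sentence analysis during construction of the dialog scenario database for a dialog system according to the present invention. As shown, once a post has been parsed, it is not yet known whether the post is a question or an answer within the dialog scenario. Question and answer types are therefore classified by supervised machine learning.
FIG. 7 is a screen showing an example of scenario generation, and FIG. 8 is a screen showing an example of dialog scenario learning, both during construction of the dialog scenario database for a dialog system according to the present invention. As shown in FIG. 7, questions and answers on a single topic do not necessarily alternate. Questions or answers may repeat (e.g., Question 1, Answer 1, Question 2, Question 2, Answer 2, Question 3, Answer 3, Answer 3, Answer 3). The questions and answers suitable for a scenario are selected from among them. As shown in FIG. 8, the selected dialog scenario is then learned as repeated pairs, each consisting of one question and one answer to that question.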
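The document does not specify how suitable questions and answers are selected from a run such as Question 1, Answer 1, Question 2, Question 2, Answer 2, .... One simple heuristic is shown here purely as an assumption. It pairs the last question of each run of questions with the first answer that follows it, and drops the surplus turns.

```python
from typing import List, Optional, Tuple

Turn = Tuple[str, str]  # (role, text) with role "Q" or "A"


def pair_turns(turns: List[Turn]) -> List[Tuple[str, str]]:
    """Reduce a thread with repeated questions/answers to alternating Q-A pairs."""
    pairs: List[Tuple[str, str]] = []
    pending: Optional[str] = None
    for role, text in turns:
        if role == "Q":
            pending = text  # a later question in the same run replaces an earlier one
        elif role == "A" and pending is not None:
            pairs.append((pending, text))
            pending = None  # further answers in the same run are dropped
    return pairs


turns = [("Q", "q1"), ("A", "a1"), ("Q", "q2a"), ("Q", "q2b"), ("A", "a2"),
         ("Q", "q3"), ("A", "a3a"), ("A", "a3b"), ("A", "a3c")]
print(pair_turns(turns))  # [('q1', 'a1'), ('q2b', 'a2'), ('q3', 'a3a')]
```

A learned selector that scores pairs by emotion, intent, and affirmation/negation, as the text describes, would replace this positional rule.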
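FIG. 9 is a screen showing an example in which the dialog scenario database built according to the present invention is mapped onto the ontology dialog network.
A scenario is represented as a sequence of semantic vectors for its sentence-level questions and answers. Each question and answer in a scenario has a multidimensional semantic vector. Each question or answer sentence is also automatically mapped to one dialog intent (semantic word).
To build the dialog intents (semantic words), currently used words and their semantic vectors are first represented in three-dimensional space, using word2vec or a similar method. New dialog intents (semantic words) are then represented in that space as combinations (sums or products) of one or more word vectors.
When a large number of scenarios are input as multidimensional semantic vectors, the distance between each input semantic vector and the semantic vectors of existing dialog intents (semantic words) is computed. When that distance falls within a certain value, the existing dialog intent (semantic word, semantic node, or semantic cube) is assigned to the input semantic vector. (The input semantic vector is named after that existing dialog intent.)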
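The threshold rule can be sketched as below. The sketch assumes Euclidean distance and a hypothetical `threshold` value; the source states only that an existing intent is assigned when the distance "falls within a certain value". Returning `None` signals that no existing intent is close enough. The intent names are hypothetical English renderings.

```python
import math
from typing import Dict, List, Optional

Vector = List[float]


def assign_intent(vector: Vector, intents: Dict[str, Vector],
                  threshold: float) -> Optional[str]:
    """Name the input vector after the closest existing intent, if close enough."""
    if not intents:
        return None
    closest = min(intents, key=lambda name: math.dist(vector, intents[name]))
    if math.dist(vector, intents[closest]) <= threshold:
        return closest
    return None


intents = {"name_ambiguous": [0.0, 0.0], "name_confirm": [5.0, 5.0]}
print(assign_intent([0.3, 0.4], intents, threshold=1.0))  # name_ambiguous
print(assign_intent([2.5, 2.5], intents, threshold=1.0))  # None
```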
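Large numbers of scenarios may also arrive with their dialog intents (semantic words, semantic nodes, or semantic cubes) and dialog sentences, not only their multidimensional semantic vectors. In that case, each dialog scenario (each question and answer) only needs to be mapped into the space of its dialog intent. The semantic word may be assigned by automatically analyzing the scenario's dialog sentences, or a person may attach a semantic word suited to each dialog sentence.
[Example 3]
Scenario 1 = [semantic word][dialog sentence][semantic vector], [semantic word][dialog sentence][semantic vector] ....
Scenario 1 = [이름모호 (name-ambiguous)][I think I've seen you somewhere][2.382, 6.108, ...], [이름확인 (name-confirm)][Are you curious about my name?][8.730, 1.383, ...] ....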
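The [semantic word][dialog sentence][semantic vector] list of [Example 3] maps naturally onto a small record type. In this sketch, the `ScenarioTurn` class and its field names are hypothetical, and the semantic words are rendered in English. The vectors keep only the first two values shown in the example, which the source itself truncates.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class ScenarioTurn:
    """One [semantic word][dialog sentence][semantic vector] entry."""
    semantic_word: str
    sentence: str
    vector: List[float]


def semantic_word_sequence(scenario: List[ScenarioTurn]) -> str:
    """Render a scenario as its chain of semantic words, as in [Example 1]."""
    return " - ".join(turn.semantic_word for turn in scenario)


scenario_1 = [
    ScenarioTurn("NameAmbiguous", "I think I've seen you somewhere", [2.382, 6.108]),
    ScenarioTurn("NameConfirm", "Are you curious about my name?", [8.730, 1.383]),
]
print(semantic_word_sequence(scenario_1))  # NameAmbiguous - NameConfirm
```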
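The [semantic word] can be assigned automatically with natural language processing methods: rule-based, statistics-based, or machine-learning-based analysis. In the machine-learning-based case, the dialog intent (semantic word) of an input sentence is classified with a model trained by supervised learning.
FIG. 10 shows an example of how dialog scenarios stored in the dialog scenario database according to the present invention are represented. Dialog scenarios usually span two or more turns (Question 1 - Answer 1 - Question 2 - Answer 2). As a result, a flat representation limits the number of scenarios that can be displayed in one space, and it gives no way to tell which scenarios are spatially close to the current one. Moreover, even when a scenario is entered or modified, its semantic correlation with other scenarios cannot be seen at all.
A dialog scenario is shown as a successive list of semantic words. Semantic words may be combined with particular characters to prevent duplication, as in the first scenario at the top of the figure: "이름모호" (name-ambiguous), "C이름확인" (C + name-confirm), "C승낙" (C + accept), and "KE이름아이유" (KE + name + IU).
FIG. 11 is a screen in which dialog scenarios stored in the dialog scenario database according to the present invention are rendered in a three-dimensional scenario authoring tool. As in FIG. 11, scenarios can be displayed in three-dimensional space. Each question and answer of a scenario is displayed as one semantic node (semantic cube or semantic word). The semantic nodes belonging to each scenario are placed near to or far from one another according to their semantic closeness. This makes the semantic relationships between the nodes of a scenario much easier to grasp.
FIG. 12 is an example of the scenario input screen of the three-dimensional scenario authoring tool, showing dialog scenarios stored in the dialog scenario database according to the present invention. As shown in FIG. 12, a scenario is composed by dragging semantic words (semantic nodes or semantic cubes) from the right-hand panel into the empty space on the left. This forms a sequence of semantic words, e.g., 주말일정 (weekend schedule) - 일정답변 (schedule answer) - 취미질의 (hobby question) - 취미답변 (hobby answer). Once the scenario is entered, it is automatically mapped into the three-dimensional dialog network. Every semantic word pattern on the right has a multidimensional semantic vector. The vector for a combination of semantic words (weekend + schedule) can be obtained by training on the combination separately, or by taking the sum or product of the vectors of the existing semantic words (weekend, schedule).
A semantic word on the right (e.g., 주말일정 (weekend schedule)) stands for many different sentences and covers a variety of sentence forms that imply "weekend schedule", such as those below.
FIG. 13 is an example of the scenario modification screen of the three-dimensional scenario authoring tool. Scenarios can be found with the search function, and the scenarios found can be modified. When a scenario is modified, the change is immediately reflected in the three-dimensional space.
FIG. 14 is an example of the scenario deletion screen of the three-dimensional scenario authoring tool. Scenarios can be deleted, and a deleted scenario is also removed completely from the three-dimensional space, so that it is no longer visible.
FIG. 15 is a block diagram showing the sequence in which continuous conversation is carried out using automatic dialog scenario collection and the ontology dialog network according to the present invention. FIG. 16 is a flowchart of the same sequence.
As described above, the ontology dialog network is a multidimensional vector space of 300 to 600 dimensions. Because multidimensional vectors cannot be depicted physically, the space is compressed to three dimensions for display, using a dimensionality-reduction algorithm such as PCA. The ontology dialog network represents each point in the three-dimensional space as a node. Each node can represent a word semantic vector, a sentence semantic vector, or a scenario semantic vector. The term 'ontology dialog network' refers to the overall structure, including these nodes and the scenarios formed by connections between them. The term 'ontology dialog network database' refers to the storage that holds that data and those structural relationships. Below, the two terms are used interchangeably without significant distinction.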
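PCA-based compression to three dimensions for display can be sketched with NumPy's SVD. This illustration assumes the semantic vectors are stacked as the rows of a matrix. The function name is hypothetical, and the document mentions PCA only as one possible compression algorithm.

```python
import numpy as np


def project_to_3d(vectors: np.ndarray) -> np.ndarray:
    """Project the rows of an (n, d) array onto their top three principal components.

    Needs at least three rows and three columns to yield three output columns.
    """
    centered = vectors - vectors.mean(axis=0)
    # Rows of vt are the principal directions, ordered by decreasing variance.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return centered @ vt[:3].T


rng = np.random.default_rng(0)
semantic_vectors = rng.normal(size=(50, 300))  # e.g. 50 nodes, 300 dimensions
coords = project_to_3d(semantic_vectors)
print(coords.shape)  # (50, 3)
```

The projection is lossy. Nodes that look close in the 3-D view are not guaranteed to be close in the original space, so distance searches should still use the full vectors.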
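As also described above, a semantic word represents several sentences with similar meanings, and it may consist of a single word or a combination of several words. For example, as explained earlier, the semantic word "이름모호" (name-ambiguous) is a new semantic vector combining the semantic vectors of the semantic words "이름" (name) and "모호" (ambiguous). It represents sentences such as "Where have I seen you?". In other words, a semantic word can also be represented by a single semantic vector in the ontology dialog network space, and it has a single node.
Scenarios can be displayed in the three-dimensional ontology dialog network space, with each question and answer of a scenario displayed as one semantic node (semantic cube or semantic word). The semantic nodes belonging to each scenario are placed near to or far from one another according to their semantic closeness. This makes the semantic relationships between the nodes of a scenario much easier to grasp. Below, a semantic node is also simply called a 'node'.
The process of carrying out a continuous conversation using the ontology dialog network is described below with reference to the block diagram sequence of FIG. 15 and the flowchart of FIG. 16.
First, when the user speaks, the user's voice is input to the continuous conversation system 200 using the ontology dialog network of the present invention (see FIG. 19) (S210). The system 200 analyzes the input speech to understand the sentence (S220). It then checks, in the ontology dialog network database, whether a semantic word is matched to the analyzed conversation sentence (S230). This semantic word is the word, or combination of words, that represents the sentence.
If no semantic word is matched (S240), a semantic word is determined for the conversation sentence, matched to it, and stored in the ontology dialog network database (S250). This step is described in detail later with reference to FIG. 17.
A semantic word may have been newly determined because none was matched, or it may have been matched already. Either way, once a semantic word is matched to the conversation sentence and stored in the ontology dialog network database, the system consults the dialog scenarios in that database. It extracts the semantic word (hereinafter the 'second semantic word') that follows the semantic word matched to the sentence (hereinafter the 'first semantic word') (S260). It then selects one of the conversation sentences included in that semantic word (S270) and outputs the selected sentence to the user as speech (S280). For example, the first semantic word may represent the question sentence the user has just spoken. The second semantic word may represent the answer sentence that the continuous conversation system 200 has decided to reply with.
According to the dialog scenarios in the ontology dialog network database, several semantic words may follow the first semantic word matched to the conversation sentence. In that case, the semantic word whose scenario carries the highest weight can be extracted as the second semantic word. For example, three scenarios may follow the first semantic word, where A, B, and C are semantic words: first semantic word -> 'A' (weight 0.7), first semantic word -> 'B' (weight 0.2), and first semantic word -> 'C' (weight 0.1). The path to 'A' (weight 0.7) is then chosen, and semantic word A becomes the second semantic word. Such weights can be set in advance through dialog quality management, which diagnoses and analyzes ordinary conversations with the user. This is described later with reference to FIG. 18.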
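Steps S260 and S270 can be sketched as follows, using the weights from this example. The nested-dictionary layout, the helper names, and the seeded random choice used to pick one sentence are all assumptions. The document states only that the highest-weighted successor is chosen and that one of its sentences is selected.

```python
import random
from typing import Dict, List, Optional


def choose_second_semantic_word(transitions: Dict[str, Dict[str, float]],
                                first: str) -> Optional[str]:
    """S260: pick the successor of `first` whose scenario weight is highest."""
    successors = transitions.get(first, {})
    if not successors:
        return None
    return max(successors, key=successors.get)


def choose_sentence(sentences_by_word: Dict[str, List[str]], word: str,
                    rng: random.Random) -> str:
    """S270: pick one of the sentences grouped under the chosen semantic word."""
    return rng.choice(sentences_by_word[word])


transitions = {"first": {"A": 0.7, "B": 0.2, "C": 0.1}}
second = choose_second_semantic_word(transitions, "first")
print(second)  # A
sentences = {"A": ["reply 1", "reply 2"]}
print(choose_sentence(sentences, second, random.Random(42)))
```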
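FIG. 17 is a flowchart of a method used during continuous conversation with the ontology dialog network according to the present invention. When a new conversation sentence is input, the method maps it onto the ontology dialog network.
When a new scenario is input, the semantic vectors of its question and answer sentences are extracted, and the dialog intent of each question and answer semantic vector is classified. The input scenario is then mapped onto the dialog network as a successive list of "dialog intent (semantic word), dialog sentence, semantic vector".
That is, when no semantic word is matched to the conversation sentence input by the user, which means a new scenario has been input (S240), the semantic vector of the sentence is derived (S251). The semantic word corresponding to the sentence is then classified from that vector (S252). The semantic word can be classified with one or more of rule-based, statistics-based, and machine-learning-based methods. Here, machine learning corresponds to the 'dialog scenario learning' block in FIG. 15.
If a semantic word corresponding to the conversation sentence is classified (S252), it is adopted as the sentence's semantic word, matched to the sentence, and stored in the ontology dialog network database (S253). As described above, the successive list of the classified semantic word, dialog sentence, and semantic vector can be stored in the database at this point.
If no semantic word corresponding to the conversation sentence is classified (S252), the ontology dialog network database is searched for the semantic word mapped to the node closest to the node indicated by the semantic vector. That word is adopted as the sentence's semantic word, matched to the sentence, and stored in the database (S254). Here too, the successive list of semantic word, dialog sentence, and semantic vector can be stored in the database.
The semantic word mapped to the node closest to the node indicated by the semantic vector may be a single semantic word. It may also be a new semantic word indicated by a semantic vector generated from a combination of semantic words.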
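The FIG. 17 flow (S251 to S254) can be sketched with the embedding model and the classifier passed in as callables. Everything here is hypothetical scaffolding. `embed` and `classify` stand in for whatever rule-based, statistical, or machine-learning components are used, and `classify` returns `None` when it cannot decide. A plain list stands in for the ontology dialog network database, and Euclidean distance is assumed for "closest node".

```python
import math
from typing import Callable, Dict, List, Optional, Tuple

Vector = List[float]
Record = Tuple[str, str, Vector]  # (semantic word, dialog sentence, semantic vector)


def resolve_semantic_word(
    sentence: str,
    embed: Callable[[str], Vector],
    classify: Callable[[Vector], Optional[str]],
    nodes: Dict[str, Vector],
    database: List[Record],
) -> str:
    """Assign a semantic word to an unmatched sentence and store the triple."""
    vector = embed(sentence)                   # S251: derive the semantic vector
    word = classify(vector)                    # S252: try to classify it
    if word is None:                           # S254: fall back to the nearest node
        word = min(nodes, key=lambda w: math.dist(vector, nodes[w]))
    database.append((word, sentence, vector))  # S253/S254: store the triple
    return word


# Toy stand-ins: the "embedding" is just sentence length, and the
# classifier never recognizes anything, so the fallback is exercised.
db: List[Record] = []
nodes = {"short_reply": [3.0, 0.0], "long_reply": [20.0, 0.0]}
word = resolve_semantic_word("hello", lambda s: [float(len(s)), 0.0],
                             lambda v: None, nodes, db)
print(word, db)  # short_reply [('short_reply', 'hello', [5.0, 0.0])]
```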
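A scenario can be expressed as a successive series of question-answer pairs. Just as questions and answers combine to form a scenario, a scenario carries specific conversation domain information (education, literature, common sense, everyday conversation, sports, movies, etc.), and it too can be expressed as a combination of semantic words.
Scenarios can likewise be classified with one or more of rule-based, statistics-based, and machine-learning-based methods. The scenario category can therefore be selected first, before the dialog intent of each sentence is classified. The dialog intents (semantic words) in that category are then searched first, which can shorten search time.
FIG. 18 is a block diagram showing the sequence for improving dialog quality through continuous conversation using the ontology dialog network according to the present invention.
As the user converses with the system, the user's reaction can be recognized immediately as positive or negative. Reactions such as positive, negative, ignoring, and topic-change reactions can be identified by analyzing the user's conversation sentences. Various analysis methods can be used for this, including rule-based, statistics-based, and machine-learning-based ones.
When the user's reaction is positive, the current scenario can be recognized as a preferred dialog scenario. Its probability of being adopted is then raised, for example through weight adjustment, so that preferred scenarios keep being selected and non-preferred scenarios drop in priority. Dialog scenarios are thus managed continuously through real-time user reactions. The ultimate goal is for dialog quality to improve the more the continuous conversation goes on.
Referring to FIG. 18, ongoing dialog quality management is based on the history of dialog scenarios between the user and the continuous conversation system 200 using the ontology dialog network of the present invention. The sentences ordinarily input by the user are analyzed, and the user's reaction is determined from them. Consider the dialog scenario leading to a conversation sentence, or to its semantic word. When the reaction is judged positive, the likelihood that this scenario will be selected is raised. When the reaction is judged negative, that likelihood is lowered.
Here, 'positive' and 'negative' are used very broadly. As illustrated in FIG. 18, reactions can be subdivided into praise, positive, negative, ignoring, and topic-change reactions, among others, but all of these can be described as positive or negative. The reactions are not sorted into just two classes. It is more accurate to say that each reaction is given a degree of positivity or negativity. The more receptive and favorable the user's reaction, as judged by sentence analysis, the more the feedback raises the future selection likelihood of the dialog scenario that produced it. The less favorable the reaction, the more the feedback lowers that likelihood.
The future selection likelihood of a dialog scenario can be set by assigning a weight to the scenario that produced the user's reaction. The weight reflects whether the scenario is preferred or non-preferred and is set according to predetermined criteria. Dialog scenarios with higher weights thus become more likely to be selected in the future.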
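One way to turn a graded reaction into a weight update is sketched below. The reaction scale from -1 to 1, the additive `rate`, and the `floor` that keeps every scenario selectable are all assumptions. The document says only that weights are assigned according to predetermined criteria.

```python
from typing import Dict


def update_weight(weights: Dict[str, float], scenario: str, degree: float,
                  rate: float = 0.1, floor: float = 0.01) -> None:
    """Nudge a scenario's weight by the graded user reaction.

    degree runs from -1.0 (strongly negative) to 1.0 (praise); 0 leaves it unchanged.
    """
    if not -1.0 <= degree <= 1.0:
        raise ValueError("degree must lie in [-1, 1]")
    weights[scenario] = max(floor, weights[scenario] + rate * degree)


def selection_probabilities(weights: Dict[str, float]) -> Dict[str, float]:
    """Normalize weights so that higher-weighted scenarios are chosen more often."""
    total = sum(weights.values())
    return {name: w / total for name, w in weights.items()}


weights = {"A": 0.7, "B": 0.2, "C": 0.1}
update_weight(weights, "B", 1.0)   # positive reaction to scenario B
update_weight(weights, "C", -1.0)  # negative reaction to scenario C
print(selection_probabilities(weights))
```

The floor keeps a disliked scenario from being eliminated outright, so that later positive feedback can still revive it.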
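The user's reaction may be analyzed in three ways: by applying rules to the user's conversation sentence (rule-based), from statistics on conversation sentences (statistics-based), or by machine learning (machine-learning-based).
Alternatively, the user's reaction may be determined from a user preference that has already been analyzed and mapped in the ontology dialog network database. That preference is attached to the semantic word corresponding to the user's conversation sentence. The ontology dialog network database may hold a semantic word space and a separate user preference space. Each semantic word node in the semantic word space is then mapped to the user preference at the same node in the user preference space. Because the preference derived from the user's reactions to that semantic word is already mapped, the weight for the conversation can be set from it.
FIG. 19 shows the configuration of the continuous conversation system using the ontology dialog network according to the present invention.
The control unit 201 controls each component of the continuous conversation system using the ontology dialog network. It thereby performs the series of processes involved in continuous conversation using the ontology dialog network.
The voice input unit 202 receives the user's voice.
The sentence analysis unit 203 performs sentence analysis on the input voice.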
이상과 같이, 본 발명은 비록 한정된 실시예와 도면에 의해 설명되었으나, 본 발명은 이것에 의해 한정되지 않으며 본 발명이 속하는 기술분야에서 통상의 지식을 가진 자에 의해 본 발명의 기술사상과 아래에 기재될 특허청구범위의 균등범위 내에서 다양한 수정 및 변형이 가능함은 물론이다.Hereinafter, preferred embodiments of the present invention will be described in detail with reference to the accompanying drawings. Prior to this, terms and words used in the present specification and claims should not be construed as limited to ordinary or dictionary terms, and the inventor should appropriately interpret the concepts of the terms appropriately It should be interpreted in accordance with the meaning and concept consistent with the technical idea of the present invention based on the principle that it can be defined. Therefore, the embodiments described in this specification and the configurations shown in the drawings are merely the most preferred embodiments of the present invention and do not represent all the technical ideas of the present invention. Therefore, It is to be understood that equivalents and modifications are possible.
1 is a flowchart illustrating a method for constructing a dialogue scenario database for a dialog system according to the present invention.
In the dialogue scenario database building method for the dialog system according to FIG. 1, a sentence is extracted first (S110). Extraction of sentences is extracted from conversational voice files or conversational posts.
If a sentence is extracted in step S110, the extracted sentence is classified into a question and an answer (S120).
Then, it is determined whether the question and answer sentences classified in step S120 correspond to each other and connected to each other by the sentiment, intention, affirmation, and negation (S130), and the selected question and answer sentence is generated as a list-type scenario ).
Subsequently, the scenario and the question and answer sentences of the generated scenario, which are the list forms, are converted into semantic vectors to learn a scenario (S150), and the learned scenario is converted into a database with a continuous scenario meaning vector (S160).
The consecutive semantic vectors of the database-dated scenario are displayed in the ontology multi-dimensional space as one of semantic words, meaning nodes, and semantic cubes (S160).
The method for constructing a dialog scenario database for the dialog system of the present invention shown in FIG. 1 will be described with reference to a dialogue scenario database establishing apparatus for the dialog system of FIGS.
FIG. 2 is a view schematically showing an apparatus for constructing a dialogue scenario database for an interactive system according to the present invention, and FIG. 3 is a detailed view illustrating an apparatus for constructing a dialogue scenario database for the interactive system according to FIG.
2 to 3, an apparatus 100 for constructing a dialogue scenario database for an interactive system according to the present invention includes a sentence extraction unit 110 for extracting a sentence, a sentence extraction unit 110 for extracting a sentence extracted by the sentence extraction unit 110, A dialogue scenario generation unit 130 for generating a scenario in which sentences classified by the sentence analysis unit 120 correspond to each other and conversations are connected to each other, A dialogue scenario learning unit 140 for learning the sentence of the dialogue scenario generated by the dialogue scenario generation unit 130, a dialogue scenario storage unit 140 for storing a dialogue scenario, which is learned by the dialogue scenario learning unit 140, The scenario database 150 and the continuous semantic vectors of the dialog scenario stored in the dialog scenario database 150 are stored in the ontology multidimensional space with semantic words or semantic nodes Hitting include ontology mapping relationship unit 160 to display one of the mean cube.
The sentence extracting unit 110 extracts a sentence from an interactive voice file or a conversation type of text. The sentence extracting unit 110 includes a voice file extracting module 111 for extracting an interactive voice file, 3, an interactive voice file, such as a call center, a radio, and a TV broadcast, is sent to a voice file extraction module 111 through a voice file extraction module 111. The voice file extraction module 111 extracts a sentence When a search word is selected in the search word generator 113, a sentence is extracted because the search is performed by the postexpression module 112 and the posting is parsed. For example, it is possible to extract a sentence from a text after speech recognition of an existing voice recording file provided by a call center, or extract a sentence from text after speech recognition of a consultation content between a customer and an agent in a call center in real time. On the other hand, SNS extracts sentences from posts that talk about various subjects on Twitter or Facebook, searches for SNS with the prepared search words, extracts each of the searched search links (specific words) From the beginning of the first entry pointed to by the posting to the last posting of the discussion starting from that posting as a conversation topic, extract all the posts for that conversation topic.
The sentence analyzing unit 120 classifies sentences extracted from the sentence extracting unit 110 as questions and answers. If an error word occurs in a sentence extracted as text after speech recognition before classification, Restore the error-resolved sentences or filter inappropriate conversations to analyze and classify whether the sentence is a question or an answer. The sentence analysis unit 120 may be configured based on rules, machine learning, statistics, or a combination of at least one of rule based, machine learning based, and statistical based. When the sentence analyzing unit 120 is based on a machine learning system, learning is performed by learning a map, and the input sentence is analyzed to determine whether the input sentence is a question or an answer based on the language model data constructed as a result of the instruction.
The dialogue scenario generation unit 130 selects whether the question and the answer sentence analyzed through the sentence analysis unit 120 correspond to each other and connected to each other by emotion, intention, affirmation, and negation, and the selected question and answer sentence are connected to each other And the scenario is created by using the semantic vector data of the scenario sentence for the scenario generation and by the map learning based on the machine learning.
The dialogue scenario learning unit 140 changes the question and answer sentences of the scenario generated by the dialogue scenario generation unit 130 into a semantic vector and learns by map learning based on a machine learning basis.
Generally, supervised learning is machine learning that produces functions from training data. The training data consists of a pair of inputs (typically a vector) and a desired output. The output of the function may be a continuous value or it may predict the classification name of the input object. The job of learning a map is to predict the value of a function for a valid input target by reporting only a small number of training examples, the input pair and the target outputs. To do this, the learner has to generalize from the current data to the invisible situation in a reasonable way.
The dialogue scenario database 150 stores dialogue scenarios as semantic vectors, namely the scenario semantic vectors learned by the dialogue scenario learning unit 140. Each dialogue scenario is stored as a consecutive sequence of semantic vectors for its sentence-level questions and answers. Sentences and scenarios are also labeled with one or more semantic words so that people can grasp them intuitively.
Once the dialogue scenario database 150 storing the continuous semantic vectors of dialogue scenarios has been constructed, the ontology relationship mapping unit 160 maps the stored dialogue scenarios into the ontology dialogue network and displays them. The ontology dialogue network is a multidimensional space of 300 to 600 dimensions; because such a space cannot be depicted physically, it is compressed into three dimensions for display. Each point in the three-dimensional space is represented as a node, and each node can represent a word semantic vector, a sentence semantic vector, or a scenario semantic vector stored in the database 150. Word and sentence semantic vectors are obtained with the previously disclosed word2vec and sent2vec machine learning algorithms, and a scenario can be expressed as a scenario semantic vector (scenario2vec) derived from the semantic vectors of its question and answer sentences.
The continuous semantic vectors of the dialogue scenarios stored in the dialogue scenario database 150 can be uploaded to the ontology dialogue network through the ontology relationship mapping unit 160. Because each scenario is itself turned into a semantic vector before upload, a search for similar scenarios can map the scenario closest to an input scenario by distance calculation in the semantic vector space. Words, sentences, and scenarios can each be expressed in their own separate semantic space, or the three spaces can be mapped into a single semantic space in which all of them are expressed together. In that mapping, a sentence is expressed as the sum or product of its word vectors, and a scenario as the sum or product of its sentence vectors. The semantic vectors of sentences and scenarios are also represented by one or more semantic words to aid human understanding.
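The sum-based composition and the nearest-scenario search described above can be sketched as follows. This assumes plain Python lists as vectors, the sum (not the product) variant, and Euclidean distance as the distance measure; all function names are hypothetical.

```python
import math


def add_vectors(vectors):
    """Component-wise sum of equal-length vectors."""
    return [sum(components) for components in zip(*vectors)]


def sentence_vector(words, word_vectors):
    """Sentence vector = sum of the vectors of its known words."""
    return add_vectors([word_vectors[w] for w in words if w in word_vectors])


def scenario_vector(sentence_vectors):
    """Scenario vector = sum of its sentence vectors."""
    return add_vectors(sentence_vectors)


def closest_scenario(query, scenarios):
    """Name of the stored scenario nearest to `query` (Euclidean distance)."""
    return min(scenarios, key=lambda name: math.dist(scenarios[name], query))
```

Swapping `sum` for a component-wise product would give the "product" variant the text also allows; the search step is unchanged either way.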
Alternatively, a dialogue scenario can be entered directly into the ontology dialogue network through the ontology relationship mapping unit 160 and stored in the dialogue scenario database 150. The relationship among the sentences constituting a scenario is shown in [Example 1] below.
[Example 1]
Scenario 1 = (Question 1) - (Answer 1) - (Question 2) - (Answer 2) .... (Question N) - (Answer N)
Scenario 1 = (Where have I seen you before?) - (Are you curious about my name?) - (No, I mean where I saw you) - (Where did we last meet? I think I last saw you at the Yaliki Seminar.)
Scenario 1 = (name + ambiguous) - (name + confirmation) - (place + ambiguous) - (place + confirmation) .... (alias + accept) - (alias + answer)
Here, the semantic word "name ambiguity" corresponds to a new semantic vector that combines the semantic vectors of the semantic words "name" and "ambiguity", and it is a semantic word representing a sentence such as "
A single semantic word semantically represents one or more sentences; it can therefore also be called a representative word for many sentences that share the same meaning.
In addition, semantic words such as "name ambiguity" and "name resolution" represent the continuity of the successive questions and answers that make up one scenario, such as "(Question 1)" and "(Answer 1)".
FIGS. 4 and 5 are views showing examples of sentence extraction during construction of the dialogue scenario database for the dialogue system according to the present invention.
As shown in FIG. 4, sentence extraction takes words from DBpedia or WordNet, uses them as search terms, and parses the SNS posts and comment data retrieved with them from the web. This works because, when a post is made on Twitter, its comments follow a format similar to retweeting and exchanging conversation about a specific topic.
In FIG. 5, the post of the user who originally wrote it is retweeted, and other users continue to comment on it. The conversation develops in a tree-like form. Each branch of the tree is regarded as a dialogue scenario, and for ease of understanding the scenarios are written out by commenter ID, as in [Example 2] below.
[Example 2]
Scenario 1: ffebreze - hatter365 - fffebreze
Scenario 2: ffebreze - ffebreze - Teahya - ffebreze - Teahya
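Treating each root-to-leaf reply path of a thread as one scenario, as in [Example 2], can be sketched with a depth-first traversal. The `children` mapping from a comment ID to its replies is a hypothetical representation of the parsed thread; the patent does not specify one.

```python
def scenarios_from_thread(children, root):
    """Enumerate every root-to-leaf reply path of a comment tree, in reply order."""
    paths = []
    stack = [(root, [root])]
    while stack:
        node, path = stack.pop()
        replies = children.get(node, [])
        if not replies:
            paths.append(path)  # a leaf ends one dialogue scenario
        # Push in reverse so the earliest reply is explored first.
        for reply in reversed(replies):
            stack.append((reply, path + [reply]))
    return paths
```

Mapping each comment ID to its author ID afterwards would produce listings in the style of [Example 2].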
FIG. 6 is a screen showing an example of sentence analysis during construction of the dialogue scenario database for the dialogue system according to the present invention. As shown in FIG. 6, after a post is parsed, it is not yet known whether the post is a question or an answer in the dialogue scenario. Question and answer types are therefore classified by supervised machine learning.
FIG. 7 is a view showing an example of scenario generation during construction of the dialogue scenario database for the dialogue system according to the present invention, and FIG. 8 is a screen showing an example of dialogue scenario learning during that construction. As shown in FIG. 7, questions and answers on one subject may not alternate, and questions or answers may overlap (for example, question 1, answer 1, question 2, question 2, answer 2, question 3, answer 3, answer 3, answer 3). From these, the questions and answers suitable for the scenario are selected, and, as shown in FIG. 8, the selected dialogue scenario is arranged as alternating pairs of one question and one answer and learned iteratively.
FIG. 9 is a screen showing an example in which a dialogue scenario database for an interactive system constructed according to the present invention is mapped to an ontology dialogue network.
A scenario is represented as a sequence of semantic vectors of its sentence-level questions and answers. Each question and answer in a scenario has a multidimensional semantic vector, and each question and answer sentence is automatically mapped to a dialogue intention (semantic word).
For dialogue intentions (semantic words), the words currently in use and their semantic vector values are laid out in advance in a three-dimensional space using word2vec or a similar method, so that each dialogue intention (semantic word) occupies a position in that three-dimensional space.
When many scenarios are input as multidimensional semantic vector values, each input semantic vector value is compared by distance with the semantic vector values of the existing dialogue intentions (semantic words), and the input is assigned to the closest existing dialogue intention (semantic word, semantic node, or semantic cube). In other words, the input semantic vector value is named after that existing dialogue intention.
When many scenarios are input together with their dialogue intentions (semantic words, semantic nodes, or semantic cubes) and dialogue sentences, in addition to their multidimensional semantic vector values, each question and answer is placed in the space of its dialogue intention. The semantic word may be assigned automatically by analyzing the "conversation sentence" of the scenario, or a person may directly attach a semantic word suitable for the "conversation sentence".
[Example 3]
Scenario 1 = [Semantic word] [Conversation sentence] [Semantic vector], [Semantic word] [Conversation sentence] [Semantic vector] ....
Scenario 1 = [Name ambiguity] [Where have I seen you?] [2.382, 6.108, ...], [Name resolution] [Are you curious about Jay's name?] [8.730, 1.383, ...] ...
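A record shaped like [Example 3], together with the step of naming an input vector after the nearest existing dialogue intention, might be sketched as below. `Turn`, `nearest_intention`, and `label_scenario` are hypothetical names, Euclidean distance is an assumed metric, and the two-dimensional intention vectors are illustrative values rather than real word2vec output.

```python
import math
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Turn:
    semantic_word: str   # dialogue intention, e.g. "name ambiguity"
    sentence: str        # the conversation sentence
    vector: List[float]  # the sentence's semantic vector


def nearest_intention(vector: List[float], intentions: Dict[str, List[float]]) -> str:
    """Name of the existing intention whose vector is closest (Euclidean) to `vector`."""
    return min(intentions, key=lambda word: math.dist(intentions[word], vector))


def label_scenario(turns, intentions):
    """turns: (sentence, vector) pairs in order -> list of Turn records."""
    return [Turn(nearest_intention(vec, intentions), text, vec) for text, vec in turns]
```

A scenario is then simply the ordered list of `Turn` records, matching the [semantic word] [conversation sentence] [semantic vector] layout.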
[Semantic words] can be analyzed automatically by natural language processing methods that are rule-based, statistics-based, or machine-learning-based. In the machine-learning case, the dialogue intention (semantic word) is classified based on the trained model.
FIG. 10 shows an example of a dialogue scenario stored in the dialogue scenario database according to the present invention. Displayed this way, a dialogue scenario usually spans only two turns (question 1 - answer 1 - question 2 - answer 2), the number of scenarios that can be shown is limited, and there is no way to tell which scenarios are spatially close to the current one. Moreover, when a scenario is input or modified, its relationship to other scenarios in the semantic space cannot be seen at all.
A dialogue scenario is displayed as a continuous sequence of semantic words, as in the first scenario at the top of the figure, which begins with "name ambiguity". To avoid duplication, certain characters may also be combined into the displayed labels, as in "check C Name", "C yes", and "KE name of IU".
FIG. 11 is a screen in which a dialogue scenario stored in the dialogue scenario database according to the present invention is expressed by a three-dimensional scenario authoring tool; as shown in FIG. 11, scenarios can be displayed in three-dimensional space. Each question and answer in a scenario is represented by a semantic node (semantic cube or semantic word). Because semantically close nodes are displayed near each other and semantically distant nodes far apart, the semantic relationships between the nodes that make up a scenario are much easier to grasp.
FIG. 12 is an example of a scenario input screen in which a dialogue scenario stored in the dialogue scenario database according to the present invention is expressed by a three-dimensional scenario authoring tool. As shown in FIG. 12, a scenario is entered by dragging semantic words (semantic nodes or semantic cubes) from the right-hand panel into the empty space on the left, so that one scenario is represented by a sequence of semantic words (e.g., weekend schedule - schedule answer - hobby question - hobby answer). When the scenario input is completed, the scenario is automatically mapped into the three-dimensional dialogue network. Each semantic word in the right-hand panel has a multidimensional semantic vector value, which can be obtained either by separately learning the combined semantic word (weekend + schedule) or by taking the sum or product of the semantic vectors of existing semantic words.
A semantic word in the right-hand panel (for example, "weekend schedule") represents various sentences with various sentence structures, such as the following, all of which imply a "weekend schedule".
FIG. 13 is an example of a scenario modification screen in which a dialogue scenario stored in the dialogue scenario database according to the present invention is expressed by a three-dimensional scenario authoring tool. Scenarios can be found with a search function, and a function for modifying a found scenario is provided. When a scenario is modified, the change is immediately reflected in the 3D space.
FIG. 14 is an example of a scenario deletion screen in which a dialogue scenario stored in the dialogue scenario database according to the present invention is expressed by a three-dimensional scenario authoring tool. A scenario can be deleted, and a deleted scenario is completely removed from the three-dimensional space and can no longer be displayed.
FIG. 15 is a block diagram showing the sequence in which continuous conversation is performed using automatic dialogue scenario collection and the ontology dialogue network according to the present invention, and FIG. 16 shows the same process as a sequence diagram.
As described above, the ontology dialogue network is a multidimensional vector space of 300 to 600 dimensions. Because such vectors cannot be depicted physically, the space is compressed (using a dimensionality-reduction algorithm such as PCA) and displayed in three dimensions. Each point in the 3D space is expressed as a node, and each node can represent a word semantic vector, a sentence semantic vector, or a scenario semantic vector. The term 'ontology dialogue network' refers to the overall structure of these nodes and the scenarios formed by connections between them, while 'ontology dialogue network database' refers to the storage unit holding this data and its structural relationships; hereinafter the two terms are used interchangeably without significant distinction.
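The compression step can be illustrated with PCA computed through an SVD in NumPy. This is a minimal sketch assuming the semantic vectors are stacked as the rows of a NumPy array; `compress_to_3d` is a hypothetical helper, and the patent names PCA only as one possible algorithm.

```python
import numpy as np


def compress_to_3d(vectors: np.ndarray) -> np.ndarray:
    """Project (n, d) semantic vectors onto their top three principal components."""
    centered = vectors - vectors.mean(axis=0)
    # Rows of vt are the principal directions, ordered by decreasing singular value.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return centered @ vt[:3].T  # (n, 3) coordinates for display


rng = np.random.default_rng(0)
points = compress_to_3d(rng.normal(size=(50, 300)))  # 50 toy 300-dimensional vectors
```

The first output axis carries the most variance, so nearby points in the 3D view tend to be, but are not guaranteed to be, nearby in the original space.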
Also, as described above, a semantic word is a word representing a plurality of sentences with similar meanings, and it may be a single word or a combination of several words. For example, the semantic word "name ambiguity" is a new semantic vector that combines the semantic vectors of the semantic words "name" and "ambiguity", and it is the semantic word representing sentences such as " That is, a semantic word can be represented by a semantic vector in the ontology dialogue network space and corresponds to one node.
Scenarios can be displayed in the three-dimensional ontology dialogue network space, and each question and answer in a scenario is represented by a semantic node (semantic cube or semantic word). Because semantically close nodes are displayed near each other and semantically distant nodes far apart, the semantic relationships between the nodes that make up a scenario are much easier to grasp. Hereinafter, a semantic node is also referred to simply as a 'node'.
Hereinafter, a continuous conversation process using the ontology dialogue network will be described with reference to the sequence block diagram of FIG. 15 and the flowchart of FIG.
First, when the user speaks, the user's voice is input to the continuous conversation system 200 using the ontology dialogue network of the present invention (see FIG. 19) (S210). The continuous conversation system 200 analyzes the sentence of the input voice to understand it (S220). It then checks the ontology dialogue network database to determine whether a semantic word, that is, a word or combination of words representing the conversation sentence, is matched to that sentence (S230).
If no semantic word is matched (S240), a semantic word is determined for the conversation sentence, matched to it, and stored in the ontology dialogue network database (S250), as described in detail later with reference to FIG.
Whether no semantic word was matched to the conversation sentence and a new one has now been determined and matched, or a semantic word was already matched, the system then extracts, according to the dialogue scenarios in the ontology dialogue network database, the semantic word (hereinafter the 'second semantic word') that follows the semantic word matched to the conversation sentence (hereinafter the 'first semantic word') (S260), selects one of the conversation sentences included in the second semantic word (S270), and outputs the selected conversation sentence to the user (S280). For example, the first semantic word may represent the question sentence the user has just spoken, and the second semantic word may represent the answer sentence with which the continuous conversation system 200 using the ontology dialogue network has decided to reply.
According to the dialogue scenarios in the ontology dialogue network database, several semantic words may follow the first semantic word matched to the conversation sentence. In this case, the semantic word with the highest weight assigned to its scenario can be extracted as the second semantic word. For example, if three scenarios follow the first semantic word, namely first semantic word -> A (weight 0.7), first semantic word -> B, and first semantic word -> C (weight 0.1), semantic word A, which has the highest weight (0.7), can be chosen as the second semantic word. Such weights can be set in advance through conversation quality management, which diagnoses and analyzes conversations with users, as described later with reference to FIG.
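The highest-weight selection of the second semantic word (S260), followed by the choice of an output sentence (S270), can be sketched as below. The nested-dict layout of `transitions` and the weight 0.2 for B are assumptions for illustration (the text does not state B's weight), and because the text does not say how the output sentence is chosen, a random choice is used here.

```python
import random


def next_semantic_word(first_word, transitions):
    """S260: return the follow-up semantic word whose scenario weight is highest."""
    candidates = transitions[first_word]
    return max(candidates, key=candidates.get)


def choose_reply(first_word, transitions, sentences, rng=random):
    """S270: pick the second semantic word, then one of its sentences."""
    second = next_semantic_word(first_word, transitions)
    return second, rng.choice(sentences[second])


# Hypothetical data: A and C follow the text; B's weight (0.2) is an assumption.
transitions = {"first": {"A": 0.7, "B": 0.2, "C": 0.1}}
sentences = {"A": ["Sure, let's talk about that."], "B": ["Hmm."], "C": ["Okay."]}
```

Passing a seeded `random.Random` as `rng` makes the sentence choice reproducible when a semantic word holds several sentences.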
FIG. 17 is a flowchart illustrating how a new conversation sentence is mapped into the ontology dialogue network when it is input during continuous conversation using the ontology dialogue network according to the present invention.
When a new scenario is input, the semantic vector values of its question and answer sentences are extracted, the dialogue intention of each question and answer semantic vector is classified, and the input scenario is mapped into the dialogue network as a sequence of ("dialogue intention (semantic word)", sentence, "semantic vector") entries.
That is, when no semantic word is matched to the conversation sentence input by the user, i.e., when a new scenario is input (S240), the semantic vector of the conversation sentence is derived (S251), and the semantic word corresponding to the sentence is classified from that vector (S252). Semantic words can be classified by one or more of rule-based, statistics-based, and machine-learning-based methods. Here, the machine learning corresponds to the 'conversation scenario learning' block in FIG.
If a semantic word corresponding to the conversation sentence is classified (S252), it is determined to be the semantic word of the conversation sentence, matched to the sentence, and stored in the ontology dialogue network database (S253). At this point, a continuous list of the classified semantic word, conversation sentence, and semantic vector can be stored in the ontology dialogue network database.
If no semantic word corresponding to the conversation sentence is classified (S252), the semantic word mapped to the node closest to the node indicated by the semantic vector is identified from the ontology dialogue network database, determined to be the semantic word of the conversation sentence, matched to the sentence, and stored (S254). Here, too, a continuous list of the semantic word, conversation sentence, and semantic vector can be stored in the ontology dialogue network database.
In this case, the semantic word mapped to the node closest to the node indicated by the semantic vector may be a single semantic word, or may be a new semantic word indicated by a semantic vector generated by a combination of semantic words.
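Steps S251 to S254 can be sketched as a classifier call with a nearest-node fallback. The classifier is passed in as a hypothetical callable that returns `None` when no semantic word fits. Combined semantic words are modelled here as pairwise vector sums (one of the two combination options the text mentions), and Euclidean distance is assumed.

```python
import math
from itertools import combinations


def assign_semantic_word(sentence, vector, classifier, semantic_words, store):
    """S251-S254: classify the sentence vector, else fall back to the nearest node."""
    word = classifier(vector)  # hypothetical callable; None means "not classified"
    if word is None:
        candidates = dict(semantic_words)
        # Combined semantic words, modelled here as pairwise vector sums.
        for (a, va), (b, vb) in combinations(semantic_words.items(), 2):
            candidates[f"{a} + {b}"] = [x + y for x, y in zip(va, vb)]
        word = min(candidates, key=lambda w: math.dist(candidates[w], vector))
    # Keep the continuous (semantic word, sentence, vector) list in the store.
    store.append((word, sentence, vector))
    return word
```

Enumerating every pair grows quadratically with the vocabulary, so a real system would likely restrict the combinations, for example to the scenario category selected first.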
A scenario can be expressed as a series of question-answer pairs. Just as the question-answer pairs together form a single scenario, a scenario can be classified into specific dialogue categories (education, literature, common sense, everyday conversation, sports, movies, etc.), and scenario categories can likewise be represented by combinations of semantic words.
Scenario categories can be assigned using one or more of rule-based, statistics-based, and machine-learning-based methods. Therefore, before the dialogue intention of each sentence is classified, the scenario category can be selected first, and the dialogue intentions (semantic words) belonging to that category searched first, to reduce search time.
FIG. 18 is a block diagram showing the sequence for improving conversation quality in continuous conversation using the ontology dialogue network according to the present invention.
As the user interacts with the system, the system can immediately tell whether the user's response is positive or negative. User reactions such as positive, negative, neglect, and topic-change reactions can be identified by analyzing the user's conversation sentences, using various analysis methods such as rule-based, statistics-based, and machine-learning-based methods.
When the user response is positive, the current scenario is recognized as a preferred dialogue scenario and its weight is adjusted to raise the likelihood that it will be adopted, so that preferred scenarios keep being selected while non-preferred scenarios lose priority. In this way the dialogue scenarios are managed continuously through real-time user responses, with the ultimate aim of improving conversation quality as continuous conversation proceeds.
This process is described with reference to FIG. 18. Continuous conversation quality management is performed using the dialogue scenario history between the user and the continuous conversation system 200 using the ontology dialogue network of the present invention. That is, the sentence input by the user is analyzed, and the user's reaction is analyzed from the analyzed conversation sentence. If the reaction is positive, the likelihood of selecting the dialogue scenario that leads to that conversation sentence, or to its semantic word, is set higher; if the reaction is negative, that likelihood is set lower.
'Positive' and 'negative' are used here in a very broad sense. As in the example of FIG. 18, reactions can be divided into finer types such as praise, positive, negative, neglect, and topic-change reactions, but all of them can be described as positive or negative reactions. More precisely, rather than sorting reactions into two kinds, each reaction is assigned a degree. In other words, the more responsive and favorable the user's reaction revealed by sentence analysis, the more likely, through feedback, the dialogue scenario that produced it is to be selected in the future; the more unfavorable the reaction, the less likely that scenario is to be selected.
The likelihood that a dialogue scenario will be selected in the future can be set by assigning it a weight, according to predetermined criteria, that reflects whether the scenario to which the user responded is preferred or non-preferred. A dialogue scenario with a higher weight is then more likely to be selected in the future.
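One way to realize this graded, weight-based feedback is a small additive update clamped to [0, 1]. The reaction scores, learning rate, default weight, and bounds below are all hypothetical values chosen for illustration; the patent leaves the "predetermined criteria" unspecified.

```python
# Hypothetical degrees per reaction type (favorable > 0, unfavorable < 0).
REACTION_SCORES = {
    "praise": 1.0,
    "positive": 0.5,
    "neglect": -0.3,
    "topic_change": -0.3,
    "negative": -0.5,
}


def update_weight(weights, transition, reaction, learning_rate=0.1,
                  default=0.5, floor=0.0, ceiling=1.0):
    """Nudge the weight of a (first, second) semantic-word transition by the reaction."""
    current = weights.get(transition, default)
    updated = current + learning_rate * REACTION_SCORES[reaction]
    weights[transition] = min(ceiling, max(floor, updated))  # keep within bounds
    return weights[transition]
```

Because the highest-weight transition is the one selected next, repeated favorable reactions gradually make a scenario the preferred continuation.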
The user reaction may be analyzed by applying rules to the user's conversation sentence (rule-based), by using sentence statistics (statistics-based), or by machine learning (machine-learning-based).
Alternatively, the user reaction may be analyzed through user preferences that are mapped to the semantic words of the user's conversation sentences in the ontology dialogue network database. That is, a semantic word space and a user preference space may be provided separately in the ontology dialogue network database, and each semantic word node in the semantic word space may be mapped to the user preference at the same node in the user preference space. Because the user preferences obtained from reaction analysis are mapped to the user's semantic words, the weights for the conversation can be set from them.
FIG. 19 is a diagram illustrating the configuration of a continuous conversation system using the ontology dialogue network according to the present invention.
The control unit 201 controls each component of the continuous conversation system so that it performs the series of processes involved in continuous conversation using the ontology dialogue network.
The voice input unit 202 receives the voice of the user.
The sentence analysis unit 203 analyzes sentences from the input speech.
The dialogue management unit 204 extracts, according to the dialogue scenarios in the ontology dialogue network database, the semantic word (hereinafter the 'second semantic word') that follows the semantic word matched to the analyzed conversation sentence (hereinafter the 'first semantic word'), that is, the word or combination of words representing that sentence, and selects one of the conversation sentences included in the second semantic word as the output conversation sentence. When several semantic words follow the first semantic word according to the dialogue scenarios in the ontology dialogue network database, it extracts the semantic word with the highest weight assigned to its scenario as the second semantic word.
If no semantic word is matched to the analyzed conversation sentence, the new scenario processing unit 206 determines a new semantic word for the sentence, matches it to the sentence, stores it in the ontology dialogue network database, and passes it to the dialogue management unit.
Specifically, if no semantic word is matched to the conversation sentence, the new scenario processing unit derives the semantic vector of the sentence and classifies the semantic word corresponding to the sentence from that vector. If a semantic word is classified, it is determined to be the semantic word of the sentence, matched to the sentence, and stored in the ontology dialogue network database. If not, the semantic word mapped to the node closest to the node indicated by the semantic vector is identified from the ontology dialogue network database, determined to be the semantic word of the sentence, matched to the sentence, and stored in the database. In either case, the determined semantic word is passed to the dialogue management unit.
As described above with reference to FIG. 17, the semantic word mapped to the node closest to the node indicated by the semantic vector may be a single semantic word or a semantic word indicated by a semantic vector generated by a combination of semantic words.
The dialogue quality management unit 205 performs conversation quality management by analyzing the user's reaction from the analyzed conversation sentence: if the reaction is positive, it raises the likelihood that the dialogue scenario leading to that sentence, or to its semantic word, is selected, and if the reaction is negative, it lowers that likelihood. As described above with reference to FIG. 18, this likelihood may be set by weighting according to predetermined criteria. The user reaction may be analyzed by applying rules to the conversation sentence (rule-based), by statistics of conversation sentences (statistics-based), or by machine learning (machine-learning-based). It may also be analyzed through user preferences mapped to the semantic words of the user's conversation sentences in the ontology dialogue network database, as described in detail with reference to FIG. 18.
The dialogue output unit 207 outputs the dialogue sentence selected by the dialogue management unit to the user through voice or the like.
FIG. 20 is a diagram showing an embodiment of the ontology dialogue network structure according to the present invention.
Referring to the embodiment of FIG. 20, conversations in the ontology dialogue network can be classified into a general conversation domain, a professional conversation domain, or other domains newly set as needed. That is, the many nodes, such as semantic words, mapped in the ontology dialogue network space may already be assigned to areas such as general conversation and professional conversation. When the semantic vector of each sentence is obtained, the node it indicates, according to the sentence's meaning, can be made to fall within the area of the corresponding category in the ontology dialogue network space. The same method can, of course, be used for finer subcategories under general or professional conversation.
General conversation (daily conversation, common-sense conversation, topic conversation, etc.) develops on various topics based on the dialogue scenarios in the dialogue network.
Professional conversation (advance purchase, reservation, purchase, etc.) acquires the specific information (arguments) required for a particular purpose and conducts dialogue to carry out the task. For example, for a KTX reservation, the system asks the user for the information needed for the reservation, such as the reservation time, the number of companions, and the seat type, and completes the reservation once all of this information has been obtained. It is therefore necessary to follow a dialogue procedure (flow) for the specific purpose.
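The task-oriented flow can be sketched as a simple slot-filling loop: ask for the first missing piece of information, and confirm once everything is present. The slot names and prompts below are hypothetical, modelled on the KTX example.

```python
# Hypothetical slots and prompts for a KTX reservation task.
REQUIRED_SLOTS = {
    "reservation_time": "What time would you like to reserve?",
    "companions": "How many people are travelling with you?",
    "seat_type": "Which type of seat would you like?",
}


def next_action(filled):
    """Ask for the first missing slot, or confirm once every slot is filled."""
    for slot, prompt in REQUIRED_SLOTS.items():
        if slot not in filled:
            return ("ask", prompt)
    return ("confirm", dict(filled))
```

The fixed slot order plays the role of the dialogue procedure (flow); a general conversation, by contrast, has no required slots and follows the scenario weights instead.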
In this way, general conversation and professional conversation can be freely performed in the ontology conversation network.
FIGS. 21 to 24 show examples of conversations classified in the ontology dialogue network as described above.
FIG. 21 is a diagram illustrating an example of a general conversation classification structure in the ontology dialogue network according to the present invention.
General conversation can include daily conversation, common-sense conversation, topic conversation, and emotional conversation, and it proceeds by continually shifting the conversation focus across various topics.
FIG. 22 is a diagram illustrating an embodiment of a professional conversation classification structure in the ontology dialogue network according to the present invention.
Professional conversation refers to conducting a specific purpose-based conversation, such as booking, reserving, or purchasing.
FIG. 23 is a diagram illustrating an example of the classification structure of conversations exchanged with an agent at a hospital call center, within professional conversation in the ontology dialogue network according to the present invention.
FIG. 24 is a view showing an embodiment of a classification structure in which daily conversation, emotional conversation, and professional conversation are connected in the ontology dialogue network according to the present invention.
Continuous conversation can proceed while moving freely among everyday, emotional, and professional conversation.
While the present invention has been particularly shown and described with reference to exemplary embodiments thereof, it is to be understood that the invention is not limited to the disclosed exemplary embodiments. It will be understood that various modifications and changes may be made without departing from the scope of the appended claims.

Claims

A method for constructing a dialog scenario database for a dialog system applied to an ontology dialog network, the method comprising the steps of:
(a) extracting sentences from a conversation-type voice file or a conversation-type post;
(b) classifying the sentences extracted in step (a) into questions and answers;
(c) selecting, on the basis of emotion and intention and of affirmation and negation, the question and answer sentences classified in step (b) that correspond to and connect with each other, and generating the selected question and answer sentences as a list-type scenario;
(d) converting the question and answer sentences of the list-type scenario generated in step (c) into semantic vectors and learning them; and
(e) storing the scenario learned in step (d) in a database as continuous scenario semantic vectors.
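For orientation only, steps (a) to (e) of claim 1 can be approximated with standard-library Python. This is a minimal sketch under stated assumptions: a trailing question mark marks a question, small hand-written word lists stand in for emotion, affirmation, and negation analysis, word overlap stands in for intention matching, a hashed bag-of-words vector replaces the learned semantic vector of step (d), and an in-memory SQLite table plays the role of the scenario database. All function names are hypothetical.

```python
import math
import re
import sqlite3
import zlib

AFFIRM = {"yes", "sure", "ok", "okay"}                 # assumed affirmation cues
NEGATE = {"no", "not", "never"}                        # assumed negation cues
EMOTION = {"happy", "sad", "tired", "great", "sorry"}  # assumed emotion cues
DIM = 64                                               # assumed vector size


def tokens(sentence):
    return re.findall(r"\w+", sentence.lower())


def extract_sentences(text):
    """Step (a): split a transcript or post into sentences."""
    return [s.strip() for s in re.split(r"(?<=[.?!])\s+", text) if s.strip()]


def classify(sentence):
    """Step (b): naive rule, a trailing '?' marks a question."""
    return "Q" if sentence.rstrip().endswith("?") else "A"


def linked(question, answer):
    """Step (c) test: answer carries a polarity/emotion cue or shares a word."""
    a, q = set(tokens(answer)), set(tokens(question))
    return bool((a & (AFFIRM | NEGATE | EMOTION)) or (a & q))


def build_scenario(sentences):
    """Step (c): keep adjacent, linked Q->A pairs as a list-type scenario."""
    return [(prev, cur) for prev, cur in zip(sentences, sentences[1:])
            if classify(prev) == "Q" and classify(cur) == "A" and linked(prev, cur)]


def embed(sentence):
    """Step (d) placeholder: unit-length hashed bag-of-words vector."""
    vec = [0.0] * DIM
    for tok in tokens(sentence):
        vec[zlib.crc32(tok.encode("utf-8")) % DIM] += 1.0
    norm = math.sqrt(sum(x * x for x in vec)) or 1.0
    return [x / norm for x in vec]


def store(db, scenario_id, scenario):
    """Step (e): store the scenario as consecutive vectors in turn order."""
    db.execute("CREATE TABLE IF NOT EXISTS turns "
               "(scenario INTEGER, idx INTEGER, role TEXT, text TEXT, vec TEXT)")
    turns = [(role, s) for q, a in scenario for role, s in (("Q", q), ("A", a))]
    for idx, (role, s) in enumerate(turns):
        vec = ",".join(f"{x:.4f}" for x in embed(s))
        db.execute("INSERT INTO turns VALUES (?, ?, ?, ?, ?)",
                   (scenario_id, idx, role, s, vec))
    db.commit()


text = "Did you sleep well? Yes, I slept great. What's for lunch? The weather is nice."
scenario = build_scenario(extract_sentences(text))
db = sqlite3.connect(":memory:")
store(db, 1, scenario)
print(scenario)
# [('Did you sleep well?', 'Yes, I slept great.')]
```

On the sample text only the first question/answer pair survives step (c): the second answer shares no words and carries no polarity or emotion cue relative to its question, so it is treated as unconnected.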

The method according to claim 1, further comprising, after step (e), displaying the continuous semantic vectors stored in the database in the ontology multidimensional space as one of a semantic word, a semantic node, and a semantic cube.

The method according to claim 1, wherein, in step (a), the sentences are extracted from the conversation-type voice file after the voice file is converted into text through speech recognition.

The method according to claim 1, wherein the extraction from the conversation-type post in step (a) comprises designating a specific word as a search term, and parsing and extracting the data, posts, and comments retrieved with the designated search term.
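The search-term extraction of claim 4 can be sketched as a filter over posts that have already been retrieved. The JSON layout in `RAW` (a `body` field plus a `comments` list) is a hypothetical export format, not the format of any real SNS API, and `collect` is an illustrative name.

```python
import json

# Hypothetical export of retrieved posts; real SNS APIs use other formats.
RAW = """[
  {"id": 1, "body": "Anyone tried the new cafe downtown?",
   "comments": ["Yes, the coffee is great.", "Not yet, is it expensive?"]},
  {"id": 2, "body": "Traffic is terrible today.", "comments": ["Same here."]}
]"""


def collect(raw_json, search_term):
    """Parse posts and keep the threads (post body + comments) that mention the term."""
    term = search_term.lower()
    threads = []
    for post in json.loads(raw_json):
        texts = [post["body"], *post.get("comments", [])]
        if any(term in t.lower() for t in texts):
            threads.append(texts)
    return threads


print(collect(RAW, "cafe"))
# [['Anyone tried the new cafe downtown?', 'Yes, the coffee is great.', 'Not yet, is it expensive?']]
```

Keeping each post together with its comments preserves the turn order, so the resulting thread can be passed directly to the sentence extraction and question/answer classification of step (a) and step (b).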

The method according to claim 1, wherein the sentences extracted in step (a) are automatically converted into semantic vectors.
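Claim 5 leaves the vectorization method open. Purely for illustration, the sketch below assumes a hashed bag-of-words vector normalized to unit length; this is a lexical placeholder, and a deployed system would more likely use a learned sentence embedding. Because non-empty vectors have unit length, their dot product is the cosine similarity, which gives one possible way to compare an extracted sentence with stored ones. `DIM`, `to_vector`, and `cosine` are hypothetical names.

```python
import math
import re
import zlib

DIM = 256  # assumed vector size


def to_vector(sentence):
    """Hashed bag-of-words vector with unit L2 norm (lexical placeholder)."""
    vec = [0.0] * DIM
    for tok in re.findall(r"\w+", sentence.lower()):
        vec[zlib.crc32(tok.encode("utf-8")) % DIM] += 1.0
    norm = math.sqrt(sum(x * x for x in vec)) or 1.0
    return [x / norm for x in vec]


def cosine(u, v):
    # Non-empty inputs are unit length, so the dot product equals the cosine.
    return sum(a * b for a, b in zip(u, v))


query = to_vector("Did you sleep well?")
print(cosine(query, to_vector("Did you sleep okay?")))
print(cosine(query, to_vector("The price is too high.")))
```

The `\w+` tokenizer also matches Hangul characters, so the same sketch runs on Korean sentences, although whitespace-level tokens are a crude unit for an agglutinative language.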

An apparatus for constructing a dialog scenario database for a dialog system, comprising:
a sentence extracting unit for extracting sentences;
a sentence analyzing unit for classifying the sentences extracted by the sentence extracting unit into questions and answers;
a dialog scenario generating unit for generating a scenario in which the sentences classified by the sentence analyzing unit correspond to each other and form a connected dialog;
a dialog scenario learning unit for learning the sentences of the dialog scenario generated by the dialog scenario generating unit;
a dialog scenario database for storing, as semantic vectors, the dialog scenarios learned by the dialog scenario learning unit; and
an ontology relationship mapping unit for mapping the continuous semantic vectors of the dialog scenarios stored in the dialog scenario database in the ontology multidimensional space as one of a semantic word, a semantic node, and a semantic cube.