KR20200047272A

KR20200047272A - Indexing system and method using variational recurrent autoencoding

Info

Publication number: KR20200047272A
Application number: KR1020190032912A
Authority: KR
Inventors: 박지은 (Park Ji-eun); 이세리 (Lee Se-ri)
Original assignee: 펄스나인 주식회사 (Pulse Nine Co., Ltd.)
Priority date: 2018-10-25
Filing date: 2019-03-22
Publication date: 2020-05-07
Also published as: KR102156249B1

Abstract

An objective of the present invention is to provide an automatic indexing system and method using variational recurrent autoencoding that make it easy to find an answer in a database. According to embodiments of the present invention, the indexing system using variational recurrent autoencoding comprises: a first variational recurrent autoencoder that learns latent variable values through a training process of encoding and decoding a preset keyword set and, based on this learning, calculates latent variable values by encoding the keyword set corresponding to a user's text or voice input; a second variational recurrent autoencoder that learns latent variable values through a training process of encoding and decoding a preset image and, based on this learning, calculates latent variable values by encoding a user's image input; and an indexing unit that indexes the user's input based on a result of comparing the latent variable values learned by the first and second variational recurrent autoencoders with the latent variable values calculated by the first or second variational recurrent autoencoder. The latent variable values learned by the first and second variational recurrent autoencoders are shared.

Description

INDEXING SYSTEM AND METHOD USING VARIATIONAL RECURRENT AUTOENCODING

The present invention relates to an automatic indexing system and method using variational recurrent autoencoding.

Intelligent information retrieval systems and chatbot services that provide a variety of services and information have recently become more common. Existing intelligent retrieval engines use a database in which answers to questions are stored, and they find the stored question most similar to the input question, together with its answer, by computing the relationship between term frequency (TF) and inverse document frequency (IDF).
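As a point of reference for the conventional TF-IDF approach described above, the sketch below ranks stored questions against a query by TF-IDF cosine similarity. It is a minimal illustration, not the engine of any cited system: the queries are assumed to be pre-tokenized, the smoothed IDF formula is one common variant chosen here, and the function names and sample tokens are hypothetical.

```python
import math
from collections import Counter

def idf(term, df, n_docs):
    # Smoothed IDF: a term present in every stored question still gets a small positive weight.
    return math.log((1 + n_docs) / (1 + df.get(term, 0))) + 1.0

def vectorize(tokens, df, n_docs):
    # Term frequency (normalized by length) multiplied by inverse document frequency.
    tf = Counter(tokens)
    return {t: (c / len(tokens)) * idf(t, df, n_docs) for t, c in tf.items()}

def cosine(a, b):
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def most_similar(query, questions):
    """Return (index, score) of the stored question closest to the query under TF-IDF cosine."""
    n_docs = len(questions)
    df = Counter(t for q in questions for t in set(q))  # document frequency per term
    q_vec = vectorize(query, df, n_docs)
    scores = [cosine(q_vec, vectorize(q, df, n_docs)) for q in questions]
    best = max(range(n_docs), key=scores.__getitem__)
    return best, scores[best]

questions = [["corporate_tax", "filing", "deadline"],
             ["income_tax", "refund", "method"],
             ["vat", "filing", "deadline"]]
print(most_similar(["corporate_tax", "deadline"], questions)[0])  # 0
```

The answer stored alongside the best-scoring question would then be returned to the user.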

In this regard, Korean Patent Publication No. 2018-0042763 provides a chat-type financial robot including: a communication module that receives a user query in the form of a complete or incomplete sentence input through an access channel and transmits a response to the user query; a natural language processing engine that performs syntactic and morphological analysis on user queries received through the communication module; a database including a language dictionary database, a financial knowledge database, and a conversation scenario database for providing financial services; and a dialogue processing engine that interprets the lexical and contextual meaning of the user query based on the syntax and morphemes analyzed by the natural language processing engine and generates a response to the user query based on the financial service scenario stored in the conversation scenario database.

However, that invention concerns finding the answer (response) directly, not indexing the question in order to find the answer.

The present invention seeks to provide an automatic indexing system and method using variational recurrent autoencoding that, when the input question is natural language or an image, indexes (classifies) it using the latent variable values of a variational recurrent encoder, making it easy to find the answer in a database.

An indexing system using variational recurrent autoencoding according to an embodiment of the present invention includes: a first variational recurrent autoencoder that learns latent variable values through a training process of encoding and decoding a preset keyword set and, based on this, calculates latent variable values by encoding the keyword set corresponding to a user's voice or text input; a second variational recurrent autoencoder that learns latent variable values through a training process of encoding and decoding a preset image and, based on this, calculates latent variable values by encoding a user's image input; and an indexing unit that indexes the user's input based on a result of comparing the latent variable values learned by the first and second variational recurrent autoencoders with the latent variable values calculated by the first or second variational recurrent autoencoder. The latent variable values learned by the first and second variational recurrent autoencoders are shared.

The indexing system using variational recurrent autoencoding may further include a keyword generator that generates the keyword set corresponding to the user's text or voice input by analyzing the morphemes of that input.

The first variational recurrent autoencoder may include: a first encoding unit that encodes the preset keyword set or the keyword set corresponding to the user's voice or text input; and a first decoding unit that decodes the latent variable values generated when the first encoding unit encodes the preset keyword set.

The first encoding unit may include as many LSTM (Long Short-Term Memory) cells as there are keywords in the input keyword set; the keywords are fed to the LSTM cells one each, in input order, and the LSTM cells may be connected in a chain.

The second variational recurrent autoencoder may include: a second encoding unit that encodes the preset image or the user's image input; and a second decoding unit that decodes the latent variable values generated when the second encoding unit encodes the preset image. The second encoding unit and the second decoding unit may include a CNN (Convolutional Neural Network).

The indexing system may further include an update unit that updates the index of the indexing unit using the latent variable values calculated by the first or second variational recurrent autoencoder.

The preset keyword set may relate to taxation, and the preset image may be an image from which the type of a tax-related document can be distinguished.

An indexing method using variational recurrent autoencoding according to an embodiment of the present invention includes: learning latent variable values through a training process in which a first variational recurrent autoencoder encodes and decodes a preset keyword set, and learning latent variable values shared with those learned by the first variational recurrent autoencoder through a training process in which a second variational recurrent autoencoder encodes and decodes a preset image; calculating the latent variable values corresponding to a user's input using the first or second variational recurrent autoencoder; and indexing the user's input based on a result of comparing the learned latent variable values with the calculated latent variable values.

The indexing method may further include generating a keyword set corresponding to the user's text or voice input by analyzing the morphemes of that input; in the calculating step, when the user's input is text or voice, the latent variable values corresponding to the generated keyword set may be calculated using the first variational recurrent autoencoder.

The indexing method may further include updating the index used in the indexing step, using the latent variable values calculated by the first or second variational recurrent autoencoder.

An indexing system using variational recurrent autoencoding according to an embodiment of the present invention includes: a keyword generator that generates a keyword set corresponding to a user's natural-language input by analyzing the morphemes of that input; a variational recurrent autoencoder that learns latent variable values through a training process of encoding and decoding a preset keyword set and, based on this, calculates latent variable values by encoding the keyword set generated by the keyword generator; an indexing unit that indexes the user's input based on a result of comparing the latent variable values learned by the variational recurrent autoencoder with the latent variable values it calculates; and an update unit that updates the index of the indexing unit using the latent variable values calculated by the variational recurrent autoencoder.

According to embodiments of the present invention, input questions are indexed using variational recurrent autoencoding, which enables highly accurate indexing.

FIG. 1 is a diagram showing the configuration of an indexing system using variational recurrent autoencoding according to an embodiment of the present invention.
FIG. 2 is a diagram showing the operation of the morpheme analyzer of FIG. 1.
FIG. 3 is a diagram for explaining the operation of the variational recurrent autoencoder of FIG. 1.
FIG. 4 is a diagram showing an example of the structure of the LSTM of FIG. 3.
FIG. 5 is a diagram showing the configuration of an indexing system using variational recurrent autoencoding according to another embodiment of the present invention.
FIG. 6 is a diagram for explaining the operation of some components of FIG. 5.
FIG. 7 is a diagram showing the configuration of a specialized-field response service system using variational recurrent autoencoding according to an embodiment of the present invention.
FIG. 8 is a diagram showing a specialized-field response service method using variational recurrent autoencoding with a chatbot according to an embodiment of the present invention.
FIG. 9 is a diagram showing experimental results of the specialized-field response service system and method using variational recurrent autoencoding according to an embodiment of the present invention.

On the principle that an inventor may appropriately define the concepts of terms in order to describe his or her invention in the best way, the terms and words used in this specification and the claims should be interpreted with meanings and concepts consistent with the technical spirit of the present invention.

Throughout the specification, when a part is said to "include" a component, this means that it may further include other components rather than excluding them, unless specifically stated otherwise. In addition, when a component is said to be "connected" to another component, or to "transmit", "send", "receive", or "forward" to it, this covers not only direct connection, transmission, sending, receiving, or forwarding, but also indirect connection, transmission, sending, receiving, or forwarding via another component. Furthermore, terms such as "…unit", "…er/…or", "module", and "device" used in the specification denote a unit that processes at least one function or operation, which may be implemented in hardware, software, or a combination of hardware and software.

Hereinafter, embodiments of the present invention will be described in detail with reference to the accompanying drawings.

First, Embodiment 1 of the present invention will be described with reference to FIGS. 1 to 4.

FIG. 1 is a diagram showing the configuration of an indexing system 1 using variational recurrent autoencoding according to an embodiment of the present invention.

Referring to FIG. 1, the indexing system 1 receives natural language and generates an index value used to search a database for the corresponding response. The indexing system 1 includes a morpheme analyzer 100, a variational recurrent autoencoder 200, an indexing unit 300, and an update unit 400.

The morpheme analyzer 100 analyzes the morphemes of the natural-language input and generates its keyword set. The input to the morpheme analyzer 100 may be a complete or incomplete sentence of one or more words, for example in Korean. The keyword set may consist of common nouns, proper nouns, verbs, and the like, excluding particles (josa), which carry no special meaning in the input. The keyword set includes at least one keyword.

The variational recurrent autoencoder 200 processes the keyword set by variational recurrent autoencoding to generate a latent variable.

The indexing unit 300 indexes the keyword set according to the value of the latent variable generated by the variational recurrent autoencoder 200.

The update unit 400 updates the index using the value of the latent variable generated by the variational recurrent autoencoder 200.
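The patent does not specify how the indexing unit 300 compares latent values or how the update unit 400 modifies the index. The sketch below assumes one stored latent centroid per answer category, nearest-centroid lookup by Euclidean distance for indexing, and a running-mean update; the class `LatentIndex` and the category names are hypothetical.

```python
import math

class LatentIndex:
    """Hypothetical index: one latent centroid per answer category (an assumption, not the patent's design)."""

    def __init__(self):
        self.centroids = {}  # category -> mean latent vector
        self.counts = {}     # category -> number of latent vectors averaged into it

    def lookup(self, z):
        """Indexing step: return the category whose centroid is nearest to z (Euclidean distance)."""
        if not self.centroids:
            raise ValueError("index is empty")
        return min(self.centroids, key=lambda c: math.dist(self.centroids[c], z))

    def update(self, category, z):
        """Update step: fold a newly calculated latent vector into the category's running mean."""
        if category not in self.centroids:
            self.centroids[category] = list(z)
            self.counts[category] = 1
            return
        n = self.counts[category] + 1
        self.centroids[category] = [m + (x - m) / n for m, x in zip(self.centroids[category], z)]
        self.counts[category] = n

index = LatentIndex()
index.update("corporate_tax", [0.9, 0.1])
index.update("income_tax_refund", [-0.8, 0.2])
print(index.lookup([0.7, 0.0]))  # corporate_tax
```

A running mean keeps each update O(latent dimension) without storing past vectors; a real system might instead keep every stored question's latent vector and search them directly.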

FIG. 2 is a diagram showing the operation of the morpheme analyzer 100 of FIG. 1. The morpheme analyzer 100 generates a keyword set by tokenizing the natural-language input.

Referring to FIG. 2, when the natural-language input "올해는 법인세를" (roughly "this year, the corporate tax ...") is entered, the keyword set {올해, 법인세} ({this year, corporate tax}) is generated. In this embodiment, the keyword set {올해, 법인세} consists of two keywords: "올해" (this year) and "법인세" (corporate tax). For example, the morpheme analyzer 100 may use 은전한닢 (Eunjeon Hannip), the morphological analyzer used in Lucene-based search engines.
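The tokenizing step can be approximated, very roughly, by stripping a trailing particle from each whitespace-separated word. This is only a toy stand-in for a real morphological analyzer such as 은전한닢: the particle list is a hypothetical subset, and the rule would wrongly trim nouns that happen to end in a particle-like syllable (for example "부가" would become "부").

```python
# Hypothetical subset of Korean particles (josa), tried longest first so "에서" wins over "에".
PARTICLES = sorted(["은", "는", "이", "가", "을", "를", "의", "에", "에서", "도", "로", "으로", "와", "과"],
                   key=len, reverse=True)

def extract_keywords(sentence):
    """Toy keyword extractor: split on whitespace and strip at most one trailing particle per word."""
    keywords = []
    for word in sentence.split():
        for particle in PARTICLES:
            # Keep at least one syllable so a short word is never erased entirely.
            if word.endswith(particle) and len(word) > len(particle):
                word = word[: -len(particle)]
                break
        keywords.append(word)
    return keywords

print(extract_keywords("올해는 법인세를"))  # ['올해', '법인세']
```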

Next, the operation of the variational recurrent autoencoder 200 will be described with reference to FIG. 3.

The variational recurrent autoencoder is a form of unsupervised learning and is widely used for dimensionality reduction and as a generative model. Its core idea is to train the latent variable Z to follow a normal distribution (diagonal Gaussian) with mean μ and variance σ². Because the posterior p(z|x) is intractable, variational inference is used to approximate p(z|x) with q(z|x). Minimizing the Kullback-Leibler divergence between q(z|x) and p(z|x) leads to the following decomposition:

$$D_{KL}\big(q(z|x)\,\|\,p(z|x)\big) = \mathbb{E}_{q(z|x)}\big[\log q(z|x) - \log p(x|z) - \log p(z)\big] + \log p(x)$$

$$\log p(x) = \mathbb{E}_{q(z|x)}\big[\log p(x|z)\big] - D_{KL}\big(q(z|x)\,\|\,p(z)\big) + D_{KL}\big(q(z|x)\,\|\,p(z|x)\big)$$

In the above equation, $D_{KL}(q(z|x)\,\|\,p(z|x))$ is always greater than or equal to 0. Therefore $\mathbb{E}_{q(z|x)}[\log p(x|z)] - D_{KL}(q(z|x)\,\|\,p(z))$ is a lower bound on $\log p(x)$, the Evidence Lower Bound (ELBO); maximizing it pushes $\log p(x)$ up, and it serves as the objective function.
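In practice, the ELBO is usually optimized with the reparameterization trick and a closed-form KL term. The sketch assumes, as is standard but not stated in the patent, a standard normal prior $p(z) = \mathcal{N}(0, I)$ and an encoder that outputs μ and log σ² per latent dimension; the reconstruction log-likelihood is taken as a number supplied by the decoder, and all function names are hypothetical.

```python
import math
import random

def reparameterize(mu, logvar, rng=random):
    """Sample z = mu + sigma * eps with eps ~ N(0, 1), so z stays differentiable in mu and sigma."""
    return [m + math.exp(0.5 * lv) * rng.gauss(0.0, 1.0) for m, lv in zip(mu, logvar)]

def kl_to_standard_normal(mu, logvar):
    """Closed-form KL(q(z|x) || p(z)) for q = N(mu, diag(exp(logvar))) and p = N(0, I)."""
    return 0.5 * sum(math.exp(lv) + m * m - 1.0 - lv for m, lv in zip(mu, logvar))

def negative_elbo(recon_log_likelihood, mu, logvar):
    """Training loss: -(E_q[log p(x|z)] - KL(q(z|x) || p(z))); minimizing it maximizes the ELBO."""
    return -recon_log_likelihood + kl_to_standard_normal(mu, logvar)

print(kl_to_standard_normal([1.0], [0.0]))  # 0.5
```

The KL term is zero only when q(z|x) equals the prior, so it pulls every encoded keyword set toward a common, well-behaved latent region while the reconstruction term keeps the codes informative.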

FIG. 3 is a diagram for explaining the operation of the variational recurrent autoencoder 200 of FIG. 1.

Referring to FIG. 3, the variational recurrent autoencoder 200 may include encoding LSTMs (encoding unit; 210, 220), to which the one or more keywords constituting the keyword set are respectively input, and decoding LSTMs (decoding unit; 230, 240, 250), which decode the latent variable Z generated by the encoding LSTMs 210 and 220.

First, the encoding LSTMs 210 and 220 and the decoding LSTMs 230, 240, and 250 of the variational recurrent autoencoder 200 may perform training. Training proceeds by having the encoding LSTMs 210 and 220 encode each of a plurality of preset keyword sets to calculate a latent variable, and then decoding the calculated latent variable. In this way, appropriate latent variable values can be obtained for the plurality of preset keyword sets. Although FIG. 3 shows two encoding LSTMs 210 and 220 and three decoding LSTMs 230, 240, and 250, the numbers of encoding and decoding LSTMs may vary with the number of keywords in the input keyword set.

As shown in FIG. 3, the LSTMs may form a chain in which the output of the preceding LSTM 210 becomes an input of the following LSTM 220. Because each keyword is fed into the LSTM chain and encoded sequentially, the variational recurrent autoencoder 200 of this embodiment can take the relationships between keywords into account.

Next, when a user inputs natural language and the morpheme analyzer 100 generates a keyword set, the encoding LSTMs 210 and 220 calculate a latent variable for the keyword set corresponding to that input. The calculated latent variable may later serve as the basis for indexing.

As shown in FIG. 3, when the keyword set {올해, 법인세} is input, "올해" (this year) is input to LSTM 210, and the output of LSTM 210 together with "법인세" (corporate tax) is input to LSTM 220. LSTM 220 thereby generates a latent variable Z with mean μ and variance σ². Although FIG. 3 illustrates two encoding LSTMs 210 and 220 for two keywords, the number of encoding LSTMs may vary with the number of keywords in the keyword set.
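The chained encoding of FIG. 3 can be sketched as a fold over the keyword sequence: each step consumes the previous hidden state and the next keyword, and the final hidden state is mapped to μ and log σ². To keep the example short, a plain tanh recurrent cell stands in for the LSTM cell described with FIG. 4, and the keyword embeddings, weights, and linear heads are hypothetical assumptions rather than details from the patent.

```python
import math

def rnn_step(h, x, W_h, W_x, b):
    """Stand-in recurrent cell h' = tanh(W_h h + W_x x + b); an LSTM cell would replace this."""
    return [math.tanh(sum(w * hj for w, hj in zip(W_h[i], h)) +
                      sum(w * xj for w, xj in zip(W_x[i], x)) + b[i])
            for i in range(len(b))]

def linear(W, v, b):
    return [sum(w * vj for w, vj in zip(row, v)) + bi for row, bi in zip(W, b)]

def encode_keywords(embeddings, params):
    """Feed keyword embeddings in input order through a chain of cells; map final h to (mu, logvar)."""
    h = [0.0] * len(params["b"])
    for x in embeddings:  # one chained step per keyword, e.g. "올해" then "법인세"
        h = rnn_step(h, x, params["W_h"], params["W_x"], params["b"])
    mu = linear(params["W_mu"], h, params["b_mu"])      # hypothetical mean head
    logvar = linear(params["W_lv"], h, params["b_lv"])  # hypothetical log-variance head
    return mu, logvar

# Hypothetical toy parameters: hidden size 2, embedding size 2, latent size 1.
params = {"W_h": [[0.5, 0.0], [0.0, 0.5]], "W_x": [[1.0, 0.0], [0.0, 1.0]], "b": [0.0, 0.0],
          "W_mu": [[1.0, -1.0]], "b_mu": [0.0], "W_lv": [[0.0, 0.0]], "b_lv": [0.0]}
emb = {"올해": [1.0, 0.0], "법인세": [0.0, 1.0]}
mu_fwd, _ = encode_keywords([emb["올해"], emb["법인세"]], params)
mu_rev, _ = encode_keywords([emb["법인세"], emb["올해"]], params)
print(mu_fwd, mu_rev)  # opposite signs under these toy weights
```

Reversing the keyword order changes μ, which illustrates the order sensitivity that the chain structure provides.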

FIG. 4 is a diagram showing an example of the structure of the LSTM of FIG. 3.

Referring to FIG. 4, the keywords of the keyword set generated by the morpheme analyzer 100 are input sequentially as $x_{t-1}$ and $x_t$. In the embodiment of FIG. 4, "올해" (this year) may be input as $x_{t-1}$ and "법인세" (corporate tax) as $x_t$. $h_{t-1}$ and $h_t$ are encoded vectors, and the final output $h_t$ may correspond to the latent variable Z of FIG. 3.

LSTM 210 and LSTM 220 may have the same structure, so the following description focuses on LSTM 220.

Viewed as a whole, LSTM 220 transforms the cell state from $C_{t-1}$ to $C_t$. In FIG. 4, the rectangles (41, 42, 43, 47) denote neural network layers, and the circles (44, 45, 46, 48, 49) denote pointwise operations.

First, the sigmoid layer 41 takes $h_{t-1}$ and $x_t$ and outputs $f_t$, whose values lie between 0 and 1. $f_t$ is then multiplied by $C_{t-1}$, the previous cell state (see 44), to determine how much of the previous cell state is retained. $f_t$ can be expressed as:

$$f_t = \sigma\big(W_f \cdot [h_{t-1}, x_t] + b_f\big)$$

The sigmoid layer 42 and the tanh layer 43 take $h_{t-1}$ and $x_t$ and determine how much new information is written into the current cell state $C_t$ (see 46).

Like the sigmoid layer 41, the sigmoid layer 42 takes $h_{t-1}$ and $x_t$ and outputs $i_t$, whose values lie between 0 and 1:

$$i_t = \sigma\big(W_i \cdot [h_{t-1}, x_t] + b_i\big)$$

The tanh layer 43 takes $h_{t-1}$ and $x_t$ and produces a new candidate vector $\tilde{C}_t$. The output $\tilde{C}_t$ of the tanh layer 43 can be expressed as:

$$\tilde{C}_t = \tanh\big(W_C \cdot [h_{t-1}, x_t] + b_C\big)$$

Accordingly, the current cell state $C_t$ can be expressed as follows (see 44, 45, and 46):

$$C_t = f_t \odot C_{t-1} + i_t \odot \tilde{C}_t$$

Next, the sigmoid layer 47 takes $h_{t-1}$ and $x_t$ and outputs values between 0 and 1; its output $o_t$ can be expressed as:

$$o_t = \sigma\big(W_o \cdot [h_{t-1}, x_t] + b_o\big)$$

$h_t$ is output through the pointwise operations 48 and 49 and can be expressed as:

$$h_t = o_t \odot \tanh(C_t)$$
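The gate computations of FIG. 4 (sigmoid layers 41, 42, and 47, tanh layer 43, and the pointwise operations) can be written as a single step function in the standard LSTM formulation. The sketch stores each weight matrix over the concatenation $[h_{t-1}, x_t]$; the parameter names and dictionary layout are assumptions for illustration.

```python
import math

def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))

def affine(W, b, h_prev, x):
    """W · [h_{t-1}, x_t] + b, with the concatenation done explicitly."""
    hx = list(h_prev) + list(x)
    return [sum(w * v for w, v in zip(row, hx)) + bi for row, bi in zip(W, b)]

def lstm_step(h_prev, c_prev, x, p):
    """One LSTM step; p holds hypothetical weights W_f, W_i, W_c, W_o and biases b_f, b_i, b_c, b_o."""
    f = [sigmoid(v) for v in affine(p["W_f"], p["b_f"], h_prev, x)]          # forget gate (41)
    i = [sigmoid(v) for v in affine(p["W_i"], p["b_i"], h_prev, x)]          # input gate (42)
    c_tilde = [math.tanh(v) for v in affine(p["W_c"], p["b_c"], h_prev, x)]  # candidate (43)
    c = [fj * cj + ij * cc for fj, cj, ij, cc in zip(f, c_prev, i, c_tilde)]  # new cell state
    o = [sigmoid(v) for v in affine(p["W_o"], p["b_o"], h_prev, x)]          # output gate (47)
    h = [oj * math.tanh(cj) for oj, cj in zip(o, c)]                         # hidden output
    return h, c
```

With all weights and biases set to zero, every gate equals 0.5 and the candidate is 0, so a step simply halves the previous cell state; this makes a convenient sanity check.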

The LSTM having the structure of FIG. 4 has been described as an example, but the scope of the present invention is not limited to it. For example, some of the neural network layers 41, 42, 43, and 47 may be omitted, or the neural network layers 41, 42, 43, and 47 and the pointwise operations 44, 45, 46, 48, and 49 may be connected differently.

Next, Embodiment 2 of the present invention will be described with reference to FIGS. 5 and 6.

FIG. 5 is a diagram showing the configuration of an indexing system 1000 using variational recurrent autoencoding according to an embodiment of the present invention.

Referring to FIG. 5, the indexing system 1000 receives natural language in text or voice form, or an image, and outputs an index value for retrieving the corresponding response. The indexing system 1000 includes a morpheme analyzer 1100, a variational recurrent autoencoder 1200, an indexing unit 1300, and an update unit 1400.

Referring to FIG. 5, the morpheme analyzer 1100 receives the user's natural-language input in text or voice form and generates a keyword set. Its function is the same as that of the morpheme analyzer 100 described with reference to FIGS. 1 and 2, so a detailed description is omitted.

The variational recurrent autoencoder 1200 includes a first variational recurrent autoencoder VAE1 and a second variational recurrent autoencoder VAE2.

The first variational recurrent autoencoder VAE1 learns the value of the latent variable Z through a training process of encoding and decoding a preset keyword set and, based on this, calculates the value of Z by encoding the keyword set corresponding to the user's voice or text input.

The second variational recurrent autoencoder VAE2 learns the value of the latent variable Z through a training process of encoding and decoding a preset image and, based on this, calculates the value of Z by encoding the user's image input.

The indexing unit 1300 indexes the user's input based on a result of comparing the value of the latent variable Z learned by VAE1 or VAE2 with the value of Z calculated by VAE1 or VAE2.

The update unit 1400 updates the index using the value of the latent variable Z calculated by VAE1 or VAE2.

FIG. 6 is a diagram for explaining the operation of the first variational recurrent autoencoder VAE1, the second variational recurrent autoencoder VAE2, and the update unit 1400 of FIG. 5.

The first variational recurrent autoencoder VAE1 is the same as the variational recurrent autoencoder 200 except that its latent variable Z is shared with the second variational recurrent autoencoder VAE2. Specifically, VAE1 may include: a first encoding unit (1210, 1220) that encodes the preset keyword set or the keyword set corresponding to the user's voice or text input; and a first decoding unit (1230, 1240, 1250) that decodes the value of the latent variable Z generated when the first encoding unit (1210, 1220) encodes the preset keyword set.

The first encoding unit (1210, 1220) contains as many LSTMs (1210, 1220) as there are keywords in the input keyword set. The keywords are fed to the LSTMs (1210, 1220), one keyword per LSTM, in the order in which they were input. The LSTMs (1210, 1220) may be connected in a chain.
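The chained encoder can be illustrated in pure Python. This sketch rests on several assumptions that the patent does not state:
- The chain is one LSTM cell unrolled over the keywords with shared weights, the usual implementation of a chain of LSTMs.
- Keywords arrive as small embedding vectors.
- The final hidden state is mapped to a mean and a log-variance, and Z is drawn from them with the reparameterization trick.

All class names, sizes and weights are hypothetical and untrained.

```python
import math
import random


def matvec(m, v):
    """Multiply matrix m (a list of rows) by vector v."""
    return [sum(w * x for w, x in zip(row, v)) for row in m]


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def rand_matrix(rows, cols, rng):
    return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]


class LSTMCell:
    """One LSTM cell: input (i), forget (f), output (o) gates and candidate (g)."""

    def __init__(self, input_size, hidden_size, rng):
        self.W = {k: rand_matrix(hidden_size, input_size, rng) for k in "ifog"}
        self.U = {k: rand_matrix(hidden_size, hidden_size, rng) for k in "ifog"}
        self.b = {k: [0.0] * hidden_size for k in "ifog"}

    def step(self, x, h, c):
        pre = {
            k: [wx + uh + bias for wx, uh, bias in zip(matvec(self.W[k], x), matvec(self.U[k], h), self.b[k])]
            for k in "ifog"
        }
        i = [sigmoid(v) for v in pre["i"]]
        f = [sigmoid(v) for v in pre["f"]]
        o = [sigmoid(v) for v in pre["o"]]
        g = [math.tanh(v) for v in pre["g"]]
        c = [fv * cv + iv * gv for fv, cv, iv, gv in zip(f, c, i, g)]
        h = [ov * math.tanh(cv) for ov, cv in zip(o, c)]
        return h, c


class ChainedLSTMEncoder:
    """Feeds keyword vectors through the LSTM chain in input order, maps the
    last hidden state to (mu, logvar), and samples Z = mu + sigma * eps."""

    def __init__(self, input_size, hidden_size, latent_size, seed=0):
        self.rng = random.Random(seed)
        self.hidden_size = hidden_size
        self.cell = LSTMCell(input_size, hidden_size, self.rng)
        self.W_mu = rand_matrix(latent_size, hidden_size, self.rng)
        self.W_logvar = rand_matrix(latent_size, hidden_size, self.rng)

    def encode(self, keyword_vectors):
        h = [0.0] * self.hidden_size
        c = [0.0] * self.hidden_size
        for x in keyword_vectors:  # one chain link per keyword, in input order
            h, c = self.cell.step(x, h, c)
        mu = matvec(self.W_mu, h)
        logvar = matvec(self.W_logvar, h)
        # Reparameterization trick: sigma = exp(logvar / 2), eps ~ N(0, 1).
        z = [m + math.exp(0.5 * lv) * self.rng.gauss(0.0, 1.0) for m, lv in zip(mu, logvar)]
        return mu, logvar, z


# Hypothetical one-hot embeddings for the keywords "올해" and "법인세".
keywords = [[1.0, 0.0, 0.0, 0.0], [0.0, 1.0, 0.0, 0.0]]
encoder = ChainedLSTMEncoder(input_size=4, hidden_size=8, latent_size=3)
mu, logvar, z = encoder.encode(keywords)
print(len(z))  # 3
```

In a real system the weights would be trained jointly with a decoder. Here they are random, so the output demonstrates only the data flow and shapes.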

The second variational recurrent autoencoder VAE2 includes:
- a second encoding unit (1211, 1221) that encodes a preset image or the user's image input; and
- a second decoding unit (1231, 1241) that decodes the value of the latent variable Z generated when the second encoding unit (1211, 1221) encodes the preset image.

The second encoding unit (1211, 1221) and the second decoding unit (1231, 1241) may include a convolutional neural network (CNN).

The second variational recurrent autoencoder VAE2 receives an image input from the user. For example, tax-related documents may contain different images depending on their type. Documents concerning delayed corporate tax, income tax refunds, VAT and so on may differ in the images or colors of the forms they contain. VAE2 calculates the value of the latent variable Z from such an image.

As shown in FIG. 6, the second variational recurrent autoencoder VAE2 includes a convolutional neural network (CNN). A CNN is a structure in which the fully connected (FC) layers of a conventional deep neural network (DNN) are replaced with convolutional layers. FC layers are ill-suited to images for the following reason:
- an FC layer accepts only one-dimensional input;
- image data is three-dimensional (width, height and channels, with 3 channels for RGB and 1 for grayscale);
- flattening that data for an FC layer discards its spatial information.

A CNN typically consists of a convolutional layer, pooling, another convolutional layer, pooling and a fully connected layer, which makes it well suited to image processing. A convolutional layer produces a feature map by sliding a filter (kernel) over the input image and summing the element-wise products at each position.
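The feature-map computation described above can be illustrated in pure Python: the sum of element-wise products between a kernel and each image patch, followed by pooling. This is a single-channel sketch with stride 1, no padding and 2×2 max pooling. Those choices, and the example image and kernel, are assumptions. An RGB input would additionally sum over its three channels.

```python
def conv2d_valid(image, kernel):
    """Stride-1, no-padding convolution as used in CNN layers (strictly a
    cross-correlation): each output cell is the sum of element-wise products
    of the kernel and the image patch under it."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [
        [sum(image[r + i][c + j] * kernel[i][j] for i in range(kh) for j in range(kw))
         for c in range(out_w)]
        for r in range(out_h)
    ]


def max_pool2d(feature_map, size=2):
    """Non-overlapping max pooling; rows/columns that do not fill a window are dropped."""
    return [
        [max(feature_map[r + i][c + j] for i in range(size) for j in range(size))
         for c in range(0, len(feature_map[0]) - size + 1, size)]
        for r in range(0, len(feature_map) - size + 1, size)
    ]


# Hypothetical 5x5 grayscale patch with a vertical edge, e.g. part of a form border.
image = [[1, 1, 0, 0, 0] for _ in range(5)]
# Hypothetical vertical-edge kernel.
kernel = [[1, 0, -1], [1, 0, -1], [1, 0, -1]]

feature_map = conv2d_valid(image, kernel)
print(feature_map)              # [[3, 3, 0], [3, 3, 0], [3, 3, 0]]
print(max_pool2d(feature_map))  # [[3]]
```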

The values of the latent variable Z learned by the first variational recurrent autoencoder VAE1 and the second variational recurrent autoencoder VAE2 are shared. As a result, VAE1 and VAE2 learn and calculate the value of Z under each other's influence:
- VAE1 calculates the value of Z for a user's input based on the Z values learned and calculated by both VAE1 and VAE2.
- VAE2 likewise calculates the value of Z for a user's input based on the Z values learned and calculated by both VAE2 and VAE1.
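The patent states that VAE1 and VAE2 share the latent variable Z but gives no training objective. One plausible reading, offered only as an assumption, is a joint objective in which both encoders map into the same latent space. Each branch uses the standard VAE loss and is regularized toward a single standard-normal prior:

```latex
\mathcal{L} = \mathcal{L}_{1}(x_{\text{text}}) + \mathcal{L}_{2}(x_{\text{img}}), \qquad
\mathcal{L}_{k}(x) = -\,\mathbb{E}_{q_k(Z \mid x)}\!\left[\log p_k(x \mid Z)\right]
  + D_{\mathrm{KL}}\!\left(q_k(Z \mid x)\,\|\,\mathcal{N}(0, I)\right)

D_{\mathrm{KL}}\!\left(\mathcal{N}(\mu, \operatorname{diag}\sigma^2)\,\|\,\mathcal{N}(0, I)\right)
  = -\tfrac{1}{2}\sum_{j=1}^{d}\left(1 + \log\sigma_j^2 - \mu_j^2 - \sigma_j^2\right),
\qquad Z = \mu + \sigma \odot \varepsilon,\ \varepsilon \sim \mathcal{N}(0, I)
```

In these formulas:
- k = 1 denotes the LSTM-based VAE1 and k = 2 the CNN-based VAE2.
- d is the latent dimension (2 or 3 in the reported experiment).

A common prior alone does not force a keyword set and an image of the same category to land near each other. That would require an additional pairing term, which is likewise an assumption.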

The update unit 1400 updates the index of the indexing unit 1300 using the value of the latent variable Z calculated by the first variational recurrent autoencoder VAE1 or the second variational recurrent autoencoder VAE2.

For example, suppose that, as shown in FIG. 6, the user enters the text "올해는 법인세를" ("This year, corporate tax...") and the first variational recurrent autoencoder VAE1 outputs the latent variable value Z = [0.5, 2.0, 0.4]. The update unit 1400 then updates the indexing unit so that the latent value [0.5, 2.0, 0.4] corresponds to the index value of the corporate tax item.

Alternatively, suppose the user inputs the image shown in FIG. 6 and the second variational recurrent autoencoder VAE2 outputs the latent variable value Z = [1.5, 0.7, 0.9]. The update unit 1400 then updates the indexing unit so that the latent value [1.5, 0.7, 0.9] corresponds to the index value of the year-end tax settlement item.
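The patent does not specify how the update unit records a new correspondence. The minimal sketch below makes two assumptions: the index keeps a running-mean latent vector (centroid) per index label, and the correct label for each input is already known. Each new Z is folded into its label's centroid. `LatentIndex` is a hypothetical name. The first two vectors are the FIG. 6 example values, and the third is invented for illustration.

```python
class LatentIndex:
    """Hypothetical index: one running-mean latent centroid per index label."""

    def __init__(self):
        self.centroids = {}  # label -> mean latent vector
        self.counts = {}     # label -> number of vectors folded in

    def update(self, z, label):
        """Fold latent vector z into the centroid stored for label."""
        if label not in self.centroids:
            self.centroids[label] = list(z)
            self.counts[label] = 1
            return
        n = self.counts[label] + 1
        # Incremental mean: new = old + (z - old) / n
        self.centroids[label] = [c + (x - c) / n for c, x in zip(self.centroids[label], z)]
        self.counts[label] = n


index = LatentIndex()
index.update([0.5, 2.0, 0.4], "corporate_tax")        # text input via VAE1 (FIG. 6 example)
index.update([1.5, 0.7, 0.9], "year_end_settlement")  # image input via VAE2 (FIG. 6 example)
index.update([0.7, 1.8, 0.6], "corporate_tax")        # invented second text query
print(index.centroids["corporate_tax"])  # approximately [0.6, 1.9, 0.5]
```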

Next, a specialized-field answering service system 2 using variational recurrent autoencoding according to an embodiment of the present invention will be described with reference to FIG. 7.

Referring to FIG. 7, the specialized-field answering service system 2 includes an input/output unit 2100, an indexing system 2200, a search unit 2300 and a database 2400.

When a user enters a question in a chat window displayed on the screen of an input terminal such as a mobile phone or computer, the input/output unit 2100 transmits the question to the indexing system 2200. When an answer to the question is found in the database 2400, the input/output unit 2100 displays that answer in the chat window. If the user asks by voice, the input/output unit 2100 may also convert the speech into text.

Additionally, the input/output unit 2100 may block indiscriminate access to data by issuing an authentication key to each user. Specifically, the input/output unit 2100 may:
1. generate an authentication key for each user;
2. store the key in a user database (not shown); and
3. provide the generated key to the user.
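Per-user key issuance can be sketched with Python's standard `secrets`, `hashlib` and `hmac` modules. This sketch makes the following assumptions:
- An in-memory dict stands in for the user database, which the patent does not show.
- Only a SHA-256 hash of each key is stored, not the raw key. This is a design choice of the sketch, not something the patent specifies.
- `KeyIssuer` is a hypothetical name.

```python
import hashlib
import hmac
import secrets


class KeyIssuer:
    """Hypothetical per-user authentication key issuer."""

    def __init__(self):
        # Stand-in for the user database: user_id -> SHA-256 hex digest of the issued key.
        self._user_db = {}

    @staticmethod
    def _digest(key):
        return hashlib.sha256(key.encode("utf-8")).hexdigest()

    def issue(self, user_id):
        """Generate a random URL-safe key, store only its hash, and return the key."""
        key = secrets.token_urlsafe(32)
        self._user_db[user_id] = self._digest(key)
        return key

    def verify(self, user_id, presented_key):
        """Constant-time comparison of a presented key against the stored hash."""
        stored = self._user_db.get(user_id)
        if stored is None:
            return False
        return hmac.compare_digest(stored, self._digest(presented_key))


issuer = KeyIssuer()
key = issuer.issue("user-001")
print(issuer.verify("user-001", key))      # True
print(issuer.verify("user-001", "guess"))  # False
```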

The indexing system 2200 may be the indexing system 1 according to Embodiment 1 or the indexing system 1000 according to Embodiment 2 of the present invention. The indexing system 2200 calculates a latent variable value from the user's input and outputs the index value corresponding to that latent value.

The search unit 2300 searches the database 2400 for an answer using the index value received from the indexing system 2200. It then passes the retrieved answer to the input/output unit 2100 for output to the user.

The database 2400 stores a plurality of questions and their answers, with the questions classified by index value. The stored questions and answers may relate to a specific specialized field, for example taxation. However, the scope of the present invention is not limited thereto, and they may relate to other specialized fields such as law or finance.
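A database that classifies question/answer pairs by index value can be sketched as a mapping from index value to a list of pairs. A search unit then simply retrieves the answers filed under a given value. `QADatabase`, its methods and the placeholder entries are hypothetical.

```python
from collections import defaultdict


class QADatabase:
    """Hypothetical Q&A store that files each question/answer pair under an index value."""

    def __init__(self):
        self._by_index = defaultdict(list)

    def add(self, index_value, question, answer):
        self._by_index[index_value].append((question, answer))

    def search(self, index_value):
        """Search-unit lookup: return the answers filed under index_value, or [] if none."""
        # .get avoids creating empty entries for unknown index values.
        return [answer for _, answer in self._by_index.get(index_value, [])]


db = QADatabase()
db.add("corporate_tax", "Q1 (corporate tax)", "A1")
db.add("corporate_tax", "Q2 (corporate tax)", "A2")
db.add("vat", "Q3 (VAT)", "A3")
print(db.search("corporate_tax"))  # ['A1', 'A2']
print(db.search("law"))            # []
```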

FIG. 8 is a diagram illustrating a chatbot-based specialized-field answering service method using variational recurrent autoencoding according to an embodiment of the present invention.

Referring to FIG. 8, the specialized-field answering service method first generates the database 2400 for the specialized field (S100). As described above, the database 2400 stores answers to questions in a specific specialized field, and the questions may be indexed (classified).

Along with creating the database 2400, the indexing system 2200 learns latent variable values using a variational recurrent autoencoder:
- In Embodiment 1, latent variable values may be learned through a training process of encoding and decoding a preset keyword set with a variational recurrent autoencoder that includes an LSTM.
- In Embodiment 2, latent variable values may be learned through a training process of encoding and decoding a preset keyword set with a first variational recurrent autoencoder that includes an LSTM. A second variational recurrent autoencoder that includes a CNN is trained by encoding and decoding a preset image, and through this training it learns latent variable values shared with those of the first.

Next, an input is received from the user (S110). Through a user terminal, the user may enter a question about the specialized field as text, voice or an image. The input/output unit 2100 of FIG. 7 passes the question to the indexing system 2200.

Next, the morpheme analyzer of the indexing system 2200 analyzes the morphemes of the received natural-language input and generates a corresponding keyword set (S120). The morpheme analyzer may generate a keyword set of one or more keywords, excluding particles (postpositions) that carry no particular meaning. This step is performed only when the user's input is text or voice and is skipped when the input is an image.
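Keyword generation can be sketched under the following assumptions:
- A Korean morphological analyzer, not shown here, has already split the input into (morpheme, tag) pairs.
- The tags follow a Sejong-style tagset, in which every particle tag begins with `J` (e.g. `JX`, `JKO`).

Keyword generation then reduces to dropping particles while preserving order, since the first encoding unit consumes keywords in input order. The tags assigned to the example and the function name are assumptions.

```python
# In Sejong-style tagsets, particle (조사) tags start with "J", e.g. JKS, JKO, JX.
PARTICLE_PREFIX = "J"


def keywords_from_morphemes(tagged):
    """Build a keyword set from (morpheme, tag) pairs by dropping particles.

    Order is preserved because the first encoding unit consumes keywords
    in input order. `tagged` is assumed to come from an external analyzer.
    """
    return [morpheme for morpheme, tag in tagged if not tag.startswith(PARTICLE_PREFIX)]


# Hypothetical analyzer output for the FIG. 6 example input "올해는 법인세를".
tagged = [("올해", "NNG"), ("는", "JX"), ("법인세", "NNG"), ("를", "JKO")]
print(keywords_from_morphemes(tagged))  # ['올해', '법인세']
```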

Next, the variational recurrent autoencoder of the indexing system 2200 calculates a latent variable (S130). It does so by encoding either the keyword set generated by the morpheme analyzer or the image input by the user.

Next, the update unit of the indexing system 2200 updates the index value according to the latent variable value generated by the variational recurrent autoencoder (S140).

Next, the search unit 2300 searches the database 2400 for an answer according to the index value (S150).

Depending on the embodiment, the index update step (S140) and the search step (S150) may be performed in reverse order.
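The flow from S110 to S150 can be sketched as a function that receives the units as callables. The sketch covers the skipped S120 for image input and the interchangeable order of S140 and S150. Every name here is hypothetical. The stub units in the usage example only record the call order and do not model the actual encoders.

```python
def answer_query(user_input, is_image, steps, update_before_search=True):
    """Run S120-S150 for one received input (receiving it is S110).

    steps maps hypothetical unit names to callables:
      'analyze'      morpheme analyzer   text -> keyword set
      'encode_text'  first VRAE (VAE1)   keyword set -> Z
      'encode_image' second VRAE (VAE2)  image -> Z
      'lookup'       indexing unit       Z -> index value
      'update'       update unit         (Z, index value) -> None
      'search'       search unit         index value -> answers
    """
    if is_image:
        z = steps["encode_image"](user_input)      # S130 (S120 is skipped for images)
    else:
        keywords = steps["analyze"](user_input)    # S120
        z = steps["encode_text"](keywords)         # S130
    index_value = steps["lookup"](z)
    if update_before_search:
        steps["update"](z, index_value)            # S140
        return steps["search"](index_value)        # S150
    answers = steps["search"](index_value)         # S150
    steps["update"](z, index_value)                # S140
    return answers


# Stub units that only record the order in which they are called.
calls = []


def stub_update(z, index_value):
    calls.append(("update", index_value))


def stub_search(index_value):
    calls.append(("search", index_value))
    return ["answer for " + index_value]


steps = {
    "analyze": lambda text: text.split(),             # stand-in, not a real morpheme analyzer
    "encode_text": lambda keywords: [0.5, 2.0, 0.4],
    "encode_image": lambda image: [1.5, 0.7, 0.9],
    "lookup": lambda z: "corporate_tax" if z[1] > 1.0 else "year_end_settlement",
    "update": stub_update,
    "search": stub_search,
}

print(answer_query("올해는 법인세를", is_image=False, steps=steps))  # ['answer for corporate_tax']
print(calls)  # [('update', 'corporate_tax'), ('search', 'corporate_tax')]
```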

FIG. 9 is a diagram showing experimental results of the specialized-field answering service system and method using variational recurrent autoencoding according to Embodiment 1 of the present invention.

The experiment used more than 1,000 question/expert-answer pairs for each of corporate tax, value-added tax (VAT) and year-end tax settlement, taken from the tax category of Naver 지식IN (Knowledge iN). The learning rate was set to 0.001 and the batch size to 32. Separate models were trained with 2-dimensional and 3-dimensional latent spaces.

In FIGS. 9(a) and 9(b), blue denotes corporate tax, red denotes VAT, and green denotes year-end tax settlement. As FIG. 9 shows, the corporate tax, VAT and year-end settlement questions form clearly separated groups according to their latent variable values.

The present invention has been described in detail above through preferred embodiments, but it is not limited thereto. It will be apparent to those of ordinary skill in the art that various modifications and applications are possible without departing from the technical spirit of the invention. Accordingly, the true scope of protection of the present invention should be interpreted according to the following claims. All technical ideas within the scope of their equivalents should be construed as falling within the scope of rights of the present invention.

Claims

An indexing system using variational recurrent autoencoding, comprising:
a first variational recurrent autoencoder that learns latent variable values through a training process of encoding and decoding a preset keyword set and, based on this training, calculates a latent variable value by encoding a keyword set corresponding to a user's voice or text input;
a second variational recurrent autoencoder that learns latent variable values through a training process of encoding and decoding a preset image and, based on this training, calculates a latent variable value by encoding a user's image input; and
an indexing unit that indexes the input from the user based on a result of comparing the latent variable values learned by the first and second variational recurrent autoencoders with the latent variable value calculated by the first or second variational recurrent autoencoder,
wherein the latent variable values learned by the first variational recurrent autoencoder and the second variational recurrent autoencoder are shared.

The indexing system of claim 1, further comprising:
a keyword generator that generates a keyword set corresponding to the user's text or voice input by analyzing the morphemes of that input.

The indexing system of claim 1, wherein the first variational recurrent autoencoder comprises:
a first encoding unit that encodes the preset keyword set or a keyword set corresponding to the user's voice or text input; and
a first decoding unit that decodes the latent variable value generated when the first encoding unit encodes the preset keyword set.

The indexing system of claim 3, wherein:
the first encoding unit includes as many long short-term memory (LSTM) units as there are keywords in the input keyword set;
the keywords are input to the respective LSTMs in input order; and
the LSTMs are connected in a chain.

The indexing system of claim 1, wherein the second variational recurrent autoencoder comprises:
a second encoding unit that encodes the preset image or the user's image input; and
a second decoding unit that decodes the latent variable value generated when the second encoding unit encodes the preset image,
wherein the second encoding unit and the second decoding unit include a convolutional neural network (CNN).

The indexing system of claim 1, further comprising:
an update unit that updates the index of the indexing unit by using the latent variable value calculated by the first or second variational recurrent autoencoder.

The indexing system of claim 1, wherein the preset keyword set relates to taxation, and
the preset image is an image from which the type of a tax-related document can be distinguished.

An indexing method using variational recurrent autoencoding, comprising:
learning latent variable values through a training process of encoding and decoding a preset keyword set using a first variational recurrent autoencoder, and learning latent variable values shared with those learned by the first variational recurrent autoencoder through a training process of encoding and decoding a preset image using a second variational recurrent autoencoder;
calculating a latent variable value corresponding to a user input using the first or second variational recurrent autoencoder; and
indexing the input from the user based on a result of comparing the learned latent variable values with the calculated latent variable value.

The indexing method of claim 8, further comprising:
generating a keyword set corresponding to the user's text or voice input by analyzing the morphemes of that input,
wherein, in the calculating step, when the user input is text or voice, the latent variable value corresponding to the generated keyword set is calculated using the first variational recurrent autoencoder.

The indexing method of claim 8, further comprising:
updating the index used in the indexing step by using the latent variable value calculated by the first or second variational recurrent autoencoder.

The indexing method of claim 8, wherein the preset keyword set relates to taxation, and
the preset image is an image from which the type of a tax-related document can be distinguished.

An indexing system using variational recurrent autoencoding, comprising:
a keyword generator that generates a keyword set corresponding to a user's natural-language input by analyzing the morphemes of that input;
a variational recurrent autoencoder that learns latent variable values through a training process of encoding and decoding a preset keyword set and, based on this training, calculates a latent variable value by encoding the keyword set generated by the keyword generator;
an indexing unit that indexes the input from the user based on a result of comparing the latent variable values learned by the variational recurrent autoencoder with the latent variable value calculated by it; and
an update unit that updates the index of the indexing unit by using the latent variable value calculated by the variational recurrent autoencoder.