KR20200057611A

KR20200057611A - a communication typed question and answer system with data supplying in statistic database

Info

Publication number: KR20200057611A
Application number: KR1020190120434A
Authority: KR
Inventors: 김영호; 홍은영
Original assignee: 지의소프트 주식회사
Priority date: 2019-09-30
Filing date: 2019-09-30
Publication date: 2020-05-26
Also published as: KR102220894B1

Abstract

The present invention relates to a question answering system capable of quickly delivering statistical information. The system comprises a terminal, a transceiver unit, an analysis unit, and a database. The terminal includes an input unit through which a user enters a question and an output unit through which the user views the answer. The transceiver unit receives the question from the terminal and transmits the answer back to the terminal. The database stores question format data as well as one or more candidate questions and the answers corresponding to them. The analysis unit includes: a text conversion module that converts the question received from the terminal via the transceiver unit into text data; a keyword extraction module that decomposes the converted and stored question text into morphemes, focusing on nouns, and extracts only the keywords; a code conversion module that converts the extracted keywords into data-analysis codes; a determination module that compares the converted data-analysis codes with the question format data codes stored in the database to determine and extract the answer to the question; and an answer generation module that transmits the answer data for the question matched by the determination module.

Description

{A communication typed question and answer system with data supplying in statistic database}

The present invention relates to a question answering system that collects the data and structure (schema) of a statistical database and returns the statistical value corresponding to a natural-language query. More specifically, it relates to a question answering system that interactively provides data from a statistical database, in which a keyword extraction module splits the query content data stored by a text conversion module into query content and a query phrase, deletes the query phrase, decomposes the remaining query content into morphemes with a focus on nouns, and removes particles and the like to extract at least one core keyword.

Conventional information retrieval methods are limited to keyword pattern matching: relying on identity of form, they retrieve and return every item that contains a string identical to the search keyword.

Because keyword pattern matching returns an excessive number of results, users of such methods have had no choice but to search through the results one by one for the content they want.

For example, given the query "How many members does TWICE have?", a method that relies on identity of form merely lists the many items containing "TWICE" and "number of members"; it cannot provide the exact answer, "9".

In addition, a conventional method given the query "delicious 배" (pear) also returns endless surplus results for the homonyms of 배, such as "a boat to ride", "belly (body part)", and "double". Users are therefore unable to find the information they want quickly because of the excessive search results.

Republic of Korea Registered Patent Publication No. 10-1832816
Republic of Korea Laid-open Patent Publication No. 10-2017-0129352
Republic of Korea Laid-open Patent Publication No. 10-2018-0086801

The present invention was devised to solve the problem described above, namely that users cannot quickly find the information they want because of excessive search results. It aims to resolve this problem fully by providing a device or system for generating answers to a user's query.

The present invention also aims to provide a question answering system that interactively provides data from a statistical database: it collects the columns of the schema (the structure of a statistics-related database) and the record values (the data) as structured data and, when a user asks a question as a natural-language sentence, uses the collected data to deliver statistical information quickly.

To achieve the above object, the question answering system of the present invention includes a terminal, a transceiver unit, an analysis unit, and a database. The terminal includes an input unit through which a user can enter query content and an output unit through which the user can view the answer to the query content. The transceiver unit receives the query content from the terminal and transmits the answer to the received query content to the terminal. The database stores query format data for the queries to be asked, and stores at least one candidate query together with the answer corresponding to each candidate query. The analysis unit includes a text conversion module that converts the query content received from the terminal through the transceiver unit into text data; a keyword extraction module that decomposes the converted and stored query content data into morphemes with a focus on nouns and extracts only the keywords; a code conversion module that converts the extracted keywords into data-analysis codes; a determination module that compares the converted data-analysis codes with the query format data codes stored in the database to determine and extract the answer to the query; and an answer generation module that transmits the answer data for the query content matched by the determination module.

The keyword extraction module splits the query content data stored by the text conversion module into query content and a query phrase, deletes the query phrase, decomposes the remaining query content into morphemes with a focus on nouns, and removes particles and the like to extract at least one core keyword.

The code conversion module converts the core keywords extracted by the keyword extraction module into data-analysis codes. To form a code, the core keyword of the query content is split into consonants and vowels, each consonant and vowel is assigned a code number, and the sequence of vowels and consonants is converted into the assigned code numbers. A final consonant (batchim) is treated as a second use of a consonant: the consonant used as the final is classified and converted into a code consisting of two uppercase letters. The resulting code is stored in the database.
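The code conversion step can be sketched in Python as follows. This is a minimal illustration, not the patented table: the actual code numbers are given in FIG. 2, which is not reproduced in the text, so the numbering here (the Unicode jamo index as a two-digit number, and a two-letter uppercase code for the final consonant) and the names `decompose`, `final_code`, and `encode_keyword` are assumptions. Only the decomposition arithmetic is standard Unicode behavior.

```python
HANGUL_BASE = 0xAC00   # first precomposed Hangul syllable
HANGUL_LAST = 0xD7A3   # last precomposed Hangul syllable
N_MEDIAL, N_FINAL = 21, 28  # vowels and final-consonant slots per Unicode layout


def decompose(syllable: str) -> tuple:
    """Split one precomposed syllable into (initial, medial, final) indices."""
    offset = ord(syllable) - HANGUL_BASE
    initial, rest = divmod(offset, N_MEDIAL * N_FINAL)
    medial, final = divmod(rest, N_FINAL)
    return initial, medial, final  # final == 0 means "no final consonant"


def final_code(final: int) -> str:
    """Hypothetical two-uppercase-letter code for a final consonant."""
    return chr(ord("A") + final // 26) + chr(ord("A") + final % 26)


def encode_keyword(keyword: str) -> str:
    """Encode each syllable as initial number + vowel number (+ final letters)."""
    parts = []
    for ch in keyword:
        if not (HANGUL_BASE <= ord(ch) <= HANGUL_LAST):
            parts.append(ch)  # leave non-Hangul characters untouched
            continue
        initial, medial, final = decompose(ch)
        code = f"{initial:02d}{medial:02d}"  # assumed numbering, not FIG. 2
        if final:
            code += final_code(final)
        parts.append(code)
    return "-".join(parts)
```

Encoding final consonants with letters rather than digits keeps them distinguishable from initial consonants in the stored code, which appears to be the intent of the two-letter rule described above.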

In addition, the system further includes a query matching data conversion unit that, in order to match the query format data stored in the database against the user's query content data and generate an answer, separates the query-word matching columns from the answer column value and converts them into a data table.

The present invention provides a question answering system that interactively provides data from a statistical database, the system including a terminal, a transceiver unit, an analysis unit, and a database, wherein:

the terminal includes an input unit through which a user can enter query content; and

an output unit through which the user can view the answer to the query content;

the transceiver unit receives the query content from the terminal and transmits the answer to the received query content to the terminal;

the database stores query format data for the queries to be asked, and stores at least one candidate query together with the answer corresponding to each candidate query;

the analysis unit includes a text conversion module that converts the query content received from the terminal through the transceiver unit into text data;

a keyword extraction module that decomposes the converted and stored query content data into morphemes with a focus on nouns and extracts only the keywords;

a code conversion module that converts the extracted keywords into data-analysis codes;

a determination module that compares the converted data-analysis codes with the query format data codes stored in the database to determine and extract the answer to the query; and an answer generation module that transmits the answer data for the query content matched by the determination module.

The present invention also provides a question answering system characterized in that the keyword extraction module splits the query content data stored by the text conversion module into query content and a query phrase, deletes the query phrase, decomposes the remaining query content into morphemes with a focus on nouns, and removes particles and the like to extract at least one core keyword.

The present invention also provides a question answering system characterized in that the code conversion module converts the core keywords extracted by the keyword extraction module into data-analysis codes, wherein, to form a code, the core keyword of the query content is split into consonants and vowels, each consonant and vowel is assigned a code number, and the sequence of vowels and consonants is converted into the assigned code numbers; a final consonant (batchim) is treated as a second use of a consonant, so the consonant used as the final is classified and converted into a code consisting of two uppercase letters; and the resulting code is stored in the database.

The present invention also provides a question answering system characterized in that it may further include a query matching data conversion unit that, in order to match the query format data stored in the database against the user's query content data and generate an answer, separates the query-word matching columns from the answer column value and converts them into a data table.
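One way to picture the query matching table is sketched below in Python. This is an illustrative sketch under stated assumptions: the row layout, the names `QueryFormatRow` and `find_answer`, and the placeholder answer strings are hypothetical, and plain keywords stand in for the data-analysis codes that the described system would compare.

```python
from dataclasses import dataclass
from typing import FrozenSet, Iterable, List, Optional


@dataclass(frozen=True)
class QueryFormatRow:
    match_keys: FrozenSet[str]  # the query-word matching columns
    answer_value: str           # the answer column value


# Hypothetical rows; the answer strings are placeholders, not real statistics.
TABLE: List[QueryFormatRow] = [
    QueryFormatRow(frozenset({"삼척", "남성", "인원수"}), "<answer value A>"),
    QueryFormatRow(frozenset({"삼척", "여성", "인원수"}), "<answer value B>"),
]


def find_answer(keywords: Iterable[str], table: List[QueryFormatRow]) -> Optional[str]:
    """Return the answer value of the row whose matching keys equal the keywords."""
    wanted = frozenset(keywords)
    for row in table:
        if row.match_keys == wanted:
            return row.answer_value
    return None  # no stored query format matches
```

Using a set for the matching keys makes the lookup independent of keyword order, so "삼척 남성 인원수" and "남성 인원수 삼척" resolve to the same row.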

The present invention also provides a question answering system characterized in that the text conversion module (310) accepts and stores query content that the user enters as text through an input device and, when the query content is received as speech, converts it to text using a machine-learning-based speech recognition model trained on speech and the corresponding text.

The present invention also provides a question answering system characterized in that, when the text conversion module receives text or speech data containing a number, a number entered as text is stored as Roman numerals; the module further includes a number conversion model that converts the numeric data stored as Roman numerals into Hangul notation; and numeric data recognized from speech is stored in Hangul as recognized from the user's speech.
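The conversion of stored numerals into Hangul notation might look like the following Python sketch. Note the assumptions: the source text says "Roman numerals" (로마숫자), whereas this sketch assumes the stored value is a plain decimal digit string, and it uses the common Sino-Korean reading in which 일 is dropped before 십, 백, 천, and a lone 만. The function names are hypothetical.

```python
DIGITS = ["", "일", "이", "삼", "사", "오", "육", "칠", "팔", "구"]
SMALL_UNITS = ["", "십", "백", "천"]   # 10^1 .. 10^3 within a four-digit group
BIG_UNITS = ["", "만", "억", "조"]     # 10^4, 10^8, 10^12 group markers


def read_group(n: int) -> str:
    """Read a value in the range 0..9999 (empty string for 0)."""
    out = []
    for power in (3, 2, 1, 0):
        digit = (n // 10 ** power) % 10
        if digit == 0:
            continue
        if digit == 1 and power > 0:
            out.append(SMALL_UNITS[power])  # 10 reads "십", not "일십"
        else:
            out.append(DIGITS[digit] + SMALL_UNITS[power])
    return "".join(out)


def to_hangul(number_text: str) -> str:
    """Convert a non-negative decimal digit string to Hangul notation."""
    n = int(number_text)
    if n < 0 or n >= 10 ** 16:
        raise ValueError("supported range is 0 <= n < 10**16")
    if n == 0:
        return "영"
    groups = []
    idx = 0
    while n > 0:
        n, group = divmod(n, 10000)
        if group:
            reading = read_group(group)
            if group == 1 and idx == 1:
                reading = ""  # 10000 reads "만" under the assumed convention
            groups.append(reading + BIG_UNITS[idx])
        idx += 1
    return "".join(reversed(groups))
```

Normalizing typed numbers to the same Hangul form that speech input produces lets both input paths reach the keyword and code conversion steps in one representation.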

The question answering system according to the present invention, which interactively provides data from a statistical database, easily converts the information in the statistical database held by the system into structured data and quickly returns the statistical value for the statistical information a user requests through a natural-language query. This removes the inconvenience of the conventional approach, in which the user first runs a keyword search and then has to locate the statistical value manually.

The present invention can also provide a device for generating answers to a user's query.

In addition, answers to a user's query can be generated and provided in real time.

FIG. 1 is a block diagram showing the configuration of a question answering system that interactively provides data from a statistical database according to a preferred embodiment of the present invention.
FIG. 2 is a table showing the codes of the question answering system according to a preferred embodiment of the present invention.
FIG. 3 is a table showing the query format data of the question answering system according to a preferred embodiment of the present invention.
FIG. 3b is a diagram showing data from a statistical database being converted into conversational data and stored as structured data according to an embodiment of the present invention.
FIG. 3c is a diagram showing a structure being searched in response to a user's query, based on the structured data stored in FIG. 3b, and the result being presented.

The advantages and features of the present invention, and the methods of achieving them, will become clear from the embodiments described in detail below together with the accompanying drawings. The present invention is not, however, limited to the embodiments disclosed below and may be implemented in a variety of different forms. The embodiments are provided only to make the disclosure complete and to fully convey the scope of the invention to those of ordinary skill in the art to which it pertains; the invention is defined solely by the scope of the claims. Like reference numerals refer to like elements throughout the specification.

Hereinafter, the present invention is described with reference to the drawings, which illustrate a question answering system that interactively provides database data according to embodiments of the present invention.

The database referred to here covers both ordinary databases and statistical databases of all kinds.

FIG. 1 is a block diagram showing the configuration of a question answering system according to a preferred embodiment of the present invention, FIG. 2 is a table showing the codes of that system, and FIG. 3 is a table showing its query format data.

Referring to FIG. 1, a question answering system according to an embodiment of the present invention includes a terminal (100), a transceiver unit (200), an analysis unit (300), and a database (400).

The terminal (100) includes an input unit (110) through which a user can enter query content and an output unit (120) through which the user can view the answer to the query content.

The input unit (110) receives the user's query content and can transmit it to the analysis unit (300) through the transceiver unit (200).

The query content that the user enters in the input unit (110) may be entered as speech or text.

The output unit (120) can receive the answer to the user's query content from the analysis unit (300) through the transceiver unit (200).

The answer to the query content may be output by the output unit (120) as speech or text.

The user terminal (100) may be any of various devices equipped with data communication over a wired or wireless network and with information processing capability, for example a desktop PC, laptop PC, tablet PC, smartphone, or PDA (Personal Digital Assistant).

The transceiver unit (200) receives query content from the terminal (100) and transmits the answer to the received query content to the terminal (100).

Specifically, an application may be installed that passes the query content entered by the user through the terminal (100) to the analysis unit (300), receives the answer to the query content from the analysis unit (300), and displays it on the display screen of the terminal (100).

The database (400) stores query format data for the queries to be asked, and stores at least one candidate query together with the answer corresponding to each candidate query.

The database (400) may be, for example, a hard disk drive, ROM (Read Only Memory), RAM (Random Access Memory), memory card, or the like located inside or outside the terminal (100). The candidate queries and their corresponding answers may be stored as text and as code data.

The analysis unit (300) includes a text conversion module (310) that converts the query content received from the terminal (100) through the transceiver unit (200) into text data; a keyword extraction module (320) that decomposes the converted and stored query content data into morphemes with a focus on nouns and extracts only the keywords; a code conversion module (330) that converts the extracted keywords into data-analysis codes; a determination module (340) that compares the converted data-analysis codes with the query format data codes stored in the database (400) to determine and extract the answer to the query; and an answer generation module (350) that transmits the answer data for the query content matched by the determination module.

The text conversion module (310) may convert the query content received from the terminal (100) through the transceiver unit (200) into text data and store it.

The text conversion module (310) allows the user to enter query content as text by typing on an input device.

Alternatively, when the query content is received as speech, it can be converted to text using a speech recognition model.

The speech recognition model may be a machine-learning-based model trained on speech and the corresponding text.

When the query content received through the user's terminal (100) is speech, the text conversion module (310) first converts it to text through the speech recognition model.

Whether the query content received through the terminal (100) is speech or text, it is stored as text by the text conversion module (310).

The speech recognition model of the present invention segments the input speech into phonemes, extracts a feature vector for each phoneme, and recognizes the speech by pattern matching the extracted feature vectors against per-phoneme reference models.

The speech recognition model is run by a speech recognition module (not shown) attached to the text conversion module (310). The speech recognition method according to the present invention comprises: generating and storing a reference model of the feature vector of each phoneme produced from speech entered by the user; receiving speech input; segmenting the input speech into phonemes and extracting a feature vector for each phoneme; and recognizing the speech by pattern matching the extracted feature vectors against the stored per-phoneme reference models.

The phoneme modeling method of the present invention can likewise be divided into two stages: a stage in which the speech for the characters a user pronounces is split into phonemes, the smallest units, and a reference model is generated for each phoneme and stored in a database; and a stage in which the phonemes of the input speech are segmented, a feature vector is extracted for each, and the speech is recognized by pattern matching those vectors against the stored per-phoneme reference models.
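The two stages (building per-phoneme reference models, then pattern matching) can be illustrated with a toy Python sketch. The source does not specify the feature extraction, the form of the reference model, or the matching metric, so everything here is an assumption: a reference model is taken to be the running mean of the enrolled feature vectors, matching is nearest-neighbor by Euclidean distance, and the class `PhonemeModels` and the labels in the usage lines are hypothetical. Segmenting audio into phonemes and computing the vectors is outside the sketch.

```python
import math
from typing import Dict, List, Sequence


class PhonemeModels:
    """Toy per-phoneme reference models: one mean feature vector per phoneme."""

    def __init__(self) -> None:
        self._mean: Dict[str, List[float]] = {}
        self._count: Dict[str, int] = {}

    def enroll(self, phoneme: str, vector: Sequence[float]) -> None:
        """Create or update the reference model with one labeled feature vector."""
        count = self._count.get(phoneme, 0)
        if count == 0:
            self._mean[phoneme] = [float(v) for v in vector]
        else:
            old = self._mean[phoneme]
            # Running mean, so the model keeps adapting as the user speaks.
            self._mean[phoneme] = [
                (m * count + v) / (count + 1) for m, v in zip(old, vector)
            ]
        self._count[phoneme] = count + 1

    def reference(self, phoneme: str) -> List[float]:
        return list(self._mean[phoneme])

    def recognize(self, vector: Sequence[float]) -> str:
        """Return the phoneme whose reference model is closest to the vector."""
        if not self._mean:
            raise ValueError("no reference models enrolled")
        return min(self._mean, key=lambda p: math.dist(self._mean[p], vector))
```

A short usage example with made-up two-dimensional vectors:

```python
models = PhonemeModels()
models.enroll("p1", [0.0, 0.0])
models.enroll("p2", [10.0, 10.0])
print(models.recognize([1.0, 1.5]))
```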

With this method, a reference model can be created for each Hangul letter (jamo), and the per-phoneme reference models are continually updated simply by the user speaking the displayed characters, so the speech recognition rate improves as a result.

In addition, because speech recognition becomes possible for the entire Korean vocabulary, the user is spared the inconvenience of having to repeatedly enter the utterances needed for speech recognition.

The speech recognition model of the present invention is further characterized in that, when the input speech mixes in foreign words or loanwords, it recognizes those words separately and supplies a Korean translation for them, so that the input speech is recognized in translated form.

This method of supplying a Korean translation for recognized foreign words or loanwords is carried out as follows.

First, the 'foreign word or loanword' is extracted from the input speech.

To do so, each word of the input speech is compared against a preset Korean database and a preset 'foreign word or loanword' database to determine whether it is Korean or a 'foreign word or loanword'; words judged to be the latter are extracted.

The extracted 'foreign word or loanword' is then converted into its original-language spelling using a preset foreign-language database.

Next, the translations of the converted word are selected from a preset translation database, and from those translations either the first one or the one chosen most frequently is selected.

Finally, the 'foreign word or loanword' in the input speech is replaced with the selected translation.

즉, 상기한 외국어 또는 외래어에 대한 음성 인식에 대한 한국어 번역문을 제시하여 입력 음성이 번역 음성 인식되도록 하는 방법에 대한 실시예는 아래와 같다.That is, an embodiment of a method for presenting a Korean translation of speech recognition for a foreign language or a foreign language as described above so that the input speech is translated speech recognition is as follows.

Suppose the user says "서울의 가방 프라이스는 얼마인가?" ("What is the price of a bag in Seoul?", with the English word "price" spoken as the loanword "프라이스"). Whether "프라이스" is Korean or a 'foreign word or loanword' is determined by comparing it against a preset Korean database and a preset 'foreign word or loanword database'. Because "프라이스" is determined to be a foreign word, it is extracted.

The extracted "프라이스" is then converted to its original spelling, "price".

Next, the translations of "price" in the preset translation database, "1. 가격, 값, 물가 (price, value, cost of living) 2. (치러야 할) 대가 (the price to be paid) 3. (상금의) 배당률 (odds)", are selected.

From among these selected translations, the first one, "가격" ("price"), is chosen and extracted.

Finally, "프라이스" is replaced with its extracted translation, "가격".

Thus "프라이스" corresponds to the English "price", "price" is translated as "가격", and the speech is recognized in translated form as "서울의 가방 가격은 얼마인가?" ("What is the price of a bag in Seoul?").
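The lookup steps in this embodiment can be sketched as follows. This is only an illustrative sketch: `LOANWORD_DB`, `TRANSLATION_DB`, and `pick_translation` are hypothetical names standing in for the preset databases, and the sketch works on a single word, so adjusting the following particle (는 to 은 in the example sentence) is outside its scope.

```python
# Hypothetical stand-ins for the preset databases described in the text.
LOANWORD_DB = {"프라이스": "price"}  # loanword -> original spelling
TRANSLATION_DB = {"price": ["가격", "값", "물가", "대가", "배당률"]}  # ordered translations


def pick_translation(word, selection_counts=None):
    """Return the Korean translation for a loanword, or the word unchanged.

    By default the first translation is chosen; if selection_counts is
    given, the most frequently selected translation is chosen instead.
    """
    original = LOANWORD_DB.get(word)
    if original is None:
        return word  # not in the loanword database: treated as Korean
    candidates = TRANSLATION_DB[original]
    if selection_counts:
        return max(candidates, key=lambda c: selection_counts.get(c, 0))
    return candidates[0]


print(pick_translation("프라이스"))
```

Passing a count table such as `{"값": 5}` switches the choice from the first translation to the most frequently selected one, which corresponds to the alternative selection rule in the description.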

As described above, when the input speech mixes in foreign words or loanwords, the present invention recognizes those words separately and supplies a Korean translation for them, so that the input speech is recognized in translated form. Because the various foreign words and loanwords that users employ are thereby unified into their Korean translations on input, the system can quickly provide statistical values for the statistical information that a user requests through a natural-language query.

The keyword extraction module 320 splits the query content data stored by the text conversion module 310 into query content and a query phrase, deletes the query phrase, decomposes the remaining query content into morphemes with a focus on nouns, and removes particles and the like, thereby extracting at least one key keyword.

For example, when the query content data "삼척의 남성 인원수가 궁금합니다." ("I would like to know the number of men in Samcheok.") is generated, the keyword extraction module 320 identifies "궁금합니다" ("I would like to know") as the query phrase and deletes it. From the remaining query content, "삼척의 남성 인원수가", it extracts the nouns "삼척" (Samcheok), "남성" (male), and "인원수" (headcount), and removes the particles "의" and "가".

The key keywords extracted are therefore "삼척", "남성", and "인원수".
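The extraction just described can be illustrated with a toy sketch. It assumes a fixed list of query phrases and particles (`QUERY_PHRASES` and `PARTICLES` are hypothetical and contain only the items from the example) and uses plain suffix stripping in place of real morphological analysis, which a practical system would need.

```python
# Toy lists covering only the example in the text (assumptions, not the
# patent's actual dictionaries).
QUERY_PHRASES = ["궁금합니다"]
PARTICLES = ["의", "가"]


def extract_keywords(query):
    """Drop the query phrase, split on spaces, and strip trailing particles."""
    text = query.strip().rstrip(".?!")
    for phrase in QUERY_PHRASES:
        text = text.replace(phrase, " ")
    keywords = []
    for token in text.split():
        for particle in PARTICLES:
            # Strip at most one trailing particle, never the whole token.
            if token.endswith(particle) and len(token) > len(particle):
                token = token[: -len(particle)]
                break
        keywords.append(token)
    return keywords


print(extract_keywords("삼척의 남성 인원수가 궁금합니다."))
```

Suffix stripping of this kind would wrongly trim nouns that happen to end in a particle-like syllable, which is one reason the description relies on morpheme decomposition instead.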

In this way, the key keywords for the query content data extracted from the database are stored in the database.

The answer data extracted for that query content is also stored in the database.

This database storage format is called the interactive data conversion storage format, and data stored in this way is called structure storage information (structure data).

As shown in FIG. 3B, query content data for a first-pass query search over the statistical database information is extracted as key keywords and stored, and the answer data for that first-pass query is retrieved; the two are matched and stored as a pair or a set in the form of structure storage information (structure data).

As described in more detail below, the query matching data conversion unit matches and stores this structure storage information (structure data) as a pair or a set consisting of a query-word matching column and an answer column value.

FIG. 3C shows how the stored structure storage information is used: the structure data is searched in response to a user's query and the result is presented.

Thus the question answering system of the present invention, which interactively provides data from a statistical database, readily converts the statistical database information held by the system into structure data and can quickly provide statistical values for the statistical information a user requests through a natural-language query. This removes the inconvenience of the conventional approach, in which the user has to track down the statistical value personally after an initial keyword search.

The code conversion module 330 converts the key keywords extracted by the keyword extraction module 320 into data-analysis codes.

To produce the code, each key keyword of the query content is broken down into consonants and vowels, each consonant and vowel is assigned a code number, and the resulting sequence of consonants and vowels is converted into the assigned code numbers.

Referring to FIG. 2, the consonants ㄱ, ㄴ, ㄷ, ... ㅆ, ㅉ are converted to the uppercase letters A, B, C, ... R, S, and the vowels ㅏ, ㅓ, ㅗ, ... ㅞ, ㅢ are converted to the lowercase letters a, b, c, ... t, u.

For a final consonant (batchim) in which consonants are used doubly, the consonants making up the final consonant are separated and converted into a code consisting of the two corresponding uppercase letters.

For example, "앉다" ("to sit") is broken down into "ㅇ", "ㅏ", "ㄴㅈ", "ㄷ", "ㅏ"; applying the codes "ㅇ=H", "ㅏ=a", "ㄴㅈ=BI", "ㄷ=C", "ㅏ=a" yields "HaBICa".

As an example of the code conversion module 330 converting key keywords into codes, the key keywords "삼척", "남성", and "인원수" are broken down into "ㅅ, ㅏ, ㅁ, ㅊ, ㅓ, ㄱ", "ㄴ, ㅏ, ㅁ, ㅅ, ㅓ, ㅇ", and "ㅇ, ㅣ, ㄴ, ㅇ, ㅝ, ㄴ, ㅅ, ㅜ"; generating code numbers from the code table of FIG. 2 then yields the codes "GaEJbA", "BaEGbH", and "HfBHsBGd".
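The conversion can be sketched as below. FIG. 2 is not reproduced in this text, so the `CODE` table is an assumption restricted to the entries that can be read off the worked examples and tables in this description; any other jamo raises `KeyError` rather than being guessed. The syllable decomposition uses standard Unicode Hangul arithmetic, and the names `CODE`, `to_jamo`, and `encode` are hypothetical.

```python
# Unicode orderings of initial consonants, vowels, and final consonants.
LEADS = "ㄱㄲㄴㄷㄸㄹㅁㅂㅃㅅㅆㅇㅈㅉㅊㅋㅌㅍㅎ"
VOWELS = "ㅏㅐㅑㅒㅓㅔㅕㅖㅗㅘㅙㅚㅛㅜㅝㅞㅟㅠㅡㅢㅣ"
TAILS = ["", "ㄱ", "ㄲ", "ㄳ", "ㄴ", "ㄵ", "ㄶ", "ㄷ", "ㄹ", "ㄺ", "ㄻ", "ㄼ",
         "ㄽ", "ㄾ", "ㄿ", "ㅀ", "ㅁ", "ㅂ", "ㅄ", "ㅅ", "ㅆ", "ㅇ", "ㅈ", "ㅊ",
         "ㅋ", "ㅌ", "ㅍ", "ㅎ"]
# Compound final consonants are split into their two component consonants.
SPLIT_TAIL = {"ㄳ": "ㄱㅅ", "ㄵ": "ㄴㅈ", "ㄶ": "ㄴㅎ", "ㄺ": "ㄹㄱ", "ㄻ": "ㄹㅁ",
              "ㄼ": "ㄹㅂ", "ㄽ": "ㄹㅅ", "ㄾ": "ㄹㅌ", "ㄿ": "ㄹㅍ", "ㅀ": "ㄹㅎ",
              "ㅄ": "ㅂㅅ"}
# Assumption: only code-table entries attested in the description are listed.
CODE = {"ㄱ": "A", "ㄴ": "B", "ㄷ": "C", "ㄹ": "D", "ㅁ": "E", "ㅂ": "F",
        "ㅅ": "G", "ㅇ": "H", "ㅈ": "I", "ㅊ": "J", "ㅆ": "R", "ㅉ": "S",
        "ㅏ": "a", "ㅓ": "b", "ㅗ": "c", "ㅜ": "d", "ㅣ": "f", "ㅔ": "h",
        "ㅕ": "l", "ㅠ": "n", "ㅝ": "s", "ㅞ": "t", "ㅢ": "u"}


def to_jamo(word):
    """Decompose precomposed Hangul syllables into a list of jamo."""
    jamo = []
    for ch in word:
        index = ord(ch) - 0xAC00
        if not 0 <= index < 11172:
            raise ValueError(f"not a Hangul syllable: {ch!r}")
        lead, rest = divmod(index, 21 * 28)
        vowel, tail = divmod(rest, 28)
        jamo.append(LEADS[lead])
        jamo.append(VOWELS[vowel])
        tail_jamo = TAILS[tail]
        jamo.extend(SPLIT_TAIL.get(tail_jamo, tail_jamo))
    return jamo


def encode(word):
    """Map each jamo to its code letter and concatenate the result."""
    return "".join(CODE[j] for j in to_jamo(word))


for keyword in ["삼척", "남성", "인원수", "앉다"]:
    print(keyword, encode(keyword))
```

Splitting compound final consonants before the table lookup is what makes "ㄴㅈ" come out as the two letters "BI" while a single final consonant such as "ㅁ" stays a single letter.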

The converted code data is stored in the database 400.

The query format data stored in the database 400 is processed in the same way: the text conversion module 310 converts it into text data, the keyword extraction module 320 decomposes the converted and stored query format data into morphemes with a focus on nouns and extracts only the keywords, and the code conversion module 330 converts the extracted keywords into data-analysis codes, which are stored in the database 400 just like the query content data.

The determination module 340 compares the converted data-analysis code with the query format data codes stored in the database 400 to determine and extract the answer to the query.

Specifically, referring to FIGS. 2 and 3, generating code numbers for the key keywords "삼척", "남성", and "인원수" from the code table of FIG. 2 yields the codes "GaEJbA", "BaEGbH", and "HfBHsBGd", which are designated as the data-analysis codes.

For the query format data provided in FIG. 3, the system may further include a query matching data conversion unit (not shown) that separates the data into a query-word matching column and an answer column value and converts it into a data table, so that it can be matched against the user's query content data to generate an answer.

For example, the query format data is as shown in Table 1 below.

Item | Total | Male | Female | Male ratio | Female ratio
Samcheok | 1,570 | 702 | 868 | 45% | 55%

Once the query format data is generated, the query matching data conversion unit (not shown) generates the query-word matching column and the answer column value as shown in Table 2 below.

col_list | data_value
시도별 성별 인원수 : 삼척 : 전체 (Headcount by sex by province : Samcheok : Total) | 1,570
시도별 성별 인원수 : 삼척 : 남성 (Headcount by sex by province : Samcheok : Male) | 702
시도별 성별 인원수 : 삼척 : 여성 (Headcount by sex by province : Samcheok : Female) | 868
시도별 성별 인원수 : 삼척 : 남 비율 (Headcount by sex by province : Samcheok : Male ratio) | 45%
시도별 성별 인원수 : 삼척 : 여 비율 (Headcount by sex by province : Samcheok : Female ratio) | 55%
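The step from Table 1 to Table 2 amounts to turning each cell of the wide table into one (matching column, value) pair. A minimal sketch follows, assuming the table title "시도별 성별 인원수" is supplied separately (it appears in Table 2 but not in Table 1) and using the hypothetical function name `to_matching_rows`.

```python
def to_matching_rows(title, header, rows):
    """Flatten a wide statistics table into (col_list, data_value) pairs."""
    pairs = []
    for row in rows:
        item, values = row[0], row[1:]
        # header[0] labels the item column, so value columns start at header[1].
        for column, value in zip(header[1:], values):
            pairs.append((f"{title} : {item} : {column}", value))
    return pairs


header = ["항목", "전체", "남성", "여성", "남 비율", "여 비율"]
rows = [["삼척", "1,570", "702", "868", "45%", "55%"]]

for col_list, data_value in to_matching_rows("시도별 성별 인원수", header, rows):
    print(col_list, "|", data_value)
```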

Once the data has been converted to query matching data as in Table 2, the keywords are extracted and the query-word matching column information is converted into data-analysis codes by the code conversion module 330, producing Table 3.

col_list | data_value
GfCcFlD GbHFlD HfBHsBGd : GaEJbA : IbBJh | 1,570
GfCcFlD GbHFlD HfBHsBGd : GaEJbA : BaEGbH | 702
GfCcFlD GbHFlD HfBHsBGd : GaEJbA : HlGbH | 868
GfCcFlD GbHFlD HfBHsBGd : GaEJbA : BaE FfHnD | 45%
GfCcFlD GbHFlD HfBHsBGd : GaEJbA : Hl FfHnD | 55%

The data converted to data-analysis codes as in Table 3 is matched by the determination module 340 against the query content data "GaEJbA", "BaEGbH", and "HfBHsBGd". Among the answer column values, the answer data for the matching data that is most similar or best suited to the query content is extracted, as shown in Table 4.

col_list | data_value
GfCcFlD GbHFlD HfBHsBGd : GaEJbA : BaEGbH | 702
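The selection of the Table 4 row from Table 3 can be sketched as follows. The description does not spell out the matching measure, so this sketch assumes a simple one: the fraction of query codes that appear as whole tokens in a row's col_list. The names `TABLE3`, `accuracy`, and `best_row` are hypothetical.

```python
# Rows of Table 3 as (col_list, data_value) pairs.
TABLE3 = [
    ("GfCcFlD GbHFlD HfBHsBGd : GaEJbA : IbBJh", "1,570"),
    ("GfCcFlD GbHFlD HfBHsBGd : GaEJbA : BaEGbH", "702"),
    ("GfCcFlD GbHFlD HfBHsBGd : GaEJbA : HlGbH", "868"),
    ("GfCcFlD GbHFlD HfBHsBGd : GaEJbA : BaE FfHnD", "45%"),
    ("GfCcFlD GbHFlD HfBHsBGd : GaEJbA : Hl FfHnD", "55%"),
]


def accuracy(query_codes, col_list):
    """Assumed measure: share of query codes found as tokens in col_list."""
    tokens = set(col_list.replace(":", " ").split())
    return sum(code in tokens for code in query_codes) / len(query_codes)


def best_row(query_codes, table):
    """Return the (col_list, data_value) row with the highest accuracy."""
    return max(table, key=lambda row: accuracy(query_codes, row[0]))


query = ["GaEJbA", "BaEGbH", "HfBHsBGd"]  # 삼척, 남성, 인원수
print(best_row(query, TABLE3))
```

Comparing whole tokens matters here: the code "BaE" for "남" in the ratio rows is a prefix of "BaEGbH" for "남성", so a substring test would blur the two.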

The answer generation module 350 transmits to the transceiver 200 the answer data, namely the answer column value that the determination module 340 found most similar or best suited to the query content.

For the query "삼척의 남성 인원수가 궁금합니다.", "702" is extracted as the answer data; the answer generation module 350 passes it to the transceiver 200, and the transceiver 200 provides the answer "702" to the user through the output unit 120.

When several entries match equally, the single longest-matching answer is provided, and similar information may be presented as candidates.

As the similarity criterion, in addition to the single longest match, candidates whose matching accuracy is 90% or higher may be listed; the matching accuracy threshold can be set by the user.

For example, the answer data "702" is provided for "삼척의 남성 인원수가 궁금합니다.", and, depending on the similarity setting, similar values such as "2017년 삼척 남성 인원수 : 103명" ("Number of men in Samcheok in 2017: 103") or "2018년 삼척 남성 인원수 : 200명" ("Number of men in Samcheok in 2018: 200") may additionally be provided.
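How a user-set threshold changes the candidate list can be illustrated with a small sketch. The description does not define matching accuracy, so Jaccard overlap between keyword sets is used here purely as a stand-in, and the row layout and the name `answer_with_candidates` are hypothetical; the values 702, 103, and 200 are those of the example.

```python
def jaccard(a, b):
    """Assumed accuracy measure: size of the intersection over the union."""
    a, b = set(a), set(b)
    return len(a & b) / len(a | b)


def answer_with_candidates(keywords, rows, threshold=0.9):
    """Return the best answer plus other answers at or above the threshold."""
    ranked = sorted(rows, key=lambda row: jaccard(keywords, row[0]), reverse=True)
    best, rest = ranked[0], ranked[1:]
    candidates = [value for kws, value in rest if jaccard(keywords, kws) >= threshold]
    return best[1], candidates


rows = [
    (["삼척", "남성", "인원수"], "702"),
    (["2017년", "삼척", "남성", "인원수"], "103명"),
    (["2018년", "삼척", "남성", "인원수"], "200명"),
]
keywords = ["삼척", "남성", "인원수"]

print(answer_with_candidates(keywords, rows, threshold=0.9))
print(answer_with_candidates(keywords, rows, threshold=0.7))
```

Under this stand-in measure the year-specific rows score 0.75, so they are listed only when the user lowers the threshold below that value.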

When the text conversion module 310 receives a number as text or speech, a number entered as text is stored as digits, and a number conversion model then converts the stored digits into the way they are read aloud in Korean, so that the number is stored in Korean (Hangul) notation.

Numeric data recognized from speech is preferably stored written out in Korean, as recognized from the user's speech.

For example, in "2018년 삼척 남성 인원수" ("Number of men in Samcheok in 2018"), the text conversion module 310 converts "2018" from digit notation to the Korean notation "이천십팔" and stores it. The keyword extraction module 320 then decomposes the converted and stored query format data into morphemes with a focus on nouns and extracts only the keywords, and the code conversion module 330 converts the extracted keywords into data-analysis codes, which are stored in the database 400 just like the query content data.
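The digit-to-Hangul step can be sketched with the ordinary Sino-Korean reading rules. This is a minimal sketch limited to integers from 1 to 9999, and `read_number` is a hypothetical name rather than the patent's number conversion model.

```python
DIGITS = ["", "일", "이", "삼", "사", "오", "육", "칠", "팔", "구"]
UNITS = ["천", "백", "십", ""]  # thousands, hundreds, tens, ones


def read_number(n):
    """Return the Sino-Korean reading of an integer from 1 to 9999."""
    if not 0 < n < 10000:
        raise ValueError("this sketch only handles 1 to 9999")
    parts = []
    for digit_char, unit in zip(f"{n:04d}", UNITS):
        digit = int(digit_char)
        if digit == 0:
            continue
        # "일" is dropped before 십/백/천 (10 is read "십", not "일십").
        parts.append(("" if digit == 1 and unit else DIGITS[digit]) + unit)
    return "".join(parts)


print(read_number(2018))
```

Once the year is in Hangul form, it can pass through the same jamo-based code conversion as any other keyword, which is presumably why the description converts digits before encoding.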

As described above, the present invention readily converts the statistical database information held by the system into structure data and can quickly provide statistical values for the statistical information a user requests through a natural-language query, which removes the inconvenience of the user having to track down the statistical value personally after an initial keyword search.

The terminal 100, input unit 110, output unit 120, transceiver 200, analysis unit 300, text conversion module 310, keyword extraction module 320, code conversion module 330, determination module 340, answer generation module 350, and database 400 shown in FIG. 1 may be implemented on one or more computing devices, each including one or more processors and a computer-readable recording medium connected to those processors. The computer-readable recording medium may be inside or outside the processor and may be connected to it by various well-known means. The processor in a computing device may cause that device to operate according to the exemplary embodiments described in this specification. For example, the processor may execute instructions stored on the computer-readable recording medium, and those instructions, when executed by the processor, may be configured to cause the computing device to perform operations according to the exemplary embodiments described in this specification.

Those of ordinary skill in the art to which the present invention pertains will understand that it may be embodied in other specific forms without changing its technical idea or essential features. The embodiments described above should therefore be understood as illustrative in all respects and not restrictive. The scope of the present invention is defined by the claims below rather than by the detailed description above, and all changes or modifications derived from the meaning and scope of the claims and their equivalents should be construed as falling within the scope of the present invention. In addition, the operations of the components described above need not be performed in chronological order; even if the order in which the components and steps are performed is changed, such a process falls within the scope of the present invention as long as it satisfies the gist of the invention.

100: terminal
200: transceiver
300: analysis unit
400: database

Claims

A question answering system comprising a terminal, a transceiver, an analysis unit, and a database, wherein:
the terminal includes an input unit through which a user can input query content,
and an output unit through which the user can check the answer to the query content;
the transceiver receives the query content from the terminal and transmits an answer to the received query content to the terminal;
the database stores query format data for making queries, and stores at least one candidate query and an answer corresponding to each candidate query; and
the analysis unit includes: a text conversion module that converts the query content received from the terminal through the transceiver into text data;
a keyword extraction module that decomposes the query content data, converted to text and stored, into morphemes with a focus on nouns and extracts only the keywords;
a code conversion module that converts the extracted keywords into data-analysis codes;
a determination module that compares the converted data-analysis code with the query format data codes stored in the database to determine and extract an answer to the query; and an answer generation module that transmits the answer data for the query content matched by the determination module.

The question answering system of claim 1,
wherein the keyword extraction module splits the query content data stored by the text conversion module into query content and a query phrase, deletes the query phrase, decomposes the query content from which the query phrase has been deleted into morphemes with a focus on nouns, and removes particles and the like, thereby extracting at least one key keyword.

The question answering system of claim 1,
wherein the code conversion module converts the key keywords extracted by the keyword extraction module into data-analysis codes,
and wherein the code is produced by breaking each key keyword of the query content down into consonants and vowels, assigning a code number to each consonant and vowel, and converting the listed consonants and vowels into the assigned code numbers; for a final consonant (batchim), the consonants are treated as being used doubly, the consonants used as the final consonant are separated and converted into a code consisting of the two corresponding uppercase letters, and the result is stored in the database.