KR102174148B1 - Speech Recognition Method Determining the Subject of Response in Natural Language Sentences

Info

Publication number: KR102174148B1
Application number: KR1020180154879A
Authority: KR
Inventors: 정민화; 이규환; 김종인; 정지오
Original assignee: 서울대학교산학협력단
Priority date: 2018-12-05
Filing date: 2018-12-05
Publication date: 2020-11-04
Also published as: KR20200068193A

Abstract

The present invention relates to a speech recognition method that determines whether to respond to a natural-language sentence, using natural language processing (NLP) and deep neural network (DNN) technology on natural speech without a wake-word call. It provides a method for distinguishing response and non-response sentences through speech recognition without using a wake word. Because language models are built per topic, the distinction can be made without adding every one of the many proper nouns and technical terms to the recognition dictionary. Per-topic language models make it easy to build a domain-specific language model, allow later maintenance and improvement, and reduce the memory burden. Because the topic information for the service area and the information on whether the sentence is a question or an instruction are obtained together, the method is useful to application service companies when designing scenarios, and because the topic areas can be extended and changed, it can be applied to a variety of services.

Description

Speech Recognition Method Determining the Subject of Response in Natural Language Sentences

The present invention relates to an interactive speech recognition method and, more particularly, to a speech recognition method that determines whether to respond to a natural-language sentence, using natural language processing (NLP) and deep neural network (DNN) technology on natural speech without a wake-word call.

Speech recognition is the process by which a computer interprets spoken language and converts its content into text data. It is distinct from speaker recognition, which compares speech against a previously recorded voice pattern of a specific person for authentication. It is widely used to control devices and retrieve information by voice in intelligent machines such as robots and telematics systems, which combine information and communication technology with the automotive industry. To broaden the range of users, an acoustic model is built by statistically modeling speech uttered by many different speakers, and a language model is built from a collected corpus.

For humans and machines to converse in spoken language, the input/output interface of an intelligent machine must be speech; such a machine is also called a speech recognition device. As the recognition accuracy of these devices has risen, application services for speech recognition technology have expanded from assistant-type speech recognition systems on smartphones, through speaker-type artificial intelligence (AI) assistants, to an input technology for the Internet of Things (IoT).

Conventional speech recognition devices activate the speech recognition mode with a wake word such as "Siri" or "Alexa", or widen the range of wake words so that predetermined keywords activate it. When the keywords are fixed in this way, the device cannot be controlled by recognizing utterances in the course of natural conversation, which lowers the versatility of the device.

Korean Laid-Open Patent Publication No. 2014-0073889 relates to a 'wake-word buffering and filling interface for interactive speech recognition'. It discloses a technique that performs speech recognition on command phrases from the user's natural conversational speech input, so that the user does not have to repeat the wake word each time speech is entered.

However, in that invention the user must first enter the wake word directly in his or her own voice, and the device recognizes the voice waveform entered together with the wake word and re-recognizes that waveform in the answer when the device asks a question. It is therefore hard to regard it as natural-language speech recognition technology.

The input unit of the topic classifier and the intent classifier is the character, and each word is embedded as character n-grams.

Korean Laid-Open Patent Publication No. 2014-0073889

To solve the above problem, the present invention aims to provide a speech recognition method that judges the content of the user's utterance through sentence identification, without keywords, and distinguishes content that the speech recognition device should respond to from content that it should not.

The present invention is a speech recognition method for determining whether to respond to a natural-language sentence:

The method comprises: inputting speech uttered by a user into a speech input device sentence by sentence; recognizing the input spoken sentence word by word in a speech recognizer connected to the speech input device; classifying the words recognized by the speech recognizer, in a topic classifier connected to the speech recognizer, into topics of predetermined classes and "other"; sending sentences that contain words classified into a topic of the predetermined classes to an intent classifier and excluding the remaining sentences from the response targets; and classifying the sentence input to the intent classifier as an imperative, declarative, or interrogative sentence and determining the imperative and interrogative sentences to be response-target sentences, wherein the topic classifier and the intent classifier use the Linear Bag of Words Classifier, the sentence classification algorithm of the natural language processing toolkit Fasttext.

The present invention also provides the speech recognition method for determining whether to respond to a natural-language sentence, wherein the topics of the predetermined classes are email, house control, weather, and schedule, and the topic classifier further includes a topic addition unit that adds a new topic to the topics of the predetermined classes.

The present invention also provides the speech recognition method for determining whether to respond to a natural-language sentence, wherein the topic classifier includes a word database, the word database includes word and similar-word data for each topic, and the word and similar-word data for each topic are updated and stored at predetermined intervals.

The present invention also provides the speech recognition method for determining whether to respond to a natural-language sentence, wherein the intent classifier includes a sentence database, the sentence database includes sentence data for each of the imperative, declarative, and interrogative types for classifying the input sentence as an imperative, declarative, or interrogative sentence, and the sentence data for each type are updated and stored at predetermined intervals.

The present invention also provides the speech recognition method for determining whether to respond to a natural-language sentence, wherein the sentence database further includes, as declarative sentence data that respond to imperative and interrogative sentences, topic-specific answers according to the determining step, and the determining step further includes uttering the responding declarative sentence through a speaker.

The present invention also provides a computer-readable storage medium storing a speech recognition computer program programmed to determine whether to respond to a natural-language sentence, the program comprising: a code portion programmed to input speech uttered by a user into a speech input device sentence by sentence; a code portion programmed to recognize the input spoken sentence word by word in a speech recognizer connected to the speech input device; a code portion programmed to classify the words recognized by the speech recognizer, in a topic classifier connected to the speech recognizer, into topics of predetermined classes and "other"; a code portion programmed to send sentences that contain words classified into a topic of the predetermined classes to an intent classifier and to exclude the remaining sentences from the response targets; and a code portion programmed to classify the sentence input to the intent classifier as an imperative, declarative, or interrogative sentence and to determine the imperative and interrogative sentences to be response-target sentences, wherein the topic classifier and the intent classifier use the Linear Bag of Words Classifier of the toolkit Fasttext.

The present invention provides a method for distinguishing response and non-response sentences through speech recognition without using a wake word. Because language models are built per topic, the distinction can be made without adding every one of the many proper nouns and technical terms to the recognition dictionary. Per-topic language models make it easy to build a domain-specific language model, allow later maintenance and improvement, and reduce the memory burden. Because the topic information for the service area and the information on whether the sentence is a question or an instruction are obtained together, the method is useful to application service companies when designing scenarios, and because the topic areas can be extended and changed, it can be applied to a variety of services.

FIG. 1 shows an exemplary structure of the method for distinguishing response and non-response sentences using a topic classifier and an intent classifier according to the present invention.
FIG. 2 shows the Linear Bag of Words Classifier, the algorithm that Fasttext uses for sentence classification, according to the present invention.
FIG. 3 is a flowchart of the method for distinguishing response and non-response sentences using a topic classifier and an intent classifier according to the present invention.

Various aspects are disclosed with reference to the drawings. In the following description, for purposes of explanation, numerous specific details are set forth to aid an overall understanding of one or more aspects. It will be appreciated, however, that such aspects may be practiced without each of these specific details. The following description and the accompanying drawings set forth in detail certain illustrative aspects of one or more aspects. These aspects are illustrative, however, and represent only some of the various ways in which the principles of the various aspects may be employed; the description is intended to include all such aspects and their equivalents.

Various aspects and features will be presented in terms of systems that may include a number of devices, modules, and the like. It should also be understood and appreciated that the various systems may include additional devices, parts, and components, and/or may not include all of the devices, parts, and components discussed in connection with the drawings.

Terms such as "embodiment", "example", "aspect", and "illustration" used herein should not be construed to mean that any aspect or design described is better than or advantageous over other aspects or designs. Terms used below such as "system", "server", and "terminal" generally refer to a computer-related entity and may mean, for example, hardware, a combination of hardware and software, or software.

In addition, the term "or" is intended to mean an inclusive "or" rather than an exclusive "or". That is, unless otherwise specified or clear from the context, "X uses A or B" is intended to mean any of the natural inclusive permutations: if X uses A, X uses B, or X uses both A and B, then "X uses A or B" holds in each of these cases. The term "and/or" as used herein should also be understood to refer to and include all possible combinations of one or more of the listed related items.

Further, the terms "comprises" and/or "comprising" mean that the stated feature, step, operation, module, and/or component is present, but should be understood not to exclude the presence or addition of one or more other features, steps, operations, modules, components, and/or groups thereof. Although terms such as "first" and "second" may be used herein to describe various components, the components are not limited by these terms; the terms are used only to distinguish two or more components from one another and should not be construed as implying order or priority. Unless otherwise specified or clear from the context to indicate a singular form, the singular in this specification and the claims should generally be construed to mean "one or more". Embodiments of the present invention are described below with reference to the accompanying drawings.

Unlike a conventional speech recognition interface that uses a wake word, robot speech recognition without a keyword call judges the content of the user's utterance and can therefore provide a more natural conversational interface between a person and a robot. Moreover, whereas existing systems proceed as (wake-word recognition -> speech recognition -> task), the present invention proceeds as (speech recognition -> task) and can reflect the user's request immediately.

FIG. 1 shows an exemplary structure of the method for distinguishing response and non-response sentences using a topic classifier and an intent classifier according to the present invention. The present invention is a speech recognition method for determining whether to respond to a natural-language sentence. In one embodiment of the present invention, the method comprises: inputting speech uttered by a user into a speech input device sentence by sentence; recognizing the input spoken sentence word by word in a speech recognizer connected to the speech input device; classifying the words recognized by the speech recognizer, in a topic classifier connected to the speech recognizer, into topics of predetermined classes and "other"; sending sentences that contain words classified into a topic of the predetermined classes to an intent classifier and excluding the remaining sentences from the response targets; and classifying the sentence input to the intent classifier as an imperative, declarative, or interrogative sentence and determining the imperative and interrogative sentences to be response-target sentences. In one embodiment of the present invention, the topic classifier and the intent classifier use the Linear Bag of Words Classifier, the sentence classification algorithm of the natural language processing toolkit Fasttext.

FIG. 2 shows the Linear Bag of Words Classifier, the algorithm that Fasttext uses for sentence classification, according to the present invention. The analysis procedure of the Linear Bag of Words Classifier, the sentence classification algorithm of the natural language processing toolkit Fasttext, according to one embodiment of the present invention is as follows. First, the sentence produced by speech recognition is tokenized into words: one sentence is represented by N words, and each word is represented by one vector. The conversion from word to vector rests on the distributional hypothesis, that is, the assumption that two words have similar meanings if the distributions of their neighboring words are similar, so that the meaning and context of each word are taken into account. Through this process, each word is represented as a single vector.
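The tokenization step, together with the character n-gram input unit mentioned earlier in this document, can be pictured with the short sketch below. It is only an illustration under stated assumptions: whitespace tokenization, the n-gram length of 3, and the `<` and `>` word-boundary markers are choices made for this example, and the function names `tokenize` and `char_ngrams` are hypothetical rather than taken from the patent.

```python
from typing import List


def tokenize(sentence: str) -> List[str]:
    """Split a recognized sentence into lowercase, whitespace-delimited words."""
    return sentence.lower().split()


def char_ngrams(word: str, n: int = 3) -> List[str]:
    """Return the character n-grams of a word padded with boundary markers."""
    padded = "<" + word + ">"  # assumed boundary markers, for illustration
    if len(padded) <= n:
        return [padded]
    return [padded[i:i + n] for i in range(len(padded) - n + 1)]


tokens = tokenize("Turn on the light")
print(tokens)
print(char_ngrams("light"))
```

In a trained model each word and each n-gram would be mapped to a learned vector; the sketch stops at producing the units that would be embedded.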

One sentence is represented by N word vectors. These are averaged to form a document vector, and the document vector is fed into a linear classifier that includes one hidden layer, which assigns the sentence to one of the classes. In one embodiment of the present invention, the topic classifier and the intent classifier are built with the Linear Bag of Words Classifier of Fasttext. In one embodiment of the present invention, the topics of the predetermined classes are email, house control, weather, and schedule, and the topic classifier further includes a topic addition unit that adds a new topic to the topics of the predetermined classes. The topic classification model makes it possible to configure a distributed language model for each service in a large-vocabulary speech recognition system. Speech recognition technology has recently been applied in many fields, but resource (memory) limits make it impractical to add all of the many proper nouns and technical terms to the recognition dictionary. If a language model is built for each topic and the corresponding language model (search network) is searched according to the result of topic classification, recognition performance can be expected to improve and, as a result, so can service quality. In one embodiment of the present invention, the topic classifier includes a word database, the word database includes word and similar-word data for each topic, and the word and similar-word data for each topic are updated and stored at predetermined intervals.
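The averaging and classification step can be sketched in plain Python as follows. This is a minimal sketch, not the trained model: the two-dimensional word vectors and the weight matrix are made-up values, the averaged document vector is treated here as the single hidden layer (an interpretation, not something the text states), and the names `document_vector`, `softmax`, and `classify` are hypothetical.

```python
import math

# Toy 2-dimensional word vectors (made-up values, not trained embeddings).
WORD_VECTORS = {
    "check": [1.0, 0.0],
    "my": [0.0, 0.0],
    "email": [1.0, 1.0],
}
UNKNOWN = [0.0, 0.0]  # vector used for words missing from the table


def document_vector(tokens):
    """Average the word vectors of a sentence into one document vector."""
    if not tokens:
        return list(UNKNOWN)
    vectors = [WORD_VECTORS.get(t, UNKNOWN) for t in tokens]
    return [sum(v[d] for v in vectors) / len(vectors) for d in range(len(UNKNOWN))]


def softmax(scores):
    """Convert raw class scores into probabilities that sum to 1."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def classify(doc_vec, weights, labels):
    """Score each class with a linear map and return (label, probability)."""
    scores = [sum(w * x for w, x in zip(row, doc_vec)) for row in weights]
    probs = softmax(scores)
    best = max(range(len(labels)), key=lambda i: probs[i])
    return labels[best], probs[best]


labels = ["email", "weather"]
weights = [[2.0, 2.0], [-1.0, 0.0]]  # one row of made-up weights per class
doc = document_vector(["check", "my", "email"])
print(classify(doc, weights, labels))
```

With these toy numbers the document vector is (2/3, 1/3), so the "email" row scores higher and is selected.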

In one embodiment of the present invention, the intent classifier includes a sentence database, the sentence database includes sentence data for each of the imperative, declarative, and interrogative types for classifying the input sentence as an imperative, declarative, or interrogative sentence, and the sentence data for each type are updated and stored at predetermined intervals. The present invention identifies response and non-response in a two-pass manner, applying the topic classifier and the intent classifier in sequence. Consequently, when a sentence is judged to be a response target, the topic (service area) information and the intent information are obtained together. This information is useful to application service companies when designing scenarios, and because the topic areas can be extended and changed, the method can be applied to a variety of services.
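One way an application might use the topic and intent obtained together is a scenario table keyed by the (topic, intent) pair. The sketch below is purely illustrative: the handler functions, their reply strings, and the `SCENARIOS` table are hypothetical and are not described in the patent.

```python
from typing import Callable, Dict, Optional, Tuple


def forecast_reply(sentence: str) -> str:
    """Hypothetical handler for questions about the weather."""
    return "Looking up the forecast."


def device_command(sentence: str) -> str:
    """Hypothetical handler for house-control commands."""
    return "Carrying out the device command."


# Scenario table: each (topic, intent) pair selects one handler.
SCENARIOS: Dict[Tuple[str, str], Callable[[str], str]] = {
    ("weather", "interrogative"): forecast_reply,
    ("house control", "imperative"): device_command,
}


def run_scenario(topic: str, intent: str, sentence: str) -> Optional[str]:
    """Run the handler registered for (topic, intent), or return None."""
    handler = SCENARIOS.get((topic, intent))
    return handler(sentence) if handler is not None else None


print(run_scenario("weather", "interrogative", "will it rain tomorrow"))
```

Adding a service then amounts to registering another key in the table, which mirrors the statement that topic areas can be extended and changed.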

FIG. 3 is a flowchart of the method for distinguishing response and non-response sentences using a topic classifier and an intent classifier according to the present invention. The sentence identification algorithm proposed in the present invention is expressed as a flowchart in the figure. In one embodiment of the present invention, a response/non-response classifier is designed so that speech recognition is performed only when the user issues a query or command to the robot. To classify response sentences without a wake word, a two-pass cascade classifier that separates the topic classifier from the intent classifier is used to decide whether a sentence calls for a response. In one embodiment of the present invention, if a sentence uttered by the user falls within a topic, only sentences that the intent classifier classifies as a request or a question are treated as response targets. The technique presented in the present invention is natural-language conversational speech recognition without keyword calls that can be embedded in robot hardware. Its purpose is for the robot to determine automatically, using a meaning-based sentence identification algorithm, whether a sentence uttered by the user is a response or non-response sentence. To achieve this, a topic classification model and an intent classification model are used. That is, the system classifies the service type of each recognized sentence in the topic classifier, picks out only the sentences that correspond to commands or requests in the intent classifier, and thereby decides between response and non-response. In one embodiment of the present invention, the sentence database further includes, as declarative sentence data that respond to imperative and interrogative sentences, topic-specific answers according to the determining step, and the determining step further includes uttering the responding declarative sentence through a speaker. In one embodiment of the present invention, the topic classification model and the intent classification model were designed to distinguish response sentences from non-response sentences.
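The two-pass cascade can be summarized as a small decision function. The sketch below assumes the two classifiers are available as callables that return a label; the keyword-based `toy_topic` and `toy_intent` functions are hypothetical stand-ins for the trained classifiers and exist only to make the example runnable.

```python
from typing import Callable, Optional, Tuple

TOPICS = {"email", "house control", "weather", "schedule"}
RESPONSE_INTENTS = {"imperative", "interrogative"}


def decide(sentence: str,
           topic_classifier: Callable[[str], str],
           intent_classifier: Callable[[str], str],
           ) -> Tuple[bool, Optional[str], Optional[str]]:
    """Return (respond, topic, intent) for one recognized sentence."""
    # Pass 1: sentences outside the predetermined topics are non-response.
    topic = topic_classifier(sentence)
    if topic not in TOPICS:
        return False, None, None
    # Pass 2: only imperative and interrogative sentences are response targets.
    intent = intent_classifier(sentence)
    return intent in RESPONSE_INTENTS, topic, intent


# Hypothetical stand-ins for the trained classifiers, for demonstration only.
def toy_topic(sentence: str) -> str:
    return "weather" if "weather" in sentence.lower() else "other"


def toy_intent(sentence: str) -> str:
    first_word = sentence.lower().split()[0]
    if first_word in {"what", "how", "is", "will"}:
        return "interrogative"
    if first_word in {"tell", "show", "check"}:
        return "imperative"
    return "declarative"


print(decide("what is the weather tomorrow", toy_topic, toy_intent))
print(decide("the weather was nice yesterday", toy_topic, toy_intent))
print(decide("i had pasta for lunch", toy_topic, toy_intent))
```

The first sentence is in a topic and interrogative, so it is a response target; the second is in a topic but declarative; the third never reaches the intent classifier.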

The topic classification model determines which topic (service) a sentence uttered by the user belongs to. In a typical home AI speaker setting, there are specific tasks that the speaker can perform, and these tasks can generally be grouped into topics. A sentence uttered by the user asks for a specific task to be performed within a specific topic. For example, the sentences that the AI speaker can act on can be classified by topic, such as email, schedule, house control, and weather. If a sentence does not fall within any of these topics, it is treated as non-response. If a sentence is classified into a topic, the intent classifier decides between response and non-response. In one embodiment of the present invention, the intent classification model distinguishes three broad types: imperative, declarative, and interrogative sentences. Only sentences that the topic classifier has assigned to a topic are fed to the intent classifier, and of the imperative, declarative, and interrogative types only imperative and interrogative sentences are classified as response targets.
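This document also describes a topic addition unit and a per-topic word database. The sketch below illustrates how such an extensible topic set might look; it uses plain keyword lookup rather than the Linear Bag of Words Classifier, and the class name `TopicWordDatabase`, its methods, and the sample words are all hypothetical.

```python
class TopicWordDatabase:
    """Per-topic word sets with support for adding new topics (illustrative)."""

    def __init__(self):
        # Sample words per predetermined topic (made up for this example).
        self.words_by_topic = {
            "email": {"email", "mail", "inbox"},
            "house control": {"light", "lights", "heater"},
            "weather": {"weather", "rain", "forecast"},
            "schedule": {"schedule", "meeting", "appointment"},
        }

    def add_topic(self, topic, words):
        """Add a new topic, or extend the word set of an existing one."""
        self.words_by_topic.setdefault(topic, set()).update(words)

    def topic_of(self, sentence):
        """Return the topic of the first matching word, or 'other'."""
        for word in sentence.lower().split():
            for topic, words in self.words_by_topic.items():
                if word in words:
                    return topic
        return "other"


db = TopicWordDatabase()
print(db.topic_of("will it rain today"))
print(db.topic_of("play some jazz"))
db.add_topic("music", ["jazz", "song"])
print(db.topic_of("play some jazz"))
```

Before the new topic is added, the last sentence falls into "other" and would be treated as non-response; afterwards it is assigned to the added topic.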

In one embodiment of the present invention, the method may be implemented as programmed computer code and stored in a computer-readable storage medium. That is, the storage medium is a computer-readable storage medium storing a speech recognition computer program programmed to determine whether to respond to a natural language sentence, and it comprises: a code portion programmed to input speech uttered by a user into a speech input device in units of sentences; a code portion programmed to recognize the input speech sentence word by word with a speech recognizer connected to the speech input device; a code portion programmed to classify the words recognized by the speech recognizer into topics of a predetermined class or into "other" with a topic classifier connected to the speech recognizer; a code portion programmed to send a sentence containing a word classified into a topic of the predetermined class in the classifying step to an intention classifier and to exclude the remaining sentences from response targets; and a code portion programmed to classify the sentence input to the intention classifier as an imperative, declarative, or interrogative sentence and to determine imperative and interrogative sentences to be response target sentences. In one embodiment of the present invention, the topic classifier and the intention classifier use the Linear Bag of Words Classifier, a sentence classification algorithm of the Fasttext toolkit.
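
As a hedged illustration of how a classifier of this kind might be trained with the fastText Python package, the sketch below prepares training data in fastText's supervised format (one example per line, with the label prefixed by `__label__`) and wraps training and prediction in a single function. The example sentences, the helper names, and the hyperparameter values are assumptions made for illustration; the patent does not specify them. The `fasttext` import is deferred so that the data-preparation helpers run without the package installed.

```python
from pathlib import Path
from typing import Iterable, Tuple

LABEL_PREFIX = "__label__"  # fastText's default label prefix


def to_fasttext_line(label: str, sentence: str) -> str:
    """Format one example as '__label__<label> <text>' on a single line."""
    text = " ".join(sentence.split())  # collapse whitespace and newlines
    return f"{LABEL_PREFIX}{label} {text}"


def write_training_file(examples: Iterable[Tuple[str, str]], path: Path) -> None:
    """Write (label, sentence) pairs to a fastText supervised training file."""
    lines = [to_fasttext_line(label, sentence) for label, sentence in examples]
    path.write_text("\n".join(lines) + "\n", encoding="utf-8")


def train_and_predict(train_path: Path, sentence: str) -> str:
    """Train a supervised model and return the top label for `sentence`."""
    import fasttext  # third-party package, imported lazily

    # Illustrative hyperparameters; wordNgrams=1 keeps a plain bag of words.
    model = fasttext.train_supervised(
        input=str(train_path), epoch=25, lr=0.5, wordNgrams=1)
    labels, _probs = model.predict(sentence)
    return labels[0][len(LABEL_PREFIX):]


# Hypothetical topic examples. An intent model would use the same format
# with command / declarative / question labels instead of topics.
TOPIC_EXAMPLES = [
    ("weather", "will it rain tomorrow"),
    ("email", "send an email to my manager"),
    ("schedule", "add a meeting at three"),
    ("house_control", "turn off the living room light"),
    ("other", "I had pasta for lunch"),
]

if __name__ == "__main__":
    for label, sentence in TOPIC_EXAMPLES:
        print(to_fasttext_line(label, sentence))
```

Training one model on topic labels and a second on intent labels gives the two classifiers described in this embodiment while reusing the same data format and training call.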

As described above, the present invention relates to a speech recognition method for determining whether to respond to a natural language sentence. The invention can be applied, for example, in the automotive field to act on a user's requests immediately while driving; in the home automation field to improve user interface convenience in Internet of Things (IoT) environments; and in artificial intelligence assistant applications to improve the user interface convenience of smart speakers or robots.

The various embodiments described herein may be implemented in a medium readable by a computer or a similar device using, for example, software, hardware, or a combination thereof.

In a hardware implementation, the embodiments described herein may be implemented using at least one of application specific integrated circuits (ASICs), digital signal processors (DSPs), digital signal processing devices (DSPDs), programmable logic devices (PLDs), field programmable gate arrays (FPGAs), processors, controllers, micro-controllers, microprocessors, and other electrical units for performing functions. In some cases, the embodiments described in this specification may be implemented as a management server and/or the system itself.

In a software implementation, embodiments such as the procedures and functions described in this specification may be implemented as separate software modules. Each of the software modules may perform one or more of the functions and operations described herein. The software code may be implemented as a software application written in a suitable programming language. The software code may be stored in a management server and/or a database and executed by an app.

Meanwhile, the various embodiments presented herein may be implemented as a method, an apparatus, or an article of manufacture using standard programming and/or engineering techniques. The term "article of manufacture" includes a computer program, carrier, or medium accessible from any computer-readable device. For example, computer-readable media include, but are not limited to, magnetic storage devices (e.g., hard disks, floppy disks, magnetic strips), optical disks (e.g., CDs, DVDs), smart cards, and flash memory devices (e.g., EEPROM, cards, sticks, key drives). In addition, the various storage media presented herein include one or more devices and/or other machine-readable media for storing information. The term "machine-readable medium" includes, but is not limited to, wireless channels and various other media capable of storing, holding, and/or carrying instruction(s) and/or data.

The description of the presented embodiments is provided to enable any person of ordinary skill in the art to make or use the present invention. Various modifications to these embodiments will be apparent to those of ordinary skill in the art, and the general principles defined herein may be applied to other embodiments without departing from the scope of the present invention. Thus, the present invention is not limited to the embodiments presented herein, but is to be construed in the widest scope consistent with the principles and novel features presented herein.

Claims

A speech recognition method for determining whether to respond to a natural language sentence, the method comprising:
inputting speech uttered by a user into a speech input device in units of sentences;
recognizing the input speech sentence word by word with a speech recognizer connected to the speech input device;
classifying the words recognized by the speech recognizer into topics of a predetermined class or into "other" with a topic classifier connected to the speech recognizer;
sending a sentence containing a word classified into a topic of the predetermined class in the classifying step to an intention classifier, and excluding the remaining sentences from response targets; and
classifying command sentences among the sentences input to the intention classifier, and determining the command sentences to be response target sentences,
wherein the topic classifier and the intention classifier use the Linear Bag of Words Classifier, a sentence classification algorithm of the natural language processing toolkit Fasttext,
wherein the intention classifier includes a sentence database, and
wherein the sentence database includes sentence data for classifying the input sentence, the sentence data being updated and stored at predetermined intervals.

The speech recognition method of claim 1,
wherein the topics of the predetermined class are email, house control, weather, and schedule, and
wherein the topic classifier further comprises a topic adding unit for adding a new topic to the topics of the predetermined class.

The speech recognition method of claim 2,
wherein the topic classifier includes a word database, and
wherein the word database includes word data and similar-word data for each topic, the word data and similar-word data for each topic being updated and stored at predetermined intervals.

(Deleted)

The speech recognition method of claim 1,
wherein the sentence database further includes, as declarative sentence data responding to command sentences and questions, an answer for each topic according to the determining step, and
wherein the determining step further comprises uttering the responding declarative sentence through a speaker.

A computer-readable storage medium storing a speech recognition computer program programmed to determine whether to respond to a natural language sentence, the program comprising:
a code portion programmed to input speech uttered by a user into a speech input device in units of sentences;
a code portion programmed to recognize the input speech sentence word by word with a speech recognizer connected to the speech input device;
a code portion programmed to classify the words recognized by the speech recognizer into topics of a predetermined class or into "other" with a topic classifier connected to the speech recognizer;
a code portion programmed to send a sentence containing a word classified into a topic of the predetermined class in the classifying step to an intention classifier, and to exclude the remaining sentences from response targets; and
a code portion programmed to classify command sentences among the sentences input to the intention classifier, and to determine the command sentences to be response target sentences,
wherein the topic classifier and the intention classifier use the Linear Bag of Words Classifier of the Fasttext toolkit.