KR102279319B1 - Audio analysis device and control method thereof

Info

Publication number: KR102279319B1
Application number: KR1020190048687A
Authority: KR
Inventors: 장석현; 김성왕
Original assignee: 에스케이텔레콤 주식회사
Priority date: 2019-04-25
Filing date: 2019-04-25
Publication date: 2021-07-19
Also published as: KR20200125034A

Abstract

The present invention discloses a technology for realizing a new type of conversational voice interface that, before determining the user's utterance intention for a voice input, classifies the voice input as one of the sentence structures typical of actual conversation, and then analyzes the text of the voice input and determines the user's utterance intention in a manner suited to that sentence structure.

Description

Voice analysis device and operation method of voice analysis device {AUDIO ANALYSIS DEVICE AND CONTROL METHOD THEREOF}

The present invention relates to a voice interface for controlling devices by voice, and more particularly to a technology that moves beyond simple command-type voice interfaces to enable a conversational voice interface.

Voice interface technology, which controls devices using speech uttered by a user, continues to advance. As a result, voice-based service devices that can control household devices through a voice interface are now being installed and used in homes.

Briefly, a voice-based service that controls household devices through a voice interface works as follows: when the user issues a command (input) by voice, the voice-based service device receives the voice input of the user's utterance, converts it into text, identifies the intention of the voice input through linguistic analysis of the text, and performs the corresponding control operation.

However, current voice interfaces remain at the level of a simple command-type interface limited to one utterance and one command.

Consequently, in a voice-based service operating on a current voice interface, a user who wants to change (correct) a command in the middle of a voice input cannot do so; to correct it, the user must start over and utter the command again from the beginning.

Likewise, a user who wants to issue two or more commands at once cannot command two or more operations with the voice input of a single utterance. The user must instead perform the entry action for voice input (e.g., a wake-up voice command or a mic button) separately for each command and make two or more utterances, which is inconvenient.

Furthermore, because such a service receives the speech from the utterance start point to the utterance end point as the voice input, a user who wants to pause briefly while uttering a command cannot do so: the utterance is no longer continuous, so the voice input cannot be captured accurately. The voice interface therefore offers little freedom.

Accordingly, the present invention seeks to realize a conversational voice interface that resolves the above constraints, inconveniences, and drawbacks, that is, the problems of existing voice interfaces.

The present invention was conceived in view of the above circumstances, and the problem it seeks to solve is to realize a conversational voice interface that goes beyond existing simple command-type voice interfaces limited to one utterance and one command.

To achieve the above object, a voice analysis device according to a first aspect of the present invention includes: a voice receiver that receives a voice input uttered by a user; a sentence structure classifier that analyzes the received voice input and classifies the sentence structure of the voice input; and an utterance intention determiner that determines the utterance intention of the voice input according to the classified sentence structure.

Specifically, when the voice input contains a specific word predefined for sentence structure determination, the sentence structure classifier may classify the sentence structure of the voice input as the sentence structure matched to that specific word.

Specifically, when the voice input contains a specific word predefined for sentence structure determination but matches specific name information stored in advance for sentence structure determination, the sentence structure classifier may classify the sentence structure of the voice input as a first sentence structure, which commands processing of a single intention.

Specifically, when the voice input does not contain the specific word, the sentence structure classifier may classify the sentence structure of the voice input as the first sentence structure, which commands processing of a single intention.

Specifically, the sentence structure of the voice input may be classified as at least one of: a first sentence structure consisting of a sentence that commands processing of a single intention; a second sentence structure in which two or more sentences, each commanding processing of an intention, are connected by a specific word; a third sentence structure in which, of two or more sentences connected by a specific word, only the intention of the sentence or sentences following the specific word is commanded to be processed; and a fourth sentence structure in which a specific word is attached at the very end of a sentence, commanding that processing of the intention of that sentence be put on hold.

Specifically, when the sentence structure of the voice input has been classified as the second or third sentence structure, the utterance intention determiner may use both the entire voice input and the partial voice inputs obtained by dividing the entire voice input at the specific word to decide whether to make use of the sentence structure classification result.

Specifically, upon deciding to use the classification result, the utterance intention determiner may, for the second sentence structure, sequentially analyze each of the two or more sentences separated by the specific word in the entire voice input and determine the utterance intention sentence by sentence; and, for the third sentence structure, analyze only the sentence or sentences following the specific word in the entire voice input to determine the utterance intention of the voice input.

Specifically, upon deciding not to use the classification result, the utterance intention determiner may analyze the entire voice input to determine the utterance intention of the voice input.
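
The handling of the second and third sentence structures described above can be illustrated with a minimal Python sketch. The function name `segments_to_analyze`, the string labels for the structures, and the `use_structure` flag are hypothetical and do not come from the patent. The sketch takes the decision on whether to use the classification result as an input, and it assumes that for the third structure the part after the last occurrence of the specific word is the one to analyze.

```python
def segments_to_analyze(text, structure, keyword, use_structure=True):
    """Return the text segments that would be passed to intent analysis.

    structure: "first", "second", or "third" (hypothetical labels).
    keyword:   the specific word found in the text.
    use_structure: whether the classification result is to be used;
                   how this is decided is outside this sketch.
    """
    # Fall back to analyzing the whole input when the classification
    # result is not used, or when no splitting applies.
    if not use_structure or structure not in ("second", "third"):
        return [text.strip()]

    # Split at the specific word and drop empty pieces.
    parts = [p.strip() for p in text.split(keyword) if p.strip()]

    if structure == "second":
        # Every sentence is analyzed in order, one intention each.
        return parts
    # Third structure: only what follows the specific word is analyzed.
    return parts[-1:]


# "Turn on the light, and turn on the TV"
print(segments_to_analyze("불 켜줘 그리고 TV 틀어줘", "second", "그리고"))
# "Turn on the light, no no, turn on the TV"
print(segments_to_analyze("불 켜줘 아니아니 TV 틀어줘", "third", "아니아니"))
```

Returning a list in every case keeps the caller uniform: it can loop over the segments and run the same intent analysis on each, whether there is one segment or several.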

Specifically, the voice receiver receives, as the voice input, the speech received from the point at which the user starts the utterance to the point at which the utterance ends; and when the sentence structure of the voice input has been classified as the fourth sentence structure, the utterance intention determiner may hold off determining the utterance intention of the voice input and cause the voice receiver to keep waiting for speech even after the utterance end point.

To achieve the above object, a method of operating a voice analysis device according to a second aspect of the present invention includes: a voice receiving step of receiving a voice input uttered by a user; a sentence structure classification step of analyzing the received voice input and classifying the sentence structure of the voice input; and an utterance intention determination step of determining the utterance intention of the voice input according to the classified sentence structure.

Specifically, in the sentence structure classification step, when the voice input contains a specific word predefined for sentence structure determination, the sentence structure of the voice input may be classified as the sentence structure matched to that specific word.

Specifically, the sentence structure of the voice input may be classified as at least one of: a first sentence structure consisting of a sentence that commands processing of a single intention; a second sentence structure in which two or more sentences, each commanding processing of an intention, are connected by a specific word; a third sentence structure in which, of two or more sentences connected by a specific word, only the intention of the sentence or sentences following the specific word is commanded to be processed; and a fourth sentence structure in which a specific word is attached at the very end of a sentence, commanding that processing of the intention of that sentence be put on hold.

Specifically, in the utterance intention determination step, when the sentence structure of the voice input has been classified as the second or third sentence structure, both the entire voice input and the partial voice inputs obtained by dividing the entire voice input at the specific word may be used to decide whether to make use of the sentence structure classification result.

Specifically, in the utterance intention determination step, upon deciding to use the classification result, each of the two or more sentences separated by the specific word in the entire voice input may be analyzed sequentially to determine the utterance intention sentence by sentence in the case of the second sentence structure; and only the sentence or sentences following the specific word in the entire voice input may be analyzed to determine the utterance intention of the voice input in the case of the third sentence structure.

Specifically, in the utterance intention determination step, upon deciding not to use the classification result, the entire voice input may be analyzed to determine the utterance intention of the voice input.

Specifically, in the voice receiving step, the speech received from the point at which the user starts the utterance to the point at which the utterance ends is received through a voice receiver as the voice input; and in the utterance intention determination step, when the sentence structure of the voice input has been classified as the fourth sentence structure, determination of the utterance intention of the voice input may be put on hold and the voice receiver may be made to keep waiting for speech even after the utterance end point.

The voice analysis device and its operating method can thus realize a new type of conversational voice interface that, before determining the user's utterance intention for a voice input, classifies the voice input as one of the sentence structures typical of actual conversation, and then analyzes the voice input and determines the user's utterance intention in a manner suited to that sentence structure.

The present invention can therefore resolve the problems of existing simple command-type voice interfaces and maximize the freedom and ease of use of the voice interface.

FIG. 1 is an exemplary diagram showing a voice-based service environment to which the present invention is applied.
FIG. 2 is an exemplary diagram showing the configuration of a voice analysis device according to a preferred embodiment of the present invention.
FIGS. 3 and 4 are flowcharts showing a method of operating a voice analysis device according to a preferred embodiment of the present invention.

Hereinafter, preferred embodiments of the present invention are described with reference to the accompanying drawings.

First, the voice-based service environment to which the present invention is applied is described with reference to FIG. 1.

As shown in FIG. 1, the voice-based service environment to which the voice analysis device proposed in the present invention is applied is built around a voice-based service device (10) installed in a particular place such as a home or office.

The voice-based service device (10) receives the speech uttered by a user as a voice input and controls target devices on that basis; it is fundamentally built on voice interface support.

Briefly, a voice-based service that controls household devices through a voice interface works as follows: when the user (1) issues a command (input) by voice, the voice-based service device (10) receives the voice input of the user's utterance, converts it into text, identifies the intention of the voice input through linguistic analysis of the text, and performs the corresponding control operation, for example turning on a light, turning on the TV, or playing music through its built-in content streaming function.

However, current voice interfaces receive the speech from the utterance start point to the utterance end point as the voice input, and remain at the level of a simple command-type interface limited to one utterance and one command.

In addition, in a voice-based service operating on a current voice interface, a user who wants to pause briefly while uttering a command cannot do so: the utterance is no longer continuous, so the voice input cannot be captured accurately. The voice interface therefore offers little freedom.

Accordingly, the present invention seeks to realize a new type of conversational voice interface that resolves the above constraints, inconveniences, and drawbacks, that is, the problems of existing voice interfaces.

More specifically, the present invention proposes a technology that makes this conversational voice interface possible, and a voice analysis device that implements the technology.

FIG. 2 shows the configuration of a voice analysis device according to an embodiment of the present invention.

As shown in FIG. 2, the voice analysis device (100) according to an embodiment of the present invention may include a voice receiver (110), a sentence structure classifier (130), and an utterance intention determiner (140).

The voice analysis device (100) according to an embodiment of the present invention may further include an STT (120).

The voice analysis device (100) according to an embodiment of the present invention may also further include an output unit (150). In that case, the voice analysis device (100) may further include a communication unit (not shown) that handles the actual communication function of delivering the control signals output by the output unit (150) to the various target devices over wired or wireless communication.

Here, the communication unit (not shown) includes, for example but without limitation, an antenna system, an RF transceiver, one or more amplifiers, a tuner, one or more oscillators, a digital signal processor, a CODEC chipset, and a memory, and may include any known circuit that performs this function.

All or at least some of the components of the voice analysis device (100) may be implemented as hardware modules, as software modules, or as a combination of hardware and software modules.

Here, a software module may be understood as, for example, instructions executed by a processor that controls operations within the voice analysis device (100), and such instructions may be loaded in a memory within the voice analysis device (100).

With the configuration described above, the voice analysis device (100) according to an embodiment of the present invention implements the technology that makes the conversational voice interface proposed in the present invention feasible. Each component of the voice analysis device (100) that serves this purpose is described in more detail below.

The voice receiver (110) is responsible for receiving the voice input uttered by the user.

Specifically, the voice receiver (110) includes, or operates in conjunction with, a microphone that is activated to receive speech when an entry action for voice input (e.g., a wake-up voice command or a mic button) is detected and is deactivated when no more speech is received (the utterance has stopped). It can thus receive, as the voice input, the speech received through the microphone, that is, the speech from the utterance start point to the utterance end point.

The sentence structure classifier (130) is responsible for analyzing the received voice input and classifying the sentence structure of the voice input.

Specifically, the STT (120) converts the voice input received through the voice receiver (110) into text using a speech-to-text (STT) function.

Once the voice input received through the voice receiver (110) has been converted into text by the STT (120), the sentence structure classifier (130) can analyze the current voice input by analyzing the converted text, and thereby classify the sentence structure of the current voice input.

More specifically, the sentence structure classifier (130) checks whether the voice input, that is, the text of the voice input, contains a specific word predefined for sentence structure determination, and if it does, may classify the sentence structure of the current voice input as the sentence structure matched to that specific word.

To this end, the voice analysis device (100) of the present invention may hold the specific words predefined for sentence structure determination, or may operate in conjunction with a separate DB in which the specific words are stored.

Here, the specific words predefined for sentence structure determination can be divided into three broad categories.

For example, specific words mainly used to connect sentences in conversation, such as "그리고" ("and"), "또" ("also"), and "..하고" ("... and"), can be categorized as multi-processing command words; specific words mainly used to negate the preceding sentence in conversation, such as "아니다" ("no"), "아니라" ("not"), "아니아니" ("no, no"), and "취소" ("cancel"), can be categorized as cancel command words; and specific words mainly used to ask the other party to wait in conversation, such as "잠시만" ("just a moment"), "잠깐" ("hold on"), and "기다려" ("wait"), can be categorized as wait command words.

Based on the specific words it holds or on the separately linked DB, the sentence structure classifier (130) can check whether the text of the current voice input contains at least one of a multi-processing command word, a cancel command word, or a wait command word.

When the text of the voice input contains at least one of a multi-processing command word, a cancel command word, or a wait command word, the sentence structure classifier (130) classifies the sentence structure of the current voice input as the sentence structure matched to that specific word (multi-processing command word, cancel command word, or wait command word).

Meanwhile, when the voice input contains no specific word, the sentence structure classifier (130) may classify the sentence structure of the current voice input as the sentence structure that commands processing of a single intention (hereinafter, the first sentence structure).
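
The keyword check described above can be sketched in Python as follows. The word lists reuse the examples given in the text; everything else is an assumption made for illustration: the names `SentenceStructure` and `classify_structure`, plain substring matching (a real system would need morphological analysis to avoid false hits on short words such as "또"), and the order in which the three categories are checked. Wait command words are matched only at the end of the text, following the definition of the fourth sentence structure.

```python
from enum import Enum


class SentenceStructure(Enum):
    SINGLE = 1    # first structure: one sentence, one intention
    MULTIPLE = 2  # second structure: sentences joined by a connecting word
    CANCEL = 3    # third structure: only the part after the word counts
    WAIT = 4      # fourth structure: wait word at the very end


# Example words from the description; a deployed list would be larger.
MULTI_WORDS = ("그리고", "또", "하고")
CANCEL_WORDS = ("아니다", "아니라", "아니아니", "취소")
WAIT_WORDS = ("잠시만", "잠깐", "기다려")


def classify_structure(text: str) -> SentenceStructure:
    """Classify recognized text by the predefined words it contains."""
    stripped = text.strip()
    # Assumed priority: wait, then cancel, then multi-processing.
    if any(stripped.endswith(w) for w in WAIT_WORDS):
        return SentenceStructure.WAIT
    if any(w in stripped for w in CANCEL_WORDS):
        return SentenceStructure.CANCEL
    if any(w in stripped for w in MULTI_WORDS):
        return SentenceStructure.MULTIPLE
    # No predefined word found: treat as a single-intention command.
    return SentenceStructure.SINGLE


print(classify_structure("불 켜줘"))                   # "Turn on the light"
print(classify_structure("불 켜줘 그리고 TV 틀어줘"))  # "... and turn on the TV"
print(classify_structure("음악 틀어줘 잠깐"))          # "Play music, hold on"
```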

According to another embodiment, even when the voice input contains a specific word, the sentence structure classifier (130) may classify the sentence structure of the current voice input as the first sentence structure, which commands processing of a single intention, if the voice input matches specific name information stored in advance for sentence structure determination.

To this end, the voice analysis device (100) of the present invention may store the specific name information for sentence structure determination in advance, or may operate in conjunction with a separate DB in which the specific name information is stored.

Here, the specific name information stored in advance for sentence structure determination includes name information for various kinds of targets, such as content names or product names, that contain at least one of the multi-processing command words, cancel command words, or wait command words described above.

Thus, when the voice input, that is, the text of the voice input, contains at least one of a multi-processing command word, a cancel command word, or a wait command word, the sentence structure classifier (130) may classify the sentence structure of the current voice input as the first sentence structure, which commands processing of a single intention, if the text of the current voice input matches the specific name information it holds or the separately linked DB.
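
As a rough illustration of this name-information exception, the Python sketch below treats an input as a single-intention command when it contains a stored name that itself contains one of the predefined words. The names `KEYWORDS`, `KNOWN_NAMES`, and `classify_with_names` are hypothetical, the title "너 그리고 나" ("You and I") is a made-up example entry, and "matching" is simplified here to a substring test; the patent does not specify the matching rule.

```python
# Predefined words from the three categories (examples only).
KEYWORDS = ("그리고", "또", "하고", "아니다", "아니라", "아니아니",
            "취소", "잠시만", "잠깐", "기다려")

# Hypothetical stored name information: titles that contain a keyword.
KNOWN_NAMES = ("너 그리고 나",)


def keyword_is_part_of_name(text: str) -> bool:
    """True if the text contains a stored name that includes a keyword."""
    return any(
        name in text and any(k in name for k in KEYWORDS)
        for name in KNOWN_NAMES
    )


def classify_with_names(text: str) -> str:
    """Return "first" for single-intention input, else "keyword-matched"."""
    has_keyword = any(k in text for k in KEYWORDS)
    if not has_keyword or keyword_is_part_of_name(text):
        return "first"
    return "keyword-matched"


# "Play 'You and I'": the keyword belongs to the title, so one intention.
print(classify_with_names("너 그리고 나 틀어줘"))
# "Turn on the light and turn on the TV": a genuine connecting word.
print(classify_with_names("불 켜줘 그리고 TV 틀어줘"))
```

One limitation of this simplification is that an input containing both a stored name and a genuine connecting word would still be treated as a single-intention command.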

According to an embodiment of the present invention, the sentence structure of the voice input may be classified as at least one of: a first sentence structure consisting of a sentence that commands processing of a single intention; a second sentence structure in which two or more sentences, each commanding processing of an intention, are connected by a specific word; a third sentence structure in which, of two or more sentences connected by a specific word, only the intention of the sentence or sentences following the specific word is commanded to be processed; and a fourth sentence structure in which a specific word is attached at the very end of a sentence, commanding that processing of the intention of that sentence be put on hold.

Accordingly, the sentence structure matched to a multi-processing command word is the second sentence structure, the sentence structure matched to a cancel command word is the third sentence structure, and the sentence structure matched to a wait command word is the fourth sentence structure.

In this way, the sentence structure classifier (130) analyzes the voice input (text) and classifies its sentence structure before the user's utterance intention is identified through linguistic analysis of the voice input, that is, of its text.

The sentence structure classifier (130) may be implemented as an engine that analyzes the text of the voice input output by the STT (120) and classifies the sentence structure.

발화의도판단부(140)는, 문장구조구분부(130)에서 구분된 문장 구조에 따라 금번 음성 입력에 대한 발화 의도를 판단하는 기능을 담당한다.The utterance intention determining unit 140 is responsible for determining the utterance intention for the current voice input according to the sentence structure divided by the sentence structure dividing unit 130 .

즉, 발화의도판단부(140)는, 문장구조구분부(130)에서 구분된 문장 구조에 따라 금번 음성 입력 즉 음성 입력의 텍스트를 분석하여, 금번 음성 입력에 대한 발화 의도를 판단하는 것이다.That is, the utterance intention determining unit 140 analyzes the current voice input, that is, the text of the voice input, according to the sentence structure divided by the sentence structure classifying unit 130 , and determines the utterance intention for the current voice input.

Specifically, the utterance intention determination unit 140 may be an NLU engine that analyzes the text of the voice input output from the STT 120 (e.g., by natural language understanding (NLU) analysis) and determines the user's utterance intention as the result of the analysis.

If the text analysis cannot determine the user's utterance intention, the utterance intention determination unit 140 may output a voice input error so that the user is made aware of it.

The utterance intention determination unit 140 checks the result of the sentence structure classification that the sentence structure classification unit 130 performed on the current voice input.

When the sentence structure of the current voice input is classified as the first sentence structure, i.e., a single sentence commanding processing of a single intention, the utterance intention determination unit 140 may analyze the entire current voice input, i.e., the entire text output from the STT 120, to determine the utterance intention of the current voice input.

The process by which the utterance intention determination unit 140 analyzes a voice input (text) classified as the first sentence structure may be the same as the process by which an existing voice interface identifies the user's utterance intention through language analysis (NLU analysis) of the text.

That is, according to the present invention, for a voice input that contains no specific word, or that contains a specific word which matches specific name information, in other words a voice input of one utterance and one command, the user's utterance intention may be determined by the same procedure as before.

Meanwhile, when the sentence structure of the current voice input is classified as at least one of the second, third, and fourth sentence structures, the utterance intention determination unit 140 may determine the utterance intention of the current voice input by analyzing the voice input (text) according to the classified sentence structure.

Describing an embodiment more specifically, when the sentence structure of the current voice input is classified as the second or third sentence structure, the utterance intention determination unit 140 may, before performing the text analysis according to the classified structure, determine whether to use the classification result. It does so using the entire current voice input (the full text) and the partial voice inputs (the segmented texts) obtained by dividing the full text at the specific word.

For example, assume that the text of the current voice input (e.g., "불 꺼줘 그리고 TV 틀어줘", "Turn off the light and turn on the TV") contains the specific word "그리고" ("and").

In this case, the sentence structure classification unit 130 will classify the sentence structure of the current voice input as the second sentence structure, which matches the multi-processing command word ("그리고").

Since the sentence structure of the current voice input has been classified as the second sentence structure, the utterance intention determination unit 140 may determine whether to use the classification result, using the full text of the current voice input ("불 꺼줘 그리고 TV 틀어줘") and the segmented texts ("불 꺼줘" / "TV 틀어줘") obtained by dividing the full text at the specific word ("그리고").

For example, the utterance intention determination unit 140 may perform language analysis (NLU analysis) on the full text of the current voice input ("불 꺼줘 그리고 TV 틀어줘") and on each segmented text ("불 꺼줘" / "TV 틀어줘"), and may determine whether to use the current sentence structure (the second sentence structure) through designated checks, such as checking whether the full text and each segmented text is an ungrammatical sentence (an incomplete or malformed sentence) and checking whether the full text contains a content name.

For instance, if the check of the segmented texts finds no ungrammatical sentence among them and the full text contains no content name, the utterance intention determination unit 140 may decide to use the result of classifying the current voice input as the second sentence structure.

Conversely, if some segmented text is an ungrammatical sentence or the full text contains a content name, and the full text itself is not an ungrammatical sentence, the utterance intention determination unit 140 may decide not to use the result of classifying the current voice input as the second sentence structure.
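The decision on whether to use a second-structure classification can be sketched as below. This is only an illustration under stated assumptions: the two NLU checks are passed in as callables, the toy predicates and the content title "해와 달 그리고 별" are made up for the example, and the function name `use_second_structure` is hypothetical. The description does not spell out the case in which the full text is itself ungrammatical, so the sketch simply returns "do not use" whenever the positive condition fails.

```python
from typing import Callable, Sequence


def use_second_structure(
    full_text: str,
    segments: Sequence[str],
    is_ungrammatical: Callable[[str], bool],
    has_content_name: Callable[[str], bool],
) -> bool:
    """Use the split only if every segment is well formed and the
    full text does not contain a content name."""
    any_bad_segment = any(is_ungrammatical(s) for s in segments)
    return not any_bad_segment and not has_content_name(full_text)


# Toy stand-ins for the NLU checks (hypothetical, illustration only).
KNOWN_TITLES = {"해와 달 그리고 별"}  # made-up title containing "그리고"


def toy_is_ungrammatical(text: str) -> bool:
    # Treat a segment as complete only if it ends in the request ending "줘".
    return not text.endswith("줘")


def toy_has_content_name(text: str) -> bool:
    return any(title in text for title in KNOWN_TITLES)
```

With these toy checks, "불 꺼줘 그리고 TV 틀어줘" keeps its split, while "해와 달 그리고 별 틀어줘" falls back to analysis of the full text because "그리고" belongs to the title.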

As another example, assume that the text of the current voice input (e.g., "6시 알람 맞춰줘 아니다 7시로 알람 맞춰줘", "Set an alarm for 6 o'clock, no, set the alarm for 7 o'clock") contains the specific word "아니다" ("no").

In this case, the sentence structure classification unit 130 will classify the sentence structure of the current voice input as the third sentence structure, which matches the cancel command word ("아니다").

The utterance intention determination unit 140 may then perform language analysis (NLU analysis) on the full text of the current voice input ("6시 알람 맞춰줘 아니다 7시로 알람 맞춰줘") and on each segmented text ("6시 알람 맞춰줘" / "7시로 알람 맞춰줘"), and may determine whether to use the current sentence structure (the third sentence structure) through designated checks, such as checking whether the full text and each segmented text is an ungrammatical sentence (an incomplete or malformed sentence) and checking whether the full text contains a content name.

For instance, if the segmented text following the specific word is not an ungrammatical sentence and the full text contains no content name, the utterance intention determination unit 140 may decide to use the result of classifying the current voice input as the third sentence structure.

Conversely, if the segmented text following the specific word is an ungrammatical sentence or the full text contains a content name, and the full text itself is not an ungrammatical sentence, the utterance intention determination unit 140 may decide not to use the result of classifying the current voice input as the third sentence structure.

Having decided to use the classification result, the utterance intention determination unit 140 may, in the case of the second sentence structure, sequentially analyze each of the two or more sentences separated by the specific word within the entire current voice input (the full text), and determine the utterance intention for each sentence.

Taking the voice input above ("불 꺼줘 그리고 TV 틀어줘") and the second sentence structure as the example, once the utterance intention determination unit 140 decides to use the second-structure classification, it may perform language analysis (NLU analysis) sequentially on the text of each sentence separated by the specific word ("그리고"), i.e., on the segmented texts ("불 꺼줘" / "TV 틀어줘"), and determine the utterance intention for each sentence.

The output unit 150 may then transmit a control signal for each sentence's (segmented text's) utterance intention, as determined by the utterance intention determination unit 140, to the corresponding control target device (e.g., a light, a TV), so that the light is turned off and the TV is turned on.
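The per-sentence analysis for the second sentence structure could look roughly like the sketch below. It assumes whitespace tokenisation and a single multi-processing word "그리고"; the NLU engine is replaced by a toy lookup table, and the intent labels ("light.off", "tv.on") and function names are hypothetical.

```python
from typing import Callable, List

MULTI_WORD = "그리고"  # example multi-processing command word from the text


def split_on_word(text: str, word: str) -> List[str]:
    """Split whitespace-tokenised text into segments around `word`."""
    segments: List[str] = []
    current: List[str] = []
    for token in text.split():
        if token == word:
            if current:
                segments.append(" ".join(current))
            current = []
        else:
            current.append(token)
    if current:
        segments.append(" ".join(current))
    return segments


def intents_for_second_structure(text: str, nlu: Callable[[str], str]) -> List[str]:
    """Analyse each segment in order and collect one intent per sentence."""
    return [nlu(segment) for segment in split_on_word(text, MULTI_WORD)]


def toy_nlu(text: str) -> str:
    # Stand-in for the NLU engine; the labels are made up.
    table = {"불 꺼줘": "light.off", "TV 틀어줘": "tv.on"}
    return table.get(text, "unknown")
```

Each returned intent would then be turned into its own control signal by the output unit.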

Meanwhile, having decided to use the classification result, the utterance intention determination unit 140 may, in the case of the third sentence structure, analyze only the sentence or sentences that follow the specific word within the entire current voice input (the full text), and determine the utterance intention of the current voice input.

Taking the voice input above ("6시 알람 맞춰줘 아니다 7시로 알람 맞춰줘") and the third sentence structure as the example, once the utterance intention determination unit 140 decides to use the third-structure classification, it may perform language analysis (NLU analysis) only on the text that follows the specific word ("아니다"), i.e., "7시로 알람 맞춰줘", and determine the utterance intention of the current voice input.

The output unit 150 may then transmit a control signal for the utterance intention determined by the utterance intention determination unit 140 to the control target device (e.g., an alarm clock or an internal alarm function), so that the alarm is set for 7 o'clock.
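Selecting the text that is actually analyzed under the third sentence structure can be sketched as follows. The cancel word "아니다" is the example from the description; whitespace tokenisation, the choice to cut at the last occurrence when the word appears more than once, and the name `text_after_cancel` are assumptions of this sketch.

```python
CANCEL_WORD = "아니다"  # example cancel command word from the text


def text_after_cancel(text: str, word: str = CANCEL_WORD) -> str:
    """Return only the part of the text that follows the cancel word."""
    tokens = text.split()
    if word not in tokens:
        return text  # nothing to cancel: analyse the full text
    # Assumption: with several cancel words, only the last correction counts.
    last = len(tokens) - 1 - tokens[::-1].index(word)
    return " ".join(tokens[last + 1:])
```

For the alarm example, only "7시로 알람 맞춰줘" is handed to the NLU analysis, so the 6 o'clock request never produces a control signal.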

Describing yet another embodiment, when the sentence structure of the current voice input is classified as both the second and the third sentence structure, the utterance intention determination unit 140 may, before performing the text analysis according to the classified structures, determine whether to use the classification result in the manner described above.

For example, assume that the text of the current voice input (e.g., "A 그리고 B 해줘 아니다 C 해줘", "Do A and B, no, do C") contains two specific words, "그리고" and "아니다".

In this case, the sentence structure classification unit 130 will classify the sentence structure of the current voice input as both the second sentence structure, which matches the multi-processing command word ("그리고"), and the third sentence structure, which matches the cancel command word ("아니다").

Since the sentence structure of the current voice input has been classified as the second and third sentence structures, the utterance intention determination unit 140 may determine whether to use the classification result, using the full text of the current voice input ("A 그리고 B 해줘 아니다 C 해줘") and the segmented texts ("A" / "B 해줘" / "C 해줘") obtained by dividing the full text at the specific words ("그리고", "아니다").

For example, the utterance intention determination unit 140 may perform language analysis (NLU analysis) on the full text of the current voice input ("A 그리고 B 해줘 아니다 C 해줘") and on each segmented text ("A" / "B 해줘" / "C 해줘"), and may determine whether to use the current sentence structures (the second and third sentence structures) through designated checks, such as checking whether the full text and each segmented text is an ungrammatical sentence (an incomplete or malformed sentence) and checking whether the full text contains a content name.

For instance, if the check of the segmented texts divided at the multi-processing command word "그리고" ("A" / "B 해줘") finds no ungrammatical sentence among them and the full text contains no content name, the utterance intention determination unit 140 may decide to use the result of classifying the current voice input as the second sentence structure.

Likewise, if the segmented text following the cancel command word "아니다" ("C 해줘") is not an ungrammatical sentence and the full text contains no content name, the utterance intention determination unit 140 may decide to use the result of classifying the current voice input as the third sentence structure.

Having decided to use the classification result for the second and third sentence structures, the utterance intention determination unit 140 sequentially analyzes each of the two or more sentences separated by the multi-processing command word ("그리고") within the entire current voice input (the full text), but performs language analysis (NLU analysis) only on the text of the sentence that follows the cancel command word ("아니다"), i.e., "C 해줘", and thereby determines the utterance intention of the current voice input.

The output unit 150 may then transmit a control signal for the utterance intention determined by the utterance intention determination unit 140 to the control target device, so that C (e.g., turning the air conditioner on) is performed.

Meanwhile, if the utterance intention determination unit 140 decides not to use the result of the sentence structure classification, it may analyze the full text of the current voice input to determine the utterance intention.

That is, even when the voice input has been classified as the second or third sentence structure, if the utterance intention determination unit 140 decides not to use that classification, it may perform language analysis (NLU analysis) on the full text of the current voice input to determine the utterance intention.

Meanwhile, when the sentence structure of the current voice input is classified as the fourth sentence structure, the utterance intention determination unit 140 may hold off on determining the utterance intention of the current voice input and cause the voice receiving unit 110 to keep waiting for voice even after the end of the current utterance.

For example, assume that the text of the current voice input (e.g., "불 꺼줘 잠깐만", "Turn off the light, wait a moment") contains the specific word "잠깐만" ("wait a moment").

In this case, the sentence structure classification unit 130 will classify the sentence structure of the current voice input as the fourth sentence structure, which matches the wait command word ("잠깐만").

Since the sentence structure of the current voice input has been classified as the fourth sentence structure, the utterance intention determination unit 140 stores the text of the current voice input ("불 꺼줘") without analyzing it and waits, and may cause the voice receiving unit 110 to keep the microphone active and wait for voice even though no voice is currently being received.

Thereafter, when the user speaks again and the text of the new voice input (e.g., "꺼줘", "turn it off") is received via the voice receiving unit 110 and the STT 120, the utterance intention determination unit 140 may perform language analysis (NLU analysis) on the text of the sentence ("불 꺼줘 꺼줘") formed by combining the previously received and stored text ("불 꺼줘") with the text of the newly received voice input ("꺼줘"), and thereby determine the utterance intention of the voice input.
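The hold-and-combine behaviour of the fourth sentence structure can be sketched with a small buffer. This is an illustration only: the class name `PendingUtterance`, whitespace tokenisation, and dropping the wait word before storing the text are assumptions, and keeping the microphone active is outside the scope of the sketch.

```python
from typing import Callable, Optional

WAIT_WORD = "잠깐만"  # example wait command word from the text


class PendingUtterance:
    """Holds text that ended with the wait word until the user continues."""

    def __init__(self, nlu: Callable[[str], str]) -> None:
        self._nlu = nlu
        self._pending: Optional[str] = None

    def feed(self, text: str) -> Optional[str]:
        tokens = text.split()
        if tokens and tokens[-1] == WAIT_WORD:
            # Fourth structure: store the text unanalysed and keep waiting.
            held = " ".join(tokens[:-1])
            joined = held if self._pending is None else f"{self._pending} {held}"
            self._pending = joined.strip()
            return None
        if self._pending is not None:
            # Combine the stored text with the newly received text.
            text = f"{self._pending} {text}".strip()
            self._pending = None
        return self._nlu(text)
```

Feeding "불 꺼줘 잠깐만" returns nothing yet; feeding "꺼줘" afterwards passes the combined text "불 꺼줘 꺼줘" to the NLU callable.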

The output unit 150 transmits a control signal for the utterance intention determined by the utterance intention determination unit 140 to the control target device, causing the control target device to operate according to the user's utterance intention.

As described above, before determining the user's utterance intention for a voice input, the voice analysis device 100 according to the present invention classifies the voice input as one of the representative sentence structures found in actual conversation, and then analyzes the text of the voice input and determines the user's utterance intention in a manner suited to the classified sentence structure.

By realizing an interactive voice interface in this way, the present invention resolves problems of the existing simple command-type voice interface, which is limited to one utterance and one command: a voice input cannot be corrected in mid-utterance, two or more commands cannot be given in a single voice input, and a command cannot be paused and then continued.

In particular, by applying a procedure that determines whether to use the result of the sentence structure classification, the present invention can avoid misjudging the user's utterance intention because of a sentence structure classification error when realizing the interactive voice interface.

Hereinafter, a method of operating the voice analysis device according to an embodiment of the present invention, in other words the flow of providing an interactive voice interface, is described in detail with reference to FIGS. 3 and 4.

For convenience of description, the reference numerals used in FIGS. 1 and 2 above are used again.

First, the overall flow of the method of operating the voice analysis device according to an embodiment of the present invention is described with reference to FIG. 3.

In the method of operating the voice analysis device of the present invention, the voice analysis device 100 receives a voice input uttered by a user (S10).

For example, upon detecting an entry action for starting voice input (e.g., a wake-up voice command or a Mic button), the voice analysis device 100 activates the microphone of the voice receiving unit 110 and receives voice, so that the voice received from the start of the user's utterance to its end is received as the voice input.

When the voice input is received, the voice analysis device 100 converts it into text through an STT (speech-to-text) function (S20).

The voice analysis device 100 may analyze the text converted from the voice input and classify the sentence structure of the current voice input (e.g., as the first, second, third, or fourth sentence structure) (S30).

The voice analysis device 100 then analyzes the text of the voice input according to the classification result of step S30 and determines the utterance intention of the current voice input (S40).

Accordingly, the voice analysis device 100 may output a control signal for the user's utterance intention determined in step S40 to the corresponding control target device, and control the operation of that device based on the control signal (S50).
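The overall flow of steps S10 to S50 can be sketched as a simple pipeline. Every stage is passed in as a callable, because the description does not fix how STT, classification, intent determination, or signal output are implemented; the function name `handle_voice_input`, the string structure label, and the intent label "light.off" are hypothetical.

```python
from typing import Callable, List


def handle_voice_input(
    audio: bytes,                            # S10: the received voice input
    stt: Callable[[bytes], str],
    classify: Callable[[str], str],
    judge: Callable[[str, str], List[str]],
    send: Callable[[str], None],
) -> None:
    text = stt(audio)                 # S20: speech to text
    structure = classify(text)        # S30: classify the sentence structure
    intents = judge(text, structure)  # S40: determine the utterance intention(s)
    for intent in intents:            # S50: output one control signal per intent
        send(intent)


# Toy stand-ins so the sketch can be run end to end.
sent: List[str] = []
handle_voice_input(
    audio="불 꺼줘".encode("utf-8"),
    stt=lambda a: a.decode("utf-8"),
    classify=lambda t: "first",
    judge=lambda t, s: ["light.off"] if t == "불 꺼줘" else [],
    send=sent.append,
)
```

Keeping classification (S30) as a separate stage ahead of intent determination (S40) mirrors the ordering the description emphasizes.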

Hereinafter, step S30 of classifying the sentence structure of the voice input and step S40 of analyzing the text of the voice input according to the classification result are described in detail with reference to FIG. 4.

The voice analysis device 100 analyzes the text converted in step S20 and checks whether it contains a specific word predefined for determining the sentence structure (S32).

To do so, the voice analysis device 100 may check, based on the specific words it already holds or on a separately linked DB, whether the text of the current voice input contains a multi-processing command word, a cancel command word, or a wait command word (S32).

When the text of the voice input contains a multi-processing command word, a cancel command word, or a wait command word (S32 Yes), the voice analysis device 100 classifies the sentence structure of the current voice input as the sentence structure that matches the specific word contained in the text (S36).

When the text of the voice input contains no specific word (S32 No), the voice analysis device 100 may classify the sentence structure of the current voice input as the sentence structure that commands processing of a single intention (hereinafter, the first sentence structure) (S34).

In this way, the voice analysis device 100 analyzes the text of the voice input and classifies its sentence structure before step S40, in which the user's utterance intention is identified through language analysis of the text.

The voice analysis device 100 checks the sentence structure classification result obtained for the current voice input in step S34 or step S36.

When the sentence structure of the current voice input has been classified as the first sentence structure, i.e., a single sentence commanding processing of a single intention (S34), the voice analysis device 100 may analyze the full text of the current voice input (NLU analysis) to determine the utterance intention (S47).

The process of analyzing a voice input (text) classified as the first sentence structure to determine the utterance intention may be the same as the process by which an existing voice interface identifies the user's utterance intention through language analysis (NLU analysis) of the text.

한편, 본 발명의 음성분석장치의 동작 방법에서 음성분석장치(100)는, 금번 음성 입력의 문장 구조가 제2 또는 제3 또는 제4 문장 구조로 구분된 경우(S36), 구분된 문장 구조에 따른 텍스트 분석을 통해 금번 음성 입력에 대한 발화 의도를 판단할 수 있다.On the other hand, in the operating method of the voice analysis device of the present invention, the voice analysis device 100, when the sentence structure of the current voice input is divided into the second, third, or fourth sentence structure (S36), the divided sentence structure Through the text analysis, it is possible to determine the utterance intention for the current voice input.

To describe an embodiment more specifically, when the sentence structure of the current voice input has been classified as the fourth sentence structure (S41 Yes), the voice analysis device 100 may defer determining the utterance intention for the current voice input and have the voice receiver 110 keep waiting for voice reception even after the utterance end point of the current voice input (S45).

In this case, steps S32 and S36 will classify the sentence structure of the current voice input as the fourth sentence structure, which matches the wait command word ("잠깐만", "wait a moment").

Because the sentence structure of the current voice input has been classified as the fourth sentence structure, the voice analysis device 100 does not analyze the text of the current voice input ("불 꺼줘", "turn off the light") but holds it in storage, and may have the voice receiver 110 keep the microphone active and wait for voice reception even when no voice is being received.

Thereafter, when the user speaks again and the text of a new voice input (e.g., "꺼줘", "turn it off") is received through steps S10 and S20 of FIG. 3, the voice analysis device 100 may perform language analysis (NLU analysis) on the text of the sentence ("불 꺼줘 꺼줘") formed by combining the previously received and stored text ("불 꺼줘") with the newly received text ("꺼줘"), and thereby determine the utterance intention of the current voice input (S46).
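The hold-and-combine behavior of steps S45 and S46 can be sketched as follows. This is a minimal illustration rather than the disclosed implementation: the class name `HeldUtterance`, its methods, and the choice to join the two texts with a single space are assumptions.

```python
from typing import Optional


class HeldUtterance:
    """Keeps the text of a fourth-structure input until the user speaks again."""

    def __init__(self) -> None:
        self._held: Optional[str] = None

    def hold(self, text: str) -> None:
        # Store the text without analyzing it (the waiting state of S45).
        self._held = text

    def resume(self, new_text: str) -> str:
        # Combine the stored text with the newly received text; the result is
        # what NLU analysis would receive in S46. The space join is assumed.
        if self._held is None:
            return new_text
        combined = f"{self._held} {new_text}"
        self._held = None
        return combined
```

With the example from the description, holding "불 꺼줘" and then resuming with "꺼줘" yields "불 꺼줘 꺼줘" as the text to analyze.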

Meanwhile, when the sentence structure of the current voice input has been classified as the second or third sentence structure (S41 No), the voice analysis device 100 may, before performing text analysis according to the classified sentence structure, determine whether to use the classification result, using the entire voice input (the full text) and the partial voice inputs (segment texts) into which the entire voice input is divided at the specific word (S42).

In this case, steps S32 and S36 will classify the sentence structure of the current voice input as the second sentence structure, which matches the multi-command word ("그리고", "and").

Because the sentence structure of the current voice input has been classified as the second sentence structure, the voice analysis device 100 may determine whether to use the classification result by using the full text of the current voice input ("불 꺼줘 그리고 TV 틀어줘", "turn off the light and turn on the TV") and the segment texts ("불 꺼줘" / "TV 틀어줘") into which the full text is divided at the specific word ("그리고") (S42).

For example, through language analysis (NLU analysis) of the full text of the current voice input ("불 꺼줘 그리고 TV 틀어줘") and of each segment text ("불 꺼줘" / "TV 틀어줘"), the voice analysis device 100 may carry out designated checks, such as checking whether the full text and each segment text is ungrammatical (an incomplete or abnormal sentence) and whether the full text contains a content name, and then determine whether to use the current sentence structure (the second sentence structure).

For instance, if the check of each segment text finds no ungrammatical segment and the full text contains no content name, the voice analysis device 100 may determine to use the result of classifying the current voice input as the second sentence structure.

Conversely, if the check of each segment text finds an ungrammatical segment, or if the full text contains a content name and the full text itself is not ungrammatical, the voice analysis device 100 may determine not to use the result of classifying the current voice input as the second sentence structure.

In this case, steps S32 and S36 will classify the sentence structure of the current voice input as the third sentence structure, which matches the cancel command word ("아니다", "no").

In this case, through language analysis (NLU analysis) of the full text of the current voice input ("6시 알람 맞춰줘 아니다 7시로 알람 맞춰줘", "set an alarm for 6 o'clock, no, set the alarm for 7 o'clock") and of each segment text ("6시 알람 맞춰줘" / "7시로 알람 맞춰줘"), the voice analysis device 100 may carry out designated checks, such as checking whether the full text and each segment text is ungrammatical (an incomplete or abnormal sentence) and whether the full text contains a content name, and then determine whether to use the current sentence structure (the third sentence structure).

For instance, if the segment text after the specific word is not ungrammatical and the full text contains no content name, the voice analysis device 100 may determine to use the result of classifying the current voice input as the third sentence structure.

Conversely, if the segment text after the specific word is ungrammatical, or if the full text contains a content name and the full text itself is not ungrammatical, the voice analysis device 100 may determine not to use the result of classifying the current voice input as the third sentence structure.
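The two S42 decision rules can be written out as a short Python sketch. The grammaticality check and the content-name check are passed in as callables because the description leaves them to NLU analysis. The function names and parameters are hypothetical. The case in which the full text contains a content name but is itself ungrammatical is not spelled out in the description, and accepting the split in that case is an assumption of this sketch.

```python
from typing import Callable, Sequence

Check = Callable[[str], bool]


def use_second_structure(
    full_text: str,
    segments: Sequence[str],
    is_ungrammatical: Check,
    has_content_name: Check,
) -> bool:
    """Decide whether to use a second-structure classification result."""
    # Reject if any segment text is an incomplete or abnormal sentence.
    if any(is_ungrammatical(segment) for segment in segments):
        return False
    # Reject if the full text names a piece of content and reads as a valid sentence.
    if has_content_name(full_text) and not is_ungrammatical(full_text):
        return False
    return True


def use_third_structure(
    full_text: str,
    text_after_word: str,
    is_ungrammatical: Check,
    has_content_name: Check,
) -> bool:
    """Decide whether to use a third-structure classification result."""
    # Only the segment after the specific word is checked here.
    if is_ungrammatical(text_after_word):
        return False
    if has_content_name(full_text) and not is_ungrammatical(full_text):
        return False
    return True
```

A title that happens to contain the specific word, for example a request to play content whose name begins with "그리고", would leave an empty or incomplete segment after splitting, so the split is rejected and the full text is analyzed instead.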

When the voice analysis device 100 determines to use the sentence structure classification result (S43 Yes) and the input was classified as the second sentence structure, it may sequentially analyze each of the two or more sentences separated by the specific word within the entire voice input (full text) and determine the utterance intention of each sentence (S44).

Then, in step S50 of FIG. 3, control signals corresponding to the utterance intention of each sentence (segment text) determined in step S40 (S44 of FIG. 4) may be transmitted to the respective target devices (e.g., a light and a TV), so that the light is turned off and the TV is turned on.
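The per-sentence handling of the second sentence structure can be sketched as below. The lookup table `TOY_INTENTS` is a hypothetical stand-in for NLU analysis, and the device and action labels are invented for illustration. Splitting only at standalone tokens equal to the specific word is also an assumption.

```python
from typing import Dict, List, Tuple

MULTI_WORD = "그리고"  # "and"; example multi-command word from the description

# Hypothetical lookup table standing in for real NLU analysis.
TOY_INTENTS: Dict[str, Tuple[str, str]] = {
    "불 꺼줘": ("light", "off"),
    "TV 틀어줘": ("tv", "on"),
}


def split_sentences(text: str, word: str = MULTI_WORD) -> List[str]:
    """Split the full text into sentences at each standalone occurrence of word."""
    sentences: List[str] = []
    current: List[str] = []
    for token in text.split():
        if token == word:
            if current:
                sentences.append(" ".join(current))
            current = []
        else:
            current.append(token)
    if current:
        sentences.append(" ".join(current))
    return sentences


def control_signals(text: str) -> List[Tuple[str, str]]:
    """Analyze each sentence in order; sentences missing from the toy table are skipped."""
    return [TOY_INTENTS[s] for s in split_sentences(text) if s in TOY_INTENTS]
```

For "불 꺼줘 그리고 TV 틀어줘" the sketch produces one signal for the light and one for the TV, in the order spoken.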

Meanwhile, when the voice analysis device 100 determines to use the sentence structure classification result (S43 Yes) and the input was classified as the third sentence structure, it may analyze only the partial sentence that follows the specific word within the entire voice input (full text) and determine the utterance intention of the current voice input (S44).

Then, in step S50 of FIG. 3, a control signal corresponding to the utterance intention determined in step S40 (S44 of FIG. 4) may be transmitted to the target device (e.g., an alarm clock or an internal alarm function), so that an alarm is set for 7 o'clock.
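Selecting the text to analyze for the third sentence structure can be sketched in a few lines. The helper name `text_after_cancel` is hypothetical, and using the last occurrence of the cancel word when it appears more than once is an assumption that the description does not address.

```python
CANCEL_WORD = "아니다"  # "no"; example cancel word from the description


def text_after_cancel(text: str, word: str = CANCEL_WORD) -> str:
    """Return only the part of the text that follows the cancel word."""
    tokens = text.split()
    if word not in tokens:
        return text
    # Index of the last standalone occurrence of the cancel word (assumed rule).
    last = len(tokens) - 1 - tokens[::-1].index(word)
    return " ".join(tokens[last + 1:])
```

Applied to "6시 알람 맞춰줘 아니다 7시로 알람 맞춰줘", only "7시로 알람 맞춰줘" remains, so the earlier 6 o'clock request is never analyzed.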

Meanwhile, when the voice analysis device 100 determines not to use the sentence structure classification result (S43 No), it may analyze the entire text of the current voice input to determine its utterance intention (S47).

That is, even when the sentence structure of the voice input has been classified as the second or third sentence structure, if the voice analysis device 100 determines not to use that result, it may perform language analysis (NLU analysis) on the entire text of the current voice input to determine its utterance intention.
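The branching described for steps S41 through S47 can be summarized in one routing function. This is a sketch under stated assumptions: `Structure`, `texts_to_analyze`, and the convention that an empty list means "keep waiting" are all hypothetical, and the substring split used here is a simplification that could also split inside longer words.

```python
from enum import Enum
from typing import List


class Structure(Enum):
    FIRST = 1
    SECOND = 2
    THIRD = 3
    FOURTH = 4


def texts_to_analyze(
    structure: Structure, text: str, use_split: bool, word: str
) -> List[str]:
    """Return the texts to pass to NLU analysis; an empty list means wait."""
    if structure is Structure.FOURTH:
        return []  # S45: hold the text and keep listening
    if structure is Structure.FIRST or not use_split:
        return [text]  # S47: analyze the entire text
    parts = [p.strip() for p in text.split(word)]
    parts = [p for p in parts if p]
    if structure is Structure.SECOND:
        return parts  # S44: analyze every sentence in order
    return parts[-1:]  # S44: analyze only the sentence after the word
```

The `use_split` flag corresponds to the S43 decision, so a rejected second or third classification falls back to whole-text analysis exactly as a first-structure input would.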

As described above, the operating method of the voice analysis device according to the present invention realizes an interactive voice interface: before determining the user's utterance intention for a voice input, it classifies the voice input as one of the representative sentence structures that can be distinguished in actual conversation, and then analyzes the text of the voice input and determines the user's utterance intention in a manner suited to the classified sentence structure. This resolves problems of the existing simple command-type voice interface, such as the constraint that a voice input cannot be corrected mid-utterance, the inconvenience that two or more commands cannot be given in a single voice input, and the low degree of freedom whereby a command cannot be paused mid-utterance and then continued.

Embodiments of the present invention may be implemented in the form of program instructions executable by various computer means and recorded on a computer-readable medium. The computer-readable medium may include program instructions, data files, data structures, and the like, alone or in combination. The program instructions recorded on the medium may be specially designed and configured for the present invention, or may be known and available to those skilled in computer software. Examples of computer-readable recording media include magnetic media such as hard disks, floppy disks, and magnetic tape; optical media such as CD-ROMs and DVDs; magneto-optical media such as floptical disks; and hardware devices specially configured to store and execute program instructions, such as ROM, RAM, and flash memory. Examples of program instructions include not only machine code such as that produced by a compiler, but also high-level language code that a computer can execute using an interpreter or the like. The hardware devices described above may be configured to operate as one or more software modules in order to perform the operations of the present invention, and vice versa.

Although the present invention has been described in detail with reference to preferred embodiments, it is not limited to those embodiments. The technical idea of the present invention extends to the full range of variations and modifications that a person of ordinary skill in the art to which the invention pertains could make without departing from the gist of the invention as claimed in the following claims.

The voice analysis device and its operating method according to the present invention go beyond the limits of existing technology by realizing an interactive voice interface rather than a simple command-type voice interface. The invention therefore has industrial applicability: beyond mere use of the related technology, devices to which it is applied have sufficient prospects for marketing or sale, and it can clearly be carried out in practice.

100: voice analysis device
110: voice receiver 120: STT
130: sentence structure classification unit 140: utterance intention determination unit
150: output unit

Claims

1. A voice analysis device comprising:
a voice receiver configured to receive a voice input uttered by a user;
a sentence structure classification unit configured to classify the sentence structure of the received voice input based on a specific word predefined for sentence structure determination; and
an utterance intention determination unit configured to determine an utterance intention for the voice input according to the classified sentence structure,
wherein the utterance intention determination unit, when the specific word is included in the voice input, determines whether to use the result of classifying the sentence structure of the voice input based on the specific word, according to whether specific name information including the specific word is present in the voice input.

2. The voice analysis device of claim 1,
wherein the sentence structure classification unit,
when a specific word predefined for sentence structure determination is included in the voice input, classifies the sentence structure of the voice input as the sentence structure matching the specific word.

3. The voice analysis device of claim 2,
wherein the sentence structure classification unit,
when a specific word predefined for sentence structure determination is included in the voice input and a match is found with specific name information stored in advance for sentence structure determination, classifies the sentence structure of the voice input as a first sentence structure instructing processing of a single intention.

4. The voice analysis device of claim 2,
wherein the sentence structure classification unit,
when the specific word is not included in the voice input, classifies the sentence structure of the voice input as a first sentence structure instructing processing of a single intention.

5. The voice analysis device of claim 1,
wherein the sentence structure of the voice input is classified as at least one of:
a first sentence structure consisting of a sentence instructing processing of a single intention;
a second sentence structure in which two or more sentences, each instructing processing of an intention, are connected by a specific word;
a third sentence structure instructing intention processing of only the partial sentence after the specific word among two or more sentences connected by a specific word; and
a fourth sentence structure in which a specific word is attached to the end of a sentence and which instructs waiting for intention processing of that sentence.

6. The voice analysis device of claim 5,
wherein the utterance intention determination unit,
when the sentence structure of the voice input is classified as the second or the third sentence structure,
determines whether to use the result of classifying the sentence structure of the voice input, using the entire voice input and the partial voice inputs into which the entire voice input is divided at the specific word.

7. The voice analysis device of claim 6,
wherein the utterance intention determination unit,
upon determining to use the result of classifying the sentence structure of the voice input,
when the classification is the second sentence structure, sequentially analyzes each of the two or more sentences separated by the specific word within the entire voice input to determine the utterance intention of each sentence of the voice input, and
when the classification is the third sentence structure, analyzes only the partial sentence connected after the specific word within the entire voice input to determine the utterance intention of the voice input.

8. The voice analysis device of claim 6,
wherein the utterance intention determination unit,
upon determining not to use the result of classifying the sentence structure of the voice input, analyzes the entire voice input to determine the utterance intention of the voice input.

9. The voice analysis device of claim 5,
wherein the voice receiver receives, as the voice input, the voice received from the user's utterance start point to the utterance end point, and
wherein the utterance intention determination unit,
when the sentence structure of the voice input is classified as the fourth sentence structure, defers determining the utterance intention for the voice input and causes the voice receiver to keep waiting for voice reception even after the utterance end point.

10. A method of operating a voice analysis device, the method comprising:
a voice receiving step of receiving a voice input uttered by a user;
a sentence structure classification step of classifying the sentence structure of the received voice input based on a specific word predefined for sentence structure determination; and
an utterance intention determination step of determining an utterance intention for the voice input according to the classified sentence structure,
wherein, in the utterance intention determination step,
when the specific word is included in the voice input, whether to use the result of classifying the sentence structure of the voice input based on the specific word is determined according to whether specific name information including the specific word is present in the voice input.

11. The method of claim 10,
wherein, in the sentence structure classification step,
when a specific word predefined for sentence structure determination is included in the voice input, the sentence structure of the voice input is classified as the sentence structure matching the specific word.

12. The method of claim 10,
wherein the sentence structure of the voice input is classified as at least one of:
a first sentence structure consisting of a sentence instructing processing of a single intention;
a second sentence structure in which two or more sentences, each instructing processing of an intention, are connected by a specific word;
a third sentence structure instructing intention processing of only the partial sentence after the specific word among two or more sentences connected by a specific word; and
a fourth sentence structure in which a specific word is attached to the end of a sentence and which instructs waiting for intention processing of that sentence.

13. A computer-readable recording medium recording a program for executing the method of any one of claims 10 to 12.

14. A computer program stored on a medium for executing the method of any one of claims 10 to 12.