KR20220045741A - Apparatus, method and computer program for providing voice recognition service

Info

Publication number: KR20220045741A
Application number: KR1020200128751A
Authority: KR
Inventors: 백두산
Original assignee: 주식회사 케이티
Priority date: 2020-10-06
Filing date: 2020-10-06
Publication date: 2022-04-13

Abstract

An apparatus for providing a voice recognition service may comprise: a receiving unit that receives a first voice command from a user after a call word is recognized; a determining unit that, when a second voice command is received after the voice recognition service corresponding to the first voice command has been provided, determines whether the second voice command satisfies a continuous situation recognition condition; and a providing unit that provides the voice recognition service corresponding to the second voice command based on the determination result.

Description

Apparatus, method, and computer program for providing voice recognition service by judging continuous situations

The present invention relates to an apparatus, a method, and a computer program for providing a voice recognition service by determining a continuous situation.

A typical voice recognition device (e.g., an AI speaker) starts a voice recognition service based on a voice wake-up scheme. That is, the voice recognition device does not perform voice recognition on the user's utterances until an utterance corresponding to the call word is input, and starts voice recognition when such an utterance is input.

In this regard, uttering the call word to activate the voice recognition device is not something that is required only once. Even the same user, when requesting an additional voice recognition service after a time interval, must utter the call word again before each additional request, which is inconvenient.

Referring to FIG. 1, after the user has received, from the activated voice recognition device, the voice recognition service corresponding to a voice command including 'turn on the fan' (a service that operates the fan), the user must utter the call word again in order to receive the voice recognition service corresponding to a voice command including 'set the fan speed to low' (a service that sets the fan's wind strength to low).

Thus, to receive a voice recognition service through an existing voice recognition device, the user had to utter the call word first each time in order to switch the voice recognition device to an active state.

Korean Patent Laid-Open Publication No. 2019-0096856 (published August 20, 2019)

The present invention is intended to solve the above-described problems of the prior art. It aims to provide a voice recognition service that recognizes a continuous situation through various conditions, so that sequentially uttered voice commands are recognized even when the user utters only the initial call word.

However, the technical problems to be achieved by the present embodiment are not limited to those described above, and other technical problems may exist.

As a technical means for achieving the above technical problem, an apparatus for providing a voice recognition service according to a first aspect of the present invention may include: a receiving unit that receives a first voice command from a user after a call word is recognized; a determining unit that, when a second voice command is received after the voice recognition service corresponding to the first voice command has been provided, determines whether the second voice command satisfies a continuous situation recognition condition; and a providing unit that provides the voice recognition service corresponding to the second voice command based on the determination result.

A method of providing a voice recognition service, performed by an apparatus for providing a voice recognition service according to a second aspect of the present invention, may include: receiving a first voice command from a user after a call word is recognized; receiving a second voice command from the user after the voice recognition service corresponding to the first voice command has been provided; determining whether the second voice command satisfies a continuous situation recognition condition; and providing the voice recognition service corresponding to the second voice command based on the determination result.

A computer program according to a third aspect of the present invention, stored in a computer-readable recording medium and including a sequence of instructions for providing a voice recognition service, may include a sequence of instructions that, when executed by a computing device, receives a first voice command from a user after a call word is recognized; determines, when a second voice command is received after the voice recognition service corresponding to the first voice command has been provided, whether the second voice command satisfies a continuous situation recognition condition; and provides the voice recognition service corresponding to the second voice command based on the determination result.

The above-described means for solving the problem are merely exemplary and should not be construed as limiting the present invention. In addition to the exemplary embodiments described above, there may be additional embodiments described in the drawings and the detailed description.

According to any one of the above-described means for solving the problem, when a second voice command is received in succession after the voice recognition service corresponding to a first voice command has been provided, the voice recognition service corresponding to the second voice command can be provided based on whether the second voice command satisfies the continuous situation recognition condition.

Accordingly, when the second voice command satisfies the continuous situation recognition condition, the present invention allows the utterance of the call word to be omitted and provides the voice recognition service corresponding to the second voice command.

In addition, the present invention can provide a voice recognition service for a plurality of voice commands belonging to a continuous situation even if the user does not repeatedly utter the call word.

FIG. 1 is a diagram illustrating a conventional method of providing a voice recognition service.
FIG. 2 is a configuration diagram of a system for providing a voice recognition service according to an embodiment of the present invention.
FIG. 3 is a block diagram of the apparatus for providing a voice recognition service shown in FIG. 2, according to an embodiment of the present invention.
FIGS. 4A and 4B are diagrams illustrating a method of mapping utterance intentions to each control category, according to an embodiment of the present invention.
FIG. 5 is a diagram illustrating a dialogue analysis result for a voice command, according to an embodiment of the present invention.
FIG. 6 is a diagram illustrating a table including prior state information of an external environment, according to an embodiment of the present invention.
FIGS. 7A and 7B are diagrams illustrating a method of updating the utterance intentions for each control category according to a user's feedback information, according to an embodiment of the present invention.
FIG. 8 is a flowchart illustrating a method of providing a voice recognition service, according to an embodiment of the present invention.
FIG. 9 is a flowchart illustrating a method of providing a voice recognition service by determining a continuous situation, according to an embodiment of the present invention.
FIG. 10 is a flowchart illustrating a method of providing a voice recognition service by determining a continuous situation, according to another embodiment of the present invention.
FIGS. 11A and 11B are diagrams illustrating scenarios that occur when a voice recognition service that omits the call word utterance in a continuous situation is applied, according to an embodiment of the present invention.

Hereinafter, embodiments of the present invention will be described in detail with reference to the accompanying drawings so that those of ordinary skill in the art to which the present invention pertains can easily implement them. However, the present invention may be embodied in many different forms and is not limited to the embodiments described herein. In the drawings, parts irrelevant to the description are omitted in order to describe the present invention clearly, and similar reference numerals are attached to similar parts throughout the specification.

Throughout the specification, when a part is said to be "connected" to another part, this includes not only the case of being "directly connected" but also the case of being "electrically connected" with another element interposed between them. Also, when a part is said to "include" a component, this means that it may further include other components, rather than excluding them, unless specifically stated otherwise.

In this specification, a "unit" includes a unit realized by hardware, a unit realized by software, and a unit realized using both. In addition, one unit may be realized using two or more pieces of hardware, and two or more units may be realized by one piece of hardware.

In this specification, some of the operations or functions described as being performed by a terminal or device may instead be performed by a server connected to that terminal or device. Likewise, some of the operations or functions described as being performed by a server may be performed by a terminal or device connected to that server.

Hereinafter, specific details for carrying out the present invention will be described with reference to the accompanying configuration diagrams and process flowcharts.

FIG. 2 is a configuration diagram of a system for providing a voice recognition service according to an embodiment of the present invention.

Referring to FIG. 2, the voice recognition service providing system may include a voice recognition service providing apparatus 100, a voice recognition server 110, a voice analysis server 120, and a voice synthesis server 130. However, the voice recognition service providing system of FIG. 2 is only one embodiment of the present invention, so the present invention is not to be construed as limited by FIG. 2, and the system may be configured differently from FIG. 2 according to various embodiments of the present invention.

The voice recognition service providing apparatus 100 may provide artificial intelligence-based functions such as voice recognition, voice search, voice translation, and a voice assistant through communication with a user. For example, the voice recognition service providing apparatus 100 may be an AI speaker that provides the above-described artificial intelligence-based functions, but is not limited thereto. That is, the voice recognition service providing apparatus 100 may be any terminal capable of providing the above-described artificial intelligence-based functions.

When the voice recognition service providing apparatus 100 receives from the user a call word for activating the apparatus, it may prepare for the voice recognition service.

After recognizing the call word received from the user, the voice recognition service providing apparatus 100 may receive a first voice command from the user. Here, the call word means a preset word (e.g., 'Genie') that causes the voice recognition service providing apparatus 100 to begin receiving voice commands.

The voice recognition service providing apparatus 100 may transmit sound data for the received first voice command to the voice recognition server 110.

The voice recognition server 110 may convert the sound received from the voice recognition service providing apparatus 100 into text and transmit the converted text to the voice recognition service providing apparatus 100.

The voice recognition service providing apparatus 100 may transmit the text received from the voice recognition server 110 to the voice analysis server 120.

The voice analysis server 120 may analyze the text and transmit the analysis result to the voice recognition service providing apparatus 100. The voice analysis server 120 may convert the text into a preset format and transmit the information in the converted format to the voice recognition service providing apparatus 100. For example, if the text corresponding to the first voice command is 'Tell me today's weather', the voice analysis server 120 may transmit the information 'TYPE = weather, INTENT = question' to the voice recognition service providing apparatus 100.
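The analysis result in the example above can be read as a small set of key-value fields. The following is a minimal sketch of how an apparatus might parse such a string, assuming a comma-separated 'KEY = value' layout. The function name `parse_analysis` and the wire format are illustrative assumptions, since the specification only says the format is preset.

```python
def parse_analysis(result: str) -> dict:
    """Parse a string such as 'TYPE = weather, INTENT = question'.

    The comma-separated KEY = value layout is an assumption made for
    illustration; the specification does not define the actual format.
    """
    fields = {}
    for part in result.split(","):
        key, sep, value = part.partition("=")
        if sep:  # skip fragments that contain no '=' at all
            fields[key.strip()] = value.strip()
    return fields


analysis = parse_analysis("TYPE = weather, INTENT = question")
print(analysis["TYPE"], analysis["INTENT"])
```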

The voice recognition service providing apparatus 100 may provide the voice recognition service corresponding to the first voice command based on the analysis result from the voice analysis server 120. For example, when no voice needs to be output for the first voice command (e.g., changing a TV channel, or outputting specific information on the TV), the voice recognition service providing apparatus 100 may perform the specific function corresponding to the first voice command.

As another example, when a voice needs to be output for the first voice command (e.g., providing weather information), the voice recognition service providing apparatus 100 may, based on the analysis result for the first voice command, request response information corresponding to the first voice command (e.g., weather information) from an external server (not shown) and receive that response information from the external server (not shown).

Thereafter, the voice recognition service providing apparatus 100 may transmit the response information received from the external server (not shown) to the voice synthesis server 130 and receive a sound corresponding to the response information from the voice synthesis server 130.

The voice recognition service providing apparatus 100 may provide the voice recognition service corresponding to the first voice command by outputting the sound received from the voice synthesis server 130 through a speaker.
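The exchange between the apparatus 100 and the servers 110, 120, and 130 described above can be sketched as a single function. This is only an illustration under stated assumptions: each server is modeled as a plain callable, and the names `Analysis`, `needs_speech`, and `handle_command` are hypothetical and do not appear in the specification.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Analysis:
    type_name: str      # e.g. "weather" (the TYPE field in the example)
    intent: str         # e.g. "question" (the INTENT field in the example)
    needs_speech: bool  # assumed flag: whether a spoken response is required


def handle_command(
    sound: bytes,
    recognize: Callable[[bytes], str],         # stands in for voice recognition server 110
    analyze: Callable[[str], Analysis],        # stands in for voice analysis server 120
    fetch_response: Callable[[Analysis], str], # stands in for the external server (not shown)
    synthesize: Callable[[str], bytes],        # stands in for voice synthesis server 130
    perform: Callable[[Analysis], None],       # local function such as a TV channel change
) -> Optional[bytes]:
    """Return the sound to play through the speaker, or None if nothing is spoken."""
    text = recognize(sound)
    analysis = analyze(text)
    if not analysis.needs_speech:
        # No voice output needed: perform the function directly.
        perform(analysis)
        return None
    # Voice output needed: fetch the response information, then synthesize it.
    response = fetch_response(analysis)
    return synthesize(response)
```

In this sketch the apparatus only orchestrates the steps, which matches the description that recognition, analysis, and synthesis are each delegated to a separate server.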

After providing the voice recognition service corresponding to the first voice command, the voice recognition service providing apparatus 100 may subsequently receive a second voice command from the user.

The voice recognition service providing apparatus 100 may determine whether the second voice command satisfies a continuous situation recognition condition. Here, the continuous situation recognition condition may include a condition that the second voice command is received within a preset time. That is, after providing a specific voice recognition service, the voice recognition service providing apparatus 100 continues to perform voice recognition for the preset time even if no call word is recognized.

For example, when the second voice command is received within the preset time, the voice recognition service providing apparatus 100 may determine that the second voice command belongs to a continuous situation associated with the first voice command.
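The time-based condition above can be illustrated with a small helper that remembers when the last service was provided. This is a minimal sketch, not the claimed implementation. The class name `ContinuousWindow`, its method names, and the injectable `clock` parameter are assumptions introduced for illustration.

```python
import time


class ContinuousWindow:
    """Tracks whether a command arrives within the preset time after a service."""

    def __init__(self, window_seconds: float, clock=time.monotonic):
        self.window_seconds = window_seconds  # the "preset time"
        self._clock = clock                   # injectable so the logic can be tested
        self._last_service_at = None

    def mark_service_provided(self) -> None:
        # Call this once the service for the preceding command has been provided.
        self._last_service_at = self._clock()

    def accepts_without_call_word(self) -> bool:
        # Before any service has been provided, a call word is always required.
        if self._last_service_at is None:
            return False
        elapsed = self._clock() - self._last_service_at
        return elapsed <= self.window_seconds
```

A monotonic clock is used here so that wall-clock adjustments cannot lengthen or shorten the window.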

The voice recognition service providing apparatus 100 may provide the voice recognition service corresponding to the second voice command based on the result of determining whether the second voice command satisfies the continuous situation recognition condition.

The voice recognition service providing apparatus 100 may provide the voice recognition service for the second voice command in the same manner as the voice recognition service for the first voice command is provided.

Hereinafter, the operation of each component of the voice recognition service providing system of FIG. 2 will be described in more detail.

FIG. 3 is a block diagram of the voice recognition service providing apparatus 100 shown in FIG. 2, according to an embodiment of the present invention.

Referring to FIG. 3, the voice recognition service providing apparatus 100 may include a receiving unit 300, a determining unit 310, a providing unit 320, an update unit 330, and a storage unit 340. However, the voice recognition service providing apparatus 100 illustrated in FIG. 3 is only one implementation example of the present invention, and various modifications are possible based on the components illustrated in FIG. 3.

Hereinafter, FIG. 3 will be described with reference also to FIGS. 4A to 7B and FIGS. 11A and 11B.

A dialogue model building unit (not shown) may build a different dialogue model for each voice service (dialogue models that have learned the utterance intentions classified for each voice service) in order to analyze the voice commands received from the user. Different dialogue models are built for each service because the utterance intention may differ from service to service even for the same utterance. For example, a 'tell me' voice command and a 'show me' voice command may be interpreted as the same utterance intention or as different utterance intentions depending on the service. For example, referring to FIG. 4A, the dialogue model building unit (not shown) may build a dialogue model for each voice service based on information on the utterance intentions for each voice service input through a dialogue model management page 40. Here, the information on an utterance intention may include selection information of the control category corresponding to the voice service to be configured (e.g., 'order'), the name of the utterance intention to be mapped to the selected control category (e.g., 'pizza order'), and a dialogue phrase corresponding to the utterance intention (e.g., 'order a cheese pizza').

A dialogue analysis unit (not shown) may analyze the user's voice command using the dialogue models built for each voice service. For example, the dialogue analysis unit (not shown) may derive the utterance intention of the user's voice command through the dialogue model for each voice service. Here, the analysis result produced by the dialogue analysis unit (not shown) may include, for example, the utterance intention corresponding to the voice command, emotion information of the user who uttered the voice command, and accuracy information for the analysis result. For example, referring to FIG. 5, the dialogue analysis unit (not shown) may derive the 'selectLevel' utterance intention 50 as the result of analyzing the voice command 'select the intermediate level for animals' through the dialogue model for each voice service.

The storage unit 340 may map a plurality of utterance intentions to each of a plurality of control categories and store them.

For example, referring to FIG. 4B, the storage unit 340 may store a control category table 42 in which a plurality of utterance intentions are mapped to each of a plurality of control categories, in order to determine whether consecutively received voice commands from the user form a continuous situation. For example, the 'TurnOnAC' utterance intention corresponding to the voice command 'turn on the air conditioner' and the 'UpACTemp' utterance intention corresponding to the voice command 'raise the air conditioner temperature' may be mapped to the IoT control category. The 'TurnOnTV' utterance intention corresponding to the voice command 'turn on the TV' and the 'UpTVCH' utterance intention corresponding to the voice command 'turn the TV channel up' may be mapped to the TV control category.

Here, as a condition for determining a continuous situation for each control category, the time allowed between the voice recognition service for the preceding voice command being performed and the subsequent voice command being received (that is, the preset time) may be stored in the control category table 42. For example, the preset time may be set differently for each control category. The preset time may be a fixed value set automatically by the voice recognition service providing apparatus 100, or a variable value input by the user.

The control category table 42, which includes the utterance intentions for each of the plurality of control categories, may be used to determine whether consecutive voice commands belong to a continuous situation. For example, if the user utters the voice command 'turn on the TV' right after the voice recognition service corresponding to the voice command 'turn on the air conditioner' has been provided, the utterance intentions corresponding to the two voice commands belong to different control categories, so it may be determined that this is not a continuous situation. If instead the user utters the voice command 'raise the air conditioner temperature' right after the voice recognition service corresponding to the voice command 'turn on the air conditioner' has been provided, the utterance intentions corresponding to the two voice commands belong to the same control category, so it may be determined that this is a continuous situation.
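The category-based determination above can be sketched with a small lookup table. The intent names and categories come from the example of FIG. 4B as described. The per-category window values (30 and 10 seconds), the dictionary layout, and the function names are hypothetical, and combining the category check with the time check in one function is an assumption about how the two conditions might be applied together.

```python
from typing import Optional

# Sketch of control category table 42. The window values are invented
# for illustration; the specification gives no concrete numbers.
CONTROL_CATEGORY_TABLE = {
    "IoT": {"intents": {"TurnOnAC", "UpACTemp"}, "window_seconds": 30.0},
    "TV": {"intents": {"TurnOnTV", "UpTVCH"}, "window_seconds": 10.0},
}


def category_of(intent: str) -> Optional[str]:
    """Return the control category an utterance intention is mapped to, if any."""
    for category, entry in CONTROL_CATEGORY_TABLE.items():
        if intent in entry["intents"]:
            return category
    return None


def is_continuous(prev_intent: str, next_intent: str, elapsed_seconds: float) -> bool:
    """True if both intents share a category and arrive within its preset time."""
    prev_category = category_of(prev_intent)
    if prev_category is None or prev_category != category_of(next_intent):
        return False
    return elapsed_seconds <= CONTROL_CATEGORY_TABLE[prev_category]["window_seconds"]


print(is_continuous("TurnOnAC", "UpACTemp", 5.0))  # same category, within window
print(is_continuous("TurnOnAC", "TurnOnTV", 5.0))  # different categories
```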

The storage unit 340 may further map prior state information of the external environment, preset for each utterance intent, into the control category table and store it. Alternatively, the storage unit 340 may store a separate mapping of preset prior state information of the external environment for each of a plurality of utterance intents.

For example, referring to FIG. 6, the storage unit 340 may store an external environment prior state table 60, in which preset prior state information of the external environment is mapped for each of a plurality of utterance intents, in order to determine whether consecutively received voice commands of the user form a continuous situation. For example, the prior state information 'air conditioner on' may be mapped to the utterance intent 'UpACTemp', corresponding to the voice command 'Raise the air conditioner temperature', and to the utterance intent 'TurnOffAC', corresponding to the voice command 'Turn off the air conditioner'.

Likewise, the prior state information 'TV on' may be mapped to the utterance intent 'UpTVCH', corresponding to the voice command 'Turn up the TV channel', and to the utterance intent 'TurnOffTV', corresponding to the voice command 'Turn off the TV'.

The external environment prior state table 60, which holds the preset prior state information mapped for each utterance intent, may be used to determine whether consecutive voice commands belong to a continuous situation.

For example, if the user utters the voice command 'Raise the air conditioner temperature' after the air conditioner has been turned on in response to the voice command 'Turn on the air conditioner', the external environment at the time of the utterance is in the state in which the air conditioner is on. This satisfies the prior state information mapped to the utterance intent ('UpACTemp') of 'Raise the air conditioner temperature', so the command is determined to be a continuous situation and the voice recognition service for it may be provided.
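The prior-state check in this example can be sketched as follows. The state labels, the dictionary layout, and the function name `satisfies_prior_state` are hypothetical; only the intent identifiers and their pairing with the 'on' states come from the description of FIG. 6.

```python
# Hypothetical layout of external environment prior state table 60:
# prior state of the environment -> utterance intents mapped to that state.
PRIOR_STATE_TABLE = {
    "air conditioner on": {"UpACTemp", "TurnOffAC"},
    "TV on": {"UpTVCH", "TurnOffTV"},
}


def satisfies_prior_state(intent, active_states):
    """True if the intent is mapped to at least one currently active prior state."""
    return any(
        intent in PRIOR_STATE_TABLE.get(state, set())
        for state in active_states
    )
```

With the air conditioner on, `satisfies_prior_state("UpACTemp", {"air conditioner on"})` holds, whereas a TV intent such as 'UpTVCH' would not pass while only the air conditioner is on.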

According to this embodiment, the voice recognition service providing apparatus 100 may start the voice recognition service without a call word when it receives a voice command whose utterance intent corresponds to an external environment state that is active according to the prior state information.

The receiving unit 300 may receive a first voice command from the user after the call word for activating the voice recognition service providing apparatus 100 has been recognized.

The receiving unit 300 may receive a second voice command from the user after the voice recognition service corresponding to the first voice command has been provided.

The determination unit 310 may determine whether the second voice command satisfies a continuous situation recognition condition. As described above, the continuous situation recognition condition may include a condition that the second voice command is received within a preset time. According to this embodiment, after the voice recognition service for a preceding voice command has been performed, the voice recognition service providing apparatus 100 may start the voice recognition service without a call word for the preset time.

The continuous situation recognition condition may also include a condition that the utterance intent of the second voice command is included in the control category to which the utterance intent of the first voice command belongs, and that the second voice command is received within the preset time set for that control category.

When the second voice command is received within the preset time, the determination unit 310 may determine that the second voice command is a continuous situation associated with the first voice command.

When the determination unit 310 determines that the utterance intent of the second voice command is included in the control category to which the utterance intent of the first voice command belongs, and that the second voice command was received within the preset time set for that control category, it may determine that the second voice command is a continuous situation associated with the first voice command.

For example, referring to FIG. 4B, suppose the second voice command 'Raise the air conditioner temperature' is received from the user right after the voice recognition service corresponding to the first voice command 'Turn on the air conditioner' has been provided (that is, the second voice command is received within the preset time after the service is provided). Because the utterance intent ('UpACTemp') of the second voice command belongs to the IoT control category, to which the utterance intent ('TurnOnAC') of the first voice command also belongs, the determination unit 310 may determine that the second voice command is a continuous situation associated with the first voice command.

In addition, when the second voice command 'Raise the air conditioner temperature', whose utterance intent belongs to the same IoT control category as the utterance intent ('TurnOnAC') of the first voice command 'Turn on the air conditioner', is received within a preset first time (e.g., 5 seconds), the determination unit 310 may determine that the second voice command is a continuous situation associated with the first voice command.

When the utterance intent of the second voice command is not included in the control category to which the utterance intent of the first voice command belongs, the determination unit 310 may determine that the second voice command is unrelated to the first voice command.

For example, referring to FIG. 4B, suppose the second voice command 'Turn on the TV' is received from the user right after the voice recognition service corresponding to the first voice command 'Turn on the air conditioner' has been provided. Because the utterance intent ('TurnOnTV') of the second voice command does not belong to the IoT control category, to which the utterance intent ('TurnOnAC') of the first voice command belongs, the determination unit 310 may determine that 'Turn on the TV' is not continuous with the first voice command.

When the utterance intent of the second voice command is included in the control category to which the utterance intent of the first voice command belongs, but the second voice command is not received within the preset time set for that control category, the determination unit 310 may determine that the second voice command is unrelated to the first voice command.

For example, referring to FIG. 4B, the utterance intent ('UpACTemp') of the second voice command 'Raise the air conditioner temperature' belongs to the IoT control category, to which the utterance intent ('TurnOnAC') of the first voice command 'Turn on the air conditioner' also belongs. If, however, the second voice command is not received within the preset first time (e.g., 5 seconds) set for the IoT control category, the determination unit 310 may not determine that the second voice command is a continuous situation associated with the first voice command.
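The three cases above (same category within the preset time, different category, same category after the preset time) can be combined into one check. The following is a sketch under stated assumptions: the mapping tables, the function name `is_continuous`, and the use of timestamps in seconds are all hypothetical, the 5-second value is the example from the text, and treating the boundary as inclusive is an arbitrary choice.

```python
# Hypothetical intent -> control category mapping.
CATEGORY_OF_INTENT = {
    "TurnOnAC": "IoT control",
    "UpACTemp": "IoT control",
    "TurnOnTV": "TV control",
}

# Preset time per control category, in seconds. The 5-second value for the
# IoT control category is the example from the text; the TV value is assumed.
PRESET_TIME_SEC = {"IoT control": 5.0, "TV control": 5.0}


def is_continuous(first_intent, service_provided_at, second_intent, received_at):
    """Apply the category condition first, then the per-category time condition."""
    category = CATEGORY_OF_INTENT.get(first_intent)
    if category is None or CATEGORY_OF_INTENT.get(second_intent) != category:
        return False  # different (or unknown) control category
    elapsed = received_at - service_provided_at
    return 0.0 <= elapsed <= PRESET_TIME_SEC[category]
```

For instance, 'UpACTemp' received 3 seconds after the service for 'TurnOnAC' passes, while the same command received 8 seconds later, or 'TurnOnTV' at any time, does not.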

When the prior state information of the external environment preset for each utterance intent of the user includes a prior state associated with the utterance intent of the second voice command, the determination unit 310 may determine that the second voice command is a continuous situation associated with the first voice command.

For example, referring to FIG. 6, if the first voice command 'Turn on the air conditioner' is received and the air conditioner is turned on, the determination unit 310 recognizes 'air conditioner on' as the prior state information of the external environment. It may then recognize, without a call word, any voice command whose utterance intent is mapped to the prior state information 'air conditioner on'.

For example, if the second voice command 'Raise the air conditioner temperature' is received from the user after the air conditioner has been turned on in response to the first voice command 'Turn on the air conditioner', the utterance intent of the second voice command is one of the utterance intents mapped to the prior state information 'air conditioner on'. The determination unit 310 may therefore determine that the second voice command is a continuous situation associated with the first voice command.

According to this embodiment, when a voice command whose utterance intent is mapped to a specific prior state of the external environment is received while that prior state is active, the voice recognition service providing apparatus 100 may start the voice recognition service without a call word. In this case, there may be no particular restriction on when the subsequent voice command is received.

In one embodiment, the determination unit 310 may determine whether the second voice command belongs to a continuous situation associated with the first voice command by comparing the similarity (e.g., cosine similarity, Jaccard similarity, Euclidean similarity, Manhattan similarity) between a first sentence, obtained by converting the first voice command into text, and a second sentence, obtained by converting the second voice command into text. For example, when the similarity value between the first sentence and the second sentence exceeds a preset threshold, the determination unit 310 may determine that the second voice command is a continuous situation associated with the first voice command.
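As an illustration of the similarity comparison, the sketch below computes cosine and Jaccard similarity over whitespace-separated tokens and applies a threshold to the cosine value. The tokenization, the bag-of-words representation, the threshold of 0.3, and all function names are assumptions; the description does not specify how sentences are vectorized.

```python
import math
from collections import Counter


def cosine_similarity(first_sentence, second_sentence):
    """Cosine similarity between bag-of-words count vectors of two sentences."""
    a = Counter(first_sentence.lower().split())
    b = Counter(second_sentence.lower().split())
    dot = sum(a[token] * b[token] for token in a)
    norm = math.sqrt(sum(c * c for c in a.values())) * math.sqrt(
        sum(c * c for c in b.values())
    )
    return dot / norm if norm else 0.0


def jaccard_similarity(first_sentence, second_sentence):
    """Jaccard similarity between the token sets of two sentences."""
    a = set(first_sentence.lower().split())
    b = set(second_sentence.lower().split())
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def is_continuous_by_similarity(first_sentence, second_sentence, threshold=0.3):
    """True when the similarity value exceeds the (assumed) preset threshold."""
    return cosine_similarity(first_sentence, second_sentence) > threshold
```

With these definitions, 'turn on the air conditioner' and 'raise the air conditioner temperature' share three of their five tokens each, giving a cosine similarity of 0.6, while an unrelated utterance such as 'hello' scores 0.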

In another embodiment, when the user's second voice command is received after the voice recognition service corresponding to the user's first voice command has been provided, the determination unit 310 may determine whether the second voice command is a continuous situation associated with the first voice command based on a final continuous situation determination value. This value is the sum of a first continuous situation determination value, obtained for the second voice command through a customized personal conversation model dedicated to the user, and a second continuous situation determination value, obtained for the second voice command through an overall conversation model for unspecified users. For example, the final continuous situation determination value may be calculated through [Equation 1].

[Equation 1]

$$S_{\text{final}} = w_{\text{all}} \, S_{\text{all}} + w_{\text{personal}} \, S_{\text{personal}}$$

Here, $S_{\text{final}}$ is the final continuous situation determination value, $w_{\text{all}}$ is the weight of the overall conversation model, $S_{\text{all}}$ is the continuous situation determination value produced by the overall conversation model, $w_{\text{personal}}$ is the weight of the customized personal conversation model, and $S_{\text{personal}}$ is the continuous situation determination value produced by the customized personal model.
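The weighted combination defined for Equation 1 can be sketched numerically as below. The function and parameter names, the example weights, and the final threshold comparison are assumptions made for illustration; the description only states that the decision is based on the combined value.

```python
def final_continuity_value(overall_weight, overall_value,
                           personal_weight, personal_value):
    """Weighted sum of the overall-model and personal-model determination values."""
    return overall_weight * overall_value + personal_weight * personal_value


def is_continuous(final_value, threshold=0.5):
    """Assumed decision rule: continuous when the final value reaches a threshold."""
    return final_value >= threshold
```

For example, with an overall-model weight of 0.4 and value of 0.5, and a personal-model weight of 0.6 and value of 1.0, the final value is 0.4 × 0.5 + 0.6 × 1.0 = 0.8, which the assumed rule would treat as a continuous situation.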

In another embodiment, after the voice recognition service corresponding to the user's first voice command has been provided, an output unit (not shown) may light an LED of the voice recognition service providing apparatus 100 in a specific color to notify the user that a second voice command additionally received from the user is being treated as a continuation of the first voice command. This lets the user confirm that the voice recognition service providing apparatus 100 is recognizing a subsequent voice command associated with the preceding voice command within the continuous situation.

In addition, when the preset time (the time set for determining a continuous situation) elapses after a subsequent voice command associated with a preceding voice command has been recognized within a continuous situation and the corresponding voice recognition service has been provided, the output unit (not shown) may output, through the speaker of the voice recognition service providing apparatus 100, a sound effect (e.g., a beep) notifying the user that only the call word can now be recognized.

The providing unit 320 may provide the voice recognition service corresponding to the second voice command based on the determination result.

When the second voice command satisfies the continuous situation recognition condition, the providing unit 320 may provide the voice recognition service corresponding to the second voice command.

When the utterance intent of the second voice command is included in the control category to which the utterance intent of the first voice command belongs, and the second voice command is received within the preset time set for that control category, the providing unit 320 may provide the voice recognition service corresponding to the second voice command, which has been determined to be a continuous situation with the first voice command.

When the prior state information of the external environment preset for each utterance intent of the user includes a prior state associated with the utterance intent of the second voice command, the providing unit 320 may provide the voice recognition service corresponding to the second voice command, which has been determined to be a continuous situation with the first voice command.

For example, referring to FIG. 11A, suppose that after the voice recognition service corresponding to the first voice command 'Turn on the fan' (a function that switches the fan ON) has been provided, the second voice command 'Lower the fan speed' is additionally received from the user under the conditions described above. The providing unit 320 may then determine that the second voice command is a continuous situation associated with the first voice command and provide the corresponding voice recognition service (a function that lowers the fan's wind speed) without a call word.

When the second voice command does not satisfy the continuous situation recognition condition, the providing unit 320 ignores the second voice command and does not provide the corresponding voice recognition service. For example, referring to FIG. 11B, suppose that after the voice recognition service corresponding to the first voice command 'Turn on the fan' (a function that switches the fan ON) has been provided, the second voice command 'Hello' is received from the user while the conditions described above are not satisfied. The providing unit 320 may then determine that 'Hello' is not a continuous situation associated with the first voice command and ignore it.

The receiving unit 300 may receive feedback information from the user after the voice recognition service corresponding to the second voice command has been provided. In this case, second voice commands that the user's feedback information shows to be continuous situations, and those it shows not to be continuous situations, may be managed separately, and a similarity value with the preceding first voice command may be derived only for these separately managed second voice commands. The derived similarity value may be used to update the user's utterance intents mapped to the plurality of control categories.

For example, assume that after the voice recognition service for the first voice command 'Turn on the air conditioner' was provided, the voice recognition service for the second voice command 'Raise the air conditioner temperature', which the user additionally uttered, was also provided. If the user then utters to the voice recognition service providing apparatus 100 feedback information confirming that the second voice command was not an utterance continuing from the first voice command (e.g., 'Genie, I wasn't talking to you'), the receiving unit 300 may receive that feedback information.

The receiving unit 300 may also receive feedback information from the user when the second voice command was determined not to be a continuous situation and the corresponding voice recognition service was therefore not provided. For example, if, after the voice recognition service for the first voice command 'Turn on the air conditioner' has been provided, the voice recognition service providing apparatus 100 does not recognize the second voice command 'Turn on the TV' additionally uttered by the user, the receiving unit 300 may receive from the user feedback information including 'Genie, I was talking to you'.

The update unit 330 may update the user's utterance intents mapped to the plurality of control categories based on the user's feedback information.

For example, when no voice recognition service was provided for the second voice command and analysis of the feedback information received from the user shows that the second voice command was in fact a continuous situation associated with the first voice command, the update unit 330 may update the control category table 42 by merging the control category to which the utterance intent of the first voice command belongs with the control category to which the utterance intent of the second voice command belongs. Alternatively, when analysis of the user's feedback information shows that the second voice command was not a continuous situation associated with the first voice command, the update unit 330 may update the control category table 42 by splitting the control category to which the utterance intent of the first voice command belongs.

The update unit 330 may add a user-customized category to the plurality of control categories.

For example, referring to FIGS. 4B and 7A together, if, after the voice recognition service for the first voice command 701 'Turn on the air conditioner' has been provided, the voice recognition service providing apparatus 100 does not recognize the second voice command 703 'Turn up the TV channel' additionally uttered by the user (that is, continuous situation recognition fails), the update unit 330 may update the control category table 42 by adding to it a user-customized category that merges the IoT control category, to which the utterance intent of the first voice command 701 belongs, with the TV control category, to which the utterance intent of the second voice command 703 belongs.

For example, referring to FIGS. 4B and 7B together, suppose that after the voice recognition service for the first voice command 701 'Turn on the air conditioner' has been provided, the voice recognition service providing apparatus 100 recognized the second voice command 705 'Raise the air conditioner temperature' additionally uttered by the user and provided the voice recognition service for it, and that analysis of the user's feedback information shows this to have been a misrecognition of a continuous situation. The update unit 330 may then update the control category table 42 by adding to it a user-customized category obtained by splitting the IoT control category, to which the utterance intents of both the first voice command 701 and the second voice command 705 belong.
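The two feedback-driven updates described with reference to FIGS. 7A and 7B can be sketched as table operations. This is only one possible reading: the table layout, the function names, the category labels, and in particular the way a category is split (moving the second command's intent into a new user-customized category) are assumptions, since the description does not detail the split.

```python
# Hypothetical starting contents of control category table 42.
TABLE = {
    "IoT control": {"TurnOnAC", "UpACTemp"},
    "TV control": {"TurnOnTV", "UpTVCH"},
}


def shares_category(table, intent_a, intent_b):
    """True if at least one category contains both intents."""
    return any(intent_a in s and intent_b in s for s in table.values())


def add_merged_category(table, category_a, category_b, new_name):
    """Return a copy with a user-customized category uniting two categories."""
    updated = {name: set(intents) for name, intents in table.items()}
    updated[new_name] = updated[category_a] | updated[category_b]
    return updated


def split_out_intent(table, category, intent, new_name):
    """Return a copy where the intent is moved into its own user-customized category."""
    updated = {name: set(intents) for name, intents in table.items()}
    updated[category].discard(intent)
    updated[new_name] = {intent}
    return updated
```

After `add_merged_category(TABLE, "IoT control", "TV control", "user custom 1")`, 'TurnOnAC' and 'UpTVCH' share a category, so a later 'Turn up the TV channel' could pass the category condition. After `split_out_intent(TABLE, "IoT control", "UpACTemp", "user custom 2")`, 'TurnOnAC' and 'UpACTemp' no longer do. Both helpers return copies, leaving the original table unchanged.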

Those skilled in the art will readily understand that the receiving unit 300, the determination unit 310, the providing unit 320, the update unit 330, and the storage unit 340 may each be implemented separately, or that one or more of them may be implemented in integrated form.

FIG. 8 is a flowchart illustrating a method of providing a voice recognition service according to an embodiment of the present invention.

Referring to FIG. 8, in step S801 the voice recognition service providing apparatus 100 may receive a first voice command from the user after the call word has been recognized.

In step S803, the voice recognition service providing apparatus 100 may receive a second voice command from the user after the voice recognition service corresponding to the first voice command has been provided.

In step S805, the voice recognition service providing apparatus 100 may determine whether the second voice command satisfies the continuous situation recognition condition. Here, the continuous situation recognition condition may include a condition that the second voice command is received within a preset time.

In step S807, the voice recognition service providing apparatus 100 may provide the voice recognition service corresponding to the second voice command based on the determination result. For example, when the second voice command is received within the preset time, the voice recognition service providing apparatus 100 may determine that the second voice command is a continuous situation associated with the first voice command and provide the corresponding voice recognition service. If the second voice command is not received within the preset time, the voice recognition service providing apparatus 100 does not determine that the second voice command is a continuous situation associated with the first voice command and does not provide the corresponding voice recognition service.
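The flow of steps S801 to S807 can be sketched as a small session object that accepts a command either after the call word or within the preset time of the last provided service. The class name, method signature, timestamps in seconds, and the 5-second default are assumptions; restarting the window after each served command is also an assumption, suggested by the description of the beep that follows a served subsequent command.

```python
class VoiceSession:
    """Sketch of the time-window condition only (no intent or category checks)."""

    def __init__(self, preset_time_sec=5.0):
        self.preset_time_sec = preset_time_sec
        self.last_service_at = None  # time the most recent service was provided

    def handle(self, now, call_word_recognized=False):
        """Return True if the command received at `now` is served, False if ignored."""
        within_window = (
            self.last_service_at is not None
            and 0.0 <= now - self.last_service_at <= self.preset_time_sec
        )
        if call_word_recognized or within_window:
            self.last_service_at = now  # assumed: the window restarts here
            return True
        return False
```

A short usage example under these assumptions: a first command at t = 0 with the call word is served, a second command at t = 3 without the call word is served, and a third command at t = 20 without the call word is ignored.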

상술한 설명에서, 단계 S801 내지 S807은 본 발명의 구현예에 따라서, 추가적인 단계들로 더 분할되거나, 더 적은 단계들로 조합될 수 있다. 또한, 일부 단계는 필요에 따라 생략될 수도 있고, 단계 간의 순서가 변경될 수도 있다. In the above description, steps S801 to S807 may be further divided into additional steps or combined into fewer steps, according to an embodiment of the present invention. In addition, some steps may be omitted if necessary, and the order between the steps may be changed.

도 9는 본 발명의 일 실시예에 따른, 연속 상황을 판단하여 음성 인식 서비스를 제공하는 방법을 나타낸 흐름도이다.9 is a flowchart illustrating a method of providing a voice recognition service by determining a continuous situation according to an embodiment of the present invention.

도 9를 참조하면, 단계 S901에서 음성 인식 서비스 제공 장치(100)는 호출어가 인식된 후, 사용자로부터 제 1 음성 명령어를 수신할 수 있다. Referring to FIG. 9 , in step S901 , the apparatus 100 for providing a voice recognition service may receive a first voice command from a user after a call word is recognized.

단계 S903에서 음성 인식 서비스 제공 장치(100)는 제 1 음성 명령어에 대응하는 음성 인식 서비스를 제공할 수 있다. In step S903, the voice recognition service providing apparatus 100 may provide a voice recognition service corresponding to the first voice command.

단계 S905에서 음성 인식 서비스 제공 장치(100)는 사용자로부터 제 2 음성 명령어를 수신할 수 있다. In step S905, the apparatus 100 for providing a voice recognition service may receive a second voice command from the user.

단계 S907에서 음성 인식 서비스 제공 장치(100)는 제 2 음성 명령어의 발화 의도가 제 1 음성 명령어의 발화 의도에 속하는 제어 카테고리에 포함되고, 제 1 음성 명령어의 발화 의도가 속하는 제어 카테고리에 설정된 기설정된 시간 내에 제 2 음성 명령어가 입력된 경우, 제 2 음성 명령어를 제 1 음성 명령어와 연관된 연속 상황으로 판단할 수 있다. In step S907, the apparatus 100 for providing a voice recognition service determines that the utterance intention of the second voice command is included in the control category belonging to the utterance intention of the first voice command, and is set in the control category to which the utterance intention of the first voice command belongs. When the second voice command is input within the time, it may be determined that the second voice command is a continuous situation associated with the first voice command.

단계 S909에서 음성 인식 서비스 제공 장치(100)는 제 2 음성 명령어가 제 1 음성 명령어와 연관된 연속 상황으로 판단되면, 제 2 음성 명령어에 대응하는 음성 인식 서비스를 제공할 수 있다. If it is determined in step S909 that the second voice command is a continuous situation associated with the first voice command, the voice recognition service providing apparatus 100 may provide a voice recognition service corresponding to the second voice command.

In step S911, the voice recognition service providing apparatus 100 may receive, from the user, feedback information about the voice recognition service corresponding to the second voice command.

In step S913, the voice recognition service providing apparatus 100 may update the user's utterance intentions mapped to the plurality of control categories based on the user's feedback information.
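One way the update in step S913 could look is sketched below in Python. The description does not specify the form of the feedback, so the boolean `accepted` flag, the function name `apply_feedback`, and the add/remove semantics are assumptions made purely for illustration.

```python
from typing import Dict, Set


def apply_feedback(
    category_intents: Dict[str, Set[str]],
    category: str,
    intent: str,
    accepted: bool,
) -> None:
    """Update the category-to-intention mapping in place from one feedback item.

    Assumption: positive feedback maps the intention to the category, and
    negative feedback removes that mapping.
    """
    if accepted:
        # setdefault creates the category on first use, which also covers
        # the case of a newly added, user-customized category.
        category_intents.setdefault(category, set()).add(intent)
    else:
        # discard is a no-op when the intention is not mapped.
        category_intents.get(category, set()).discard(intent)
```

For example, starting from `{"tv_control": {"channel_up"}}`, positive feedback for `"volume_up"` in `"tv_control"` would extend that category's set, while negative feedback for `"channel_up"` would remove it.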

In the above description, steps S901 to S913 may be further divided into additional steps or combined into fewer steps, depending on the implementation of the present invention. In addition, some steps may be omitted as necessary, and the order of the steps may be changed.

FIG. 10 is a flowchart illustrating a method of providing a voice recognition service by determining a continuous situation, according to another embodiment of the present invention.

Referring to FIG. 10, in step S1001, the voice recognition service providing apparatus 100 may receive a first voice command from a user after a call word is recognized.

In step S1003, the voice recognition service providing apparatus 100 may provide a voice recognition service corresponding to the first voice command.

In step S1005, the voice recognition service providing apparatus 100 may receive a second voice command from the user.

In step S1007, the voice recognition service providing apparatus 100 may determine the second voice command to be a continuous situation associated with the first voice command when, among the prior state information of the external environment preset for each utterance intention of the user, there exists prior state information of the external environment associated with the utterance intention of the second voice command.
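One possible reading of the check in step S1007 is sketched below in Python. The table `PRIOR_STATES`, its keys and values, and the function `is_continuous_by_state` are hypothetical. The sketch also assumes that the registered prior state is compared against the currently observed state of the external environment, which this passage does not state explicitly.

```python
from typing import Dict, Mapping

# Hypothetical prior-state table: for each utterance intention, the state the
# external environment is expected to be in before the command makes sense.
PRIOR_STATES: Dict[str, Dict[str, str]] = {
    "volume_up": {"tv_power": "on"},
    "lights_off": {"light_power": "on"},
}


def is_continuous_by_state(second_intent: str, environment: Mapping[str, str]) -> bool:
    """Treat the second command as continuous if its prior state is satisfied."""
    expected = PRIOR_STATES.get(second_intent)
    if expected is None:
        # No prior state information is registered for this intention.
        return False
    # Assumption: every registered key must match the observed environment.
    return all(environment.get(key) == value for key, value in expected.items())
```

Under these assumptions, "volume up" is accepted without a call word only while the television is observed to be on.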

In step S1009, when the second voice command is determined to be a continuous situation associated with the first voice command, the voice recognition service providing apparatus 100 may provide a voice recognition service corresponding to the second voice command.

In step S1011, the voice recognition service providing apparatus 100 may receive, from the user, feedback information about the voice recognition service corresponding to the second voice command.

In step S1013, the voice recognition service providing apparatus 100 may update the user's utterance intentions mapped to the plurality of control categories based on the user's feedback information.

In the above description, steps S1001 to S1013 may be further divided into additional steps or combined into fewer steps, depending on the implementation of the present invention. In addition, some steps may be omitted as necessary, and the order of the steps may be changed.
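The flows of FIG. 9 and FIG. 10 share the same outer structure and differ only in how the continuous situation is determined. The following Python sketch illustrates that shared structure with a pluggable check. The class `VoiceSession`, the placeholder wake word, and the choice to ignore a non-continuous follow-up command are assumptions for illustration, not part of the disclosed method.

```python
from typing import Callable, List, Optional


class VoiceSession:
    """Serve a command after the call word, or without it if it is continuous."""

    def __init__(
        self,
        is_continuous: Callable[[str, str], bool],
        wake_word: str = "hey device",  # hypothetical call word
    ) -> None:
        self._is_continuous = is_continuous
        self._wake_word = wake_word
        self._awake = False
        self._last_command: Optional[str] = None
        self.served: List[str] = []

    def hear(self, utterance: str) -> bool:
        """Return True if the utterance was served as a voice command."""
        if utterance == self._wake_word:
            self._awake = True
            return False
        if self._awake:
            # First voice command after the call word.
            self._awake = False
            return self._serve(utterance)
        if self._last_command is not None and self._is_continuous(
            self._last_command, utterance
        ):
            # Follow-up command judged continuous: no call word needed.
            return self._serve(utterance)
        # Assumption: anything else is ignored until the call word is heard.
        return False

    def _serve(self, command: str) -> bool:
        self.served.append(command)
        self._last_command = command
        return True
```

Passing the check as a callable lets either the category-and-time condition or the prior-state condition be substituted without changing the surrounding flow.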

An embodiment of the present invention may also be implemented in the form of a recording medium containing computer-executable instructions, such as a program module executed by a computer. A computer-readable medium may be any available medium that can be accessed by a computer, and includes volatile and nonvolatile media as well as removable and non-removable media. The computer-readable medium may also include computer storage media. Computer storage media include volatile and nonvolatile, removable and non-removable media implemented by any method or technology for storing information such as computer-readable instructions, data structures, program modules, or other data.

The foregoing description of the present invention is provided for illustration, and those of ordinary skill in the art to which the present invention pertains will understand that it can easily be modified into other specific forms without changing the technical spirit or essential features of the present invention. The embodiments described above should therefore be understood as illustrative in all respects and not restrictive. For example, each component described as a single unit may be implemented in a distributed manner, and likewise components described as distributed may be implemented in a combined form.

The scope of the present invention is defined by the claims below rather than by the detailed description, and all changes or modifications derived from the meaning and scope of the claims and their equivalents should be construed as falling within the scope of the present invention.

100: voice recognition service providing apparatus
300: receiving unit
310: determination unit
320: providing unit
330: update unit
340: storage unit

Claims

An apparatus for providing a voice recognition service, comprising:
a receiving unit configured to receive a first voice command from a user after a call word is recognized;
a determination unit configured to determine, when a second voice command is received after a voice recognition service corresponding to the first voice command is provided, whether the second voice command satisfies a continuous situation recognition condition; and
a providing unit configured to provide a voice recognition service corresponding to the second voice command based on a result of the determination.

The voice recognition service providing server of claim 1,
further comprising a storage unit configured to map and store a plurality of utterance intentions for each of a plurality of control categories.

The voice recognition service providing server of claim 2,
wherein the continuous situation recognition condition includes a condition that the second voice command is received within a preset time, and
wherein the preset time is set differently for each of the plurality of control categories.

The voice recognition service providing apparatus of claim 3,
wherein the determination unit is configured to determine the second voice command to be a continuous situation associated with the first voice command when the second voice command is received within the preset time.

The voice recognition service providing apparatus of claim 3,
wherein the determination unit is configured to determine the second voice command to be a continuous situation associated with the first voice command when the utterance intention of the second voice command is included in a control category to which the utterance intention of the first voice command belongs, and the second voice command is input within the preset time set for the control category to which the utterance intention of the first voice command belongs.

The voice recognition service providing apparatus of claim 2,
wherein the storage unit further stores prior state information of an external environment preset for each utterance intention.

The voice recognition service providing apparatus of claim 7,
wherein the determination unit is configured to determine the second voice command to be a continuous situation associated with the first voice command when, among the prior state information of the external environment preset for each utterance intention, there exists prior state information of the external environment associated with the utterance intention of the second voice command.

The voice recognition service providing apparatus of claim 2,
wherein the receiving unit is configured to receive feedback information from the user after the voice recognition service corresponding to the second voice command is provided.

The voice recognition service providing apparatus of claim 8,
further comprising an update unit configured to update the user's utterance intentions mapped to the plurality of control categories based on the user's feedback information.

The voice recognition service providing apparatus of claim 9,
wherein the update unit is configured to add a user-customized category to the plurality of control categories.

A method of providing a voice recognition service, performed by a voice recognition service providing apparatus, the method comprising:
receiving a first voice command from a user after a call word is recognized;
receiving a second voice command from the user after a voice recognition service corresponding to the first voice command is provided;
determining whether the second voice command satisfies a continuous situation recognition condition; and
providing a voice recognition service corresponding to the second voice command based on a result of the determination.

The method of claim 11,
further comprising mapping and storing a plurality of utterance intentions for each of a plurality of control categories.

The method of claim 12,
wherein the continuous situation recognition condition includes a condition that the second voice command is received within a preset time, and
wherein the preset time is set differently for each of the plurality of control categories.

The method of claim 13,
wherein determining whether the second voice command satisfies the continuous situation recognition condition includes
determining the second voice command to be a continuous situation associated with the first voice command when the second voice command is received within the preset time.

The method of claim 13,
wherein determining whether the second voice command satisfies the continuous situation recognition condition includes
determining the second voice command to be a continuous situation associated with the first voice command when the utterance intention of the second voice command is included in a control category to which the utterance intention of the first voice command belongs, and the second voice command is input within the preset time set for the control category to which the utterance intention of the first voice command belongs.

The method of claim 12,
wherein the storing includes
further storing prior state information of an external environment preset for each utterance intention.

The method of claim 16,
wherein determining whether the second voice command satisfies the continuous situation recognition condition includes
determining the second voice command to be a continuous situation associated with the first voice command when, among the prior state information of the external environment preset for each utterance intention, there exists prior state information of the external environment associated with the utterance intention of the second voice command.

The method of claim 12,
further comprising receiving feedback information from the user after the voice recognition service corresponding to the second voice command is provided.

The method of claim 18,
further comprising updating the user's utterance intentions mapped to the plurality of control categories based on the user's feedback information.

A computer program stored in a computer-readable recording medium and comprising a sequence of instructions for providing a voice recognition service,
wherein the computer program, when executed by a computing device, causes the computing device to:
receive a first voice command from a user after a call word is recognized;
determine, when a second voice command is received after a voice recognition service corresponding to the first voice command is provided, whether the second voice command satisfies a continuous situation recognition condition; and
provide a voice recognition service corresponding to the second voice command based on a result of the determination.