KR102417899B1 - Apparatus and method for recognizing voice of vehicle

Info

Publication number: KR102417899B1
Application number: KR1020170153220A
Authority: KR
Inventors: 조재민; 김비호
Original assignee: Hyundai Motor Company (현대자동차주식회사); Kia Corporation (기아 주식회사)
Priority date: 2017-11-16
Filing date: 2017-11-16
Publication date: 2022-07-07
Also published as: KR20190056115A

Abstract

The voice recognition system of the present invention includes an input unit that receives a command from a user, and a control unit that includes a first voice recognition engine that recognizes a call command contained in the command and a second voice recognition engine that recognizes both the call command and a voice command, the control unit recognizing the command by driving the first and second voice recognition engines simultaneously. As a result, even when the call command and the voice command are input consecutively, the two are searched at the same time, which shortens the response time needed to execute the command.

Description

Vehicle voice recognition system and method {APPARATUS AND METHOD FOR RECOGNIZING VOICE OF VEHICLE}

The present invention relates to a voice recognition system and method for a vehicle, and more particularly to a vehicle voice recognition system and method in which voice recognition engines are arranged in parallel so that a call command and a voice command can be recognized at the same time.

Recently, voice recognition technology has been developed that lets users control devices conveniently by voice, according to their environment, instead of entering text or using shortcut keys on an input device.

Voice recognition technology automatically identifies the linguistic meaning and content of a user's utterance through a voice recognition device; specifically, it refers to the process of taking a voice signal as input and identifying and processing words or sentences.

This technology has been applied to vehicles so that some vehicle functions can be performed by voice. For example, controlling the power windows, wipers, hazard lamps, air conditioner, and audio system, requesting route guidance, and placing phone calls are implemented through voice recognition.

To this end, a conventional voice recognition device receives a call command to activate the voice recognition function and, once the function is active, receives a voice command and performs voice recognition on it.

However, because the device must detect silence after the call command every time, repeat the sequence of receiving the voice command, and transmit the received voice command to a server, the response time before the recognition result is returned becomes long. In addition, when the call command and the voice command are input consecutively with no silence between them, only the call command may be recognized while the voice command that follows it is not, so voice recognition fails.

Meanwhile, methods of recognizing voice in a vehicle using a call command have recently diversified, and improved performance is required of voice recognition devices that recognize call commands.

However, the recognition rate and the rejection rate of a call command are in a trade-off relationship: if the confidence score required for acceptance is minimized to raise the recognition rate, the rejection rate falls, and if the rejection rate is raised, the recognition rate falls. A technique that adjusts the recognition rate and the rejection rate so that the call command is recognized properly is therefore also required.
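The trade-off between recognition rate and rejection rate can be illustrated with a small sketch. It assumes, hypothetically, that each utterance receives a confidence score for the call command and that a single acceptance threshold is applied. The scores and the function name `rates` are made-up illustrations, not measurements or names from this disclosure.

```python
def rates(genuine_scores, impostor_scores, threshold):
    """Return (recognition_rate, rejection_rate) for one acceptance threshold.

    genuine_scores: confidence scores of utterances that really contain the call command.
    impostor_scores: confidence scores of utterances that do not.
    """
    recognition_rate = sum(s >= threshold for s in genuine_scores) / len(genuine_scores)
    rejection_rate = sum(s < threshold for s in impostor_scores) / len(impostor_scores)
    return recognition_rate, rejection_rate


# Toy scores, invented purely for illustration.
genuine = [0.9, 0.8, 0.7, 0.6]
impostor = [0.65, 0.5, 0.3, 0.2]

low_threshold = rates(genuine, impostor, 0.4)    # accepts more, rejects less
high_threshold = rates(genuine, impostor, 0.75)  # rejects more, accepts less
```

With these toy scores, lowering the threshold raises the recognition rate and lowers the rejection rate, and raising it does the opposite.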

The present invention is intended to overcome the limitations described above, and its object is to provide a vehicle voice recognition system and method that recognize the call command clearly and process the call command and the voice command at the same time.

The vehicle voice recognition system of the present invention includes an input unit that receives a command from a user, and a control unit that includes a first voice recognition engine that recognizes a call command contained in the command and a second voice recognition engine that recognizes the call command and a voice command, wherein, when the call command and the voice command are received consecutively, the control unit drives the first and second voice recognition engines simultaneously to recognize the command.

The second voice recognition engine recognizes the command based on a context that is constructed to cover both the case in which silence exists between the call command and the voice command and the case in which it does not.

The second voice recognition engine includes a conversion unit that converts graphemes of the context into phonemes.

The control unit determines that recognition of the command has failed when the recognition rate of the call command is below a threshold.

The control unit causes the voice command to be received when no additional voice has been received after the call command.

The control unit terminates the second voice recognition engine when no voice has been received after the call command.

The control unit determines that recognition of the command has failed if the call command is not included in the voice recognition results of the first and second voice recognition engines.

If the call command is included in the voice recognition results of the first and second voice recognition engines, the control unit deletes the call command and causes the voice command to be executed.

The vehicle voice recognition method of the present invention includes: receiving a command from a user; when a call command and a voice command are received consecutively, recognizing the command by simultaneously driving a first voice recognition engine that recognizes the call command and a second voice recognition engine that recognizes the call command and the voice command; determining whether the call command is included in the voice recognition result of the command; and deleting the call command from the voice recognition result and executing the voice command.

In the step of recognizing the command, the second voice recognition engine is driven based on a context constructed to cover both the case in which silence exists between the call command and the voice command and the case in which it does not.

In the step of recognizing the command, the command is recognized by converting graphemes of the context into phonemes.

After the step of recognizing the command, the method further performs: determining whether the recognition rate of the call command is equal to or greater than a threshold; and, if it is, determining whether any additional voice was received after the call command.

In the step of determining whether the recognition rate of the call command is equal to or greater than the threshold, recognition of the command is determined to have failed if the recognition rate is below the threshold.

In the step of determining whether any additional voice was received after the call command, the second voice recognition engine is terminated if no voice was received after the call command.

In the step of determining whether the call command is included in the voice recognition result, recognition of the command is determined to have failed if the call command is not included in the voice recognition results of the first and second voice recognition engines.

The vehicle voice recognition system and method of the present invention drive the first and second voice recognition engines so that the call command and the voice command are processed at the same time even when the user inputs them consecutively, which shortens the response time needed to execute the command.

In addition, the system and method allow voice recognition to succeed even when there is no silence between the call command and the voice command.

In addition, the system and method reject a call command whose recognition rate is low, which improves the recognition rate of the call command.

FIG. 1 is a diagram showing the vehicle voice recognition system of the present invention.
FIG. 2 is a diagram showing the configuration of the second voice recognition engine of the present invention.
FIG. 3 is a flowchart showing the vehicle voice recognition method of the present invention.

Hereinafter, some embodiments of the present invention are described in detail with reference to the exemplary drawings. In assigning reference numerals to the components of each drawing, the same components are given the same numerals wherever possible, even when they appear in different drawings. In describing the embodiments, detailed descriptions of related known configurations or functions are omitted where they would obscure understanding of the embodiments.

Terms such as first, second, A, B, (a), and (b) may be used in describing the components of the embodiments. These terms serve only to distinguish one component from another and do not limit the nature, sequence, or order of the components. Unless otherwise defined, all terms used herein, including technical and scientific terms, have the same meaning as commonly understood by a person of ordinary skill in the art to which the present invention belongs. Terms defined in commonly used dictionaries should be interpreted as having meanings consistent with their meaning in the context of the related art, and are not to be interpreted in an idealized or excessively formal sense unless expressly so defined in this application.

FIG. 1 is a diagram showing the vehicle voice recognition device of the present invention.

As shown in FIG. 1, the voice recognition device of the present invention may include an input unit 10, a control unit 20, a memory 50, and a communication unit 60.

The input unit 10 may receive a command from the user. According to an embodiment, the input unit 10 may include a microphone and may convert a command received through the microphone into an electrical signal. The command received by the input unit 10 may include natural language in which a call command and a voice command follow one another.

The control unit 20 may include a first voice recognition engine 30 and a second voice recognition engine 40. The first and second voice recognition engines may perform voice recognition using voice activity detection (VAD). In this approach, pre-processing is performed before recognition, the voice activity section is detected in the received audio, and recognition is then performed on that section. Because only the voice signal is passed to the recognizer, VAD prevents the recognizer from malfunctioning on noise that is not speech.
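A minimal sketch of energy-based voice activity detection, one simple way to find the voice activity section. The disclosure does not specify a VAD algorithm; the frame length, the energy threshold, and the function names here are assumptions chosen for illustration.

```python
from typing import List, Sequence, Tuple


def frame_energies(samples: Sequence[float], frame_len: int) -> List[float]:
    """Mean squared amplitude of each non-overlapping frame."""
    energies = []
    for start in range(0, len(samples) - frame_len + 1, frame_len):
        frame = samples[start:start + frame_len]
        energies.append(sum(x * x for x in frame) / frame_len)
    return energies


def speech_segments(samples: Sequence[float], frame_len: int = 4,
                    energy_threshold: float = 0.01) -> List[Tuple[int, int]]:
    """Return (start_sample, end_sample) spans whose frame energy exceeds the threshold."""
    segments = []
    seg_start = None
    energies = frame_energies(samples, frame_len)
    for i, energy in enumerate(energies):
        if energy > energy_threshold and seg_start is None:
            seg_start = i * frame_len            # speech begins at this frame
        elif energy <= energy_threshold and seg_start is not None:
            segments.append((seg_start, i * frame_len))  # speech ended before this frame
            seg_start = None
    if seg_start is not None:                    # speech ran to the end of the signal
        segments.append((seg_start, len(energies) * frame_len))
    return segments


# Synthetic signal: silence, a burst of "speech", then silence again.
signal = [0.0] * 8 + [0.5, -0.5] * 4 + [0.0] * 8
segments = speech_segments(signal)
```

Only the samples inside the returned spans would be handed to a recognizer, so frames that contain nothing but low-level noise never reach it.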

In the control unit 20, the first voice recognition engine 30 may recognize the call command, and the second voice recognition engine 40 may recognize the call command and the voice command. When the input unit 10 receives a command from the user, the control unit 20 may drive the first and second voice recognition engines so that the call command and the voice command are recognized at the same time.

Here, the call command may include a command for activating the voice recognition device, and the voice command may include a command that causes voice recognition to be performed over a network. More specifically, the voice command may include a command by which the user executes a given function. For example, the call command may include 'Hi Genesis', and the voice command may include 'How is the weather today?'.

The second voice recognition engine 40 may construct a context for recognizing the call command and the voice command, and may perform voice recognition using an acoustic model. The configuration of the second voice recognition engine 40 is described in more detail with reference to FIG. 2.

Referring to FIG. 2, the second voice recognition engine 40 may perform voice recognition using a context in which the call command and the voice command are combined, so that both are recognized. According to an embodiment, the context may be constructed to cover both the case in which there is silence between the call command and the voice command and the case in which there is none. That is, contexts of the form call command-silence-voice command and call command-voice command may be constructed.
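The two context forms can be sketched as follows. This is a minimal illustration that assumes a context is simply a list of tokens and that silence is represented by a hypothetical `<sil>` marker; the disclosure does not define a concrete data format, and `build_contexts` is a name introduced here.

```python
from typing import List

SILENCE = "<sil>"  # hypothetical token standing for a pause between the two commands


def build_contexts(call_command: str, voice_commands: List[str]) -> List[List[str]]:
    """For each voice command, emit one context with silence and one without."""
    contexts = []
    for voice_command in voice_commands:
        contexts.append([call_command, SILENCE, voice_command])  # call command-silence-voice command
        contexts.append([call_command, voice_command])           # call command-voice command
    return contexts


contexts = build_contexts("Hi Genesis", ["How is the weather today?"])
```

Listing both forms side by side is what lets a recognizer match an utterance whether or not the speaker pauses after the call command.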

The conversion unit 41 of the second voice recognition engine 40 converts the graphemes of the preconfigured context into phonemes, phoneme by phoneme. The text of the context can thus be converted into a pronunciation sequence for voice recognition.
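As a rough illustration of turning context text into a pronunciation sequence, the sketch below decomposes Hangul syllables into their constituent letters (jamo) using the standard Unicode arithmetic and drops the silent syllable-initial 'ㅇ'. This is a deliberate simplification and an assumption of this example: a real grapheme-to-phoneme converter also applies pronunciation rules such as liaison and assimilation, and the disclosure does not describe how the conversion unit 41 is implemented.

```python
from typing import List

# Jamo tables in Unicode order for precomposed Hangul syllables (U+AC00..U+D7A3).
LEADS = list("ㄱㄲㄴㄷㄸㄹㅁㅂㅃㅅㅆㅇㅈㅉㅊㅋㅌㅍㅎ")                 # 19 initial consonants
VOWELS = list("ㅏㅐㅑㅒㅓㅔㅕㅖㅗㅘㅙㅚㅛㅜㅝㅞㅟㅠㅡㅢㅣ")              # 21 vowels
TAILS = [""] + list("ㄱㄲㄳㄴㄵㄶㄷㄹㄺㄻㄼㄽㄾㄿㅀㅁㅂㅄㅅㅆㅇㅈㅊㅋㅌㅍㅎ")  # none + 27 finals


def to_phonemes(text: str) -> List[str]:
    """Simplified grapheme-to-phoneme step: split each syllable into jamo."""
    phonemes = []
    for ch in text:
        offset = ord(ch) - 0xAC00
        if not 0 <= offset < 11172:
            continue  # skip spaces and anything that is not a Hangul syllable
        lead, rest = divmod(offset, 21 * 28)
        vowel, tail = divmod(rest, 28)
        if LEADS[lead] != "ㅇ":  # syllable-initial ㅇ carries no sound
            phonemes.append(LEADS[lead])
        phonemes.append(VOWELS[vowel])
        if TAILS[tail]:
            phonemes.append(TAILS[tail])
    return phonemes


hi = to_phonemes("하이")    # the first word of the call command
psy = to_phonemes("싸이")   # a similar-sounding word
```

Under this simplification, '하이' ('Hi') and '싸이' ('Psy') differ only in their first phoneme (ㅎ versus ㅆ), which shows how close the two words are acoustically.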

The voice recognition unit 42 of the second voice recognition engine 40 may perform voice recognition using a pre-stored acoustic model. The acoustic model may be a phoneme-based probabilistic model trained on users' voices. For reference, the acoustic model may be designed from the data closest to the standard among voice data organized by timbre, vocal range, and the like. Acoustic models can therefore be distinguished by timbre or vocal range; for example, separate acoustic models may be generated for male and female timbre or vocal range.

The voice recognition unit 42 may recognize the voice received by the input unit 10 based on the acoustic model and the context in which the call command and the voice command are combined. However, the present invention is not limited thereto.

Referring again to FIG. 1, the control unit 20 drives the first and second voice recognition engines simultaneously so that the call command and the voice command are recognized at the same time, and the recognition results for both are therefore returned quickly. Conventionally, the recognition result for the voice command could be obtained only after the recognition result for the call command had been returned; in the voice recognition device of the present invention, the voice command is recognized at the same time as the call command, so the recognition result is returned sooner.
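Driving both engines at once can be sketched with a thread pool. The two engine functions below are hypothetical stubs that work on text instead of audio, and the names `call_command_engine`, `full_engine`, and `recognize_in_parallel` are assumptions of this example; the point is only that both tasks are submitted before either result is awaited.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Tuple

CALL_COMMAND = "hi genesis"


def call_command_engine(utterance: str) -> bool:
    """Stub for the first engine: reports whether the call command is present."""
    return utterance.lower().startswith(CALL_COMMAND)


def full_engine(utterance: str) -> str:
    """Stub for the second engine: 'recognizes' the call command plus the voice command."""
    return utterance.lower()


def recognize_in_parallel(utterance: str) -> Tuple[bool, str]:
    """Submit both engines together, then collect their results."""
    with ThreadPoolExecutor(max_workers=2) as pool:
        call_future = pool.submit(call_command_engine, utterance)
        full_future = pool.submit(full_engine, utterance)
        return call_future.result(), full_future.result()


result = recognize_in_parallel("Hi Genesis call Hong Gil-dong")
```

In a sequential design the second call would start only after the first returned; here the full transcription is already in progress while the call command is being checked.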

The control unit 20 determines whether the recognition rate of the call command is equal to or greater than a threshold; if it is below the threshold, the control unit determines that recognition of the call command has failed and stops voice recognition. Conventionally, when a user said 'I like Psy's songs' ('싸이 노래가 좋아'), the engine that recognizes the call command misrecognized 'Psy' ('싸이') as 'Hi' ('하이'), treated only the remainder, 'I like the song' ('노래가 좋아'), as the voice command, and transmitted it to the server.

By contrast, the first voice recognition engine of the present invention determines that the recognition rate of 'Psy' as a call command is below the threshold and does not recognize it as 'Hi'; it therefore determines that recognition of the call command has failed and may stop voice recognition. If the recognition rate of the call command is equal to or greater than the threshold, the control unit may determine whether any additional voice was received after the call command.

According to one embodiment, when the input unit 10 receives the utterance 'Hi Genesis, call Hong Gil-dong', the first voice recognition engine recognizes 'Hi Genesis' as the call command, and the control unit 20 may determine that additional voice was received after 'Hi Genesis'.

In this case, because the second voice recognition engine can recognize both the call command and the voice command, it can recognize 'Hi Genesis, call Hong Gil-dong' at the same time. Since the second voice recognition engine is designed to recognize the input even when there is no silence between the call command and the voice command, recognition is possible even if there is no silence between 'Hi Genesis' and 'call Hong Gil-dong'.

According to another embodiment, when the input unit 10 receives only 'Hi Genesis', the first voice recognition engine recognizes the call command, but the control unit may determine that no voice was received after it. In this case, the device may wait to receive a voice command and terminate the second voice recognition engine.

When the command received by the input unit 10 has been recognized by the first and second voice recognition engines, the control unit 20 may determine whether the call command is included in the recognition result (dictation result). It may also determine whether the recognition rate of the voice command is equal to or greater than a threshold. If the call command is not included in the recognition result, or the recognition rate of the voice command is below the threshold, the control unit may determine that voice recognition has failed.

On the other hand, when the control unit 20 determines that the call command is included in the recognition result or that the recognition rate of the voice command is equal to or greater than the threshold, it may delete the call command from the recognition result and display the command result. Here, the command result may mean the result of executing the voice command.
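Deleting the call command from the dictation result can be sketched as a prefix check followed by a slice. The sketch assumes the dictation result is plain text that begins with the call command; `extract_voice_command` is a hypothetical helper, not a function named in the disclosure.

```python
from typing import Optional


def extract_voice_command(dictation: str, call_command: str = "hi genesis") -> Optional[str]:
    """Return the voice command with the call command removed, or None if it is absent."""
    text = dictation.strip()
    if not text.lower().startswith(call_command):
        return None  # call command missing: treated as a recognition failure
    # Drop the call command and any separator left between it and the voice command.
    return text[len(call_command):].strip(" ,-")


command = extract_voice_command("Hi Genesis, call Hong Gil-dong")
missing = extract_voice_command("I like Psy's songs")
```

Only the remaining text would be passed on for execution, so the call command never reaches the function that carries out the request.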

The memory 50 may store the context composed of combinations of the call command and voice commands, and may store the acoustic model designed from users' voices. The memory 50 may include a storage medium of the flash memory type, hard disk type, multimedia card micro type, or card type (for example, SD or XD memory), or RAM (Random Access Memory), SRAM (Static Random Access Memory), ROM (Read-Only Memory), EEPROM (Electrically Erasable Programmable Read-Only Memory), PROM (Programmable Read-Only Memory), magnetic memory, a magnetic disk, or an optical disk.

The communication unit 60 may exchange various data with external devices over a wireless or wired communication network. A wireless communication network is a network over which signals containing data can be exchanged wirelessly; for example, it may include a 3G network, a 4G network, or a Bluetooth network. A wired communication network is a network over which signals containing data can be exchanged by wire; for example, it may include PCI (Peripheral Component Interconnect), PCI Express, or USB (Universal Serial Bus).

The communication unit 60 may transmit the user's voice command, or an analysis result for the voice command, to a server over the communication network, and may receive the corresponding processing result from the server. However, the present invention is not limited thereto. Here, the server may include a voice recognition server and may be configured to search a large vocabulary.

For reference, the communication unit 60 may deliver the input voice command itself to the voice recognition server, or may deliver various analysis results derived from it, such as a waveform or a phoneme sequence. The server may then recognize the voice command on that basis and return the processing result of the voice recognition.

도 3은 본 발명의 음성인식방법을 나타낸 순서도이다.3 is a flowchart illustrating a voice recognition method of the present invention.

도 3에 도시된 바와 같이, 먼저 사용자로부터 명령어를 수신한다(S100). S100에서 수신되는 명령어는 호출명령어 및 음성명령어가 연속되는 자연어를 포함할 수 있다. 여기서 호출명령어는 음성인식장치를 활성화시키기 위한 명령어일 수 있으며, 음성명령어는 네트워크를 통해 서버에서 음성인식이 이루어지도록 하는 명령어를 포함할 수 있다. 보다 구체적으로 음성명령어는 사용자가 소정의 기능을 실행하기 위한 명령어를 포함할 수 있다.As shown in FIG. 3 , a command is first received from the user ( S100 ). The command received in S100 may include a natural language in which a call command word and a voice command word are continuous. Here, the call command may be a command for activating the voice recognition device, and the voice command may include a command for performing voice recognition in a server through a network. More specifically, the voice command word may include a command for the user to execute a predetermined function.

수신된 명령어에 대하여 호출명령어 및 음성명령어를 동시에 인식하기 위하여, 제1음성인식엔진 및 제2음성인식엔진을 동시에 구동한다(S110,S120). 제1음성인식엔진은 호출명령어를 인식할 수 있고, 제2음성인식엔진은 호출명령어 및 음성명령어를 인식할 수 있다. In order to simultaneously recognize a call command and a voice command with respect to the received command, the first voice recognition engine and the second voice recognition engine are simultaneously driven (S110 and S120). The first voice recognition engine may recognize a call command, and the second voice recognition engine may recognize a call command and a voice command.

호출명령어의 인식률이 임계치 이상인지 판단한다(S130). S130에서 호출명령어의 인식률이 임계치 미만(No)이면, 호출명령어의 인식실패로 판단하여 음성인식을 중단한다. 실시예에 따르면 입력부가 '싸이 노래가 좋아'를 수신한 경우, 본 발명의 제1음성인식엔진은 '싸이'를 호출명령어의 인식률이 임계치 미만인 것으로 판단하여 '싸이'를 호출명령어로 인식하지 않고, 호출명령어의 인식실패로 판단하여 음성인식을 중단할 수 있다. It is determined whether the recognition rate of the call command is greater than or equal to a threshold (S130). If the recognition rate of the call command is less than the threshold (No) in S130, it is determined that the call command has failed and the voice recognition is stopped. According to the embodiment, when the input unit receives 'I like Psy's song', the first voice recognition engine of the present invention determines that the recognition rate of 'Psy' is less than a threshold value and does not recognize 'Psy' as a call command. , it is determined that the recognition of the call command has failed, and the voice recognition can be stopped.

If the recognition rate of the call command is greater than or equal to the threshold (Yes) in S130, it is determined whether additional speech was received after the call command (S140). According to an embodiment, when the input unit receives 'Hi Genesis, call Hong Gil-dong', it may be determined that additional speech exists after the call command. If no additional speech was received after the call command, the system may proceed to receive a voice command and terminate S120.
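One way to picture the S140 check is to look for recognized words that begin after the call command ends. This is a sketch under stated assumptions: the word-level timestamps, the `additional_speech` helper, and the timing values are hypothetical, and the patent does not say how the presence of additional speech is detected.

```python
from typing import List, Tuple

# (word, start time in seconds, end time in seconds); an assumed format.
Word = Tuple[str, float, float]


def additional_speech(words: List[Word], call_end_s: float) -> List[str]:
    """S140: return the words that start at or after the end of the
    call command. An empty list means only the call command was spoken."""
    return [word for (word, start, _end) in words if start >= call_end_s]


utterance = [
    ("hi", 0.0, 0.2),
    ("genesis", 0.2, 0.7),
    ("call", 0.8, 1.0),
    ("hong", 1.0, 1.2),
    ("gil-dong", 1.2, 1.6),
]
trailing = additional_speech(utterance, call_end_s=0.7)
```

When the returned list is empty, the flow in the description falls back to waiting for a separate voice command and ends S120.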

It is determined whether the call command is included in the result (dictation result) of the voice recognition performed in S110 and S120 (S150). In step S150, it may also be determined whether the recognition rate of the voice command recognized in S120 is greater than or equal to a threshold.

If it is determined in S150 that the call command is included, or that the recognition rate of the voice command is greater than or equal to the threshold, the call command is deleted from the recognition result (S170). S170 may be understood as deleting the call command from the recognition result of the command received from the user so that only the voice command is executed.
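The inclusion check of S150 and the deletion of S170 can be sketched as simple string handling on the dictation result. The helper `strip_call_command` is hypothetical, it assumes the call command appears at the start of the dictation text, and it covers only the inclusion test, not the alternative condition on the recognition rate of the voice command.

```python
from typing import Optional


def strip_call_command(dictation: str, call_command: str) -> Optional[str]:
    """S150/S170: if the dictation result begins with the call command,
    delete it and return the remaining voice command; otherwise return
    None to signal that the call command was not found."""
    text = dictation.strip()
    if not text.lower().startswith(call_command.lower()):
        return None
    remainder = text[len(call_command):]
    # Drop separators left between the call command and the voice command.
    return remainder.lstrip(" ,.-")


voice_command = strip_call_command("Hi Genesis, call Hong Gil-dong", "hi genesis")
```

Only the returned remainder would then be passed on for execution, matching the statement that the command is executed for the voice command alone.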

Then the execution result of the command is displayed (S180). The execution result of the command may mean the execution result of the voice command.

The above description merely illustrates the technical idea of the present invention, and those of ordinary skill in the art to which the present invention pertains will be able to make various modifications and variations without departing from its essential characteristics.

Accordingly, the embodiments disclosed herein are intended to explain, not to limit, the technical idea of the present invention, and the scope of that technical idea is not limited by these embodiments. The scope of protection of the present invention should be construed according to the following claims, and all technical ideas within an equivalent scope should be construed as falling within the scope of the present invention.

Input unit 10
Control unit 20
First voice recognition engine 30
Second voice recognition engine 40
Memory 50
Communication unit 60

Claims

1. A voice recognition system for a vehicle, comprising:
an input unit for receiving, from a user, a command including natural language in which a call command and a voice command are continuous; and
a control unit including a first voice recognition engine for recognizing the call command included in the command and a second voice recognition engine for recognizing the call command and the voice command, the control unit being configured to recognize the command by simultaneously driving the first voice recognition engine and the second voice recognition engine when the natural language in which the call command and the voice command are continuous is received through the input unit.

2. The voice recognition system for a vehicle according to claim 1,
wherein the second voice recognition engine
recognizes the command based on a context configured to include both a case in which silence exists between the call command and the voice command and a case in which no silence exists.

3. The voice recognition system for a vehicle according to claim 2,
wherein the second voice recognition engine
includes a conversion unit for converting graphemes of the context into phonemes.

4. The voice recognition system for a vehicle according to claim 1,
wherein the control unit
determines that recognition of the command has failed when the recognition rate of the call command is less than a threshold value.

5. The voice recognition system for a vehicle according to claim 1,
wherein the control unit
causes the voice command to be received when no additional speech is received after the call command is received.

6. The voice recognition system for a vehicle according to claim 1,
wherein the control unit
terminates the driving of the second voice recognition engine when no speech is received after the call command is received.

7. The voice recognition system for a vehicle according to claim 1,
wherein the control unit
determines that recognition of the command has failed when the call command is not included in the voice recognition results of the first voice recognition engine and the second voice recognition engine.

8. The voice recognition system for a vehicle according to claim 1,
wherein the control unit
deletes the call command and executes the voice command when the call command is included in the voice recognition results of the first voice recognition engine and the second voice recognition engine.

9. A voice recognition method for a vehicle, comprising:
receiving, from a user, a command including natural language in which a call command and a voice command are continuous;
when the command including the natural language in which the call command and the voice command are continuous is received, recognizing the command by simultaneously driving a first voice recognition engine for recognizing the call command and a second voice recognition engine for recognizing the call command and the voice command;
determining whether the call command is included in the voice recognition result of the command; and
deleting the call command from the voice recognition result of the command and executing the voice command.

10. The method of claim 9,
wherein recognizing the command
comprises driving the second voice recognition engine based on a context configured to include both a case in which silence exists between the call command and the voice command and a case in which no silence exists.

11. The method of claim 10,
wherein recognizing the command
comprises converting graphemes of the context into phonemes.

12. The method of claim 9,
further comprising, after recognizing the command:
determining whether the recognition rate of the call command is greater than or equal to a threshold value; and
if the recognition rate of the call command is greater than or equal to the threshold value, determining whether additional speech was received after the call command.

13. The method of claim 12,
wherein, in the step of determining whether the recognition rate of the call command is greater than or equal to the threshold value,
recognition of the command is determined to have failed when the recognition rate of the call command is less than the threshold value.

14. The method of claim 12,
wherein, in the step of determining whether additional speech was received after the call command when the recognition rate of the call command is greater than or equal to the threshold value,
the driving of the second voice recognition engine is terminated when no speech is received after the call command is received.

15. The method of claim 9,
wherein, in the step of determining whether the call command is included in the voice recognition result of the command,
recognition of the command is determined to have failed when the call command is not included in the voice recognition results of the first voice recognition engine and the second voice recognition engine.