KR20130041421A - Voice recognition multimodality system based on touch - Google Patents

Voice recognition multimodality system based on touch Download PDF

Info

Publication number
KR20130041421A
Authority
KR
South Korea
Prior art keywords
voice
touch
peripheral device
recognition
command
Prior art date
Application number
KR1020110105641A
Other languages
Korean (ko)
Inventor
허동필
서재암
박재우
배현철
윤헌중
Original Assignee
Hyundai Motor Company
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Hyundai Motor Company
Priority to KR1020110105641A priority Critical patent/KR20130041421A/en
Publication of KR20130041421A publication Critical patent/KR20130041421A/en

Links

Images

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/16Sound input; Sound output
    • GPHYSICS
    • G10MUSICAL INSTRUMENTS; ACOUSTICS
    • G10LSPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00Speech recognition
    • GPHYSICS
    • G10MUSICAL INSTRUMENTS; ACOUSTICS
    • G10LSPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00Speech recognition
    • G10L15/26Speech to text systems

Abstract

PURPOSE: A touch-based voice recognition multi-modality system is provided to reduce the inconvenience of learning manipulation methods by making voice recognition usable through simple operations. CONSTITUTION: When a user touches a peripheral device(500) in a vehicle, a touch recognition unit(100) recognizes the touch. When a voice command is inputted through a voice input unit while the touch is maintained, a voice command processing unit(300) recognizes the voice command as one operating the touched peripheral device and delivers a control command to that peripheral device. [Reference numerals] (100) Touch recognition unit; (200) Voice input unit; (310) Touch information confirmation unit; (320) Voice analysis unit; (340) Control module; (AA) Voice recognition; (BB) Articulation verification; (CC) Vehicle communication module; (DD) Door module; (EE) Air-conditioning; (FF) Others;

Description

Touch-based Speech Recognition Multimodality System {VOICE RECOGNITION MULTIMODALITY SYSTEM BASED ON TOUCH}

The present invention relates to a touch-based voice recognition multimodality system in which, when a user wants to control a peripheral device using a voice command, the user directly touches that peripheral device while speaking, so that the voice command is recognized as limited to the touched peripheral device and the system can be used conveniently.

In-vehicle voice recognition technology allows various peripheral devices that provide safety and convenience to the user, such as the power windows, wipers, emergency lamps, air conditioner, and audio system, to be operated conveniently by the driver's voice alone, although it is not yet applicable to every field.

The voice recognition technology is configured such that, when the driver commands the operation of a peripheral device by voice, a voice command recognition device distinguishes the driver's voice command and transmits a control command to the corresponding peripheral device.

FIG. 1 briefly illustrates the process by which the above-described voice command recognition is performed.

In FIG. 1, the voice input step (1) inputs the voice to a microphone as an analog signal, and the preprocessing step (2) filters the input analog signal and performs A/D conversion to convert it into a digital code, a signal form suitable for control. The digital voice signal formed by the preprocessing is divided into predetermined frame units for the feature vector extraction step.

The feature vector extraction step (3) divides the digitized sound signal into frames and obtains a feature vector representing the characteristics of each frame, and the voice pattern classification step (4) finds, among the vector-quantized voice commands set in advance, the parameter string having the same characteristics as the feature vector of the currently input digital voice signal, thereby recognizing the voice command. The control step drives the device to be controlled according to the recognized voice command.
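The framing, feature extraction, and pattern classification steps above can be sketched as follows. This is a minimal illustration only: a toy frame-energy feature stands in for real spectral features, and the command templates and function names are hypothetical, not taken from the patent.

```python
import math

FRAME_SIZE = 4  # samples per frame (toy value for illustration)

def extract_features(signal):
    """Split the digitized signal into frames and compute one
    feature per frame (here: root-mean-square energy)."""
    frames = [signal[i:i + FRAME_SIZE]
              for i in range(0, len(signal) - FRAME_SIZE + 1, FRAME_SIZE)]
    return [math.sqrt(sum(s * s for s in f) / len(f)) for f in frames]

def classify(features, templates):
    """Find the stored command whose parameter string is closest to
    the feature vector of the current input (template matching)."""
    def distance(a, b):
        n = min(len(a), len(b))
        return math.sqrt(sum((a[i] - b[i]) ** 2 for i in range(n)))
    return min(templates, key=lambda cmd: distance(features, templates[cmd]))

# Toy pre-stored parameter strings for two hypothetical commands.
templates = {
    "volume up": [0.9, 0.8, 0.1],
    "wiper on":  [0.1, 0.7, 0.9],
}

signal = [0.9, -0.9, 0.9, -0.9, 0.8, -0.8, 0.8, -0.8, 0.1, -0.1, 0.1, -0.1]
print(classify(extract_features(signal), templates))  # volume up
```

A real system would of course use spectral features (e.g. cepstral coefficients) and trained acoustic models rather than raw frame energy, but the control flow is the same.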

However, the existing voice recognition technology has the problem that the commands for driving each peripheral device must all be input differently, so the driver has to memorize them, and when the same command applies to several peripheral devices the system is inconvenient to use; moreover, the voice recognition rate is also reduced by the use of identical commands.

It should be understood that the foregoing description of the background art is merely for the purpose of promoting an understanding of the background of the present invention and is not to be construed as an admission that the prior art is known to those skilled in the art.

The present invention has been proposed to solve these problems, and an object of the present invention is to provide a touch-based speech recognition multimodality system in which, when a user wants to control a specific peripheral device using a voice command, the user directly touches that peripheral device while speaking so that the voice command is recognized as limited to it, allowing peripheral devices to be operated easily and conveniently.

A touch-based speech recognition multi-modality system according to the present invention for achieving the above object includes: a touch recognition unit installed to recognize when the user touches any one of the peripheral devices in the vehicle; and a voice command processing unit which, when a voice command is input through the voice input means while the peripheral device is touched, recognizes the voice command as limited to the touched peripheral device, that is, as a command operating that device, and transmits a control command to it.

The touch recognition unit may be any one of a switch, a touch sensor, an infrared sensor, and a camera.

The voice command processing unit may include a touch information confirmation unit which receives the touch information of each peripheral device through the touch recognition unit, checks the touch information, and selects the corresponding peripheral device; a voice analysis unit which analyzes the voice command input through the voice input means and extracts the feature vector required for recognition; a voice recognition unit which performs speech recognition using the feature vector extracted by the voice analysis unit; and a control module which receives the speech recognition result, identifies the user's intention, and controls the peripheral device.

The voice recognition unit may include a command DB storing the voice commands corresponding to each peripheral device, compare the feature vector extracted from the voice command input through the voice input means against the voice patterns preset in the command DB for the peripheral device selected through the touch information confirmation unit, and output the result.

In addition, the speech recognition unit may include an utterance model, a phoneme model, a pronunciation dictionary, a grammar model, and a semantic model, and may search for words and sentences similar to the feature vector of the input voice command through these plural models to output recognition result candidates.

According to the touch-based voice recognition multi-modality system having the above-described structure, the voice recognition technology can be used through intuitive and simple operation, reducing the user's inconvenience in learning the manipulation method; by issuing the voice command while touching the peripheral device, the voice recognition rate can be increased, reducing complaints caused by device malfunction; and since the system can easily be applied to existing voice recognition technology, the economic burden is also reduced.

FIG. 1 is a block diagram schematically illustrating the voice command recognition method of a conventional vehicle voice command recognition apparatus;
FIG. 2 is a block diagram illustrating a touch-based speech recognition multimodality system according to an embodiment of the present invention;
FIG. 3 is a block diagram showing the speech recognition unit in a touch-based speech recognition multimodality system according to an embodiment of the present invention;
FIG. 4 is a block diagram showing another embodiment of the speech recognition unit in a touch-based speech recognition multi-modality system according to an embodiment of the present invention.

Hereinafter, a touch-based speech recognition multimodality system according to a preferred embodiment of the present invention will be described with reference to the accompanying drawings.

FIG. 2 is a block diagram illustrating a touch-based speech recognition multi-modality system according to an embodiment of the present invention. Compared to the conventional speech recognition method shown in FIG. 1, the system checks whether a peripheral device is touched and recognizes the voice command only for the touched peripheral device, so that the voice recognition technology can be used more easily and conveniently.

To this end, the system is configured to include a touch recognition unit 100 which recognizes when the peripheral device 500 is touched, and a voice command processing unit 300 which, when a voice command is input while the peripheral device 500 is touched, recognizes it as a voice command for operating the touched peripheral device and controls that device to operate.

The touch recognition unit 100 is installed to recognize when a user touches any one of the peripheral devices 500 in the vehicle, and may be a switch, a touch sensor, or an infrared sensor installed in each peripheral device. Alternatively, it may be a camera installed in a position (the vehicle ceiling or an interior part) from which it is possible to check whether a peripheral device is touched, without installing a sensor in each peripheral device. Here, the touch recognition unit 100 serves to recognize whether the user has touched the peripheral device, and is not limited to the sensors or the camera above.

The information that a touch has been recognized by the touch recognition unit 100 is transmitted to the voice command processing unit 300; based on the information received from the touch recognition unit 100, the voice command processing unit 300 recognizes an input voice command as directed only at that peripheral device and transmits a control command so that the peripheral device operates.
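The touch-gated flow described above can be sketched as follows. This is a minimal illustration; the class, method, and device names are hypothetical and not taken from the patent.

```python
class VoiceCommandProcessor:
    """Recognizes a voice command only against the vocabulary of the
    peripheral device currently reported as touched."""

    def __init__(self, command_db):
        # command_db maps device name -> set of commands valid for it
        self.command_db = command_db
        self.touched_device = None

    def on_touch(self, device):
        """Called by the touch recognition unit when a device is touched."""
        self.touched_device = device

    def on_voice_command(self, command):
        """Dispatch the command only if it belongs to the touched device."""
        if self.touched_device is None:
            return None  # no touch: the command is not acted upon
        if command in self.command_db.get(self.touched_device, set()):
            return (self.touched_device, command)  # control command sent
        return None  # command not valid for the touched device

command_db = {"AVN": {"volume up", "volume down"}, "window": {"open", "close"}}
proc = VoiceCommandProcessor(command_db)
proc.on_touch("AVN")
print(proc.on_voice_command("volume up"))  # ('AVN', 'volume up')
print(proc.on_voice_command("open"))       # None: not an AVN command
```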

The operating principle of the voice command processing unit 300 is similar to that of a general voice recognition device, but the voice command processing unit 300 according to the present invention includes a touch information confirmation unit 310, which receives the touch information of each peripheral device through the touch recognition unit 100, checks the touch information, and selects the corresponding peripheral device, and a voice recognition unit 330, which recognizes the input voice.

In addition, it includes a voice analysis unit 320, which analyzes the voice command input through the voice input means and extracts the feature vector required for voice recognition, and a control module 340, which receives the voice recognition result output through the voice recognition unit, grasps the user's intention, and controls the peripheral device.

The touch information confirmation unit 310 checks the touch information received through the touch recognition unit and guides the voice recognition unit 330 so that, when recognizing the voice, it recognizes the voice command only for the touched peripheral device.

FIG. 3 is a block diagram showing the voice recognition unit in a touch-based voice recognition multi-modality system according to an embodiment of the present invention. The voice recognition unit 330 includes a command DB 400 in which the voice commands corresponding to each peripheral device are pre-stored; when a voice command is input, it compares the command against the voice patterns preset in the voice command DB of the peripheral device selected by the touch information confirmation unit 310 and outputs the result.

The output voice recognition result is delivered to the control module 340 after utterance verification (reliability calculation).

At this time, voice commands corresponding to each peripheral device must be input into the command DB 400 in advance; the input voice commands are stored in the voice command DB of each peripheral device, and when a voice is to be recognized, the voice commands stored in the voice command DB of the peripheral device the user touched are compared with the voice command input through the voice input means to recognize the voice.
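The per-device command DB lookup can be illustrated as below. This is a sketch under the assumption of simple feature-vector templates; the device names, template values, and function names are hypothetical.

```python
import math

def recognize(features, command_db, touched_device):
    """Compare the input feature vector only against the voice-command
    templates stored for the touched peripheral device."""
    templates = command_db[touched_device]  # restrict the search space
    def dist(a, b):
        return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))
    return min(templates, key=lambda cmd: dist(features, templates[cmd]))

# Pre-stored templates, grouped per peripheral device.
command_db = {
    "AVN":    {"volume up": [0.9, 0.1], "volume down": [0.1, 0.9]},
    "window": {"open":      [0.9, 0.1], "close":       [0.1, 0.9]},
}

# The same acoustic pattern resolves per touched device, so the
# candidate set is smaller and cross-device collisions are avoided.
print(recognize([0.85, 0.15], command_db, "AVN"))     # volume up
print(recognize([0.85, 0.15], command_db, "window"))  # open
```

Restricting the comparison to one device's DB is what raises the recognition rate: fewer templates compete for the same input.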

FIG. 4 is a block diagram showing another embodiment of the speech recognition unit in the touch-based speech recognition multi-modality system according to an embodiment of the present invention. In this embodiment the speech recognition unit 330 includes a model storage unit 410 in which an utterance model, a phoneme model, a pronunciation dictionary, a grammar model, and a semantic model are stored; it searches for words and sentences similar to the feature vector of the input voice command through the plural models stored in the model storage unit and outputs recognition result candidates, and the output recognition result candidates are transmitted to the control module 340 as the final result after utterance verification (reliability calculation).

The voice recognition unit 330 searches for the voice command whose pattern has the greatest similarity to the input and outputs the result. When recognizing the feature vector of the input voice command, it generates and searches word models in word units using the utterance model, the phoneme model, and the pronunciation dictionary, searches and recognizes in sentence units using the grammar model and the semantic model, and calculates the reliability of the candidates to produce the recognition result, which is finally output to the control module 340.

At this time, if the reliability of a recognition result candidate is less than a threshold value, the final recognition result is naturally passed to the control module 340 after an error post-processing process for correcting errors in the recognized result.
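The reliability check described above can be sketched as follows. This is a minimal illustration; the 0-to-1 scoring scale, the threshold value, and the function names are assumptions, not specified by the patent.

```python
THRESHOLD = 0.7  # assumed reliability threshold on a 0.0-1.0 scale

def finalize(candidates, threshold=THRESHOLD):
    """Pick the candidate with the highest reliability; if it falls
    below the threshold, flag it for error post-processing first."""
    best, score = max(candidates, key=lambda c: c[1])
    if score < threshold:
        return {"result": best, "needs_postprocessing": True}
    return {"result": best, "needs_postprocessing": False}

print(finalize([("volume up", 0.92), ("volume down", 0.40)]))
# {'result': 'volume up', 'needs_postprocessing': False}
print(finalize([("parking", 0.55), ("pairing", 0.50)]))
# {'result': 'parking', 'needs_postprocessing': True}
```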

As an embodiment using the touch-based voice recognition multi-modality system described above, when the user's hand touches the audio video navigation system (AVN) and the voice command "voice up" is given, the voice command processing unit performs voice recognition limited to the touched AVN and then increases the volume; in another embodiment, when the voice command "parking" is given while the user touches the electronic parking brake (EPB), the parking operation is performed after voice recognition limited to the EPB.
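The two embodiments above can be traced end-to-end in a short sketch; the action mapping and names below are purely illustrative.

```python
# Map (touched device, recognized command) -> device action.
actions = {
    ("AVN", "voice up"): "increase volume",
    ("EPB", "parking"):  "engage parking brake",
}

def handle(touched_device, recognized_command):
    """Resolve a control action for the device the user is touching."""
    return actions.get((touched_device, recognized_command))

print(handle("AVN", "voice up"))  # increase volume
print(handle("EPB", "parking"))   # engage parking brake
print(handle("AVN", "parking"))   # None: "parking" is not an AVN command
```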

As described above, by performing voice recognition only for the peripheral device the user touched, the voice recognition rate can be increased, reducing the inconvenience and complaints, such as device malfunction, that arise when voice recognition targets all peripheral devices.

While the present invention has been particularly shown and described with reference to specific embodiments thereof, it will be apparent to those of ordinary skill in the art that various changes in form and details may be made therein without departing from the spirit and scope of the invention as defined by the following claims.

100: touch recognition unit 200: voice input means
300: voice command processing unit 310: touch information confirmation unit
320: speech analysis unit 330: speech recognition unit
340: control module 400: command DB
410: model storage unit 500: peripheral device

Claims (5)

A touch recognition unit installed to recognize when the user touches any one of the peripheral devices in the vehicle;
A voice command processing unit which, when a voice command is input through a voice input means while any one of the peripheral devices is touched, recognizes the voice command as limited to the touched peripheral device, that is, as a command operating that peripheral device, and transmits a control command to it;
A touch-based speech recognition multi-modality system comprising the above.
The system according to claim 1,
wherein the touch recognition unit is any one of a switch, a touch sensor, an infrared sensor, and a camera.
The system according to claim 1,
wherein the voice command processing unit comprises: a touch information confirmation unit which receives the touch information of each peripheral device through the touch recognition unit, checks the touch information, and selects the corresponding peripheral device; a voice analysis unit which analyzes the voice command input through the voice input means and extracts the feature vector required for voice recognition; a voice recognition unit which recognizes the voice using the feature vector extracted by the voice analysis unit; and a control module which receives the voice recognition result, determines the user's intention, and controls the peripheral device.
The system according to claim 3,
wherein the voice recognition unit includes a command DB storing the voice commands corresponding to each peripheral device, compares the feature vector extracted from the voice command input through the voice input means against the voice patterns preset in the command DB for the peripheral device selected through the touch information confirmation unit, and outputs the result.
The system according to claim 3,
wherein the speech recognition unit includes a model storage unit storing an utterance model, a phoneme model, a pronunciation dictionary, a grammar model, and a semantic model, and searches for words and sentences similar to the feature vector of the input voice command through the models stored in the model storage unit to output recognition result candidates.
KR1020110105641A 2011-10-17 2011-10-17 Voice recognition multimodality system based on touch KR20130041421A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
KR1020110105641A KR20130041421A (en) 2011-10-17 2011-10-17 Voice recognition multimodality system based on touch

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
KR1020110105641A KR20130041421A (en) 2011-10-17 2011-10-17 Voice recognition multimodality system based on touch

Publications (1)

Publication Number Publication Date
KR20130041421A true KR20130041421A (en) 2013-04-25

Family

ID=48440565

Family Applications (1)

Application Number Title Priority Date Filing Date
KR1020110105641A KR20130041421A (en) 2011-10-17 2011-10-17 Voice recognition multimodality system based on touch

Country Status (1)

Country Link
KR (1) KR20130041421A (en)

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
KR101601226B1 (en) * 2014-11-03 2016-03-08 현대자동차주식회사 Vehicle control system and a control method thereof
US9862340B2 (en) 2014-11-03 2018-01-09 Hyundai Motor Company Vehicle control system and control method thereof
KR20190041111A (en) * 2017-10-12 2019-04-22 현대자동차주식회사 Voice recognition of vehicle system for infering user intention and method for controlling thereof
KR20200050609A (en) 2018-11-02 2020-05-12 김종훈 Voice command based virtual touch input apparatus

Similar Documents

Publication Publication Date Title
US10854195B2 (en) Dialogue processing apparatus, a vehicle having same, and a dialogue processing method
KR101598948B1 (en) Speech recognition apparatus, vehicle having the same and speech recongition method
US9756161B2 (en) Voice recognition apparatus, vehicle having the same, and method of controlling the vehicle
US20150325240A1 (en) Method and system for speech input
US7634401B2 (en) Speech recognition method for determining missing speech
WO2015098109A1 (en) Speech recognition processing device, speech recognition processing method and display device
EP2562746A1 (en) Apparatus and method for recognizing voice by using lip image
CN104200805B (en) Driver's voice assistant
KR101579533B1 (en) Vehicle and controlling method for the same
CN104123939A (en) Substation inspection robot based voice interaction control method
JP2008268340A (en) Voice recognition device, voice recognition method, and program for voice recognition
JP2008058813A (en) Voice response system, and voice response program
CN105788596A (en) Speech recognition television control method and system
KR102192678B1 (en) Apparatus and method for normalizing input data of acoustic model, speech recognition apparatus
CN112346570A (en) Method and equipment for man-machine interaction based on voice and gestures
KR20130041421A (en) Voice recognition multimodality system based on touch
US10770070B2 (en) Voice recognition apparatus, vehicle including the same, and control method thereof
US7181396B2 (en) System and method for speech recognition utilizing a merged dictionary
JP2005234332A (en) Electronic equipment controller
JP2011203434A (en) Voice recognition device and voice recognition method
KR20140086302A (en) Apparatus and method for recognizing command using speech and gesture
JP2017081258A (en) Vehicle operation device
KR102417899B1 (en) Apparatus and method for recognizing voice of vehicle
CN109830239B (en) Speech processing device, speech recognition input system, and speech recognition input method
CN112331200A (en) Vehicle-mounted voice control method

Legal Events

Date Code Title Description
A201 Request for examination
E902 Notification of reason for refusal
E601 Decision to refuse application