KR20130041421A - Voice recognition multimodality system based on touch - Google Patents

Voice recognition multimodality system based on touch Download PDF

Info

Publication number
KR20130041421A
Authority
KR
South Korea
Prior art keywords
voice
touch
peripheral device
recognition
command
Prior art date
Application number
KR1020110105641A
Other languages
Korean (ko)
Inventor
허동필
서재암
박재우
배현철
윤헌중
Original Assignee
Hyundai Motor Company
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Hyundai Motor Company
Priority to KR1020110105641A priority Critical patent/KR20130041421A/en
Publication of KR20130041421A publication Critical patent/KR20130041421A/en

Links

Images

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/16Sound input; Sound output
    • GPHYSICS
    • G10MUSICAL INSTRUMENTS; ACOUSTICS
    • G10LSPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00Speech recognition
    • GPHYSICS
    • G10MUSICAL INSTRUMENTS; ACOUSTICS
    • G10LSPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00Speech recognition
    • G10L15/26Speech to text systems

Abstract

PURPOSE: A touch-based voice recognition multi-modality system is provided to reduce the inconvenience of learning manipulation methods by making voice recognition usable through simple operations. CONSTITUTION: When a user touches a peripheral device(500) in a vehicle, a touch recognition unit(100) recognizes the touch. When a voice command is inputted through a voice input unit while the touch is maintained, a voice command processing unit(300) recognizes the voice command as one operating the touched peripheral device and delivers a control command to that peripheral device. [Reference numerals] (100) Touch recognition unit; (200) Voice input unit; (310) Touch information confirmation unit; (320) Voice analysis unit; (340) Control module; (AA) Voice recognition; (BB) Articulation verification; (CC) Vehicle communication module; (DD) Door module; (EE) Air-conditioning; (FF) Others;

Description

Touch-based Speech Recognition Multimodality System {VOICE RECOGNITION MULTIMODALITY SYSTEM BASED ON TOUCH}

The present invention relates to a touch-based voice recognition multimodality system in which, when a user wants to control a peripheral device using a voice command, the user directly touches that peripheral device while speaking, so that the voice command is recognized as limited to the touched peripheral device and the system can be used conveniently.

In-vehicle voice recognition technology allows various peripheral devices that provide safety and convenience to the user, such as the power windows, wipers, emergency lamps, air conditioner, and audio system, to be operated conveniently by the driver's voice alone, although it is not yet applicable to every field.

The voice recognition technology is configured such that, when the driver commands the operation of a peripheral device by voice, a voice command recognition device distinguishes the driver's voice command and transmits a control command to the corresponding peripheral device.

FIG. 1 briefly illustrates the process by which the above-described voice command recognition is performed.

In FIG. 1, the voice input step (1) inputs the voice to a microphone as an analog signal, and the preprocessing step (2) filters the input analog signal and performs A/D conversion to convert it into a digital code, a signal form suitable for control. The digital voice signal formed by the preprocessing is divided into predetermined frame units for the feature vector extraction step.

The feature vector extraction step (3) divides the digitized sound signal into frames and obtains a feature vector representing the characteristics of each frame, and the voice pattern classification step (4) finds, among the vector-quantized voice commands set in advance, the parameter string having the same characteristics as the feature vector of the currently input digital voice signal, thereby recognizing the voice command. The control step drives the device to be controlled according to the recognized voice command.
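The framing, feature extraction, and pattern classification steps above can be sketched as follows. This is a minimal illustration only: a toy frame-energy feature stands in for real spectral features, and the command templates and function names are hypothetical, not taken from the patent.

```python
import math

FRAME_SIZE = 4  # samples per frame (toy value for illustration)

def extract_features(signal):
    """Split the digitized signal into frames and compute one
    feature per frame (here: root-mean-square energy)."""
    frames = [signal[i:i + FRAME_SIZE]
              for i in range(0, len(signal) - FRAME_SIZE + 1, FRAME_SIZE)]
    return [math.sqrt(sum(s * s for s in f) / len(f)) for f in frames]

def classify(features, templates):
    """Find the stored command whose parameter string is closest to
    the feature vector of the current input (template matching)."""
    def distance(a, b):
        n = min(len(a), len(b))
        return math.sqrt(sum((a[i] - b[i]) ** 2 for i in range(n)))
    return min(templates, key=lambda cmd: distance(features, templates[cmd]))

# Toy pre-stored parameter strings for two hypothetical commands.
templates = {
    "volume up": [0.9, 0.8, 0.1],
    "wiper on":  [0.1, 0.7, 0.9],
}

signal = [0.9, -0.9, 0.9, -0.9, 0.8, -0.8, 0.8, -0.8, 0.1, -0.1, 0.1, -0.1]
print(classify(extract_features(signal), templates))  # volume up
```

A real system would of course use spectral features (e.g. cepstral coefficients) and trained acoustic models rather than raw frame energy, but the control flow is the same.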

However, the existing voice recognition technology has the problem that the commands for driving each peripheral device must all be input differently, so the driver has to memorize them, and when the same command applies to several peripheral devices the system is inconvenient to use; moreover, the voice recognition rate is also reduced by the use of identical commands.

It should be understood that the foregoing description of the background art is merely for the purpose of promoting an understanding of the background of the present invention and is not to be construed as an admission that the prior art is known to those skilled in the art.

The present invention has been proposed to solve these problems, and an object of the present invention is to provide a touch-based speech recognition multimodality system in which, when a user wants to control a specific peripheral device using a voice command, the user directly touches that peripheral device while speaking so that the voice command is recognized as limited to it, allowing peripheral devices to be operated easily and conveniently.

A touch-based speech recognition multi-modality system according to the present invention for achieving the above object includes: a touch recognition unit installed to recognize when the user touches any one of the peripheral devices in the vehicle; and a voice command processing unit which, when a voice command is input through the voice input means while the peripheral device is touched, recognizes the voice command as limited to the touched peripheral device, that is, as a command operating that device, and transmits a control command to it.

The touch recognition unit may be any one of a switch, a touch sensor, an infrared sensor, and a camera.

The voice command processing unit may include a touch information confirmation unit which receives the touch information of each peripheral device through the touch recognition unit, checks the touch information, and selects the corresponding peripheral device; a voice analysis unit which analyzes the voice command input through the voice input means and extracts the feature vector required for recognition; a voice recognition unit which performs speech recognition using the feature vector extracted by the voice analysis unit; and a control module which receives the speech recognition result, identifies the user's intention, and controls the peripheral device.

The voice recognition unit may include a command DB storing the voice commands corresponding to each peripheral device, compare the feature vector extracted from the voice command input through the voice input means against the voice patterns preset in the command DB for the peripheral device selected through the touch information confirmation unit, and output the result.

In addition, the speech recognition unit may include an utterance model, a phoneme model, a pronunciation dictionary, a grammar model, and a semantic model, and may search for words and sentences similar to the feature vector of the input voice command through these plural models to output recognition result candidates.

According to the touch-based voice recognition multi-modality system having the above-described structure, the voice recognition technology can be used through intuitive and simple operation, reducing the user's inconvenience in learning the manipulation method; by issuing the voice command while touching the peripheral device, the voice recognition rate can be increased, reducing complaints caused by device malfunction; and since the system can easily be applied to existing voice recognition technology, the economic burden is also reduced.

FIG. 1 is a block diagram schematically illustrating the voice command recognition method of a conventional vehicle voice command recognition apparatus;
FIG. 2 is a block diagram illustrating a touch-based speech recognition multimodality system according to an embodiment of the present invention;
FIG. 3 is a block diagram showing the speech recognition unit in a touch-based speech recognition multimodality system according to an embodiment of the present invention;
FIG. 4 is a block diagram showing another embodiment of the speech recognition unit in a touch-based speech recognition multi-modality system according to an embodiment of the present invention.

Hereinafter, a touch-based speech recognition multimodality system according to a preferred embodiment of the present invention will be described with reference to the accompanying drawings.

FIG. 2 is a block diagram illustrating a touch-based speech recognition multi-modality system according to an embodiment of the present invention. Compared to the conventional speech recognition method shown in FIG. 1, the system checks whether a peripheral device is touched and recognizes the voice command only for the touched peripheral device, so that the voice recognition technology can be used more easily and conveniently.

To this end, the system is configured to include a touch recognition unit 100 which recognizes when the peripheral device 500 is touched, and a voice command processing unit 300 which, when a voice command is input while the peripheral device 500 is touched, recognizes it as a voice command for operating the touched peripheral device and controls that device to operate.

The touch recognition unit 100 is installed to recognize when a user touches any one of the peripheral devices 500 in the vehicle, and may be a switch, a touch sensor, or an infrared sensor installed in each peripheral device. Alternatively, it may be a camera installed in a position (the vehicle ceiling or an interior part) from which it is possible to check whether a peripheral device is touched, without installing a sensor in each peripheral device. Here, the touch recognition unit 100 serves to recognize whether the user has touched the peripheral device, and is not limited to the sensors or the camera above.

The information that a touch has been recognized by the touch recognition unit 100 is transmitted to the voice command processing unit 300; based on the information received from the touch recognition unit 100, the voice command processing unit 300 recognizes an input voice command as directed only at that peripheral device and transmits a control command so that the peripheral device operates.
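The touch-gated flow described above can be sketched as follows. This is a minimal illustration; the class, method, and device names are hypothetical and not taken from the patent.

```python
class VoiceCommandProcessor:
    """Recognizes a voice command only against the vocabulary of the
    peripheral device currently reported as touched."""

    def __init__(self, command_db):
        # command_db maps device name -> set of commands valid for it
        self.command_db = command_db
        self.touched_device = None

    def on_touch(self, device):
        """Called by the touch recognition unit when a device is touched."""
        self.touched_device = device

    def on_voice_command(self, command):
        """Dispatch the command only if it belongs to the touched device."""
        if self.touched_device is None:
            return None  # no touch: the command is not acted upon
        if command in self.command_db.get(self.touched_device, set()):
            return (self.touched_device, command)  # control command sent
        return None  # command not valid for the touched device

command_db = {"AVN": {"volume up", "volume down"}, "window": {"open", "close"}}
proc = VoiceCommandProcessor(command_db)
proc.on_touch("AVN")
print(proc.on_voice_command("volume up"))  # ('AVN', 'volume up')
print(proc.on_voice_command("open"))       # None: not an AVN command
```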

The operating principle of the voice command processing unit 300 is similar to that of a general voice recognition device, but the voice command processing unit 300 according to the present invention includes a touch information confirmation unit 310, which receives the touch information of each peripheral device through the touch recognition unit 100, checks the touch information, and selects the corresponding peripheral device, and a voice recognition unit 330, which recognizes the input voice.

In addition, it includes a voice analysis unit 320, which analyzes the voice command input through the voice input means and extracts the feature vector required for voice recognition, and a control module 340, which receives the voice recognition result output through the voice recognition unit, grasps the user's intention, and controls the peripheral device.

The touch information confirmation unit 310 checks the touch information received through the touch recognition unit and guides the voice recognition unit 330 so that, when recognizing the voice, it recognizes the voice command only for the touched peripheral device.

FIG. 3 is a block diagram showing the voice recognition unit in a touch-based voice recognition multi-modality system according to an embodiment of the present invention. The voice recognition unit 330 includes a command DB 400 in which the voice commands corresponding to each peripheral device are pre-stored; when a voice command is input, it compares the command against the voice patterns preset in the voice command DB of the peripheral device selected by the touch information confirmation unit 310 and outputs the result.

The output voice recognition result is delivered to the control module 340 after utterance verification (reliability calculation).

At this time, voice commands corresponding to each peripheral device must be input into the command DB 400 in advance; the input voice commands are stored in the voice command DB of each peripheral device, and when a voice is to be recognized, the voice commands stored in the voice command DB of the peripheral device the user touched are compared with the voice command input through the voice input means to recognize the voice.
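The per-device command DB lookup can be illustrated as below. This is a sketch under the assumption of simple feature-vector templates; the device names, template values, and function names are hypothetical.

```python
import math

def recognize(features, command_db, touched_device):
    """Compare the input feature vector only against the voice-command
    templates stored for the touched peripheral device."""
    templates = command_db[touched_device]  # restrict the search space
    def dist(a, b):
        return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))
    return min(templates, key=lambda cmd: dist(features, templates[cmd]))

# Pre-stored templates, grouped per peripheral device.
command_db = {
    "AVN":    {"volume up": [0.9, 0.1], "volume down": [0.1, 0.9]},
    "window": {"open":      [0.9, 0.1], "close":       [0.1, 0.9]},
}

# The same acoustic pattern resolves per touched device, so the
# candidate set is smaller and cross-device collisions are avoided.
print(recognize([0.85, 0.15], command_db, "AVN"))     # volume up
print(recognize([0.85, 0.15], command_db, "window"))  # open
```

Restricting the comparison to one device's DB is what raises the recognition rate: fewer templates compete for the same input.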

FIG. 4 is a block diagram showing another embodiment of the speech recognition unit in the touch-based speech recognition multi-modality system according to an embodiment of the present invention. In this embodiment the speech recognition unit 330 includes a model storage unit 410 in which an utterance model, a phoneme model, a pronunciation dictionary, a grammar model, and a semantic model are stored; it searches for words and sentences similar to the feature vector of the input voice command through the plural models stored in the model storage unit and outputs recognition result candidates, and the output recognition result candidates are transmitted to the control module 340 as the final result after utterance verification (reliability calculation).

The voice recognition unit 330 searches for the voice command whose pattern has the greatest similarity to the input and outputs the result. When recognizing the feature vector of the input voice command, it generates and searches word models in word units using the utterance model, the phoneme model, and the pronunciation dictionary, searches and recognizes in sentence units using the grammar model and the semantic model, and calculates the reliability of the candidates to produce the recognition result, which is finally output to the control module 340.

At this time, if the reliability of a recognition result candidate is less than a threshold value, the final recognition result is naturally passed to the control module 340 after an error post-processing process for correcting errors in the recognized result.
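The reliability check described above can be sketched as follows. This is a minimal illustration; the 0-to-1 scoring scale, the threshold value, and the function names are assumptions, not specified by the patent.

```python
THRESHOLD = 0.7  # assumed reliability threshold on a 0.0-1.0 scale

def finalize(candidates, threshold=THRESHOLD):
    """Pick the candidate with the highest reliability; if it falls
    below the threshold, flag it for error post-processing first."""
    best, score = max(candidates, key=lambda c: c[1])
    if score < threshold:
        return {"result": best, "needs_postprocessing": True}
    return {"result": best, "needs_postprocessing": False}

print(finalize([("volume up", 0.92), ("volume down", 0.40)]))
# {'result': 'volume up', 'needs_postprocessing': False}
print(finalize([("parking", 0.55), ("pairing", 0.50)]))
# {'result': 'parking', 'needs_postprocessing': True}
```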

As an embodiment using the touch-based voice recognition multi-modality system described above, when the user's hand touches the audio video navigation system (AVN) and the voice command "voice up" is given, the voice command processing unit performs voice recognition limited to the touched AVN and then increases the volume; in another embodiment, when the voice command "parking" is given while the user touches the electronic parking brake (EPB), the parking operation is performed after voice recognition limited to the EPB.
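The two embodiments above can be traced end-to-end in a short sketch; the action mapping and names below are purely illustrative.

```python
# Map (touched device, recognized command) -> device action.
actions = {
    ("AVN", "voice up"): "increase volume",
    ("EPB", "parking"):  "engage parking brake",
}

def handle(touched_device, recognized_command):
    """Resolve a control action for the device the user is touching."""
    return actions.get((touched_device, recognized_command))

print(handle("AVN", "voice up"))  # increase volume
print(handle("EPB", "parking"))   # engage parking brake
print(handle("AVN", "parking"))   # None: "parking" is not an AVN command
```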

As described above, by performing voice recognition only for the peripheral device the user touched, the voice recognition rate can be increased, reducing the inconvenience and complaints, such as device malfunction, that arise when voice recognition targets all peripheral devices.

While the present invention has been particularly shown and described with reference to specific embodiments thereof, it will be apparent to those of ordinary skill in the art that various changes in form and details may be made therein without departing from the spirit and scope of the invention as defined by the following claims.

100: touch recognition unit 200: voice input means
300: voice command processing unit 310: touch information confirmation unit
320: speech analysis unit 330: speech recognition unit
340: control module 400: command DB
410: model storage unit 500: peripheral device

Claims (5)

A touch recognition unit installed to recognize when the user touches any one of the peripheral devices in the vehicle;
A voice command processing unit which, when a voice command is input through a voice input means while any one of the peripheral devices is touched, recognizes the voice command as limited to the touched peripheral device, that is, as a command operating that peripheral device, and transmits a control command to it;
A touch-based speech recognition multi-modality system comprising the above.
The system according to claim 1,
wherein the touch recognition unit is any one of a switch, a touch sensor, an infrared sensor, and a camera.
The system according to claim 1,
wherein the voice command processing unit comprises: a touch information confirmation unit which receives the touch information of each peripheral device through the touch recognition unit, checks the touch information, and selects the corresponding peripheral device; a voice analysis unit which analyzes the voice command input through the voice input means and extracts the feature vector required for voice recognition; a voice recognition unit which recognizes the voice using the feature vector extracted by the voice analysis unit; and a control module which receives the voice recognition result, determines the user's intention, and controls the peripheral device.
The system according to claim 3,
wherein the voice recognition unit includes a command DB storing the voice commands corresponding to each peripheral device, compares the feature vector extracted from the voice command input through the voice input means against the voice patterns preset in the command DB for the peripheral device selected through the touch information confirmation unit, and outputs the result.
The system according to claim 3,
wherein the speech recognition unit includes a model storage unit storing an utterance model, a phoneme model, a pronunciation dictionary, a grammar model, and a semantic model, and searches for words and sentences similar to the feature vector of the input voice command through the models stored in the model storage unit to output recognition result candidates.
KR1020110105641A 2011-10-17 2011-10-17 Voice recognition multimodality system based on touch KR20130041421A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
KR1020110105641A KR20130041421A (en) 2011-10-17 2011-10-17 Voice recognition multimodality system based on touch

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
KR1020110105641A KR20130041421A (en) 2011-10-17 2011-10-17 Voice recognition multimodality system based on touch

Publications (1)

Publication Number Publication Date
KR20130041421A true KR20130041421A (en) 2013-04-25

Family

ID=48440565

Family Applications (1)

Application Number Title Priority Date Filing Date
KR1020110105641A KR20130041421A (en) 2011-10-17 2011-10-17 Voice recognition multimodality system based on touch

Country Status (1)

Country Link
KR (1) KR20130041421A (en)

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
KR101601226B1 (en) * 2014-11-03 2016-03-08 현대자동차주식회사 Vehicle control system and a control method thereof
US9862340B2 (en) 2014-11-03 2018-01-09 Hyundai Motor Company Vehicle control system and control method thereof
KR20190041111A (en) * 2017-10-12 2019-04-22 현대자동차주식회사 Voice recognition of vehicle system for infering user intention and method for controlling thereof
KR20200050609A (en) 2018-11-02 2020-05-12 김종훈 Voice command based virtual touch input apparatus

Similar Documents

Publication Publication Date Title
US10854195B2 (en) Dialogue processing apparatus, a vehicle having same, and a dialogue processing method
KR101598948B1 (en) Speech recognition apparatus, vehicle having the same and speech recongition method
US9756161B2 (en) Voice recognition apparatus, vehicle having the same, and method of controlling the vehicle
US20150325240A1 (en) Method and system for speech input
US7634401B2 (en) Speech recognition method for determining missing speech
WO2015098109A1 (en) Speech recognition processing device, speech recognition processing method and display device
EP2562746A1 (en) Apparatus and method for recognizing voice by using lip image
CN104200805B (en) Driver's voice assistant
KR101579533B1 (en) Vehicle and controlling method for the same
CN104123939A (en) Substation inspection robot based voice interaction control method
JP2008268340A (en) Voice recognition device, voice recognition method, and program for voice recognition
JP2008058813A (en) Voice response system, and voice response program
CN105788596A (en) Speech recognition television control method and system
KR102192678B1 (en) Apparatus and method for normalizing input data of acoustic model, speech recognition apparatus
CN112346570A (en) Method and equipment for man-machine interaction based on voice and gestures
KR20130041421A (en) Voice recognition multimodality system based on touch
US10770070B2 (en) Voice recognition apparatus, vehicle including the same, and control method thereof
US7181396B2 (en) System and method for speech recognition utilizing a merged dictionary
JP2005234332A (en) Electronic equipment controller
JP2011203434A (en) Voice recognition device and voice recognition method
KR20140086302A (en) Apparatus and method for recognizing command using speech and gesture
JP2017081258A (en) Vehicle operation device
KR102417899B1 (en) Apparatus and method for recognizing voice of vehicle
CN109830239B (en) Speech processing device, speech recognition input system, and speech recognition input method
CN112331200A (en) Vehicle-mounted voice control method

Legal Events

Date Code Title Description
A201 Request for examination
E902 Notification of reason for refusal
E601 Decision to refuse application