CN102332265A - Method for improving voice recognition rate of automobile voice control system

Info

Publication number: CN102332265A
Application number: CN201110164289A
Authority: CN
Inventors: 张方伟; 邓健; 陈冰; 朱祝阳; 丁武俊; 熊想涛; 陈文强; 潘之杰; 赵福全
Original assignee: Zhejiang Geely Holding Group Co Ltd; Zhejiang Geely Automobile Research Institute Co Ltd
Current assignee: Zhejiang Geely Holding Group Co Ltd; Zhejiang Geely Automobile Research Institute Co Ltd
Priority date: 2011-06-20
Filing date: 2011-06-20
Publication date: 2012-01-25
Anticipated expiration: 2031-06-20
Also published as: CN102332265B

Abstract

The invention discloses a method for improving the voice recognition rate of an automobile voice control system. The method comprises the following steps: (1) configuring a voice prompt module for the automobile voice control system, whose voice prompt signal applies time-sharing control to the entertainment signal played by the automobile sound system; (2) configuring the voice recognition module of the automobile voice control system with a hierarchical voice instruction process that uses level-by-level input and level-by-level recognition; and (3) configuring the voice recognition module with a robust recognition process for the hierarchical voice instructions. The method mitigates the pronunciation differences that lower the recognition rate of an automobile voice recognition system, so that the expected recognition rate is maintained and the reliability and stability of recognition are improved. The method can also be applied to the voice control systems for electrical appliances in automobiles of different grades.

Description

A method for improving the voice recognition rate of an automobile voice control system

Technical field

The invention belongs to the technical field of automotive electrical control and relates to the speech recognition systems used for automobile voice control. In particular, it relates to a method for improving the voice recognition rate of an automobile voice control system.

Background technology

At present, efforts to raise the recognition rate of speech recognition systems tend to focus only on algorithms or hardware. With the rapid expansion of the automobile market, the population of car buyers keeps growing and its makeup has changed greatly: from prosperous cities to the vast countryside, from white-collar employees to ordinary laborers, and from the highly educated to those with only compulsory schooling, all of these consumer groups expect the same level of automotive performance. Many car owners, however, do not speak with standard pronunciation. China is a vast, multi-ethnic country. Leaving aside minority languages and the many local dialects, even Mandarin differs noticeably between large regions: northern and southern Mandarin are both coloured by local accents, and even Beijing speech, with its heavy colloquial pronunciation, differs considerably from standard Mandarin. The differences are most pronounced in retroflex sounds and in front and back nasal finals. This is an unavoidable key factor affecting the recognition rate of automobile speech recognition systems. How to raise the recognition rate by several methods at once, beyond improvements to algorithms or hardware, is a question that automobile manufacturers in China and worldwide follow closely.

One example of prior-art speech recognition is the invention patent with application number 200810103679.1, entitled "Vehicle electrical apparatus voice control method based on a command word list". That method comprises: generating a voice vocabulary and a noise vocabulary; recording the driver's speech with a speech recognition engine; merging the voice vocabulary and the noise vocabulary and loading the merged vocabulary into the speech recognition engine; selecting from the voice vocabulary the word whose pronunciation is closest to the driver's speech; recognizing the driver's intention from the word output by the speech recognition engine, the recognition result being a determined appliance control command; and converting that command into a CAN bus control signal that is output on the CAN bus to control the appliance. Although this control method suppresses noise in the driving environment well, it has the following shortcomings: (1) the driver's pronunciation still affects the recognition rate of the automobile speech recognition system; (2) it cannot recognize the voices of passengers other than the driver; (3) because it only selects the word in the voice vocabulary closest to the driver's speech, the reliability of recognizing the driver's intention is not substantially improved.

Some automobile brands already on the market use voice control to make the vehicle perform certain actions, such as playing music or switching on a reading lamp. Automobile voice control systems continue to develop rapidly; one prior-art example is the patent "Automobile entertainment voice control system", application number 200920315418. The reliability and the recognition rate of the speech recognition system have always been the core technical problems of automobile voice control systems in the prior art. Recognizing and confirming the keywords of a voice instruction is itself problematic. Taking an air conditioner as the voice-controlled appliance, the two basic keywords are "open air conditioner" and "close air conditioner". These two keywords share two identical characters, which greatly reduces the reliability of recognition and can even cause a wrong operation.

Summary of the invention

The object of the invention is to overcome the deficiencies of the prior art by providing a method for improving the recognition rate of an automobile speech recognition system. The method mitigates the pronunciation differences that affect the recognition rate, can recognize the voices of passengers other than the driver, and raises the recognition rate of the automobile speech recognition system.

The object of the invention is achieved by the following technical solution.

A method for improving the recognition rate of an automobile speech recognition system, comprising the following steps:

Step 1: configure a voice prompt module for the automobile voice control system. The voice prompt module shares the playback components of the automobile sound system, and the voice prompt signal it outputs applies time-sharing control to the entertainment signal played by the sound system.

Step 2: configure the voice recognition module of the automobile voice control system with a hierarchical voice instruction process that uses level-by-level input and level-by-level recognition. Dividing the voice instructions into levels keeps identical or similar-sounding words within the same level to a minimum, which greatly improves recognition reliability.

Step 3: configure the voice recognition module of the automobile voice control system with a robust recognition process for the hierarchical voice instructions, so that the module maintains the expected recognition rate even when pronunciation is inaccurate, which greatly improves the recognition rate of the automobile voice control system.

In the described method, the hierarchical voice instruction process of step 2 comprises the following steps:

(1) start the automobile voice control system with a button or by non-button means;

(2) the voice prompt module outputs the voice prompt signal "Welcome, please use a first-level voice instruction", including "Mytip welcomes you", and the prompt plays the content of the first-level keyword group through the playback components of the sound system;

(3) the voice control system collects the first-level keyword voice signal uttered by a vehicle occupant;

(4) the voice recognition module performs speech recognition on the first-level keyword voice signal to confirm the first-level voice instruction; if the result is "no", the received speech is not a first-level voice instruction and the process returns to (3) to wait for a first-level keyword from an occupant; if the result is "yes", the received speech is a first-level voice instruction and the process continues with (5);

(5) according to which first-level keyword was recognized, the voice prompt module outputs the corresponding second-level prompt, such as "Second-level voice instruction, available commands: open, stop or close" or "Available commands: raise, stop or lower", and the prompt plays the corresponding content of the second-level keyword group through the playback components of the sound system;

(6) the voice control system collects the second-level keyword voice signal uttered by a vehicle occupant;

(7) the voice recognition module performs speech recognition on the second-level keyword to confirm the second-level voice instruction; if the result is "no", the received speech is not a second-level voice instruction and the process returns to (6) to wait for a second-level keyword from an occupant; if the result is "yes", the received speech is a second-level voice instruction and the process continues with (8);

(8) the voice control system outputs a control signal formed by combining the first-level and second-level voice instructions, and the system control module carries out the control of the corresponding voice-controlled appliance;

(9) the process ends.

Dividing the voice instructions into levels keeps identical or similar-sounding words within the same level to a minimum, which greatly improves recognition reliability.
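By way of illustration only, the two-level flow of steps (1) to (9) can be sketched in Python as follows. The vocabularies, the function name `run_dialogue` and the use of already-recognized text in place of audio are assumptions made for this sketch and are not taken from the patent.

```python
# Hypothetical vocabularies: the patent names objects and actions
# but does not prescribe this data layout.
FIRST_LEVEL = {"air conditioner", "reading lamp", "window"}
SECOND_LEVEL = {
    "air conditioner": {"open", "stop", "close"},
    "reading lamp": {"open", "close"},
    "window": {"raise", "stop", "lower"},
}


def run_dialogue(utterances):
    """Consume recognized utterances in order; return (object, action) or None."""
    stream = iter(utterances)

    # Steps (3)-(4): keep collecting until a first-level keyword is confirmed.
    target = None
    for heard in stream:
        if heard in FIRST_LEVEL:
            target = heard
            break
    if target is None:
        return None

    # Steps (6)-(7): keep collecting until a second-level keyword that is
    # valid for the confirmed object is heard.
    for heard in stream:
        if heard in SECOND_LEVEL[target]:
            # Step (8): the pair stands in for the combined control signal.
            return (target, heard)
    return None
```

In this sketch an utterance that is not a keyword of the current level is simply skipped, which mirrors the "no" branches that return to steps (3) and (6).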

In the described method, the robust recognition method for hierarchical voice instructions comprises the following steps:

(1) start the voice control system;

(2) initialize the voice control system:

1) define and build a database of robust, non-exact pinyin models comparable to the exact pinyin models of the voice instructions, called the voice instruction robust pinyin model database for short. Taking into account the differences between accented colloquial speech and standard pronunciation for common voice instructions, the database is built on the exact pinyin models derived from the standard pronunciation of the instruction keywords. Its goal is for the voice recognition module to maintain the expected recognition rate even when instructions are pronounced inaccurately, thereby improving the reliability and stability of recognition by the voice control system;

2) determine the non-exact proximity criteria of the robust pinyin model database, including:

a. the pinyin of a retroflex sound and the pinyin of the corresponding non-retroflex sound are judged to be close;

b. the pinyin of a front nasal final and the pinyin of the corresponding non-front-nasal final are judged to be close;

c. the pinyin of a back nasal final and the pinyin of the corresponding non-back-nasal final are judged to be close;

3) define the first-level keyword group of the first-level voice instructions, and construct the exact pinyin model of the standard pronunciation of each keyword in the group together with its robust pinyin models;

build the voice instruction first-level keyword library (first-level keyword library for short) containing the first-level keyword group, in which the pinyin model of each first-level keyword comprises one exact pinyin model and several close robust pinyin models;

4) define the second-level keyword group of the second-level voice instructions, and construct the exact pinyin model of the standard pronunciation of each keyword in the group together with its robust pinyin models;

build the voice instruction second-level keyword library (second-level keyword library for short) containing the second-level keyword group, in which the pinyin model of each second-level keyword comprises one exact pinyin model and several close robust pinyin models;

(3) the voice recognition module receives the first-level voice instruction uttered by the speaker;

(4) the voice recognition module first calls the first-level keyword library and performs robust matching of the voice instruction: the speech is compared with the one exact pinyin model and the several close robust pinyin models of each first-level keyword, and a match with any of them (logical OR) is judged as "matched"; if the result is "no", return to step (3); if the result is "yes", go to step (5);

(5) output the keyword code of the matched first-level voice instruction;

(6) the voice recognition module receives the second-level voice instruction uttered by the speaker;

(7) the voice recognition module then calls the second-level keyword library and performs robust matching of the voice instruction: the speech is compared with the one exact pinyin model and the several close robust pinyin models of each second-level keyword, and a match with any of them (logical OR) is judged as "matched"; if the result is "no", return to step (6); if the result is "yes", go to step (8);

(8) output the action keyword code of the matched second-level voice instruction;

that is, after the first-level match the voice recognition module calls the second-level keyword library for robust matching, compares the speech with the one exact pinyin model and the several close robust pinyin models of each second-level keyword, and, if any of them matches (logical OR), judges the result as "matched" and outputs the code value of that second-level keyword;

(9) combine the matched first-level and second-level voice instruction keywords: the code value of the first-level keyword and the code value of the second-level keyword together form the combined voice instruction code;

(10) output the control signal of the matched combined voice instruction code;

(11) the process ends.

When a voice instruction contains a word with a retroflex sound or with a front or back nasal final, close non-standard pronunciations are admitted alongside the standard one. For example, the standard pronunciation of "raise" is "sheng", a retroflex syllable with a back nasal final, and its close non-standard pronunciations include the non-retroflex "sen" and the non-retroflex "seng". Provided these non-standard pronunciations cannot be confused with other voice instructions, the standard "sheng" and the non-standard "sen" and "seng" are grouped under the same voice instruction. Proceeding in the same way yields an automobile voice instruction database that accommodates both northern and southern Mandarin. Each voice instruction in this database is a set of several similar-sounding words. The speech recognition system extracts the words of the set, tries them one by one, and combines the individual results with a logical OR, which greatly improves the recognition rate of the speech recognition system.
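By way of illustration only, the following Python sketch builds such a database and applies the logical OR over each set. The function names, the dictionary layout and the "stop" entry ("ting"/"tin") are assumptions for this sketch; only "sheng", "sen" and "seng" come from the text above. A variant is dropped when it could be confused with another instruction, which is one possible reading of the no-confusion condition.

```python
def build_database(candidates):
    """candidates maps keyword -> (standard_pinyin, [variant, ...]).

    Returns keyword -> set of accepted pinyin strings. A variant is kept
    only if it is not the standard pronunciation of any keyword and is
    not proposed by more than one keyword.
    """
    standards = {std for std, _ in candidates.values()}
    claims = {}
    for keyword, (_, variants) in candidates.items():
        for variant in variants:
            claims.setdefault(variant, set()).add(keyword)

    database = {}
    for keyword, (std, variants) in candidates.items():
        kept = {
            v for v in variants
            if v not in standards and len(claims[v]) == 1
        }
        database[keyword] = {std} | kept
    return database


def recognize(heard, database):
    """Return the keyword whose set contains the heard pinyin (logical OR)."""
    for keyword, models in database.items():
        if any(heard == model for model in models):
            return keyword
    return None


# Hypothetical example data.
DATABASE = build_database({
    "raise": ("sheng", ["sen", "seng"]),
    "stop": ("ting", ["tin"]),
})
```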

In the described method, the time-sharing control of step 1 consists in using the voice prompt signal output by the voice prompt module to generate a time-sharing control signal, which applies time-shared enable control to the entertainment signal of the sound system.
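By way of illustration only, time-shared enable control can be sketched in Python as follows. The patent does not specify how the control signal is generated; this sketch assumes that the control signal is simply "a prompt sample is present in this time slot" and that the entertainment signal is disabled during those slots. The function name and the slot-based representation are hypothetical.

```python
def mix_output(prompt_slots, entertainment_slots):
    """Share one playback channel between prompt and entertainment.

    prompt_slots holds a prompt sample or None for each time slot;
    entertainment_slots holds the entertainment sample for each slot.
    """
    output = []
    for prompt, music in zip(prompt_slots, entertainment_slots):
        prompt_active = prompt is not None  # assumed time-sharing control signal
        # The entertainment signal is enabled only while no prompt plays.
        output.append(prompt if prompt_active else music)
    return output
```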

In the described method, the first-level voice instructions comprise the names of the controlled objects, namely DVD audio, air conditioner, reading lamp, window, sunroof, rearview mirror, trunk and others, and a code is assigned to each first-level voice instruction.

In the described method, the second-level voice instructions comprise the action keywords that match the controlled object names of the first-level voice instructions, and a code is assigned to each second-level voice instruction.

In the described method, the first-level voice instruction code is associated with the second-level voice instruction code, and the combined first- and second-level voice instruction code is formed by combining the two codes; it is used to carry out the corresponding control of the corresponding appliance.

In the described method, an action keyword of the second-level voice instructions is allowed to match several first-level voice instructions.
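By way of illustration only, the combination of the two codes can be sketched in Python as follows. The numeric code values and the bit layout (object code in the high nibble, action code in the low nibble) are assumptions for this sketch; the patent states only that the two codes are combined.

```python
# Hypothetical code assignments.
OBJECT_CODES = {
    "DVD audio": 0x1,
    "air conditioner": 0x2,
    "reading lamp": 0x3,
    "window": 0x4,
}
ACTION_CODES = {"open": 0x1, "stop": 0x2, "close": 0x3}


def combine(object_name, action_name):
    """Pack the first-level and second-level codes into one instruction code."""
    return (OBJECT_CODES[object_name] << 4) | ACTION_CODES[action_name]


def split(code):
    """Recover (object_code, action_code) from a combined instruction code."""
    return code >> 4, code & 0xF
```

Because the action code is independent of the object code, one action keyword such as "open" can be paired with several objects without extra vocabulary.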

In the described method, the action keywords of the second-level voice instructions comprise: "open" and its synonyms (turn on, open, start, play, answer, navigate); "stop" and its synonyms (stop, halt, break); and "close" and its synonyms (close, shut, exit, disconnect).
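By way of illustration only, mapping a spoken synonym to its canonical action can be sketched in Python as follows. The English glosses, the table name and the function name are assumptions for this sketch; the actual keywords are Chinese words.

```python
# Hypothetical English glosses of the three synonym groups.
ACTION_SYNONYMS = {
    "open": {"open", "turn on", "start", "play", "answer", "navigate"},
    "stop": {"stop", "halt", "break"},
    "close": {"close", "shut", "exit", "disconnect"},
}


def canonical_action(word):
    """Return the canonical action for a spoken word, or None if unknown."""
    for action, synonyms in ACTION_SYNONYMS.items():
        if word in synonyms:
            return action
    return None
```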

In view of the differences between accented colloquial speech and standard pronunciation of the voice instructions, the voice instruction robust pinyin model database is built on the exact pinyin models derived from the standard pronunciation of the instruction keywords. Its goal is for the voice recognition module to maintain the expected recognition rate even when instructions are pronounced inaccurately, thereby improving the reliability and stability of recognition by the voice control system.

Substantial effects of the invention:

1. The method effectively mitigates the pronunciation differences that affect the recognition rate of an automobile speech recognition system.

2. The method uses the voice instruction robust pinyin model database as the comparison samples for recognition, so that the voice recognition module copes with the pronunciation differences that affect the recognition rate, maintains the expected recognition rate, and improves the reliability and stability of the automobile speech recognition system.

3. The method allows several occupants, including the driver, to issue voice instructions for recognition, which meets the practical needs of the broad population of car users.

4. The method can be applied to the appliance voice control systems of automobiles of every grade.

Description of drawings

Fig. 1 is a logic flow chart of the hierarchical voice instruction method of the invention.

Fig. 2 is a logic flow chart of the robust recognition method for hierarchical voice instructions of the invention.

Fig. 3 is a flow chart of the hierarchical voice instruction method in an embodiment of the invention.

Fig. 4 is a flow chart of the robust recognition method for hierarchical voice control instructions in an embodiment of the invention.

Embodiment

The technical solution of the invention is described in further detail below by way of embodiments and with reference to the accompanying drawings.

An automobile speech recognition system usually comprises a voice sensor, a voice recognition module, a command execution module and a voice prompt module. The method of the invention for improving the recognition rate builds on such a prior-art automobile speech recognition system.

The hierarchical voice instruction process of the invention is shown in Fig. 1. It comprises the following steps:

S101: start the voice control system with a button;

S102: the voice prompt module of the voice control system issues the voice prompt "Welcome, please use a first-level voice instruction" followed by the first-level keyword group (the keyword group contains the names of all voice-controlled objects of the automobile);

S103: the voice control system collects, through the voice sensor, the first-level keyword voice signal uttered by a vehicle occupant;

S104: the voice recognition module recognizes and confirms the first-level keyword of the voice instruction; if the result is "no", return to S103; if the result is "yes", go to S105;

S105: the voice prompt module of the voice control system issues a further voice prompt, "Please use a second-level voice instruction", followed by the second-level keyword group (the keyword group contains the control action names of all voice-controlled objects of the automobile);

S106: the voice control system continues to collect, through the voice sensor, the second-level keyword voice signal uttered by a vehicle occupant;

S107: the voice recognition module recognizes and confirms the second-level keyword of the voice instruction; if the result is "no", return to S106; if the result is "yes", go to S108;

S108: the voice control system combines the first-level and second-level voice instructions and outputs the control signal word to the command execution module;

S109: the process ends.

The process of the robust recognition method for hierarchical voice instructions of the invention is shown in Fig. 2. It comprises the following steps:

S201: start the voice control system with a button;

S202: initialize the voice control system, which comprises S203 to S206:

S203: define and build the voice instruction robust pinyin model database;

S204: determine the non-exact proximity criteria of the robust pinyin model database;

S205: define the first-level keyword group of the first-level voice instructions and build the voice instruction first-level keyword library;

S206: define the second-level keyword group of the second-level voice instructions and build the voice instruction second-level keyword library;

S207: the voice recognition module receives the first-level voice instruction uttered by a vehicle occupant;

S208: the voice recognition module performs robust matching of the first-level voice instruction against the first-level keyword library; if the result is "no", return to S207; if the result is "yes", go to S209 and, at the same time, to S210;

S209: output the code value of the matched first-level voice instruction and pass it to S213;

S210: the voice recognition module continues by receiving the second-level voice instruction uttered by a vehicle occupant;

S211: the voice recognition module performs robust matching of the second-level voice instruction against the second-level keyword library; if the result is "no", return to S210; if the result is "yes", go to S212;

S212: output the code value of the matched second-level voice instruction and pass it to S213;

S213: the voice recognition module combines the matched first-level and second-level voice instruction codes;

S214: output the control signal of the matched combined voice instruction code;

S215: the process ends.
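By way of illustration only, steps S207 to S213 can be sketched in Python as follows. The library contents, the code values and the function names are assumptions for this sketch, and the input is taken to be a sequence of pinyin strings already produced by an acoustic front end, which the patent does not describe.

```python
# Hypothetical keyword libraries: code -> accepted pinyin models
# (one exact model plus close robust variants).
FIRST_LIBRARY = {
    0x2: {"kongtiao"},                                          # air conditioner
    0x4: {"chechuang", "cechuang", "checuang", "cecuang"},      # window
}
SECOND_LIBRARY = {
    0x1: {"sheng", "shen", "seng", "sen"},   # raise
    0x2: {"ting", "tin"},                    # stop
    0x3: {"jiang", "jian"},                  # lower
}


def robust_match(heard, library):
    """Return the code whose model set contains the heard pinyin, else None."""
    for code, models in library.items():
        if heard in models:
            return code
    return None


def first_match(stream, library):
    """Consume the stream until one utterance matches (the 'no' branch retries)."""
    for heard in stream:
        code = robust_match(heard, library)
        if code is not None:
            return code
    return None


def recognize_command(heard_stream):
    """Return (first_level_code, second_level_code), or None if input runs out."""
    stream = iter(heard_stream)
    first = first_match(stream, FIRST_LIBRARY)      # S207 to S209
    if first is None:
        return None
    second = first_match(stream, SECOND_LIBRARY)    # S210 to S212
    if second is None:
        return None
    return (first, second)                          # S213: combined codes
```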

Fig. 3 shows the flow chart of the hierarchical voice instruction method in an embodiment of the invention.

The voice instructions are divided into levels according to a definite principle so that, after division, each level contains essentially no similar keywords. The invention divides them into two levels, module and action. As shown in Fig. 3, four keywords that originally sat at one level, "open air conditioner", "close air conditioner", "reading lamp on" and "reading lamp off", are split into two levels: the first level is "air conditioner" and "reading lamp"; the second level has two keywords for each first-level instruction, namely "open" or "close", and "on" or "off".

Hierarchical voice instructions for the other voice-controlled electrical modules can be implemented with the same method.
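By way of illustration only, the benefit of the split can be checked with a small Python sketch that counts how many tokens any two keywords of one vocabulary have in common. The token representation and the function name are assumptions for this sketch; the four commands are those of Fig. 3.

```python
from itertools import combinations


def max_shared_tokens(vocabulary):
    """Largest number of tokens shared by any two entries of one vocabulary."""
    return max(
        (len(set(a) & set(b)) for a, b in combinations(vocabulary, 2)),
        default=0,
    )


# Single-level vocabulary: each command is (action, object) or (object, action).
FLAT = [
    ("open", "air_conditioner"),
    ("close", "air_conditioner"),
    ("reading_lamp", "on"),
    ("reading_lamp", "off"),
]

# Two-level vocabulary after the split.
LEVEL_1 = [("air_conditioner",), ("reading_lamp",)]
LEVEL_2 = {
    "air_conditioner": [("open",), ("close",)],
    "reading_lamp": [("on",), ("off",)],
}
```

In the single-level vocabulary two commands share the object name, whereas no two keywords within any level of the split vocabulary share a token.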

Fig. 4 shows the flow chart of the robust recognition method for hierarchical voice control instructions in an embodiment of the invention.

The retroflex initials of Hanyu Pinyin, pronounced with the tip of the tongue curled up, are zh, ch, sh and r.

The front nasal finals include an, en, in and un; the back nasal finals include ang, eng, ing and ong.

When a keyword in the instruction vocabulary of the voice control system is a retroflex syllable with a back nasal final, such as "raise", pronounced "sheng", the three close pronunciations that drop the retroflex or the back nasal, "shen", "sen" and "seng", are all recognized as the keyword "raise". Thus, even if the user cannot distinguish retroflex, back nasal or front nasal sounds, the instruction is neither misrecognized nor left unrecognized, and the recognition rate of the speech recognition system improves.
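By way of illustration only, the close pronunciations of a syllable can be generated from the proximity criteria with a Python sketch such as the following. The pairings zh/z, ch/c, sh/s and ang/an, eng/en, ing/in, the function names and the simple syllable splitting are assumptions for this sketch. The initial r and the finals un and ong are left without a partner because the text does not name one.

```python
# Assumed pairings between retroflex and flat initials,
# and between back and front nasal finals.
RETROFLEX_TO_FLAT = {"zh": "z", "ch": "c", "sh": "s"}
BACK_TO_FRONT = {"ang": "an", "eng": "en", "ing": "in"}


def split_syllable(syllable):
    """Split a toneless pinyin syllable into (initial, final)."""
    for initial in ("zh", "ch", "sh"):
        if syllable.startswith(initial):
            return initial, syllable[len(initial):]
    if syllable and syllable[0] in "bpmfdtnlgkhjqxrzcsyw":
        return syllable[0], syllable[1:]
    return "", syllable


def near_variants(syllable):
    """Return the syllable together with its close pronunciations."""
    initial, final = split_syllable(syllable)
    initials = {initial}
    finals = {final}
    for retroflex, flat in RETROFLEX_TO_FLAT.items():
        if initial == retroflex:
            initials.add(flat)
        elif initial == flat:
            initials.add(retroflex)
    for back, front in BACK_TO_FRONT.items():
        if final == back:
            finals.add(front)
        elif final == front:
            finals.add(back)
    return {i + f for i in initials for f in finals}
```

For "sheng" this yields the standard syllable plus the three close pronunciations named above.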

The robust recognition process for hierarchical voice control instructions in the embodiment is described by way of example with "air conditioner" and "reading lamp" as first-level voice instruction keywords and "open" or "close", "on" or "off" as second-level voice instructions.

(2) the voice prompt module outputs the voice prompt signal "Welcome, please use a first-level voice instruction: DVD audio, air conditioner, reading lamp, window, sunroof, rearview mirror or another controlled object name"; a code is assigned to each first-level voice instruction, and the prompt is played through the playback components of the sound system;

(3) the voice recognition module of the system receives the first-level voice instruction uttered by a vehicle occupant ("air conditioner", "reading lamp", ...), recognizes and confirms it, and sends a first-level voice instruction confirmation signal to the voice prompt module;

(4) the voice prompt module outputs the voice prompt signal: available second-level voice instructions "open" or "close", "on" or "off";

(5) the voice recognition module of the system receives the second-level voice instruction uttered by a vehicle occupant ("open" or "close", "on" or "off", ...), recognizes and confirms it, and outputs a second-level voice instruction confirmation signal to the voice recognition module;

(6) the voice recognition module outputs the combined first- and second-level commands "air conditioner open"/"air conditioner close", "reading lamp on"/"reading lamp off", ... to the control and execution module of the voice control system;

(7) the combination of the first- and second-level voice commands is carried out;

(8) the appliance control signal corresponding to the combined first- and second-level voice command is output.

Those skilled in the art will understand that changes may be made to the above embodiments without departing from the broad scope of the invention. The invention is therefore not limited to the specific embodiments disclosed; its scope covers all variations within the spirit of the invention and the scope of protection defined by the appended claims.

Claims

1. A method for improving the voice recognition rate of an automobile voice control system, comprising the following steps:

2. The method according to claim 1, characterized in that the hierarchical voice instruction process of step 2 comprises the following steps:

(5) according to which first-level keyword was recognized, the voice prompt module outputs the corresponding second-level prompt: "Please continue with a second-level voice instruction", including "Available commands: open, stop or close" or "Available commands: raise, stop or lower", and the prompt plays the corresponding content of the second-level keyword group through the playback components of the sound system;

(9) the process ends;

3. The method according to claim 1, characterized in that the robust recognition method for hierarchical voice instructions comprises the following steps:

(1) start the voice control system;

(2) initialize the voice control system;

1) define and build a database of robust, non-exact pinyin models comparable to the exact pinyin models of the voice instructions, called the voice instruction robust pinyin model database for short;

(5) output the keyword code of the matched first-level voice instruction;

(11) the process ends.

4. The method according to claim 1, characterized in that the time-sharing control of step 1 consists in using the voice prompt signal output by the voice prompt module to generate a time-sharing control signal, which applies time-shared enable control to the entertainment signal of the sound system.

5. The method according to claim 2, characterized in that the first-level voice instructions comprise the names of the controlled objects, namely DVD audio, air conditioner, reading lamp, window, sunroof, rearview mirror, trunk and others, and a code is assigned to each first-level voice instruction.

6. The method according to claim 2, characterized in that the second-level voice instructions comprise the action keywords that match the controlled object names of the first-level voice instructions, and a code is assigned to each second-level voice instruction.

7. The method according to claim 5 or 6, characterized in that the first-level voice instruction code is associated with the second-level voice instruction code, and the combined first- and second-level voice instruction code is formed by combining the two codes; it is used to carry out the corresponding control of the corresponding appliance.

8. The method according to claim 7, characterized in that an action keyword of the second-level voice instructions is allowed to match several first-level voice instructions.

9. The method according to claim 6 or 7, characterized in that the action keywords of the second-level voice instructions comprise: "open" and its synonyms (turn on, open, start, play, answer, navigate); "stop" and its synonyms (stop, halt, break); and "close" and its synonyms (close, shut, exit, disconnect).