CN103903617A

CN103903617A - Voice recognition method and electronic device

Info

Publication number: CN103903617A
Application number: CN201210568770.7A
Authority: CN
Inventors: 戴海生; 陆游龙
Original assignee: Lenovo Beijing Ltd
Current assignee: Lenovo Beijing Ltd
Priority date: 2012-12-24
Filing date: 2012-12-24
Publication date: 2014-07-02

Abstract

The invention provides a voice recognition method and an electronic device, applied to a voice recognition system comprising at least a first recognition engine and a second recognition engine. The method comprises: obtaining voice information to be recognized; based on the voice information to be recognized, obtaining at least one voice unit to be recognized, including at least a first voice unit to be recognized; and, based on the first recognition engine and the second recognition engine, recognizing the first voice unit to be recognized to obtain a first recognition result.

Description

A voice recognition method and electronic device

Technical field

The present application belongs to the field of speech recognition technology, and specifically relates to a speech recognition method and an electronic device.

Background

Speech recognition technology uses an electronic device to recognize voice commands issued by a user and then performs the corresponding operation, so that the user no longer needs to control the device manually. It can be applied not only to voice dialing, voice navigation, and dictation data entry, but also to voice-based search.

At present, a common large-vocabulary speech recognition system, for example one whose vocabulary contains millions of film titles, music titles, and place names, does not distinguish between these kinds of words during recognition search. Instead, it searches them one by one in a single general-purpose recognition engine that contains all of them.

In the course of implementing the technical solutions of the embodiments of the present application, the inventors found that the prior art has at least the following technical problems:

Because a general-purpose recognition engine holds a large amount of data and different words can sound alike, recognition search often returns results that do not meet the actual need, so the recognition rate is low. For example, when the user issues the voice command "search Journey to the West", the recognized results include many irrelevant items, such as "Journey to the West play" and "grapefruit note";

In addition, because the prior art uses the general-purpose recognition engine to search among millions of words, recognition takes a long time;

Consequently, the low recognition rate and the long recognition time lead to a poor user experience.

Summary of the invention

The embodiments of the present invention provide a speech recognition method and an electronic device to solve the technical problem of the low recognition rate in the prior art, achieving the technical effect of substantially improving the recognition rate while still allowing recognition to cover all words.

A speech recognition method, applied to an electronic device having a speech recognition system that comprises at least a first recognition engine and a second recognition engine, the method comprising:

Obtaining voice information to be recognized;

Based on the voice information to be recognized, obtaining at least one voice unit to be recognized, including at least a first voice unit to be recognized;

Based on the first recognition engine and the second recognition engine, recognizing the first voice unit to be recognized to obtain a first recognition result.

Further, the second recognition engine is specifically:

A first-type recognition engine that contains second content, obtained by screening the first content in the first recognition engine based on a preset rule; or

A second-type recognition engine that has third content different from the first content in the first recognition engine.

Further, when the second recognition engine is specifically the first-type recognition engine (the engine containing the screened second content), recognizing the first voice unit to be recognized based on the first recognition engine and the second recognition engine to obtain the first recognition result specifically comprises:

Based on the first-type recognition engine, recognizing the first voice unit to be recognized to obtain a second recognition result;

Judging whether the second recognition result meets a first preset condition;

When the second recognition result meets the first preset condition, outputting the second recognition result as the first recognition result.

Further, after judging whether the second recognition result meets the first preset condition, the method further comprises:

When the second recognition result does not meet the first preset condition, recognizing the first voice unit to be recognized based on the first recognition engine to obtain the first recognition result;

Outputting the first recognition result.

Alternatively, when the second recognition engine is specifically the first-type recognition engine, the recognizing may specifically comprise:

Based on the first-type recognition engine, recognizing the first voice unit to be recognized to obtain a third recognition result;

Based on the first recognition engine, recognizing the first voice unit to be recognized to obtain a fourth recognition result;

Judging whether the third recognition result or the fourth recognition result meets a second preset condition;

When the third recognition result or the fourth recognition result meets the second preset condition, outputting the third recognition result or the fourth recognition result as the first recognition result.

Further, when the second recognition engine is specifically the second-type recognition engine (the engine whose third content differs from the first content), recognizing the first voice unit to be recognized based on the first recognition engine and the second recognition engine to obtain the first recognition result specifically comprises:

Based on the first recognition engine, recognizing the first voice unit to be recognized to obtain a fifth recognition result;

Based on the second-type recognition engine, recognizing the first voice unit to be recognized to obtain a sixth recognition result;

Judging whether the fifth recognition result and the sixth recognition result meet a third preset condition;

When the fifth recognition result meets the third preset condition and the sixth recognition result does not, outputting the fifth recognition result as the first recognition result.

Further, after judging whether the fifth recognition result and the sixth recognition result meet the third preset condition, the method further comprises:

When the fifth recognition result does not meet the third preset condition and the sixth recognition result does, outputting the sixth recognition result as the first recognition result.

When both the fifth recognition result and the sixth recognition result meet the third preset condition, outputting the fifth recognition result or the sixth recognition result as the first recognition result.

An electronic device having a speech recognition system that comprises at least a first recognition engine and a second recognition engine, the electronic device comprising:

A first obtaining unit, configured to obtain voice information to be recognized;

A second obtaining unit, configured to obtain, based on the voice information to be recognized, at least one voice unit to be recognized, including at least a first voice unit to be recognized;

A recognition unit, configured to recognize the first voice unit to be recognized based on the first recognition engine and the second recognition engine to obtain a first recognition result.

Further, the second recognition engine is specifically either of the two kinds of engine defined above for the method.

Further, when the second recognition engine is specifically the first-type recognition engine (the engine containing the screened second content), the recognition unit specifically comprises:

A first recognition subunit, configured to recognize the first voice unit to be recognized based on the first-type recognition engine to obtain a second recognition result;

A first judgment subunit, configured to judge whether the second recognition result meets a first preset condition;

A first output subunit, configured to output the second recognition result as the first recognition result when the second recognition result meets the first preset condition.

Further, the recognition unit also comprises:

A second recognition subunit, configured to recognize the first voice unit to be recognized based on the first recognition engine to obtain the first recognition result when the second recognition result does not meet the first preset condition;

A second output subunit, configured to output the first recognition result.

Alternatively, the recognition unit may specifically comprise:

A third recognition subunit, configured to recognize the first voice unit to be recognized based on the first-type recognition engine to obtain a third recognition result;

A fourth recognition subunit, configured to recognize the first voice unit to be recognized based on the first recognition engine to obtain a fourth recognition result;

A second judgment subunit, configured to judge whether the third recognition result or the fourth recognition result meets a second preset condition;

A third output subunit, configured to output the third recognition result or the fourth recognition result as the first recognition result when the third recognition result or the fourth recognition result meets the second preset condition.

Further, when the second recognition engine is specifically the second-type recognition engine (the engine whose third content differs from the first content), the recognition unit specifically comprises:

A fifth recognition subunit, configured to recognize the first voice unit to be recognized based on the first recognition engine to obtain a fifth recognition result;

A sixth recognition subunit, configured to recognize the first voice unit to be recognized based on the second-type recognition engine to obtain a sixth recognition result;

A third judgment subunit, configured to judge whether the fifth recognition result and the sixth recognition result meet a third preset condition;

A fourth output subunit, configured to output the fifth recognition result as the first recognition result when the fifth recognition result meets the third preset condition and the sixth recognition result does not.

Further, the recognition unit also comprises:

A fifth output subunit, configured to output the sixth recognition result as the first recognition result when the fifth recognition result does not meet the third preset condition and the sixth recognition result does.

Further, the recognition unit also comprises:

A sixth output subunit, configured to output the fifth recognition result or the sixth recognition result as the first recognition result when both the fifth recognition result and the sixth recognition result meet the third preset condition.

The one or more technical solutions provided in the embodiments of the present invention have at least the following technical effects or advantages:

By using at least two recognition engines in the speech recognition system, including the first recognition engine and the second recognition engine, to recognize the obtained at least one voice unit to be recognized, including at least the first voice unit to be recognized, and thereby obtain the first recognition result, the embodiments solve the technical problem of the low recognition rate in the prior art and achieve the technical effect of substantially improving the recognition rate while still allowing recognition to cover all words;

Furthermore, because the second recognition engine can specifically be a second-type recognition engine whose content differs from that of the first recognition engine, the general-purpose recognition engine of the prior art can be divided into several recognition engines, and the voice information to be recognized can be searched across multiple recognition engines containing different content. This solves the prior-art problem that searching millions of words in a single general-purpose recognition engine takes a long time, and achieves the technical effect of reducing search time and improving the efficiency of recognition search.

Consequently, because the recognition rate is improved and the search time is reduced, the user experience is improved.

Brief description of the drawings

Fig. 1 is a flowchart of the speech recognition method in an embodiment of the present invention;

Fig. 2 is a structural diagram of the second recognition engine in an embodiment of the present invention;

Fig. 3 is a structural diagram of the electronic device in an embodiment of the present invention.

Detailed description of the embodiments

To solve the above problems, the general idea of the technical solutions in the embodiments of the present invention is as follows:

The present invention obtains voice information to be recognized; based on the voice information to be recognized, obtains at least one voice unit to be recognized, including at least a first voice unit to be recognized; and, based on the first recognition engine and the second recognition engine, recognizes the first voice unit to be recognized to obtain a first recognition result, thereby solving the technical problem of the low recognition rate in the prior art.

For a better understanding of the above technical solutions, they are described in detail below with reference to the accompanying drawings and specific embodiments.

An embodiment of the present invention provides a speech recognition method, applied to an electronic device having a speech recognition system. The speech recognition system comprises at least a first recognition engine and a second recognition engine, and may also comprise further recognition engines, such as a third recognition engine and a fourth recognition engine. The speech recognition system can be used to perform recognition search on voice information.

As shown in Fig. 1, the speech recognition method comprises the following steps:

S101: Obtain voice information to be recognized.

S102: Based on the voice information to be recognized, obtain at least one voice unit to be recognized, including at least a first voice unit to be recognized.

In a specific embodiment, the user issues a voice command to the electronic device. The voice command contains the voice information to be recognized, which in turn contains at least one voice unit to be recognized. For example, when the user issues the voice command "search film Journey to the West" to the electronic device, the voice information to be recognized, "search film Journey to the West", contains three voice units to be recognized: the first voice unit to be recognized, "Journey to the West"; the second voice unit to be recognized, "search"; and the third voice unit to be recognized, "film". In addition, to improve the recognition effect in the embodiments of the present application, after receiving the voice information the electronic device may also convert it into a corresponding voice signal and perform front-end processing on that signal to eliminate the influence of noise and of differences between speakers, so that the processed signal better reflects the essential features of the speech. In the embodiments of the present application, the most commonly used front-end processing techniques are endpoint detection and speech enhancement. Of course, those of ordinary skill in the art may also use other front-end processing techniques.
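As an illustration of the endpoint detection mentioned above, the following is a minimal sketch and not the method of this application. It assumes the voice signal is a list of float samples and uses a simple short-time energy threshold; the function name detect_endpoints, the frame length, and the threshold value are hypothetical choices made only for this example.

```python
from typing import List, Optional, Tuple


def detect_endpoints(
    samples: List[float],
    frame_len: int = 4,       # hypothetical frame length, in samples
    threshold: float = 0.01,  # hypothetical mean-energy threshold
) -> Optional[Tuple[int, int]]:
    """Return (start, end) sample indices spanning all frames whose mean
    energy exceeds the threshold, or None if no frame does."""
    active = []
    for start in range(0, len(samples), frame_len):
        frame = samples[start:start + frame_len]
        energy = sum(x * x for x in frame) / len(frame)
        if energy > threshold:
            active.append((start, start + len(frame)))
    if not active:
        return None
    # Speech is taken to run from the first active frame to the last one.
    return active[0][0], active[-1][1]


# Toy signal: silence, a short burst, silence.
signal = [0.0] * 8 + [0.5, -0.5, 0.4, -0.4] + [0.0] * 8
speech_span = detect_endpoints(signal)
```

Only the samples inside the returned span would then be passed on to feature extraction; a practical front end would typically add smoothing and speech enhancement on top of this.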

After at least one voice unit to be recognized, including at least the first voice unit to be recognized, has been obtained in S102, S103 is performed: based on the first recognition engine and the second recognition engine, recognize the first voice unit to be recognized to obtain a first recognition result.

As shown in Fig. 2, the second recognition engine may be either of two kinds of recognition engine:

The first-type recognition engine 201: a recognition engine containing second content, obtained by screening the first content in the first recognition engine based on a preset rule. The second content is contained in the first content.

In a specific embodiment, several preset rules may be used to build the first-type recognition engine 201.

Rule one: user popularity. For example, the first content in the first recognition engine is screened, the words with high user popularity, that is, the words users use frequently, are gathered into the second content, and the first-type recognition engine 201 is built on that second content.

Rule two: release time. For example, the first content in the first recognition engine is screened, recently released words, such as the newest words of this week or this month, are gathered into the second content, and the first-type recognition engine 201 is built on that second content.

In addition, the resulting first-type recognition engine 201 can be specially customized, for example based on grammar rules, on a specially trained acoustic model, or on grammar weights. The preset rules and customization rules are not limited to those mentioned above; those of ordinary skill in the art may use other rules according to actual needs.
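The two screening rules above can be sketched as simple filters over the first content. This is only an illustration under stated assumptions: the Entry record, its use_count and released fields, the thresholds, and the sample data are all hypothetical and do not come from the application.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from typing import List


@dataclass(frozen=True)
class Entry:
    """Hypothetical vocabulary entry in the first content."""
    word: str
    use_count: int   # how often users have used the word
    released: date   # when the word (e.g. a film title) was released


def screen_by_popularity(first_content: List[Entry], min_uses: int) -> List[Entry]:
    """Rule one: keep the words users use frequently."""
    return [e for e in first_content if e.use_count >= min_uses]


def screen_by_release(first_content: List[Entry], today: date, window_days: int) -> List[Entry]:
    """Rule two: keep the words released within the recent window."""
    cutoff = today - timedelta(days=window_days)
    return [e for e in first_content if e.released >= cutoff]


# Invented sample data, for illustration only.
first_content = [
    Entry("Journey to the West", 120, date(1986, 10, 1)),
    Entry("New Film", 3, date(2012, 12, 20)),
    Entry("Old Song", 1, date(2001, 1, 1)),
]
popular = screen_by_popularity(first_content, min_uses=100)
recent = screen_by_release(first_content, today=date(2012, 12, 24), window_days=30)
```

Either filtered list plays the role of the second content, which is by construction a subset of the first content.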

As can be seen from the above, using at least two recognition engines, the first recognition engine and the second recognition engine, to recognize the obtained at least one voice unit to be recognized, including at least the first voice unit to be recognized, solves the technical problem of the low recognition rate in the prior art and substantially improves the recognition rate.

The second-type recognition engine 202 in the embodiments of the present application: a recognition engine having third content different from the first content in the first recognition engine.

In a specific embodiment, the general-purpose recognition engine of the prior art can be split into multiple recognition engines containing different content. For example, the first recognition engine contains first content, and the second-type recognition engine 202 contains third content different from the first content. Besides the first recognition engine and the second-type recognition engine 202, the split may also produce other recognition engines, such as third and fourth recognition engines, which are not enumerated one by one in this application.

As can be seen from the above, building multiple recognition engines containing different content and using them to perform recognition search on the voice information to be recognized solves the prior-art problem that searching a vocabulary of millions of words in a single general-purpose recognition engine takes a long time, and achieves the technical effect of reducing search time and improving the efficiency of recognition search.
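Splitting a general-purpose vocabulary into several engines with different content can be sketched as a grouping step. The application does not state the splitting criterion; grouping by a category label (film, music, place) is an assumption made here for illustration, and the function name split_vocabulary and the sample entries are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def split_vocabulary(entries: Iterable[Tuple[str, str]]) -> Dict[str, List[str]]:
    """Group (word, category) pairs into one word list per category.
    Each list stands in for the content of one recognition engine."""
    engines: Dict[str, List[str]] = defaultdict(list)
    for word, category in entries:
        engines[category].append(word)
    return dict(engines)


# Invented sample vocabulary, for illustration only.
general_vocabulary = [
    ("Journey to the West", "film"),
    ("Beijing", "place"),
    ("Hero", "film"),
]
engine_contents = split_vocabulary(general_vocabulary)
```

Each resulting list is much smaller than the full vocabulary, which is where the reduction in search time described above would come from.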

In the embodiments of the present application, when the second recognition engine is specifically the first-type recognition engine 201, step S103 has two specific implementations:

Mode one:

In a specific embodiment, when the second recognition engine is specifically the first-type recognition engine 201, recognition search of the first voice unit to be recognized proceeds as follows. First, the characteristic parameters of the voice unit to be recognized are extracted. Then, the characteristic parameters are dynamically compared with each speech model corresponding to the second content in the first-type recognition engine 201 to obtain a second recognition result. Next, it is judged whether the second recognition result meets a preset confidence, where the confidence characterizes the reliability of the obtained recognition result and the preset confidence is either predefined by the system or set by the user as needed. When the second recognition result meets the preset confidence, the second recognition result is output as the final recognition result. The way a recognition engine recognizes a voice unit to be recognized is not limited to the above; those of ordinary skill in the art may adopt other ways.
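The embodiment does not name the algorithm behind the dynamic comparison. One technique commonly used to compare feature sequences of different lengths is dynamic time warping (DTW), which is swapped in here purely for illustration. The sketch assumes one-dimensional feature sequences and one stored template per word; the names dtw_distance and best_match and the sample templates are hypothetical.

```python
from typing import Dict, Sequence, Tuple


def dtw_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Classic dynamic-time-warping distance between two 1-D sequences,
    using absolute difference as the local cost."""
    inf = float("inf")
    cost = [[inf] * (len(b) + 1) for _ in range(len(a) + 1)]
    cost[0][0] = 0.0
    for i in range(1, len(a) + 1):
        for j in range(1, len(b) + 1):
            step = abs(a[i - 1] - b[j - 1])
            # Extend the cheapest of the three admissible warping moves.
            cost[i][j] = step + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return cost[len(a)][len(b)]


def best_match(features: Sequence[float], templates: Dict[str, Sequence[float]]) -> Tuple[str, float]:
    """Return the (word, distance) pair of the template closest to the features."""
    return min(
        ((word, dtw_distance(features, template)) for word, template in templates.items()),
        key=lambda pair: pair[1],
    )


# Invented templates, for illustration only.
templates = {"word_x": [1.0, 2.0, 2.0, 3.0], "word_y": [5.0, 5.0, 5.0]}
match = best_match([1.0, 2.0, 3.0], templates)
```

A smaller distance indicates a closer match; how such a distance would be mapped to the confidence used in the embodiment is not specified by the source and is left open here.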

After judging whether the second recognition result meets the first preset condition, the method in the embodiments of the present application further comprises:

When the second recognition result does not meet the first preset condition, recognizing the first voice unit to be recognized based on the first recognition engine to obtain the first recognition result, and outputting the first recognition result.

Continuing the confidence example above: when the second recognition result does not meet the preset confidence, the first recognition engine is used to recognize the first voice unit to be recognized, and the resulting first recognition result is output as the final recognition result.

As can be seen from the above, when the first-type recognition engine 201 does not produce a satisfactory recognition result, the first recognition engine is used to recognize the first voice unit to be recognized, which achieves the technical effect of allowing recognition to cover all words.
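The control flow of mode one can be sketched as a two-stage cascade. This is a minimal illustration under assumptions, not the implementation of the application: each engine is modeled as a function returning a (text, confidence) pair, "meets the preset confidence" is read as greater than or equal to, and make_toy_engine, which simply looks the unit up in a word table, is a hypothetical stand-in for a real engine.

```python
from typing import Callable, Dict, Tuple

Result = Tuple[str, float]        # (recognized text, confidence)
Engine = Callable[[str], Result]  # a string stands in for the voice unit


def make_toy_engine(vocabulary: Dict[str, float]) -> Engine:
    """Hypothetical stand-in engine: confidence comes from a lookup table,
    and is 0.0 for words the engine does not contain."""
    def engine(unit: str) -> Result:
        return unit, vocabulary.get(unit, 0.0)
    return engine


def recognize_cascade(
    unit: str,
    first_type_engine: Engine,  # small engine holding the screened second content
    first_engine: Engine,       # full engine holding the first content
    preset_confidence: float,
) -> Tuple[str, str]:
    """Try the small engine first; fall back to the full engine only when
    the small engine's result misses the preset confidence."""
    second_result = first_type_engine(unit)
    if second_result[1] >= preset_confidence:
        return second_result[0], "first-type engine"
    first_result = first_engine(unit)
    return first_result[0], "first engine"


small = make_toy_engine({"Journey to the West": 0.9})
full = make_toy_engine({"Journey to the West": 0.8, "Grapefruit Note": 0.7})
hit = recognize_cascade("Journey to the West", small, full, preset_confidence=0.6)
fallback = recognize_cascade("Grapefruit Note", small, full, preset_confidence=0.6)
```

The full engine runs only on a miss, so frequent words are resolved against the small vocabulary while rare words are still covered.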

The second implementation of step S103, mode two, is described below:

Continuing the confidence example above: when the second recognition engine is specifically the first-type recognition engine 201, recognition search of the first voice unit to be recognized proceeds as follows. First, the characteristic parameters of the voice unit to be recognized are extracted. Then, the characteristic parameters are dynamically compared both with each speech model corresponding to the second content in the first-type recognition engine 201 and with each speech model corresponding to the first content in the first recognition engine, yielding a third recognition result from the first-type recognition engine 201 and a fourth recognition result from the first recognition engine. Next, it is judged whether the third recognition result meets the preset confidence. When it does, the third recognition result is output as the final recognition result; when it does not, the fourth recognition result is output as the final recognition result.

The first recognition engine corresponds to a first recognition set. When the second recognition engine is the first-type recognition engine 201, it corresponds to a second recognition set, and the second recognition set is a subset of the first recognition set.

In the step of recognizing the first voice unit to be recognized based on the first recognition engine and the second recognition engine to obtain the first recognition result, the second recognition engine may be given higher priority than the first recognition engine: the at least one voice unit to be recognized in the voice information to be recognized is first recognized with the second recognition engine, and only when that recognition does not meet the voice match condition is it recognized with the first recognition engine. Alternatively, the at least one voice unit to be recognized may be recognized with the second recognition engine and the first recognition engine at the same time. When the second recognition engine's match for the at least one voice unit to be recognized meets the voice match condition, the recognition result is output (that is, this speech recognition is complete). When it does not, the first recognition engine, having run in parallel, has already been matching the at least one voice unit to be recognized before the second recognition engine failed, which improves the efficiency of speech recognition.

In addition, when the second mode is used, the first-type recognition engine 201 and the first recognition engine recognize the first voice unit to be recognized at the same time. Even if the first-type recognition engine 201 finishes without a satisfactory recognition result, the first recognition engine has already progressed to some stage of its recognition, which achieves the technical effect of saving recognition time.
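The simultaneous recognition of mode two can be sketched with two worker threads. This is an illustration under assumptions only: engines are modeled as functions returning (text, confidence), "meets the preset confidence" is read as greater than or equal to, and small_engine and full_engine are hypothetical toy stand-ins for the first-type recognition engine and the first recognition engine.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Tuple

Result = Tuple[str, float]        # (recognized text, confidence)
Engine = Callable[[str], Result]  # a string stands in for the voice unit


def recognize_parallel(
    unit: str,
    first_type_engine: Engine,
    first_engine: Engine,
    preset_confidence: float,
) -> str:
    """Start both engines at once; prefer the first-type engine's (third)
    result and use the first engine's (fourth) result only on a miss."""
    with ThreadPoolExecutor(max_workers=2) as pool:
        third_future = pool.submit(first_type_engine, unit)
        fourth_future = pool.submit(first_engine, unit)
        third_result = third_future.result()
        if third_result[1] >= preset_confidence:
            # Leaving the with-block still waits for the other engine;
            # a real system might cancel or abandon it instead.
            return third_result[0]
        return fourth_future.result()[0]


def small_engine(unit: str) -> Result:
    """Toy first-type engine that only knows one title."""
    return "Journey to the West", (0.9 if unit == "Journey to the West" else 0.1)


def full_engine(unit: str) -> Result:
    """Toy first engine that recognizes everything with moderate confidence."""
    return unit, 0.8


fast_path = recognize_parallel("Journey to the West", small_engine, full_engine, 0.6)
slow_path = recognize_parallel("Beijing", small_engine, full_engine, 0.6)
```

Because the full engine has been running since the start, a miss in the small engine costs only the remaining part of the full engine's work rather than its whole run.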

Further, when the second recognition engine is specifically the second-type recognition engine 202, S103 is implemented as follows:

Continuing the confidence example above: when the second recognition engine is specifically the second-type recognition engine 202, which contains third content different from the first content in the first recognition engine, the characteristic parameters of the voice unit to be recognized are first dynamically compared both with each speech model corresponding to the third content in the second-type recognition engine 202 and with each speech model corresponding to the first content in the first recognition engine, yielding a fifth recognition result from the first recognition engine and a sixth recognition result from the second-type recognition engine 202. Then, it is judged whether the fifth recognition result and the sixth recognition result meet the preset reliability. When the fifth recognition result meets the preset reliability and the sixth recognition result does not, only the fifth recognition result is output as the final recognition result. When the fifth recognition result does not meet the preset reliability and the sixth recognition result does, only the sixth recognition result is output as the final recognition result. When both the fifth recognition result and the sixth recognition result meet the preset reliability, the fifth recognition result and the sixth recognition result are output together as the final recognition result.
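The three output cases described above reduce to keeping every result that meets the preset reliability. The following is a minimal sketch under assumptions: results are modeled as (text, reliability) pairs, "meets" is read as greater than or equal to, and the function name select_final_results is hypothetical. The source does not say what happens when neither result qualifies; the sketch simply returns an empty list in that case.

```python
from typing import List, Tuple

Result = Tuple[str, float]  # (recognized text, reliability)


def select_final_results(fifth: Result, sixth: Result, preset_reliability: float) -> List[str]:
    """Return the texts of the results that meet the preset reliability:
    only the fifth, only the sixth, both, or (unspecified by the source) none."""
    return [text for text, reliability in (fifth, sixth) if reliability >= preset_reliability]


only_fifth = select_final_results(("Journey to the West", 0.9), ("Beijing", 0.2), 0.5)
only_sixth = select_final_results(("Journey to the West", 0.1), ("Beijing", 0.8), 0.5)
both = select_final_results(("Journey to the West", 0.9), ("Beijing", 0.8), 0.5)
```

Since the two engines hold different content, both results can legitimately qualify at once, which is why the embodiment allows outputting them together.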

Another embodiment of the present invention provides an electronic device having a speech recognition system that comprises at least a first recognition engine and a second recognition engine. As shown in Fig. 3, the electronic device comprises:

A first obtaining unit 301, configured to obtain voice information to be recognized;

A second obtaining unit 302, configured to obtain, based on the voice information to be recognized, at least one voice unit to be recognized, including at least a first voice unit to be recognized.

In a specific embodiment, the user issues a voice command to the electronic device. The voice command contains the voice information to be recognized, which in turn contains at least one voice unit to be recognized. For example, when the user issues the voice command "search film Journey to the West" to the electronic device, the voice information to be recognized, "search film Journey to the West", contains three voice units to be recognized: the first voice unit to be recognized, "Journey to the West"; the second voice unit to be recognized, "search"; and the third voice unit to be recognized, "film". In addition, to improve the recognition effect in the embodiments of the present application, after receiving the voice information the electronic device may also convert it into a corresponding voice signal and perform front-end processing on that signal to eliminate the influence of noise and of differences between speakers, so that the processed signal better reflects the essential features of the speech. In the embodiments of the present application, the most commonly used front-end processing techniques are endpoint detection and speech enhancement. Of course, those of ordinary skill in the art may also use other front-end processing techniques.

In the embodiments of the present application, the electronic device further comprises:

A recognition unit 303, configured to recognize the first voice unit to be recognized based on the first recognition engine and the second recognition engine to obtain a first recognition result.

The second obtaining unit 302 is connected to the first obtaining unit 301, and the recognition unit 303 is connected to the second obtaining unit 302.

The second recognition engine may be either of two kinds of recognition engine:

In the embodiments of the present application, the second-type recognition engine 202: a recognition engine having third content different from the first content in the first recognition engine.

In a specific embodiment, the general-purpose recognition engine of the prior art can be split into multiple recognition engines containing different content. For example, the first recognition engine contains first content, and the second-type recognition engine 202 contains third content different from the first content. Besides the first recognition engine and the second-type recognition engine 202, the split may also produce other recognition engines, such as third and fourth recognition engines, which are not enumerated one by one in this application.

Further, when the second recognition engine is specifically the first recognition engine 201, the recognition unit has two implementations.

In mode one, the recognition unit 303 specifically comprises:

A first output subunit, configured to output the second recognition result as the first recognition result when the second recognition result meets the first preset condition;

In addition, when the second recognition result does not meet the preset confidence level, the first recognition engine is used to recognize the first voice unit to be recognized, and the first recognition result thus obtained is output as the final recognition result.
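The fallback behaviour of mode one can be sketched as follows. This is an illustration under stated assumptions, not the implementation of the application: the first preset condition is modeled as a confidence threshold, the second recognition result is assumed to come from the second recognition engine, and the `Engine` type, the function name `recognize_with_fallback`, and the threshold value are hypothetical.

```python
from typing import Callable, Tuple

# Hypothetical engine interface: takes a voice unit and returns
# (recognized text, confidence in the range 0 to 1).
Engine = Callable[[str], Tuple[str, float]]


def recognize_with_fallback(
    voice_unit: str,
    second_engine: Engine,
    first_engine: Engine,
    min_confidence: float = 0.8,
) -> str:
    """Try the second engine first; fall back to the first engine."""
    second_text, second_confidence = second_engine(voice_unit)
    if second_confidence >= min_confidence:
        # The second recognition result meets the (assumed) preset condition,
        # so it is output as the first recognition result.
        return second_text
    # Otherwise the first engine recognizes the same voice unit and its
    # result is output as the final recognition result.
    first_text, _ = first_engine(voice_unit)
    return first_text
```

With this ordering the first recognition engine is consulted only when the second recognition result is not trusted.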

In mode two, the recognition unit 303 specifically comprises:

Further, when the second recognition engine is specifically the second recognition engine 202, the recognition unit specifically comprises:

A fourth output subunit, configured to output the fifth recognition result as the first recognition result when the fifth recognition result meets the third preset condition and the sixth recognition result does not meet the third preset condition;

A fifth output subunit, configured to output the sixth recognition result as the first recognition result when the fifth recognition result does not meet the third preset condition and the sixth recognition result meets the third preset condition;
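The selection performed by the fourth and fifth output subunits can be sketched as follows. The third preset condition is modeled here as a confidence threshold, which is an assumption; the function name `select_result` and the threshold value are hypothetical. The cases in which both results or neither result meets the condition are not covered by the two subunits above, so the sketch returns `None` for them rather than guessing.

```python
from typing import Optional


def select_result(
    fifth_text: str,
    fifth_confidence: float,
    sixth_text: str,
    sixth_confidence: float,
    threshold: float = 0.8,
) -> Optional[str]:
    """Pick the one result that meets the (assumed) third preset condition."""
    fifth_ok = fifth_confidence >= threshold
    sixth_ok = sixth_confidence >= threshold

    if fifth_ok and not sixth_ok:
        return fifth_text  # role of the fourth output subunit
    if sixth_ok and not fifth_ok:
        return sixth_text  # role of the fifth output subunit
    return None  # both or neither meet the condition: not handled here
```

Returning `None` makes the unhandled cases explicit, so that the caller can apply whatever rule governs them.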

The electronic device described in this embodiment is the electronic device used to implement the information processing method of the embodiments of the present application. Based on that method, those skilled in the art can understand the specific implementations of the electronic device and its variations, so the electronic device is not described in further detail here. Any electronic device that those skilled in the art use to implement the information processing method of the embodiments of the present application falls within the scope of protection sought by the present application.

Obviously, those skilled in the art can make various changes and modifications to the present invention without departing from its spirit and scope. Thus, if these changes and modifications fall within the scope of the claims of the present invention and their technical equivalents, the present invention is also intended to include them.

Claims

1. A voice recognition method, characterized in that it is applied to an electronic device having a voice recognition system that comprises at least a first recognition engine and a second recognition engine, the method comprising:

Obtaining voice information to be recognized;

2. The method of claim 1, characterized in that the second recognition engine is specifically:

3. The method of claim 2, characterized in that, when the second recognition engine is specifically the first recognition engine, recognizing the first voice unit to be recognized based on the first recognition engine or the second recognition engine to obtain the first recognition result specifically comprises:

4. The method of claim 3, characterized in that, after judging whether the second recognition result meets the first preset condition, the method further comprises:

Outputting the first recognition result.

5. The method of claim 2, characterized in that, when the second recognition engine is specifically the first recognition engine, recognizing the first voice unit to be recognized based on the first recognition engine or the second recognition engine to obtain the first recognition result specifically comprises:

6. The method of claim 2, characterized in that, when the second recognition engine is specifically the second recognition engine, recognizing the first voice unit to be recognized based on the first recognition engine or the second recognition engine to obtain the first recognition result specifically comprises:

7. The method of claim 6, characterized in that, after judging whether the fifth recognition result and the sixth recognition result meet the third preset condition, the method further comprises:

8. The method of claim 6, characterized in that, after judging whether the fifth recognition result and the sixth recognition result meet the third preset condition, the method further comprises:

9. An electronic device, characterized in that the electronic device contains a voice recognition system comprising at least a first recognition engine and a second recognition engine, the electronic device comprising:

A first obtaining unit, configured to obtain voice information to be recognized;

10. The electronic device of claim 9, characterized in that the second recognition engine is specifically:

11. The electronic device of claim 10, characterized in that, when the second recognition engine is specifically the first recognition engine, the recognition unit specifically comprises:

12. The electronic device of claim 11, characterized in that the recognition unit further comprises:

13. The electronic device of claim 10, characterized in that, when the second recognition engine is specifically the first recognition engine, the recognition unit specifically comprises:

14. The electronic device of claim 10, characterized in that, when the second recognition engine is specifically the second recognition engine, the recognition unit specifically comprises:

15. The electronic device of claim 14, characterized in that the recognition unit further comprises:

16. The electronic device of claim 14, characterized in that the recognition unit further comprises: