CN103247291B - Method, apparatus, and system for updating a speech recognition device

Info

Publication number: CN103247291B
Application number: CN201310163915.XA
Authority: CN
Inventors: 徐丹华; 蒋洪睿; 郑伟军; 王细勇; 王青
Original assignee: Huawei Device Co Ltd
Current assignee: Huawei Device Co Ltd
Priority date: 2013-05-07
Filing date: 2013-05-07
Publication date: 2016-01-13
Anticipated expiration: 2033-05-07
Also published as: WO2014180218A1; CN103247291A

Abstract

The present invention discloses a method, an apparatus, and a system for updating a speech recognition device. It relates to speech recognition technology and is intended to improve the speech recognition rate. The method comprises: receiving a speech input signal; performing speech recognition on the speech input signal by using a local speech recognition device to obtain a local speech recognition result; obtaining an optimal recognition result, as a final speech recognition result, from the local speech recognition result and a cloud speech recognition result, where the cloud speech recognition result is obtained by a cloud speech recognition device performing speech recognition on the speech input signal at the same time as the local speech recognition device does so; determining, based on obtained user feedback information and the final speech recognition result, whether the reliability of the local speech recognition result meets a requirement; and, when it is determined that the reliability of the local speech recognition result does not meet the requirement, updating the local speech recognition device by using the cloud speech recognition device.

Description

Method, apparatus, and system for updating a speech recognition device

Technical field

The present invention relates to speech recognition technology, and in particular to a method, an apparatus, and a system for updating a speech recognition device.

Background

Speech recognition is a technology that enables a machine to convert a speech signal into corresponding text or commands through recognition and understanding. A speech recognition engine usually comprises an acoustic model and a language model. The acoustic model converts speech into phonemes (for example, phonetic symbols in English, or the initials and finals of pinyin in Chinese), the language model converts phonemes into text, and the two models work together to complete the speech-to-text recognition process.

The prior art offers three kinds of speech recognition technology: the first is based on a cloud recognition engine, the second is based on a local speech recognition engine, and the third is based on a local speech recognition engine and a cloud recognition engine at the same time. In the prior art, however, the local speech recognition engine has low accuracy, so its recognition rate is low, which reduces the overall speech recognition rate.

Summary of the invention

In view of this, the present invention provides a method, an apparatus, and a system for updating a speech recognition device, so as to improve the speech recognition rate.

To achieve the foregoing objective, the embodiments of the present invention adopt the following technical solutions:

According to a first aspect, the present invention provides a method for updating a speech recognition device, comprising:

receiving a speech input signal;

performing speech recognition on the speech input signal by using a local speech recognition device to obtain a local speech recognition result;

obtaining an optimal recognition result, as a final speech recognition result, from the local speech recognition result and a cloud speech recognition result, where the cloud speech recognition result is obtained by a cloud speech recognition device performing speech recognition on the speech input signal at the same time as the local speech recognition device performs speech recognition on the speech input signal;

determining, based on obtained user feedback information and the final speech recognition result, whether the reliability of the local speech recognition result meets a requirement; and

when it is determined that the reliability of the local speech recognition result does not meet the requirement, updating the local speech recognition device by using the cloud speech recognition device.

In a first possible implementation of the first aspect, the performing speech recognition on the speech input signal by using the local recognition engine to obtain a local speech recognition result comprises:

performing speech recognition on the speech input signal by using a local acoustic model and a local language model in the local recognition device, respectively, to obtain the local speech recognition result.

In a second possible implementation of the first aspect, the obtaining an optimal recognition result, as a final speech recognition result, from the local speech recognition result and the cloud speech recognition result obtained from the cloud speech recognition device comprises:

determining the reliability of the local speech recognition result based on speech recognition parameters;

when the reliability of the local speech recognition result meets a preset condition, using the local speech recognition result as the final speech recognition result; and

when the reliability of the local speech recognition result does not meet the preset condition, using the cloud speech recognition result as the final speech recognition result.

In a third possible implementation of the first aspect, the determining, based on obtained user feedback information and the final speech recognition result, whether the reliability of the local speech recognition result meets a requirement comprises:

when the obtained user feedback information indicates that the final speech recognition result is correct and the cloud speech recognition result was used as the final speech recognition result, determining that the reliability of the local speech recognition result does not meet the requirement; and

when the obtained user feedback information indicates that the final speech recognition result is wrong and a correct speech recognition result is obtained from the user, determining that the reliability of the local speech recognition result does not meet the requirement.

In a fourth possible implementation of the first aspect, the updating the local speech recognition device by using the cloud speech recognition device when it is determined that the reliability of the local speech recognition result does not meet the requirement comprises:

when it is determined that the reliability of the local speech recognition result does not meet the requirement, if a command word in the local speech recognition result is determined to be misrecognized, updating the local acoustic model in the local speech recognition device by using an acoustic model incremental update package obtained from the cloud speech recognition device;

when it is determined that the reliability of the local speech recognition result does not meet the requirement, if the pinyin string in the local speech recognition result is determined to be recognized correctly but the text is misrecognized, updating the local language model in the local speech recognition device by using a language model incremental update package obtained from the cloud speech recognition device; and

when it is determined that the reliability of the local speech recognition result does not meet the requirement, if the pinyin string in the local speech recognition result is determined to be misrecognized, updating the local acoustic model and the local language model in the local speech recognition device by using, respectively, an acoustic model incremental update package and a language model incremental update package obtained from the cloud speech recognition device.

With reference to the fourth possible implementation of the first aspect, in a fifth possible implementation of the first aspect, the method further comprises: sending the correct speech recognition result to the cloud speech recognition device.

With reference to the first aspect, or the first, second, or third possible implementation of the first aspect, or the fifth possible implementation of the first aspect, in a sixth possible implementation of the first aspect, the method further comprises:

collecting statistics on the usage frequency of words in the local speech recognition result, and sending the usage frequency to the cloud recognition device.

According to a second aspect, the present invention provides a method for updating a speech recognition device, comprising:

receiving a speech input signal;

performing speech recognition on the speech input signal by using a cloud speech recognition device to obtain a cloud speech recognition result; and

when it is determined, based on user feedback information and a final speech recognition result, that the reliability of a local speech recognition result does not meet a requirement, updating a local speech recognition device by using the cloud speech recognition device; where the local speech recognition result is obtained by the local speech recognition device performing speech recognition on the speech input signal at the same time as the cloud speech recognition device performs speech recognition on the speech input signal to obtain the cloud speech recognition result, and the final speech recognition result is obtained from the cloud speech recognition result and the local speech recognition result.

In a first possible implementation of the second aspect, the performing speech recognition on the speech input signal by using the cloud recognition engine to obtain a cloud speech recognition result comprises:

performing speech recognition on the speech input signal by using a cloud acoustic model and a cloud language model in the cloud recognition device, respectively, to obtain the cloud speech recognition result.

In a second possible implementation of the second aspect, the updating a local speech recognition device by using the cloud speech recognition device when it is determined, based on user feedback information and a final speech recognition result, that the reliability of a local speech recognition result does not meet a requirement comprises:

when it is determined, based on the user feedback information and the final speech recognition result, that the reliability of the local speech recognition result does not meet the requirement, obtaining the type of model that needs to be updated in the local speech recognition device;

when the type of model that needs to be updated is the local acoustic model, obtaining the correct speech recognition result, generating an acoustic model incremental update package, and updating the local acoustic model by using the acoustic model incremental update package;

when the types of model that need to be updated are the local language model and the local acoustic model, obtaining the correct speech recognition result, generating a language model incremental update package and an acoustic model incremental update package, and updating the local language model and the local acoustic model by using the language model incremental update package and the acoustic model incremental update package; and

when the type of model that needs to be updated is the local language model, obtaining the correct speech recognition result, generating a language model incremental update package, and updating the local language model by using the language model incremental update package.
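
The three cases above can be pictured as a small dispatch on the cloud side. The following Python sketch is only an illustration under stated assumptions: the `UpdatePackage` class, the `build_update_packages` function, and the string labels `"acoustic"` and `"language"` are hypothetical names that do not appear in the patent, and the package here carries nothing but the correct recognition result, whereas the patent does not specify the package contents.

```python
from dataclasses import dataclass
from typing import Iterable, List

VALID_MODEL_TYPES = {"acoustic", "language"}  # hypothetical labels


@dataclass(frozen=True)
class UpdatePackage:
    """Hypothetical stand-in for an incremental update package."""
    model_type: str    # which local model the package targets
    correct_text: str  # the correct speech recognition result


def build_update_packages(model_types: Iterable[str],
                          correct_text: str) -> List[UpdatePackage]:
    """Generate one package per model type that needs to be updated."""
    requested = set(model_types)
    unknown = requested - VALID_MODEL_TYPES
    if unknown:
        raise ValueError(f"unknown model types: {sorted(unknown)}")
    # Sorted so that the output order is deterministic.
    return [UpdatePackage(m, correct_text) for m in sorted(requested)]
```

Passing both labels yields two packages, matching the case in which the local language model and the local acoustic model are updated together.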

With reference to the first or second possible implementation of the second aspect, in a third possible implementation of the second aspect, the method further comprises:

periodically updating the vocabulary in the cloud recognition device; and/or

receiving the usage frequency of words in the local speech recognition result sent by the local speech recognition device, and updating the vocabulary in the cloud recognition device by using the usage frequency.
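
As a rough illustration of the second option, the sketch below merges usage-frequency reports from local devices into a cloud-side vocabulary. The `CloudVocabulary` class, its methods, and the `min_count` cut-off are assumptions made for this example; the patent does not describe how the vocabulary is stored or how the frequencies are weighted.

```python
from collections import Counter
from typing import Dict, List


class CloudVocabulary:
    """Hypothetical cloud-side vocabulary keyed by word, valued by usage count."""

    def __init__(self) -> None:
        self.usage = Counter()

    def merge_report(self, report: Dict[str, int]) -> None:
        """Add the word counts reported by one local device."""
        self.usage.update(report)

    def frequent_words(self, min_count: int) -> List[str]:
        """Return the words used at least min_count times, in alphabetical order."""
        return sorted(w for w, c in self.usage.items() if c >= min_count)
```

A cloud service could call `merge_report` once for each report it receives and use `frequent_words` to decide which entries to keep or promote in its vocabulary.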

According to a third aspect, the present invention provides an apparatus for updating a speech recognition device, comprising:

a receiving unit, configured to receive a speech input signal;

a recognition unit, configured to perform speech recognition, by using a local speech recognition device, on the speech input signal received by the receiving unit, to obtain a local speech recognition result;

a selection unit, configured to obtain an optimal recognition result, as a final speech recognition result, from the local speech recognition result obtained by the recognition unit and a cloud speech recognition result obtained from a cloud speech recognition device, where the cloud speech recognition result is obtained by the cloud speech recognition device performing speech recognition on the speech input signal at the same time as the local speech recognition device performs speech recognition on the speech input signal;

a processing unit, configured to determine, based on obtained user feedback information and the final speech recognition result obtained from the selection unit, whether the reliability of the local speech recognition result meets a requirement; and

an updating unit, configured to update the local speech recognition device by using the cloud speech recognition device when the processing unit determines that the reliability of the local speech recognition result does not meet the requirement.

In a first possible implementation of the third aspect, the recognition unit is specifically configured to perform speech recognition on the speech input signal by using a local acoustic model and a local language model in the local recognition device, respectively, to obtain the local speech recognition result.

In a second possible implementation of the third aspect, the selection unit comprises:

a judging module, configured to determine the reliability of the local speech recognition result based on speech recognition parameters; and

a selecting module, configured to: when the judging module determines that the reliability of the local speech recognition result meets a preset condition, use the local speech recognition result as the final speech recognition result; and when the judging module determines that the reliability of the local speech recognition result does not meet the preset condition, use the cloud speech recognition result as the final speech recognition result.

In a third possible implementation of the third aspect, the processing unit is specifically configured to:

In a fourth possible implementation of the third aspect, the updating unit is specifically configured to:

In a fifth possible implementation of the third aspect, the apparatus further comprises:

a sending unit, configured to send the correct speech recognition result determined by the processing unit to the cloud speech recognition device.

With reference to the third aspect or any one of the first to fifth possible implementations of the third aspect, in a sixth possible implementation of the third aspect, the apparatus further comprises:

a statistics unit, configured to collect statistics on the usage frequency of words in the local speech recognition result obtained from the recognition unit, and to send the usage frequency to the cloud recognition device.

According to a fourth aspect, the present invention provides an apparatus for updating a speech recognition device, comprising:

a receiving unit, configured to receive a speech input signal;

a recognition unit, configured to perform speech recognition, by using a cloud speech recognition device, on the speech input signal received by the receiving unit, to obtain a cloud speech recognition result; and

an updating unit, configured to update a local speech recognition device by using the cloud speech recognition device when it is determined, based on user feedback information and a final speech recognition result, that the reliability of a local speech recognition result does not meet a requirement; where the local speech recognition result is obtained by the local speech recognition device performing speech recognition on the speech input signal at the same time as the cloud speech recognition device performs speech recognition on the speech input signal to obtain the cloud speech recognition result, and the final speech recognition result is obtained from the cloud speech recognition result and the local speech recognition result.

In a first possible implementation of the fourth aspect, the recognition unit is specifically configured to perform speech recognition on the speech input signal by using a cloud acoustic model and a cloud language model in the cloud recognition device, respectively, to obtain the cloud speech recognition result.

In a second possible implementation of the fourth aspect, the updating unit comprises:

an information obtaining module, configured to obtain the type of model that needs to be updated in the local speech recognition device when it is determined, based on the user feedback information and the final speech recognition result, that the reliability of the local speech recognition result does not meet the requirement; and

an update module, configured to: when the type of model that needs to be updated, as obtained by the information obtaining module, is the local acoustic model, obtain the correct speech recognition result, generate an acoustic model incremental update package, and update the local acoustic model by using the acoustic model incremental update package; when the types of model that need to be updated are the local language model and the local acoustic model, obtain the correct speech recognition result, generate a language model incremental update package and an acoustic model incremental update package, and update the local language model and the local acoustic model by using the language model incremental update package and the acoustic model incremental update package; and when the type of model that needs to be updated, as obtained by the information obtaining module, is the local language model, obtain the correct speech recognition result, generate a language model incremental update package, and update the local language model by using the language model incremental update package.

In a third possible implementation of the fourth aspect, the apparatus further comprises:

a vocabulary updating unit, configured to periodically update the vocabulary in the cloud recognition device; and/or to receive the usage frequency of words in the local speech recognition result sent by the local speech recognition device, and to update the vocabulary in the cloud recognition device by using the usage frequency.

According to a fifth aspect, the present invention provides a system for updating a speech recognition device, comprising:

the apparatus for updating a speech recognition device according to the third aspect and the apparatus for updating a speech recognition device according to the fourth aspect.

In the method for updating a speech recognition device provided by the embodiments of the present invention, when the local speech recognition result is determined to be unreliable on the basis of the local speech recognition result and the cloud speech recognition result, the cloud speech recognition device can be used to update the local speech recognition device. This avoids the limitation in the prior art whereby, even when cloud speech recognition results are obtained, the local speech recognition device cannot be extended because of the limits of its models. The method, apparatus, and system of the embodiments of the present invention therefore improve the speech recognition rate.

Brief description of the drawings

To describe the technical solutions in the embodiments of the present invention more clearly, the following briefly introduces the accompanying drawings required for describing the embodiments. Apparently, the accompanying drawings in the following description show merely some embodiments of the present invention, and a person of ordinary skill in the art may derive other drawings from these accompanying drawings without creative effort.

Fig. 1 is a flowchart of the method for updating a speech recognition device according to Embodiment 1 of the present invention;

Fig. 2 is a flowchart of the method for updating a speech recognition device according to Embodiment 3 of the present invention;

Fig. 3 is a schematic diagram of the apparatus for updating a speech recognition device according to Embodiment 4 of the present invention;

Fig. 4 is a schematic diagram of the apparatus for updating a speech recognition device according to Embodiment 4 of the present invention;

Fig. 5 is a schematic diagram of the apparatus for updating a speech recognition device according to Embodiment 5 of the present invention;

Fig. 6 is a structural diagram of the apparatus for updating a speech recognition device according to Embodiment 5 of the present invention.

Detailed description of the embodiments

The following clearly and completely describes the technical solutions in the embodiments of the present invention with reference to the accompanying drawings in the embodiments of the present invention. Apparently, the described embodiments are merely some rather than all of the embodiments of the present invention. All other embodiments obtained by a person of ordinary skill in the art based on the embodiments of the present invention without creative effort shall fall within the protection scope of the present invention.

As shown in Fig. 1, the method for updating a speech recognition device according to Embodiment 1 of the present invention comprises the following steps.

Step 11: The local speech recognition device receives a speech input signal.

In this step, the speech recognition software of the local speech recognition device is started first; when the user begins to speak, the local speech recognition device receives the speech input signal.

Step 12: Speech recognition is performed on the speech input signal by using the local speech recognition device, to obtain a local speech recognition result.

In this step, speech recognition is performed on the speech input signal mainly by using the local acoustic model and the local language model in the local recognition device, respectively, to obtain the local speech recognition result. The specific recognition methods of the local acoustic model and the local language model are the same as those in the prior art and are not described again here.

Step 13: An optimal recognition result is obtained, as a final speech recognition result, from the local speech recognition result and a cloud speech recognition result, where the cloud speech recognition result is obtained by the cloud speech recognition device performing speech recognition on the speech input signal at the same time as the local speech recognition device performs speech recognition on the speech input signal.

In this step, the reliability of the local speech recognition result is first determined based on speech recognition parameters such as the application context and the confidence; that is, it is determined whether the accuracy of the local speech recognition result meets the requirement. The application context refers to information about the scenario in which the user is using the device. For example, if the user's previous utterance was "make a phone call", the scenario for the subsequent recognition should be name recognition; if the user's previous utterance was "send a text message to Zhang San", the scenario for the subsequent recognition should be dictation of the message content. The confidence refers to how far a recognition result can be trusted, and its value generally lies between 0 and 100. For example, if the confidence of the recognition result "Zhang San" is 80, its reliability is relatively high and the recognition result can be accepted; if the confidence of "Li Si" is 20, its reliability is poor and the recognition result can be rejected.

When the final recognition result is chosen, if the reliability of the local speech recognition result meets the preset condition, the local speech recognition result is used as the final speech recognition result. If the reliability of the local speech recognition result does not meet the preset condition, then, because there is some network delay, the device can continue to wait for the cloud speech recognition result and then use the cloud speech recognition result as the final speech recognition result. For example, the preset condition may be a confidence of 90: if the confidence of the local speech recognition result is above 90, the local speech recognition result is used as the final speech recognition result; otherwise, the cloud speech recognition result is used as the final speech recognition result.
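
The selection logic of step 13 can be sketched in Python as follows. This is a minimal illustration, not the patented implementation: the `Recognition` class, the `select_final_result` function, and the two recognizer callables are hypothetical, and the comparison treats a confidence of exactly 90 as not meeting the preset condition, which is one reading of "above 90".

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass
from typing import Callable

CONFIDENCE_THRESHOLD = 90  # example preset condition taken from the text


@dataclass
class Recognition:
    text: str
    confidence: int  # 0 to 100
    source: str      # "local" or "cloud"


def select_final_result(signal: bytes,
                        recognize_local: Callable[[bytes], Recognition],
                        recognize_cloud: Callable[[bytes], Recognition]) -> Recognition:
    """Start both recognizers at once, then choose the final result."""
    pool = ThreadPoolExecutor(max_workers=2)
    try:
        cloud_future = pool.submit(recognize_cloud, signal)
        local = pool.submit(recognize_local, signal).result()
        if local.confidence > CONFIDENCE_THRESHOLD:
            return local  # local result is reliable enough
        # Otherwise wait out the network delay and use the cloud result.
        return cloud_future.result()
    finally:
        # Do not block the caller on a cloud request that is still running.
        pool.shutdown(wait=False)
```

Because both requests are submitted before either result is read, the cloud recognition runs while the local recognition is in progress, as step 13 describes.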

Step 14: Whether the reliability of the local speech recognition result meets the requirement is determined based on obtained user feedback information and the final speech recognition result.

In this step, the final speech recognition result needs to be presented to the user through a user interface (UI) for the user to act on, and the user feedback information is then obtained from the user's operations. The judgment of the final speech recognition result depends on the obtained user feedback information. For example, in local command word recognition, if the user cancels the corresponding operation, the obtained user feedback information indicates that the final recognition result is wrong; in text message dictation, if the user modifies the message text, the obtained user feedback information indicates that the final recognition result is wrong, and the modified text is the correct recognition result. It can be seen that the correct speech recognition result can also be determined when the user feedback information is obtained. In addition, after the correct speech recognition result is obtained, it may also be sent to the cloud speech recognition device.

Therefore, in this step, when the obtained user feedback information indicates that the final speech recognition result is correct and the cloud speech recognition result was used as the final speech recognition result, it is determined that the reliability of the local speech recognition result does not meet the requirement. When the obtained user feedback information indicates that the final speech recognition result is wrong and a correct speech recognition result is obtained from the user, it is likewise determined that the reliability of the local speech recognition result does not meet the requirement.

If the final speech recognition result is correct and the local speech recognition result is correct, the user does not need to perform any operation, so no user feedback information needs to be obtained. Likewise, if the final speech recognition result is wrong but the user does not feed back the correct recognition result, no user feedback information needs to be obtained.
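
The decision in step 14 can be summarized with the following Python sketch. The `Feedback` class and the `local_result_unreliable` function are hypothetical names introduced only for illustration, and the sketch assumes that the two cases in which no feedback is collected simply leave the local device unchanged.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Feedback:
    """Hypothetical record of what the user's operations revealed."""
    final_correct: bool                   # did the user accept the final result?
    corrected_text: Optional[str] = None  # correct result supplied by the user


def local_result_unreliable(feedback: Optional[Feedback],
                            final_source: str) -> bool:
    """Return True when the local result's reliability does not meet the requirement."""
    if feedback is None:
        return False  # no feedback was collected, so nothing is decided
    if feedback.final_correct:
        # The final result was right, but it had to come from the cloud.
        return final_source == "cloud"
    # The final result was wrong and the user provided the correct one.
    return feedback.corrected_text is not None
```

A caller would pass `final_source="cloud"` or `final_source="local"` depending on which result was chosen in step 13.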

Step 15: When it is determined that the reliability of the local speech recognition result does not meet the requirement, the local speech recognition device is updated by using the cloud speech recognition device.

In this step, when it is determined that the reliability of the local speech recognition result does not meet the requirement, three cases are distinguished. If a command word in the local speech recognition result is determined to be misrecognized, the local acoustic model in the local speech recognition device is updated by using an acoustic model incremental update package obtained from the cloud speech recognition device. If the pinyin string in the local speech recognition result is determined to be recognized correctly but the text is misrecognized, the local language model in the local speech recognition device is updated by using a language model incremental update package obtained from the cloud speech recognition device. If the pinyin string in the local speech recognition result is determined to be misrecognized, the local acoustic model and the local language model in the local speech recognition device are updated by using, respectively, an acoustic model incremental update package and a language model incremental update package obtained from the cloud speech recognition device.
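
The three cases of step 15 amount to a mapping from the kind of recognition error to the models that receive an update. The Python sketch below shows that mapping; the `ErrorKind` enumeration, the `models_to_update` function, and the labels `"acoustic"` and `"language"` are assumptions made for this example.

```python
from enum import Enum
from typing import FrozenSet


class ErrorKind(Enum):
    """Hypothetical labels for the three error cases in step 15."""
    COMMAND_WORD_WRONG = "command_word_wrong"
    PINYIN_RIGHT_TEXT_WRONG = "pinyin_right_text_wrong"
    PINYIN_WRONG = "pinyin_wrong"


# Which local models are updated for each kind of error.
_UPDATE_PLAN = {
    ErrorKind.COMMAND_WORD_WRONG: frozenset({"acoustic"}),
    ErrorKind.PINYIN_RIGHT_TEXT_WRONG: frozenset({"language"}),
    ErrorKind.PINYIN_WRONG: frozenset({"acoustic", "language"}),
}


def models_to_update(kind: ErrorKind) -> FrozenSet[str]:
    """Return the local models that need an incremental update package."""
    return _UPDATE_PLAN[kind]
```

Keeping the plan in a table makes it easy to see that only a pinyin-level error triggers both updates.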

Specifically, local command word recognition and local dictation recognition both reside in the local recognition device and are two functions of that device. Local command word recognition is mainly used to recognize fixed sentence patterns, names, command words, and the like, for example fixed command phrases such as "call Zhang San" and "open the music player". Local dictation recognition is mainly used to recognize continuous speech, such as the dictated content of a text message, for example "Have you had dinner today?".

In local command word recognition, if the command the user intends is "call Zhang San" but the final speech recognition result is "call Li Si", the user may cancel the final speech recognition result, or select the intended contact from the candidate list (if any), to indicate that the final speech recognition result is wrong. From this kind of user operation, it can be learned that the final speech recognition result is wrong. As another example, in text message dictation, if the user modifies the final speech recognition result, this indicates that the dictated content in the final speech recognition result is wrong, and the text as modified by the user can be obtained as the final speech recognition result. In such cases, it can be determined that the local acoustic model needs to be updated. Therefore, the local acoustic model in the local speech recognition device is updated by using the acoustic model incremental update package obtained from the cloud speech recognition device.

In local dictation recognition, suppose for example that the final speech recognition result is "the Chinese team's performance was really gei li" written with the character 立 ("stand"), and the user corrects it to the same phrase written with the character 力 ("power"). The two characters have the same pinyin string, so it is determined that the local language model in the local speech recognition device needs to be updated. As another example, if the final speech recognition result is "the Spring Festival Gala knows better" and the user corrects it to "the Spring Festival Gala is very excellent", it is determined that both the local language model and the local acoustic model in the local speech recognition device need to be updated.
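
For dictation, the choice between the two cases comes down to comparing the pinyin string of the recognized text with that of the user's correction. The Python sketch below assumes the pinyin strings are already available as lists of toneless syllables; the `dictation_models_to_update` function is hypothetical, and the syllables used in the second usage example are invented purely for illustration.

```python
from typing import List, Set


def dictation_models_to_update(recognized_text: str,
                               recognized_pinyin: List[str],
                               corrected_text: str,
                               corrected_pinyin: List[str]) -> Set[str]:
    """Decide which local models a corrected dictation result points to."""
    if recognized_text == corrected_text:
        return set()  # nothing was corrected
    if recognized_pinyin == corrected_pinyin:
        # Same sounds, different characters: a language model problem.
        return {"language"}
    # The sounds themselves were wrong: update both models.
    return {"acoustic", "language"}


# Same pinyin, different characters (the "gei li" example above).
same_sound = dictation_models_to_update("给立", ["gei", "li"], "给力", ["gei", "li"])

# Different pinyin (syllables invented for illustration).
different_sound = dictation_models_to_update("A", ["qing", "chu"], "B", ["jing", "cai"])
```

Here `same_sound` contains only the language model, while `different_sound` contains both models.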

As can be seen from the above, in the update method of the speech recognition apparatus of Embodiment 1 of the present invention, when the local speech recognition result is determined to be unreliable by combining the local speech recognition result and the cloud speech recognition result, the cloud speech recognition apparatus can be used to update the local speech recognition apparatus. This avoids the limitation in the prior art that, even when a cloud speech recognition result is obtained, the local speech recognition apparatus cannot be extended because of the limits of its models. The method of the embodiment of the present invention therefore improves the speech recognition rate.

On the basis of Embodiment 1 of the present invention, the update method of the speech recognition apparatus of Embodiment 2 of the present invention further comprises: counting the usage frequency of words in the local speech recognition result, and sending the usage frequency to the cloud recognition apparatus. For example, the usage frequency of high-frequency words in the local speech recognition result can be counted at a set time interval, such as monthly, and sent to the cloud recognition apparatus, so as to extend the recognition of high-frequency words in local speech recognition.
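The counting step can be illustrated with a short sketch. It assumes, hypothetically, that each local recognition result has already been segmented into a list of words and that only the most frequent words are reported; the names `count_word_usage` and `build_report` and the `top_n` cutoff are illustrative and do not come from the embodiment.

```python
from collections import Counter


def count_word_usage(results):
    """Count how often each word occurs across local recognition results.

    results is an iterable of word lists, one list per recognized utterance.
    """
    counts = Counter()
    for words in results:
        counts.update(words)
    return counts


def build_report(counts, top_n):
    """Keep only the top_n most frequent words for sending to the cloud."""
    return dict(counts.most_common(top_n))
```

A scheduler on the local apparatus could call `count_word_usage` over the results collected in the set interval (for example one month) and send the dictionary returned by `build_report` to the cloud recognition apparatus.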

As shown in Figure 2, the update method of the speech recognition apparatus of Embodiment 3 of the present invention comprises:

Step 21: the cloud speech recognition apparatus receives a voice input signal.

In this step, similarly to step 11, the speech recognition software of the cloud speech recognition apparatus is started first; when the user begins to speak, the cloud speech recognition apparatus receives the voice input signal.

Step 22: perform speech recognition on the voice input signal using the cloud speech recognition apparatus to obtain a cloud speech recognition result.

In this step, the cloud acoustic model and the cloud language model in the cloud recognition apparatus are each used to perform speech recognition on the voice input signal, obtaining the cloud speech recognition result. The specific recognition methods of the cloud acoustic model and the cloud language model are the same as in the prior art and are not described again here.

Step 23: when it is determined, in combination with user feedback information and the final speech recognition result, that the reliability of the local speech recognition result does not meet the requirement, update the local speech recognition apparatus using the cloud speech recognition apparatus; wherein the local speech recognition result is obtained by performing speech recognition on the voice input signal using the local speech recognition apparatus while the cloud speech recognition apparatus performs speech recognition on the voice input signal to obtain the cloud speech recognition result; and the final speech recognition result is obtained from the cloud speech recognition result and the local speech recognition result.

In this step, when it is determined, in combination with the user feedback information and the final speech recognition result, that the reliability of the local speech recognition result does not meet the requirement, the type of model that needs to be updated in the local speech recognition apparatus is obtained. When the model type to be updated is the local acoustic model, the correct speech recognition result is obtained, an acoustic model incremental update package is generated, and the local acoustic model is updated using the acoustic model incremental update package. When the model types to be updated are the local language model and the local acoustic model, the correct speech recognition result is obtained, a language model incremental update package and an acoustic model incremental update package are generated, and the local language model and the local acoustic model are updated using the language model incremental update package and the acoustic model incremental update package. When the model type to be updated is the local language model, the correct speech recognition result is obtained, a language model incremental update package is generated, and the local language model is updated using the language model incremental update package.
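The three cases above amount to generating one incremental update package per model type that needs updating. The following sketch shows only that dispatch. The embodiment does not describe the contents of an incremental update package, so the `UpdatePackage` class here is a placeholder that merely records the target model and the correct recognition result; all names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class UpdatePackage:
    model: str         # "acoustic" or "language"
    correct_text: str  # correct recognition result the package is built from


def build_packages(model_types, correct_text):
    """Return one placeholder update package per model type to be updated."""
    allowed = {"acoustic", "language"}
    requested = set(model_types)
    unknown = requested - allowed
    if unknown:
        raise ValueError(f"unknown model types: {sorted(unknown)}")
    # Sorted so the result order is deterministic.
    return [UpdatePackage(m, correct_text) for m in sorted(requested)]
```

Passing `{"acoustic"}`, `{"language"}`, or both corresponds to the three cases in the paragraph above; the returned packages would then be sent to the local speech recognition apparatus.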

As can be seen from the above, in the update method of the speech recognition apparatus of Embodiment 3 of the present invention, when the local speech recognition result is determined to be unreliable by combining the local speech recognition result and the cloud speech recognition result, the cloud speech recognition apparatus can be used to update the local speech recognition apparatus. This avoids the limitation in the prior art that, even when a cloud speech recognition result is obtained, the local speech recognition apparatus cannot be extended because of the limits of its models. The method of the embodiment of the present invention therefore improves the speech recognition rate.

On the basis of Embodiment 3 of the present invention, the update method of the speech recognition apparatus of Embodiment 4 of the present invention can further comprise: periodically (for example, monthly) updating the vocabulary in the cloud recognition apparatus. The cloud recognition apparatus can also receive the usage frequency of words in the local speech recognition result sent by the local speech recognition apparatus, and use the usage frequency to update the vocabulary in the cloud recognition apparatus, so as to further improve the recognition rate.
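As a rough illustration of the second option, the sketch below merges reported usage frequencies into a cloud-side vocabulary. The embodiment does not say how the vocabulary is stored or how the frequencies are applied, so representing the vocabulary as a word-to-count dictionary and simply adding the reported counts are assumptions made for this example; `merge_usage` is a hypothetical name.

```python
def merge_usage(vocabulary, reported):
    """Return a new vocabulary with the reported usage counts added.

    vocabulary and reported both map a word to a non-negative count.
    Words missing from the vocabulary are added with their reported count.
    """
    merged = dict(vocabulary)
    for word, count in reported.items():
        merged[word] = merged.get(word, 0) + count
    return merged
```

The input dictionaries are left unchanged, which makes it easy to compare the vocabulary before and after a periodic update.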

As shown in Figure 3, the updating device of the speech recognition apparatus of Embodiment 4 of the present invention comprises:

A receiving unit 31, configured to receive a voice input signal; a recognition unit 32, configured to perform speech recognition on the voice input signal received by the receiving unit using the local speech recognition apparatus, to obtain a local speech recognition result; a selection unit 33, configured to obtain an optimal recognition result, as the final speech recognition result, from the local speech recognition result obtained by the recognition unit and the cloud speech recognition result obtained from the cloud speech recognition apparatus, wherein the cloud speech recognition result is obtained by performing speech recognition on the voice input signal using the cloud speech recognition apparatus while the local speech recognition apparatus performs speech recognition on the voice input signal; a processing unit 34, configured to determine, in combination with the obtained user feedback information and the final speech recognition result obtained from the selection unit, whether the reliability of the local speech recognition result meets the requirement; and an updating unit 35, configured to update the local speech recognition apparatus using the cloud speech recognition apparatus when the processing unit determines that the reliability of the local speech recognition result does not meet the requirement.

The recognition unit 32 is specifically configured to perform speech recognition on the voice input signal using the local acoustic model and the local language model in the local recognition apparatus, respectively, to obtain the local speech recognition result.

The selection unit 33 can comprise: a judging module, configured to determine the reliability of the local speech recognition result in combination with a speech recognition parameter; and a selecting module, configured to use the local speech recognition result as the final speech recognition result when the judging module determines that the reliability of the local speech recognition result meets a preset condition, and to use the cloud speech recognition result as the final speech recognition result when the judging module determines that the reliability of the local speech recognition result does not meet the preset condition.
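The behaviour of the judging and selecting modules can be sketched as follows. The sketch assumes that the speech recognition parameter is a single confidence score between 0 and 1 and that the preset condition is a fixed threshold; both are illustrative assumptions, as are the function name and the default threshold value.

```python
def select_final_result(local_text, local_confidence, cloud_text, threshold=0.8):
    """Choose the final recognition result and report where it came from.

    Returns (text, source), where source is "local" or "cloud".
    """
    if local_confidence >= threshold:
        # The local result meets the preset condition: use it as final.
        return local_text, "local"
    # Otherwise fall back to the cloud result.
    return cloud_text, "cloud"
```

Returning the source alongside the text lets a later step know whether the cloud result had to be used, which is one of the signals that the local result was unreliable.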

The processing unit 34 is specifically configured to: determine that the reliability of the local speech recognition result does not meet the requirement when the obtained user feedback information indicates that the final speech recognition result is correct and the cloud speech recognition result was used as the final speech recognition result; and determine that the reliability of the local speech recognition result does not meet the requirement when the obtained user feedback information indicates that the final speech recognition result is wrong and the correct speech recognition result is obtained from the user.
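The two conditions handled by the processing unit can be written as a small predicate. This is a sketch under stated assumptions: the user feedback is reduced to a boolean, the origin of the final result is a "local" or "cloud" label, and a user correction is passed as an optional string. Cases not covered by the paragraph above are treated here as "not determined unreliable", which is an assumption of this example.

```python
def local_result_unreliable(feedback_correct, final_source, user_correction=None):
    """Return True when the local recognition result is deemed unreliable.

    feedback_correct: True if the user accepted the final result.
    final_source: "local" or "cloud", where the final result came from.
    user_correction: corrected text supplied by the user, if any.
    """
    if feedback_correct:
        # The final result was right, but it had to come from the cloud.
        return final_source == "cloud"
    # The final result was wrong and the user supplied the correct text.
    return user_correction is not None
```

When this predicate returns True, the updating unit would go on to decide which local model or models to update.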

The updating unit 35 is specifically configured to: when it is determined that the reliability of the local speech recognition result does not meet the requirement, if it is determined that a command word in the local speech recognition result was recognized incorrectly, update the local acoustic model in the local speech recognition apparatus using the acoustic model incremental update package obtained from the cloud speech recognition apparatus; when it is determined that the reliability of the local speech recognition result does not meet the requirement, if it is determined that the pinyin string in the local speech recognition result was recognized correctly but the text was recognized incorrectly, update the local language model in the local speech recognition apparatus using the language model incremental update package obtained from the cloud speech recognition apparatus; and when it is determined that the reliability of the local speech recognition result does not meet the requirement, if it is determined that the pinyin string in the local speech recognition result was recognized incorrectly, update the local acoustic model and the local language model in the local speech recognition apparatus using, respectively, the acoustic model incremental update package and the language model incremental update package obtained from the cloud speech recognition apparatus.

As can be seen from the above, in the updating device of the speech recognition apparatus of Embodiment 4 of the present invention, when the local speech recognition result is determined to be unreliable by combining the local speech recognition result and the cloud speech recognition result, the cloud speech recognition apparatus can be used to update the local speech recognition apparatus. This avoids the limitation in the prior art that, even when a cloud speech recognition result is obtained, the local speech recognition apparatus cannot be extended because of the limits of its models. The device of the embodiment of the present invention therefore improves the speech recognition rate.

As shown in Figure 4, the device can further comprise: a sending unit 41, configured to send the correct speech recognition result determined by the processing unit to the cloud speech recognition apparatus; and a statistics unit 42, configured to count the usage frequency of words in the local speech recognition result obtained by the recognition unit, and to send the usage frequency to the cloud recognition apparatus.

As shown in Figure 5, the updating device of the speech recognition apparatus of Embodiment 5 of the present invention comprises:

A receiving unit 51, configured to receive a voice input signal; a recognition unit 52, configured to perform speech recognition on the voice input signal received by the receiving unit using the cloud speech recognition apparatus, to obtain a cloud speech recognition result; and an updating unit 53, configured to update the local speech recognition apparatus using the cloud speech recognition apparatus when it is determined, in combination with user feedback information and the final speech recognition result, that the reliability of the local speech recognition result does not meet the requirement; wherein the local speech recognition result is obtained by performing speech recognition on the voice input signal using the local speech recognition apparatus while the cloud speech recognition apparatus performs speech recognition on the voice input signal to obtain the cloud speech recognition result; and the final speech recognition result is obtained from the cloud speech recognition result and the local speech recognition result.

The recognition unit 52 is specifically configured to perform speech recognition on the voice input signal using the cloud acoustic model and the cloud language model in the cloud recognition apparatus, respectively, to obtain the cloud speech recognition result.

The updating unit 53 can comprise: an information obtaining module, configured to obtain the type of model that needs to be updated in the local speech recognition apparatus when it is determined, in combination with the user feedback information and the final speech recognition result, that the reliability of the local speech recognition result does not meet the requirement; and an update module, configured to: when the model type to be updated obtained by the information obtaining module is the local acoustic model, obtain the correct speech recognition result, generate an acoustic model incremental update package, and update the local acoustic model using the acoustic model incremental update package; when the model types to be updated are the local language model and the local acoustic model, obtain the correct speech recognition result, generate a language model incremental update package and an acoustic model incremental update package, and update the local language model and the local acoustic model using the language model incremental update package and the acoustic model incremental update package; and when the model type to be updated obtained by the information obtaining module is the local language model, obtain the correct speech recognition result, generate a language model incremental update package, and update the local language model using the language model incremental update package.

As can be seen from the above, in the updating device of the speech recognition apparatus of Embodiment 5 of the present invention, when the local speech recognition result is determined to be unreliable by combining the local speech recognition result and the cloud speech recognition result, the cloud speech recognition apparatus can be used to update the local speech recognition apparatus. This avoids the limitation in the prior art that, even when a cloud speech recognition result is obtained, the local speech recognition apparatus cannot be extended because of the limits of its models. The device of the embodiment of the present invention therefore improves the speech recognition rate.

As shown in Figure 6, the updating device of the speech recognition apparatus of Embodiment 5 of the present invention can further comprise:

A vocabulary updating unit 54, configured to periodically update the vocabulary in the cloud recognition apparatus; and/or configured to receive the usage frequency of words in the local speech recognition result sent by the local speech recognition apparatus, and to use the usage frequency to update the vocabulary in the cloud recognition apparatus.

For the working principle of the updating device of the speech recognition apparatus of the embodiments of the present invention, reference can be made to the description of the foregoing method embodiments.

In addition, an embodiment of the present invention further provides an update system for a speech recognition apparatus, comprising the updating device of the speech recognition apparatus described in Embodiment 4 and the updating device of the speech recognition apparatus described in Embodiment 5.

A person of ordinary skill in the art will understand that all or part of the processes in the methods of the above embodiments can be implemented by a computer program instructing the relevant hardware. The program can be stored in a computer-readable storage medium and, when executed, can include the processes of the embodiments of the above methods. The storage medium can be a magnetic disk, an optical disc, a read-only memory (Read-Only Memory, ROM), a random access memory (Random Access Memory, RAM), or the like.

The above is only a specific embodiment of the present invention, but the protection scope of the present invention is not limited thereto. Any change or replacement that a person skilled in the art could readily conceive within the technical scope disclosed by the present invention shall fall within the protection scope of the present invention. Therefore, the protection scope of the present invention shall be subject to the protection scope of the claims.

Claims

1. An update method for a speech recognition apparatus, characterized by comprising:

receiving a voice input signal;

obtaining an optimal recognition result from the local speech recognition result and a cloud speech recognition result as a final speech recognition result, wherein the cloud speech recognition result is obtained by performing speech recognition on the voice input signal using a cloud speech recognition apparatus while the local speech recognition apparatus performs speech recognition on the voice input signal;

determining a correct speech recognition result, and sending the correct speech recognition result to the cloud speech recognition apparatus;

2. The method according to claim 1, characterized in that performing speech recognition on the voice input signal using the local speech recognition apparatus to obtain the local speech recognition result comprises:

performing speech recognition on the voice input signal using the local acoustic model and the local language model in the local speech recognition apparatus, respectively, to obtain the local speech recognition result.

3. The method according to claim 1, characterized in that obtaining the optimal recognition result from the local speech recognition result and the cloud speech recognition result as the final speech recognition result comprises:

4. The method according to claim 1, characterized in that determining, in combination with the obtained user feedback information and the final speech recognition result, whether the reliability of the local speech recognition result meets the requirement comprises:

5. The method according to any one of claims 1 to 4, characterized in that the method further comprises:

counting the usage frequency of words in the local speech recognition result, and sending the usage frequency to the cloud speech recognition apparatus.

6. An update method for a speech recognition apparatus, characterized by comprising:

receiving a voice input signal;

when it is determined, in combination with user feedback information and a final speech recognition result, that the reliability of a local speech recognition result does not meet the requirement, obtaining the type of model that needs to be updated in a local speech recognition apparatus; wherein the local speech recognition result is obtained by performing speech recognition on the voice input signal using the local speech recognition apparatus while the cloud speech recognition apparatus performs speech recognition on the voice input signal to obtain the cloud speech recognition result; and the final speech recognition result is obtained from the cloud speech recognition result and the local speech recognition result;

7. The method according to claim 6, characterized in that performing speech recognition on the voice input signal using the cloud speech recognition apparatus to obtain the cloud speech recognition result comprises:

performing speech recognition on the voice input signal using the cloud acoustic model and the cloud language model in the cloud speech recognition apparatus, respectively, to obtain the cloud speech recognition result.

8. The method according to claim 6 or 7, characterized in that the method further comprises:

periodically updating the vocabulary in the cloud speech recognition apparatus; and/or

receiving the usage frequency of words in the local speech recognition result sent by the local speech recognition apparatus, and using the usage frequency to update the vocabulary in the cloud speech recognition apparatus.

9. An updating device for a speech recognition apparatus, characterized by comprising:

a receiving unit, configured to receive a voice input signal;

the processing unit, further configured to determine a correct speech recognition result;

a sending unit, configured to send the correct speech recognition result determined by the processing unit to the cloud speech recognition apparatus;

an updating unit, configured to: when it is determined that the reliability of the local speech recognition result does not meet the requirement, if it is determined that a command word in the local speech recognition result was recognized incorrectly, update the local acoustic model in the local speech recognition apparatus using the acoustic model incremental update package obtained from the cloud speech recognition apparatus; when it is determined that the reliability of the local speech recognition result does not meet the requirement, if it is determined that the pinyin string in the local speech recognition result was recognized correctly but the text was recognized incorrectly, update the local language model in the local speech recognition apparatus using the language model incremental update package obtained from the cloud speech recognition apparatus; and when it is determined that the reliability of the local speech recognition result does not meet the requirement, if it is determined that the pinyin string in the local speech recognition result was recognized incorrectly, update the local acoustic model and the local language model in the local speech recognition apparatus using, respectively, the acoustic model incremental update package and the language model incremental update package obtained from the cloud speech recognition apparatus.

10. The device according to claim 9, characterized in that

the recognition unit is specifically configured to perform speech recognition on the voice input signal using the local acoustic model and the local language model in the local speech recognition apparatus, respectively, to obtain the local speech recognition result.

11. The device according to claim 9, characterized in that the selection unit comprises:

12. The device according to claim 9, characterized in that the processing unit is specifically configured to:

13. The device according to any one of claims 9 to 12, characterized in that the device further comprises:

a statistics unit, configured to count the usage frequency of words in the local speech recognition result obtained by the recognition unit, and to send the usage frequency to the cloud speech recognition apparatus.

14. An updating device for a speech recognition apparatus, characterized by comprising:

a receiving unit, configured to receive a voice input signal;

an updating unit, configured to update a local speech recognition apparatus using the cloud speech recognition apparatus when it is determined, in combination with user feedback information and a final speech recognition result, that the reliability of a local speech recognition result does not meet the requirement; wherein the local speech recognition result is obtained by performing speech recognition on the voice input signal using the local speech recognition apparatus while the cloud speech recognition apparatus performs speech recognition on the voice input signal to obtain the cloud speech recognition result; and the final speech recognition result is obtained from the cloud speech recognition result and the local speech recognition result;

the updating unit comprises:

an information obtaining module, configured to obtain the type of model that needs to be updated in the local speech recognition apparatus when it is determined, in combination with the user feedback information and the final speech recognition result, that the reliability of the local speech recognition result does not meet the requirement;

15. The device according to claim 14, characterized in that

the recognition unit is specifically configured to perform speech recognition on the voice input signal using the cloud acoustic model and the cloud language model in the cloud speech recognition apparatus, respectively, to obtain the cloud speech recognition result.

16. The device according to claim 14, characterized in that the device further comprises:

a vocabulary updating unit, configured to periodically update the vocabulary in the cloud speech recognition apparatus; and/or configured to receive the usage frequency of words in the local speech recognition result sent by the local speech recognition apparatus, and to use the usage frequency to update the vocabulary in the cloud speech recognition apparatus.

17. An update system for a speech recognition apparatus, characterized by comprising:

the updating device of the speech recognition apparatus according to any one of claims 9 to 13 and the updating device of the speech recognition apparatus according to any one of claims 14 to 16.