CN103700368A - Speech recognition method, speech recognition device and electronic equipment

Info

Publication number: CN103700368A
Application number: CN201410013478.8A
Authority: CN
Inventors: 王伟宁; 戴海生; 宫玉强
Original assignee: Lenovo Beijing Ltd
Current assignee: Lenovo Beijing Ltd
Priority date: 2014-01-13
Filing date: 2014-01-13
Publication date: 2014-04-02
Anticipated expiration: 2034-01-13
Also published as: CN103700368B

Abstract

The invention provides a speech recognition method, a speech recognition device and electronic equipment. The method comprises: receiving a speech input and obtaining an audio signal corresponding to the speech input; recognizing the audio signal with a first speech recognition device to obtain a recognition result, wherein the recognition result comprises recognition content and a confidence, and the confidence is used for determining the reliability of the recognition content; presetting at least two confidence thresholds that differ from each other; selecting one confidence threshold from the at least two confidence thresholds; and judging, on the basis of the confidence in the recognition result and the selected confidence threshold, whether the recognition content is accurate. According to the technical scheme of the embodiments of the invention, different confidence thresholds can be adopted in different situations so that both the recognition rate and the robustness of speech recognition are taken into account, thereby improving the user experience.

Description

Method for speech recognition, speech recognition device and electronic equipment

Technical field

The present invention relates to the field of information technology and, more specifically, to a method for speech recognition, a speech recognition device and electronic equipment.

Background art

Speech recognition technology converts speech into corresponding text or commands through recognition and understanding. In speech recognition, speech undergoes processing such as feature extraction, pattern matching and model training to obtain instructions to which an electronic device can respond, text to be recorded in the electronic device, and the like, so that a user can interact with the electronic device by speaking.

In a real speech environment there is usually noise, and real spoken language is mixed with interfering sounds such as pauses and coughs, all of which affect the recognition accuracy of existing speech recognition systems. In addition, if the words spoken by the user are not in the predefined domain of the speech recognition system, recognition errors are also more likely. A commercial speech recognition system is therefore expected to reject erroneous speech. Accordingly, confidence evaluation is used to guarantee the accuracy of the recognized content and to reject speech that has been wrongly recognized.

Confidence evaluation performs hypothesis testing on the recognition result of a speech recognition device: the reliability of the recognition result is evaluated against a confidence threshold set in advance so as to locate errors in the result, thereby improving the recognition rate and robustness of the recognition system. Setting the confidence threshold reasonably is therefore critical, and it has become a current technical problem.

Summary of the invention

Embodiments of the present invention provide a method for speech recognition, a speech recognition device and electronic equipment that make it possible to adopt different confidence thresholds in different situations so as to take into account both the recognition rate and the robustness of speech recognition, thereby improving the user experience.

In a first aspect, a method for speech recognition is provided, applied to electronic equipment comprising a first speech recognition device. The method may comprise: receiving a speech input and obtaining an audio signal corresponding to the speech input; performing recognition processing on the audio signal with the first speech recognition device to obtain a recognition result, the recognition result comprising recognition content and a confidence, the confidence being used for determining the reliability of the recognition content; presetting at least two confidence thresholds, the confidence thresholds differing from one another; selecting one confidence threshold from the at least two confidence thresholds; and judging whether the recognition content is accurate based on the confidence in the recognition result and the selected confidence threshold.

In the method for speech recognition, presetting at least two confidence thresholds may comprise: presetting the at least two confidence thresholds according to at least one of the content that the first speech recognition device is able to recognize and the network condition of the first speech recognition device.

In the method for speech recognition, the content that the first speech recognition device is able to recognize may comprise a plurality of command words, and presetting the at least two confidence thresholds according to at least one of the recognizable content and the network condition may comprise: setting a first confidence threshold for a first command word among the plurality of command words; and setting a second confidence threshold for a second command word among the plurality of command words, the second command word being different from the first command word.

In the method for speech recognition, presetting the at least two confidence thresholds according to at least one of the recognizable content and the network condition may comprise: setting a third confidence threshold for the case where the first speech recognition device has a network connection; and setting a fourth confidence threshold for the case where the first speech recognition device has no network connection.

In the method for speech recognition, selecting one confidence threshold from the at least two confidence thresholds may comprise: determining whether the recognition content in the recognition result corresponds to the second command word; selecting the second confidence threshold when the recognition content corresponds to the second command word; determining whether the first speech recognition device has a network connection when the recognition content does not correspond to the second command word; selecting the third confidence threshold when the first speech recognition device has a network connection; and selecting the fourth confidence threshold when the first speech recognition device has no network connection.

In the method for speech recognition, judging whether the recognition content is accurate based on the confidence in the recognition result and the selected confidence threshold may comprise: comparing the confidence in the recognition result with the selected second confidence threshold or the selected third confidence threshold to obtain a comparison result; and judging whether the recognition content is accurate according to the comparison result.

The method for speech recognition may further comprise: when the recognition content is judged inaccurate, sending the audio signal to a second speech recognition device connected to the electronic equipment via a network, the second speech recognition device being able to perform recognition processing on the audio signal to obtain second recognition content; and receiving the second recognition content from the second speech recognition device and using the second recognition content as the final recognition content.

The method for speech recognition may further comprise: sending the audio signal to a second speech recognition device connected to the electronic equipment via a network, the second speech recognition device being able to perform recognition processing on the audio signal to obtain second recognition content; and, when the recognition content is judged inaccurate in the judging operation, receiving the second recognition content from the second speech recognition device within a preset time period.

The method for speech recognition may further comprise: when the second recognition content is not received within the preset time period, obtaining a low confidence threshold that is less than the selected confidence threshold; and judging whether the recognition content is accurate based on the low confidence threshold.

In a second aspect, a speech recognition device is provided, applied to electronic equipment. The speech recognition device may comprise: an audio input unit for receiving a speech input and obtaining an audio signal corresponding to the speech input; a recognition unit for performing recognition processing on the audio signal to obtain a recognition result, the recognition result comprising recognition content and a confidence, the confidence being used for determining the reliability of the recognition content; a threshold setting unit for presetting at least two confidence thresholds, the confidence thresholds differing from one another; a threshold acquiring unit for selecting one confidence threshold from the at least two confidence thresholds; and a judging unit for judging whether the recognition content is accurate based on the confidence in the recognition result and the selected confidence threshold.

In the speech recognition device, the threshold setting unit may preset the at least two confidence thresholds according to at least one of the content that the recognition unit is able to recognize and the network condition of the recognition unit.

In the speech recognition device, the content that the speech recognition device is able to recognize may comprise a plurality of command words, and the threshold setting unit may preset the at least two confidence thresholds as follows: setting a first confidence threshold for a first command word among the plurality of command words; and setting a second confidence threshold for a second command word among the plurality of command words, the second command word being different from the first command word.

In the speech recognition device, the threshold setting unit may preset the at least two confidence thresholds as follows: setting a third confidence threshold for the case where the speech recognition device has a network connection; and setting a fourth confidence threshold for the case where the speech recognition device has no network connection.

In the speech recognition device, the threshold acquiring unit may comprise: a determining component for determining whether the recognition content in the recognition result corresponds to the second command word and, when the recognition content does not correspond to the second command word, determining whether the speech recognition device has a network connection; and a selecting component for selecting the second confidence threshold when the determining component determines that the recognition content corresponds to the second command word, selecting the third confidence threshold when the determining component determines that the speech recognition device has a network connection, and selecting the fourth confidence threshold when the determining component determines that the speech recognition device has no network connection.

In the speech recognition device, the judging unit may judge whether the recognition content is accurate as follows: comparing the confidence in the recognition result with the selected second confidence threshold or the selected third confidence threshold to obtain a comparison result; and judging whether the recognition content is accurate according to the comparison result.

The speech recognition device may further comprise: a transmitting unit for sending the audio signal, when the judging unit judges that the recognition content is inaccurate, to another speech recognition device connected to the speech recognition device via a network, the other speech recognition device being able to perform recognition processing on the audio signal to obtain second recognition content; and a receiving unit for receiving the second recognition content from the other speech recognition device and using the second recognition content as the final recognition content.

The speech recognition device may further comprise: a transmitting unit for sending the audio signal to another speech recognition device connected to the electronic equipment via a network, the other speech recognition device being able to perform recognition processing on the audio signal to obtain second recognition content; and a receiving unit for receiving, when the recognition content is judged inaccurate in the judging operation, the second recognition content from the other speech recognition device within a preset time period and using the second recognition content as the final recognition content.

In the speech recognition device, if the receiving unit does not receive the second recognition content within the preset time period, the threshold acquiring unit may obtain a low confidence threshold that is less than the selected confidence threshold, and the judging unit judges whether the recognition content is accurate based on the low confidence threshold.

In a third aspect, electronic equipment is provided, comprising the speech recognition device described above.

According to the technical schemes of the method for speech recognition, the speech recognition device and the electronic equipment of the embodiments of the present invention, a plurality of confidence thresholds are preset and one of them is selected to judge the accuracy of the recognition content. The confidence threshold used for the judgment can thus vary with the situation, so that both the recognition rate and the robustness of speech recognition are taken into account, thereby improving the user experience.

Brief description of the drawings

In order to illustrate the technical schemes of the embodiments of the present invention more clearly, the drawings needed in the description of the embodiments or of the prior art are briefly introduced below. Obviously, the drawings described below show only some embodiments of the present invention, and a person of ordinary skill in the art can obtain other drawings from them without creative effort.

Fig. 1 is an architecture diagram illustrating devices that perform speech recognition according to an embodiment of the present invention;

Fig. 2 is a flowchart schematically illustrating a method for speech recognition according to an embodiment of the present invention;

Fig. 3 is a flowchart schematically illustrating the confidence threshold setting of the method for speech recognition according to an embodiment of the present invention;

Fig. 4 is a flowchart schematically illustrating the confidence threshold selection of the method for speech recognition according to an embodiment of the present invention;

Fig. 5 is a flowchart schematically illustrating a method for speech recognition according to another embodiment of the present invention;

Fig. 6 is a block diagram schematically illustrating a speech recognition device according to an embodiment of the present invention;

Fig. 7 is a block diagram schematically illustrating a speech recognition device according to another embodiment of the present invention.

Detailed description

The technical schemes in the embodiments of the present invention are described clearly and completely below with reference to the drawings in the embodiments. Obviously, the described embodiments are some, rather than all, of the embodiments of the present invention. Where no conflict arises, the embodiments in this application and the features in the embodiments may be combined with one another arbitrarily.

Fig. 1 is an architecture diagram illustrating the communication among the devices that perform speech recognition.

As shown in Fig. 1, the first speech recognition device 10 receives speech from a user and then recognizes the received speech. If the received speech is recognized successfully, corresponding recognition content is obtained; if the received speech cannot be recognized, no recognition content is obtained. The first speech recognition device 10 may be a stand-alone speech recognition device, or may be integrated in electronic equipment such as a mobile phone, a notebook computer or a tablet computer.

With current network interconnection technology, the first speech recognition device 10 may also be connected, for example via a network, to a second speech recognition device 20. The second speech recognition device 20 can usually draw on powerful network resources to achieve more accurate speech recognition, and may therefore share speech recognition results with the first speech recognition device 10. The second speech recognition device 20 may be a stand-alone speech recognition device, or may be integrated in other electronic equipment, for example a network server or a notebook computer. The first speech recognition device 10 can transmit the received speech to the second speech recognition device 20 and receive the recognized content from the second speech recognition device 20.

The speech recognition devices shown in Fig. 1 are merely schematic. The first speech recognition device 10 and the second speech recognition device 20 have equal status. For example, the second speech recognition device 20 can receive speech, transmit the received speech to the first speech recognition device 10, and receive the recognized content from the first speech recognition device 10.

The embodiments according to the present invention describe a scheme in which speech recognition is performed in a single speech recognition device (for example, in the first speech recognition device 10) and a scheme in which different speech recognition devices share speech recognition results, so as to take into account both the recognition rate and the robustness of speech recognition and thereby improve the user experience.

Fig. 2 is a flowchart schematically illustrating a method 200 for speech recognition according to an embodiment of the present invention. The method 200 can be applied to a speech recognition device as shown in Fig. 1 or to electronic equipment comprising such a speech recognition device.

As shown in Fig. 2, the method 200 for speech recognition may comprise: receiving a speech input and obtaining an audio signal corresponding to the speech input (S210); performing recognition processing on the audio signal with the first speech recognition device to obtain a recognition result, the recognition result comprising recognition content and a confidence, the confidence being used for determining the reliability of the recognition content (S220); presetting at least two confidence thresholds, the confidence thresholds differing from one another (S230); selecting one confidence threshold from the at least two confidence thresholds (S240); and judging whether the recognition content is accurate based on the confidence in the recognition result and the selected confidence threshold (S250).

In S210, the speech input can be received with a recording device such as a microphone or a sound recorder. The recording device converts the received speech into an electronic signal, that is, the audio signal corresponding to the speech input, so that it can be recognized. The received speech may be uttered in any language (such as Chinese, English or German) or in a mixture of languages, for example Chinese mixed with English words. Neither the way in which the speech is uttered nor the specific way in which it is received limits the present invention.

In S220, the first speech recognition device may adopt any existing or future speech recognition technology to perform recognition processing on the audio signal and obtain a recognition result. The recognition result comprises recognition content and a confidence, and the confidence is used for determining the reliability of the recognition content. Take pattern-matching speech recognition as an example. In the training stage, the user speaks each word in the vocabulary in turn, and the feature vector of each word is stored in a template library as a template. In the recognition stage, a feature vector is extracted from the raw speech (that is, the audio signal described above) and compared for similarity with each template in the template library in turn, and the template with the highest similarity (that is, confidence) is output as the recognition result.
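As a non-limiting illustration, the pattern-matching recognition described above may be sketched in Python as follows. The use of cosine similarity as the similarity measure, the three-dimensional feature vectors and the names `cosine_similarity`, `recognize` and `TEMPLATES` are assumptions made for this example; the description does not prescribe a particular feature representation or similarity measure.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def recognize(features, template_library):
    """Compare the input features with every template in turn and return
    (recognition content, confidence) for the most similar template."""
    best_word, best_score = None, -1.0
    for word, template in template_library.items():
        score = cosine_similarity(features, template)
        if score > best_score:
            best_word, best_score = word, score
    return best_word, best_score


# Hypothetical template library built in the training stage.
TEMPLATES = {
    "open": [1.0, 0.0, 0.0],
    "close": [0.0, 1.0, 0.0],
}

# Recognition stage: the highest similarity is reported as the confidence.
content, confidence = recognize([0.9, 0.1, 0.0], TEMPLATES)
```

In this sketch the similarity of the best match doubles as the confidence, which is the quantity later compared against the selected confidence threshold.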

In practice, it is difficult to recognize speech accurately, for reasons such as the following. Speech patterns differ not only between speakers but also for the same speaker; for example, a speaker's speech information differs between speaking casually and speaking carefully. Speech itself is ambiguous, and its stress, pitch, volume and articulation rate change under the influence of context. Environmental noise and interference seriously affect speech recognition. Therefore, for the same speech input, the confidence in the recognition result also varies greatly across environments or backgrounds.

When a single confidence threshold is set to judge whether recognition content is accurate, a threshold set too high may make the probability of obtaining no recognition content (recognition failure) too large, whereas a threshold set too low may cause more of the recognition content in recognition results to be inaccurate. For example, if the speech input is expressed in mixed languages, such as "open filefox", in which an English word is mixed into Chinese, the confidence in the recognition result is usually low, and applying the common confidence threshold may then cause recognition failure.

In S230, at least two confidence thresholds are preset, the confidence thresholds differing from one another. Rather than setting only one confidence threshold to judge whether recognition content is accurate, the embodiments of the present invention preset at least two confidence thresholds and subsequently choose different confidence thresholds for the judgment according to different situations. As an example, the at least two confidence thresholds may be preset according to at least one of the content that the first speech recognition device is able to recognize and the network condition of the first speech recognition device.

Fig. 3 is a flowchart schematically illustrating the confidence threshold setting S230 of the method for speech recognition according to an embodiment of the present invention. As shown in Fig. 3, where the content that the first speech recognition device is able to recognize comprises a plurality of command words, a first confidence threshold may be set for a first command word among the plurality of command words (S231); a second confidence threshold may be set for a second command word among the plurality of command words, the second command word being different from the first command word (S232); a third confidence threshold may be set for the case where the first speech recognition device has a network connection (S233); and a fourth confidence threshold may be set for the case where the first speech recognition device has no network connection (S234).

In S231 and S232, different confidence thresholds are set for different command words. For example, if the first speech recognition device recognizes Chinese speech with high accuracy, a higher confidence threshold may be set for Chinese command words; if it recognizes English speech with low accuracy, a lower confidence threshold may be set for English command words. In addition, in S230 a further confidence threshold may be set for a third command word; the number of confidence thresholds set on the basis of command words does not limit the embodiments of the present invention. The first command word may be one specific command word, or may be a class of command words comprising a plurality of command words, for example a plurality of Chinese command words. The second command word may likewise be one specific command word or a class of command words comprising a plurality of command words; for example, a dedicated confidence threshold may be set for the hard-to-recognize command word "FileFox".

In S233 and S234, different confidence thresholds are set according to whether the first speech recognition device has a network connection, and the third confidence threshold may be higher than the fourth confidence threshold. When the first speech recognition device has a network connection and its recognition fails under the third confidence threshold, the second speech recognition device connected via the network can be requested to perform speech recognition on the speech input, and the recognition content obtained by the second speech recognition device is used as the final recognition content, so that a higher recognition rate is achieved while higher recognition accuracy is guaranteed. If the first speech recognition device has no network connection, however, the confidence threshold is lowered appropriately so as to guarantee the recognition rate, which matters more to the user in that situation.

Suitable confidence threshold setting steps can be adopted as needed; for example, only S231 and S232 described above may be adopted, or only S233 and S234. Other confidence threshold setting steps can also be adopted in other scenarios. In addition, although S230 is shown after S220 in Fig. 2, S230 may be performed in advance, before S210, to set each confidence threshold.
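As a non-limiting illustration, the presetting of confidence thresholds in S230 may be sketched in Python as follows. The class name `ThresholdConfig`, its field names, the numeric values and the assumption that confidences lie in the range 0 to 1 are all hypothetical and serve only as an example; the description does not give concrete threshold values.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass(frozen=True)
class ThresholdConfig:
    """Preset confidence thresholds (S230). All values are hypothetical."""

    # S231/S232: thresholds keyed by command word (or class of command words).
    command_thresholds: Dict[str, float] = field(default_factory=dict)
    # S233: threshold used when a network connection is available.
    online_threshold: float = 0.8
    # S234: threshold used when no network connection is available.
    offline_threshold: float = 0.6

    def __post_init__(self):
        values = set(self.command_thresholds.values())
        values.update((self.online_threshold, self.offline_threshold))
        # Assumption of this sketch: confidences are normalized to [0, 1].
        if any(not 0.0 <= v <= 1.0 for v in values):
            raise ValueError("confidence thresholds must lie in [0, 1]")
        # The method requires at least two thresholds that differ.
        if len(values) < 2:
            raise ValueError("at least two different confidence thresholds are required")


# A dedicated, lower threshold for a hard-to-recognize command word.
config = ThresholdConfig(command_thresholds={"FileFox": 0.5})
```

Here the threshold for the networked case is higher than the threshold for the case without a network, consistent with the relationship between the third and fourth confidence thresholds described above.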

In S240, one confidence threshold may be selected from the at least two confidence thresholds according to the current scenario of the first speech recognition device; for example, the confidence threshold may be selected according to the recognition content corresponding to the speech input and the network connection state of the first speech recognition device. The basis for the selection can be adjusted as needed in practice.

Fig. 4 is a flowchart schematically illustrating the confidence threshold selection of the method for speech recognition according to an embodiment of the present invention. An exemplary description is given below with reference to Fig. 4.

As shown in Fig. 4, after the recognition result is obtained in S220, it is determined whether the recognition content in the recognition result corresponds to the second command word (S241). When the recognition content corresponds to the second command word (Yes in S241), the second confidence threshold is selected (S242). When the recognition content does not correspond to the second command word (No in S241), it is determined whether the first speech recognition device has a network connection (S243). When the first speech recognition device has a network connection (Yes in S243), the third confidence threshold is selected; when the first speech recognition device has no network connection (No in S243), the fourth confidence threshold is selected.

In the example of Fig. 4, the confidence threshold is selected on the basis of two different factors, namely the recognition content and the network connection. In practice, the confidence threshold may be selected according to the recognition content alone: when the recognition content does not correspond to the second command word, a default confidence threshold may be selected, or it may further be determined whether the recognition content corresponds to the first command word or to a third command word, in which case the confidence threshold set for that command word is selected. In short, the confidence threshold is selected by considering both the current speech recognition scenario and the basis on which each confidence threshold was set.
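As a non-limiting illustration, the selection flow of Fig. 4 may be sketched in Python as follows. The function name `select_threshold`, its parameters and the numeric values are hypothetical and are introduced only for this example.

```python
def select_threshold(content, has_network, second_command_words,
                     second_threshold, third_threshold, fourth_threshold):
    """Select one confidence threshold following the flow of Fig. 4."""
    if content in second_command_words:   # S241: Yes
        return second_threshold           # S242
    if has_network:                       # S243: Yes
        return third_threshold
    return fourth_threshold               # S243: No


# Hypothetical command-word class and threshold values, for illustration only.
SECOND_COMMAND_WORDS = {"FileFox"}
chosen = select_threshold("FileFox", True, SECOND_COMMAND_WORDS, 0.5, 0.8, 0.6)
```

The command-word check comes first, so a dedicated threshold for a specific command word takes precedence over the thresholds that depend on the network state.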

In S250, whether the recognition content is accurate is judged based on the confidence in the recognition result and the selected confidence threshold. As an example, the confidence in the recognition result may be compared with the selected second confidence threshold or the selected third confidence threshold to obtain a comparison result, and whether the recognition content is accurate is judged according to the comparison result. For example, when the confidence in the recognition result is greater than or equal to the selected confidence threshold, the recognition content in the recognition result is judged accurate and is used as the final recognition content; when the confidence in the recognition result is less than the selected confidence threshold, the recognition content in the recognition result is judged inaccurate, and recognition fails.
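As a non-limiting illustration, the judgment of S250 may be sketched in Python as follows. The names `RecognitionResult` and `judge` and the numeric values are hypothetical; the example only shows that the same recognition result can be accepted under one selected threshold and rejected under another.

```python
from typing import NamedTuple, Optional


class RecognitionResult(NamedTuple):
    content: str        # recognition content
    confidence: float   # reliability of the recognition content


def judge(result: RecognitionResult, threshold: float) -> Optional[str]:
    """Return the content as the final recognition content when the confidence
    is greater than or equal to the threshold; return None on failure (S250)."""
    if result.confidence >= threshold:
        return result.content
    return None


# Hypothetical low-confidence result for a mixed-language input.
result = RecognitionResult(content="open FileFox", confidence=0.55)
accepted = judge(result, threshold=0.5)   # dedicated lower threshold: accepted
rejected = judge(result, threshold=0.8)   # common higher threshold: rejected
```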

According to the technical scheme of the above method for speech recognition of the embodiment of the present invention, a plurality of confidence thresholds are set in advance and one of them is selected to judge the accuracy of the recognition content. This allows the confidence threshold to vary with the situation, balancing the recognition rate and robustness of speech recognition and thereby improving the user experience.

In the above method for speech recognition, speech recognition is performed by the first speech recognition device. As described in conjunction with Fig. 1, the first speech recognition device may also share speech recognition results with a second speech recognition device connected over a network, as described below in conjunction with Fig. 5.

Fig. 5 is a flowchart schematically illustrating a method 500 for speech recognition according to another embodiment of the present invention. The method 500 also comprises steps S210-S250 of the method 200 for speech recognition described above. It differs from the method 200 in that, after recognition fails in S250, the following steps S251-S254 are also performed.

When the recognition content in the recognition result is judged to be inaccurate in S250, the audio signal is sent to a second speech recognition device connected to the electronic device over a network (for example, the second speech recognition device 20 in Fig. 1), which can perform recognition processing on the audio signal and obtain second recognition content (S251). The method then waits to receive the second recognition content from the second speech recognition device (S252). If the second recognition content is received (Yes in S252), it is taken as the final recognition content and recognition ends. If it is not received (No in S252), a low confidence threshold that is less than the selected confidence threshold is obtained (S253), and whether the recognition content is accurate is judged based on this low confidence threshold (S254), which ends recognition.
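Steps S250-S254 can be sketched as one function. This is an illustration under stated assumptions: `remote_recognize` is a hypothetical callable standing in for the second speech recognition device and is assumed to return `None` when no second recognition content is received; `RecognitionResult` and `recognize_with_fallback` are likewise names introduced here.

```python
from typing import Callable, NamedTuple, Optional


class RecognitionResult(NamedTuple):
    content: str
    confidence: float


def recognize_with_fallback(
    result: RecognitionResult,
    audio: bytes,
    selected_threshold: float,
    low_threshold: float,
    remote_recognize: Callable[[bytes], Optional[str]],
) -> Optional[str]:
    """Return the final recognition content, or None if recognition fails."""
    if low_threshold >= selected_threshold:
        raise ValueError("low threshold must be less than the selected one")
    # S250: judge the local result against the selected threshold.
    if result.confidence >= selected_threshold:
        return result.content
    # S251/S252: send the audio and wait for the second recognition content.
    second_content = remote_recognize(audio)
    if second_content is not None:
        return second_content
    # S253/S254: nothing received, so re-judge the local result
    # against the low confidence threshold.
    if result.confidence >= low_threshold:
        return result.content
    return None
```

The local result is re-examined only after the remote attempt has produced nothing, so the low threshold never applies while a network result is available.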

In the example of Fig. 5, when the recognition content in the recognition result is judged to be inaccurate in S250, the audio signal is sent to the second speech recognition device connected to the electronic device over a network (S251). However, the method is not limited to this: the audio signal may instead be sent to the second speech recognition device (S251) immediately after it is obtained in S210, so that when the recognition content is judged to be inaccurate in S250, the second recognition content can be received from the second speech recognition device as early as possible.

While waiting in S252 to receive the second recognition content from the second speech recognition device, network congestion or interruption may prevent the second recognition content from being received, and waiting too long in that case greatly degrades the user experience. Therefore, a waiting time (for example, a preset time period) may be set in S252, so that if the second recognition content has not been received within the preset time period, the method stops waiting.
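One way to bound the wait in S252 is a future with a timeout. The sketch below uses Python's standard `concurrent.futures` module; `receive_with_timeout` and `fake_remote_recognize` are hypothetical names, and the latter merely simulates the second speech recognition device with a delay.

```python
import concurrent.futures
import time
from typing import Optional


def fake_remote_recognize(audio: bytes, delay: float) -> str:
    """Stand-in for the second speech recognition device (hypothetical)."""
    time.sleep(delay)  # simulate network and server latency
    return "open door"


def receive_with_timeout(future: concurrent.futures.Future,
                         wait_seconds: float) -> Optional[str]:
    """S252: wait at most wait_seconds for the second recognition content."""
    try:
        return future.result(timeout=wait_seconds)
    except concurrent.futures.TimeoutError:
        return None  # preset time period elapsed: stop waiting
```

Because the request is represented by a future, it can be submitted to an executor either after S250 fails or right after the audio signal is obtained in S210; in both cases the wait is bounded in the same way.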

When the second recognition content is not received from the second speech recognition device (No in S252), the recognition result of the first speech recognition device may be examined again in order to provide recognition content to the user and improve the recognition rate. If the user demands very high recognition accuracy, S253 and S254 need not be performed, and recognition ends directly. In S253, the low confidence threshold may be obtained by subtracting a predetermined value from the confidence threshold selected in S240, or by reselecting among the confidence thresholds set in S230.
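The two ways of obtaining the low confidence threshold in S253 can be sketched as follows. The function names are hypothetical, and picking the largest preset threshold below the selected one is only one possible reselection rule, assumed here for illustration; the description does not specify which lower threshold is reselected.

```python
from typing import Iterable, Optional


def lower_by_offset(selected: float, offset: float) -> float:
    """S253, first option: subtract a predetermined value."""
    if offset <= 0:
        raise ValueError("offset must be positive so the threshold decreases")
    return selected - offset


def reselect_lower(selected: float,
                   preset: Iterable[float]) -> Optional[float]:
    """S253, second option: reselect among the preset thresholds.

    Assumption: choose the largest preset threshold that is still
    less than the selected one; return None if there is none.
    """
    lower = [t for t in preset if t < selected]
    return max(lower) if lower else None
```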

The judging operation in S254 is similar to that in S250: whether the recognition content is accurate is judged based on the low confidence threshold, which ends recognition. For example, the confidence in the recognition result may be compared with the low confidence threshold. When the confidence in the recognition result is greater than or equal to the low confidence threshold, the recognition content in the recognition result is judged to be accurate and is taken as the final recognition content; when the confidence is less than the low confidence threshold, the recognition content is judged to be inaccurate, that is, recognition fails.

Therefore, when the network recognition result produced by the second speech recognition device cannot be obtained in time because of a network timeout, a busy server, or similar reasons, the confidence threshold is lowered and the local result of the first speech recognition device is reused in place of feedback such as "server busy" or "network timeout". The user can thus still obtain a recognition result under poor network and server conditions, which improves the user experience. If the low confidence threshold were selected directly in S240, a large number of less reliable results recognized locally by the first speech recognition device would be adopted even when network conditions are good. Setting the confidence threshold twice, in S240 and in S253, avoids this problem: the confidence threshold is lowered only when the network result cannot be obtained in time.

Therefore, the technical scheme of the method 500 for speech recognition described in conjunction with Fig. 5 can apply confidence thresholds still more flexibly to judge the recognition content, and makes full use of the advantages of each speech recognition device to balance the recognition rate and robustness of speech recognition, thereby improving the user experience.

Fig. 6 is a block diagram schematically illustrating a speech recognition device 600 according to an embodiment of the present invention. The speech recognition device 600 may be applied to the speech recognition device shown in Fig. 1 or to an electronic device comprising that speech recognition device.

The speech recognition device 600 may comprise: an audio input unit 610, for receiving a speech input and obtaining the audio signal corresponding to the speech input; a recognition unit 620, for performing recognition processing on the audio signal and obtaining a recognition result, the recognition result comprising recognition content and a confidence, the confidence being used to determine the degree of reliability of the recognition content; a threshold setting unit 630, for setting at least two confidence thresholds in advance, the confidence thresholds differing from one another; a threshold acquisition unit 640, for selecting one confidence threshold from the at least two confidence thresholds; and a judging unit 650, for judging whether the recognition content is accurate based on the confidence in the recognition result and the selected confidence threshold.

The audio input unit 610 is, for example, a sound pickup device such as a microphone or a recorder. It receives the speech input and converts the received speech into an electrical signal, that is, the audio signal corresponding to the speech input, for recognition. The received speech may be uttered in any language or in a mixture of languages. Neither the manner in which the speech is uttered nor the specific manner in which it is received limits the present invention.

The recognition unit 620 may use any existing or future speech recognition technology to perform recognition processing on the audio signal and obtain a recognition result. Taking pattern-matching speech recognition as an example: in the training stage, the user speaks each word in the vocabulary in turn, and its feature vector is stored in a template library as a template; then, in the recognition stage, a feature vector is extracted from the audio signal of the speech input and compared for similarity with each template in the template library in turn, and the template with the highest similarity (that is, confidence) is output as the recognition result.
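The recognition stage of the pattern-matching example can be sketched as a nearest-template search. The description does not name a similarity measure; cosine similarity is used below purely as an assumption, feature extraction is omitted, and `cosine_similarity` and `match_template` are hypothetical names.

```python
import math
from typing import Dict, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two feature vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def match_template(feature: Sequence[float],
                   templates: Dict[str, Sequence[float]]) -> Tuple[str, float]:
    """Compare the feature vector with every template in turn and
    return (word, similarity) for the most similar one."""
    if not templates:
        raise ValueError("template library is empty")
    best_word, best_score = "", float("-inf")
    for word, template in templates.items():
        score = cosine_similarity(feature, template)
        if score > best_score:
            best_word, best_score = word, score
    return best_word, best_score
```

The returned similarity plays the role of the confidence that is later compared with the selected confidence threshold.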

When a single fixed confidence threshold is used to judge whether the recognition content is accurate, setting the threshold high may make the probability of obtaining no recognition content (recognition failure) too large, while setting it low may cause more of the recognition content in recognition results to be inaccurate.

The threshold setting unit 630 sets at least two confidence thresholds in advance, so that different confidence thresholds can later be chosen for judging in different situations. As an example, the threshold setting unit 630 may set the at least two confidence thresholds in advance according to at least one of the recognition content that the recognition unit can recognize and its network condition. The threshold setting unit 630 may set suitable confidence thresholds as required, and may use other threshold-setting steps in other scenarios.
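The two constraints on the preset thresholds (at least two, all different from one another) can be checked when the thresholds are registered. The following sketch assumes a hypothetical `preset_thresholds` helper that takes a mapping from a descriptive label to a threshold value; the labels and values are illustrative only.

```python
from typing import Dict, Mapping


def preset_thresholds(named: Mapping[str, float]) -> Dict[str, float]:
    """Validate and store the confidence thresholds set in advance."""
    if len(named) < 2:
        raise ValueError("at least two confidence thresholds are required")
    if len(set(named.values())) != len(named):
        raise ValueError("confidence thresholds must differ from one another")
    return dict(named)
```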

When the content that the speech recognition device can recognize comprises a plurality of command words, the threshold setting unit 630 may set different confidence thresholds for different command words. For example, a first confidence threshold is set for a first command word among the plurality of command words, and a second confidence threshold is set for a second command word among the plurality of command words, the second command word being different from the first command word. The threshold setting unit 630 may also set another confidence threshold for a third command word. For example, if the speech recognition device recognizes Chinese speech with high accuracy, a higher confidence threshold may be set for Chinese command words; if it recognizes English speech with low accuracy, a lower confidence threshold may be set for English command words. Each of the first command word and the second command word may be a specific command word or a class of command words comprising a plurality of command words.

The threshold setting unit 630 may also set different confidence thresholds according to whether the speech recognition device has a network connection. For example, the threshold setting unit 630 may set a third confidence threshold for the case in which the speech recognition device has a network connection and a fourth confidence threshold for the case in which it does not, where the third confidence threshold may be higher than the fourth confidence threshold. When the speech recognition device has a network connection and recognition fails under the third confidence threshold, another speech recognition device connected over the network can be asked to recognize the speech input, and the recognition content obtained by that device is taken as the final recognition content, so that a high recognition rate is achieved while high recognition accuracy is maintained. If the speech recognition device has no network connection, however, the confidence threshold is lowered appropriately to ensure the recognition rate, which matters more to the user.

The threshold acquisition unit 640 may select one confidence threshold from the at least two confidence thresholds according to the current scenario of the speech recognition device; for example, it may select the confidence threshold according to the recognition content corresponding to the speech input and the network connection state of the speech recognition device. In practice, the basis for selection may be adjusted as required.

For example, the threshold acquisition unit 640 may comprise: a determining component, for determining whether the recognition content in the recognition result corresponds to the second command word and, when the recognition content does not correspond to the second command word, determining whether the speech recognition device has a network connection; and a selecting component, for selecting the second confidence threshold when the determining component determines that the recognition content corresponds to the second command word, selecting the third confidence threshold when the determining component determines that the speech recognition device has a network connection, and selecting the fourth confidence threshold when the determining component determines that the speech recognition device has no network connection.

In addition, the threshold acquisition unit 640 may select the confidence threshold according to the recognition content alone: when the determining component determines that the recognition content does not correspond to the second command word, the selecting component may select a default confidence threshold, or the determining component may further determine whether the recognition content corresponds to the first command word, the third command word, and so on, in order to select another confidence threshold. In short, when selecting a confidence threshold, the threshold acquisition unit 640 should take into account both the current speech recognition scenario and the basis on which each confidence threshold was set.

The judging unit 650 judges whether the recognition content is accurate based on the confidence in the recognition result and the selected confidence threshold. As an example, the judging unit 650 may compare the confidence in the recognition result with the selected confidence threshold to obtain a comparison result, and judge whether the recognition content is accurate according to the comparison result. When the confidence in the recognition result is greater than or equal to the selected confidence threshold, the judging unit 650 judges the recognition content in the recognition result to be accurate, and the recognition content is taken as the final recognition content; when the confidence is less than the selected confidence threshold, the judging unit 650 judges the recognition content to be inaccurate, and recognition fails.

Optionally, the speech recognition device may also comprise a sending unit 660 and a receiving unit 670, as shown by the dashed boxes in Fig. 6. For example, when the judging unit 650 judges the recognition content to be inaccurate, the sending unit 660 may send the audio signal to another speech recognition device connected to the speech recognition device over a network, which can perform recognition processing on the audio signal and obtain second recognition content; the receiving unit 670 may receive the second recognition content from the other speech recognition device and take it as the final recognition content. This corresponds to the example of Fig. 5, in which the audio signal is sent to another speech recognition device connected to the electronic device over a network when the recognition content in the recognition result is judged to be inaccurate in S250.

In addition, the sending unit 660 may send the audio signal to the other speech recognition device connected to the electronic device over a network immediately after the audio input unit 610 obtains the audio signal, so that the receiving unit 670 can receive the second recognition content from the other speech recognition device as early as possible when the judging unit 650 judges the recognition content to be inaccurate.

If the network is congested or interrupted, the receiving unit 670 may be unable to receive the second recognition content, and waiting too long in that case greatly degrades the user experience. Therefore, a waiting time (for example, a preset time period) may be set, so that if the receiving unit 670 has not received the second recognition content within the preset time period, the speech recognition device stops receiving. In that case, the threshold acquisition unit 640 may obtain a low confidence threshold that is less than the selected confidence threshold, and the judging unit 650 judges whether the recognition content is accurate based on this low confidence threshold.

When the receiving unit 670 does not receive the second recognition content from the other speech recognition device, the recognition result in the speech recognition device may be examined again in order to provide recognition content to the user and improve the recognition rate. To this end, the threshold acquisition unit 640 obtains the low confidence threshold, either by subtracting a predetermined value from the currently selected confidence threshold or by reselecting among the confidence thresholds that have been set. The judging unit 650 then judges whether the recognition content is accurate based on this low confidence threshold.

Therefore, when the network recognition result produced by the other speech recognition device cannot be obtained in time because of a network timeout, a busy server, or similar reasons, the confidence threshold is lowered and the local result of the speech recognition device is reused in place of feedback such as "server busy" or "network timeout". The user can thus still obtain a recognition result under poor network and server conditions, which improves the user experience. If the threshold acquisition unit 640 selected the low confidence threshold directly, a large number of less reliable results recognized locally by the speech recognition device would be adopted even when network conditions are good. Setting the confidence threshold twice avoids this problem: the confidence threshold is lowered only when the network result cannot be obtained in time.

According to the technical scheme of the above speech recognition device of the embodiment of the present invention, the confidence threshold used to judge the recognition content can vary with the situation, and the advantages of each speech recognition device are fully used to balance the recognition rate and robustness of speech recognition, thereby improving the user experience.

Fig. 7 is a block diagram schematically illustrating a speech recognition device 700 according to another embodiment of the present invention. The speech recognition device 700 may be communicatively coupled with other speech recognition devices, and comprises: a memory 710, for storing program code; and a processor 720, for executing the program code to implement the methods described in conjunction with Figs. 2-5.

The memory 710 may comprise at least one of read-only memory and random access memory, and provides instructions and data to the processor 720. A part of the memory 710 may also comprise non-volatile random access memory (NVRAM).

The processor 720 may be a general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, or a discrete hardware component. The general-purpose processor may be a microprocessor or any conventional processor.

The steps of the method disclosed in conjunction with the embodiments of the present invention may be embodied directly as being performed by the processor, or as being performed by a combination of hardware and software modules in the processor. The software modules may be located in a storage medium well established in the art, such as random access memory, flash memory, read-only memory, programmable read-only memory, electrically erasable programmable memory, or a register. The storage medium is located in the memory 710; the processor 720 reads the information in the memory 710 and completes the steps of the above method in combination with its hardware.

Given that speech recognition devices according to the embodiments of the present invention have been disclosed above in conjunction with Figs. 6-7, any electronic device comprising such a speech recognition device also falls within the scope disclosed by the embodiments of the present invention.

Those of ordinary skill in the art will recognize that the units and algorithm steps of the examples described in conjunction with the embodiments disclosed herein can be implemented in electronic hardware or in a combination of computer software and electronic hardware. Whether these functions are performed in hardware or in software depends on the particular application and the design constraints of the technical scheme. Skilled practitioners may implement the described functions in different ways for each particular application, but such implementations should not be considered to go beyond the scope of the present invention.

Those skilled in the art will clearly understand that, for convenience and brevity of description, the specific working processes of the devices and units described above may be understood by reference to the corresponding processes in the foregoing method embodiments, and are not repeated here.

In the several embodiments provided in the present application, it should be understood that the disclosed devices and methods may be implemented in other ways. For example, the device embodiments described above are merely illustrative: the division into units is only a division by logical function, and other divisions are possible in an actual implementation. For example, a plurality of units or components may be combined or integrated into another device, or some features may be omitted or not performed.

The units described as separate components may or may not be physically separate, and the components shown as units may or may not be physical units. Some or all of the units may be selected according to actual needs to achieve the purpose of the scheme of the present embodiment.

If the functions are implemented in the form of software functional units and sold or used as an independent product, they may be stored in a computer-readable storage medium. Based on this understanding, the technical scheme of the present invention in essence, or the part that contributes to the prior art, or a part of the technical scheme, may be embodied in the form of a software product. The computer software product is stored in a storage medium and comprises several instructions for causing a computer device (which may be a personal computer, a server, a network device, or the like) to perform all or part of the steps of the methods described in the embodiments of the present invention. The aforementioned storage medium includes various media capable of storing program code, such as a USB flash drive, a removable hard disk, read-only memory, random access memory, a magnetic disk, or an optical disc.

The above are only specific embodiments of the present invention, but the protection scope of the present invention is not limited thereto. Any change or replacement that a person skilled in the art can readily conceive within the technical scope disclosed by the present invention shall fall within the protection scope of the present invention. Therefore, the protection scope of the present invention shall be determined by the protection scope of the claims.

Claims

1. A method for speech recognition, applied to an electronic device comprising a first speech recognition device, the method comprising:

receiving a speech input, and obtaining the audio signal corresponding to the speech input;

performing recognition processing on the audio signal by means of the first speech recognition device to obtain a recognition result, the recognition result comprising recognition content and a confidence, the confidence being used to determine the degree of reliability of the recognition content;

setting at least two confidence thresholds in advance, the confidence thresholds differing from one another;

selecting one confidence threshold from the at least two confidence thresholds;

judging whether the recognition content is accurate based on the confidence in the recognition result and the selected confidence threshold.

2. The method according to claim 1, wherein said setting at least two confidence thresholds in advance comprises: setting at least two confidence thresholds in advance according to at least one of the recognition content that the first speech recognition device can recognize and its network condition.

3. The method according to claim 2, wherein the content that the first speech recognition device can recognize comprises a plurality of command words, and said setting at least two confidence thresholds in advance according to at least one of the recognition content that the first speech recognition device can recognize and its network condition comprises:

setting a first confidence threshold for a first command word among the plurality of command words;

setting a second confidence threshold for a second command word among the plurality of command words, the second command word being different from the first command word.

4. The method according to claim 2 or 3, wherein said setting at least two confidence thresholds in advance according to at least one of the recognition content that the first speech recognition device can recognize and its network condition comprises:

setting a third confidence threshold for the case in which the first speech recognition device has a network connection;

setting a fourth confidence threshold for the case in which the first speech recognition device does not have a network connection.

5. The method according to claim 4, wherein said selecting one confidence threshold from the at least two confidence thresholds comprises:

determining whether the recognition content in the recognition result corresponds to the second command word;

when the recognition content corresponds to the second command word, selecting the second confidence threshold;

when the recognition content does not correspond to the second command word, determining whether the first speech recognition device has a network connection;

when the first speech recognition device has a network connection, selecting the third confidence threshold;

when the first speech recognition device does not have a network connection, selecting the fourth confidence threshold.

6. The method according to claim 5, wherein said judging whether the recognition content is accurate based on the confidence in the recognition result and the selected confidence threshold comprises:

comparing the confidence in the recognition result with the selected second confidence threshold or the selected third confidence threshold to obtain a comparison result;

judging whether the recognition content is accurate according to the comparison result.

7. The method according to claim 1, further comprising:

when the recognition content is judged to be inaccurate, sending the audio signal to a second speech recognition device connected to the electronic device over a network, the second speech recognition device being able to perform recognition processing on the audio signal and obtain second recognition content;

receiving the second recognition content from the second speech recognition device, and taking the second recognition content as the final recognition content.

8. The method according to claim 1, further comprising:

sending the audio signal to a second speech recognition device connected to the electronic device over a network, the second speech recognition device being able to perform recognition processing on the audio signal and obtain second recognition content;

when the recognition content is judged to be inaccurate in the judging operation, receiving the second recognition content from the second speech recognition device within a preset time period.

9. The method according to claim 8, further comprising:

when the second recognition content is not received within the preset time period, obtaining a low confidence threshold that is less than the selected confidence threshold; and

judging whether the recognition content is accurate based on the low confidence threshold.

10. A speech recognition device, applied to an electronic device, the speech recognition device comprising:

an audio input unit, for receiving a speech input and obtaining the audio signal corresponding to the speech input;

a recognition unit, for performing recognition processing on the audio signal and obtaining a recognition result, the recognition result comprising recognition content and a confidence, the confidence being used to determine the degree of reliability of the recognition content;

a threshold setting unit, for setting at least two confidence thresholds in advance, the confidence thresholds differing from one another;

a threshold acquisition unit, for selecting one confidence threshold from the at least two confidence thresholds;

a judging unit, for judging whether the recognition content is accurate based on the confidence in the recognition result and the selected confidence threshold.

11. The speech recognition device according to claim 10, wherein the threshold setting unit presets the at least two confidence thresholds according to at least one of the recognition content that the recognition unit can recognize and the network condition of the speech recognition device.

12. The speech recognition device according to claim 11, wherein the content that the speech recognition device can recognize comprises a plurality of command words, and the threshold setting unit presets the at least two confidence thresholds as follows:

13. The speech recognition device according to claim 11 or 12, wherein the threshold setting unit presets the at least two confidence thresholds as follows:

Setting a third confidence threshold for the case where the speech recognition device has a network connection;

Setting a fourth confidence threshold for the case where the speech recognition device does not have a network connection.

14. The speech recognition device according to claim 13, wherein the threshold acquiring unit comprises:

A determining component for determining whether the recognition content in the recognition result corresponds to the second command word and, when the recognition content does not correspond to the second command word, determining whether the first speech recognition device has a network connection;

A selecting component for selecting the second confidence threshold when the determining component determines that the recognition content corresponds to the second command word, selecting the third confidence threshold when the determining component determines that the speech recognition device has a network connection, and selecting the fourth confidence threshold when the determining component determines that the speech recognition device does not have a network connection.
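The selection order described in claim 14 can be sketched as follows. This is only an illustration: the `Thresholds` container, the function `select_threshold`, the example command word, and the numeric values used with it are hypothetical, and matching the recognition content by set membership is an assumption about what "corresponds to the second command word" means.

```python
from dataclasses import dataclass
from typing import FrozenSet


@dataclass(frozen=True)
class Thresholds:
    second: float  # used when the content matches a second command word
    third: float   # used when a network connection is available
    fourth: float  # used when no network connection is available


def select_threshold(
    content: str,
    second_command_words: FrozenSet[str],
    has_network: bool,
    thresholds: Thresholds,
) -> float:
    # Determining component, step 1: does the content correspond to a
    # second command word?
    if content in second_command_words:
        return thresholds.second
    # Step 2, reached only when step 1 fails: check the network connection.
    return thresholds.third if has_network else thresholds.fourth
```

The command-word check comes first, so the network state only matters for content that is not a second command word.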

15. The speech recognition device according to claim 14, wherein the judging unit judges whether the recognition content is accurate as follows: comparing the confidence in the recognition result with the selected second confidence threshold or the selected third confidence threshold to obtain a comparison result; and judging, according to the comparison result, whether the recognition content is accurate.

16. The speech recognition device according to claim 10, further comprising:

A transmitting unit for, when the judging unit judges that the recognition content is inaccurate, sending the audio signal to another speech recognition device connected to the speech recognition device via a network, the other speech recognition device being capable of performing recognition processing on the audio signal to obtain second recognition content;

A receiving unit for receiving the second recognition content from the other speech recognition device and using the second recognition content as the final recognition content.
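The hand-off in claim 16, where the audio signal is sent out only after the local result has been judged inaccurate, might look like the sketch below. The function `recognize_with_fallback` and the `remote_recognize` callable are hypothetical stand-ins for the transmitting and receiving units, and the `>=` comparison is an assumed judging rule.

```python
from typing import Callable


def recognize_with_fallback(
    audio: bytes,
    local_content: str,
    local_confidence: float,
    selected_threshold: float,
    remote_recognize: Callable[[bytes], str],
) -> str:
    """Return the final recognition content."""
    # Local content judged accurate: the audio signal is never sent out.
    if local_confidence >= selected_threshold:
        return local_content
    # Judged inaccurate: send the audio signal to the other device and take
    # its second recognition content as the final recognition content.
    return remote_recognize(audio)
```

Because the remote call happens only on rejection, a confident local result costs no network traffic in this sketch.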

17. The speech recognition device according to claim 10, further comprising:

A transmitting unit for sending the audio signal to another speech recognition device connected to the electronic device via a network, the other speech recognition device being capable of performing recognition processing on the audio signal to obtain second recognition content;

A receiving unit for, when it is judged in the judging operation that the recognition content is inaccurate, receiving the second recognition content from the other speech recognition device within a preset time period and using the second recognition content as the final recognition content.

18. The speech recognition device according to claim 17, wherein, when the receiving unit does not receive the second recognition content within the preset time period,

The threshold acquiring unit obtains a low confidence threshold that is less than the selected confidence threshold, and

The judging unit judges, based on the low confidence threshold, whether the recognition content is accurate.

19. An electronic device, comprising the speech recognition device according to any one of claims 10-18.