CN103440867A - Method and system for recognizing voice - Google Patents


Publication number
CN103440867A
Authority
CN
China
Prior art keywords
recognition result
clouds
local
identification engine
confidence
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN2013103350500A
Other languages
Chinese (zh)
Other versions
CN103440867B (en)
Inventor
朱国正
任严佳
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
iFlytek Co Ltd
Original Assignee
iFlytek Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by iFlytek Co Ltd
Priority to CN201310335050.0A
Publication of CN103440867A
Application granted
Publication of CN103440867B
Legal status: Active
Anticipated expiration

Landscapes

  • Telephonic Communication Services (AREA)

Abstract

The invention discloses a speech recognition method and system. The method comprises: obtaining voice information sent by a user; sending the voice information to both a cloud recognition engine and a local recognition engine so that each engine recognizes the voice information; outputting the cloud recognition result if the cloud recognition result returned by the cloud recognition engine is received first; and outputting the local recognition result if the local recognition result of the local recognition engine is received first and the confidence corresponding to the local recognition result exceeds the upper limit of a set confidence interval. With this method and system, a reliable speech recognition result can be provided to the user even when the network is poor or unavailable.

Description

Speech recognition method and system
Technical field
The present invention relates to the field of speech recognition technology, and specifically to a speech recognition method and system.
Background
With the continuing development of computer science and technology, speech recognition technology has gradually matured and is widely used in fields such as mobile phones, televisions, and in-vehicle systems. Taking in-vehicle use as an example, a driver cannot conveniently operate a manual interface while driving, so speech recognition, as a comparatively easy mode of interaction, allows the in-vehicle system to offer more functions. In the prior art, speech recognition generally works as follows: the client receives the user's voice information, establishes a connection with a cloud speech recognition server, and sends the voice information to the server; the server recognizes the information and returns the recognition result to the client. However, a mobile device does not necessarily have a stable network connection. In that case the cloud response may suffer a long delay, which degrades the user experience, and with no network at all cloud recognition is unavailable.
Summary of the invention
The invention provides a speech recognition method and system that can provide a reliable speech recognition result to the user even when the network is poor or unavailable.
To this end, the invention provides the following technical solution:
A speech recognition method, comprising:
obtaining voice information sent by a user;
sending the voice information to a cloud recognition engine and a local recognition engine respectively, so that the cloud recognition engine and the local recognition engine each recognize the voice information;
if the cloud recognition result returned by the cloud recognition engine is received first, outputting the cloud recognition result;
if the local recognition result of the local recognition engine is received first and the confidence corresponding to the local recognition result is greater than the upper limit of a set confidence interval, outputting the local recognition result.
Preferably, the method further comprises:
if the confidence lies within the confidence interval, successively lowering the upper limit of the confidence interval, once per set waiting period;
if the cloud recognition result returned by the cloud recognition engine is received within the waiting period, outputting the cloud recognition result;
if the cloud recognition result returned by the cloud recognition engine is not received within the waiting period and the confidence corresponding to the local recognition result is greater than the lowered upper limit of the confidence interval, outputting the local recognition result.
Preferably, the waiting periods are of equal or different lengths.
Preferably, the method further comprises:
if, after the number of times the upper limit of the confidence interval has been lowered exceeds a set count threshold, the confidence corresponding to the local recognition result is still less than the lower limit of the lowered confidence interval and the cloud recognition result has still not been received, returning recognition failure information to the user.
Preferably, the method further comprises:
if the local recognition result is received first and the confidence corresponding to the local recognition result is less than the lower limit of the set confidence interval, discarding the local recognition result and continuing to wait for the cloud recognition engine to return the cloud recognition result;
if the waiting time exceeds a set blocking duration, returning recognition failure information to the user.
Preferably, the method further comprises:
after receiving a speech recognition request sent by the user, starting the cloud recognition engine and the local recognition engine.
A speech recognition system, comprising:
a voice information acquiring unit, configured to obtain voice information sent by a user;
a sending unit, configured to send the voice information to a cloud recognition engine and a local recognition engine respectively, so that the cloud recognition engine and the local recognition engine each recognize the voice information;
a receiving unit, configured to receive the cloud recognition result returned by the cloud recognition engine and the local recognition result of the local recognition engine;
an output unit, configured to output the cloud recognition result when the receiving unit first receives the cloud recognition result returned by the cloud recognition engine, and to output the local recognition result when the receiving unit first receives the local recognition result of the local recognition engine and the confidence corresponding to the local recognition result is greater than the upper limit of a set confidence interval.
Preferably, the system further comprises:
a confidence adjustment unit, configured to successively lower the upper limit of the confidence interval, once per set waiting period, when the confidence lies within the confidence interval;
the output unit is further configured to output the cloud recognition result when the receiving unit receives, within the waiting period, the cloud recognition result returned by the cloud recognition engine; and to output the local recognition result when the receiving unit does not receive the cloud recognition result within the waiting period and the confidence corresponding to the local recognition result is greater than the lowered upper limit of the confidence interval.
Preferably, the system further comprises:
a statistic unit, configured to count the number of times the confidence adjustment unit lowers the upper limit of the confidence interval;
the output unit is further configured to return recognition failure information to the user if, after the number of times exceeds a set count threshold, the confidence corresponding to the local recognition result is still less than the lower limit of the lowered confidence interval and the cloud recognition result has still not been received.
Preferably, the receiving unit is further configured to discard the local recognition result and continue waiting for the cloud recognition engine to return the cloud recognition result when the local recognition result is received first and the confidence corresponding to the local recognition result is less than the lower limit of the set confidence interval; and to return recognition failure information to the user after the waiting time exceeds a set blocking duration.
Preferably, the system further comprises:
a trigger unit, configured to start the cloud recognition engine and the local recognition engine after a speech recognition request sent by the user is received.
The speech recognition method and system provided by the embodiments of the present invention combine local recognition with cloud recognition. After the voice information sent by the user is received, it is sent to both the cloud recognition engine and the local recognition engine for recognition. If the cloud recognition result returned by the cloud recognition engine is received first, the cloud recognition result is output directly. If the local recognition result of the local recognition engine is received first and the confidence corresponding to the local recognition result is greater than the upper limit of the set confidence interval, the local recognition result is output. The scheme consistently treats the cloud recognition result as better than the local one: if cloud recognition returns a result before local recognition has produced a reasonably accurate one, the cloud recognition result is adopted. As a result, when there is no network access, local functions that do not require a network, such as making a phone call, sending a text message, or listening to music, can be completed entirely with the local recognition engine.
Further, if the confidence of the local recognition result received first is lower and falls within the set confidence interval, the confidence threshold for local recognition is lowered repeatedly until either a qualifying result is output or recognition fails.
Because the scheme provided by the embodiments of the present invention combines local recognition with cloud recognition, it can provide a reliable speech recognition result as far as possible even when the network is poor or unavailable.
Brief description of the drawings
To describe the technical solutions in the embodiments of the present application or in the prior art more clearly, the drawings used in the embodiments are briefly introduced below. Obviously, the drawings described below show only some of the embodiments recorded in the present invention, and a person of ordinary skill in the art can derive other drawings from them.
Fig. 1 is a flowchart of the speech recognition method according to an embodiment of the present invention;
Fig. 2 is another flowchart of the speech recognition method according to an embodiment of the present invention;
Fig. 3 is a structural diagram of the speech recognition system according to an embodiment of the present invention;
Fig. 4 is another structural diagram of the speech recognition system according to an embodiment of the present invention.
Detailed description
To help those skilled in the art better understand the solutions of the embodiments of the present invention, the embodiments are described in further detail below with reference to the drawings and implementations.
The embodiments of the present invention provide a speech recognition method and system that combine cloud recognition with local recognition. When there is no network access, local functions that do not require a network, such as making a phone call, sending a text message, or listening to music, can be completed entirely with the local recognition engine. The requirement placed on the local engine's result can also be relaxed dynamically according to the network connection delay.
Fig. 1 shows a flowchart of the speech recognition method according to an embodiment of the present invention, which comprises the following steps:
Step 101: obtain the voice information sent by the user.
Step 102: send the voice information to the cloud recognition engine and the local recognition engine respectively, so that the cloud recognition engine and the local recognition engine each recognize the voice information.
Specifically, a recording module can record the voice information sent by the user. The recorded voice information can be sent directly to the cloud recognition engine and the local recognition engine, or a voice detection module can first extract the valid speech portion, which is then sent to the cloud recognition engine and the local recognition engine.
Step 103: if the cloud recognition result returned by the cloud recognition engine is received first, output the cloud recognition result.
Because the servers behind the cloud recognition engine are powerful, its recognition result has a higher confidence. Therefore, when the cloud recognition result is received first, it can be output directly.
Step 104: if the local recognition result of the local recognition engine is received first and the confidence corresponding to the local recognition result is greater than the upper limit of the set confidence interval, output the local recognition result.
When the network environment is poor, the cloud recognition result may be considerably delayed. In this case, the local recognition result for the voice information and its confidence value are obtained. If the confidence value is greater than the confidence threshold set by the system, the recognition result is fully usable, so the local recognition result is output without waiting any longer for the cloud recognition result.
As can be seen, the speech recognition method provided by the embodiments of the present invention combines local recognition with cloud recognition, and decides which recognition result to adopt according to the order in which the cloud and local recognition results return and the confidence of a local recognition result that returns first. The cloud result is always treated as better than the local one: if cloud recognition returns a result before local recognition has produced a reasonably accurate one, the cloud result is adopted.
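The selection rule of steps 101 to 104 can be sketched in Python as a race between two asynchronous tasks. This is only an illustrative sketch, not the patent's implementation: the names Result, fake_engine and recognize are hypothetical, the engines are simulated with sleeps, and returning None when the local confidence is not above the upper limit is an assumption, since the basic embodiment of Fig. 1 does not say what happens in that case.

```python
import asyncio
from dataclasses import dataclass
from typing import Awaitable, Optional


@dataclass
class Result:
    source: str              # "cloud" or "local"
    text: str
    confidence: float = 1.0  # only consulted for local results in this sketch


async def fake_engine(source: str, text: str, delay: float,
                      confidence: float = 1.0) -> Result:
    """Hypothetical stand-in for a recognition engine: waits, then answers."""
    await asyncio.sleep(delay)
    return Result(source, text, confidence)


async def recognize(cloud: Awaitable[Result], local: Awaitable[Result],
                    upper_limit: float) -> Optional[Result]:
    """Dispatch to both engines (step 102) and act on the first answer."""
    cloud_task = asyncio.ensure_future(cloud)
    local_task = asyncio.ensure_future(local)
    done, pending = await asyncio.wait(
        {cloud_task, local_task}, return_when=asyncio.FIRST_COMPLETED)
    if cloud_task in done:
        # Step 103: the cloud result arrived first, so output it directly.
        chosen: Optional[Result] = cloud_task.result()
    else:
        # Step 104: the local result arrived first; accept it only if its
        # confidence is above the upper limit of the confidence interval.
        first = local_task.result()
        chosen = first if first.confidence > upper_limit else None
    for task in pending:
        task.cancel()  # assumption: the slower engine is simply abandoned
    return chosen
```

A caller might run asyncio.run(recognize(fake_engine("cloud", "call mom", 0.2), fake_engine("local", "call mom", 0.01, 0.9), upper_limit=0.8)) to simulate a slow network in which the confident local result wins.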
To further ensure that a speech recognition result with a certain accuracy can still be obtained when the network is delayed or unavailable, another embodiment of the speech recognition method of the present invention can dynamically adjust the confidence threshold for local recognition according to current network conditions, so that the best result is output with the shortest delay.
Fig. 2 shows another flowchart of the speech recognition method according to an embodiment of the present invention, which comprises the following steps:
Steps 201 to 203 in Fig. 2 are the same as steps 101 to 103 in Fig. 1 and are not repeated here.
Step 204: if the local recognition result of the local recognition engine is received first, obtain the confidence corresponding to the local recognition result.
In step 204, the subsequent processing is determined by the confidence of the local recognition result, so that the best result is output with the shortest delay. Specifically, if the confidence is less than the lower limit of the set confidence interval, go to step 205; if the confidence lies within the set confidence interval, go to step 208; if the confidence is greater than the upper limit of the set confidence interval, go to step 213.
Step 205: discard the local recognition result and continue waiting for the cloud recognition engine to return the cloud recognition result.
Step 206: judge whether the waiting time exceeds the set blocking duration. If so, go to step 207; otherwise, continue waiting.
Step 207: return recognition failure information to the user.
Step 208: successively lower the upper limit of the confidence interval, once per set waiting period.
Step 209: judge whether the cloud recognition result is received within the waiting period. If so, go to step 210; otherwise, go to step 211.
Step 210: output the cloud recognition result.
Step 211: judge whether the confidence corresponding to the local recognition result is greater than the currently lowered upper limit of the confidence interval. If so, go to step 213; otherwise, go to step 212.
Step 212: judge whether the number of times the upper limit of the confidence interval has been lowered exceeds the set count threshold (for example, the count threshold may be 1 to 3). If so, go to step 207; otherwise, return to step 208.
Step 213: output the local recognition result.
It should be noted that the waiting period mentioned in step 208 is the time interval between successive reductions of the upper limit of the confidence interval, for example 2 to 5 seconds, and the interval may be the same or different for each reduction. The waiting time mentioned in step 206 is a different concept from the waiting period: the waiting time is the time spent waiting to receive the cloud recognition result. Its timing may start when the voice information is sent to the cloud recognition engine and the local recognition engine, or when the local recognition result is discarded; the embodiments of the present invention place no restriction on this.
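The branching of steps 204 to 213 can be traced with a small deterministic simulation in Python. This is a sketch under stated assumptions rather than the patented implementation: the function name arbitrate and all parameter names are hypothetical, the upper limit is lowered by a fixed decrement (the patent does not say by how much), every waiting period has the same length, and the waiting time of step 206 is measured from the moment the voice information is dispatched.

```python
from typing import Optional


def arbitrate(local_conf: float, lower: float, upper: float,
              cloud_arrival: Optional[float], local_arrival: float = 0.0,
              wait_period: float = 2.0, decrement: float = 0.1,
              count_threshold: int = 3, block_timeout: float = 10.0) -> str:
    """Return "cloud", "local" or "failure" for one utterance.

    Arrival times are seconds after dispatch; cloud_arrival=None means the
    cloud result never arrives (for example, no network).
    """
    # Steps 201-203: a cloud result that arrives first is output directly.
    if cloud_arrival is not None and cloud_arrival <= local_arrival:
        return "cloud"
    # Step 204: the local result arrived first; branch on its confidence.
    if local_conf > upper:
        return "local"                                  # step 213
    if local_conf < lower:
        # Steps 205-207: discard the local result and wait for the cloud,
        # but no longer than the blocking duration.
        if cloud_arrival is not None and cloud_arrival <= block_timeout:
            return "cloud"
        return "failure"
    # The confidence lies inside the interval: steps 208-212.
    now = local_arrival
    reductions = 0
    while True:
        now += wait_period          # step 208: wait one period, then lower
        upper -= decrement          # the upper limit (assumed fixed step)
        reductions += 1
        if cloud_arrival is not None and cloud_arrival <= now:
            return "cloud"          # steps 209-210
        if local_conf > upper:
            return "local"          # step 211 -> step 213
        if reductions > count_threshold:
            return "failure"        # step 212 -> step 207
```

With the defaults above, arbitrate(0.65, 0.4, 0.8, cloud_arrival=None) accepts the local result after the second reduction, while the same call with cloud_arrival=3.0 returns the cloud result, because it arrives during the second waiting period.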
In addition, in practical applications, when the cloud recognition result is not received within a certain time after each reduction of the upper limit of the confidence interval and the confidence corresponding to the local recognition result still does not meet the requirement, it is also possible not to judge whether the number of reductions exceeds the set count threshold, but instead to judge whether the time spent waiting exceeds a limited waiting time. If it does, recognition failure information is returned to the user, so that an excessively long wait does not harm the user experience.
The cloud has powerful server processing capability and massive speech data for comparison, so its recognition results have high confidence. Local recognition needs no network support, offers high recognition speed and a wide range of application, and is especially suitable for mobile devices without a stable network connection. The speech recognition method of the embodiments of the present invention therefore combines local recognition with cloud recognition to take advantage of both: after the voice information sent by the user is obtained, it is sent to the cloud recognition engine and the local recognition engine at the same time for recognition. If cloud recognition returns a result before local recognition has produced a reasonably accurate one, the cloud recognition result is adopted. Otherwise, the confidence threshold for local recognition is lowered repeatedly until either a qualifying result is output or recognition fails. A reliable speech recognition result can thus be provided as far as possible even when the network is poor or unavailable.
The speech recognition method of the embodiments of the present invention uses a simple and efficient local recognition engine to recognize local commands when the network is unavailable. In addition, the strategy for choosing between the cloud and local recognition results reduces recognition delay, and the confidence threshold for local recognition can be adjusted dynamically according to current network conditions, thereby ensuring that the best result is output with the shortest delay.
It should also be noted that, in practical applications, the cloud recognition engine and the local recognition engine can be started after a speech recognition request sent by the user is received. For example, the speech recognition request can be sent when the user presses a speech recognition key. Alternatively, a voice wake-up function can be provided to the user: recording runs continuously in the background, and the request is sent when a specific keyword is recognized.
The local recognition engine can use conventional recognition methods to recognize the specific keyword. For example, the local recognition engine reads a predefined grammar file that defines the set of command words supported by speech recognition; the command words for the same function are all stored in a dictionary that the local recognition engine can access efficiently. The local recognition engine generates a recognition network from the grammar file, extracts feature information from the input speech, and performs path matching on the recognition network. As a result, any utterance by the user that falls within the scope defined by the grammar file can be recognized by the system, and the specific keyword is thereby identified.
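The grammar-driven matching described above can be illustrated with a simplified Python sketch. Several things here are assumptions made only for illustration: the grammar file format ("function: phrase | phrase"), the function names, and above all the use of text tokens in place of acoustic features. A real local engine matches extracted speech features against the recognition network; this sketch matches already-transcribed words against a word-level trie that plays the role of that network.

```python
from typing import Dict, Optional, Tuple

# Hypothetical grammar file content: one function per line, followed by the
# command phrases that trigger it.
GRAMMAR = """
# function: phrase | phrase
call: make a call | dial
sms: send a message
music: play music | listen to music
"""


def parse_grammar(text: str) -> Dict[Tuple[str, ...], str]:
    """Map each command phrase (as a tuple of words) to its function name."""
    commands: Dict[Tuple[str, ...], str] = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        function, phrases = line.split(":", 1)
        for phrase in phrases.split("|"):
            commands[tuple(phrase.lower().split())] = function.strip()
    return commands


def build_network(commands: Dict[Tuple[str, ...], str]) -> dict:
    """Build a word-level trie; the key None marks the end of a command."""
    root: dict = {}
    for words, function in commands.items():
        node = root
        for word in words:
            node = node.setdefault(word, {})
        node[None] = function
    return root


def match(network: dict, utterance: str) -> Optional[str]:
    """Follow the utterance word by word; return the function or None."""
    node = network
    for word in utterance.lower().split():
        node = node.get(word)
        if node is None:
            return None  # the path leaves the network: out of grammar
    return node.get(None)
```

Grouping the phrases by function in one dictionary mirrors the text's point that command words for the same function are stored together for efficient lookup.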
Of course, the embodiments of the present invention place no restriction on which speech recognition technology the cloud recognition engine and the local recognition engine use. The local recognition engine in particular can be chosen according to the needs of the specific application scenario, without affecting the effects the present invention achieves.
Correspondingly, the embodiments of the present invention also provide a speech recognition system. Fig. 3 shows a structural diagram of this system.
In this embodiment, the system comprises:
a voice information acquiring unit 301, configured to obtain voice information sent by a user;
a sending unit 302, configured to send the voice information to the cloud recognition engine and the local recognition engine respectively, so that the cloud recognition engine and the local recognition engine each recognize the voice information;
a receiving unit 303, configured to receive the cloud recognition result returned by the cloud recognition engine and the local recognition result of the local recognition engine;
an output unit 304, configured to output the cloud recognition result when the receiving unit 303 first receives the cloud recognition result returned by the cloud recognition engine, and to output the local recognition result when the receiving unit 303 first receives the local recognition result of the local recognition engine and the confidence corresponding to the local recognition result is greater than the upper limit of the set confidence interval.
The speech recognition system provided by the embodiments of the present invention combines local recognition with cloud recognition, and decides which recognition result to adopt according to the order in which the cloud and local recognition results return and the confidence of a local recognition result that returns first. The cloud result is always treated as better than the local one: if cloud recognition returns a result before local recognition has produced a reasonably accurate one, the cloud result is adopted.
To further ensure that a speech recognition result with a certain accuracy can still be obtained when the network is delayed or unavailable, another embodiment of the speech recognition system of the present invention can dynamically adjust the confidence threshold for local recognition according to current network conditions, so that the best result is output with the shortest delay.
Fig. 4 shows a structural diagram of another embodiment of the speech recognition system of the present invention.
Unlike the embodiment shown in Fig. 3, in this embodiment the system further comprises:
a confidence adjustment unit 401, configured to successively lower the upper limit of the confidence interval, once per set waiting period, when the confidence lies within the confidence interval.
Correspondingly, in this embodiment, the output unit 304 is further configured to output the cloud recognition result when the receiving unit 303 receives, within the waiting period, the cloud recognition result returned by the cloud recognition engine; and to output the local recognition result when the receiving unit 303 does not receive the cloud recognition result within the waiting period and the confidence corresponding to the local recognition result is greater than the lowered upper limit of the confidence interval.
In addition, to prevent an excessively long wait for the recognition result from harming the user experience, the system may further comprise, as shown in Fig. 4, a statistic unit 402, configured to count the number of times the confidence adjustment unit 401 lowers the upper limit of the confidence interval.
Correspondingly, the output unit 304 can further be configured to return recognition failure information to the user if, after the number of times counted by the statistic unit 402 exceeds the set count threshold, the confidence corresponding to the local recognition result is still less than the lower limit of the lowered confidence interval and the cloud recognition result has still not been received.
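The cooperation between the confidence adjustment unit 401, the statistic unit 402 and the failure branch of the output unit 304 might be organized as in the following Python sketch. The class and function names are hypothetical, the fixed decrement is an assumption, and whether the local result has been accepted is passed in as a flag so that the sketch takes no position on which bound the confidence is compared against.

```python
from dataclasses import dataclass, field


@dataclass
class StatisticUnit:
    """Plays the role of unit 402: counts reductions of the upper limit."""
    count: int = 0

    def record(self) -> None:
        self.count += 1


@dataclass
class ConfidenceAdjustmentUnit:
    """Plays the role of unit 401: lowers the upper limit step by step."""
    upper_limit: float
    decrement: float  # assumed fixed step; the patent does not specify one
    stats: StatisticUnit = field(default_factory=StatisticUnit)

    def reduce(self) -> float:
        # One reduction per waiting period; each reduction is counted.
        self.upper_limit -= self.decrement
        self.stats.record()
        return self.upper_limit


def should_report_failure(stats: StatisticUnit, count_threshold: int,
                          local_accepted: bool, cloud_received: bool) -> bool:
    """Failure branch of unit 304: too many reductions and still no result."""
    return (stats.count > count_threshold
            and not local_accepted and not cloud_received)
```

Keeping the counter in its own object mirrors the separation of units 401 and 402 in Fig. 4, although, as the description notes, such units need not be physically separate.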
In order to guarantee the accuracy rate of local recognition result of output, above-mentioned Fig. 3 and embodiment illustrated in fig. 4 in, described receiving element 303 also can be used for formerly receiving described local recognition result, and degree of confidence corresponding to described local recognition result is less than under the confidence interval of setting in limited time, abandon described local recognition result, continue to wait for that described high in the clouds identification engine returns to the high in the clouds recognition result; And, after surpassing in the stand-by period obstruction duration of setting, to the user, return to recognition failures information.Certainly, in actual applications, also can be by receiving element 303 by above-mentioned advisory output unit 304, and return to recognition failures information by output unit 304 to the user.
In addition, the unlatching of high in the clouds identification engine and local identification engine can have different modes, such as, in the various embodiments described above, described system also can comprise trigger element (not shown), after the speech recognition request receiving user's transmission, open high in the clouds identification engine and the local engine of identifying.Described speech recognition request can send when the user presses the speech recognition key, or provides the voice arousal function to the user, and on backstage, one direct-open recording sends when recognizing special key words.
Can adopt some conventional recognition methodss for this locality identification engine to the identification of special key words, such as, local identification engine reads the grammar file that predefined is good, this document has defined the set of the order word that speech recognition supports, and the set of same function order word all exists in dictionary, the efficiently access of local identification engine.Local identification engine generates a recognition network by grammar file, local identification engine extracts the characteristic information of input voice and mates in the enterprising walking along the street of recognition network footpath, final every user says any a word in this grammar file range of definition, all can, by system identification, thereby know, described special key words.
Certainly, high in the clouds identification engine and local identification engine are specifically adopted to which kind of speech recognition technology, and the embodiment of the present invention is not done restriction, especially to this locality identification engine, can need to select according to concrete application scenarios, can not affect the above-mentioned effect that the present invention can reach.
Visible by foregoing description, the speech recognition system of the embodiment of the present invention, meet network identification to local command when obstructed by simple local identification engine efficiently, in addition, the time delay that can reduce identification due to the choice strategy to high in the clouds and local recognition result, can dynamically adjust according to current network condition the degree of confidence thresholding of local identification, thereby guarantee to export in the shortest time delay best result.
The embodiments in this specification are described progressively. Identical or similar parts of the embodiments may be cross-referenced, and each embodiment focuses on its differences from the others. In particular, because the system embodiments are substantially similar to the method embodiments, they are described briefly; for related details, refer to the corresponding description of the method embodiments.
It should be noted that the system embodiments described above are merely illustrative. The units described as separate components may or may not be physically separate, and the components shown as units may or may not be physical units; they may be located in one place or distributed across multiple network elements. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of the embodiment. Those of ordinary skill in the art can understand and implement the embodiments without creative effort.
The embodiments of the present invention have been described in detail above, and specific embodiments have been used herein to explain the invention. The description of the above embodiments is only intended to help in understanding the method and apparatus of the present invention. At the same time, those of ordinary skill in the art may, following the ideas of the present invention, make changes to the specific embodiments and the scope of application. In summary, the content of this specification should not be construed as limiting the present invention.

Claims (11)

1. A speech recognition method, characterized by comprising:
obtaining voice information sent by a user;
sending the voice information to a cloud recognition engine and a local recognition engine respectively, so that the cloud recognition engine and the local recognition engine each recognize the voice information;
if a cloud recognition result returned by the cloud recognition engine is received first, outputting the cloud recognition result;
if a local recognition result from the local recognition engine is received first, and the confidence corresponding to the local recognition result is greater than the upper limit of a set confidence interval, outputting the local recognition result.
2. The method according to claim 1, characterized in that the method further comprises:
if the confidence lies within the confidence interval, successively lowering the upper limit of the confidence interval within set waiting periods;
if the cloud recognition result returned by the cloud recognition engine is received within a waiting period, outputting the cloud recognition result;
if the cloud recognition result returned by the cloud recognition engine is not received within the waiting period, and the confidence corresponding to the local recognition result is greater than the lowered upper limit of the confidence interval, outputting the local recognition result.
3. The method according to claim 2, characterized in that the waiting periods are of the same or different lengths.
4. The method according to claim 2, characterized in that the method further comprises:
if, after the number of times the upper limit of the confidence interval has been lowered exceeds a set count threshold, the confidence corresponding to the local recognition result is still less than the lower limit of the confidence interval after lowering, and the cloud recognition result has still not been received, returning recognition failure information to the user.
5. The method according to claim 2, characterized in that the method further comprises:
if the local recognition result is received first, and the confidence corresponding to the local recognition result is less than the lower limit of the set confidence interval, discarding the local recognition result and continuing to wait for the cloud recognition engine to return the cloud recognition result;
if the waiting time exceeds a set blocking duration, returning recognition failure information to the user.
6. The method according to any one of claims 1 to 5, characterized in that the method further comprises:
after receiving a speech recognition request sent by the user, starting the cloud recognition engine and the local recognition engine.
7. A speech recognition system, characterized by comprising:
a voice information acquiring unit, configured to obtain voice information sent by a user;
a sending unit, configured to send the voice information to a cloud recognition engine and a local recognition engine respectively, so that the cloud recognition engine and the local recognition engine each recognize the voice information;
a receiving unit, configured to receive a cloud recognition result returned by the cloud recognition engine and a local recognition result from the local recognition engine;
an output unit, configured to output the cloud recognition result when the receiving unit first receives the cloud recognition result returned by the cloud recognition engine, and to output the local recognition result when the receiving unit first receives the local recognition result from the local recognition engine and the confidence corresponding to the local recognition result is greater than the upper limit of a set confidence interval.
8. The system according to claim 7, characterized in that the system further comprises:
a confidence adjustment unit, configured to successively lower the upper limit of the confidence interval within set waiting periods when the confidence lies within the confidence interval;
the output unit being further configured to output the cloud recognition result when the receiving unit receives, within a waiting period, the cloud recognition result returned by the cloud recognition engine, and to output the local recognition result when the receiving unit does not receive the cloud recognition result within the waiting period and the confidence corresponding to the local recognition result is greater than the lowered upper limit of the confidence interval.
9. The system according to claim 8, characterized in that the system further comprises:
a statistics unit, configured to count the number of times the confidence adjustment unit lowers the upper limit of the confidence interval;
the output unit being further configured to return recognition failure information to the user if, after the number of times exceeds a set count threshold, the confidence corresponding to the local recognition result is still less than the lower limit of the confidence interval after lowering and the cloud recognition result has still not been received.
10. The system according to claim 8, characterized in that
the receiving unit is further configured to discard the local recognition result when the local recognition result is received first and the confidence corresponding to the local recognition result is less than the lower limit of the set confidence interval, and to continue waiting for the cloud recognition engine to return the cloud recognition result; and to return recognition failure information to the user after the waiting time exceeds a set blocking duration.
11. The system according to any one of claims 7 to 10, characterized in that the system further comprises:
a trigger unit, configured to start the cloud recognition engine and the local recognition engine after receiving a speech recognition request sent by the user.

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201310335050.0A CN103440867B (en) 2013-08-02 2013-08-02 Audio recognition method and system

Publications (2)

Publication Number Publication Date
CN103440867A true CN103440867A (en) 2013-12-11
CN103440867B CN103440867B (en) 2016-08-10

Family

ID=49694558

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201310335050.0A Active CN103440867B (en) 2013-08-02 2013-08-02 Audio recognition method and system

Country Status (1)

Country Link
CN (1) CN103440867B (en)

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP1181684B1 (en) * 1999-03-26 2004-11-03 Scansoft, Inc. Client-server speech recognition
CN102496364A (en) * 2011-11-30 2012-06-13 苏州奇可思信息科技有限公司 Interactive speech recognition method based on cloud network
CN103137129A (en) * 2011-12-02 2013-06-05 联发科技股份有限公司 Voice recognition method and electronic device
CN102708865A (en) * 2012-04-25 2012-10-03 北京车音网科技有限公司 Method, device and system for voice recognition

Cited By (54)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US11817101B2 (en) 2013-09-19 2023-11-14 Microsoft Technology Licensing, Llc Speech recognition using phoneme matching
CN104681026A (en) * 2013-11-27 2015-06-03 夏普株式会社 Voice Recognition Terminal, Server, Method Of Controlling Server, Voice Recognition System,non-transitory Storage Medium
CN103730119A (en) * 2013-12-18 2014-04-16 惠州市车仆电子科技有限公司 Vehicle-mounted man-machine voice interaction system
CN110706711A (en) * 2014-01-17 2020-01-17 微软技术许可有限责任公司 Merging of exogenous large vocabulary models into rule-based speech recognition
CN110706711B (en) * 2014-01-17 2023-11-28 微软技术许可有限责任公司 Merging exogenous large vocabulary models into rule-based speech recognition
CN104536978A (en) * 2014-12-05 2015-04-22 奇瑞汽车股份有限公司 Voice data identifying method and device
CN105824857A (en) * 2015-01-08 2016-08-03 中兴通讯股份有限公司 Voice search method, device and terminal
CN105261366A (en) * 2015-08-31 2016-01-20 努比亚技术有限公司 Voice identification method, voice engine and terminal
CN105118508A (en) * 2015-09-14 2015-12-02 百度在线网络技术(北京)有限公司 Voice recognition method and device
CN105118508B (en) * 2015-09-14 2018-10-23 百度在线网络技术(北京)有限公司 Audio recognition method and device
CN106782546A (en) * 2015-11-17 2017-05-31 深圳市北科瑞声科技有限公司 Audio recognition method and device
CN105551494A (en) * 2015-12-11 2016-05-04 奇瑞汽车股份有限公司 Mobile phone interconnection-based vehicle-mounted speech recognition system and recognition method
CN105551488A (en) * 2015-12-15 2016-05-04 深圳Tcl数字技术有限公司 Voice control method and system
CN106910504A (en) * 2015-12-22 2017-06-30 北京君正集成电路股份有限公司 A kind of speech reminding method and device based on speech recognition
CN105931639B (en) * 2016-05-31 2019-09-10 杨若冲 A kind of voice interactive method for supporting multistage order word
CN105931639A (en) * 2016-05-31 2016-09-07 杨若冲 Speech interaction method capable of supporting multi-hierarchy command words
CN106328148A (en) * 2016-08-19 2017-01-11 上汽通用汽车有限公司 Natural speech recognition method, natural speech recognition device and natural speech recognition system based on local and cloud hybrid recognition
CN106228975A (en) * 2016-09-08 2016-12-14 康佳集团股份有限公司 The speech recognition system of a kind of mobile terminal and method
CN106384594A (en) * 2016-11-04 2017-02-08 湖南海翼电子商务股份有限公司 On-vehicle terminal for voice recognition and method thereof
CN106558313A (en) * 2016-11-16 2017-04-05 北京云知声信息技术有限公司 Audio recognition method and device
CN106847291A (en) * 2017-02-20 2017-06-13 成都启英泰伦科技有限公司 Speech recognition system and method that a kind of local and high in the clouds is combined
CN108573706A (en) * 2017-03-10 2018-09-25 北京搜狗科技发展有限公司 A kind of audio recognition method, device and equipment
CN107464567A (en) * 2017-07-24 2017-12-12 深圳云知声信息技术有限公司 Audio recognition method and device
WO2019036849A1 (en) * 2017-08-21 2019-02-28 深圳前海达闼云端智能科技有限公司 Substance detection method, device thereof, and detection terminal
CN108401440A (en) * 2017-08-21 2018-08-14 深圳前海达闼云端智能科技有限公司 A kind of substance detecting method and its device, detection terminal
CN107564525A (en) * 2017-10-23 2018-01-09 深圳北鱼信息科技有限公司 Audio recognition method and device
CN107785019A (en) * 2017-10-26 2018-03-09 西安Tcl软件开发有限公司 Mobile unit and its audio recognition method, readable storage medium storing program for executing
CN108323234A (en) * 2017-12-27 2018-07-24 深圳达闼科技控股有限公司 A kind of detection method, detection device and server
CN110060668A (en) * 2018-02-02 2019-07-26 上海华镇电子科技有限公司 The system and method for identification delay is reduced in a kind of speech recognition controlled
CN110299136A (en) * 2018-03-22 2019-10-01 上海擎感智能科技有限公司 A kind of processing method and its system for speech recognition
CN108847219A (en) * 2018-05-25 2018-11-20 四川斐讯全智信息技术有限公司 A kind of wake-up word presets confidence threshold value adjusting method and system
CN108847219B (en) * 2018-05-25 2020-12-25 台州智奥通信设备有限公司 Awakening word preset confidence threshold adjusting method and system
CN110970032A (en) * 2018-09-28 2020-04-07 深圳市冠旭电子股份有限公司 Sound box voice interaction control method and device
CN111091819A (en) * 2018-10-08 2020-05-01 蔚来汽车有限公司 Voice recognition device and method, voice interaction system and method
WO2020135160A1 (en) * 2018-12-24 2020-07-02 深圳Tcl新技术有限公司 Terminal, method for determining speech servers, and computer-readable storage medium
CN109869862A (en) * 2019-01-23 2019-06-11 四川虹美智能科技有限公司 The control method and a kind of air-conditioning system of a kind of air-conditioning, a kind of air-conditioning
CN110148416B (en) * 2019-04-23 2024-03-15 腾讯科技(深圳)有限公司 Speech recognition method, device, equipment and storage medium
CN110148416A (en) * 2019-04-23 2019-08-20 腾讯科技(深圳)有限公司 Audio recognition method, device, equipment and storage medium
CN110223683A (en) * 2019-05-05 2019-09-10 安徽省科普产品工程研究中心有限责任公司 Voice interactive method and system
CN110265018A (en) * 2019-07-01 2019-09-20 成都启英泰伦科技有限公司 A kind of iterated command word recognition method continuously issued
CN110265018B (en) * 2019-07-01 2022-03-04 成都启英泰伦科技有限公司 Method for recognizing continuously-sent repeated command words
CN113053369A (en) * 2019-12-26 2021-06-29 青岛海尔空调器有限总公司 Voice control method and device of intelligent household appliance and intelligent household appliance
CN111261166A (en) * 2020-01-15 2020-06-09 云知声智能科技股份有限公司 Voice recognition method and device
CN111477225A (en) * 2020-03-26 2020-07-31 北京声智科技有限公司 Voice control method and device, electronic equipment and storage medium
CN111477225B (en) * 2020-03-26 2021-04-30 北京声智科技有限公司 Voice control method and device, electronic equipment and storage medium
CN112905247A (en) * 2021-01-25 2021-06-04 斑马网络技术有限公司 Method and device for automatically detecting and switching languages, terminal equipment and storage medium
CN112896048A (en) * 2021-03-15 2021-06-04 中电科创智联(武汉)有限责任公司 Vehicle-mounted all-around display system and method based on mobile phone interconnection and voice recognition
CN113380254A (en) * 2021-06-21 2021-09-10 紫优科技(深圳)有限公司 Voice recognition method, device and medium based on cloud computing and edge computing
CN113380253A (en) * 2021-06-21 2021-09-10 紫优科技(深圳)有限公司 Voice recognition system, device and medium based on cloud computing and edge computing
CN113380254B (en) * 2021-06-21 2024-05-24 枣庄福缘网络科技有限公司 Voice recognition method, device and medium based on cloud computing and edge computing
CN114446279A (en) * 2022-02-18 2022-05-06 青岛海尔科技有限公司 Voice recognition method, voice recognition device, storage medium and electronic equipment
CN114550719A (en) * 2022-02-21 2022-05-27 青岛海尔科技有限公司 Method and device for recognizing voice control instruction and storage medium
CN115410579A (en) * 2022-10-28 2022-11-29 广州小鹏汽车科技有限公司 Voice interaction method, voice interaction device, vehicle and readable storage medium
WO2024088085A1 (en) * 2022-10-28 2024-05-02 广州小鹏汽车科技有限公司 Speech interaction method, speech interaction apparatus, vehicle and readable storage medium

Also Published As

Publication number Publication date
CN103440867B (en) 2016-08-10

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
CB02 Change of applicant information

Address after: No. 666 Wangjiang Road, High-tech Development Zone, Hefei, Anhui Province, 230088

Applicant after: Iflytek Co., Ltd.

Address before: No. 666 Wangjiang Road, High-tech Development Zone, Hefei, Anhui Province, 230088

Applicant before: Anhui USTC iFLYTEK Co., Ltd.

COR Change of bibliographic data
C14 Grant of patent or utility model
GR01 Patent grant