CN103440867B - Audio recognition method and system - Google Patents

Info

Publication number
CN103440867B
Authority
CN
China
Prior art keywords
clouds
recognition result
engine
local
identifies
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201310335050.0A
Other languages
Chinese (zh)
Other versions
CN103440867A (en
Inventor
朱国正
任严佳
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
iFlytek Co Ltd
Original Assignee
iFlytek Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by iFlytek Co Ltd filed Critical iFlytek Co Ltd
Priority to CN201310335050.0A priority Critical patent/CN103440867B/en
Publication of CN103440867A publication Critical patent/CN103440867A/en
Application granted granted Critical
Publication of CN103440867B publication Critical patent/CN103440867B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Abstract

The invention discloses a speech recognition method and system. The method includes: obtaining voice information sent by a user; sending the voice information to both a cloud recognition engine and a local recognition engine, so that each engine recognizes the voice information; if the cloud recognition result returned by the cloud recognition engine is received first, outputting the cloud recognition result; and if the local recognition result of the local recognition engine is received first and the confidence corresponding to the local recognition result is greater than the set upper limit of the confidence interval, outputting the local recognition result. With the invention, reliable speech recognition results can be provided to the user even when the network is poor or unavailable.

Description

Speech recognition method and system
Technical field
The present invention relates to the technical field of speech recognition, and in particular to a speech recognition method and system.
Background
With the continuing development of computer science and technology, speech recognition technology has become increasingly mature and is widely used in fields such as mobile phones, televisions, and in-vehicle systems. Taking in-vehicle use as an example, because people cannot conveniently operate an interface by hand while driving, speech recognition serves as the most convenient mode of interaction and allows in-vehicle systems to offer more functions. In the prior art, the speech recognition mode is usually as follows: the client receives the user's voice information, establishes a connection with a cloud speech recognition server, and sends the voice information to the server; the server recognizes the information and returns the recognition result to the client. However, a mobile device does not necessarily have a stable network connection. In that case the cloud response may suffer considerable delay, which degrades the user experience, and if there is no network at all, cloud recognition cannot be used.
Summary of the invention
The present invention provides a speech recognition method and system that can provide the user with reliable speech recognition results even when the network is poor or unavailable.
To this end, the present invention provides the following technical solutions:
A speech recognition method, including:
obtaining voice information sent by a user;
sending the voice information to both a cloud recognition engine and a local recognition engine, so that the cloud recognition engine and the local recognition engine each recognize the voice information;
if the cloud recognition result returned by the cloud recognition engine is received first, outputting the cloud recognition result;
if the local recognition result of the local recognition engine is received first, and the confidence corresponding to the local recognition result is greater than the set upper limit of the confidence interval, outputting the local recognition result.
Preferably, the method further includes:
if the confidence is within the confidence interval, successively lowering the upper limit of the confidence interval within set waiting durations;
if the cloud recognition result returned by the cloud recognition engine is received within the waiting duration, outputting the cloud recognition result;
if the cloud recognition result returned by the cloud recognition engine is not received within the waiting duration, and the confidence corresponding to the local recognition result is greater than the lowered upper limit of the confidence interval, outputting the local recognition result.
Preferably, the waiting durations are the same or different.
Preferably, the method further includes:
if, after the number of times the upper limit of the confidence interval has been lowered exceeds a set count threshold, the confidence corresponding to the local recognition result is still less than the lower limit of the confidence interval after lowering, and the cloud recognition result has still not been received, returning recognition failure information to the user.
Preferably, the method further includes:
if the local recognition result is received first, and the confidence corresponding to the local recognition result is less than the set lower limit of the confidence interval, discarding the local recognition result and continuing to wait for the cloud recognition engine to return the cloud recognition result;
if the waiting time exceeds a set blocking duration, returning recognition failure information to the user.
Preferably, the method further includes:
after receiving a speech recognition request sent by the user, starting the cloud recognition engine and the local recognition engine.
A speech recognition system, including:
a voice information obtaining unit, configured to obtain voice information sent by a user;
a sending unit, configured to send the voice information to both a cloud recognition engine and a local recognition engine, so that the cloud recognition engine and the local recognition engine each recognize the voice information;
a receiving unit, configured to receive the cloud recognition result returned by the cloud recognition engine and the local recognition result of the local recognition engine;
an output unit, configured to output the cloud recognition result when the receiving unit first receives the cloud recognition result returned by the cloud recognition engine, and to output the local recognition result when the receiving unit first receives the local recognition result of the local recognition engine and the confidence corresponding to the local recognition result is greater than the set upper limit of the confidence interval.
Preferably, the system further includes:
a confidence adjustment unit, configured to successively lower the upper limit of the confidence interval within set waiting durations when the confidence is within the confidence interval;
the output unit is further configured to output the cloud recognition result when the receiving unit receives, within the waiting duration, the cloud recognition result returned by the cloud recognition engine, and to output the local recognition result when the receiving unit does not receive the cloud recognition result within the waiting duration and the confidence corresponding to the local recognition result is greater than the lowered upper limit of the confidence interval.
Preferably, the system further includes:
a statistics unit, configured to count the number of times the confidence adjustment unit lowers the upper limit of the confidence interval;
the output unit is further configured to return recognition failure information to the user if, after the number of times exceeds a set count threshold, the confidence corresponding to the local recognition result is still less than the lower limit of the confidence interval after lowering and the cloud recognition result has still not been received.
Preferably, the receiving unit is further configured to, when the local recognition result is received first and the confidence corresponding to the local recognition result is less than the set lower limit of the confidence interval, discard the local recognition result and continue to wait for the cloud recognition engine to return the cloud recognition result, and to return recognition failure information to the user after the waiting time exceeds a set blocking duration.
Preferably, the system further includes:
a trigger unit, configured to start the cloud recognition engine and the local recognition engine after receiving a speech recognition request sent by the user.
The speech recognition method and system provided by the embodiments of the present invention combine local recognition with cloud recognition. After the voice information sent by the user is received, it is sent to both the cloud recognition engine and the local recognition engine for recognition. When the cloud recognition result returned by the cloud recognition engine is received first, the cloud recognition result is output directly. If the local recognition result of the local recognition engine is received first and the confidence corresponding to the local recognition result is greater than the set upper limit of the confidence interval, the local recognition result is output. The cloud recognition result is always treated as superior to the local recognition result: if cloud recognition can return a result before local recognition provides a relatively accurate one, the cloud recognition result is used. Thus, when there is no network access, the local recognition engine can still be used to complete local functions that do not require a network, such as making phone calls, sending text messages, and listening to music.
Further, if the confidence of a local recognition result received first is relatively low but lies within the set confidence interval, the confidence threshold of local recognition is lowered repeatedly until there is a qualifying output or recognition fails.
Because the scheme provided by the embodiments of the present invention combines local recognition with cloud recognition, it provides speech recognition results that are as reliable as possible when the network is poor or unavailable.
Brief description of the drawings
In order to illustrate the technical solutions of the embodiments of the present application or of the prior art more clearly, the drawings needed in the embodiments are briefly introduced below. Obviously, the drawings described below are only some embodiments of the present invention, and those of ordinary skill in the art can obtain other drawings from them.
Fig. 1 is a flow chart of a speech recognition method according to an embodiment of the present invention;
Fig. 2 is another flow chart of a speech recognition method according to an embodiment of the present invention;
Fig. 3 is a structural schematic diagram of a speech recognition system according to an embodiment of the present invention;
Fig. 4 is another structural schematic diagram of a speech recognition system according to an embodiment of the present invention.
Detailed description
In order to help those skilled in the art better understand the schemes of the embodiments of the present invention, the embodiments are described in further detail below with reference to the drawings and implementations.
The embodiments of the present invention provide a speech recognition method and system that combine cloud recognition with local recognition. When there is no network access, the local recognition engine can be used to complete local functions that do not require a network, such as making phone calls, sending text messages, and listening to music. The requirement placed on the local engine's results can also be lowered dynamically according to the delay of the network connection.
As shown in Fig. 1, a flow chart of a speech recognition method according to an embodiment of the present invention includes the following steps:
Step 101: obtain the voice information sent by the user.
Step 102: send the voice information to both the cloud recognition engine and the local recognition engine, so that the cloud recognition engine and the local recognition engine each recognize the voice information.
Specifically, a recording module can be used to record the voice information sent by the user. The recorded voice information can be sent directly to the cloud recognition engine and the local recognition engine; alternatively, a voice detection module can first determine the start and end points of the effective information, and the result is then sent to the cloud recognition engine and the local recognition engine.
Step 103: if the cloud recognition result returned by the cloud recognition engine is received first, output the cloud recognition result.
Because the server-side recognition engine in the cloud is powerful, its recognition result has high confidence. Therefore, when the cloud recognition result is received first, it can be output directly.
Step 104: if the local recognition result of the local recognition engine is received first, and the confidence corresponding to the local recognition result is greater than the set upper limit of the confidence interval, output the local recognition result.
In a poor network environment, the cloud recognition result may be considerably delayed. In this case, the local recognition result corresponding to the voice information and the confidence value of that result are obtained. If the confidence value is greater than the confidence threshold set by the system, the recognition result is fully usable, so the local recognition result is output without waiting for the cloud recognition result.
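The voice detection module mentioned under step 102 is not specified further. As a minimal sketch only, assuming a simple frame-energy criterion (the function name `find_speech_bounds`, the frame length, and the threshold are all hypothetical and not taken from the patent), the start and end points of the effective information could be located like this:

```python
from typing import List, Optional, Tuple


def find_speech_bounds(samples: List[float], frame_len: int = 160,
                       energy_threshold: float = 0.01) -> Optional[Tuple[int, int]]:
    """Return (start, end) sample indices spanning the first to the last
    frame whose mean energy exceeds the threshold, or None if no frame does.

    Assumption: a plain energy gate stands in for the voice detection module.
    """
    active_starts = []
    for start in range(0, len(samples), frame_len):
        frame = samples[start:start + frame_len]
        energy = sum(s * s for s in frame) / len(frame)
        if energy > energy_threshold:
            active_starts.append(start)
    if not active_starts:
        return None
    end = min(active_starts[-1] + frame_len, len(samples))
    return active_starts[0], end
```

Only the samples between the returned indices would then be forwarded to the two engines; a production detector would likely add smoothing and hangover frames.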
As can be seen, the speech recognition method provided by the embodiment of the present invention combines local recognition with cloud recognition, and determines which recognition result to select according to the order in which the cloud and local recognition results return and the confidence of a local recognition result that returns first. The cloud result is always treated as superior to the local one: if cloud recognition can return a result before local recognition provides a relatively accurate one, the cloud result is used.
To further ensure that a speech recognition result with a certain accuracy can be obtained when the network is delayed or unavailable, another embodiment of the speech recognition method of the present invention can also dynamically adjust the confidence threshold of local recognition according to current network conditions, so that the best result is output within the shortest delay.
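The selection rule of steps 103 and 104 can be sketched as a small function. This is an illustration under assumptions: the `Result` type, the `pick_first_arrival` name, and the 0-to-1 confidence scale are hypothetical, and the cloud result's confidence field is only a placeholder because the method outputs a first-arriving cloud result unconditionally.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Result:
    source: str              # "cloud" or "local"
    text: str
    confidence: float = 1.0  # placeholder for cloud results (assumption)


def pick_first_arrival(first: Result, upper_limit: float) -> Optional[Result]:
    """Decide what to output when the first result arrives.

    Returns the result to output, or None when the local result is not
    confident enough and the caller must keep waiting.
    """
    if first.source == "cloud":
        return first                      # step 103: cloud arrived first
    if first.confidence > upper_limit:
        return first                      # step 104: confident local result
    return None                           # undecided in this embodiment
```

For example, with an upper limit of 0.8 a local result with confidence 0.9 is output immediately, while one with confidence 0.5 is held back.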
As shown in Fig. 2, another flow chart of a speech recognition method according to an embodiment of the present invention includes the following steps:
Steps 201 to 203 in Fig. 2 are identical to steps 101 to 103 in Fig. 1 and are not repeated here.
Step 204: if the local recognition result of the local recognition engine is received first, obtain the confidence corresponding to the local recognition result.
In addition, in step 204 the subsequent processing is determined according to the confidence of the local recognition result, to ensure that the best result is output within the shortest delay. Specifically, if the confidence is less than the set lower limit of the confidence interval, step 205 is performed; if the confidence is within the set confidence interval, step 208 is performed; if the confidence is greater than the set upper limit of the confidence interval, step 213 is performed.
Step 205: discard the local recognition result and continue to wait for the cloud recognition engine to return the cloud recognition result.
Step 206: judge whether the waiting time exceeds the set blocking duration. If so, perform step 207; otherwise, continue waiting.
Step 207: return recognition failure information to the user.
Step 208: successively lower the upper limit of the confidence interval within the set waiting duration.
Step 209: judge whether the cloud recognition result is received within the waiting duration. If so, perform step 210; otherwise, perform step 211.
Step 210: output the cloud recognition result.
Step 211: judge whether the confidence corresponding to the local recognition result is greater than the currently lowered upper limit of the confidence interval. If so, perform step 213; otherwise, perform step 212.
Step 212: judge whether the number of times the upper limit of the confidence interval has been lowered exceeds the set count threshold (for example, the count threshold can be 1 to 3). If so, perform step 207; otherwise, return to step 208.
Step 213: output the local recognition result.
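Steps 204 to 213 can be sketched as one function that runs once the local result has arrived first. This is a sketch under stated assumptions, not the patented implementation: confidence is expressed as an integer percentage so the arithmetic stays exact, the fixed `step` by which the upper limit is lowered is hypothetical (the text does not say how much to lower it), and `wait_for_cloud` is a hypothetical callback that blocks for up to the given number of seconds and returns the cloud text or `None`.

```python
from typing import Callable, Optional

# Hypothetical callback: waits up to `timeout` seconds for the cloud result
# and returns its text, or None if nothing arrived in that time.
WaitForCloud = Callable[[float], Optional[str]]


def handle_local_first(local_text: str, local_conf: int, lower: int, upper: int,
                       wait_for_cloud: WaitForCloud, *, step: int = 10,
                       wait_seconds: float = 2.0, block_seconds: float = 10.0,
                       max_reductions: int = 3) -> Optional[str]:
    """Return the text to output, or None for recognition failure (step 207)."""
    if local_conf > upper:                         # step 204 -> step 213
        return local_text
    if local_conf < lower:                         # steps 205 to 207
        return wait_for_cloud(block_seconds)       # None means the wait timed out
    reductions = 0
    while True:
        cloud_text = wait_for_cloud(wait_seconds)  # step 208: one waiting duration
        upper -= step                              # lower the upper limit
        reductions += 1
        if cloud_text is not None:                 # steps 209 and 210
            return cloud_text
        if local_conf > upper:                     # step 211 -> step 213
            return local_text
        if reductions > max_reductions:            # step 212 -> step 207
            return None
```

With the defaults, a local confidence of 65 inside the interval [40, 80] is accepted after the second reduction (upper limit 60) if the cloud stays silent, whereas a cloud result arriving during any waiting duration takes precedence.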
It should be noted that the waiting duration mentioned in step 208 is the time interval between successive reductions of the upper limit of the confidence interval, for example 2 to 5 seconds; the interval may be the same or different for each reduction. The waiting time mentioned in step 206 is a different concept from this waiting duration: it refers to the time spent waiting to receive the cloud recognition result. Its timing may start when the voice information is sent to the cloud recognition engine and the local recognition engine, or when the local recognition result is discarded; the embodiment of the present invention does not limit this.
In addition, in practical applications, when the cloud recognition result has not been received within a certain time after each reduction of the upper limit of the confidence interval, and the confidence corresponding to the local recognition result still does not meet the requirement, it is also possible, instead of judging whether the number of reductions exceeds the set count threshold, to judge whether the time waited exceeds a defined waiting limit. If it does, recognition failure information is returned to the user, so that an overly long wait does not harm the user experience.
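The time-based stopping rule described here, together with per-reduction intervals that may differ, can be sketched as follows. As before this is only an illustration: confidence is an integer percentage, `step`, `intervals`, and `max_total_wait` are hypothetical parameters, `wait_for_cloud` is a hypothetical callback returning the cloud text or `None`, and treating an exhausted interval list as failure is an assumption.

```python
from typing import Callable, Optional, Sequence


def resolve_with_time_limit(local_text: str, local_conf: int, upper: int,
                            wait_for_cloud: Callable[[float], Optional[str]],
                            intervals: Sequence[float], max_total_wait: float,
                            step: int = 10) -> Optional[str]:
    """Lower the upper limit once per interval; fail once the total time
    waited exceeds max_total_wait instead of counting reductions."""
    elapsed = 0.0
    for wait_seconds in intervals:          # intervals may differ from each other
        cloud_text = wait_for_cloud(wait_seconds)
        elapsed += wait_seconds
        upper -= step
        if cloud_text is not None:
            return cloud_text               # cloud result wins when it arrives
        if local_conf > upper:
            return local_text               # local result now clears the limit
        if elapsed > max_total_wait:
            return None                     # waited too long: recognition failure
    return None                             # assumption: no intervals left is failure
```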
Because the cloud has powerful server processing capability and massive speech data for comparison, its recognition results have high confidence. Local recognition needs no network support, offers very high recognition speed and the widest range of applicability, and is especially suitable for mobile devices without a stable network connection. Therefore, the speech recognition method of the embodiment of the present invention combines local recognition with cloud recognition and takes the respective advantages of both into account: after the voice information sent by the user is obtained, it is sent simultaneously to the cloud recognition engine and the local recognition engine for recognition. If cloud recognition can return a result before local recognition provides a relatively accurate one, the cloud recognition result is used. Otherwise, the confidence threshold of local recognition is lowered repeatedly until there is a qualifying output or recognition fails. This provides speech recognition results that are as reliable as possible when the network is poor or unavailable.
The speech recognition method of the embodiment of the present invention uses a simple and efficient local recognition engine to recognize local commands when the network is unavailable. In addition, because the strategy for choosing between cloud and local recognition results can reduce recognition delay, and the confidence threshold of local recognition can be adjusted dynamically according to current network conditions, the best result is output within the shortest delay.
It should also be noted that, in practical applications, the cloud recognition engine and the local recognition engine can be started after a speech recognition request sent by the user is received. For example, the speech recognition request can be sent when the user presses a speech recognition key; alternatively, a voice wake-up function can be provided to the user, with recording kept on in the background, and the request is sent when a specific keyword is recognized.
The local recognition engine can recognize the specific keyword using conventional recognition methods. For example, the local recognition engine reads a predefined grammar file that defines the set of command words supported by speech recognition; the command words for the same action are all stored in a dictionary that the local recognition engine can access efficiently. The local recognition engine generates a recognition network from the grammar file, extracts the feature information of the input speech, and performs path matching on the recognition network. As a result, any sentence the user says that falls within the scope defined by the grammar file can be recognized by the system, and the specific keyword is thereby recognized.
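The idea of building a recognition network from a grammar and matching a path through it can be illustrated with a word-level prefix tree. This is a deliberately simplified sketch: a real engine matches acoustic features against the network, whereas here already-decoded words are matched, and the names `build_network`, `match_path`, and the `END` marker are hypothetical.

```python
from typing import Dict, Iterable, List, Optional

END = "<end>"  # hypothetical marker for a complete command


def build_network(commands: Iterable[str]) -> Dict:
    """Build a word-level prefix tree from the command phrases of a grammar."""
    root: Dict = {}
    for command in commands:
        node = root
        for word in command.split():
            node = node.setdefault(word, {})
        node[END] = command
    return root


def match_path(network: Dict, words: List[str]) -> Optional[str]:
    """Follow the words through the network; return the matched command,
    or None if the words leave the grammar or stop mid-command."""
    node = network
    for word in words:
        if word == END or word not in node:
            return None
        node = node[word]
    return node.get(END)
```

A grammar with the phrases "call home" and "play music" would accept exactly those word sequences and reject anything outside them, which mirrors the statement that only sentences within the scope of the grammar file are recognized.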
Of course, the embodiments of the present invention do not limit which speech recognition technologies the cloud recognition engine and the local recognition engine use. The local recognition engine in particular can be chosen according to the specific application scenario, without affecting the above effects achievable by the present invention.
Correspondingly, an embodiment of the present invention also provides a speech recognition system. Fig. 3 shows a structural schematic diagram of this system.
In this embodiment, the system includes:
a voice information obtaining unit 301, configured to obtain the voice information sent by the user;
a sending unit 302, configured to send the voice information to both the cloud recognition engine and the local recognition engine, so that the cloud recognition engine and the local recognition engine each recognize the voice information;
a receiving unit 303, configured to receive the cloud recognition result returned by the cloud recognition engine and the local recognition result of the local recognition engine;
an output unit 304, configured to output the cloud recognition result when the receiving unit 303 first receives the cloud recognition result returned by the cloud recognition engine, and to output the local recognition result when the receiving unit 303 first receives the local recognition result of the local recognition engine and the confidence corresponding to the local recognition result is greater than the set upper limit of the confidence interval.
The speech recognition system provided by the embodiment of the present invention combines local recognition with cloud recognition, and determines which recognition result to select according to the order in which the cloud and local recognition results return and the confidence of a local recognition result that returns first. The cloud result is always treated as superior to the local one: if cloud recognition can return a result before local recognition provides a relatively accurate one, the cloud result is used.
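How a sending unit and a receiving unit might cooperate can be sketched with threads and a queue: the audio is handed to both engines in parallel, and results are read in arrival order. Everything named here is an assumption for illustration, including `dispatch`, the engine signature, and the two stub engines; the cloud stub's confidence value is a placeholder.

```python
import queue
import threading
import time
from typing import Callable, Dict, Tuple

# Hypothetical engine signature: audio bytes in, (text, confidence) out.
Engine = Callable[[bytes], Tuple[str, float]]


def dispatch(audio: bytes, engines: Dict[str, Engine]) -> "queue.Queue":
    """Run every engine on the same audio in its own thread and return a
    queue that yields (engine_name, text, confidence) in arrival order."""
    results: "queue.Queue" = queue.Queue()

    def run(name: str, engine: Engine) -> None:
        text, confidence = engine(audio)
        results.put((name, text, confidence))

    for name, engine in engines.items():
        threading.Thread(target=run, args=(name, engine), daemon=True).start()
    return results


def fake_local_engine(audio: bytes) -> Tuple[str, float]:
    """Stub: answers immediately, as an on-device engine would."""
    return "call home", 0.9


def fake_cloud_engine(audio: bytes) -> Tuple[str, float]:
    """Stub: simulates network latency before answering."""
    time.sleep(0.2)
    return "call home now", 1.0
```

A caller would take the first item with `results.get(timeout=...)`, apply the output rule to it, and keep reading from the same queue if the first arrival is a local result that is not confident enough.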
To further ensure that a speech recognition result with a certain accuracy can be obtained when the network is delayed or unavailable, another embodiment of the speech recognition system of the present invention can also dynamically adjust the confidence threshold of local recognition according to current network conditions, so that the best result is output within the shortest delay.
Fig. 4 shows a structural schematic diagram of another embodiment of the speech recognition system of the present invention.
Unlike the embodiment shown in Fig. 3, in this embodiment the system further includes:
a confidence adjustment unit 401, configured to successively lower the upper limit of the confidence interval within set waiting durations when the confidence is within the confidence interval.
Correspondingly, in this embodiment the output unit 304 is further configured to output the cloud recognition result when the receiving unit 303 receives, within the waiting duration, the cloud recognition result returned by the cloud recognition engine, and to output the local recognition result when the receiving unit 303 does not receive the cloud recognition result within the waiting duration and the confidence corresponding to the local recognition result is greater than the lowered upper limit of the confidence interval.
In addition, to prevent an overly long wait for the recognition result from harming the user experience, as shown in Fig. 4 the system may further include a statistics unit 402, configured to count the number of times the confidence adjustment unit 401 lowers the upper limit of the confidence interval.
Correspondingly, the output unit 304 may be further configured to return recognition failure information to the user if, after the number of times counted by the statistics unit 402 exceeds the set count threshold, the confidence corresponding to the local recognition result is still less than the lower limit of the confidence interval after lowering and the cloud recognition result has still not been received.
To ensure the accuracy of the local recognition result that is output, in the embodiments shown in Fig. 3 and Fig. 4 the receiving unit 303 may be further configured to, when the local recognition result is received first and the confidence corresponding to the local recognition result is less than the set lower limit of the confidence interval, discard the local recognition result and continue to wait for the cloud recognition engine to return the cloud recognition result, and to return recognition failure information to the user after the waiting time exceeds the set blocking duration. Of course, in practical applications the receiving unit 303 may instead notify the output unit 304 of this situation, and the output unit 304 then returns the recognition failure information to the user.
In addition, the cloud recognition engine and the local recognition engine can be started in different ways. For example, in each of the above embodiments the system may further include a trigger unit (not shown), configured to start the cloud recognition engine and the local recognition engine after receiving a speech recognition request sent by the user. The speech recognition request can be sent when the user presses a speech recognition key; alternatively, a voice wake-up function can be provided to the user, with recording kept on in the background, and the request is sent when a specific keyword is recognized.
The local recognition engine can recognize the specific keyword using conventional recognition methods. For example, the local recognition engine reads a predefined grammar file that defines the set of command words supported by speech recognition; the command words for the same action are all stored in a dictionary that the local recognition engine can access efficiently. The local recognition engine generates a recognition network from the grammar file, extracts the feature information of the input speech, and performs path matching on the recognition network. As a result, any sentence the user says that falls within the scope defined by the grammar file can be recognized by the system, and the specific keyword is thereby recognized.
Of course, the embodiments of the present invention do not limit which speech recognition technologies the cloud recognition engine and the local recognition engine use. The local recognition engine in particular can be chosen according to the specific application scenario, without affecting the above effects achievable by the present invention.
As can be seen from the above description, the speech recognition system of the embodiment of the present invention uses a simple and efficient local recognition engine to recognize local commands when the network is unavailable. In addition, because the strategy for choosing between cloud and local recognition results can reduce recognition delay, and the confidence threshold of local recognition can be adjusted dynamically according to current network conditions, the best result is output within the shortest delay.
The embodiments in this specification are described in a progressive manner. For identical or similar parts, the embodiments may be referred to one another, and each embodiment focuses on its differences from the others. In particular, the system embodiments are described relatively briefly because they are substantially similar to the method embodiments; for the relevant details, see the description of the method embodiments.
It should be noted that the system embodiments described above are merely illustrative. Units described as separate components may or may not be physically separate, and components shown as units may or may not be physical units; that is, they may be located in one place or distributed across multiple network units. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of the embodiment. Those of ordinary skill in the art can understand and implement this without creative effort.
The embodiments of the present invention have been described in detail above, and specific implementations are used herein to illustrate the present invention. The description of the above embodiments is only intended to help in understanding the method and apparatus of the present invention. Meanwhile, for those of ordinary skill in the art, changes may be made in accordance with the idea of the present invention. In summary, the content of this specification should not be construed as limiting the present invention.

Claims (9)

1. A speech recognition method, characterized by comprising:
obtaining voice information sent by a user;
sending the voice information to a cloud recognition engine and to a local recognition engine, so that the cloud recognition engine and the local recognition engine each recognize the voice information;
if a cloud recognition result returned by the cloud recognition engine is received first, outputting the cloud recognition result;
if a local recognition result of the local recognition engine is received first, and the confidence corresponding to the local recognition result is greater than a set upper limit of a confidence interval, outputting the local recognition result;
if the confidence is within the confidence interval, successively lowering the upper limit of the confidence interval within set waiting periods;
if the cloud recognition result returned by the cloud recognition engine is received within a waiting period, outputting the cloud recognition result;
if the cloud recognition result returned by the cloud recognition engine is not received within the waiting period, and the confidence corresponding to the local recognition result is greater than the lowered upper limit of the confidence interval, outputting the local recognition result.
2. The method according to claim 1, characterized in that the waiting periods are the same or different.
3. The method according to claim 1, characterized in that the method further comprises:
if, after the number of times the upper limit of the confidence interval has been lowered exceeds a set count threshold, the confidence corresponding to the local recognition result is still less than the confidence interval lower limit after the reduction, and the cloud recognition result has still not been received, returning recognition failure information to the user.
4. The method according to claim 1, characterized in that the method further comprises:
if the local recognition result is received first, and the confidence corresponding to the local recognition result is less than a set lower limit of the confidence interval, discarding the local recognition result and continuing to wait for the cloud recognition engine to return the cloud recognition result;
if the waiting time exceeds a set blocking duration, returning recognition failure information to the user.
5. The method according to any one of claims 1 to 4, characterized in that the method further comprises:
after receiving a speech recognition request sent by the user, starting the cloud recognition engine and the local recognition engine.
6. A speech recognition system, characterized by comprising:
a voice information acquiring unit, configured to obtain voice information sent by a user;
a sending unit, configured to send the voice information to a cloud recognition engine and to a local recognition engine, so that the cloud recognition engine and the local recognition engine each recognize the voice information;
a receiving unit, configured to receive a cloud recognition result returned by the cloud recognition engine and a local recognition result of the local recognition engine;
an output unit, configured to output the cloud recognition result when the receiving unit first receives the cloud recognition result returned by the cloud recognition engine, and to output the local recognition result when the receiving unit first receives the local recognition result of the local recognition engine and the confidence corresponding to the local recognition result is greater than a set upper limit of a confidence interval;
a confidence adjustment unit, configured to successively lower the upper limit of the confidence interval within set waiting periods when the confidence is within the confidence interval;
the output unit being further configured to output the cloud recognition result when the receiving unit receives, within a waiting period, the cloud recognition result returned by the cloud recognition engine, and to output the local recognition result when the receiving unit does not receive the cloud recognition result returned by the cloud recognition engine within the waiting period and the confidence corresponding to the local recognition result is greater than the lowered upper limit of the confidence interval.
7. The system according to claim 6, characterized in that the system further comprises:
a statistics unit, configured to count the number of times the confidence adjustment unit lowers the upper limit of the confidence interval;
the output unit being further configured to return recognition failure information to the user if, after the number of times exceeds a set count threshold, the confidence corresponding to the local recognition result is still less than the confidence interval lower limit after the reduction and the cloud recognition result has still not been received.
8. The system according to claim 6, characterized in that
the receiving unit is further configured to, when the local recognition result is received first and the confidence corresponding to the local recognition result is less than a set lower limit of the confidence interval, discard the local recognition result and continue to wait for the cloud recognition engine to return the cloud recognition result, and to return recognition failure information to the user after the waiting time exceeds a set blocking duration.
9. The system according to any one of claims 6 to 8, characterized in that the system further comprises:
a trigger unit, configured to start the cloud recognition engine and the local recognition engine after receiving a speech recognition request sent by the user.

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201310335050.0A CN103440867B (en) 2013-08-02 2013-08-02 Audio recognition method and system

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201310335050.0A CN103440867B (en) 2013-08-02 2013-08-02 Audio recognition method and system

Publications (2)

Publication Number Publication Date
CN103440867A CN103440867A (en) 2013-12-11
CN103440867B CN103440867B (en) 2016-08-10

Family

ID=49694558

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201310335050.0A Active CN103440867B (en) 2013-08-02 2013-08-02 Audio recognition method and system

Country Status (1)

Country Link
CN (1) CN103440867B (en)

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP1181684B1 (en) * 1999-03-26 2004-11-03 Scansoft, Inc. Client-server speech recognition
CN102496364A (en) * 2011-11-30 2012-06-13 苏州奇可思信息科技有限公司 Interactive speech recognition method based on cloud network
CN102708865A (en) * 2012-04-25 2012-10-03 北京车音网科技有限公司 Method, device and system for voice recognition
CN103137129A (en) * 2011-12-02 2013-06-05 联发科技股份有限公司 Voice recognition method and electronic device

Also Published As

Publication number Publication date
CN103440867A (en) 2013-12-11

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
CB02 Change of applicant information

Address after: No. 666 Wangjiang Road, High-tech Development Zone, Hefei, Anhui Province, 230088

Applicant after: Iflytek Co., Ltd.

Address before: No. 666 Wangjiang Road, High-tech Development Zone, Hefei, Anhui Province, 230088

Applicant before: Anhui USTC iFLYTEK Co., Ltd.

COR Change of bibliographic data
C14 Grant of patent or utility model
GR01 Patent grant