CN102708865A

CN102708865A - Method, device and system for voice recognition

Info

Publication number: CN102708865A
Application number: CN2012101233692A
Authority: CN
Inventors: 沈嘉鑫; 王力劭; 邵颖
Original assignee: BEIJING VCYBER TECHNOLOGY Co Ltd
Current assignee: BEIJING VCYBER TECHNOLOGY Co Ltd
Priority date: 2012-04-25
Filing date: 2012-04-25
Publication date: 2012-10-03

Abstract

The invention discloses a speech recognition method, device and system, and relates to speech recognition technology. It addresses the problem in the prior art that performing every recognition on a network-side server introduces network delay and, under poor network conditions, lowers recognition accuracy. The technical scheme disclosed by the embodiments comprises the following steps: receiving voice information sent by a user; recognizing and parsing the voice information using an embedded speech recognition database to obtain a local recognition result corresponding to the voice information and a confidence value of the local recognition result; outputting the local recognition result if its confidence value is greater than a preset confidence threshold; otherwise, sending the voice information to a cloud computing platform server, so that the cloud computing platform server recognizes and parses the voice information using a remote speech recognition database to obtain a remote recognition result corresponding to the voice information; and outputting the remote recognition result returned by the cloud computing platform server. The technical scheme disclosed by the embodiments of the invention can be applied to an information service system.

Description

Speech recognition method, device and system

Technical field

The present invention relates to speech recognition technology, and in particular to a speech recognition method, device and system.

Background art

With the continuing development of computers and information technology, voice interaction has become an essential means of human-machine interaction. As one of the key technologies of voice interaction, speech recognition has matured over nearly half a century of development and is now widely used.

In the prior art, the speech recognition process comprises: receiving voice information sent by the user; establishing a connection with a speech recognition server; sending the voice information to the speech recognition server, so that the server recognizes and parses it and obtains the corresponding recognition result; and receiving the recognition result returned by the speech recognition server.

Because recognition is performed by a speech recognition server on the network side, every recognition requires interaction with the network side, which introduces network delay. Moreover, when network conditions are poor, packets may be lost during that interaction, which lowers the accuracy of speech recognition.

Summary of the invention

Embodiments of the invention provide a speech recognition method, device and system that can reduce network delay and improve the accuracy of speech recognition.

In one aspect, a speech recognition method is provided, comprising: receiving voice information sent by a user; recognizing and parsing said voice information using an embedded speech recognition database to obtain a local recognition result corresponding to said voice information and a confidence value of said local recognition result; if the confidence value of said local recognition result is greater than a preset confidence threshold, outputting said local recognition result; otherwise, sending said voice information to a cloud computing platform server, so that said cloud computing platform server recognizes and parses said voice information using a remote speech recognition database to obtain a remote recognition result corresponding to said voice information; and outputting the remote recognition result returned by said cloud computing platform server.

In another aspect, a speech recognition device is provided, comprising:

a voice receiving module, configured to receive voice information sent by a user;

a recognition module, configured to recognize and parse said voice information using an embedded speech recognition database to obtain a local recognition result corresponding to said voice information and a confidence value of said local recognition result;

a first output module, configured to output said local recognition result if the confidence value of said local recognition result is greater than a preset confidence threshold;

an information sending module, configured to otherwise send said voice information to a cloud computing platform server, so that said cloud computing platform server recognizes and parses said voice information using a remote speech recognition database to obtain a remote recognition result corresponding to said voice information;

a second output module, configured to output the remote recognition result returned by said cloud computing platform server.

In yet another aspect, a speech recognition system is provided, comprising:

a speech recognition device, configured to receive voice information sent by a user; recognize and parse said voice information using an embedded speech recognition database to obtain a local recognition result corresponding to said voice information and a confidence value of said local recognition result; output said local recognition result if its confidence value is greater than a preset confidence threshold; otherwise, send said voice information to a cloud computing platform server; and output the remote recognition result returned by said cloud computing platform server;

said cloud computing platform server, configured to receive the voice information sent by said speech recognition device; recognize and parse said voice information to obtain a remote recognition result corresponding to said voice information; and send said remote recognition result to said speech recognition device.

The speech recognition method, device and system provided by the embodiments of the invention combine embedded speech recognition with cloud speech recognition: if the confidence value of the local recognition result is greater than the preset confidence threshold, the local recognition result is output; otherwise, the voice information is sent to the cloud computing platform server and the remote recognition result it returns is output. Because of this combination, not every recognition has to interact with the network side, so interaction with the network side, and with it network delay, is reduced while recognition accuracy is maintained. In addition, when network conditions are poor, fewer packets are lost, which improves recognition accuracy. This solves the prior-art problem that, because recognition is performed by a network-side speech recognition server, every recognition requires interaction with the network side and incurs network delay, and that packet loss during that interaction under poor network conditions lowers recognition accuracy.

Brief description of the drawings

To describe the technical schemes of the embodiments of the invention or of the prior art more clearly, the drawings needed for describing the embodiments or the prior art are briefly introduced below. The drawings described below show only some embodiments of the invention; those of ordinary skill in the art can derive other drawings from them without creative effort.

Fig. 1 is a flowchart of the speech recognition method provided by Embodiment one of the invention;

Fig. 2 is a first flowchart of the speech recognition method provided by Embodiment two of the invention;

Fig. 3 is a second flowchart of the speech recognition method provided by Embodiment two of the invention;

Fig. 4 is a flowchart of the speech recognition method provided by Embodiment three of the invention;

Fig. 5 is a first structural diagram of the speech recognition device provided by Embodiment four of the invention;

Fig. 6 is a second structural diagram of the speech recognition device provided by Embodiment four of the invention;

Fig. 7 is a third structural diagram of the speech recognition device provided by Embodiment four of the invention;

Fig. 8 is a structural diagram of the speech recognition system provided by Embodiment five of the invention.

Detailed description of the embodiments

The technical schemes in the embodiments of the invention are described clearly and completely below with reference to the accompanying drawings. The described embodiments are only some, not all, of the embodiments of the invention. All other embodiments obtained by those of ordinary skill in the art on the basis of these embodiments without creative effort fall within the scope of protection of the invention.

To solve the prior-art problems of network delay and low speech recognition accuracy, the embodiments of the invention provide a speech recognition method, device and system.

Embodiment one:

As shown in Fig. 1, the speech recognition method provided by this embodiment of the invention comprises:

Step 101: receive the voice information sent by the user.

In this embodiment, step 101 may receive the voice information after the user presses a voice input key, or after the user performs some other operation; this is not limited here. The voice information input by the user may be a simple voice command, or other information that contains a voice command; the possibilities are not enumerated here.

Step 102: recognize and parse the voice information using the embedded speech recognition database to obtain the local recognition result corresponding to the voice information and the confidence value of that local recognition result.

In this embodiment, the embedded speech recognition database in step 102 may store any speech feature library. To keep the embedded speech recognition database small, it is preferably used to store control instructions. Taking a music application as an example, the embedded speech recognition database may store control instructions such as play, pause, previous track and next track. The control instructions it stores include, but are not limited to, those listed above.

In this embodiment, step 102 may obtain the local recognition result as follows: the voice information is compared for similarity against each speech feature in the embedded speech recognition database, yielding a confidence value for each speech feature, and the speech feature with the highest confidence value is taken as the local recognition result. Step 102 may also obtain the local recognition result in other ways, which are not enumerated here. The confidence value of the local recognition result may be determined by the process above or in other ways; this is not limited here.
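The similarity comparison described for step 102 can be sketched as follows. This is only an illustration under stated assumptions: the patent does not specify a feature representation or a similarity measure, so the fixed-length feature vectors, the cosine similarity, and the names `cosine_similarity`, `recognize_locally` and `EMBEDDED_DB` are all hypothetical.

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity of two equal-length vectors (an assumed measure)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)

def recognize_locally(voice_features, embedded_db):
    """Compare the input against every stored speech feature and return
    (label, confidence) for the entry with the highest confidence."""
    best_label, best_conf = None, 0.0
    for label, stored in embedded_db.items():
        conf = cosine_similarity(voice_features, stored)
        if best_label is None or conf > best_conf:
            best_label, best_conf = label, conf
    return best_label, best_conf

# Hypothetical embedded database holding a few music control instructions.
EMBEDDED_DB = {
    "play": [1.0, 0.0, 0.0],
    "pause": [0.0, 1.0, 0.0],
    "next": [0.0, 0.0, 1.0],
}

label, confidence = recognize_locally([0.9, 0.1, 0.0], EMBEDDED_DB)
```

Here the similarity score itself is used as the confidence value, which is one of the options the text allows; a real recognizer would likely derive confidence from its acoustic model instead.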

In this embodiment, the embedded speech recognition database may store several typical speech feature libraries in advance, or may store multiple broad-spectrum speech feature libraries in advance. It should be noted that a broad-spectrum speech feature library may be a set of broad-spectrum speech features distilled from voice samples collected from many kinds of speakers across China in different environments (with different background noise). Such a library depends only on the information in the existing speech feature library and not on the speech feature training results of any particular person. In particular, the broad-spectrum speech feature library may also include foreign-language libraries, such as libraries for English, French, German, Japanese and other major foreign languages.

Step 103: determine whether the confidence value of the local recognition result is greater than the preset confidence threshold.

In this embodiment, the confidence threshold in step 103 may be set arbitrarily or according to statistics; this is not limited here. If step 103 determines that the confidence value of the local recognition result is greater than the preset confidence threshold, the local recognition result is output in step 104; otherwise, the voice information is sent to the cloud computing platform server in step 105.

Step 104: output the local recognition result.

Step 105: send the voice information to the cloud computing platform server, so that the cloud computing platform server recognizes and parses the voice information using the remote speech recognition database and obtains the remote recognition result corresponding to the voice information.

In this embodiment, the local side may establish a connection with the cloud computing platform server in advance, or only when the confidence value of the local recognition result is not greater than the preset confidence threshold; this is not limited here. The connection may be established over various communication channels, such as the Internet or a 3G mobile network. Specifically, the network address of the cloud computing platform server (such as a uniform resource locator, URL) or its call number may be stored in advance, and the communication connection established over the Internet, a 3G mobile network or the like according to that address or number.

In this embodiment, the cloud computing platform server may store multiple broad-spectrum speech feature libraries in advance, for example a library built from place names, a library built from audio and video titles, and a library built from personal names. It should be noted that a broad-spectrum speech feature library may be a set of broad-spectrum speech features distilled from voice samples collected from many kinds of speakers across China in different environments (with different background noise). Such a library depends only on the information in the existing speech feature library and not on the speech feature training results of any particular person. In particular, the broad-spectrum speech feature library may also include foreign-language libraries, such as libraries for English, French, German, Japanese and other major foreign languages.

Step 106: output the remote recognition result returned by the cloud computing platform server.

In this embodiment, step 106 may output the remote recognition result returned by the cloud computing platform server directly, or may output it only when the confidence value of the remote recognition result is higher than the confidence value of the local recognition result; the possibilities are not enumerated here.
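Steps 101 to 106 amount to a confidence-gated fallback from local to cloud recognition. A minimal sketch follows, assuming the two recognizers are passed in as callables; the function name `recognize`, the default threshold of 0.8 and the stub recognizers are illustrative assumptions, not part of the patent.

```python
def recognize(voice, local_recognizer, cloud_recognizer, threshold=0.8):
    """Return (result, source) following steps 102 to 106.

    local_recognizer(voice) -> (result, confidence)   # step 102
    cloud_recognizer(voice) -> result                 # steps 105 and 106
    The threshold value is an arbitrary placeholder.
    """
    local_result, local_conf = local_recognizer(voice)
    # Step 103: strict "greater than" comparison, as in the text.
    if local_conf > threshold:
        return local_result, "local"      # step 104
    # Step 105: only now is the network side contacted.
    remote_result = cloud_recognizer(voice)
    return remote_result, "remote"        # step 106

# Stub recognizers used purely for demonstration.
cloud_calls = []

def fake_local(voice):
    return ("pause", 0.95) if voice == "pause-audio" else ("play", 0.30)

def fake_cloud(voice):
    cloud_calls.append(voice)
    return "play Beijing opera"

first = recognize("pause-audio", fake_local, fake_cloud)
second = recognize("long-request-audio", fake_local, fake_cloud)
```

In this sketch the cloud recognizer is invoked only for the second request, which is the source of the reduced network interaction the embodiment describes.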

The speech recognition method provided by this embodiment of the invention combines embedded speech recognition with cloud speech recognition: if the confidence value of the local recognition result is greater than the preset confidence threshold, the local recognition result is output; otherwise, the voice information is sent to the cloud computing platform server and the remote recognition result it returns is output. Because of this combination, not every recognition has to interact with the network side, so interaction with the network side, and with it network delay, is reduced while recognition accuracy is maintained. In addition, when network conditions are poor, fewer packets are lost, which improves recognition accuracy. This solves the prior-art problem that, because recognition is performed by a network-side speech recognition server, every recognition requires interaction with the network side and incurs network delay, and that packet loss during that interaction under poor network conditions lowers recognition accuracy.

Embodiment two:

As shown in Fig. 2, the speech recognition method provided by this embodiment of the invention comprises:

Steps 201 to 205: obtain the local recognition result and its confidence value; output the local recognition result when its confidence value is greater than the preset confidence threshold; otherwise, send the voice information to the cloud computing platform server. The detailed process is similar to steps 101 to 105 shown in Fig. 1 and is not repeated here.

Step 206: send the local recognition result and the confidence value of the local recognition result to the cloud computing platform server.

Step 207: determine whether the confidence value of the remote recognition result is greater than the confidence value of the local recognition result.

In this embodiment, if step 207 determines that the confidence value of the remote recognition result is less than or equal to the confidence value of the local recognition result, the local recognition result may be output in step 208.

Step 208: output the local recognition result according to the control command returned by the cloud computing platform server.

In this embodiment, the control command in step 208 instructs the local side to output the local recognition result.

Further, as shown in Fig. 3, the speech recognition method in this embodiment may also comprise:

Step 209: output the remote recognition result returned by the cloud computing platform server.

In this embodiment, if step 207 determines that the confidence value of the remote recognition result is greater than the confidence value of the local recognition result, the remote recognition result returned by the cloud computing platform server may be output in step 209.
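The comparison in steps 206 to 209 can be sketched as a small exchange between server and device. The reply structure and the names `ServerReply`, `server_arbitrate` and `device_output` are assumptions made for illustration; the patent only states that the server returns either a control command indicating that the local result should be output, or the remote result.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class ServerReply:
    """Hypothetical reply format: either a control command telling the
    device to output its local result, or the remote result itself."""
    use_local: bool
    remote_result: Optional[str] = None

def server_arbitrate(remote_result, remote_conf, local_conf):
    """Step 207 on the server side: compare the two confidence values."""
    if remote_conf <= local_conf:
        # Remote result is no better: send the control command (step 208).
        return ServerReply(use_local=True)
    # Remote result is more reliable: return it (step 209).
    return ServerReply(use_local=False, remote_result=remote_result)

def device_output(local_result, reply):
    """Device side: follow the server's reply."""
    return local_result if reply.use_local else reply.remote_result

reply_a = server_arbitrate("play jazz", remote_conf=0.55, local_conf=0.60)
reply_b = server_arbitrate("play jazz", remote_conf=0.90, local_conf=0.60)
outputs = (device_output("play", reply_a), device_output("play", reply_b))
```

Sending only a control command when the local result wins keeps the reply small, since the device already holds the local result.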

Embodiment three:

As shown in Fig. 4, the speech recognition method provided by this embodiment of the invention is similar to the method shown in Fig. 1, the difference being that it also comprises:

Step 107: obtain database update information from the cloud computing platform server.

In this embodiment, the database update information in step 107 may be obtained by the local side sending a database update request to the cloud computing platform server and reading the corresponding information returned in response; it may be obtained from information the cloud computing platform server returns on its own; or it may be obtained in other ways, which are not enumerated here. The local side may send the database update request periodically or on the user's instruction; this is not limited here. The information returned by the cloud computing platform server may be returned periodically or according to other settings; this is not limited here either.

In this embodiment, the database update information in step 107 may be information on adding speech features to the embedded speech recognition database, information on removing speech features from the embedded speech recognition database, or information on deleting the embedded speech recognition database. It may also be a combination of these, for example information on adding speech features together with information on deleting the embedded speech recognition database; the possibilities are not enumerated here.

Step 108: update the embedded speech recognition database according to the database update information.

In this embodiment, after the database update information is obtained from the cloud computing platform server in step 107, the corresponding update operation can be performed on the embedded speech recognition database. For example, if information on deleting the embedded speech recognition database is obtained in step 107, the corresponding deletion is performed on the embedded speech recognition database.
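Step 108 can be sketched as applying a small update record to an in-memory database. The dictionary layout with the keys `delete_all`, `remove` and `add`, and the function name `apply_update`, are hypothetical; the patent does not define a format for the update information. Deletion is applied first so that a combined "delete plus add" update replaces the old contents, which is one plausible reading of the combination mentioned above.

```python
def apply_update(embedded_db, update):
    """Apply database update information to the embedded database in place.

    update may contain (all keys optional, all names assumed):
      "delete_all": bool              -> delete the database contents
      "remove":     list of labels    -> remove individual speech features
      "add":        {label: feature}  -> add (or overwrite) speech features
    """
    if update.get("delete_all"):
        embedded_db.clear()
    for label in update.get("remove", []):
        embedded_db.pop(label, None)   # ignore labels that are not present
    embedded_db.update(update.get("add", {}))
    return embedded_db

db = {"play": [1.0, 0.0], "pause": [0.0, 1.0]}
apply_update(db, {"remove": ["pause"], "add": {"next": [0.5, 0.5]}})
```

After the call, `db` holds the entries for "play" and "next" only.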

Embodiment four:

As shown in Fig. 5, the speech recognition device provided by this embodiment of the invention comprises:

A voice receiving module 501, configured to receive the voice information sent by the user.

In this embodiment, the voice receiving module 501 may receive the voice information after the user presses a voice input key, or after the user performs some other operation; this is not limited here. The voice information input by the user may be a simple voice command, or other information that contains a voice command; the possibilities are not enumerated here.

A recognition module 502, configured to recognize and parse the voice information using the embedded speech recognition database to obtain the local recognition result corresponding to the voice information and the confidence value of the local recognition result.

In this embodiment, the embedded speech recognition database used by the recognition module 502 may store any speech feature library. To keep the embedded speech recognition database small, it is preferably used to store control instructions. Taking a music application as an example, the embedded speech recognition database may store control instructions such as play, pause, previous track and next track. The control instructions it stores include, but are not limited to, those listed above.

In this embodiment, the recognition module 502 may obtain the local recognition result as follows: the voice information is compared for similarity against each speech feature in the embedded speech recognition database, yielding a confidence value for each speech feature, and the speech feature with the highest confidence value is taken as the local recognition result. The recognition module 502 may also obtain the local recognition result in other ways, which are not enumerated here. The confidence value of the local recognition result may be determined by the process above or in other ways; this is not limited here.

A first output module 503, configured to output the local recognition result if the confidence value of the local recognition result is greater than the preset confidence threshold.

An information sending module 504, configured to otherwise send the voice information to the cloud computing platform server, so that the cloud computing platform server recognizes and parses the voice information using the remote speech recognition database and obtains the remote recognition result corresponding to the voice information.

A second output module 505, configured to output the remote recognition result returned by the cloud computing platform server.

In this embodiment, the second output module 505 may output the remote recognition result returned by the cloud computing platform server directly, or may output it only when the confidence value of the remote recognition result is higher than the confidence value of the local recognition result; the possibilities are not enumerated here.

Further, as shown in Fig. 6, the speech recognition device provided by this embodiment also comprises:

A recognition result sending module 506, configured to send the local recognition result and the confidence value of the local recognition result to the cloud computing platform server.

In this case, the second output module 505 is also configured to output the local recognition result according to the control command returned by the cloud computing platform server if the confidence value of the remote recognition result is less than or equal to the confidence value of the local recognition result; the control command instructs the device to output the local recognition result.

Further, as shown in Fig. 7, the speech recognition device provided by this embodiment may also comprise:

An update information acquisition module 507, configured to obtain database update information from the cloud computing platform server.

In this embodiment, the database update information obtained by the update information acquisition module 507 may be obtained by the local side sending a database update request to the cloud computing platform server and reading the corresponding information returned in response; it may be obtained from information the cloud computing platform server returns on its own; or it may be obtained in other ways, which are not enumerated here. The local side may send the database update request periodically or on the user's instruction; this is not limited here. The information returned by the cloud computing platform server may be returned periodically or according to other settings; this is not limited here either.

In this embodiment, the database update information obtained by the update information acquisition module 507 may be information on adding speech features to the embedded speech recognition database, information on removing speech features from the embedded speech recognition database, or information on deleting the embedded speech recognition database. It may also be a combination of these, for example information on adding speech features together with information on deleting the embedded speech recognition database; the possibilities are not enumerated here.

An update module 508, configured to update the embedded speech recognition database according to the database update information.

In this embodiment, after the update information acquisition module 507 obtains the database update information from the cloud computing platform server, the corresponding update operation can be performed on the embedded speech recognition database. For example, if the update information acquisition module 507 obtains information on deleting the embedded speech recognition database, the corresponding deletion is performed on the embedded speech recognition database.

The speech recognition device provided by this embodiment of the invention combines embedded speech recognition with cloud speech recognition: if the confidence value of the local recognition result is greater than the preset confidence threshold, the local recognition result is output; otherwise, the voice information is sent to the cloud computing platform server and the remote recognition result it returns is output. Because of this combination, not every recognition has to interact with the network side, so interaction with the network side, and with it network delay, is reduced while recognition accuracy is maintained. In addition, when network conditions are poor, fewer packets are lost, which improves recognition accuracy. This solves the prior-art problem that, because recognition is performed by a network-side speech recognition server, every recognition requires interaction with the network side and incurs network delay, and that packet loss during that interaction under poor network conditions lowers recognition accuracy.

Embodiment five:

As shown in Fig. 8, the speech recognition system provided by this embodiment of the invention comprises:

A speech recognition device 801, configured to receive the voice information sent by the user; recognize and parse the voice information using the embedded speech recognition database to obtain the local recognition result corresponding to the voice information and the confidence value of the local recognition result; output the local recognition result if its confidence value is greater than the preset confidence threshold; otherwise, send the voice information to the cloud computing platform server; and output the remote recognition result returned by the cloud computing platform server.

In the present embodiment, can after the user presses voice typing key, receive the voice messaging that the user sends, also can carry out other operation backs and receive the voice messaging that the user sends, not limit at this user.Wherein, the voice messaging of user's input can be simple phonetic order, also can give unnecessary details no longer one by one once more for comprising other information of phonetic order.

In the present embodiment, the Embedded Speech Recognition System database can be used to store any phonetic feature storehouse, and in order to dwindle the scale of Embedded Speech Recognition System database, preferred, this Embedded Speech Recognition System database can be used for control store instruction.Be applied as example with music, the Embedded Speech Recognition System database can be used for storage broadcast, time-out, a last head, next etc. steering order; The steering order of Embedded Speech Recognition System database storing includes but are not limited to the above, gives unnecessary details no longer one by one at this.

In this embodiment, the local recognition result may be obtained by recognizing and parsing the voice information through the embedded speech recognition database as follows: the voice information is compared for similarity with each speech feature in the embedded speech recognition database to obtain a confidence value for each speech feature, and the speech feature with the highest confidence value is taken as the local recognition result. The local recognition result may also be obtained in other ways, which are not enumerated here. The confidence value of the local recognition result may be determined through the above process or in other ways; this is not limited here.
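One possible form of the similarity comparison described above is sketched below. The source does not specify a feature representation or a similarity measure, so the fixed-length feature vectors, the use of cosine similarity as the confidence value, and the names `cosine_similarity`, `local_recognize` and `EMBEDDED_DB` are assumptions made purely for illustration.

```python
import math
from typing import Dict, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def local_recognize(features: Sequence[float],
                    database: Dict[str, Sequence[float]]) -> Tuple[str, float]:
    """Score every stored entry against the input and return the best one.

    The database is assumed to be non-empty.
    """
    scores = {cmd: cosine_similarity(features, ref) for cmd, ref in database.items()}
    best = max(scores, key=scores.get)  # entry with the highest confidence value
    return best, scores[best]


# Hypothetical embedded database for a music application; the vectors are
# made-up placeholders, not real acoustic features.
EMBEDDED_DB = {
    "play": [1.0, 0.0, 0.0, 0.0],
    "pause": [0.0, 1.0, 0.0, 0.0],
    "previous track": [0.0, 0.0, 1.0, 0.0],
    "next track": [0.0, 0.0, 0.0, 1.0],
}
```

For example, `local_recognize([0.9, 0.1, 0.0, 0.0], EMBEDDED_DB)` selects `"play"`, because that entry has the highest similarity to the input vector. The returned score would then be compared with the preset confidence threshold.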

In this embodiment, the remote recognition result returned by the cloud computing platform server may be output directly; alternatively, it may be output only when its confidence value is higher than the confidence value of the local recognition result. The details are not repeated here.

A cloud computing platform server 802, configured to: receive the voice information sent by the speech recognition device; recognize and parse the voice information to obtain the remote recognition result corresponding to the voice information; and send the remote recognition result to the speech recognition device.

Further, in the speech recognition system provided by this embodiment, the speech recognition device 801 is also configured to send the local recognition result and its confidence value to the cloud computing platform server, and to output the local recognition result according to the control command returned by the cloud computing platform server. The cloud computing platform server 802 is also configured to obtain the confidence value of the remote recognition result and, if that confidence value is less than or equal to the confidence value of the local recognition result, to send the speech recognition device a control command indicating that the local recognition result is to be output.
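The exchange described above, in which the server compares the two confidence values and may instruct the device to keep its own result, can be sketched as follows. The message shapes (`Recognition`, `ServerReply`) and the function names are hypothetical; the source only states that a control command is returned, not how it is encoded.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Recognition:
    text: str
    confidence: float


@dataclass
class ServerReply:
    output_local: bool             # True stands for the control command "output the local result"
    remote: Optional[Recognition]  # set only when the remote result should be used


def server_arbitrate(local: Recognition, remote: Recognition) -> ServerReply:
    """Server side: compare the confidence values and choose the reply."""
    if remote.confidence <= local.confidence:
        # Remote result is no better: tell the device to output its local result.
        return ServerReply(output_local=True, remote=None)
    return ServerReply(output_local=False, remote=remote)


def device_output(local: Recognition, reply: ServerReply) -> str:
    """Device side: follow the control command, or output the remote result."""
    if reply.output_local or reply.remote is None:
        return local.text
    return reply.remote.text
```

Under this sketch a tie is resolved in favor of the local result, matching the "less than or equal to" condition in the text.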

The speech recognition system provided by this embodiment of the invention combines embedded speech recognition with cloud speech recognition: if the confidence value of the local recognition result is greater than the preset confidence threshold, the local recognition result is output; otherwise, the voice information is sent to the cloud computing platform server and the remote recognition result that the server returns is output. Because the two are combined, not every recognition has to interact with the network side, so the interaction with the network side, and with it the network delay, is reduced while the accuracy of speech recognition is maintained. In addition, when network conditions are poor, packet loss is reduced, which improves the accuracy of speech recognition. This solves the problems of the prior art, in which speech recognition is performed by a speech recognition server on the network side, so that every recognition requires interaction with the network side and incurs network delay, and in which packet loss during that interaction under poor network conditions lowers the accuracy of speech recognition.

The speech recognition method, device and system provided by the embodiments of the invention can be applied in information service systems such as navigation, song requests and contact lookup.

The above are merely specific embodiments of the present invention, but the protection scope of the present invention is not limited thereto. Any change or replacement that a person skilled in the art could readily conceive within the technical scope disclosed by the present invention shall fall within the protection scope of the present invention. Therefore, the protection scope of the present invention shall be subject to the protection scope of the claims.

Claims

1. A speech recognition method, characterized by comprising:

receiving voice information sent by a user;

recognizing and parsing the voice information through an embedded speech recognition database to obtain a local recognition result corresponding to the voice information and a confidence value of the local recognition result;

if the confidence value of the local recognition result is greater than a preset confidence threshold, outputting the local recognition result;

otherwise, sending the voice information to a cloud computing platform server, so that the cloud computing platform server recognizes and parses the voice information through a remote speech recognition database to obtain a remote recognition result corresponding to the voice information; and

outputting the remote recognition result returned by the cloud computing platform server.

2. The speech recognition method according to claim 1, characterized by further comprising:

sending the local recognition result and the confidence value of the local recognition result to the cloud computing platform server;

wherein the outputting of the remote recognition result returned by the cloud computing platform server is replaced with:

if the confidence value of the remote recognition result is less than or equal to the confidence value of the local recognition result, outputting the local recognition result according to a control command returned by the cloud computing platform server, the control command being used to indicate that the local recognition result is to be output.

3. The speech recognition method according to claim 1, characterized by further comprising:

obtaining database update information from the cloud computing platform server; and

updating the embedded speech recognition database according to the database update information.

4. The speech recognition method according to any one of claims 1 to 3, characterized in that the embedded speech recognition database is used to store control instructions.

5. A speech recognition device, characterized by comprising:

6. The speech recognition device according to claim 5, characterized by further comprising:

a recognition result sending module, configured to send the local recognition result and the confidence value of the local recognition result to the cloud computing platform server;

wherein the second output module is further configured to, if the confidence value of the remote recognition result is less than or equal to the confidence value of the local recognition result, output the local recognition result according to a control command returned by the cloud computing platform server, the control command being used to indicate that the local recognition result is to be output.

7. The speech recognition device according to claim 5, characterized by further comprising:

an update information acquisition module, configured to obtain database update information from the cloud computing platform server; and

an update module, configured to update the embedded speech recognition database according to the database update information.

8. The speech recognition device according to any one of claims 5 to 7, characterized in that the embedded speech recognition database is used to store control instructions.

9. A speech recognition system, characterized by comprising:

10. The speech recognition system according to claim 9, characterized in that:

the speech recognition device is further configured to send the local recognition result and the confidence value of the local recognition result to the cloud computing platform server, and to output the local recognition result according to a control command returned by the cloud computing platform server; and

the cloud computing platform server is further configured to obtain the confidence value of the remote recognition result and, if the confidence value of the remote recognition result is less than or equal to the confidence value of the local recognition result, to send the speech recognition device a control command indicating that the local recognition result is to be output.