CN106782561A - Audio recognition method and system - Google Patents
Audio recognition method and system
- Publication number
- CN106782561A CN106782561A CN201611135597.6A CN201611135597A CN106782561A CN 106782561 A CN106782561 A CN 106782561A CN 201611135597 A CN201611135597 A CN 201611135597A CN 106782561 A CN106782561 A CN 106782561A
- Authority
- CN
- China
- Prior art keywords
- speech
- voice
- engine
- voice stream
- semantics
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/28—Constructional details of speech recognition systems
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/30—Semantic analysis
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N21/00—Selective content distribution, e.g. interactive television or video on demand [VOD]
- H04N21/40—Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
- H04N21/41—Structure of client; Structure of client peripherals
- H04N21/422—Input-only peripherals, i.e. input devices connected to specially adapted client devices, e.g. global positioning system [GPS]
- H04N21/42203—Input-only peripherals, i.e. input devices connected to specially adapted client devices, e.g. global positioning system [GPS] sound input device, e.g. microphone
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N21/00—Selective content distribution, e.g. interactive television or video on demand [VOD]
- H04N21/40—Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
- H04N21/41—Structure of client; Structure of client peripherals
- H04N21/422—Input-only peripherals, i.e. input devices connected to specially adapted client devices, e.g. global positioning system [GPS]
- H04N21/42204—User interfaces specially adapted for controlling a client device through a remote control device; Remote control devices therefor
Landscapes
- Engineering & Computer Science (AREA)
- Multimedia (AREA)
- Physics & Mathematics (AREA)
- Audiology, Speech & Language Pathology (AREA)
- Computational Linguistics (AREA)
- Health & Medical Sciences (AREA)
- Theoretical Computer Science (AREA)
- Human Computer Interaction (AREA)
- Signal Processing (AREA)
- General Health & Medical Sciences (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Artificial Intelligence (AREA)
- Acoustics & Sound (AREA)
- Two-Way Televisions, Distribution Of Moving Picture Or The Like (AREA)
Abstract
The invention discloses a speech recognition method and system. The method comprises the steps of: when a voice stream is received, recognizing the voice stream with a local speech recognition component and a third-party speech recognition component; when the local speech recognition component successfully recognizes the voice stream, determining the semantics of the voice stream with a first speech engine corresponding to the local speech recognition component, recording the semantics recognized by the first speech engine as the first semantics, and performing the corresponding operation according to the first semantics; when the local speech recognition component fails to recognize the voice stream but the third-party speech recognition component succeeds, determining the semantics of the voice stream with a second speech engine corresponding to the third-party speech recognition component, recording the semantics recognized by the second speech engine as the second semantics, and performing the corresponding operation according to the second semantics. The invention improves the success rate and flexibility of television speech recognition.
Description
Technical field
The present invention relates to the field of televisions, and more particularly to a speech recognition method and system.
Background art
With the development of science and technology, smart televisions have become widespread, but interacting with them — text input, content search, and similar functions — still offers a poor experience. As speech recognition technology has matured, users can control a smart television through a speech recognition engine, performing operations such as switching channels and searching for or requesting the content they want to watch.
However, owing to technical barriers, most application developers cannot implement a speech engine themselves and instead integrate a third-party speech engine to convert speech into text. Because each speech engine excels at different recognition domains, languages, and dialects, it is difficult to find a single engine that meets every need. As a result, the speech functions of smart televisions lack flexibility and the user experience is poor.
Summary of the invention
The main object of the present invention is to provide a speech recognition method and system, intended to solve the technical problem that the speech recognition function of existing televisions lacks flexibility.
To achieve the above object, the present invention provides a speech recognition method comprising the steps of:
when a voice stream transmitted by a voice input device connected to a television is received, recognizing the voice stream with a local speech recognition component and a third-party speech recognition component;
when the local speech recognition component successfully recognizes the voice stream, determining the semantics of the voice stream with a first speech engine corresponding to the local speech recognition component, recording the semantics recognized by the first speech engine as the first semantics, and performing the corresponding operation according to the first semantics;
when the local speech recognition component fails to recognize the voice stream but the third-party speech recognition component succeeds, determining the semantics of the voice stream with a second speech engine corresponding to the third-party speech recognition component, recording the semantics recognized by the second speech engine as the second semantics, and performing the corresponding operation according to the second semantics.
Preferably, the first speech engine includes a language-switch vocabulary, a remote-control function vocabulary, and a preset scene vocabulary; the second speech engine includes one default speech engine and a plurality of other speech engines.
Preferably, the step of recognizing the semantics of the voice stream with the second speech engine corresponding to the third-party speech recognition component includes:
recognizing the semantics of the voice stream with the default speech engine in the second speech engine;
when the default speech engine fails to recognize the semantics of the voice stream, determining the priority of the other speech engines in the second speech engine;
recognizing the semantics of the voice stream with the other speech engines in turn, from the highest priority to the lowest.
Preferably, the step of determining the priority of the other speech engines in the second speech engine includes:
obtaining the number of times each of the other speech engines in the second speech engine was used within a preset time;
sorting the usage counts in descending order to obtain a sorting result;
determining the priority of the other speech engines according to the sorting result.
Preferably, after the step of recognizing the voice stream with the local speech recognition component and the third-party speech recognition component when a voice stream transmitted by a voice input device connected to the television is received, the method further includes:
when both the local speech recognition component and the third-party speech recognition component fail to recognize the voice stream, outputting a prompt message to inform the user that recognition of the voice stream has failed.
In addition, to achieve the above object, the present invention also provides a speech recognition system, the speech recognition system including:
a first recognition module, configured to recognize the voice stream with a local speech recognition component and a third-party speech recognition component when a voice stream transmitted by a voice input device connected to a television is received;
a second recognition module, configured to: when the local speech recognition component successfully recognizes the voice stream, determine the semantics of the voice stream with a first speech engine corresponding to the local speech recognition component, record the semantics recognized by the first speech engine as the first semantics, and perform the corresponding operation according to the first semantics; and when the local speech recognition component fails to recognize the voice stream but the third-party speech recognition component succeeds, determine the semantics of the voice stream with a second speech engine corresponding to the third-party speech recognition component, record the semantics recognized by the second speech engine as the second semantics, and perform the corresponding operation according to the second semantics.
Preferably, the first speech engine includes a language-switch vocabulary, a remote-control function vocabulary, and a preset scene vocabulary; the second speech engine includes one default speech engine and a plurality of other speech engines.
Preferably, the second recognition module includes:
a recognition unit, configured to recognize the semantics of the voice stream with the default speech engine in the second speech engine;
a determining unit, configured to determine the priority of the other speech engines in the second speech engine when the default speech engine fails to recognize the semantics of the voice stream;
the recognition unit being further configured to recognize the semantics of the voice stream with the other speech engines in turn, from the highest priority to the lowest.
Preferably, the determining unit is further configured to obtain the number of times each of the other speech engines in the second speech engine was used within a preset time, sort the usage counts in descending order to obtain a sorting result, and determine the priority of the other speech engines according to the sorting result.
Preferably, the speech recognition system further includes an output module, configured to output a prompt message informing the user that recognition of the voice stream has failed when both the local speech recognition component and the third-party speech recognition component fail to recognize the voice stream.
In the present invention, when a voice stream transmitted by a voice input device connected to a television is received, the voice stream is recognized with a local speech recognition component and a third-party speech recognition component. When the local speech recognition component successfully recognizes the voice stream, the semantics of the voice stream are determined with a first speech engine corresponding to the local speech recognition component, the recognized semantics are recorded as the first semantics, and the corresponding operation is performed according to the first semantics. When the local speech recognition component fails to recognize the voice stream but the third-party speech recognition component succeeds, the semantics are determined with a second speech engine corresponding to the third-party speech recognition component, the recognized semantics are recorded as the second semantics, and the corresponding operation is performed according to the second semantics. Thus, in addition to the local first speech engine, the television integrates multiple third-party speech engines, improving the success rate and flexibility of television speech recognition.
Brief description of the drawings
Fig. 1 is a flow diagram of a preferred embodiment of the speech recognition method of the present invention;
Fig. 2 is a flow diagram of recognizing the semantics of the voice stream with the second speech engine corresponding to the third-party speech recognition component in an embodiment of the present invention;
Fig. 3 is a functional block diagram of a preferred embodiment of the speech recognition system of the present invention;
Fig. 4 is a functional block diagram of the second recognition module in an embodiment of the present invention.
The realization of the objects, functional characteristics, and advantages of the present invention will be further described with reference to the accompanying drawings in conjunction with the embodiments.
Specific embodiment
It should be appreciated that the specific embodiments described herein are merely illustrative of the present invention and are not intended to limit it.
The present invention provides a speech recognition method.
Referring to Fig. 1, Fig. 1 is a flow diagram of a preferred embodiment of the speech recognition method of the present invention.
In the present embodiment, the speech recognition method includes:
Step S10: when a voice stream transmitted by a voice input device connected to the television is received, recognize the voice stream with a local speech recognition component and a third-party speech recognition component.
In the present embodiment, the television is connected to a voice input device that is independent of the television. In other embodiments, the voice input device may also be built into the television and connected to the television's CPU (Central Processing Unit). The voice input device includes, but is not limited to, a microphone and a Bluetooth headset.
After the television is powered on, when the user wants to control it, the user sends a voice stream to the television through the connected voice input device. When the television receives the voice stream transmitted by the voice input device, it sends the voice stream to the local speech recognition component and the third-party speech recognition component, which then recognize the voice stream.
It can be understood that the local speech recognition component is built into the television, and the television calls the third-party speech recognition component through an SDK (Software Development Kit). Further, in the present embodiment, to speed up recognition of the voice stream, the television sends the voice stream to the local speech recognition component and the third-party speech recognition component simultaneously upon receipt. In other embodiments, the television may first send the voice stream to the local speech recognition component and, only when the local component fails to recognize it, forward the voice stream to the third-party speech recognition component.
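The simultaneous dispatch described above can be sketched in Python (a minimal illustration, not part of the disclosure; the recognizer objects and their `recognize` method are hypothetical stand-ins for the local and third-party components):

```python
import concurrent.futures

def dispatch_voice_stream(voice_stream, local_component, third_party_component):
    """Send the received voice stream to both recognition components at
    the same time, as in the present embodiment, and collect both results."""
    with concurrent.futures.ThreadPoolExecutor(max_workers=2) as pool:
        local_future = pool.submit(local_component.recognize, voice_stream)
        third_future = pool.submit(third_party_component.recognize, voice_stream)
        # Each result indicates whether that component recognized the stream.
        return local_future.result(), third_future.result()
```

The sequential variant mentioned for other embodiments would instead call the local component first and submit the stream to the third-party component only on failure.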
Step S20: when the local speech recognition component successfully recognizes the voice stream, determine the semantics of the voice stream with the first speech engine corresponding to the local speech recognition component, record the recognized semantics as the first semantics, and perform the corresponding operation according to the first semantics.
When the television determines that the local speech recognition component has successfully recognized the voice stream — that is, when a speech engine corresponding to the local speech recognition component can be found to determine the semantics of the voice stream — the television switches the default system speech engine to the first speech engine, determines the semantics of the voice stream with the first speech engine corresponding to the local speech recognition component, records the recognized semantics as the first semantics, and performs the corresponding operation according to the first semantics. It should be noted that, in the present embodiment, the speech engine active when the television has just been powered on is the default system speech engine; in other embodiments, it may instead be the speech engine in use when the television was last shut down. For example, when the television recognizes the semantics of the voice stream as "change channel" through the first speech engine, it performs a channel-switching operation; when it recognizes the semantics as "set the volume to 25", it adjusts the current volume to 25.
It should be noted that a number of common speech engines corresponding to the local speech recognition component are preset and collectively recorded as the first speech engine.
Further, when the television cannot determine from the first semantics the specific operation to perform, it may output a prompt asking the user to re-enter a voice stream through the voice input device; the semantics of the re-entered voice stream should allow the television to determine the specific operation. When the television receives the re-entered voice stream, it determines the corresponding semantics directly through the first speech engine. For example, if the semantics of the user's voice stream are "change channel", the television cannot determine which channel to switch to, and may prompt the user to specify the channel through the voice input device. When the user then inputs a voice stream such as "switch to CCTV-1", the television directly determines through the first speech engine that the first semantics correspond to "CCTV-1" and switches the channel to CCTV-1 accordingly.
Further, when the television cannot determine the specific operation from the first semantics, it may instead determine the most frequently used object matching the first semantics and perform the corresponding operation on that object. For example, when the television cannot determine from "change channel" which channel the user wants, it determines the channel the user watches most frequently and switches to that channel.
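The most-frequently-watched fallback can be illustrated with a short sketch (hypothetical: the disclosure does not specify how viewing frequency is stored; a plain list of watched channel names is assumed here):

```python
from collections import Counter

def pick_most_watched_channel(watch_history):
    """Return the channel the user watches most often, used as the target
    when 'change channel' names no specific channel; None with no history."""
    if not watch_history:
        return None
    return Counter(watch_history).most_common(1)[0][0]
```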
Step S30: when the local speech recognition component fails to recognize the voice stream but the third-party speech recognition component succeeds, determine the semantics of the voice stream with the second speech engine corresponding to the third-party speech recognition component, record the recognized semantics as the second semantics, and perform the corresponding operation according to the second semantics.
When the television determines that the local speech recognition component has failed to recognize the voice stream but the third-party speech recognition component has succeeded — that is, when a speech engine corresponding to the third-party speech recognition component can be found to determine the semantics of the voice stream — the television determines the second speech engine corresponding to the third-party speech recognition component, determines the semantics of the voice stream with the second speech engine, records the recognized semantics as the second semantics, and performs the corresponding operation according to the second semantics.
When the television cannot determine from the semantics recognized by the second speech engine the specific operation to perform, it proceeds in the same way as when the first semantics are ambiguous, which is not repeated here.
The third-party speech recognition component includes multiple speech engines. In the present embodiment, the speech engines included in the third-party speech recognition component are collectively recorded as the second speech engine.
Further, the speech recognition method also includes:
Step a: when both the local speech recognition component and the third-party speech recognition component fail to recognize the voice stream, output a prompt message informing the user that recognition of the voice stream has failed.
Specifically, when the television determines that both the local speech recognition component and the third-party speech recognition component have failed to recognize the voice stream, it outputs a prompt message informing the user of the failure. The ways the television may inform the user include, but are not limited to, displaying a text prompt on the screen, playing a voice prompt through the television's built-in audio output device, or signaling with an indicator light.
Further, in the present embodiment, the first speech engine includes a language-switch vocabulary, a remote-control function vocabulary, and a preset scene vocabulary; in other embodiments, it may also include other vocabularies. For example, suppose "hello" is set as a language-switch word in the first speech engine. When the first speech engine determines that the semantics of the received voice stream are "hello" in Chinese, the television determines the semantics of subsequently received voice streams through the speech engine corresponding to Chinese; when the semantics are "hello" in English, subsequent voice streams are handled by the speech engine corresponding to English; and when the semantics are "hello" in Cantonese, subsequent voice streams are handled by the speech engine corresponding to Cantonese. The remote-control function vocabulary contains words corresponding to common remote-control functions, such as increasing the volume, decreasing the volume, and switching channels. The preset scene vocabulary contains words corresponding to scenarios in which users commonly use the television, such as weather or shopping. For example, when the first speech engine determines that the semantics of a received voice stream are "shopping", the television determines the semantics of subsequent voice streams through the speech engine corresponding to shopping. It should be noted that the television stores, in the first speech engine, a mapping table of the semantics corresponding to various voice streams, obtained through prior training.
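The three vocabularies of the first speech engine can be modeled as lookup tables (a sketch only; the phrases and romanizations below are illustrative placeholders, not the trained mapping table the description refers to):

```python
# Illustrative placeholder tables; the disclosure stores a trained
# semantics mapping table inside the first speech engine.
LANGUAGE_SWITCH = {"nihao": "zh", "hello": "en", "neihou": "yue"}
REMOTE_FUNCTIONS = {"volume up", "volume down", "change channel"}
SCENE_WORDS = {"weather", "shopping"}

def route_phrase(phrase, state):
    """Classify a recognized phrase and record which engine should handle
    subsequently received voice streams."""
    if phrase in LANGUAGE_SWITCH:
        state["language"] = LANGUAGE_SWITCH[phrase]  # later streams use this engine
        return "language-switch"
    if phrase in REMOTE_FUNCTIONS:
        return "remote-function"
    if phrase in SCENE_WORDS:
        state["scene"] = phrase  # later streams use the scene engine
        return "scene"
    return "unrecognized"
```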
The second speech engine includes one default speech engine and several other speech engines. The default speech engine is the most commonly used speech engine in the third-party speech recognition component and can be set by the user as needed; for example, the Baidu speech engine may be set as the default speech engine. The other speech engines include, but are not limited to, the Alibaba speech engine and the iFlytek speech engine.
Further, when the local speech recognition component fails to recognize the voice stream, or the first speech engine fails to recognize its semantics, the television may output a prompt message informing the user that recognition of the voice stream has failed. Alternatively, when the local speech recognition component fails to recognize the voice stream, the television recognizes it through the third-party speech recognition component; and when the first speech engine fails to recognize the semantics of the voice stream, the television recognizes them through the second speech engine. Only when the third-party speech recognition component fails to recognize the voice stream, or the second speech engine fails to recognize its semantics, does the television output a prompt message informing the user that recognition of the voice stream has failed.
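Taken together, the failure handling reads as a cascade; a minimal sketch follows (all objects and method names are hypothetical stand-ins for the components and engines of the disclosure):

```python
def handle_voice_stream(stream, local, third_party, first_engine, second_engine, prompt):
    """Local component first, then the third-party component; prompt the
    user only when every recognition stage has failed."""
    if local.recognize(stream):
        return first_engine.parse(stream)   # the "first semantics"
    if third_party.recognize(stream):
        return second_engine.parse(stream)  # the "second semantics"
    prompt("voice stream recognition failed")
    return None
```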
In the present embodiment, when a voice stream transmitted by a voice input device connected to the television is received, the voice stream is recognized with a local speech recognition component and a third-party speech recognition component. When the local speech recognition component successfully recognizes the voice stream, the semantics of the voice stream are determined with the first speech engine corresponding to the local speech recognition component, recorded as the first semantics, and the corresponding operation is performed; when the local speech recognition component fails but the third-party speech recognition component succeeds, the semantics are determined with the second speech engine corresponding to the third-party speech recognition component, recorded as the second semantics, and the corresponding operation is performed. Thus, in addition to the local first speech engine, the television integrates multiple third-party speech engines, improving the success rate and flexibility of television speech recognition.
Further, based on the first embodiment of the speech recognition method of the present invention, a second embodiment of the present invention is proposed. Referring to Fig. 2, in the present embodiment, the step of recognizing the semantics of the voice stream with the second speech engine corresponding to the third-party speech recognition component includes:
Step S31: recognize the semantics of the voice stream with the default speech engine in the second speech engine;
Step S32: when the default speech engine fails to recognize the semantics of the voice stream, determine the priority of the other speech engines in the second speech engine;
Step S33: recognize the semantics of the voice stream with the other speech engines in turn, from the highest priority to the lowest.
In the present embodiment, the second speech engine includes one default speech engine and several other speech engines. In the process of recognizing the semantics of the voice stream with the second speech engine corresponding to the third-party speech recognition component, the television first recognizes the semantics with the default speech engine in the second speech engine. When the default speech engine successfully recognizes the semantics, the television performs the corresponding operation. When the default speech engine fails to recognize the semantics, the television determines the priority of the other speech engines in the second speech engine and recognizes the semantics of the voice stream with those engines in turn, from the highest priority to the lowest.
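The default-then-others order can be sketched as follows (hypothetical engine objects whose `parse` returns the recognized semantics, or `None` on failure):

```python
def recognize_semantics(stream, default_engine, other_engines, priorities):
    """Try the default speech engine first; on failure, try the other
    engines from the highest priority to the lowest."""
    semantics = default_engine.parse(stream)
    if semantics is not None:
        return semantics
    ordered = sorted(other_engines, key=lambda e: priorities[e.name], reverse=True)
    for engine in ordered:
        semantics = engine.parse(stream)
        if semantics is not None:
            return semantics
    return None  # every engine in the second speech engine failed
```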
Further, the step of determining the priority of the other speech engines in the second speech engine includes:
Step b: obtain the number of times each of the other speech engines in the second speech engine was used within a preset time;
Step c: sort the usage counts in descending order to obtain a sorting result;
Step d: determine the priority of the other speech engines according to the sorting result.
Specifically, the process of determining the priority of the other speech engines in the second speech engine is as follows: the television obtains the number of times each of the other speech engines was used within a preset time, sorts these usage counts in descending order to obtain a sorting result, and determines the priority of the other speech engines according to that result — that is, an engine ranked earlier has a higher priority than an engine ranked later.
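Determining priority from usage counts amounts to a descending sort (a sketch; how the counts are collected over the preset time window is not specified, so a name-to-count mapping is assumed):

```python
def engine_priority(usage_counts):
    """Order engine names by how often each was used within the preset
    time: more uses first, i.e. higher priority."""
    return sorted(usage_counts, key=usage_counts.get, reverse=True)
```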
In the present embodiment, in the process of recognizing the semantics of the voice stream with the second speech engine corresponding to the third-party speech recognition component, the default speech engine in the second speech engine is tried first; when it fails to recognize the semantics, the other speech engines are tried in order of priority. On the basis of improving the success rate of television speech recognition, this also improves its efficiency.
The present invention further provides a speech recognition system.
Referring to Fig. 3, Fig. 3 is a functional block diagram of a preferred embodiment of the speech recognition system of the present invention.
It should be emphasized that, as will be apparent to those skilled in the art, the module diagram shown in Fig. 3 is merely an exemplary diagram of a preferred embodiment; those skilled in the art can easily supplement it with new modules around the modules of the speech recognition system shown in Fig. 3. The names of the modules are self-defined and serve only to aid understanding of the program function blocks of the speech recognition system; they do not limit the technical solution of the present invention, the core of which lies in the functions to be achieved by the modules, whatever their names.
In this embodiment, the speech recognition system includes:
a first identification module 10, configured to recognize, when a voice stream transmitted by a voice input device connected to a television is received, the voice stream through a local speech recognition component and a third-party speech recognition component.
In this embodiment, the television is connected to a voice input device that is independent of the television. In other embodiments, the voice input device may also be built into the television and connected to the television's CPU (Central Processing Unit). The voice input device includes, but is not limited to, a microphone or a Bluetooth headset.
After the television is powered on, when the user wants to control the television, the user sends a voice stream to the television through the voice input device connected to it. When the television receives the voice stream transmitted by the voice input device, it sends the voice stream to the local speech recognition component and the third-party speech recognition component, which then recognize the voice stream.
It should be understood that the local speech recognition component is built into the television, while the third-party speech recognition component is invoked by the television through an SDK (Software Development Kit). Further, in this embodiment, to improve recognition speed, the television sends the voice stream to the local speech recognition component and the third-party speech recognition component simultaneously upon receiving it. In other embodiments, the television may first send the voice stream to the local speech recognition component and, only when the local component fails to recognize it, send the voice stream to the third-party speech recognition component.
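The simultaneous dispatch described in this embodiment can be sketched as two concurrent recognition calls. This is a minimal illustration under stated assumptions: the component objects and their `recognize` method are hypothetical, not defined by the patent.

```python
from concurrent.futures import ThreadPoolExecutor

def dispatch(voice_stream, local_component, third_party_component):
    """Send the voice stream to both recognizers at once and collect both
    results, so neither component waits for the other to finish."""
    with ThreadPoolExecutor(max_workers=2) as pool:
        local_future = pool.submit(local_component.recognize, voice_stream)
        remote_future = pool.submit(third_party_component.recognize, voice_stream)
        return local_future.result(), remote_future.result()
```

The sequential variant mentioned for other embodiments would simply call the local component first and invoke the third-party component only when the local result is empty.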
a second identification module 20, configured to recognize, when the local speech recognition component successfully recognizes the voice stream, the semantics of the voice stream through a first speech engine corresponding to the local speech recognition component, record the semantics recognized by the first speech engine as first semantics, and perform a corresponding operation according to the first semantics.
When the television determines that the local speech recognition component has successfully recognized the voice stream — that is, when the speech engine corresponding to the local speech recognition component can be used to recognize the semantics of the voice stream — the television switches from the default system speech engine to the first speech engine, recognizes the semantics of the voice stream through the first speech engine, records the recognized semantics as first semantics, and performs a corresponding operation according to the first semantics. It should be noted that, in this embodiment, when the television has just been powered on, its current speech engine is the default system speech engine; in other embodiments, the speech engine current at power-on may instead be the speech engine in use when the television was last shut down. For example, when the semantics of the voice stream recognized through the first speech engine is "change channel", the television performs a channel-change operation; when the semantics recognized through the first speech engine is "set the volume to 25", the television adjusts the current volume to 25.
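Performing an operation from the recognized semantics amounts to mapping a semantic string to a device action. A minimal sketch follows; the command strings, the `tv` object, and its method names are illustrative assumptions, not part of the patent:

```python
import re

def execute(semantics, tv):
    """Map a recognized semantic string to a television action (illustrative)."""
    if semantics == "change channel":
        tv.next_channel()
        return
    # e.g. "set the volume to 25" -> tv.set_volume(25)
    m = re.match(r"set the volume to (\d+)", semantics)
    if m:
        tv.set_volume(int(m.group(1)))
        return
    tv.prompt("unrecognized command")  # fall back to prompting the user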
It should be noted that various commonly used speech engines corresponding to the local speech recognition component are preset in advance and are collectively recorded as the first speech engine.
Further, when the television cannot determine the specific operation to be performed from the first semantics, the television may output a prompt asking the user to re-enter a voice stream through the voice input device; the semantics of the re-entered voice stream should allow the television to determine the specific operation to be performed. When the television receives the re-entered voice stream, it directly determines the corresponding semantics through the first speech engine. For example, when the semantics of the user's voice stream is "change channel", the television cannot determine which channel to switch to; it can then prompt the user to enter, through the voice input device, a voice stream specifying the channel. When the user enters a voice stream such as "switch to CCTV-1", the television directly determines the first semantics corresponding to that voice stream through the first speech engine and switches the channel to CCTV-1 according to the first semantics.
Further, when the television cannot determine the specific operation to be performed from the first semantics, the television may instead determine the most frequently used object matching the first semantics and perform the corresponding operation on that object. For example, when the television cannot determine from "change channel" which channel the user wants, the television determines the channel the user watches most frequently and switches to that channel.
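The frequency-based fallback reduces to picking the candidate with the highest historical count. A minimal sketch, assuming a hypothetical viewing-history mapping from channel name to watch count:

```python
def most_frequent_object(candidates, view_counts):
    """Return the candidate the user has used most often.

    Candidates absent from the history count as zero; ties are broken
    by whichever qualifying candidate appears first in the list."""
    return max(candidates, key=lambda c: view_counts.get(c, 0))
```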
The second identification module 20 is further configured to recognize, when the local speech recognition component fails to recognize the voice stream but the third-party speech recognition component successfully recognizes it, the semantics of the voice stream through a second speech engine corresponding to the third-party speech recognition component, record the semantics recognized by the second speech engine as second semantics, and perform a corresponding operation according to the second semantics.
When the television determines that the local speech recognition component has failed to recognize the voice stream but the third-party speech recognition component has succeeded — that is, when a speech engine corresponding to the third-party speech recognition component can be used to recognize the semantics of the voice stream — the television determines the second speech engine corresponding to the third-party speech recognition component, recognizes the semantics of the voice stream through the second speech engine, records the recognized semantics as second semantics, and performs a corresponding operation according to the second semantics.
When the television cannot determine the specific operation to be performed from the semantics determined by the second speech engine, the handling is similar to that described above for the first speech engine and is not repeated here.
The third-party speech recognition component includes multiple speech engines. In this embodiment, the multiple speech engines included in the third-party speech recognition component are collectively recorded as the second speech engine.
Further, the speech recognition system also includes:
an output module, configured to output, when both the local speech recognition component and the third-party speech recognition component fail to recognize the voice stream, a prompt message informing the user that recognition of the voice stream has failed.
Further, when the television determines that both the local speech recognition component and the third-party speech recognition component have failed to recognize the voice stream, the television outputs a prompt message informing the user of the failure. The ways in which the television may prompt the user include, but are not limited to, displaying text on the screen, outputting a voice prompt through the television's built-in audio output device, or illuminating an indicator light.
Further, in this embodiment, the first speech engine includes a language-switching vocabulary, a remote-control function vocabulary, and a preset scene vocabulary; in other embodiments, the first speech engine may also include other vocabularies. For example, "hello" may be set as a language-switching word in the first speech engine: when the first speech engine determines that the semantics of the received voice stream is "hello" spoken in Mandarin, the television determines the semantics of subsequently received voice streams through the speech engine corresponding to Mandarin; when the first speech engine determines that the semantics is "hello" spoken in English, the television uses the speech engine corresponding to English for subsequent voice streams; and when the first speech engine determines that the semantics is "hello" spoken in Cantonese, the television uses the speech engine corresponding to Cantonese. The remote-control function vocabulary contains words corresponding to common remote-control functions, such as increasing the volume, decreasing the volume, and changing the channel. The preset scene vocabulary contains words corresponding to scenes users commonly associate with television, such as weather or shopping; for example, when the first speech engine determines that the semantics of a received voice stream is "shopping", the television determines the semantics of subsequent voice streams through the speech engine corresponding to shopping. It should be noted that the television stores, in the first speech engine, a mapping table of the semantics corresponding to various voice streams, obtained through prior training.
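The language-switching behavior — detecting the language in which the switch word was spoken and routing subsequent voice streams to that language's engine — can be sketched as follows. This is a minimal illustration; the engine registry, the `recognize` method, and the idea that the first engine reports a `(semantics, language)` pair are all assumptions, not details from the patent:

```python
class EngineRouter:
    """After the language-switch word is heard, route subsequent voice
    streams to the speech engine for the language it was spoken in."""
    SWITCH_WORD = "hello"

    def __init__(self, engines, default_language):
        self.engines = engines          # mapping: language name -> engine
        self.current = default_language

    def handle(self, semantics, language):
        # `semantics` and `language` are assumed outputs of the first engine.
        if semantics == self.SWITCH_WORD and language in self.engines:
            self.current = language     # switch engines for later streams
            return None
        return self.engines[self.current].recognize(semantics)
```

The same routing structure would apply to the preset scene vocabulary, with scene words ("shopping") in place of language names.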
The second speech engine includes one default speech engine and multiple other speech engines. The default speech engine is the most commonly used speech engine in the third-party speech recognition component and can be set by the user as needed; for example, the Baidu speech engine may be set as the default speech engine. The other speech engines include, but are not limited to, the Alibaba speech engine and the iFlytek speech engine.
Further, when the local speech recognition component fails to recognize the voice stream, or when the first speech engine fails to recognize the semantics of the voice stream, the television outputs a prompt message informing the user that recognition of the voice stream has failed. Alternatively, when the local speech recognition component fails to recognize the voice stream, the television recognizes the voice stream through the third-party speech recognition component instead; and when the first speech engine fails to recognize the semantics of the voice stream, the television recognizes the semantics through the second speech engine instead. In this alternative, the television outputs the failure prompt only when the third-party speech recognition component fails to recognize the voice stream, or when the second speech engine fails to recognize its semantics.
In this embodiment, when a voice stream transmitted by a voice input device connected to the television is received, the voice stream is recognized through a local speech recognition component and a third-party speech recognition component. When the local speech recognition component successfully recognizes the voice stream, the semantics of the voice stream are recognized through a first speech engine corresponding to the local speech recognition component, recorded as first semantics, and a corresponding operation is performed according to the first semantics. When the local speech recognition component fails to recognize the voice stream but the third-party speech recognition component succeeds, the semantics of the voice stream are recognized through a second speech engine corresponding to the third-party speech recognition component, recorded as second semantics, and a corresponding operation is performed according to the second semantics. The television thus integrates not only the local first speech engine but also multiple third-party speech engines, improving the success rate and flexibility of television speech recognition.
Further, based on the preferred embodiment of the speech recognition system of the present invention, a second embodiment of the present invention is proposed. Referring to Fig. 4, in this embodiment the second identification module 20 includes:
a recognition unit 21, configured to recognize the semantics of the voice stream through the default speech engine in the second speech engine; and
a determining unit 22, configured to determine, when the default speech engine fails to recognize the semantics of the voice stream, the priority of the other speech engines in the second speech engine;
the recognition unit 21 being further configured to recognize the semantics of the voice stream through the other speech engines, in order from the highest priority to the lowest.
In this embodiment, the second speech engine includes one default speech engine and multiple other speech engines. In the process of recognizing the semantics of the voice stream through the second speech engine corresponding to the third-party speech recognition component, the television first recognizes the semantics of the voice stream through the default speech engine in the second speech engine. When the default speech engine successfully recognizes the semantics of the voice stream, the television performs the corresponding operation according to those semantics. When the default speech engine fails to recognize the semantics of the voice stream, the television determines the priority of the other speech engines in the second speech engine and recognizes the semantics of the voice stream through the other speech engines, in order from the highest priority to the lowest.
Further, the determining unit 22 is also configured to obtain the number of times each of the other speech engines in the second speech engine has been used within a preset time period, sort these usage counts in descending order to obtain a ranking result, and determine the priority of the other speech engines according to the ranking result.
Specifically, the process of determining the priority of the other speech engines in the second speech engine is as follows: the television obtains the number of times each of the other speech engines in the second speech engine has been used within a preset time period and sorts these usage counts in descending order to obtain a ranking result. The television then determines the priority of the other speech engines according to the ranking result; that is, a speech engine ranked higher in the list is given a higher priority than a speech engine ranked lower.
In this embodiment, when recognizing the semantics of the voice stream through the second speech engine corresponding to the third-party speech recognition component, the semantics of the voice stream are first recognized by the default speech engine in the second speech engine; when the default speech engine fails to recognize the semantics of the voice stream, the semantics are recognized by the other speech engines according to their priority. This improves the efficiency of television speech recognition while also improving its success rate.
It should be noted that, as used herein, the terms "include", "comprise", and any variants thereof are intended to cover non-exclusive inclusion, so that a process, method, article, or system that includes a series of elements includes not only those elements but also other elements not expressly listed, or elements inherent to such a process, method, article, or system. Absent further limitation, an element qualified by the phrase "including a ..." does not exclude the presence of other identical elements in the process, method, article, or system that includes that element.
The above embodiment numbers of the present invention are for description only and do not represent the relative merits of the embodiments.
Through the above description of the embodiments, those skilled in the art will clearly understand that the methods of the above embodiments can be implemented by software plus a necessary general-purpose hardware platform, and of course also by hardware alone, although in many cases the former is the better implementation. Based on this understanding, the technical solution of the present invention — in essence, the part that contributes beyond the prior art — can be embodied in the form of a software product stored in a storage medium (such as ROM/RAM, a magnetic disk, or an optical disc) and including instructions for causing a terminal device (which may be a mobile phone, a computer, a server, an air conditioner, a network device, or the like) to perform the methods described in the embodiments of the present invention.
The above are only preferred embodiments of the present invention and do not limit its scope of protection. Any equivalent structural or process transformation made using the contents of the specification and drawings of the present invention, whether applied directly or indirectly in other related technical fields, is likewise included within the scope of patent protection of the present invention.
Claims (10)
1. A speech recognition method, characterized in that the speech recognition method comprises the following steps:
when a voice stream transmitted by a voice input device connected to a television is received, recognizing the voice stream through a local speech recognition component and a third-party speech recognition component;
when the local speech recognition component successfully recognizes the voice stream, recognizing the semantics of the voice stream through a first speech engine corresponding to the local speech recognition component, recording the semantics recognized by the first speech engine as first semantics, and performing a corresponding operation according to the first semantics; and
when the local speech recognition component fails to recognize the voice stream but the third-party speech recognition component successfully recognizes the voice stream, recognizing the semantics of the voice stream through a second speech engine corresponding to the third-party speech recognition component, recording the semantics recognized by the second speech engine as second semantics, and performing a corresponding operation according to the second semantics.
2. The speech recognition method according to claim 1, characterized in that the first speech engine includes a language-switching vocabulary, a remote-control function vocabulary, and a preset scene vocabulary; and
the second speech engine includes one default speech engine and multiple other speech engines.
3. The speech recognition method according to claim 2, characterized in that the step of recognizing the semantics of the voice stream through the second speech engine corresponding to the third-party speech recognition component comprises:
recognizing the semantics of the voice stream through the default speech engine in the second speech engine;
when the default speech engine fails to recognize the semantics of the voice stream, determining the priority of the other speech engines in the second speech engine; and
recognizing the semantics of the voice stream through the other speech engines, in order from the highest priority to the lowest.
4. The speech recognition method according to claim 3, characterized in that the step of determining the priority of the other speech engines in the second speech engine comprises:
obtaining the number of times each of the other speech engines in the second speech engine has been used within a preset time period;
sorting the usage counts in descending order to obtain a ranking result; and
determining the priority of the other speech engines according to the ranking result.
5. The speech recognition method according to any one of claims 1 to 4, characterized in that, after the step of recognizing the voice stream through the local speech recognition component and the third-party speech recognition component when the voice stream transmitted by the voice input device connected to the television is received, the method further comprises:
when both the local speech recognition component and the third-party speech recognition component fail to recognize the voice stream, outputting a prompt message informing the user that recognition of the voice stream has failed.
6. A speech recognition system, characterized in that the speech recognition system comprises:
a first identification module, configured to recognize, when a voice stream transmitted by a voice input device connected to a television is received, the voice stream through a local speech recognition component and a third-party speech recognition component; and
a second identification module, configured to: when the local speech recognition component successfully recognizes the voice stream, recognize the semantics of the voice stream through a first speech engine corresponding to the local speech recognition component, record the semantics recognized by the first speech engine as first semantics, and perform a corresponding operation according to the first semantics; and when the local speech recognition component fails to recognize the voice stream but the third-party speech recognition component successfully recognizes the voice stream, recognize the semantics of the voice stream through a second speech engine corresponding to the third-party speech recognition component, record the semantics recognized by the second speech engine as second semantics, and perform a corresponding operation according to the second semantics.
7. The speech recognition system according to claim 6, characterized in that the first speech engine includes a language-switching vocabulary, a remote-control function vocabulary, and a preset scene vocabulary; and
the second speech engine includes one default speech engine and multiple other speech engines.
8. The speech recognition system according to claim 7, characterized in that the second identification module comprises:
a recognition unit, configured to recognize the semantics of the voice stream through the default speech engine in the second speech engine; and
a determining unit, configured to determine, when the default speech engine fails to recognize the semantics of the voice stream, the priority of the other speech engines in the second speech engine;
wherein the recognition unit is further configured to recognize the semantics of the voice stream through the other speech engines, in order from the highest priority to the lowest.
9. The speech recognition system according to claim 8, characterized in that the determining unit is further configured to obtain the number of times each of the other speech engines in the second speech engine has been used within a preset time period, sort the usage counts in descending order to obtain a ranking result, and determine the priority of the other speech engines according to the ranking result.
10. The speech recognition system according to any one of claims 6 to 9, characterized in that the speech recognition system further comprises an output module, configured to output, when both the local speech recognition component and the third-party speech recognition component fail to recognize the voice stream, a prompt message informing the user that recognition of the voice stream has failed.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201611135597.6A CN106782561A (en) | 2016-12-09 | 2016-12-09 | Audio recognition method and system |
Publications (1)
Publication Number | Publication Date |
---|---|
CN106782561A true CN106782561A (en) | 2017-05-31 |
Family
ID=58879830
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201611135597.6A Pending CN106782561A (en) | 2016-12-09 | 2016-12-09 | Audio recognition method and system |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN106782561A (en) |
Cited By (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN107526512A (en) * | 2017-08-31 | 2017-12-29 | 联想(北京)有限公司 | Switching method and system for electronic equipment |
CN107945797A (en) * | 2017-12-07 | 2018-04-20 | 携程旅游信息技术(上海)有限公司 | Monitoring system based on speech recognition |
CN109410926A (en) * | 2018-11-27 | 2019-03-01 | 恒大法拉第未来智能汽车(广东)有限公司 | Voice method for recognizing semantics and system |
CN109859755A (en) * | 2019-03-13 | 2019-06-07 | 深圳市同行者科技有限公司 | A kind of audio recognition method, storage medium and terminal |
CN110223672A (en) * | 2019-05-16 | 2019-09-10 | 九牧厨卫股份有限公司 | A kind of multilingual audio recognition method of off-line type |
US11527240B2 (en) | 2018-11-21 | 2022-12-13 | Industrial Technology Research Institute | Speech recognition system, speech recognition method and computer program product |
Citations (11)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN1512484A (en) * | 2002-12-27 | 2004-07-14 | 联想(北京)有限公司 | Organizing and identifying method for natural language |
CN203340238U (en) * | 2012-09-28 | 2013-12-11 | 三星电子株式会社 | Image processing device |
CN103811006A (en) * | 2012-11-06 | 2014-05-21 | 三星电子株式会社 | Method and apparatus for voice recognition |
CN105206275A (en) * | 2015-08-31 | 2015-12-30 | 小米科技有限责任公司 | Device control method, apparatus and terminal |
CN105393302A (en) * | 2013-07-17 | 2016-03-09 | 三星电子株式会社 | Multi-level speech recognition |
CN105513593A (en) * | 2015-11-24 | 2016-04-20 | 南京师范大学 | Intelligent human-computer interaction method drove by voice |
CN105825853A (en) * | 2015-01-07 | 2016-08-03 | 中兴通讯股份有限公司 | Speech recognition device speech switching method and speech recognition device speech switching device |
CN105869636A (en) * | 2016-03-29 | 2016-08-17 | 上海斐讯数据通信技术有限公司 | Speech recognition apparatus and method thereof, smart television set and control method thereof |
CN106101789A (en) * | 2016-07-06 | 2016-11-09 | 深圳Tcl数字技术有限公司 | The voice interactive method of terminal and device |
CN106162285A (en) * | 2016-08-17 | 2016-11-23 | 合肥申目电子科技有限公司 | Speech interactive Multi-functional TV remote control |
CN106205615A (en) * | 2016-08-26 | 2016-12-07 | 王峥嵘 | A kind of control method based on interactive voice and system |
- 2016-12-09: CN application CN201611135597.6A filed (published as CN106782561A, status: Pending)
Cited By (8)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN107526512A (en) * | 2017-08-31 | 2017-12-29 | 联想(北京)有限公司 | Switching method and system for electronic equipment |
CN107945797A (en) * | 2017-12-07 | 2018-04-20 | 携程旅游信息技术(上海)有限公司 | Monitoring system based on speech recognition |
CN107945797B (en) * | 2017-12-07 | 2021-12-31 | 携程旅游信息技术(上海)有限公司 | Monitoring system based on speech recognition |
US11527240B2 (en) | 2018-11-21 | 2022-12-13 | Industrial Technology Research Institute | Speech recognition system, speech recognition method and computer program product |
CN109410926A (en) * | 2018-11-27 | 2019-03-01 | 恒大法拉第未来智能汽车(广东)有限公司 | Voice method for recognizing semantics and system |
CN109859755A (en) * | 2019-03-13 | 2019-06-07 | 深圳市同行者科技有限公司 | A kind of audio recognition method, storage medium and terminal |
CN110223672A (en) * | 2019-05-16 | 2019-09-10 | 九牧厨卫股份有限公司 | A kind of multilingual audio recognition method of off-line type |
CN110223672B (en) * | 2019-05-16 | 2021-04-23 | 九牧厨卫股份有限公司 | Offline multi-language voice recognition method |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN106782561A (en) | Audio recognition method and system | |
CN108000526B (en) | Dialogue interaction method and system for intelligent robot | |
EP3039531B1 (en) | Display apparatus and controlling method thereof | |
KR102211595B1 (en) | Speech recognition apparatus and control method thereof | |
CN105161106A (en) | Voice control method of intelligent terminal, voice control device and television system | |
CN101576901B (en) | Method for generating search request and mobile communication equipment | |
CN109637548A (en) | Voice interactive method and device based on Application on Voiceprint Recognition | |
CN106796496A (en) | Display device and its operating method | |
CN105531758B (en) | Use the speech recognition of foreign words grammer | |
JP2016095383A (en) | Voice recognition client device and server-type voice recognition device | |
CN103491411A (en) | Method and device based on language recommending channels | |
CN107040452B (en) | Information processing method and device and computer readable storage medium | |
WO2014181524A1 (en) | Conversation processing system and program | |
WO2017128775A1 (en) | Voice control system, voice processing method and terminal device | |
CN106205615A (en) | A kind of control method based on interactive voice and system | |
KR20150093482A (en) | System for Speaker Diarization based Multilateral Automatic Speech Translation System and its operating Method, and Apparatus supporting the same | |
CN110704590B (en) | Method and apparatus for augmenting training samples | |
CN107155121B (en) | Voice control text display method and device | |
CN105898525A (en) | Method of searching videos in specific video database, and video terminal thereof | |
CN102316361A (en) | Audio-frequency / video-frequency on demand method based on natural speech recognition and system thereof | |
CN106027752A (en) | Self-adaption method and device for mobile terminal call background sounds | |
CN103546790A (en) | Language interaction method and language interaction system on basis of mobile terminal and interactive television | |
CN108806688A (en) | Sound control method, smart television, system and the storage medium of smart television | |
JP6625772B2 (en) | Search method and electronic device using the same | |
WO2023184942A1 (en) | Voice interaction method and apparatus and electric appliance |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
SE01 | Entry into force of request for substantive examination | ||
RJ01 | Rejection of invention patent application after publication |
Application publication date: 20170531 |
|
RJ01 | Rejection of invention patent application after publication |