CN106782561A - Audio recognition method and system - Google Patents
Audio recognition method and system
- Publication number
- CN106782561A CN106782561A CN201611135597.6A CN201611135597A CN106782561A CN 106782561 A CN106782561 A CN 106782561A CN 201611135597 A CN201611135597 A CN 201611135597A CN 106782561 A CN106782561 A CN 106782561A
- Authority
- CN
- China
- Prior art keywords
- speech
- voice
- engine
- voice stream
- semantics
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/28—Constructional details of speech recognition systems
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/30—Semantic analysis
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N21/00—Selective content distribution, e.g. interactive television or video on demand [VOD]
- H04N21/40—Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
- H04N21/41—Structure of client; Structure of client peripherals
- H04N21/422—Input-only peripherals, i.e. input devices connected to specially adapted client devices, e.g. global positioning system [GPS]
- H04N21/42203—Input-only peripherals, i.e. input devices connected to specially adapted client devices, e.g. global positioning system [GPS] sound input device, e.g. microphone
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N21/00—Selective content distribution, e.g. interactive television or video on demand [VOD]
- H04N21/40—Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
- H04N21/41—Structure of client; Structure of client peripherals
- H04N21/422—Input-only peripherals, i.e. input devices connected to specially adapted client devices, e.g. global positioning system [GPS]
- H04N21/42204—User interfaces specially adapted for controlling a client device through a remote control device; Remote control devices therefor
Landscapes
- Engineering & Computer Science (AREA)
- Multimedia (AREA)
- Physics & Mathematics (AREA)
- Audiology, Speech & Language Pathology (AREA)
- Computational Linguistics (AREA)
- Health & Medical Sciences (AREA)
- Theoretical Computer Science (AREA)
- Human Computer Interaction (AREA)
- Signal Processing (AREA)
- General Health & Medical Sciences (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Artificial Intelligence (AREA)
- Acoustics & Sound (AREA)
- Two-Way Televisions, Distribution Of Moving Picture Or The Like (AREA)
Abstract
The invention discloses a speech recognition method and system. The method comprises the steps of: when a voice stream is received, recognizing the voice stream with a local speech recognition component and a third-party speech recognition component; when the local speech recognition component successfully recognizes the voice stream, determining the semantics of the voice stream with a first speech engine corresponding to the local speech recognition component, recording the semantics recognized by the first speech engine as the first semantics, and performing the corresponding operation according to the first semantics; when the local speech recognition component fails to recognize the voice stream but the third-party speech recognition component succeeds, determining the semantics of the voice stream with a second speech engine corresponding to the third-party speech recognition component, recording the semantics recognized by the second speech engine as the second semantics, and performing the corresponding operation according to the second semantics. The invention improves the success rate and flexibility of television speech recognition.
Description
Technical field
The present invention relates to the field of televisions, and more particularly to a speech recognition method and system.
Background art
With the development of science and technology, smart televisions have become widespread, but interacting with them — text input, content search, and similar functions — still offers a poor experience. As speech recognition technology has matured, users can control a smart television through a speech recognition engine, performing operations such as switching channels and searching for or requesting the content they want to watch.
However, owing to technical barriers, most application developers cannot implement a speech engine themselves and instead integrate a third-party speech engine to convert speech into text. Because each speech engine excels at different recognition domains, languages, and dialects, it is difficult to find a single engine that meets every need. As a result, the speech functions of smart televisions lack flexibility and the user experience is poor.
Summary of the invention
The main object of the present invention is to provide a speech recognition method and system, intended to solve the technical problem that the speech recognition function of existing televisions lacks flexibility.
To achieve the above object, the present invention provides a speech recognition method comprising the steps of:
when a voice stream transmitted by a voice input device connected to a television is received, recognizing the voice stream with a local speech recognition component and a third-party speech recognition component;
when the local speech recognition component successfully recognizes the voice stream, determining the semantics of the voice stream with a first speech engine corresponding to the local speech recognition component, recording the semantics recognized by the first speech engine as the first semantics, and performing the corresponding operation according to the first semantics;
when the local speech recognition component fails to recognize the voice stream but the third-party speech recognition component succeeds, determining the semantics of the voice stream with a second speech engine corresponding to the third-party speech recognition component, recording the semantics recognized by the second speech engine as the second semantics, and performing the corresponding operation according to the second semantics.
Preferably, the first speech engine includes a language-switch vocabulary, a remote-control function vocabulary, and a preset scene vocabulary; the second speech engine includes one default speech engine and a plurality of other speech engines.
Preferably, the step of recognizing the semantics of the voice stream with the second speech engine corresponding to the third-party speech recognition component includes:
recognizing the semantics of the voice stream with the default speech engine in the second speech engine;
when the default speech engine fails to recognize the semantics of the voice stream, determining the priority of the other speech engines in the second speech engine;
recognizing the semantics of the voice stream with the other speech engines in turn, from the highest priority to the lowest.
Preferably, the step of determining the priority of the other speech engines in the second speech engine includes:
obtaining the number of times each of the other speech engines in the second speech engine was used within a preset time;
sorting the usage counts in descending order to obtain a sorting result;
determining the priority of the other speech engines according to the sorting result.
Preferably, after the step of recognizing the voice stream with the local speech recognition component and the third-party speech recognition component when a voice stream transmitted by a voice input device connected to the television is received, the method further includes:
when both the local speech recognition component and the third-party speech recognition component fail to recognize the voice stream, outputting a prompt message to inform the user that recognition of the voice stream has failed.
In addition, to achieve the above object, the present invention also provides a speech recognition system, the speech recognition system including:
a first recognition module, configured to recognize the voice stream with a local speech recognition component and a third-party speech recognition component when a voice stream transmitted by a voice input device connected to a television is received;
a second recognition module, configured to: when the local speech recognition component successfully recognizes the voice stream, determine the semantics of the voice stream with a first speech engine corresponding to the local speech recognition component, record the semantics recognized by the first speech engine as the first semantics, and perform the corresponding operation according to the first semantics; and when the local speech recognition component fails to recognize the voice stream but the third-party speech recognition component succeeds, determine the semantics of the voice stream with a second speech engine corresponding to the third-party speech recognition component, record the semantics recognized by the second speech engine as the second semantics, and perform the corresponding operation according to the second semantics.
Preferably, the first speech engine includes a language-switch vocabulary, a remote-control function vocabulary, and a preset scene vocabulary; the second speech engine includes one default speech engine and a plurality of other speech engines.
Preferably, the second recognition module includes:
a recognition unit, configured to recognize the semantics of the voice stream with the default speech engine in the second speech engine;
a determining unit, configured to determine the priority of the other speech engines in the second speech engine when the default speech engine fails to recognize the semantics of the voice stream;
the recognition unit being further configured to recognize the semantics of the voice stream with the other speech engines in turn, from the highest priority to the lowest.
Preferably, the determining unit is further configured to obtain the number of times each of the other speech engines in the second speech engine was used within a preset time, sort the usage counts in descending order to obtain a sorting result, and determine the priority of the other speech engines according to the sorting result.
Preferably, the speech recognition system further includes an output module, configured to output a prompt message informing the user that recognition of the voice stream has failed when both the local speech recognition component and the third-party speech recognition component fail to recognize the voice stream.
In the present invention, when a voice stream transmitted by a voice input device connected to a television is received, the voice stream is recognized with a local speech recognition component and a third-party speech recognition component. When the local speech recognition component successfully recognizes the voice stream, the semantics of the voice stream are determined with a first speech engine corresponding to the local speech recognition component, the recognized semantics are recorded as the first semantics, and the corresponding operation is performed according to the first semantics. When the local speech recognition component fails to recognize the voice stream but the third-party speech recognition component succeeds, the semantics are determined with a second speech engine corresponding to the third-party speech recognition component, the recognized semantics are recorded as the second semantics, and the corresponding operation is performed according to the second semantics. Thus, in addition to the local first speech engine, the television integrates multiple third-party speech engines, improving the success rate and flexibility of television speech recognition.
Brief description of the drawings
Fig. 1 is a flow diagram of a preferred embodiment of the speech recognition method of the present invention;
Fig. 2 is a flow diagram of recognizing the semantics of the voice stream with the second speech engine corresponding to the third-party speech recognition component in an embodiment of the present invention;
Fig. 3 is a functional block diagram of a preferred embodiment of the speech recognition system of the present invention;
Fig. 4 is a functional block diagram of the second recognition module in an embodiment of the present invention.
The realization of the objects, functional characteristics, and advantages of the present invention will be further described with reference to the accompanying drawings in conjunction with the embodiments.
Specific embodiment
It should be appreciated that the specific embodiments described herein are merely illustrative of the present invention and are not intended to limit it.
The present invention provides a speech recognition method.
Referring to Fig. 1, Fig. 1 is a flow diagram of a preferred embodiment of the speech recognition method of the present invention.
In the present embodiment, the speech recognition method includes:
Step S10: when a voice stream transmitted by a voice input device connected to the television is received, recognize the voice stream with a local speech recognition component and a third-party speech recognition component.
In the present embodiment, the television is connected to a voice input device that is independent of the television. In other embodiments, the voice input device may also be built into the television and connected to the television's CPU (Central Processing Unit). The voice input device includes, but is not limited to, a microphone and a Bluetooth headset.
After the television is powered on, when the user wants to control it, the user sends a voice stream to the television through the connected voice input device. When the television receives the voice stream transmitted by the voice input device, it sends the voice stream to the local speech recognition component and the third-party speech recognition component, which then recognize the voice stream.
It can be understood that the local speech recognition component is built into the television, and the television calls the third-party speech recognition component through an SDK (Software Development Kit). Further, in the present embodiment, to speed up recognition of the voice stream, the television sends the voice stream to the local speech recognition component and the third-party speech recognition component simultaneously upon receipt. In other embodiments, the television may first send the voice stream to the local speech recognition component and, only when the local component fails to recognize it, forward the voice stream to the third-party speech recognition component.
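The simultaneous dispatch described above can be sketched in Python (a minimal illustration, not part of the disclosure; the recognizer objects and their `recognize` method are hypothetical stand-ins for the local and third-party components):

```python
import concurrent.futures

def dispatch_voice_stream(voice_stream, local_component, third_party_component):
    """Send the received voice stream to both recognition components at
    the same time, as in the present embodiment, and collect both results."""
    with concurrent.futures.ThreadPoolExecutor(max_workers=2) as pool:
        local_future = pool.submit(local_component.recognize, voice_stream)
        third_future = pool.submit(third_party_component.recognize, voice_stream)
        # Each result indicates whether that component recognized the stream.
        return local_future.result(), third_future.result()
```

The sequential variant mentioned for other embodiments would instead call the local component first and submit the stream to the third-party component only on failure.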
Step S20: when the local speech recognition component successfully recognizes the voice stream, determine the semantics of the voice stream with the first speech engine corresponding to the local speech recognition component, record the recognized semantics as the first semantics, and perform the corresponding operation according to the first semantics.
When the television determines that the local speech recognition component has successfully recognized the voice stream — that is, when a speech engine corresponding to the local speech recognition component can be found to determine the semantics of the voice stream — the television switches the default system speech engine to the first speech engine, determines the semantics of the voice stream with the first speech engine corresponding to the local speech recognition component, records the recognized semantics as the first semantics, and performs the corresponding operation according to the first semantics. It should be noted that, in the present embodiment, the speech engine active when the television has just been powered on is the default system speech engine; in other embodiments, it may instead be the speech engine in use when the television was last shut down. For example, when the television recognizes the semantics of the voice stream as "change channel" through the first speech engine, it performs a channel-switching operation; when it recognizes the semantics as "set the volume to 25", it adjusts the current volume to 25.
It should be noted that a number of common speech engines corresponding to the local speech recognition component are preset and collectively recorded as the first speech engine.
Further, when the television cannot determine from the first semantics the specific operation to perform, it may output a prompt asking the user to re-enter a voice stream through the voice input device; the semantics of the re-entered voice stream should allow the television to determine the specific operation. When the television receives the re-entered voice stream, it determines the corresponding semantics directly through the first speech engine. For example, if the semantics of the user's voice stream are "change channel", the television cannot determine which channel to switch to, and may prompt the user to specify the channel through the voice input device. When the user then inputs a voice stream such as "switch to CCTV-1", the television directly determines through the first speech engine that the first semantics correspond to "CCTV-1" and switches the channel to CCTV-1 accordingly.
Further, when the television cannot determine the specific operation from the first semantics, it may instead determine the most frequently used object matching the first semantics and perform the corresponding operation on that object. For example, when the television cannot determine from "change channel" which channel the user wants, it determines the channel the user watches most frequently and switches to that channel.
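The most-frequently-watched fallback can be illustrated with a short sketch (hypothetical: the disclosure does not specify how viewing frequency is stored; a plain list of watched channel names is assumed here):

```python
from collections import Counter

def pick_most_watched_channel(watch_history):
    """Return the channel the user watches most often, used as the target
    when 'change channel' names no specific channel; None with no history."""
    if not watch_history:
        return None
    return Counter(watch_history).most_common(1)[0][0]
```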
Step S30: when the local speech recognition component fails to recognize the voice stream but the third-party speech recognition component succeeds, determine the semantics of the voice stream with the second speech engine corresponding to the third-party speech recognition component, record the recognized semantics as the second semantics, and perform the corresponding operation according to the second semantics.
When the television determines that the local speech recognition component has failed to recognize the voice stream but the third-party speech recognition component has succeeded — that is, when a speech engine corresponding to the third-party speech recognition component can be found to determine the semantics of the voice stream — the television determines the second speech engine corresponding to the third-party speech recognition component, determines the semantics of the voice stream with the second speech engine, records the recognized semantics as the second semantics, and performs the corresponding operation according to the second semantics.
When the television cannot determine from the semantics recognized by the second speech engine the specific operation to perform, it proceeds in the same way as when the first semantics are ambiguous, which is not repeated here.
The third-party speech recognition component includes multiple speech engines. In the present embodiment, the speech engines included in the third-party speech recognition component are collectively recorded as the second speech engine.
Further, the speech recognition method also includes:
Step a: when both the local speech recognition component and the third-party speech recognition component fail to recognize the voice stream, output a prompt message informing the user that recognition of the voice stream has failed.
Specifically, when the television determines that both the local speech recognition component and the third-party speech recognition component have failed to recognize the voice stream, it outputs a prompt message informing the user of the failure. The ways the television may inform the user include, but are not limited to, displaying a text prompt on the screen, playing a voice prompt through the television's built-in audio output device, or signaling with an indicator light.
Further, in the present embodiment, the first speech engine includes a language-switch vocabulary, a remote-control function vocabulary, and a preset scene vocabulary; in other embodiments, it may also include other vocabularies. For example, suppose "hello" is set as a language-switch word in the first speech engine. When the first speech engine determines that the semantics of the received voice stream are "hello" in Chinese, the television determines the semantics of subsequently received voice streams through the speech engine corresponding to Chinese; when the semantics are "hello" in English, subsequent voice streams are handled by the speech engine corresponding to English; and when the semantics are "hello" in Cantonese, subsequent voice streams are handled by the speech engine corresponding to Cantonese. The remote-control function vocabulary contains words corresponding to common remote-control functions, such as increasing the volume, decreasing the volume, and switching channels. The preset scene vocabulary contains words corresponding to scenarios in which users commonly use the television, such as weather or shopping. For example, when the first speech engine determines that the semantics of a received voice stream are "shopping", the television determines the semantics of subsequent voice streams through the speech engine corresponding to shopping. It should be noted that the television stores, in the first speech engine, a mapping table of the semantics corresponding to various voice streams, obtained through prior training.
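The three vocabularies of the first speech engine can be modeled as lookup tables (a sketch only; the phrases and romanizations below are illustrative placeholders, not the trained mapping table the description refers to):

```python
# Illustrative placeholder tables; the disclosure stores a trained
# semantics mapping table inside the first speech engine.
LANGUAGE_SWITCH = {"nihao": "zh", "hello": "en", "neihou": "yue"}
REMOTE_FUNCTIONS = {"volume up", "volume down", "change channel"}
SCENE_WORDS = {"weather", "shopping"}

def route_phrase(phrase, state):
    """Classify a recognized phrase and record which engine should handle
    subsequently received voice streams."""
    if phrase in LANGUAGE_SWITCH:
        state["language"] = LANGUAGE_SWITCH[phrase]  # later streams use this engine
        return "language-switch"
    if phrase in REMOTE_FUNCTIONS:
        return "remote-function"
    if phrase in SCENE_WORDS:
        state["scene"] = phrase  # later streams use the scene engine
        return "scene"
    return "unrecognized"
```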
The second speech engine includes one default speech engine and several other speech engines. The default speech engine is the most commonly used speech engine in the third-party speech recognition component and can be set by the user as needed; for example, the Baidu speech engine may be set as the default speech engine. The other speech engines include, but are not limited to, the Alibaba speech engine and the iFlytek speech engine.
Further, when the local speech recognition component fails to recognize the voice stream, or the first speech engine fails to recognize its semantics, the television may output a prompt message informing the user that recognition of the voice stream has failed. Alternatively, when the local speech recognition component fails to recognize the voice stream, the television recognizes it through the third-party speech recognition component; and when the first speech engine fails to recognize the semantics of the voice stream, the television recognizes them through the second speech engine. Only when the third-party speech recognition component fails to recognize the voice stream, or the second speech engine fails to recognize its semantics, does the television output a prompt message informing the user that recognition of the voice stream has failed.
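Taken together, the failure handling reads as a cascade; a minimal sketch follows (all objects and method names are hypothetical stand-ins for the components and engines of the disclosure):

```python
def handle_voice_stream(stream, local, third_party, first_engine, second_engine, prompt):
    """Local component first, then the third-party component; prompt the
    user only when every recognition stage has failed."""
    if local.recognize(stream):
        return first_engine.parse(stream)   # the "first semantics"
    if third_party.recognize(stream):
        return second_engine.parse(stream)  # the "second semantics"
    prompt("voice stream recognition failed")
    return None
```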
In the present embodiment, when a voice stream transmitted by a voice input device connected to the television is received, the voice stream is recognized with a local speech recognition component and a third-party speech recognition component. When the local speech recognition component successfully recognizes the voice stream, the semantics of the voice stream are determined with the first speech engine corresponding to the local speech recognition component, recorded as the first semantics, and the corresponding operation is performed; when the local speech recognition component fails but the third-party speech recognition component succeeds, the semantics are determined with the second speech engine corresponding to the third-party speech recognition component, recorded as the second semantics, and the corresponding operation is performed. Thus, in addition to the local first speech engine, the television integrates multiple third-party speech engines, improving the success rate and flexibility of television speech recognition.
Further, based on the first embodiment of the speech recognition method of the present invention, a second embodiment of the present invention is proposed. Referring to Fig. 2, in the present embodiment, the step of recognizing the semantics of the voice stream with the second speech engine corresponding to the third-party speech recognition component includes:
Step S31: recognize the semantics of the voice stream with the default speech engine in the second speech engine;
Step S32: when the default speech engine fails to recognize the semantics of the voice stream, determine the priority of the other speech engines in the second speech engine;
Step S33: recognize the semantics of the voice stream with the other speech engines in turn, from the highest priority to the lowest.
In the present embodiment, the second speech engine includes one default speech engine and several other speech engines. In the process of recognizing the semantics of the voice stream with the second speech engine corresponding to the third-party speech recognition component, the television first recognizes the semantics with the default speech engine in the second speech engine. When the default speech engine successfully recognizes the semantics, the television performs the corresponding operation. When the default speech engine fails to recognize the semantics, the television determines the priority of the other speech engines in the second speech engine and recognizes the semantics of the voice stream with those engines in turn, from the highest priority to the lowest.
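The default-then-others order can be sketched as follows (hypothetical engine objects whose `parse` returns the recognized semantics, or `None` on failure):

```python
def recognize_semantics(stream, default_engine, other_engines, priorities):
    """Try the default speech engine first; on failure, try the other
    engines from the highest priority to the lowest."""
    semantics = default_engine.parse(stream)
    if semantics is not None:
        return semantics
    ordered = sorted(other_engines, key=lambda e: priorities[e.name], reverse=True)
    for engine in ordered:
        semantics = engine.parse(stream)
        if semantics is not None:
            return semantics
    return None  # every engine in the second speech engine failed
```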
Further, the step of determining the priority of the other speech engines in the second speech engine includes:
Step b: obtain the number of times each of the other speech engines in the second speech engine was used within a preset time;
Step c: sort the usage counts in descending order to obtain a sorting result;
Step d: determine the priority of the other speech engines according to the sorting result.
Specifically, the process of determining the priority of the other speech engines in the second speech engine is as follows: the television obtains the number of times each of the other speech engines was used within a preset time, sorts these usage counts in descending order to obtain a sorting result, and determines the priority of the other speech engines according to that result — that is, an engine ranked earlier has a higher priority than an engine ranked later.
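Determining priority from usage counts amounts to a descending sort (a sketch; how the counts are collected over the preset time window is not specified, so a name-to-count mapping is assumed):

```python
def engine_priority(usage_counts):
    """Order engine names by how often each was used within the preset
    time: more uses first, i.e. higher priority."""
    return sorted(usage_counts, key=usage_counts.get, reverse=True)
```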
In the present embodiment, in the process of recognizing the semantics of the voice stream with the second speech engine corresponding to the third-party speech recognition component, the default speech engine in the second speech engine is tried first; when it fails to recognize the semantics, the other speech engines are tried in order of priority. On the basis of improving the success rate of television speech recognition, this also improves its efficiency.
The present invention further provides a speech recognition system.
Referring to Fig. 3, Fig. 3 is a functional block diagram of a preferred embodiment of the speech recognition system of the present invention.
It should be emphasized that, as will be apparent to those skilled in the art, the module diagram shown in Fig. 3 is merely an exemplary diagram of a preferred embodiment; those skilled in the art can easily supplement it with new modules around the modules of the speech recognition system shown in Fig. 3. The names of the modules are self-defined and serve only to aid understanding of the program function blocks of the speech recognition system; they do not limit the technical solution of the present invention, the core of which lies in the functions to be achieved by the modules, whatever their names.
In this embodiment, the speech recognition system includes:
a first identification module 10, configured to recognize, when a voice stream transmitted by a voice input device connected to a television is received, the voice stream through a local speech recognition component and a third-party speech recognition component.
In this embodiment, the television is connected to a voice input device that is independent of the television. In other embodiments, the voice input device may also be built into the television and connected to the television's CPU (Central Processing Unit). The voice input device includes, but is not limited to, a microphone or a Bluetooth headset.
After the television is powered on, when the user wants to control the television, the user sends a voice stream to the television through the voice input device connected to it. When the television receives the voice stream transmitted by the voice input device, it sends the voice stream to the local speech recognition component and the third-party speech recognition component, which then recognize the voice stream.
It should be understood that the local speech recognition component is built into the television, while the third-party speech recognition component is invoked by the television through an SDK (Software Development Kit). Further, in this embodiment, to improve recognition speed, the television sends the voice stream to the local speech recognition component and the third-party speech recognition component simultaneously upon receiving it. In other embodiments, the television may first send the voice stream to the local speech recognition component and, only when the local component fails to recognize it, send the voice stream to the third-party speech recognition component.
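The simultaneous dispatch described in this embodiment can be sketched as two concurrent recognition calls. This is a minimal illustration under stated assumptions: the component objects and their `recognize` method are hypothetical, not defined by the patent.

```python
from concurrent.futures import ThreadPoolExecutor

def dispatch(voice_stream, local_component, third_party_component):
    """Send the voice stream to both recognizers at once and collect both
    results, so neither component waits for the other to finish."""
    with ThreadPoolExecutor(max_workers=2) as pool:
        local_future = pool.submit(local_component.recognize, voice_stream)
        remote_future = pool.submit(third_party_component.recognize, voice_stream)
        return local_future.result(), remote_future.result()
```

The sequential variant mentioned for other embodiments would simply call the local component first and invoke the third-party component only when the local result is empty.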
a second identification module 20, configured to recognize, when the local speech recognition component successfully recognizes the voice stream, the semantics of the voice stream through a first speech engine corresponding to the local speech recognition component, record the semantics recognized by the first speech engine as first semantics, and perform a corresponding operation according to the first semantics.
When the television determines that the local speech recognition component has successfully recognized the voice stream — that is, when the speech engine corresponding to the local speech recognition component can be used to recognize the semantics of the voice stream — the television switches from the default system speech engine to the first speech engine, recognizes the semantics of the voice stream through the first speech engine, records the recognized semantics as first semantics, and performs a corresponding operation according to the first semantics. It should be noted that, in this embodiment, when the television has just been powered on, its current speech engine is the default system speech engine; in other embodiments, the speech engine current at power-on may instead be the speech engine in use when the television was last shut down. For example, when the semantics of the voice stream recognized through the first speech engine is "change channel", the television performs a channel-change operation; when the semantics recognized through the first speech engine is "set the volume to 25", the television adjusts the current volume to 25.
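Performing an operation from the recognized semantics amounts to mapping a semantic string to a device action. A minimal sketch follows; the command strings, the `tv` object, and its method names are illustrative assumptions, not part of the patent:

```python
import re

def execute(semantics, tv):
    """Map a recognized semantic string to a television action (illustrative)."""
    if semantics == "change channel":
        tv.next_channel()
        return
    # e.g. "set the volume to 25" -> tv.set_volume(25)
    m = re.match(r"set the volume to (\d+)", semantics)
    if m:
        tv.set_volume(int(m.group(1)))
        return
    tv.prompt("unrecognized command")  # fall back to prompting the user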
It should be noted that various commonly used speech engines corresponding to the local speech recognition component are preset in advance and are collectively recorded as the first speech engine.
Further, when the television cannot determine the specific operation to be performed from the first semantics, the television may output a prompt asking the user to re-enter a voice stream through the voice input device; the semantics of the re-entered voice stream should allow the television to determine the specific operation to be performed. When the television receives the re-entered voice stream, it directly determines the corresponding semantics through the first speech engine. For example, when the semantics of the user's voice stream is "change channel", the television cannot determine which channel to switch to; it can then prompt the user to enter, through the voice input device, a voice stream specifying the channel. When the user enters a voice stream such as "switch to CCTV-1", the television directly determines the first semantics corresponding to that voice stream through the first speech engine and switches the channel to CCTV-1 according to the first semantics.
Further, when the television cannot determine the specific operation to be performed from the first semantics, the television may instead determine the most frequently used object matching the first semantics and perform the corresponding operation on that object. For example, when the television cannot determine from "change channel" which channel the user wants, the television determines the channel the user watches most frequently and switches to that channel.
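The frequency-based fallback reduces to picking the candidate with the highest historical count. A minimal sketch, assuming a hypothetical viewing-history mapping from channel name to watch count:

```python
def most_frequent_object(candidates, view_counts):
    """Return the candidate the user has used most often.

    Candidates absent from the history count as zero; ties are broken
    by whichever qualifying candidate appears first in the list."""
    return max(candidates, key=lambda c: view_counts.get(c, 0))
```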
The second identification module 20 is further configured to recognize, when the local speech recognition component fails to recognize the voice stream but the third-party speech recognition component successfully recognizes it, the semantics of the voice stream through a second speech engine corresponding to the third-party speech recognition component, record the semantics recognized by the second speech engine as second semantics, and perform a corresponding operation according to the second semantics.
When the television determines that the local speech recognition component has failed to recognize the voice stream but the third-party speech recognition component has succeeded — that is, when a speech engine corresponding to the third-party speech recognition component can be used to recognize the semantics of the voice stream — the television determines the second speech engine corresponding to the third-party speech recognition component, recognizes the semantics of the voice stream through the second speech engine, records the recognized semantics as second semantics, and performs a corresponding operation according to the second semantics.
When the television cannot determine the specific operation to be performed from the semantics determined by the second speech engine, the handling is similar to that described above for the first speech engine and is not repeated here.
The third-party speech recognition component includes multiple speech engines. In this embodiment, the multiple speech engines included in the third-party speech recognition component are collectively recorded as the second speech engine.
Further, the speech recognition system also includes:
an output module, configured to output, when both the local speech recognition component and the third-party speech recognition component fail to recognize the voice stream, a prompt message informing the user that recognition of the voice stream has failed.
Further, when the television determines that both the local speech recognition component and the third-party speech recognition component have failed to recognize the voice stream, the television outputs a prompt message informing the user of the failure. The ways in which the television may prompt the user include, but are not limited to, displaying text on the screen, outputting a voice prompt through the television's built-in audio output device, or illuminating an indicator light.
Further, in this embodiment, the first speech engine includes a language-switching vocabulary, a remote-control function vocabulary, and a preset scene vocabulary; in other embodiments, the first speech engine may also include other vocabularies. For example, "hello" may be set as a language-switching word in the first speech engine: when the first speech engine determines that the semantics of the received voice stream is "hello" spoken in Mandarin, the television determines the semantics of subsequently received voice streams through the speech engine corresponding to Mandarin; when the first speech engine determines that the semantics is "hello" spoken in English, the television uses the speech engine corresponding to English for subsequent voice streams; and when the first speech engine determines that the semantics is "hello" spoken in Cantonese, the television uses the speech engine corresponding to Cantonese. The remote-control function vocabulary contains words corresponding to common remote-control functions, such as increasing the volume, decreasing the volume, and changing the channel. The preset scene vocabulary contains words corresponding to scenes users commonly associate with television, such as weather or shopping; for example, when the first speech engine determines that the semantics of a received voice stream is "shopping", the television determines the semantics of subsequent voice streams through the speech engine corresponding to shopping. It should be noted that the television stores, in the first speech engine, a mapping table of the semantics corresponding to various voice streams, obtained through prior training.
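The language-switching behavior — detecting the language in which the switch word was spoken and routing subsequent voice streams to that language's engine — can be sketched as follows. This is a minimal illustration; the engine registry, the `recognize` method, and the idea that the first engine reports a `(semantics, language)` pair are all assumptions, not details from the patent:

```python
class EngineRouter:
    """After the language-switch word is heard, route subsequent voice
    streams to the speech engine for the language it was spoken in."""
    SWITCH_WORD = "hello"

    def __init__(self, engines, default_language):
        self.engines = engines          # mapping: language name -> engine
        self.current = default_language

    def handle(self, semantics, language):
        # `semantics` and `language` are assumed outputs of the first engine.
        if semantics == self.SWITCH_WORD and language in self.engines:
            self.current = language     # switch engines for later streams
            return None
        return self.engines[self.current].recognize(semantics)
```

The same routing structure would apply to the preset scene vocabulary, with scene words ("shopping") in place of language names.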
The second speech engine includes one default speech engine and multiple other speech engines. The default speech engine is the most commonly used speech engine in the third-party speech recognition component and can be set by the user as needed; for example, the Baidu speech engine may be set as the default speech engine. The other speech engines include, but are not limited to, the Alibaba speech engine and the iFlytek speech engine.
Further, when the local speech recognition component fails to recognize the voice stream, or when the first speech engine fails to recognize the semantics of the voice stream, the television outputs a prompt message informing the user that recognition of the voice stream has failed. Alternatively, when the local speech recognition component fails to recognize the voice stream, the television recognizes the voice stream through the third-party speech recognition component instead; and when the first speech engine fails to recognize the semantics of the voice stream, the television recognizes the semantics through the second speech engine instead. In this alternative, the television outputs the failure prompt only when the third-party speech recognition component fails to recognize the voice stream, or when the second speech engine fails to recognize its semantics.
In this embodiment, when a voice stream transmitted by a voice input device connected to the television is received, the voice stream is recognized through a local speech recognition component and a third-party speech recognition component. When the local speech recognition component successfully recognizes the voice stream, the semantics of the voice stream are recognized through a first speech engine corresponding to the local speech recognition component, recorded as first semantics, and a corresponding operation is performed according to the first semantics. When the local speech recognition component fails to recognize the voice stream but the third-party speech recognition component succeeds, the semantics of the voice stream are recognized through a second speech engine corresponding to the third-party speech recognition component, recorded as second semantics, and a corresponding operation is performed according to the second semantics. The television thus integrates not only the local first speech engine but also multiple third-party speech engines, improving the success rate and flexibility of television speech recognition.
Further, based on the preferred embodiment of the speech recognition system of the present invention, a second embodiment of the present invention is proposed. Referring to Fig. 4, in this embodiment the second identification module 20 includes:
a recognition unit 21, configured to recognize the semantics of the voice stream through the default speech engine in the second speech engine; and
a determining unit 22, configured to determine, when the default speech engine fails to recognize the semantics of the voice stream, the priority of the other speech engines in the second speech engine;
the recognition unit 21 being further configured to recognize the semantics of the voice stream through the other speech engines, in order from the highest priority to the lowest.
In this embodiment, the second speech engine includes one default speech engine and multiple other speech engines. In the process of recognizing the semantics of the voice stream through the second speech engine corresponding to the third-party speech recognition component, the television first recognizes the semantics of the voice stream through the default speech engine in the second speech engine. When the default speech engine successfully recognizes the semantics of the voice stream, the television performs the corresponding operation according to those semantics. When the default speech engine fails to recognize the semantics of the voice stream, the television determines the priority of the other speech engines in the second speech engine and recognizes the semantics of the voice stream through the other speech engines, in order from the highest priority to the lowest.
Further, the determining unit 22 is also configured to obtain the number of times each of the other speech engines in the second speech engine has been used within a preset time period, sort these usage counts in descending order to obtain a ranking result, and determine the priority of the other speech engines according to the ranking result.
Specifically, the process of determining the priority of the other speech engines in the second speech engine is as follows: the television obtains the number of times each of the other speech engines in the second speech engine has been used within a preset time period and sorts these usage counts in descending order to obtain a ranking result. The television then determines the priority of the other speech engines according to the ranking result; that is, a speech engine ranked higher in the list is given a higher priority than a speech engine ranked lower.
In this embodiment, when recognizing the semantics of the voice stream through the second speech engine corresponding to the third-party speech recognition component, the semantics of the voice stream are first recognized by the default speech engine in the second speech engine; when the default speech engine fails to recognize the semantics of the voice stream, the semantics are recognized by the other speech engines according to their priority. This improves the efficiency of television speech recognition while also improving its success rate.
It should be noted that, as used herein, the terms "include", "comprise", and any variants thereof are intended to cover non-exclusive inclusion, so that a process, method, article, or system that includes a series of elements includes not only those elements but also other elements not expressly listed, or elements inherent to such a process, method, article, or system. Absent further limitation, an element qualified by the phrase "including a ..." does not exclude the presence of other identical elements in the process, method, article, or system that includes that element.
The above embodiment numbers of the present invention are for description only and do not represent the relative merits of the embodiments.
Through the above description of the embodiments, those skilled in the art will clearly understand that the methods of the above embodiments can be implemented by software plus a necessary general-purpose hardware platform, and of course also by hardware alone, although in many cases the former is the better implementation. Based on this understanding, the technical solution of the present invention — in essence, the part that contributes beyond the prior art — can be embodied in the form of a software product stored in a storage medium (such as ROM/RAM, a magnetic disk, or an optical disc) and including instructions for causing a terminal device (which may be a mobile phone, a computer, a server, an air conditioner, a network device, or the like) to perform the methods described in the embodiments of the present invention.
The above are only preferred embodiments of the present invention and do not limit its scope of protection. Any equivalent structural or process transformation made using the contents of the specification and drawings of the present invention, whether applied directly or indirectly in other related technical fields, is likewise included within the scope of patent protection of the present invention.
Claims (10)
1. A speech recognition method, characterized in that the speech recognition method comprises the following steps:
when a voice stream transmitted by a voice input device connected to a television is received, recognizing the voice stream through a local speech recognition component and a third-party speech recognition component;
when the local speech recognition component successfully recognizes the voice stream, recognizing the semantics of the voice stream through a first speech engine corresponding to the local speech recognition component, recording the semantics recognized by the first speech engine as first semantics, and performing a corresponding operation according to the first semantics; and
when the local speech recognition component fails to recognize the voice stream but the third-party speech recognition component successfully recognizes the voice stream, recognizing the semantics of the voice stream through a second speech engine corresponding to the third-party speech recognition component, recording the semantics recognized by the second speech engine as second semantics, and performing a corresponding operation according to the second semantics.
2. The speech recognition method according to claim 1, characterized in that the first speech engine includes a language-switching vocabulary, a remote-control function vocabulary, and a preset scene vocabulary; and
the second speech engine includes one default speech engine and multiple other speech engines.
3. The speech recognition method according to claim 2, characterized in that the step of recognizing the semantics of the voice stream through the second speech engine corresponding to the third-party speech recognition component comprises:
recognizing the semantics of the voice stream through the default speech engine in the second speech engine;
when the default speech engine fails to recognize the semantics of the voice stream, determining the priority of the other speech engines in the second speech engine; and
recognizing the semantics of the voice stream through the other speech engines, in order from the highest priority to the lowest.
4. The speech recognition method according to claim 3, characterized in that the step of determining the priority of the other speech engines in the second speech engine comprises:
obtaining the number of times each of the other speech engines in the second speech engine has been used within a preset time period;
sorting the usage counts in descending order to obtain a ranking result; and
determining the priority of the other speech engines according to the ranking result.
5. The speech recognition method according to any one of claims 1 to 4, characterized in that, after the step of recognizing the voice stream through the local speech recognition component and the third-party speech recognition component when the voice stream transmitted by the voice input device connected to the television is received, the method further comprises:
when both the local speech recognition component and the third-party speech recognition component fail to recognize the voice stream, outputting a prompt message informing the user that recognition of the voice stream has failed.
6. A speech recognition system, characterized in that the speech recognition system comprises:
a first identification module, configured to recognize, when a voice stream transmitted by a voice input device connected to a television is received, the voice stream through a local speech recognition component and a third-party speech recognition component; and
a second identification module, configured to: when the local speech recognition component successfully recognizes the voice stream, recognize the semantics of the voice stream through a first speech engine corresponding to the local speech recognition component, record the semantics recognized by the first speech engine as first semantics, and perform a corresponding operation according to the first semantics; and when the local speech recognition component fails to recognize the voice stream but the third-party speech recognition component successfully recognizes the voice stream, recognize the semantics of the voice stream through a second speech engine corresponding to the third-party speech recognition component, record the semantics recognized by the second speech engine as second semantics, and perform a corresponding operation according to the second semantics.
7. The speech recognition system according to claim 6, characterized in that the first speech engine includes a language-switching vocabulary, a remote-control function vocabulary, and a preset scene vocabulary; and
the second speech engine includes one default speech engine and multiple other speech engines.
8. The speech recognition system according to claim 7, characterized in that the second identification module comprises:
a recognition unit, configured to recognize the semantics of the voice stream through the default speech engine in the second speech engine; and
a determining unit, configured to determine, when the default speech engine fails to recognize the semantics of the voice stream, the priority of the other speech engines in the second speech engine;
wherein the recognition unit is further configured to recognize the semantics of the voice stream through the other speech engines, in order from the highest priority to the lowest.
9. The speech recognition system according to claim 8, characterized in that the determining unit is further configured to obtain the number of times each of the other speech engines in the second speech engine has been used within a preset time period, sort the usage counts in descending order to obtain a ranking result, and determine the priority of the other speech engines according to the ranking result.
10. The speech recognition system according to any one of claims 6 to 9, characterized in that the speech recognition system further comprises an output module, configured to output, when both the local speech recognition component and the third-party speech recognition component fail to recognize the voice stream, a prompt message informing the user that recognition of the voice stream has failed.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201611135597.6A CN106782561A (en) | 2016-12-09 | 2016-12-09 | Audio recognition method and system |
Publications (1)
Publication Number | Publication Date |
---|---|
CN106782561A true CN106782561A (en) | 2017-05-31 |
Family
ID=58879830
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201611135597.6A Pending CN106782561A (en) | 2016-12-09 | 2016-12-09 | Audio recognition method and system |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN106782561A (en) |
Cited By (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN107526512A (en) * | 2017-08-31 | 2017-12-29 | 联想(北京)有限公司 | Switching method and system for electronic equipment |
CN107945797A (en) * | 2017-12-07 | 2018-04-20 | 携程旅游信息技术(上海)有限公司 | Monitoring system based on speech recognition |
CN109410926A (en) * | 2018-11-27 | 2019-03-01 | 恒大法拉第未来智能汽车(广东)有限公司 | Voice method for recognizing semantics and system |
CN109859755A (en) * | 2019-03-13 | 2019-06-07 | 深圳市同行者科技有限公司 | A kind of audio recognition method, storage medium and terminal |
CN110223672A (en) * | 2019-05-16 | 2019-09-10 | 九牧厨卫股份有限公司 | A kind of multilingual audio recognition method of off-line type |
US11527240B2 (en) | 2018-11-21 | 2022-12-13 | Industrial Technology Research Institute | Speech recognition system, speech recognition method and computer program product |
Citations (11)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN1512484A (en) * | 2002-12-27 | 2004-07-14 | 联想(北京)有限公司 | Organizing and identifying method for natural language |
CN203340238U (en) * | 2012-09-28 | 2013-12-11 | 三星电子株式会社 | Image processing device |
CN103811006A (en) * | 2012-11-06 | 2014-05-21 | 三星电子株式会社 | Method and apparatus for voice recognition |
CN105206275A (en) * | 2015-08-31 | 2015-12-30 | 小米科技有限责任公司 | Device control method, apparatus and terminal |
CN105393302A (en) * | 2013-07-17 | 2016-03-09 | 三星电子株式会社 | Multi-level speech recognition |
CN105513593A (en) * | 2015-11-24 | 2016-04-20 | 南京师范大学 | Intelligent human-computer interaction method drove by voice |
CN105825853A (en) * | 2015-01-07 | 2016-08-03 | 中兴通讯股份有限公司 | Speech recognition device speech switching method and speech recognition device speech switching device |
CN105869636A (en) * | 2016-03-29 | 2016-08-17 | 上海斐讯数据通信技术有限公司 | Speech recognition apparatus and method thereof, smart television set and control method thereof |
CN106101789A (en) * | 2016-07-06 | 2016-11-09 | 深圳Tcl数字技术有限公司 | The voice interactive method of terminal and device |
CN106162285A (en) * | 2016-08-17 | 2016-11-23 | 合肥申目电子科技有限公司 | Speech interactive Multi-functional TV remote control |
CN106205615A (en) * | 2016-08-26 | 2016-12-07 | 王峥嵘 | A kind of control method based on interactive voice and system |
- 2016-12-09: CN application CN201611135597.6A filed (published as CN106782561A, status: Pending)
Cited By (8)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN107526512A (en) * | 2017-08-31 | 2017-12-29 | 联想(北京)有限公司 | Switching method and system for electronic equipment |
CN107945797A (en) * | 2017-12-07 | 2018-04-20 | 携程旅游信息技术(上海)有限公司 | Monitoring system based on speech recognition |
CN107945797B (en) * | 2017-12-07 | 2021-12-31 | 携程旅游信息技术(上海)有限公司 | Monitoring system based on speech recognition |
US11527240B2 (en) | 2018-11-21 | 2022-12-13 | Industrial Technology Research Institute | Speech recognition system, speech recognition method and computer program product |
CN109410926A (en) * | 2018-11-27 | 2019-03-01 | 恒大法拉第未来智能汽车(广东)有限公司 | Voice method for recognizing semantics and system |
CN109859755A (en) * | 2019-03-13 | 2019-06-07 | 深圳市同行者科技有限公司 | A kind of audio recognition method, storage medium and terminal |
CN110223672A (en) * | 2019-05-16 | 2019-09-10 | 九牧厨卫股份有限公司 | A kind of multilingual audio recognition method of off-line type |
CN110223672B (en) * | 2019-05-16 | 2021-04-23 | 九牧厨卫股份有限公司 | Offline multi-language voice recognition method |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN106782561A (en) | Audio recognition method and system | |
CN108000526B (en) | Dialogue interaction method and system for intelligent robot | |
EP3039531B1 (en) | Display apparatus and controlling method thereof | |
KR102211595B1 (en) | Speech recognition apparatus and control method thereof | |
CN105161106A (en) | Voice control method of intelligent terminal, voice control device and television system | |
CN101576901B (en) | Method for generating search request and mobile communication equipment | |
CN109637548A (en) | Voice interactive method and device based on Application on Voiceprint Recognition | |
CN106796496A (en) | Display device and its operating method | |
CN105531758B (en) | Use the speech recognition of foreign words grammer | |
JP2016095383A (en) | Voice recognition client device and server-type voice recognition device | |
CN103491411A (en) | Method and device based on language recommending channels | |
CN107040452B (en) | Information processing method and device and computer readable storage medium | |
WO2014181524A1 (en) | Conversation processing system and program | |
WO2017128775A1 (en) | Voice control system, voice processing method and terminal device | |
CN106205615A (en) | A kind of control method based on interactive voice and system | |
KR20150093482A (en) | System for Speaker Diarization based Multilateral Automatic Speech Translation System and its operating Method, and Apparatus supporting the same | |
CN110704590B (en) | Method and apparatus for augmenting training samples | |
CN107155121B (en) | Voice control text display method and device | |
CN105898525A (en) | Method of searching videos in specific video database, and video terminal thereof | |
CN102316361A (en) | Audio-frequency / video-frequency on demand method based on natural speech recognition and system thereof | |
CN106027752A (en) | Self-adaption method and device for mobile terminal call background sounds | |
CN103546790A (en) | Language interaction method and language interaction system on basis of mobile terminal and interactive television | |
CN108806688A (en) | Sound control method, smart television, system and the storage medium of smart television | |
JP6625772B2 (en) | Search method and electronic device using the same | |
WO2023184942A1 (en) | Voice interaction method and apparatus and electric appliance |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
SE01 | Entry into force of request for substantive examination | ||
RJ01 | Rejection of invention patent application after publication |
Application publication date: 20170531 |
|
RJ01 | Rejection of invention patent application after publication |