CN106782561A - Speech recognition method and system - Google Patents

Publication number
CN106782561A
Authority
CN
China
Prior art keywords
speech
voice
engine
voice flow
semanteme
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201611135597.6A
Other languages
Chinese (zh)
Inventor
李鑫伟
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shenzhen TCL New Technology Co Ltd
Shenzhen TCL Digital Technology Co Ltd
Original Assignee
Shenzhen TCL New Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shenzhen TCL New Technology Co Ltd
Priority to CN201611135597.6A
Publication of CN106782561A

Classifications

    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 Speech recognition
    • G10L15/28 Constructional details of speech recognition systems
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/30 Semantic analysis
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00 Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/40 Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N21/41 Structure of client; Structure of client peripherals
    • H04N21/422 Input-only peripherals, i.e. input devices connected to specially adapted client devices, e.g. global positioning system [GPS]
    • H04N21/42203 Input-only peripherals, i.e. input devices connected to specially adapted client devices, e.g. global positioning system [GPS] sound input device, e.g. microphone
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00 Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/40 Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N21/41 Structure of client; Structure of client peripherals
    • H04N21/422 Input-only peripherals, i.e. input devices connected to specially adapted client devices, e.g. global positioning system [GPS]
    • H04N21/42204 User interfaces specially adapted for controlling a client device through a remote control device; Remote control devices therefor

Landscapes

  • Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Physics & Mathematics (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Computational Linguistics (AREA)
  • Health & Medical Sciences (AREA)
  • Theoretical Computer Science (AREA)
  • Human Computer Interaction (AREA)
  • Signal Processing (AREA)
  • General Health & Medical Sciences (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Artificial Intelligence (AREA)
  • Acoustics & Sound (AREA)
  • Two-Way Televisions, Distribution Of Moving Picture Or The Like (AREA)

Abstract

The invention discloses a speech recognition method and system. The method comprises the steps of: when a voice stream is received, recognizing the voice stream through a local speech recognition component and a third-party speech recognition component; when the local speech recognition component successfully recognizes the voice stream, recognizing the semantics of the voice stream through a first speech engine corresponding to the local speech recognition component, recording the semantics recognized by the first speech engine as first semantics, and performing the corresponding operation according to the first semantics; when the local speech recognition component fails to recognize the voice stream but the third-party speech recognition component successfully recognizes it, recognizing the semantics of the voice stream through a second speech engine corresponding to the third-party speech recognition component, recording the semantics recognized by the second speech engine as second semantics, and performing the corresponding operation according to the second semantics. The invention improves the success rate and flexibility of television speech recognition.

Description

Speech recognition method and system
Technical field
The present invention relates to the field of televisions, and more particularly to a speech recognition method and system.
Background
With the development of science and technology, smart televisions have become widespread, but interactions between people and smart televisions, such as text input and content search, still offer a poor experience. With the development of speech recognition technology, people can control a smart television through a speech recognition engine to switch channels, search, and play on demand the content they want to watch.
However, for technical reasons, most application developers find it difficult to implement a speech engine themselves, and instead convert speech to text by integrating a third-party speech engine. Because each speech engine is strong in a different area of speech recognition, and languages and dialects vary widely, it is difficult to find a single speech engine that meets all requirements. As a result, the voice function of a smart television has low flexibility and the user experience is poor.
Summary of the invention
The main object of the present invention is to provide a speech recognition method and system, intended to solve the technical problem that the speech recognition function of existing televisions has low flexibility.
To achieve the above object, the present invention provides a speech recognition method comprising the steps of:
When a voice stream sent by a voice input device connected to a television is received, recognizing the voice stream through a local speech recognition component and a third-party speech recognition component;
When the local speech recognition component successfully recognizes the voice stream, recognizing the semantics of the voice stream through a first speech engine corresponding to the local speech recognition component, recording the semantics recognized by the first speech engine as first semantics, and performing the corresponding operation according to the first semantics;
When the local speech recognition component fails to recognize the voice stream but the third-party speech recognition component successfully recognizes it, recognizing the semantics of the voice stream through a second speech engine corresponding to the third-party speech recognition component, recording the semantics recognized by the second speech engine as second semantics, and performing the corresponding operation according to the second semantics.
Preferably, the first speech engine includes language-switching vocabulary, remote-control function vocabulary and preset scene vocabulary;
The second speech engine includes one default speech engine and multiple other speech engines.
Preferably, the step of recognizing the semantics of the voice stream through the second speech engine corresponding to the third-party speech recognition component includes:
Recognizing the semantics of the voice stream through the default speech engine in the second speech engine;
When the default speech engine fails to recognize the semantics of the voice stream, determining the priority of the other speech engines in the second speech engine;
Recognizing the semantics of the voice stream through the other speech engines in turn, in descending order of priority.
Preferably, the step of determining the priority of the other speech engines in the second speech engine includes:
Obtaining the number of times each of the other speech engines in the second speech engine was used within a preset time;
Sorting the usage counts in descending order to obtain a sorting result;
Determining the priority of the other speech engines according to the sorting result.
Preferably, after the step of recognizing the voice stream through the local speech recognition component and the third-party speech recognition component when the voice stream sent by the voice input device connected to the television is received, the method further includes:
When neither the local speech recognition component nor the third-party speech recognition component successfully recognizes the voice stream, outputting prompt information to notify the user that recognition of the voice stream has failed.
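The three branches of the method (local success, third-party success, and failure of both) can be sketched in Python as follows. This is only an illustrative sketch: the `Component` class, the byte-string "streams" and the toy vocabularies are assumptions made for the example and do not come from the patent, which does not specify any data structures.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Component:
    """Hypothetical pairing of a recognition component with its speech engine."""
    can_recognize: Callable[[bytes], bool]  # did the component recognize the stream?
    engine: Callable[[bytes], str]          # maps a recognized stream to its semantics


def handle_voice_stream(stream: bytes, local: Component, third_party: Component) -> str:
    """Return the action the television takes for one voice stream."""
    if local.can_recognize(stream):
        # Local component succeeded: the first speech engine yields the first semantics.
        first_semantics = local.engine(stream)
        return f"execute: {first_semantics}"
    if third_party.can_recognize(stream):
        # Local failed, third party succeeded: the second engine yields the second semantics.
        second_semantics = third_party.engine(stream)
        return f"execute: {second_semantics}"
    # Both components failed: prompt the user.
    return "prompt: voice stream recognition failed"


# Toy vocabularies standing in for real recognizers (assumptions for illustration).
LOCAL_VOCAB = {b"vol+": "increase volume", b"ch+": "switch channel"}
CLOUD_VOCAB = {b"weather": "show weather"}

local = Component(lambda s: s in LOCAL_VOCAB, lambda s: LOCAL_VOCAB[s])
third_party = Component(lambda s: s in CLOUD_VOCAB, lambda s: CLOUD_VOCAB[s])

print(handle_voice_stream(b"vol+", local, third_party))
print(handle_voice_stream(b"weather", local, third_party))
print(handle_voice_stream(b"unknown", local, third_party))
```

The local branch is checked first, so a stream that both components could recognize is always handled by the first speech engine, matching the order of the steps above.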
In addition, to achieve the above object, the present invention also provides a speech recognition system, the speech recognition system including:
A first identification module, configured to recognize a voice stream through a local speech recognition component and a third-party speech recognition component when the voice stream sent by a voice input device connected to a television is received;
A second identification module, configured to: when the local speech recognition component successfully recognizes the voice stream, recognize the semantics of the voice stream through a first speech engine corresponding to the local speech recognition component, record the semantics recognized by the first speech engine as first semantics, and perform the corresponding operation according to the first semantics; and when the local speech recognition component fails to recognize the voice stream but the third-party speech recognition component successfully recognizes it, recognize the semantics of the voice stream through a second speech engine corresponding to the third-party speech recognition component, record the semantics recognized by the second speech engine as second semantics, and perform the corresponding operation according to the second semantics.
Preferably, the first speech engine includes language-switching vocabulary, remote-control function vocabulary and preset scene vocabulary;
The second speech engine includes one default speech engine and multiple other speech engines.
Preferably, the second identification module includes:
A recognition unit, configured to recognize the semantics of the voice stream through the default speech engine in the second speech engine;
A determining unit, configured to determine the priority of the other speech engines in the second speech engine when the default speech engine fails to recognize the semantics of the voice stream;
The recognition unit is further configured to recognize the semantics of the voice stream through the other speech engines in turn, in descending order of priority.
Preferably, the determining unit is further configured to obtain the number of times each of the other speech engines in the second speech engine was used within a preset time; sort the usage counts in descending order to obtain a sorting result; and determine the priority of the other speech engines according to the sorting result.
Preferably, the speech recognition system further includes an output module, configured to output prompt information notifying the user that recognition of the voice stream has failed when neither the local speech recognition component nor the third-party speech recognition component successfully recognizes the voice stream.
In the present invention, when a voice stream sent by a voice input device connected to a television is received, the voice stream is recognized through a local speech recognition component and a third-party speech recognition component. When the local speech recognition component successfully recognizes the voice stream, the semantics of the voice stream are recognized through a first speech engine corresponding to the local speech recognition component, the semantics recognized by the first speech engine are recorded as first semantics, and the corresponding operation is performed according to the first semantics. When the local speech recognition component fails to recognize the voice stream but the third-party speech recognition component successfully recognizes it, the semantics of the voice stream are recognized through a second speech engine corresponding to the third-party speech recognition component, the semantics recognized by the second speech engine are recorded as second semantics, and the corresponding operation is performed according to the second semantics. In this way, the television integrates multiple third-party speech engines in addition to the local first speech engine, which improves the success rate and flexibility of television speech recognition.
Brief description of the drawings
Fig. 1 is a schematic flowchart of a preferred embodiment of the speech recognition method of the present invention;
Fig. 2 is a schematic flowchart of recognizing the semantics of the voice stream through the second speech engine corresponding to the third-party speech recognition component in an embodiment of the present invention;
Fig. 3 is a functional block diagram of a preferred embodiment of the speech recognition system of the present invention;
Fig. 4 is a functional block diagram of the second identification module in an embodiment of the present invention.
The realization of the object, the functional characteristics and the advantages of the present invention are further described below with reference to the accompanying drawings and in conjunction with the embodiments.
Detailed description
It should be understood that the specific embodiments described herein merely illustrate the present invention and are not intended to limit it.
The present invention provides a speech recognition method.
Referring to Fig. 1, Fig. 1 is a schematic flowchart of a preferred embodiment of the speech recognition method of the present invention.
In this embodiment, the speech recognition method includes:
Step S10, when a voice stream sent by a voice input device connected to a television is received, recognizing the voice stream through a local speech recognition component and a third-party speech recognition component;
In this embodiment, the television is connected to a voice input device that is independent of the television. In other embodiments, the voice input device may also be built into the television and connected to the CPU (Central Processing Unit) of the television. The voice input device includes, but is not limited to, a microphone and a Bluetooth headset.
After the television is powered on and started, when the user needs to control the television, the user sends a voice stream to the television through the voice input device connected to it. When the television receives the voice stream sent by the voice input device, it sends the voice stream to the local speech recognition component and the third-party speech recognition component, which then recognize the voice stream.
It can be understood that the local speech recognition component is built into the television, and that the television calls the third-party speech recognition component through an SDK (Software Development Kit). Further, in this embodiment, to speed up recognition, the television sends the voice stream to the local speech recognition component and the third-party speech recognition component simultaneously when it receives the voice stream. In other embodiments, the television may first send the voice stream to the local speech recognition component, and send it to the third-party speech recognition component only when the local speech recognition component fails to recognize it.
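The two dispatch strategies described above (simultaneous and local-first) might look like the following Python sketch. The recognizer functions are hypothetical stand-ins; a real third-party component would be reached through its vendor SDK, whose API is not described in the source and is not modelled here.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Optional

# Hypothetical recognizer type: returns recognized text, or None on failure.
Recognizer = Callable[[bytes], Optional[str]]


def recognize_parallel(stream: bytes, local: Recognizer, third_party: Recognizer) -> Optional[str]:
    """Send the stream to both components at once and prefer the local result."""
    with ThreadPoolExecutor(max_workers=2) as pool:
        local_future = pool.submit(local, stream)
        third_future = pool.submit(third_party, stream)
        local_result = local_future.result()
        if local_result is not None:
            # Note: leaving the `with` block still waits for the third-party call.
            return local_result
        return third_future.result()


def recognize_sequential(stream: bytes, local: Recognizer, third_party: Recognizer) -> Optional[str]:
    """Try the local component first; call the third party only if it fails."""
    local_result = local(stream)
    if local_result is not None:
        return local_result
    return third_party(stream)


# Toy recognizers (assumptions for illustration only).
def local_component(stream: bytes) -> Optional[str]:
    return "mute" if stream == b"mute" else None


def third_party_component(stream: bytes) -> Optional[str]:
    return "search films" if stream == b"search" else None


print(recognize_parallel(b"search", local_component, third_party_component))
print(recognize_sequential(b"mute", local_component, third_party_component))
```

The parallel variant trades an extra third-party call on every utterance for lower latency when the local component fails; the sequential variant avoids that call but pays the full local recognition time first.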
Step S20, when the local speech recognition component successfully recognizes the voice stream, recognizing the semantics of the voice stream through a first speech engine corresponding to the local speech recognition component, recording the semantics recognized by the first speech engine as first semantics, and performing the corresponding operation according to the first semantics;
When the television determines that the local speech recognition component has successfully recognized the voice stream, that is, that a speech engine corresponding to the local speech recognition component can be found to recognize the semantics of the voice stream, the television switches from the default system speech engine to the first speech engine, recognizes the semantics of the voice stream through the first speech engine corresponding to the local speech recognition component, records the semantics recognized by the first speech engine as first semantics, and performs the corresponding operation according to the first semantics. It should be noted that, in this embodiment, the speech engine the television uses when it has just been powered on is the default system speech engine. In other embodiments, the speech engine in use just after power-on may be the one the television was using when it was last shut down. For example, when the television recognizes through the first speech engine that the semantics of the voice stream are "switch channel", the television performs a channel-switching operation; when the television recognizes through the first speech engine that the semantics of the voice stream are "adjust the volume to 25", the television sets the current volume to 25.
It should be noted that multiple commonly used speech engines corresponding to the local speech recognition component are set in advance and are referred to as the first speech engine.
Further, when the television cannot determine the specific operation to perform from the first semantics, the television may output prompt information asking the user to input a new voice stream through the voice input device. The semantics of the new voice stream should be specific, that is, the television should be able to determine the specific operation to perform from them. When the television receives the new voice stream, it determines the corresponding semantics directly through the first speech engine. For example, when the semantics of the voice stream input by the user are "switch channel", the television cannot determine which channel to switch to; the television may then prompt the user to say, through the voice input device, which channel to switch to. When the user inputs the voice stream "switch to CCTV-1" through the voice input device, the television determines the first semantics corresponding to "CCTV-1" directly through the first speech engine and switches to CCTV-1 according to the first semantics.
Further, when the television cannot determine the specific operation to perform from the first semantics, the television may instead determine the most frequently used object according to the first semantics and perform the corresponding operation on that object. For example, when the television cannot determine from "switch channel" which channel the user wants, the television determines the channel the user watches most frequently and switches to that channel.
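The frequency-based fallback for an ambiguous command can be sketched as below. The command strings, the `resolve_channel` helper and the viewing history are hypothetical; the source states only that the most frequently watched channel is chosen, not how viewing frequency is stored.

```python
from collections import Counter
from typing import List, Optional


def resolve_channel(semantics: str, watch_history: List[str]) -> Optional[str]:
    """Pick the target channel for a channel-switching command.

    A specific command ("switch to <channel>") names its own target. A bare
    "switch channel" is ambiguous, so the most watched channel is used.
    Returns None when no target can be determined, in which case the
    television would prompt the user for a more specific command.
    """
    prefix = "switch to "
    if semantics.startswith(prefix):
        return semantics[len(prefix):]
    if semantics == "switch channel" and watch_history:
        return Counter(watch_history).most_common(1)[0][0]
    return None


history = ["CCTV-1", "CCTV-5", "CCTV-1"]
print(resolve_channel("switch channel", history))
print(resolve_channel("switch to CCTV-5", history))
```

Returning `None` keeps the two strategies in the text composable: the caller can fall back to prompting the user whenever the history is empty or the command is not understood.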
Step S30, when the local speech recognition component fails to recognize the voice stream but the third-party speech recognition component successfully recognizes it, recognizing the semantics of the voice stream through a second speech engine corresponding to the third-party speech recognition component, recording the semantics recognized by the second speech engine as second semantics, and performing the corresponding operation according to the second semantics.
When the television determines that the local speech recognition component has failed to recognize the voice stream but the third-party speech recognition component has succeeded, that is, that a speech engine corresponding to the third-party speech recognition component can be found to recognize the semantics of the voice stream, the television determines the second speech engine corresponding to the third-party speech recognition component, recognizes the semantics of the voice stream through the second speech engine, records the semantics recognized by the second speech engine as second semantics, and performs the corresponding operation according to the second semantics.
When the television cannot determine the specific operation to perform from the semantics determined by the second speech engine, it proceeds in the same way as when it cannot determine the specific operation from the semantics determined by the first speech engine; the details are not repeated here.
The third-party speech recognition component includes multiple speech engines. In this embodiment, the multiple speech engines included in the third-party speech recognition component are referred to as the second speech engine.
Further, the speech recognition method also includes:
Step a, when neither the local speech recognition component nor the third-party speech recognition component successfully recognizes the voice stream, outputting prompt information to notify the user that recognition of the voice stream has failed.
Further, when the television determines that neither the local speech recognition component nor the third-party speech recognition component has recognized the voice stream, the television outputs prompt information to notify the user that recognition of the voice stream has failed. The ways in which the television notifies the user include, but are not limited to, displaying a text prompt on the television screen, outputting a voice prompt through the television's built-in audio output device, or signaling with an indicator light.
Further, in this embodiment, the first speech engine includes language-switching vocabulary, remote-control function vocabulary and preset scene vocabulary. In other embodiments, the first speech engine may also include other vocabulary. For example, "hello" is set in the first speech engine as language-switching vocabulary. When the first speech engine determines that the semantics of the voice stream received by the television are "hello" in Chinese, the television determines the semantics of subsequently received voice streams through the speech engine corresponding to Chinese; when the first speech engine determines that the semantics are "hello" in English, the television determines the semantics of subsequently received voice streams through the speech engine corresponding to English; when the first speech engine determines that the semantics are "hello" in Cantonese, the television determines the semantics of subsequently received voice streams through the speech engine corresponding to Cantonese. The remote-control function vocabulary is vocabulary corresponding to common remote-control functions, such as increasing the volume, decreasing the volume and switching channels. The preset scene vocabulary is vocabulary corresponding to television scenes the user commonly uses, such as weather and shopping. For example, when the first speech engine determines that the semantics of the voice stream received by the television are "shopping", the television determines the semantics of subsequently received voice streams through the speech engine corresponding to shopping. It should be noted that a mapping table between voice streams and their semantics is trained in advance and stored in the first speech engine of the television.
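As a rough illustration of how language-switching and scene vocabulary could route later voice streams to a different engine, consider the Python sketch below. The vocabulary entries, the romanized greetings and the engine names are all assumptions made for the example; the source does not specify how the mapping table is represented.

```python
# Hypothetical vocabulary tables: recognized semantics -> engine to activate.
# The romanized greetings are placeholders for spoken greetings in each language.
LANGUAGE_SWITCH_VOCAB = {
    "ni hao": "mandarin_engine",
    "hello": "english_engine",
    "nei hou": "cantonese_engine",
}
SCENE_VOCAB = {
    "shopping": "shopping_engine",
    "weather": "weather_engine",
}


class EngineSelector:
    """Tracks which speech engine handles subsequently received voice streams."""

    def __init__(self, default_engine: str = "system_default_engine") -> None:
        self.active_engine = default_engine

    def on_semantics(self, semantics: str) -> str:
        """Switch engines if the semantics match a switching word; return the active engine."""
        key = semantics.lower()
        engine = LANGUAGE_SWITCH_VOCAB.get(key) or SCENE_VOCAB.get(key)
        if engine is not None:
            self.active_engine = engine
        return self.active_engine


selector = EngineSelector()
print(selector.on_semantics("hello"))      # switches to the English engine
print(selector.on_semantics("volume up"))  # not a switching word: engine unchanged
print(selector.on_semantics("shopping"))   # switches to the shopping engine
```

Remote-control function vocabulary is deliberately left out of the tables: in the text it triggers an operation directly rather than changing which engine is active.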
The second speech engine includes one default speech engine and multiple other speech engines. The default speech engine is the most commonly used speech engine in the third-party speech recognition component and can be set by the user as needed; for example, the Baidu speech engine may be set as the default speech engine. The other speech engines include, but are not limited to, the Alibaba speech engine and the iFlytek speech engine.
Further, in one variant, when the local speech recognition component fails to recognize the voice stream, or the first speech engine fails to recognize the semantics of the voice stream, the television outputs prompt information to notify the user that recognition of the voice stream has failed. Alternatively, when the local speech recognition component fails to recognize the voice stream, the television recognizes the voice stream through the third-party speech recognition component, and when the first speech engine fails to recognize the semantics of the voice stream, the television recognizes the semantics through the second speech engine. In this variant, the television outputs prompt information notifying the user of the failure only when the third-party speech recognition component fails to recognize the voice stream, or the second speech engine fails to recognize its semantics.
In this embodiment, when a voice stream sent by a voice input device connected to a television is received, the voice stream is recognized through a local speech recognition component and a third-party speech recognition component. When the local speech recognition component successfully recognizes the voice stream, the semantics of the voice stream are recognized through a first speech engine corresponding to the local speech recognition component, the semantics recognized by the first speech engine are recorded as first semantics, and the corresponding operation is performed according to the first semantics. When the local speech recognition component fails to recognize the voice stream but the third-party speech recognition component successfully recognizes it, the semantics of the voice stream are recognized through a second speech engine corresponding to the third-party speech recognition component, the semantics recognized by the second speech engine are recorded as second semantics, and the corresponding operation is performed according to the second semantics. In this way, the television integrates multiple third-party speech engines in addition to the local first speech engine, which improves the success rate and flexibility of television speech recognition.
Further, a second embodiment of the present invention is proposed on the basis of the first embodiment of the speech recognition method. Referring to Fig. 2, in this embodiment, the step of recognizing the semantics of the voice stream through the second speech engine corresponding to the third-party speech recognition component includes:
Step S31, recognizing the semantics of the voice stream through the default speech engine in the second speech engine;
Step S32, when the default speech engine fails to recognize the semantics of the voice stream, determining the priority of the other speech engines in the second speech engine;
Step S33, recognizing the semantics of the voice stream through the other speech engines in turn, in descending order of priority.
In this embodiment, the second speech engine includes one default speech engine and multiple other speech engines. When the television recognizes the semantics of the voice stream through the second speech engine corresponding to the third-party speech recognition component, it first uses the default speech engine in the second speech engine. When the default speech engine successfully recognizes the semantics of the voice stream, the television performs the corresponding operation according to those semantics. When the default speech engine fails to recognize the semantics of the voice stream, the television determines the priority of the other speech engines in the second speech engine and tries them in turn, in descending order of priority, to recognize the semantics of the voice stream.
Further, the step of determining the priority of the other speech engines in the second speech engine includes:
Step b, obtaining the number of times each of the other speech engines in the second speech engine was used within a preset time;
Step c, sorting the usage counts in descending order to obtain a sorting result;
Step d, determining the priority of the other speech engines according to the sorting result.
Further, the priority of the other speech engines in the second speech engine is determined as follows: the television obtains the number of times each of the other speech engines in the second speech engine was used within a preset time and sorts the usage counts in descending order to obtain a sorting result. The television then determines the priority of the other speech engines according to the sorting result, that is, a speech engine that appears earlier in the sorting result has a higher priority than one that appears later.
In this embodiment, when recognizing the semantics of the voice stream through the second speech engine corresponding to the third-party speech recognition component, the television first uses the default speech engine in the second speech engine; when the default speech engine fails to recognize the semantics of the voice stream, the television recognizes them through the other speech engines according to their priority. This improves the efficiency of television speech recognition while also improving its success rate.
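Steps S31 to S33 and steps b to d can be sketched together in Python. The usage log format (timestamp, engine name), the engine names and the rule that engines unused in the window are tried last are assumptions made for the example; the source says only that usage counts within a preset time are sorted in descending order.

```python
from collections import Counter
from typing import Callable, Dict, List, Optional, Tuple

# Hypothetical engine type: returns semantics, or None when recognition fails.
Engine = Callable[[bytes], Optional[str]]


def rank_engines(usage_log: List[Tuple[float, str]], now: float, window: float) -> List[str]:
    """Steps b to d: order engine names by usage count within the last `window` seconds."""
    counts = Counter(name for ts, name in usage_log if now - ts <= window)
    return [name for name, _ in counts.most_common()]


def recognize_semantics(stream: bytes, default_engine: Engine,
                        others: Dict[str, Engine], ranking: List[str]) -> Optional[str]:
    """Steps S31 to S33: default engine first, then the others by descending priority."""
    result = default_engine(stream)
    if result is not None:
        return result
    # Assumption: engines with no recorded use in the window are tried last.
    ordered = [n for n in ranking if n in others] + [n for n in others if n not in ranking]
    for name in ordered:
        result = others[name](stream)
        if result is not None:
            return result
    return None


# Toy data (assumptions for illustration only).
log = [(10.0, "engine_a"), (100.0, "engine_a"), (150.0, "engine_b"), (160.0, "engine_b")]
ranking = rank_engines(log, now=200.0, window=120.0)
others = {"engine_a": lambda s: "from a", "engine_b": lambda s: "from b"}
print(ranking)
print(recognize_semantics(b"x", lambda s: None, others, ranking))
```

Here the entry at timestamp 10.0 falls outside the 120-second window, so `engine_b` (two uses) outranks `engine_a` (one use) and is tried first after the default engine fails.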
The present invention further provides a kind of speech recognition system.
Reference picture 3, Fig. 3 is the high-level schematic functional block diagram of the preferred embodiment of speech recognition system of the present invention.
It is emphasized that it will be apparent to those skilled in the art that module map shown in Fig. 3 is only a preferred embodiment Exemplary plot, those skilled in the art can easily carry out new module around the module of the speech recognition system shown in Fig. 3 Supplement;The title of each module is self-defined title, is only used for aiding in each program function block for understanding the speech recognition system, no For limiting technical scheme, the core of technical solution of the present invention is, what the module of each self-defined title to be reached Function.
In the present embodiment, the speech recognition system includes:
First identification module 10, for when receive with TV connect voice-input device transmitted by voice flow when, The voice flow is recognized by local voice recognizer component and third party's speech recognition component;
In this embodiment, the TV is connected to a voice input device that is independent of the TV. In other embodiments, the voice input device may also be built into the TV and connected to the CPU (Central Processing Unit) of the TV. The voice input device includes, but is not limited to, a microphone and a Bluetooth headset.
After the TV is powered on and started, when the user needs to control the TV, the user sends a voice stream to the TV through the voice input device connected to the TV. When the TV receives the voice stream sent by the voice input device, it sends the voice stream to the local speech recognition component and the third-party speech recognition component, which recognize the voice stream.
It can be understood that the local speech recognition component is built into the TV, and that the TV calls the third-party speech recognition component through an SDK (Software Development Kit). Further, in this embodiment, to increase the speed of recognizing the voice stream, when the TV receives the voice stream it sends the voice stream to the local speech recognition component and the third-party speech recognition component simultaneously. In other embodiments, the TV may first send the voice stream to the local speech recognition component, and send it to the third-party speech recognition component only when the local speech recognition component fails to recognize it.
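The two dispatch strategies in the preceding paragraph (simultaneous versus local-first) can be contrasted with a minimal Python sketch. The component callables here are hypothetical stand-ins that return recognized text or `None` on failure; the patent does not define the real component or SDK interfaces.

```python
from concurrent.futures import ThreadPoolExecutor


def recognize_in_parallel(voice_stream, local_component, third_party_component):
    """Send the stream to both components at once and collect both results."""
    with ThreadPoolExecutor(max_workers=2) as pool:
        local_future = pool.submit(local_component, voice_stream)
        third_future = pool.submit(third_party_component, voice_stream)
        # Each result is the recognized text, or None if recognition failed.
        return local_future.result(), third_future.result()


def recognize_sequentially(voice_stream, local_component, third_party_component):
    """Try the local component first; call the third party only on failure."""
    local_result = local_component(voice_stream)
    if local_result is not None:
        return local_result, None
    return None, third_party_component(voice_stream)


if __name__ == "__main__":
    # Hypothetical components: the local one fails, the third-party one succeeds.
    local = lambda stream: None
    third_party = lambda stream: "change channel"
    print(recognize_in_parallel(b"pcm-bytes", local, third_party))
    print(recognize_sequentially(b"pcm-bytes", local, third_party))
```

The parallel variant trades an extra third-party call for lower latency when the local component fails, which matches the speed motivation given for this embodiment.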
A second identification module 20, configured to, when the local speech recognition component successfully recognizes the voice stream, recognize the semantics of the voice stream through a first speech engine corresponding to the local speech recognition component, denote the semantics recognized by the first speech engine as first semantics, and perform a corresponding operation according to the first semantics;
When the TV determines that the local speech recognition component has successfully recognized the voice stream, that is, when a speech engine corresponding to the local speech recognition component can be found to recognize the semantics of the voice stream, the TV switches from the default system speech engine to the first speech engine, recognizes the semantics of the voice stream through the first speech engine corresponding to the local speech recognition component, denotes the semantics recognized by the first speech engine as the first semantics, and performs the corresponding operation according to the first semantics. It should be noted that, in this embodiment, when the TV has just been powered on and started, the speech engine the TV uses is the default system speech engine. In other embodiments, the speech engine used when the TV has just been powered on may be the one that was in use when the TV was last shut down. For example, when the semantics of the voice stream recognized by the TV through the first speech engine are "change channel", the TV performs a channel-change operation; when the semantics recognized through the first speech engine are "adjust the volume to 25", the TV sets the current volume to 25.
It should be noted that various commonly used speech engines corresponding to the local speech recognition component are set in advance and denoted as the first speech engine.
Further, when the TV cannot determine the specific operation to perform according to the first semantics, the TV may output prompt information asking the user to input a new voice stream through the voice input device. The semantics of the new voice stream should be specific, so that the TV can determine from them the specific operation to perform. When the TV receives the new voice stream from the user, it determines the corresponding semantics directly through the first speech engine. For example, when the semantics of the voice stream input by the user are "change channel", the TV cannot determine which TV station to switch to; the TV can then prompt the user to input, through the voice input device, a voice stream naming the specific station. When the user inputs the voice stream "change to CCTV-1" through the voice input device, the TV directly determines, through the first speech engine, the first semantics corresponding to that voice stream and switches to CCTV-1 according to the first semantics.
Further, when the TV cannot determine the specific operation to perform according to the first semantics, the TV may determine, according to the first semantics, the most frequently used object and perform the corresponding operation on that object. For example, when the TV cannot determine from "change channel" which TV station the user wants to switch to, the TV determines the station the user watches most frequently and switches to that station.
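A small Python sketch can illustrate how an underspecified command might be resolved. Everything here is an assumption made for illustration: the command strings, the `watch_counts` dictionary and the channel names are hypothetical, and returning `None` stands for the alternative of prompting the user for a more specific voice stream.

```python
def resolve_channel(semantics, watch_counts):
    """Pick a target channel for a channel-change command.

    semantics    -- recognized command text (assumed string form)
    watch_counts -- dict mapping channel name to how often it was watched
    Returns the channel name, or None when the TV should prompt the user.
    """
    prefix = "change channel to "
    if semantics.startswith(prefix):
        # The command is specific: use the channel the user named.
        return semantics[len(prefix):]
    if semantics == "change channel" and watch_counts:
        # The command is ambiguous: fall back to the most-watched channel.
        return max(watch_counts, key=watch_counts.get)
    # No usable history (or an unrelated command): ask the user again.
    return None


if __name__ == "__main__":
    history = {"Channel A": 3, "Channel B": 7}
    print(resolve_channel("change channel", history))
    print(resolve_channel("change channel to Channel A", history))
```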
The second identification module 20 is further configured to, when the local speech recognition component fails to recognize the voice stream but the third-party speech recognition component successfully recognizes the voice stream, recognize the semantics of the voice stream through a second speech engine corresponding to the third-party speech recognition component, denote the semantics recognized by the second speech engine as second semantics, and perform a corresponding operation according to the second semantics.
When the TV determines that the local speech recognition component has failed to recognize the voice stream but the third-party speech recognition component has successfully recognized it, that is, when a speech engine corresponding to the third-party speech recognition component can be found to recognize the semantics of the voice stream, the TV determines the second speech engine corresponding to the third-party speech recognition component, recognizes the semantics of the voice stream through the second speech engine, denotes the semantics recognized by the second speech engine as the second semantics, and performs the corresponding operation according to the second semantics.
When the TV cannot determine the specific operation to perform according to the semantics determined by the second speech engine, the operation it performs is similar to the one it performs when it cannot determine the specific operation from the semantics determined by the first speech engine, and is not repeated here.
The third-party speech recognition component includes multiple speech engines. In this embodiment, the multiple speech engines included in the third-party speech recognition component are denoted as the second speech engine.
Further, the speech recognition system also includes:
An output module, configured to output prompt information informing the user that recognition of the voice stream failed when both the local speech recognition component and the third-party speech recognition component fail to recognize the voice stream.
Further, when the TV determines that both the local speech recognition component and the third-party speech recognition component have failed to recognize the voice stream, the TV outputs prompt information informing the user that recognition of the voice stream failed. The ways in which the TV can inform the user of the failure include, but are not limited to, displaying a text prompt on the TV screen, outputting a corresponding voice prompt through the TV's built-in voice output device, or signaling with a warning light.
Further, in this embodiment, the first speech engine includes a language-switching vocabulary, a remote-control function vocabulary and a preset scene vocabulary. In other embodiments, the first speech engine may also include other vocabularies. For example, "hello" is set in the first speech engine as a language-switching word. When the first speech engine determines that the semantics of the voice stream received by the TV are "hello" in Chinese, the TV determines the semantics of subsequently received voice streams through the speech engine corresponding to Chinese; when the first speech engine determines that the semantics are "hello" in English, the TV determines the semantics of subsequently received voice streams through the speech engine corresponding to English; and when the first speech engine determines that the semantics are "hello" in Cantonese, the TV determines the semantics of subsequently received voice streams through the speech engine corresponding to Cantonese. The remote-control function vocabulary consists of words corresponding to common remote-control functions, such as increasing the volume, decreasing the volume and changing the channel. The preset scene vocabulary consists of words corresponding to the TV scenes the user commonly uses, such as weather and shopping. For example, when the first speech engine determines that the semantics of the voice stream received by the TV are "shopping", the TV determines the semantics of subsequently received voice streams through the speech engine corresponding to shopping. It should be noted that the TV stores in the first speech engine, through training in advance, semantic mapping tables corresponding to various voice streams.
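The vocabulary-driven engine switching described above might be sketched as follows in Python. The representation of semantics as a `(kind, value)` pair and all engine names are hypothetical choices made for this sketch; the patent only states that language-switching words and preset scene words select the engine used for subsequent voice streams.

```python
def select_followup_engine(current_engine, semantics, language_engines, scene_engines):
    """Choose the engine for subsequent voice streams.

    semantics is assumed to be a (kind, value) pair produced by the first
    speech engine, e.g. ("language_switch", "english") or ("scene", "shopping").
    """
    kind, value = semantics
    if kind == "language_switch":
        # A language-switching word selects the engine for that language.
        return language_engines.get(value, current_engine)
    if kind == "scene":
        # A preset scene word selects the engine for that scene.
        return scene_engines.get(value, current_engine)
    # Remote-control words (volume, channel) do not change the engine.
    return current_engine


if __name__ == "__main__":
    languages = {"english": "english_engine", "cantonese": "cantonese_engine"}
    scenes = {"shopping": "shopping_engine"}
    engine = "chinese_engine"
    engine = select_followup_engine(engine, ("language_switch", "english"), languages, scenes)
    print(engine)
```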
The second speech engine includes one default speech engine and multiple other speech engines. The default speech engine is the most commonly used speech engine in the third-party speech recognition component and can be set by the user as needed; for example, the Baidu speech engine can be set as the default speech engine. The other speech engines include, but are not limited to, the Ali speech engine and the iFlytek speech engine.
Further, in one implementation, when the local speech recognition component fails to recognize the voice stream, or the first speech engine fails to recognize the semantics of the voice stream, the TV outputs prompt information informing the user that recognition of the voice stream failed. Alternatively, when the local speech recognition component fails to recognize the voice stream, the TV recognizes the voice stream through the third-party speech recognition component, and when the first speech engine fails to recognize the semantics of the voice stream, the TV recognizes the semantics through the second speech engine. In that case, the TV outputs prompt information informing the user that recognition of the voice stream failed only when the third-party speech recognition component fails to recognize the voice stream, or the second speech engine fails to recognize its semantics.
In this embodiment, when a voice stream sent by a voice input device connected to the TV is received, the voice stream is recognized through the local speech recognition component and the third-party speech recognition component. When the local speech recognition component successfully recognizes the voice stream, the semantics of the voice stream are recognized through the first speech engine corresponding to the local speech recognition component, the semantics recognized by the first speech engine are denoted as the first semantics, and the corresponding operation is performed according to the first semantics. When the local speech recognition component fails to recognize the voice stream but the third-party speech recognition component successfully recognizes it, the semantics of the voice stream are recognized through the second speech engine corresponding to the third-party speech recognition component, the semantics recognized by the second speech engine are denoted as the second semantics, and the corresponding operation is performed according to the second semantics. In this way, multiple third-party speech engines are integrated on the TV in addition to the local first speech engine, which improves the success rate and flexibility of TV speech recognition.
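The overall decision flow summarized above can be written as a short Python sketch. The callables are hypothetical placeholders: `local_ok` and `third_party_ok` report whether each component recognized the stream, and the engines return semantics; none of these interfaces is specified by the patent.

```python
def handle_voice_stream(stream, local_ok, third_party_ok, first_engine, second_engine):
    """Route a voice stream to the first or second speech engine.

    Returns a (source, payload) pair, where source is "first", "second"
    or "prompt" (recognition failed and the user should be informed).
    """
    if local_ok(stream):
        # Local component succeeded: the first speech engine yields the first semantics.
        return "first", first_engine(stream)
    if third_party_ok(stream):
        # Only the third-party component succeeded: use the second speech engine.
        return "second", second_engine(stream)
    # Both components failed: output prompt information to the user.
    return "prompt", "voice stream recognition failed"


if __name__ == "__main__":
    result = handle_voice_stream(
        b"pcm-bytes",
        local_ok=lambda s: False,
        third_party_ok=lambda s: True,
        first_engine=lambda s: "change channel",
        second_engine=lambda s: "adjust the volume to 25",
    )
    print(result)
```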
Further, a second embodiment of the present invention is proposed based on the preferred embodiment of the speech recognition system of the present invention. Referring to Fig. 4, in this embodiment, the second identification module 20 includes:
A recognition unit 21, configured to recognize the semantics of the voice stream through the default speech engine in the second speech engine;
A determining unit 22, configured to determine the priority of the other speech engines in the second speech engine when the default speech engine fails to recognize the semantics of the voice stream;
The recognition unit 21 is further configured to recognize the semantics of the voice stream through the other speech engines in turn, in order of priority from high to low.
In this embodiment, the second speech engine includes one default speech engine and multiple other speech engines. While the TV recognizes the semantics of the voice stream through the second speech engine corresponding to the third-party speech recognition component, the TV first recognizes the semantics of the voice stream through the default speech engine in the second speech engine. When the default speech engine successfully recognizes the semantics of the voice stream, the TV performs the corresponding operation according to the semantics. When the default speech engine fails to recognize the semantics of the voice stream, the TV determines the priority of the other speech engines in the second speech engine and recognizes the semantics of the voice stream through the other speech engines in turn, in order of priority from high to low.
Further, the determining unit 22 is also configured to obtain the number of times each of the other speech engines in the second speech engine was used within a preset time; sort the usage counts in descending order to obtain a ranking result; and determine the priority of the other speech engines according to the ranking result.
Further, the priority of the other speech engines in the second speech engine is determined as follows: the TV obtains the number of times each of the other speech engines in the second speech engine was used within a preset time, and sorts the usage counts in descending order to obtain a ranking result. The TV then determines the priority of the other speech engines according to the ranking result; that is, a speech engine ranked earlier has a higher priority than a speech engine ranked later.
In this embodiment, when the semantics of the voice stream are recognized through the second speech engine corresponding to the third-party speech recognition component, the default speech engine in the second speech engine is tried first. When the default speech engine fails to recognize the semantics of the voice stream, the other speech engines recognize the semantics of the voice stream in order of their priority. This improves the efficiency of TV speech recognition while also improving its success rate.
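The cooperation of the recognition unit and the determining unit can be sketched as a fallback chain in Python. The engine callables (returning semantics or `None` on failure) and the precomputed `priority` list are hypothetical; how the real engines report failure is not stated in the patent.

```python
def recognize_with_fallback(stream, default_engine, other_engines, priority):
    """Try the default engine first, then the others from high to low priority.

    default_engine -- callable returning semantics, or None on failure
    other_engines  -- dict mapping engine name to such a callable
    priority       -- engine names ordered from highest to lowest priority
    """
    semantics = default_engine(stream)
    if semantics is not None:
        return semantics
    for name in priority:
        # Stop at the first engine that manages to recognize the semantics.
        semantics = other_engines[name](stream)
        if semantics is not None:
            return semantics
    # Every engine failed; the caller may then prompt the user.
    return None


if __name__ == "__main__":
    others = {
        "engine_a": lambda s: None,
        "engine_b": lambda s: "adjust the volume to 25",
    }
    print(recognize_with_fallback(b"pcm-bytes", lambda s: None, others, ["engine_a", "engine_b"]))
```

Trying the most frequently used engines first means that, on average, fewer engines are consulted before one succeeds, which is the efficiency argument made for this embodiment.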
It should be noted that, herein, the terms "include", "comprise" and any other variants thereof are intended to cover a non-exclusive inclusion, so that a process, method, article or system that includes a series of elements includes not only those elements but also other elements not expressly listed, or elements inherent to such a process, method, article or system. Without further limitation, an element defined by the phrase "including a ..." does not exclude the presence of other identical elements in the process, method, article or system that includes the element.
The above embodiments of the present invention are for description only and do not indicate the relative merits of the embodiments.
From the above description of the embodiments, those skilled in the art can clearly understand that the methods of the above embodiments can be implemented by software plus a necessary general-purpose hardware platform, and of course also by hardware, although in many cases the former is the better implementation. Based on this understanding, the technical solution of the present invention, in essence or in the part that contributes to the prior art, can be embodied in the form of a software product. The computer software product is stored in a storage medium (such as ROM/RAM, a magnetic disk or an optical disc) and includes several instructions that cause a terminal device (which may be a mobile phone, a computer, a server, an air conditioner, a network device or the like) to perform the methods described in the embodiments of the present invention.
The above are only preferred embodiments of the present invention and do not thereby limit the patent scope of the present invention. Any equivalent structure or equivalent process transformation made using the contents of the specification and drawings of the present invention, or any direct or indirect application in other related technical fields, is likewise included within the scope of patent protection of the present invention.

Claims (10)

1. A speech recognition method, characterized in that the speech recognition method comprises the following steps:
when a voice stream sent by a voice input device connected to a TV is received, recognizing the voice stream through a local speech recognition component and a third-party speech recognition component;
when the local speech recognition component successfully recognizes the voice stream, recognizing the semantics of the voice stream through a first speech engine corresponding to the local speech recognition component, denoting the semantics recognized by the first speech engine as first semantics, and performing a corresponding operation according to the first semantics;
when the local speech recognition component fails to recognize the voice stream but the third-party speech recognition component successfully recognizes the voice stream, recognizing the semantics of the voice stream through a second speech engine corresponding to the third-party speech recognition component, denoting the semantics recognized by the second speech engine as second semantics, and performing a corresponding operation according to the second semantics.
2. The speech recognition method according to claim 1, characterized in that the first speech engine includes a language-switching vocabulary, a remote-control function vocabulary and a preset scene vocabulary;
the second speech engine includes one default speech engine and multiple other speech engines.
3. The speech recognition method according to claim 2, characterized in that the step of recognizing the semantics of the voice stream through the second speech engine corresponding to the third-party speech recognition component comprises:
recognizing the semantics of the voice stream through the default speech engine in the second speech engine;
when the default speech engine fails to recognize the semantics of the voice stream, determining the priority of the other speech engines in the second speech engine;
recognizing the semantics of the voice stream through the other speech engines in turn, in order of priority from high to low.
4. The speech recognition method according to claim 3, characterized in that the step of determining the priority of the other speech engines in the second speech engine comprises:
obtaining the number of times each of the other speech engines in the second speech engine was used within a preset time;
sorting the usage counts in descending order to obtain a ranking result;
determining the priority of the other speech engines according to the ranking result.
5. The speech recognition method according to any one of claims 1 to 4, characterized in that, after the step of recognizing the voice stream through the local speech recognition component and the third-party speech recognition component when the voice stream sent by the voice input device connected to the TV is received, the method further comprises:
when both the local speech recognition component and the third-party speech recognition component fail to recognize the voice stream, outputting prompt information informing the user that recognition of the voice stream failed.
6. A speech recognition system, characterized in that the speech recognition system comprises:
a first identification module, configured to, when a voice stream sent by a voice input device connected to a TV is received, recognize the voice stream through a local speech recognition component and a third-party speech recognition component;
a second identification module, configured to, when the local speech recognition component successfully recognizes the voice stream, recognize the semantics of the voice stream through a first speech engine corresponding to the local speech recognition component, denote the semantics recognized by the first speech engine as first semantics, and perform a corresponding operation according to the first semantics; and, when the local speech recognition component fails to recognize the voice stream but the third-party speech recognition component successfully recognizes the voice stream, recognize the semantics of the voice stream through a second speech engine corresponding to the third-party speech recognition component, denote the semantics recognized by the second speech engine as second semantics, and perform a corresponding operation according to the second semantics.
7. The speech recognition system according to claim 6, characterized in that the first speech engine includes a language-switching vocabulary, a remote-control function vocabulary and a preset scene vocabulary;
the second speech engine includes one default speech engine and multiple other speech engines.
8. The speech recognition system according to claim 7, characterized in that the second identification module comprises:
a recognition unit, configured to recognize the semantics of the voice stream through the default speech engine in the second speech engine;
a determining unit, configured to determine the priority of the other speech engines in the second speech engine when the default speech engine fails to recognize the semantics of the voice stream;
the recognition unit is further configured to recognize the semantics of the voice stream through the other speech engines in turn, in order of priority from high to low.
9. The speech recognition system according to claim 8, characterized in that the determining unit is further configured to obtain the number of times each of the other speech engines in the second speech engine was used within a preset time; sort the usage counts in descending order to obtain a ranking result; and determine the priority of the other speech engines according to the ranking result.
10. The speech recognition system according to any one of claims 6 to 9, characterized in that the speech recognition system further comprises an output module, configured to output prompt information informing the user that recognition of the voice stream failed when both the local speech recognition component and the third-party speech recognition component fail to recognize the voice stream.

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201611135597.6A CN106782561A (en) 2016-12-09 2016-12-09 Audio recognition method and system

Publications (1)

Publication Number Publication Date
CN106782561A true CN106782561A (en) 2017-05-31

Family

ID=58879830

Country Status (1)

Country Link
CN (1) CN106782561A (en)

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20170531
RJ01 Rejection of invention patent application after publication