CN110299136A - Speech recognition processing method and system - Google Patents

Speech recognition processing method and system

Info

Publication number
CN110299136A
CN110299136A (application CN201810240495.3A)
Authority
CN
China
Prior art keywords
engine
speech recognition
speech
confidence level
meaning
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201810240495.3A
Other languages
Chinese (zh)
Inventor
刘根华
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shanghai Qinggan Intelligent Technology Co Ltd
Original Assignee
Shanghai Qinggan Intelligent Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shanghai Qinggan Intelligent Technology Co Ltd filed Critical Shanghai Qinggan Intelligent Technology Co Ltd
Priority to CN201810240495.3A priority Critical patent/CN110299136A/en
Publication of CN110299136A publication Critical patent/CN110299136A/en
Pending legal-status Critical Current

Classifications

    • G — PHYSICS
    • G10 — MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L — SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 — Speech recognition
    • G10L15/08 — Speech classification or search
    • G10L15/18 — Speech classification or search using natural language modelling
    • G10L15/1822 — Parsing for meaning understanding
    • G10L15/183 — Using context dependencies, e.g. language models
    • G10L15/19 — Grammatical context, e.g. disambiguation of the recognition hypotheses based on word sequence rules
    • G10L15/197 — Probabilistic grammars, e.g. word n-grams
    • G10L15/22 — Procedures used during a speech recognition process, e.g. man-machine dialogue
    • G10L15/26 — Speech to text systems
    • G10L2015/221 — Announcement of recognition results
    • G10L2015/226 — Procedures using non-speech characteristics
    • G10L2015/228 — Procedures using non-speech characteristics of application context

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Computational Linguistics (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Human Computer Interaction (AREA)
  • Acoustics & Sound (AREA)
  • Multimedia (AREA)
  • Artificial Intelligence (AREA)
  • Probability & Statistics with Applications (AREA)
  • Telephonic Communication Services (AREA)

Abstract

The present invention provides a speech recognition processing method, comprising: obtaining a voice input of a user; judging the user's usage scenario based on the user's current application interface; determining the speech recognition engine to use based on the usage scenario; recognizing the voice input with the speech recognition engine to obtain a speech-intent result; selecting the speech-intent result with the highest confidence based on a preset selection strategy; and outputting the highest-confidence speech-intent result. The present invention also provides a speech recognition processing system that implements the method. With the provided method and system, speech recognition results become more targeted and more accurate, and recognition response speed is improved.

Description

Speech recognition processing method and system
Technical field
The present invention relates to speech recognition technology, and more particularly to a speech recognition processing method and system.
Background technique
With the proliferation of smart devices, human-computer interaction is developing toward ever more convenient forms. Compared with typing, mouse, or touch-screen control, voice interaction is a more convenient mode: enabling machines to understand human language and respond allows them to serve people better.
However, current voice interaction still suffers from slow response and inaccurate recognition, which degrades the user's speech recognition experience. A speech recognition processing method is therefore needed that strengthens the system's ability to judge and dispatch recognition requests, so as to improve recognition response speed and accuracy.
Summary of the invention
A brief summary of one or more aspects is given below to provide a basic understanding of these aspects. This summary is not an extensive overview of all contemplated aspects, and is intended neither to identify key or critical elements of all aspects nor to delineate the scope of any or all aspects. Its sole purpose is to present some concepts of one or more aspects in a simplified form as a prelude to the more detailed description that follows.
To address the slow processing response and limited accuracy of existing speech recognition, the present invention provides a speech recognition processing method, comprising: obtaining a voice input of a user; judging the user's usage scenario based on the user's current application interface; determining the speech recognition engine to use based on the usage scenario; recognizing the voice input with the speech recognition engine to obtain a speech-intent result; selecting the speech-intent result with the highest confidence based on a preset selection strategy; and outputting the highest-confidence speech-intent result.
Optionally, judging the user's usage scenario further comprises: judging the usage scenario to be a generic scenario when the user is not currently in a specific application interface. Determining the speech recognition engine further comprises: determining, based on the usage scenario being generic, that multiple speech recognition engines are to be used. The recognizing step further comprises: recognizing the voice input with the multiple speech recognition engines to obtain multiple speech-intent results.
Optionally, selecting a speech-intent result based on the preset selection strategy further comprises: determining the semantic category of the voice input based on the multiple speech-intent results; determining, based on a preset engine-to-semantic-category confidence table, the speech recognition engine with the highest confidence under that semantic category; and selecting the speech-intent result recognized by the highest-confidence engine as the highest-confidence speech-intent result.
Optionally, determining the semantic category of the voice input further comprises: determining the semantic category of each speech-intent result; computing, based on the engine-to-semantic-category confidence table, the total confidence of the voice input under each semantic category; and determining the semantic category of the voice input to be the category with the highest confidence total.
Optionally, selecting a speech-intent result based on the preset selection strategy further comprises: judging, in order of preset engine weights, the matching degree between the speech-intent result recognized by each speech recognition engine and that engine; and, in response to a speech-intent result matching its speech recognition engine, selecting the speech-intent result recognized by that engine as the highest-confidence speech-intent result.
Optionally, judging the matching degree further comprises: determining the semantic category of the speech-intent result currently being judged; determining, based on the engine-to-semantic-category confidence table, the recognition confidence of the engine currently being judged under that semantic category; and deeming the speech-intent result to match the speech recognition engine when the recognition confidence is higher than a preset matching confidence threshold.
Optionally, judging the user's usage scenario further comprises: judging the usage scenario to be a specific scenario when the user is currently in a specific application interface. Determining the speech recognition engine further comprises: determining, based on the usage scenario being specific, that the engine to be used is the dedicated speech recognition engine corresponding to the specific scenario. Selecting a speech-intent result based on the preset selection strategy further comprises: in response to the dedicated engine recognizing a speech-intent result, selecting that result as the highest-confidence speech-intent result.
Optionally, the specific application interfaces include a voice input interface, a voice wake-up interface, a navigation address input interface, and a contact selection interface.
Optionally, the speech recognition engines include online engines and/or offline engines, and the method further comprises: obtaining the user's current network state; determining the speech recognition engine to use then further comprises determining the engine based on both the usage scenario and the network state.
The present invention also provides a speech recognition processing system, comprising: an acquisition module, a judgment module, a selection module, an output module, and a speech recognition engine. The acquisition module obtains the user's voice input. The judgment module judges the user's usage scenario based on the user's current application interface and determines the speech recognition engine to use based on that scenario. The speech recognition engine recognizes the voice input to obtain a speech-intent result. The selection module selects the speech-intent result with the highest confidence based on a preset selection strategy. The output module outputs the highest-confidence speech-intent result.
Optionally, the judgment module judges the usage scenario to be a generic scenario when the user is not currently in a specific application interface. The judgment module's determination of the speech recognition engine further comprises: determining, based on the usage scenario being generic, that multiple speech recognition engines are to be used. Recognition then further comprises: the multiple engines recognizing the voice input to obtain multiple speech-intent results.
Optionally, the system further includes a preset engine-to-semantic-category confidence table. The judgment module further determines the semantic category of the voice input based on the multiple speech-intent results. The selection module's selection of a speech-intent result further comprises: determining, based on the engine-to-semantic-category confidence table, the engine with the highest confidence under that semantic category; and selecting the speech-intent result recognized by that engine as the highest-confidence speech-intent result.
Optionally, the judgment module determines the semantic category of the voice input by: determining the semantic category of each speech-intent result; computing, based on the engine-to-semantic-category confidence table, the total confidence of the voice input under each semantic category; and determining the semantic category of the voice input to be the category with the highest confidence total.
Optionally, the system further includes a preset engine weight table, and the judgment module further judges, in order of the engine weights in that table, the matching degree between the speech-intent result recognized by each speech recognition engine and that engine. The selection module's selection further comprises: in response to a speech-intent result matching its engine, selecting the speech-intent result recognized by that engine as the highest-confidence speech-intent result.
Optionally, the system further includes a preset engine-to-semantic-category confidence table, and the judgment module judges the matching degree by: determining the semantic category of the speech-intent result currently being judged; determining, based on the table, the recognition confidence of the engine currently being judged under that semantic category; and deeming the speech-intent result to match the engine when the recognition confidence is higher than a preset matching confidence threshold.
Optionally, the judgment module judges the usage scenario to be a specific scenario when the user is currently in a specific application interface. The judgment module's determination of the speech recognition engine further comprises: determining, based on the usage scenario being specific, that the engine to be used is the dedicated speech recognition engine corresponding to the specific scenario. The selection module's selection further comprises: in response to the dedicated engine recognizing a speech-intent result, selecting that result as the highest-confidence speech-intent result.
Optionally, the specific application interfaces include a voice input interface, a voice wake-up interface, a navigation address input interface, and a contact selection interface.
Optionally, the speech recognition engines include online engines and/or offline engines; the acquisition module further obtains the user's current network state; and the judgment module determines the speech recognition engine to use based on both the usage scenario and the network state.
The present invention also provides a computer device comprising a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein the processor, when executing the computer program, implements the steps of the speech recognition processing method provided by the present invention.
With the speech recognition processing method provided by the present invention, the user's usage scenario can be judged and a speech recognition engine relevant to that scenario selected, reducing the time spent on recognition. Combined with the preset selection strategy, the recognized result carries higher confidence and is more targeted and more accurate, effectively improving the user's speech recognition experience.
Detailed description of the invention
Fig. 1 shows a flowchart of one embodiment of the method provided by the present invention.
Fig. 2 shows an example of the preset engine-to-semantic-category confidence table used by the method provided by the present invention.
Specific embodiment
The present invention is described in detail below with reference to the drawings and specific embodiments. Note that the aspects described below in connection with the drawings and specific embodiments are merely exemplary and should not be construed as limiting the protection scope of the present invention.
To address the slow processing response and limited accuracy of existing speech recognition, the present invention provides a speech recognition processing method. Fig. 1 shows a flowchart of one embodiment of the method. As shown in Fig. 1, the method comprises: step 110, obtaining the user's voice input; step 120, judging the user's usage scenario based on the user's current application interface; step 130, determining the speech recognition engine to use based on the usage scenario; step 140, recognizing the voice input with the speech recognition engine to obtain a speech-intent result; step 150, selecting the speech-intent result with the highest confidence based on a preset selection strategy; and step 160, outputting the highest-confidence speech-intent result.
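By way of illustration only, the sketch below arranges steps 110-160 as a single dispatch function. The function name, parameter names, and the assumed per-engine result structure (an intent with a semantic category and a confidence) are assumptions made for this sketch and are not taken from the patent text.

```python
def process_voice_input(audio, current_interface, scene_engines, generic_engines, select_best):
    """Steps 110-160: capture audio, judge the scenario, dispatch to engines,
    pick the highest-confidence speech intent, and return it."""
    # Step 120: judge the usage scenario from the current application interface.
    scenario = current_interface if current_interface in scene_engines else "generic"

    # Step 130: pick engines -- one dedicated engine for a specific scenario,
    # the broad-coverage engine set for the generic scenario.
    engines = [scene_engines[scenario]] if scenario != "generic" else generic_engines

    # Step 140: each engine returns a speech-intent result
    # (assumed here to carry "intent", "category", and "confidence" fields).
    results = [engine.recognize(audio) for engine in engines]

    # Steps 150-160: apply the preset selection strategy and output the winner.
    return select_best(results)
```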
Specifically, in step 120 the user's usage scenario is judged by checking whether the user is currently in a specific application interface. The specific application interface may be any preset interface, such as a voice input interface, a voice wake-up interface, a navigation address input interface, or a contact selection interface.
In one embodiment, step 120 determines that the user is not in a specific application interface, so the usage scenario is judged to be a generic scenario. In this embodiment, step 130 determines the speech recognition engines to use based on the usage scenario: because the scenario is generic, multiple speech recognition engines are selected, namely a preset set of broad-coverage engines that do not correspond to any specific application scenario. Step 140 then recognizes the voice input with the multiple engines to obtain multiple speech-intent results, after which step 150 selects the highest-confidence speech-intent result based on the preset selection strategy and step 160 outputs it.
In the above embodiment, step 150 further comprises determining the semantic category of the voice input based on the multiple speech-intent results, then determining, according to the preset engine-to-semantic-category confidence table, the speech recognition engine with the highest confidence under that category. The speech-intent result recognized by that highest-confidence engine is taken as the highest-confidence speech-intent result and is finally output.
Further, determining the semantic category of the voice input in step 150 may comprise: determining the semantic category of each of the multiple speech-intent results; computing, according to the preset engine-to-semantic-category confidence table, the total confidence of the voice input under each semantic category; and determining the category with the highest confidence total to be the semantic category of the voice input.
Fig. 2 shows an example of the preset engine-to-semantic-category confidence table. The processing of the above embodiment is illustrated below with reference to Fig. 2. As shown in Fig. 2, when the user is judged not to be in a specific application scenario, multiple recognition engines A, B, C, and D are used to recognize the voice input under the generic scenario. The semantic category of each speech-intent result is determined: for example, after recognition, engine A's result belongs to semantic category 5 with confidence 9; engine B's result belongs to category 7 with confidence 7; engine C's result belongs to category 5 with confidence 10; and engine D's result belongs to category 3 with confidence 7. The total confidence of the voice input is therefore 7 under category 3, 19 under category 5, and 7 under category 7. Category 5 has the highest confidence total, so the recognized voice input is judged to belong to semantic category 5. After the category is determined, the engine with the highest confidence score in category 5 is found to be engine C, so the speech-intent result recognized by engine C is selected and output as the result of the voice input.
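A minimal sketch of this selection strategy is given below, using the Fig. 2 figures quoted above (engine A: category 5, confidence 9; engine B: category 7, confidence 7; engine C: category 5, confidence 10; engine D: category 3, confidence 7). The data layout and variable names are assumptions made for illustration only.

```python
from collections import defaultdict

# (engine, semantic category, confidence) for each speech-intent result,
# mirroring the Fig. 2 example values.
results = [
    ("A", 5, 9),
    ("B", 7, 7),
    ("C", 5, 10),
    ("D", 3, 7),
]

# Sum the confidence of the voice input under each semantic category.
totals = defaultdict(int)
for engine, category, confidence in results:
    totals[category] += confidence           # category 3 -> 7, 5 -> 19, 7 -> 7

# The voice input belongs to the category with the highest confidence total.
best_category = max(totals, key=totals.get)   # -> 5

# Within that category, pick the engine with the highest confidence.
best_engine, _, _ = max(
    (r for r in results if r[1] == best_category), key=lambda r: r[2]
)                                             # -> engine C

print(best_category, best_engine)             # 5 C
```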
With the method provided by the present invention, the user's usage scenario is judged in advance so that suitable recognition engines can be selected, and under the generic scenario the recognition accuracy is further improved by classifying the speech-intent results and judging their confidence.
For the case where step 120 judges that the user is not in a specific application interface, there is also another embodiment in which the usage scenario is likewise judged to be generic. Because the scenario is generic, multiple speech recognition engines are selected, namely a preset set of broad-coverage engines that do not correspond to any specific application scenario. In this embodiment, step 140 recognizes the voice input with the multiple engines to obtain multiple speech-intent results, then step 150 selects the highest-confidence speech-intent result based on the preset selection strategy and step 160 outputs it.
In this embodiment, selecting the highest-confidence speech-intent result based on the preset selection strategy in step 150 further comprises: judging, in order of preset engine weights, the matching degree between the speech-intent result recognized by each speech recognition engine and that engine; and, in response to a speech-intent result matching its engine, selecting the speech-intent result recognized by that engine as the highest-confidence speech-intent result.
Further, judging the matching degree comprises: in order of the preset engine weights, determining the semantic category of the speech-intent result recognized by the engine currently being judged; determining, based on the engine-to-semantic-category confidence table, that engine's recognition confidence under the category being judged; and, when the recognition confidence is greater than or equal to the preset matching confidence threshold, deeming the speech-intent result to match the speech recognition engine.
The processing of this embodiment is again illustrated with reference to the engine-to-semantic-category confidence table of Fig. 2. As shown in Fig. 2, when the user is judged not to be in a specific application scenario, recognition engines A, B, C, and D recognize the voice input under the generic scenario. In this example the preset engine weights rank the engines in the order A, B, C, D, and the preset matching confidence threshold is 8. Engine A is judged first: its speech-intent result belongs to category 5 with confidence 9. Since the confidence 9 exceeds the matching threshold 8, engine A is deemed to match its speech-intent result; the matching degree of the other engines is not judged, and engine A's speech-intent result is output directly.
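The weight-ordered matching strategy of this embodiment can be sketched as follows. The engine order A, B, C, D, the matching confidence threshold of 8, and the per-engine categories and confidences come from the example above; everything else is an illustrative assumption.

```python
ENGINE_WEIGHT_ORDER = ["A", "B", "C", "D"]   # preset recognition-engine weights
MATCHING_CONFIDENCE = 8                      # preset matching confidence threshold

# (semantic category, confidence) per engine, as read from the confidence table.
results = {"A": (5, 9), "B": (7, 7), "C": (5, 10), "D": (3, 7)}

selected = None
for engine in ENGINE_WEIGHT_ORDER:           # judge engines in weight order
    category, confidence = results[engine]
    if confidence >= MATCHING_CONFIDENCE:    # the result matches its engine
        selected = engine                    # engine A matches first (9 >= 8) ...
        break                                # ... so the remaining engines are skipped

print(selected)                              # A
```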
With the method provided by the present invention, the usage scenario is judged in advance so that suitable recognition engines can be selected, and under the generic scenario accuracy is further improved by classifying the speech-intent results and judging their confidence. Moreover, by presetting weights for the recognition engines and setting a suitable matching confidence threshold, recognition accuracy is preserved while the response time is shortened.
When step 120 judges, based on the current application interface, that the user is in a specific application interface, for example a preset interface such as a voice input interface, a voice wake-up interface, a navigation address input interface, or a contact selection interface, step 130 further comprises selecting the dedicated speech recognition engine corresponding to that specific scenario. Because the user is in a specific usage scenario, such as the navigation address input interface, the voice input can be recognized by a preset dedicated engine that is well suited to recognizing address-type data. As long as the dedicated engine produces a speech-intent result, that result is considered high-confidence by default and can be output accordingly.
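One possible way to express the mapping from preset specific interfaces to dedicated engines is sketched below. The interface keys follow the interfaces listed above, while the engine names and the helper function are hypothetical.

```python
DEDICATED_ENGINES = {
    "voice_input_interface":    "general_dictation_engine",
    "voice_wakeup_interface":   "wakeup_command_engine",
    "navigation_address_input": "address_recognition_engine",  # suited to address-type data
    "contact_selection":        "contact_name_engine",
}

def pick_engine(current_interface):
    # A dedicated engine is used when the interface is a preset specific scenario;
    # otherwise the generic multi-engine path described earlier applies.
    return DEDICATED_ENGINES.get(current_interface, "generic_multi_engine_path")
```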
With the method provided by the present invention, by presetting specific usage scenarios and selecting dedicated speech recognition engines for them, the recognition results for those scenarios become more accurate. Once the user's specific usage scenario is detected, the dedicated engine can be invoked quickly, effectively improving efficiency.
More specifically, in the method provided by the present invention, the recognition engines include online engines and/or offline engines, and at least an online engine. The method further comprises obtaining the user's current network state: if the network state is online, an online engine is preferred for speech recognition according to the method above; if the network state is offline, an offline engine is selected for speech recognition according to the method above.
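A brief sketch of combining the usage scenario with the network state is given below; the helper function and the is_online_engine attribute are assumptions made for illustration only.

```python
def choose_engines(scenario_engines, network_is_online):
    """Split the engines chosen for the usage scenario by deployment type and
    prefer online engines when the network is available, else fall back offline."""
    online = [e for e in scenario_engines if e.is_online_engine]
    offline = [e for e in scenario_engines if not e.is_online_engine]
    return online if network_is_online and online else offline
```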
The present invention also provides a speech recognition processing system comprising an acquisition module, a judgment module, a selection module, an output module, and a speech recognition engine. The acquisition module obtains the user's voice input; the judgment module judges the user's usage scenario based on the user's current application interface and determines the speech recognition engine to use based on that scenario; the speech recognition engine recognizes the voice input to obtain a speech-intent result; the selection module selects the highest-confidence speech-intent result based on a preset selection strategy; and the output module outputs the highest-confidence speech-intent result.
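One possible decomposition of such a system into these modules is sketched below; the class and method names are assumptions, and only the module roles come from the description above.

```python
class SpeechProcessingSystem:
    def __init__(self, acquisition, judgment, selection, output):
        self.acquisition = acquisition   # obtains the voice input (and network state)
        self.judgment = judgment         # judges the usage scenario, picks engines
        self.selection = selection       # applies the preset selection strategy
        self.output = output             # outputs the highest-confidence intent

    def handle(self):
        audio = self.acquisition.get_voice_input()
        engines = self.judgment.choose_engines(self.acquisition.current_interface())
        results = [engine.recognize(audio) for engine in engines]
        best = self.selection.select(results)
        self.output.emit(best)
```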
More specifically, the working modes of the acquisition module, judgment module, selection module, output module, and speech recognition engine included in the above processing system are described in the method portion of the present invention and are not repeated here.
With the speech recognition processing system provided by the present invention, the user's usage scenario can be judged and a speech recognition engine relevant to that scenario selected, reducing the time spent on recognition. Combined with the preset selection strategy, the recognized result carries higher confidence and is more targeted and more accurate, effectively improving the user's speech recognition experience.
The present invention also provides a computer device comprising a memory, a processor, and a computer program stored in the memory and executable on the processor; the processor, when executing the computer program, implements the steps of the method above. For the specific implementation and technical effects of the computer device, reference may be made to the embodiments of the speech recognition processing method above, which are not repeated here.
Those skilled in the art will further appreciate that the various illustrative logical blocks, modules, circuits, and algorithm steps described in connection with the embodiments disclosed herein may be implemented as electronic hardware, computer software, or combinations of both. To clearly illustrate this interchangeability of hardware and software, various illustrative components, blocks, modules, circuits, and steps have been described above generally in terms of their functionality. Whether such functionality is implemented as hardware or software depends upon the particular application and the design constraints imposed on the overall system. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present invention.
The various illustrative logical blocks, modules, and circuits described in connection with the embodiments disclosed herein may be implemented or performed with a general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or other programmable logic device, discrete gate or transistor logic, discrete hardware components, or any combination thereof designed to perform the functions described herein. A general-purpose processor may be a microprocessor, but in the alternative, the processor may be any conventional processor, controller, microcontroller, or state machine. A processor may also be implemented as a combination of computing devices, for example a combination of a DSP and a microprocessor, a plurality of microprocessors, one or more microprocessors in conjunction with a DSP core, or any other such configuration.
The steps of a method or algorithm described in connection with the embodiments disclosed herein may be embodied directly in hardware, in a software module executed by a processor, or in a combination of the two. A software module may reside in RAM memory, flash memory, ROM memory, EPROM memory, EEPROM memory, registers, a hard disk, a removable disk, a CD-ROM, or any other form of storage medium known in the art. An exemplary storage medium is coupled to the processor such that the processor can read information from, and write information to, the storage medium. In the alternative, the storage medium may be integral to the processor. The processor and the storage medium may reside in an ASIC, and the ASIC may reside in a user terminal. In the alternative, the processor and the storage medium may reside as discrete components in a user terminal.
In one or more exemplary embodiments, the functions described may be implemented in hardware, software, firmware, or any combination thereof. If implemented in software as a computer program product, the functions may be stored on or transmitted over a computer-readable medium as one or more instructions or code. Computer-readable media include both computer storage media and communication media, including any medium that facilitates transfer of a computer program from one place to another. A storage medium may be any available medium that can be accessed by a computer. By way of example, and not limitation, such computer-readable media can comprise RAM, ROM, EEPROM, CD-ROM or other optical disk storage, magnetic disk storage or other magnetic storage devices, or any other medium that can be used to carry or store desired program code in the form of instructions or data structures and that can be accessed by a computer. Any connection is also properly termed a computer-readable medium. For example, if the software is transmitted from a website, server, or other remote source using a coaxial cable, fiber optic cable, twisted pair, digital subscriber line (DSL), or wireless technologies such as infrared, radio, and microwave, then the coaxial cable, fiber optic cable, twisted pair, DSL, or the wireless technologies such as infrared, radio, and microwave are included in the definition of medium. Disk and disc, as used herein, include compact disc (CD), laser disc, optical disc, digital versatile disc (DVD), floppy disk, and Blu-ray disc, where disks usually reproduce data magnetically, while discs reproduce data optically with lasers. Combinations of the above should also be included within the scope of computer-readable media.
The previous description of the disclosure is provided to enable any person skilled in the art to make or use the disclosure. Various modifications to the disclosure will be readily apparent to those skilled in the art, and the generic principles defined herein may be applied to other variations without departing from the spirit or scope of the disclosure. Thus, the disclosure is not intended to be limited to the examples and designs described herein, but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.

Claims (19)

1. A speech recognition processing method, comprising:
obtaining a voice input of a user;
judging the usage scenario of the user based on the user's current application interface;
determining the speech recognition engine to use based on the usage scenario;
recognizing the voice input with the speech recognition engine to obtain a speech-intent result;
selecting the speech-intent result with the highest confidence based on a preset selection strategy; and
outputting the highest-confidence speech-intent result.
2. The processing method of claim 1, wherein judging the usage scenario of the user further comprises: judging the usage scenario of the user to be a generic scenario based on the user not currently being in a specific application interface;
determining the speech recognition engine further comprises: determining, based on the usage scenario being a generic scenario, that multiple speech recognition engines are to be used; and
the recognizing step further comprises: recognizing the voice input with the multiple speech recognition engines to obtain multiple speech-intent results.
3. The processing method of claim 2, wherein selecting a speech-intent result based on the preset selection strategy further comprises:
determining the semantic category of the voice input based on the multiple speech-intent results;
determining, based on a preset engine-to-semantic-category confidence table, the speech recognition engine with the highest confidence under that semantic category; and
selecting the speech-intent result recognized by the highest-confidence speech recognition engine as the highest-confidence speech-intent result.
4. The processing method of claim 3, wherein determining the semantic category of the voice input further comprises:
determining the semantic category of each speech-intent result;
computing, based on the engine-to-semantic-category confidence table, the total confidence of the voice input under each semantic category; and
determining the semantic category of the voice input to be the semantic category with the highest confidence total.
5. The processing method of claim 2, wherein selecting a speech-intent result based on the preset selection strategy further comprises:
judging, in order of preset engine weights, the matching degree between the speech-intent result recognized by each speech recognition engine and that engine; and
in response to a speech-intent result matching its speech recognition engine, selecting the speech-intent result recognized by that engine as the highest-confidence speech-intent result.
6. The processing method of claim 5, wherein judging the matching degree further comprises:
determining the semantic category of the speech-intent result currently being judged;
determining, based on an engine-to-semantic-category confidence table, the recognition confidence of the speech recognition engine currently being judged under that semantic category; and
the speech-intent result matching the speech recognition engine comprises the recognition confidence being higher than a preset matching confidence threshold.
7. The processing method of claim 1, wherein judging the usage scenario of the user further comprises: judging the usage scenario of the user to be a specific scenario based on the user currently being in a specific application interface;
determining the speech recognition engine further comprises: determining, based on the usage scenario being a specific scenario, that the speech recognition engine to be used is the dedicated speech recognition engine corresponding to the specific scenario; and
selecting a speech-intent result based on the preset selection strategy further comprises: in response to the dedicated speech recognition engine recognizing a speech-intent result, selecting that speech-intent result as the highest-confidence speech-intent result.
8. The processing method of claim 7, wherein the specific application interface comprises the user being in a voice input interface, a voice wake-up interface, a navigation address input interface, or a contact selection interface.
9. The processing method of any one of claims 1-8, wherein the speech recognition engine comprises an online engine and/or an offline engine, and the method further comprises:
obtaining the current network state of the user; and
determining the speech recognition engine to use further comprises determining the speech recognition engine based on the usage scenario and the network state.
10. A speech recognition processing system, comprising: an acquisition module, a judgment module, a selection module, an output module, and a speech recognition engine; wherein
the acquisition module is configured to obtain a voice input of a user;
the judgment module is configured to judge the usage scenario of the user based on the user's current application interface, and to determine the speech recognition engine to use based on the usage scenario;
the speech recognition engine is configured to recognize the voice input to obtain a speech-intent result;
the selection module is configured to select the speech-intent result with the highest confidence based on a preset selection strategy; and
the output module is configured to output the highest-confidence speech-intent result.
11. The processing system of claim 10, wherein the judgment module judges the usage scenario of the user to be a generic scenario based on the user not currently being in a specific application interface;
the judgment module's determination of the speech recognition engine further comprises: determining, based on the usage scenario being a generic scenario, that multiple speech recognition engines are to be used; and
the recognition by the speech recognition engine further comprises: the multiple speech recognition engines recognizing the voice input to obtain multiple speech-intent results.
12. The processing system of claim 11, further comprising a preset engine-to-semantic-category confidence table; wherein
the judgment module is further configured to determine the semantic category of the voice input based on the multiple speech-intent results; and
the selection module's selection of a speech-intent result further comprises:
determining, based on the engine-to-semantic-category confidence table, the speech recognition engine with the highest confidence under that semantic category; and
selecting the speech-intent result recognized by the highest-confidence speech recognition engine as the highest-confidence speech-intent result.
13. The processing system of claim 12, wherein the judgment module's determination of the semantic category of the voice input further comprises:
determining the semantic category of each speech-intent result;
computing, based on the engine-to-semantic-category confidence table, the total confidence of the voice input under each semantic category; and
determining the semantic category of the voice input to be the semantic category with the highest confidence total.
14. The processing system of claim 11, further comprising a preset engine weight table, wherein
the judgment module is further configured to judge, in order of the engine weights in the engine weight table, the matching degree between the speech-intent result recognized by each speech recognition engine and that engine; and
the selection module's selection of a speech-intent result further comprises: in response to a speech-intent result matching its speech recognition engine, selecting the speech-intent result recognized by that engine as the highest-confidence speech-intent result.
15. The processing system of claim 14, further comprising a preset engine-to-semantic-category confidence table, wherein the judgment module's judgment of the matching degree further comprises:
determining the semantic category of the speech-intent result currently being judged;
determining, based on the engine-to-semantic-category confidence table, the recognition confidence of the speech recognition engine currently being judged under that semantic category; and
the speech-intent result matching the speech recognition engine comprises the recognition confidence being higher than a preset matching confidence threshold.
16. The processing system of claim 10, wherein the judgment module judges the usage scenario of the user to be a specific scenario based on the user currently being in a specific application interface;
the judgment module's determination of the speech recognition engine further comprises: determining, based on the usage scenario being a specific scenario, that the speech recognition engine to be used is the dedicated speech recognition engine corresponding to the specific scenario; and
the selection module's selection of a speech-intent result further comprises: in response to the dedicated speech recognition engine recognizing a speech-intent result, selecting that speech-intent result as the highest-confidence speech-intent result.
17. The processing system of claim 16, wherein the specific application interface comprises the user being in a voice input interface, a voice wake-up interface, a navigation address input interface, or a contact selection interface.
18. The processing system of any one of claims 10-17, wherein the speech recognition engine comprises an online engine and/or an offline engine, the acquisition module further obtains the current network state of the user, and
the judgment module's determination of the speech recognition engine to use further comprises determining the speech recognition engine based on the usage scenario and the network state.
19. A computer device comprising a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein the processor, when executing the computer program, implements the steps of the method of any one of claims 1-9.
CN201810240495.3A 2018-03-22 2018-03-22 Speech recognition processing method and system Pending CN110299136A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201810240495.3A CN110299136A (en) Speech recognition processing method and system

Publications (1)

Publication Number Publication Date
CN110299136A true CN110299136A (en) 2019-10-01

Family

ID=68025589

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201810240495.3A Pending CN110299136A (en) Speech recognition processing method and system

Country Status (1)

Country Link
CN (1) CN110299136A (en)

Patent Citations (16)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
GB0407900D0 (en) * 2004-04-07 2004-05-12 Mitel Networks Corp Method and apparatus for improving hands-free speech recognition using beamforming technology
WO2006037219A1 (en) * 2004-10-05 2006-04-13 Inago Corporation System and methods for improving accuracy of speech recognition
US20110144986A1 (en) * 2009-12-10 2011-06-16 Microsoft Corporation Confidence calibration in automatic speech recognition systems
CN102074231A (en) * 2010-12-30 2011-05-25 万音达有限公司 Voice recognition method and system
CN102800313A (en) * 2011-05-25 2012-11-28 上海先先信息科技有限公司 Method for supporting multi-voice recognition engine in Voice extensible markup language (XML) 2.0
US20130132086A1 (en) * 2011-11-21 2013-05-23 Robert Bosch Gmbh Methods and systems for adapting grammars in hybrid speech recognition engines for enhancing local sr performance
CN102708865A (en) * 2012-04-25 2012-10-03 北京车音网科技有限公司 Method, device and system for voice recognition
CN102831157A (en) * 2012-07-04 2012-12-19 四川长虹电器股份有限公司 Semanteme recognition and search method and system
CN103440867A (en) * 2013-08-02 2013-12-11 安徽科大讯飞信息科技股份有限公司 Method and system for recognizing voice
US20150348539A1 (en) * 2013-11-29 2015-12-03 Mitsubishi Electric Corporation Speech recognition system
CN104795069A (en) * 2014-01-21 2015-07-22 腾讯科技(深圳)有限公司 Speech recognition method and server
US20160358606A1 (en) * 2015-06-06 2016-12-08 Apple Inc. Multi-Microphone Speech Recognition Systems and Related Techniques
US20170140759A1 (en) * 2015-11-13 2017-05-18 Microsoft Technology Licensing, Llc Confidence features for automated speech recognition arbitration
CN105719649A (en) * 2016-01-19 2016-06-29 百度在线网络技术(北京)有限公司 Voice recognition method and device
CN106328148A (en) * 2016-08-19 2017-01-11 上汽通用汽车有限公司 Natural speech recognition method, natural speech recognition device and natural speech recognition system based on local and cloud hybrid recognition
CN106710586A (en) * 2016-12-27 2017-05-24 北京智能管家科技有限公司 Speech recognition engine automatic switching method and device

Cited By (16)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110956955A (en) * 2019-12-10 2020-04-03 苏州思必驰信息科技有限公司 Voice interaction method and device
CN110956955B (en) * 2019-12-10 2022-08-05 思必驰科技股份有限公司 Voice interaction method and device
CN111049996A (en) * 2019-12-26 2020-04-21 苏州思必驰信息科技有限公司 Multi-scene voice recognition method and device and intelligent customer service system applying same
US11721328B2 (en) 2019-12-31 2023-08-08 Ai Speech Co., Ltd. Method and apparatus for awakening skills by speech
WO2021135561A1 (en) * 2019-12-31 2021-07-08 思必驰科技股份有限公司 Skill voice wake-up method and apparatus
CN111933118A (en) * 2020-08-17 2020-11-13 苏州思必驰信息科技有限公司 Method and device for optimizing voice recognition and intelligent voice dialogue system applying same
WO2022057152A1 (en) * 2020-09-18 2022-03-24 广州橙行智动汽车科技有限公司 Voice interaction method, server, and computer-readable storage medium
CN112786055A (en) * 2020-12-25 2021-05-11 北京百度网讯科技有限公司 Resource mounting method, device, equipment, storage medium and computer program product
CN112861542A (en) * 2020-12-31 2021-05-28 思必驰科技股份有限公司 Method and device for limiting scene voice interaction
CN112861542B (en) * 2020-12-31 2023-05-26 思必驰科技股份有限公司 Method and device for voice interaction in limited scene
CN112787899B (en) * 2021-01-08 2022-10-28 青岛海尔特种电冰箱有限公司 Equipment voice interaction method, computer readable storage medium and refrigerator
CN112787899A (en) * 2021-01-08 2021-05-11 青岛海尔特种电冰箱有限公司 Equipment voice interaction method, computer readable storage medium and refrigerator
CN113327602A (en) * 2021-05-13 2021-08-31 北京百度网讯科技有限公司 Method and device for speech recognition, electronic equipment and readable storage medium
CN113380253A (en) * 2021-06-21 2021-09-10 紫优科技(深圳)有限公司 Voice recognition system, device and medium based on cloud computing and edge computing
CN113380254A (en) * 2021-06-21 2021-09-10 紫优科技(深圳)有限公司 Voice recognition method, device and medium based on cloud computing and edge computing
CN113380254B (en) * 2021-06-21 2024-05-24 枣庄福缘网络科技有限公司 Voice recognition method, device and medium based on cloud computing and edge computing

Similar Documents

Publication Publication Date Title
CN110299136A (en) Speech recognition processing method and system
JP7297836B2 (en) Voice user interface shortcuts for assistant applications
CN110770736B (en) Exporting dialog-driven applications to a digital communication platform
CN106486120B (en) Interactive voice response method and answering system
CN105575386B (en) Audio recognition method and device
CN103995716B (en) A kind of terminal applies startup method and terminal
CN110491383A (en) A kind of voice interactive method, device, system, storage medium and processor
US11727939B2 (en) Voice-controlled management of user profiles
CN109428719A (en) A kind of auth method, device and equipment
CN109522083A (en) A kind of intelligent page response interactive system and method
CN109658579A (en) A kind of access control method, system, equipment and storage medium
CN110459222A (en) Sound control method, phonetic controller and terminal device
CN107909998A (en) Phonetic order processing method, device, computer equipment and storage medium
CN111261151B (en) Voice processing method and device, electronic equipment and storage medium
CN109688036A (en) A kind of control method of intelligent appliance, device, intelligent appliance and storage medium
CN109065040A (en) A kind of voice information processing method and intelligent electric appliance
CN111312230B (en) Voice interaction monitoring method and device for voice conversation platform
US11538464B2 (en) Speech recognition using data analysis and dilation of speech content from separated audio input
CN109741737A (en) A kind of method and device of voice control
CN111161726B (en) Intelligent voice interaction method, device, medium and system
JP7436077B2 (en) Skill voice wake-up method and device
CN110797031A (en) Voice change detection method, system, mobile terminal and storage medium
CN112634897B (en) Equipment awakening method and device, storage medium and electronic device
CN111462741A (en) Voice data processing method, device and storage medium
US20200092322A1 (en) VALIDATING COMMANDS FOR HACKING AND SPOOFING PREVENTION IN AN INTERNET OF THINGS (IoT) COMPUTING ENVIRONMENT

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination