CN105529030A - Speech recognition processing method and device - Google Patents


Info

Publication number
CN105529030A
Authority
CN
China
Prior art keywords
voice
represent
user
speech recognition
feedback
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201511016852.0A
Other languages
Chinese (zh)
Other versions
CN105529030B (en
Inventor
吴世伟
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Baidu Netcom Science and Technology Co Ltd
Original Assignee
Beijing Baidu Netcom Science and Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Baidu Netcom Science and Technology Co Ltd
Priority to CN201511016852.0A
Publication of CN105529030A
Application granted
Publication of CN105529030B
Legal status: Active
Anticipated expiration


Classifications

    • G: PHYSICS
    • G10: MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L: SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00: Speech recognition
    • G10L15/22: Procedures used during a speech recognition process, e.g. man-machine dialogue
    • G10L2015/225: Feedback of the input speech

Abstract

The invention provides a speech recognition processing method and device. The speech recognition processing method comprises: receiving speech signals; extracting multiple pieces of feature information from the speech signals; calculating a feedback function according to the multiple pieces of feature information in the speech signals; and establishing a decision model of speech recognition according to the feedback function. By adopting the speech recognition processing method, the speech recognition accuracy can be improved, the smoothness of speech interaction between a user and a speech recognition system is improved, and the user experience is promoted.

Description

Speech recognition processing method and device
Technical field
The present invention relates to the technical field of speech recognition, and in particular to a speech recognition processing method and device.
Background
In human-machine voice interaction, a speech recognition system must handle a wide variety of voice requests, and its goal is to give the user the most comfortable feedback. However, because speech signals and external environments are diverse, the way the system responds also has to adapt to the situation.
At present, after receiving a user's voice request, a speech recognition system usually performs speech recognition and semantic analysis on the request and, once the user's intent has been identified, carries out the corresponding operation. The problem is that if the system fails to identify the user's intent from the voice request, the user has to perform an operation and then re-enter the request. This makes the system cumbersome to use, lowers recognition accuracy, makes the voice interaction less smooth, and degrades the user experience.
Summary of the invention
The present invention aims to solve, at least to some extent, one of the technical problems in the related art.
To this end, a first object of the present invention is to propose a speech recognition processing method that improves the accuracy of speech recognition, makes voice interaction between the user and the speech recognition system smoother, and improves the user experience.
A second object of the present invention is to propose a speech recognition processing device.
To achieve the above objects, an embodiment of the first aspect of the present invention proposes a speech recognition processing method comprising the following steps: receiving a speech signal; extracting multiple pieces of feature information from the speech signal; calculating a feedback function according to the multiple pieces of feature information in the speech signal; and establishing a decision model for speech recognition according to the feedback function.
With the speech recognition processing method of this embodiment, information such as the recognition result, the speech analysis result, and the dialogue state is extracted from the received speech signal and used to construct rejection rules, and the decision model is trained in a data-driven manner. During speech recognition, the system can then respond according to the expected feedback computed by the decision model: every input that the decision model judges to be valid receives explicit feedback instead of being treated as noise. This improves the accuracy of speech recognition, makes voice interaction between the user and the speech recognition system smoother, and improves the user experience.
To achieve the above objects, an embodiment of the second aspect of the present invention proposes a speech recognition processing device comprising: a receiving module for receiving a speech signal; an extraction module for extracting multiple pieces of feature information from the speech signal; a computing module for calculating a feedback function according to the multiple pieces of feature information in the speech signal; and an establishing module for establishing a decision model for speech recognition according to the feedback function.
With the speech recognition processing device of this embodiment, information such as the recognition result, the speech analysis result, and the dialogue state is extracted from the received speech signal and used to construct rejection rules, and the decision model is trained in a data-driven manner. During speech recognition, the system can then respond according to the expected feedback computed by the decision model: every input that the decision model judges to be valid receives explicit feedback instead of being treated as noise. This improves the accuracy of speech recognition, makes voice interaction between the user and the speech recognition system smoother, and improves the user experience.
Additional aspects and advantages of the present invention will be given in part in the following description, and in part will become apparent from that description or be learned through practice of the invention.
Brief description of the drawings
The above and/or additional aspects and advantages of the present invention will become apparent and easy to understand from the following description of the embodiments taken in conjunction with the accompanying drawings, in which:
Fig. 1 is a flowchart of a speech recognition processing method according to one embodiment of the present invention;
Fig. 2 is a flowchart of a speech recognition processing method according to another embodiment of the present invention;
Fig. 3 is a schematic structural diagram of a speech recognition processing device according to one embodiment of the present invention; and
Fig. 4 is a schematic structural diagram of a speech recognition processing device according to another embodiment of the present invention.
Detailed description of the embodiments
Embodiments of the present invention are described in detail below. Examples of the embodiments are shown in the drawings, where the same or similar reference numerals throughout denote the same or similar elements or elements with the same or similar functions. The embodiments described below with reference to the drawings are exemplary; they are intended to explain the invention and should not be construed as limiting it.
In addition, the terms "first" and "second" are used for descriptive purposes only and should not be understood as indicating or implying relative importance or as implicitly indicating the number of the technical features referred to. Thus, a feature qualified by "first" or "second" may explicitly or implicitly include one or more such features. In the description of the present invention, "multiple" means two or more, unless otherwise specifically limited.
Any process or method described in a flowchart or otherwise described herein may be understood as representing a module, segment, or portion of code comprising one or more executable instructions for implementing the steps of a specific logical function or process. The scope of the preferred embodiments of the present invention includes other implementations in which functions may be performed out of the order shown or discussed, including substantially concurrently or in reverse order depending on the functions involved, as will be understood by those skilled in the art to which the embodiments of the invention belong.
The speech recognition processing method and device according to embodiments of the present invention are described below with reference to the drawings.
Fig. 1 is a flowchart of a speech recognition processing method according to one embodiment of the present invention.
As shown in Fig. 1, the speech recognition processing method comprises:
S101: receive a speech signal.
Specifically, a speech signal input by the user is received; the user may input the speech signal through a device such as a microphone.
S102: extract multiple pieces of feature information from the speech signal.
The feature information includes a rejection flag, a semantic parsing result, a semantic parsing confidence, and a language model confidence.
Specifically, the speech signal input by the user is first divided into multiple short speech segments, silence is removed from these segments, and the segments are then input one by one to the speech recognition engine. The engine dynamically selects a language model according to the context of the voice dialogue and processes each segment, producing either a recognition result or a rejection flag. The recognition result can then be input to a semantic parser for context-dependent semantic parsing, which yields the corresponding semantic parsing result. In addition, once the speech signal has been processed, feature information such as the confidence obtained during speech analysis and the language model confidence is also collected.
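The extraction step above can be sketched in Python as follows. This is only an illustrative sketch: the `Features` container, the `extract_features` function, and the two toy callables standing in for the speech recognition engine and the semantic parser are hypothetical names that do not appear in the patent. Assigning a parse failure (-2) to a rejected segment is also an assumption made for the sketch.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Tuple


@dataclass
class Features:
    n_rej: int    # rejection flag: 1 = recognized normally, -1 = rejected
    n_sem: int    # semantic parsing result: 1, -1 or -2
    f_sem: float  # semantic parsing confidence
    f_lm: float   # language model confidence


def extract_features(
    segment: str,
    recognize: Callable[[str], Tuple[Optional[str], float]],
    parse: Callable[[str], Tuple[int, float]],
) -> Features:
    """Run one silence-free segment through recognition, then parsing."""
    text, f_lm = recognize(segment)
    if text is None:
        # Assumption: a rejected segment has nothing to parse,
        # so it is recorded as a parse failure with zero confidence.
        return Features(n_rej=-1, n_sem=-2, f_sem=0.0, f_lm=f_lm)
    n_sem, f_sem = parse(text)
    return Features(n_rej=1, n_sem=n_sem, f_sem=f_sem, f_lm=f_lm)


# Toy stand-ins for the recognition engine and the semantic parser.
def toy_recognize(segment: str) -> Tuple[Optional[str], float]:
    return (None, 0.1) if segment == "uh uh" else (segment, 0.9)


def toy_parse(text: str) -> Tuple[int, float]:
    return (1, 0.8) if "Beijing" in text else (-1, 0.4)


print(extract_features("uh uh", toy_recognize, toy_parse))
print(extract_features("Beijing", toy_recognize, toy_parse))
```

A real system would replace the toy callables with an actual recognizer and parser; the sketch only shows how the four features named in S102 could be gathered into one record per segment.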
S103: calculate a feedback function according to the multiple pieces of feature information in the speech signal.
In one embodiment of the present invention, the feedback function is calculated according to the following formula:
$$R = -\left(w_i n_i + w_e n_e + w_f n_f + w_{rej} n_{rej} + w_{s1} n_{sem} + w_{s2} f_{sem} + w_{lm} f_{lm}\right)$$
where $R$ is the feedback function, $n_i$ is the number of dialogue turns, $n_e$ is the number of errors, $n_f$ is the number of known slots, $n_{rej}$ is the rejection flag, $n_{sem}$ is the semantic parsing result, $f_{sem}$ is the semantic parsing confidence, $f_{lm}$ is the language model confidence, and each $w$ is a weight parameter.
Specifically, the feedback function is calculated from all available feature information. In other words, while the speech recognition system recognizes the speech signal input by the user, the user's feedback is scored and the user's interactive input is evaluated, for example by scoring how far the dialogue has progressed and whether the user has provided cooperative information.
To capture the feedback given by the user accurately while the speech recognition system recognizes the input speech signal, where the feedback includes positive feedback and negative feedback, a well-designed feedback function is needed, such as the formula shown above. Here, $n_e$ is the number of errors, which is set by default in the speech recognition system. $n_{rej}$ is the rejection flag and can be 1 or -1: 1 means the speech signal was recognized normally, and -1 means it was rejected. $n_{sem}$ is the semantic parsing result and can be 1, -1, or -2: 1 means semantic parsing of the speech signal produced a correct parse that fits the context, -1 means the signal was parsed correctly but the parse does not fit the context, and -2 means semantic parsing failed. Thus, the feedback function $R$ can be calculated with the formula above from parameters such as the rejection flag $n_{rej}$, the semantic parsing result $n_{sem}$, the semantic parsing confidence $f_{sem}$, and the language model confidence $f_{lm}$, and $R$ indicates whether the user's feedback is positive or negative.
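As a minimal sketch of the feedback formula, the Python function below evaluates R for one dialogue turn. The weight values are made up for illustration only; the patent states that the weights are obtained through training and gives no concrete values. The function name `feedback` and the dictionary keys are likewise hypothetical.

```python
def feedback(w, n_i, n_e, n_f, n_rej, n_sem, f_sem, f_lm):
    """R = -(w_i*n_i + w_e*n_e + w_f*n_f + w_rej*n_rej
             + w_s1*n_sem + w_s2*f_sem + w_lm*f_lm)"""
    return -(
        w["i"] * n_i
        + w["e"] * n_e
        + w["f"] * n_f
        + w["rej"] * n_rej
        + w["s1"] * n_sem
        + w["s2"] * f_sem
        + w["lm"] * f_lm
    )


# Illustrative weights only (assumption): turns and errors are costs,
# the remaining features count in the user's favor.
weights = {"i": 1.0, "e": 1.0, "f": -1.0, "rej": -1.0,
           "s1": -1.0, "s2": -1.0, "lm": -1.0}

# A turn that was recognized and parsed in context.
r_good = feedback(weights, n_i=2, n_e=0, n_f=1, n_rej=1,
                  n_sem=1, f_sem=0.5, f_lm=0.5)

# A turn that was rejected and could not be parsed.
r_bad = feedback(weights, n_i=2, n_e=1, n_f=0, n_rej=-1,
                 n_sem=-2, f_sem=0.0, f_lm=0.25)

print(r_good, r_bad)
```

With these particular weights the recognized turn scores higher than the rejected one; with different weights the ordering could change, which is why the weights are left to training.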
S104: establish a decision model for speech recognition according to the feedback function.
In one embodiment of the present invention, the decision model for speech recognition is established according to the following formula:
$$Q(s,a) = R(s,a) + r \sum_{s'} P(s' \mid s,a) \max_{a'} Q(s',a')$$
where $Q$ is the expected feedback, $s$ and $s'$ are system state nodes, $a$ and $a'$ are decision actions, and $P$ is the probability of transitioning between states under a decision action.
Specifically, after the feedback function has been calculated from the feedback given by the user, positive feedback from the user is rewarded and negative feedback is penalized. A Markov decision algorithm is then used, that is, the decision model is established according to the formula above. For the objective function, the standard value iteration algorithm can be used to solve for the parameters; the parameters of the feedback function and the state transition probabilities are obtained through training.
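The recursion above can be solved with value iteration, as the following sketch shows on a tiny hand-made problem. Several things here are assumptions rather than part of the patent: the symbol r is treated as a discount factor below 1 (the text does not define it), the feedback table `R` and transition table `P` are given directly instead of being learned from data, and the state and action names are invented for the example.

```python
def value_iteration(states, actions, R, P, r=0.9, tol=1e-9, max_iter=10_000):
    """Iterate Q(s,a) = R(s,a) + r * sum_s' P(s'|s,a) * max_a' Q(s',a')."""
    Q = {(s, a): 0.0 for s in states for a in actions}
    for _ in range(max_iter):
        delta = 0.0
        for s in states:
            for a in actions:
                future = sum(
                    p * max(Q[(s2, a2)] for a2 in actions)
                    for s2, p in P[(s, a)].items()
                )
                new_q = R[(s, a)] + r * future
                delta = max(delta, abs(new_q - Q[(s, a)]))
                Q[(s, a)] = new_q
        if delta < tol:  # stop once no entry changes noticeably
            break
    return Q


# Toy problem: guiding an unclear user finishes the dialogue,
# ignoring them leaves the dialogue where it was.
states = ["unclear", "done"]
actions = ["guide", "ignore"]
R = {("unclear", "guide"): 1.0, ("unclear", "ignore"): -1.0,
     ("done", "guide"): 0.0, ("done", "ignore"): 0.0}
P = {("unclear", "guide"): {"done": 1.0},
     ("unclear", "ignore"): {"unclear": 1.0},
     ("done", "guide"): {"done": 1.0},
     ("done", "ignore"): {"done": 1.0}}

Q = value_iteration(states, actions, R, P)
best = max(actions, key=lambda a: Q[("unclear", a)])
print(Q, best)
```

In this toy problem the converged table prefers guiding over ignoring in the unclear state. The patent additionally trains the feedback weights and transition probabilities, which this sketch does not attempt.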
With the speech recognition processing method of this embodiment, information such as the recognition result, the speech analysis result, and the dialogue state is extracted from the received speech signal and used to construct rejection rules, and the decision model is trained in a data-driven manner. During speech recognition, the system can then respond according to the expected feedback computed by the decision model: every input that the decision model judges to be valid receives explicit feedback instead of being treated as noise. This improves the accuracy of speech recognition, makes voice interaction between the user and the speech recognition system smoother, and improves the user experience.
Fig. 2 is a flowchart of a speech recognition processing method according to another embodiment of the present invention.
As shown in Fig. 2, the speech recognition processing method comprises:
S201: receive a speech signal.
Specifically, a speech signal input by the user is received; the user may input the speech signal through a device such as a microphone.
S202: extract multiple pieces of feature information from the speech signal.
The feature information includes a rejection flag, a semantic parsing result, a semantic parsing confidence, and a language model confidence.
Specifically, the speech signal input by the user is first divided into multiple short speech segments, silence is removed from these segments, and the segments are then input one by one to the speech recognition engine. The engine dynamically selects a language model according to the context of the voice dialogue and processes each segment, producing either a recognition result or a rejection flag. The recognition result can then be input to a semantic parser for context-dependent semantic parsing, which yields the corresponding semantic parsing result. In addition, once the speech signal has been processed, feature information such as the confidence obtained during speech analysis and the language model confidence is also collected.
S203: calculate a feedback function according to the multiple pieces of feature information in the speech signal.
In one embodiment of the present invention, the feedback function is calculated according to the following formula:
$$R = -\left(w_i n_i + w_e n_e + w_f n_f + w_{rej} n_{rej} + w_{s1} n_{sem} + w_{s2} f_{sem} + w_{lm} f_{lm}\right)$$
where $R$ is the feedback function, $n_i$ is the number of dialogue turns, $n_e$ is the number of errors, $n_f$ is the number of known slots, $n_{rej}$ is the rejection flag, $n_{sem}$ is the semantic parsing result, $f_{sem}$ is the semantic parsing confidence, $f_{lm}$ is the language model confidence, and each $w$ is a weight parameter.
Specifically, the feedback function is calculated from all available feature information. In other words, while the speech recognition system recognizes the speech signal input by the user, the user's feedback is scored and the user's interactive input is evaluated, for example by scoring how far the dialogue has progressed and whether the user has provided cooperative information.
To capture the feedback given by the user accurately while the speech recognition system recognizes the input speech signal, where the feedback includes positive feedback and negative feedback, a well-designed feedback function is needed, such as the formula shown above. Here, $n_e$ is the number of errors, which is set by default in the speech recognition system. $n_{rej}$ is the rejection flag and can be 1 or -1: 1 means the speech signal was recognized normally, and -1 means it was rejected. $n_{sem}$ is the semantic parsing result and can be 1, -1, or -2: 1 means semantic parsing of the speech signal produced a correct parse that fits the context, -1 means the signal was parsed correctly but the parse does not fit the context, and -2 means semantic parsing failed. Thus, the feedback function $R$ can be calculated with the formula above from parameters such as the rejection flag $n_{rej}$, the semantic parsing result $n_{sem}$, the semantic parsing confidence $f_{sem}$, and the language model confidence $f_{lm}$, and $R$ indicates whether the user's feedback is positive or negative.
S204: establish a decision model for speech recognition according to the feedback function.
In one embodiment of the present invention, the decision model for speech recognition is established according to the following formula:
$$Q(s,a) = R(s,a) + r \sum_{s'} P(s' \mid s,a) \max_{a'} Q(s',a')$$
where $Q$ is the expected feedback, $s$ and $s'$ are system state nodes, $a$ and $a'$ are decision actions, and $P$ is the probability of transitioning between states under a decision action.
Specifically, after the feedback function has been calculated from the feedback given by the user, positive feedback from the user is rewarded and negative feedback is penalized. A Markov decision algorithm is then used, that is, the decision model is established according to the formula above. For the objective function, the standard value iteration algorithm can be used to solve for the parameters; the parameters of the feedback function and the state transition probabilities are obtained through training.
S205: obtain the voice interaction information input by the user, process it according to the decision model, and select a corresponding interaction strategy for the voice interaction with the user.
The interaction strategies may include, for example, a guiding strategy, an ignoring strategy, and a clarification strategy. When the speech recognition system identifies the user's voice interaction information as noise, it can actively guide the user to express themselves clearly; when the information is ambiguous or only vaguely understood, the system should ask for confirmation. In other words, each turn of the dialogue between the user and the speech recognition system may contain noise, an unclear answer, ambiguous semantics, or a complete response, and the system can choose among strategies such as guiding, ignoring, and clarifying.
For example, the voice interaction engine outputs "In which city would you like to book a hotel?" and the user says "Uh, uh...". After recognizing the user's input based on the decision model, the engine judges it to be noise, so it selects the guiding strategy and outputs "Please say the name of the city where you want to stay." The user then says "How is the weather in Beijing?" After recognizing this input based on the decision model, the engine judges that it is not noise but that the city name is still ambiguous, so it selects the strategy of confirming the user's intent and outputs "Do you want to book a hotel in Beijing?" The user then says "Yes." The engine judges this to be an affirmative recognition result and continues with "Where in Beijing would you like to book a hotel?", thereby continuing to guide the interaction between the user and the speech recognition system according to the user's intent.
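The strategy choice in this dialogue can be sketched as picking the action with the highest expected feedback for the current state. The table values below are invented for illustration and are not taken from the patent; the state labels (`noise`, `ambiguous`) and the function name `select_strategy` are also hypothetical.

```python
STRATEGIES = ("guide", "ignore", "clarify")


def select_strategy(q_table, state, strategies=STRATEGIES):
    """Return the strategy with the largest expected feedback in `state`."""
    return max(strategies, key=lambda action: q_table[(state, action)])


# Illustrative expected-feedback table (assumed values).
q_table = {
    ("noise", "guide"): 0.8,
    ("noise", "ignore"): 0.3,
    ("noise", "clarify"): 0.1,
    ("ambiguous", "guide"): 0.2,
    ("ambiguous", "ignore"): -0.5,
    ("ambiguous", "clarify"): 0.9,
}

# "Uh, uh..." is judged to be noise, so the system guides the user.
print(select_strategy(q_table, "noise"))
# "How is the weather in Beijing?" is ambiguous, so the system asks to confirm.
print(select_strategy(q_table, "ambiguous"))
```

In a full system the table would come from the trained decision model rather than being written by hand.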
The speech recognition processing method of this embodiment processes the voice information input by the user based on the decision model and gives explicit feedback to all voice information identified as valid input instead of treating it as noise. The voice interaction system can therefore give the user the most comfortable feedback, which makes voice interaction between the user and the speech recognition system smoother and improves the user experience.
To implement the above embodiments, the present invention also proposes a speech recognition processing device.
Fig. 3 is a schematic structural diagram of a speech recognition processing device according to one embodiment of the present invention.
As shown in Fig. 3, the speech recognition processing device comprises a receiving module 10, an extraction module 20, a computing module 30, and an establishing module 40.
The receiving module 10 is used to receive a speech signal. Specifically, the receiving module 10 receives a speech signal input by the user; the user may input the speech signal through a device such as a microphone.
The extraction module 20 is used to extract multiple pieces of feature information from the speech signal. The feature information includes a rejection flag, a semantic parsing result, a semantic parsing confidence, and a language model confidence. Specifically, the speech signal input by the user is first divided into multiple short speech segments, silence is removed from these segments, and the segments are then input one by one to the extraction module 20. The extraction module 20 dynamically selects a language model according to the context of the voice dialogue and processes each segment, producing either a recognition result or a rejection flag. The recognition result can then be input to a semantic parser for context-dependent semantic parsing, which yields the corresponding semantic parsing result. In addition, once the speech signal has been processed, the extraction module 20 also collects feature information such as the confidence obtained during speech analysis and the language model confidence.
The computing module 30 is used to calculate a feedback function according to the multiple pieces of feature information in the speech signal.
In one embodiment of the present invention, the feedback function is calculated according to the following formula:
$$R = -\left(w_i n_i + w_e n_e + w_f n_f + w_{rej} n_{rej} + w_{s1} n_{sem} + w_{s2} f_{sem} + w_{lm} f_{lm}\right)$$
where $R$ is the feedback function, $n_i$ is the number of dialogue turns, $n_e$ is the number of errors, $n_f$ is the number of known slots, $n_{rej}$ is the rejection flag, $n_{sem}$ is the semantic parsing result, $f_{sem}$ is the semantic parsing confidence, $f_{lm}$ is the language model confidence, and each $w$ is a weight parameter. Specifically, the computing module 30 calculates the feedback function from all available feature information. In other words, while the speech recognition system recognizes the speech signal input by the user, the computing module 30 scores the user's feedback and evaluates the user's interactive input, for example by scoring how far the dialogue has progressed and whether the user has provided cooperative information.
To capture the feedback given by the user accurately while the speech recognition system recognizes the input speech signal, where the feedback includes positive feedback and negative feedback, a well-designed feedback function is needed, such as the formula shown above. Here, $n_e$ is the number of errors, which is set by default in the speech recognition system. $n_{rej}$ is the rejection flag and can be 1 or -1: 1 means the speech signal was recognized normally, and -1 means it was rejected. $n_{sem}$ is the semantic parsing result and can be 1, -1, or -2: 1 means semantic parsing of the speech signal produced a correct parse that fits the context, -1 means the signal was parsed correctly but the parse does not fit the context, and -2 means semantic parsing failed. Thus, the computing module 30 can calculate the feedback function $R$ with the formula above from parameters such as the rejection flag $n_{rej}$, the semantic parsing result $n_{sem}$, the semantic parsing confidence $f_{sem}$, and the language model confidence $f_{lm}$, and $R$ indicates whether the user's feedback is positive or negative.
The establishing module 40 is used to establish a decision model for speech recognition according to the feedback function.
In one embodiment of the present invention, the decision model for speech recognition is established according to the following formula:
$$Q(s,a) = R(s,a) + r \sum_{s'} P(s' \mid s,a) \max_{a'} Q(s',a')$$
where $Q$ is the expected feedback, $s$ and $s'$ are system state nodes, $a$ and $a'$ are decision actions, and $P$ is the probability of transitioning between states under a decision action.
Specifically, after the computing module 30 has calculated the feedback function from the feedback given by the user, the establishing module 40 rewards positive feedback from the user and penalizes negative feedback. The establishing module 40 then uses a Markov decision algorithm, that is, it establishes the decision model according to the formula above. For the objective function, the standard value iteration algorithm can be used to solve for the parameters; the parameters of the feedback function and the state transition probabilities are obtained through training.
With the speech recognition processing device of this embodiment, information such as the recognition result, the speech analysis result, and the dialogue state is extracted from the received speech signal and used to construct rejection rules, and the decision model is trained in a data-driven manner. During speech recognition, the system can then respond according to the expected feedback computed by the decision model: every input that the decision model judges to be valid receives explicit feedback instead of being treated as noise. This improves the accuracy of speech recognition, makes voice interaction between the user and the speech recognition system smoother, and improves the user experience.
Fig. 4 is a schematic structural diagram of a speech recognition processing device according to another embodiment of the present invention.
As shown in Fig. 4, the speech recognition processing device comprises a receiving module 10, an extraction module 20, a computing module 30, an establishing module 40, an acquisition module 50, and a processing module 60.
The acquisition module 50 is used to obtain the voice interaction information input by the user. The processing module 60 is used to process that information according to the decision model and to select a corresponding interaction strategy for the voice interaction with the user. The interaction strategies may include, for example, a guiding strategy, an ignoring strategy, and a clarification strategy. When the speech recognition system identifies the user's voice interaction information as noise, it can actively guide the user to express themselves clearly; when the information is ambiguous or only vaguely understood, the system should ask for confirmation. In other words, each turn of the dialogue between the user and the speech recognition system may contain noise, an unclear answer, ambiguous semantics, or a complete response, and the system can choose among strategies such as guiding, ignoring, and clarifying.
The speech recognition processing device of this embodiment processes the voice information input by the user based on the decision model and gives explicit feedback to all voice information identified as valid input instead of treating it as noise. The voice interaction system can therefore give the user the most comfortable feedback, which makes voice interaction between the user and the speech recognition system smoother and improves the user experience.
Should be appreciated that each several part of the present invention can realize with hardware, software, firmware or their combination.In the above-described embodiment, multiple step or method can with to store in memory and the software performed by suitable instruction execution system or firmware realize.Such as, if realized with hardware, the same in another embodiment, can realize by any one in following technology well known in the art or their combination: the discrete logic with the logic gates for realizing logic function to data-signal, there is the special IC of suitable combinational logic gate circuit, programmable gate array (PGA), field programmable gate array (FPGA) etc.
In the present invention, unless otherwise expressly specified and limited, terms such as "mounted", "connected" and "coupled" should be interpreted broadly. For example, a connection may be fixed, detachable or integral; it may be mechanical or electrical; it may be direct or indirect through an intermediary; and it may be an internal communication between two elements or an interaction between two elements, unless otherwise expressly limited. Those of ordinary skill in the art can understand the specific meanings of these terms in the present invention according to the specific circumstances.
In the description of this specification, reference to the terms "an embodiment", "some embodiments", "an example", "a specific example" or "some examples" means that a specific feature, structure, material or characteristic described in connection with that embodiment or example is included in at least one embodiment or example of the present invention. In this specification, schematic uses of these terms do not necessarily refer to the same embodiment or example. Moreover, the specific features, structures, materials or characteristics described may be combined in any suitable manner in any one or more embodiments or examples. In addition, provided they do not conflict, those skilled in the art may combine the different embodiments or examples described in this specification and the features of those different embodiments or examples.
Although embodiments of the present invention have been shown and described above, it should be understood that the above embodiments are exemplary and shall not be construed as limiting the present invention. Those of ordinary skill in the art may make changes, modifications, replacements and variations to the above embodiments within the scope of the present invention.

Claims (10)

1. A speech recognition processing method, characterized in that it comprises the following steps:
receiving a speech signal;
extracting a plurality of pieces of feature information from the speech signal;
calculating a feedback function according to the plurality of pieces of feature information in the speech signal; and
establishing a decision model for speech recognition according to the feedback function.
2. The speech recognition processing method according to claim 1, characterized in that the plurality of pieces of feature information comprise a rejection flag, a semantic parsing result, a semantic parsing confidence and a language model confidence.
3. The speech recognition processing method according to claim 1 or 2, characterized in that the feedback function is calculated according to the following formula:
$R = -(w_i n_i + w_e n_e + w_f n_f + w_{rej} n_{rej} + w_{s1} n_{sem} + w_{s2} f_{sem} + w_{lm} s_{lm})$, where $R$ denotes the feedback function, $n_i$ denotes the number of dialogue turns, $n_e$ denotes the number of errors, $n_f$ denotes the number of known slots, $n_{rej}$ denotes the rejection flag, $n_{sem}$ denotes the semantic parsing result, $f_{sem}$ denotes the semantic parsing confidence, $s_{lm}$ denotes the language model confidence, and $w$ denotes the parameters.
4. The speech recognition processing method according to claim 3, characterized in that the decision model for speech recognition is established according to the following formula:
$Q(s,a) = R(s,a) + r \sum_{s'} P(s' \mid s,a) \max_{a'} Q(s',a')$,
where $Q$ denotes the expected feedback, $s$ and $s'$ denote system state nodes, $a$ and $a'$ denote decision actions, and $P$ denotes the transition probability between states under a decision action.
5. The speech recognition processing method according to any one of claims 1 to 4, characterized in that, after the decision model for speech recognition is established according to the feedback function, the method further comprises:
obtaining voice interaction information input by a user, processing the voice interaction information input by the user according to the decision model, and selecting a corresponding interaction strategy for conducting voice interaction with the user.
6. A speech recognition processing device, characterized in that it comprises:
a receiving module, configured to receive a speech signal;
an extraction module, configured to extract a plurality of pieces of feature information from the speech signal;
a computing module, configured to calculate a feedback function according to the plurality of pieces of feature information in the speech signal; and
an establishing module, configured to establish a decision model for speech recognition according to the feedback function.
7. The speech recognition processing device according to claim 6, characterized in that the plurality of pieces of feature information comprise a rejection flag, a semantic parsing result, a semantic parsing confidence and a language model confidence.
8. The speech recognition processing device according to claim 6 or 7, characterized in that the computing module calculates the feedback function according to the following formula:
$R = -(w_i n_i + w_e n_e + w_f n_f + w_{rej} n_{rej} + w_{s1} n_{sem} + w_{s2} f_{sem} + w_{lm} f_{lm})$, where $R$ denotes the feedback function, $n_i$ denotes the number of dialogue turns, $n_e$ denotes the number of errors, $n_f$ denotes the number of known slots, $n_{rej}$ denotes the rejection flag, $n_{sem}$ denotes the semantic parsing result, $f_{sem}$ denotes the semantic parsing confidence, $f_{lm}$ denotes the language model confidence, and $w$ denotes the parameters.
9. The speech recognition processing device according to claim 8, characterized in that the establishing module establishes the decision model for speech recognition according to the following formula:
$Q(s,a) = R(s,a) + r \sum_{s'} P(s' \mid s,a) \max_{a'} Q(s',a')$,
where $Q$ denotes the expected feedback, $s$ and $s'$ denote system state nodes, $a$ and $a'$ denote decision actions, and $P$ denotes the transition probability between states under a decision action.
10. The speech recognition processing device according to any one of claims 6 to 9, characterized in that it further comprises:
an acquisition module, configured to obtain voice interaction information input by a user; and
a processing module, configured to process the voice interaction information input by the user according to the decision model and to select a corresponding interaction strategy for conducting voice interaction with the user.
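
As an informal illustration of the two formulas recited in the claims, the sketch below computes the feedback function R as a negated weighted sum of the seven features and then solves the recursive Q formula by repeated sweeps over a small table. It rests on several assumptions that the claims do not state: the dictionary keys follow the weight subscripts (i, e, f, rej, s1, s2, lm), the factor r is treated as a discount factor smaller than 1, and the function names `feedback` and `solve_q` as well as the fixed number of sweeps are hypothetical choices made for this example.

```python
from typing import Dict, Sequence, Tuple

# One key per term of the feedback formula, named after the weight subscript:
# i = dialogue turns, e = errors, f = known slots, rej = rejection flag,
# s1 = semantic parsing result, s2 = semantic parsing confidence,
# lm = language model confidence.
FEATURES = ("i", "e", "f", "rej", "s1", "s2", "lm")

StateAction = Tuple[str, str]


def feedback(features: Dict[str, float], weights: Dict[str, float]) -> float:
    """R = -(sum over the seven terms of weight * feature)."""
    return -sum(weights[k] * features[k] for k in FEATURES)


def solve_q(
    states: Sequence[str],
    actions: Sequence[str],
    R: Dict[StateAction, float],
    P: Dict[StateAction, Dict[str, float]],
    r: float = 0.9,      # assumed to act as a discount factor
    sweeps: int = 200,   # assumed fixed iteration budget
) -> Dict[StateAction, float]:
    """Iterate Q(s,a) = R(s,a) + r * sum_s' P(s'|s,a) * max_a' Q(s',a')."""
    q = {(s, a): 0.0 for s in states for a in actions}
    for _ in range(sweeps):
        # Build the next table from the previous one (synchronous update).
        q = {
            (s, a): R[(s, a)] + r * sum(
                prob * max(q[(s2, a2)] for a2 in actions)
                for s2, prob in P[(s, a)].items()
            )
            for s in states
            for a in actions
        }
    return q


if __name__ == "__main__":
    ones = {k: 1.0 for k in FEATURES}
    print(feedback(ones, ones))
    table = solve_q(["s"], ["a"], {("s", "a"): -1.0}, {("s", "a"): {"s": 1.0}}, r=0.5)
    print(table)
```

Once such a table has been computed, choosing the decision action with the largest Q value in the current state gives the interaction strategy to apply. Fixed-point iteration is only one way to solve the formula and is used here for brevity.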
CN201511016852.0A 2015-12-29 2015-12-29 Voice recognition processing method and device Active CN105529030B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201511016852.0A CN105529030B (en) 2015-12-29 2015-12-29 Voice recognition processing method and device

Publications (2)

Publication Number Publication Date
CN105529030A (en) 2016-04-27
CN105529030B (en) 2020-03-03

Family

ID=55771207

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201511016852.0A Active CN105529030B (en) 2015-12-29 2015-12-29 Voice recognition processing method and device

Country Status (1)

Country Link
CN (1) CN105529030B (en)

Cited By (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106970993A (en) * 2017-03-31 2017-07-21 百度在线网络技术(北京)有限公司 Mining model update method and device
CN107316643A (en) * 2017-07-04 2017-11-03 科大讯飞股份有限公司 Voice interactive method and device
CN107342081A (en) * 2016-04-28 2017-11-10 通用汽车环球科技运作有限责任公司 Use relative and absolute time slot data speech recognition system and method
CN107665708A (en) * 2016-07-29 2018-02-06 科大讯飞股份有限公司 Intelligent sound exchange method and system
CN109785838A (en) * 2019-01-28 2019-05-21 百度在线网络技术(北京)有限公司 Audio recognition method, device, equipment and storage medium
CN111292746A (en) * 2020-02-07 2020-06-16 普强时代(珠海横琴)信息技术有限公司 Voice input conversion system based on human-computer interaction
CN111899728A (en) * 2020-07-23 2020-11-06 海信电子科技(武汉)有限公司 Training method and device for intelligent voice assistant decision strategy
CN112002321A (en) * 2020-08-11 2020-11-27 海信电子科技(武汉)有限公司 Display device, server and voice interaction method
WO2020238341A1 (en) * 2019-05-31 2020-12-03 华为技术有限公司 Speech recognition method, apparatus and device, and computer-readable storage medium
WO2023124960A1 (en) * 2021-12-27 2023-07-06 广州小鹏汽车科技有限公司 Speech interaction method, vehicle, server, and computer readable storage medium

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20060069560A1 (en) * 2004-08-31 2006-03-30 Christopher Passaretti Method and apparatus for controlling recognition results for speech recognition applications
CN1763843A (en) * 2005-11-18 2006-04-26 清华大学 Pronunciation quality evaluating method for language learning machine
CN102376182A (en) * 2010-08-26 2012-03-14 财团法人工业技术研究院 Language learning system, language learning method and program product thereof
CN103035243A (en) * 2012-12-18 2013-04-10 中国科学院自动化研究所 Real-time feedback method and system of long voice continuous recognition and recognition result
CN104795065A (en) * 2015-04-30 2015-07-22 北京车音网科技有限公司 Method for increasing speech recognition rate and electronic device
CN105070288A (en) * 2015-07-02 2015-11-18 百度在线网络技术(北京)有限公司 Vehicle-mounted voice instruction recognition method and device

Cited By (15)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107342081A (en) * 2016-04-28 2017-11-10 通用汽车环球科技运作有限责任公司 Use relative and absolute time slot data speech recognition system and method
CN107665708A (en) * 2016-07-29 2018-02-06 科大讯飞股份有限公司 Intelligent sound exchange method and system
CN107665708B (en) * 2016-07-29 2021-06-08 科大讯飞股份有限公司 Intelligent voice interaction method and system
CN106970993B (en) * 2017-03-31 2020-09-18 百度在线网络技术(北京)有限公司 Mining model updating method and device
CN106970993A (en) * 2017-03-31 2017-07-21 百度在线网络技术(北京)有限公司 Mining model update method and device
CN107316643A (en) * 2017-07-04 2017-11-03 科大讯飞股份有限公司 Voice interactive method and device
CN107316643B (en) * 2017-07-04 2021-08-17 科大讯飞股份有限公司 Voice interaction method and device
CN109785838A (en) * 2019-01-28 2019-05-21 百度在线网络技术(北京)有限公司 Audio recognition method, device, equipment and storage medium
CN109785838B (en) * 2019-01-28 2021-08-31 百度在线网络技术(北京)有限公司 Voice recognition method, device, equipment and storage medium
WO2020238341A1 (en) * 2019-05-31 2020-12-03 华为技术有限公司 Speech recognition method, apparatus and device, and computer-readable storage medium
CN111292746A (en) * 2020-02-07 2020-06-16 普强时代(珠海横琴)信息技术有限公司 Voice input conversion system based on human-computer interaction
CN111899728A (en) * 2020-07-23 2020-11-06 海信电子科技(武汉)有限公司 Training method and device for intelligent voice assistant decision strategy
CN112002321A (en) * 2020-08-11 2020-11-27 海信电子科技(武汉)有限公司 Display device, server and voice interaction method
CN112002321B (en) * 2020-08-11 2023-09-19 海信电子科技(武汉)有限公司 Display device, server and voice interaction method
WO2023124960A1 (en) * 2021-12-27 2023-07-06 广州小鹏汽车科技有限公司 Speech interaction method, vehicle, server, and computer readable storage medium

Also Published As

Publication number Publication date
CN105529030B (en) 2020-03-03

Similar Documents

Publication Publication Date Title
CN105529030A (en) Speech recognition processing method and device
US9899021B1 (en) Stochastic modeling of user interactions with a detection system
JP6138675B2 (en) Speech recognition using parallel recognition tasks.
CN102282609B (en) System and method for recognizing proper names in dialog systems
JP6101196B2 (en) Voice identification method and apparatus
KR101828273B1 (en) Apparatus and method for voice command recognition based on combination of dialog models
KR101699720B1 (en) Apparatus for voice command recognition and method thereof
US11132509B1 (en) Utilization of natural language understanding (NLU) models
CN110047481B (en) Method and apparatus for speech recognition
CN105575386A (en) Method and device for voice recognition
CN105047198B (en) Voice error correction processing method and device
CN108364650B (en) Device and method for adjusting voice recognition result
CN106486120B (en) Interactive voice response method and answering system
KR101863097B1 (en) Apparatus and method for keyword recognition
US9330665B2 (en) Automatic updating of confidence scoring functionality for speech recognition systems with respect to a receiver operating characteristic curve
CN107644638A (en) Audio recognition method, device, terminal and computer-readable recording medium
Kim et al. Sequential labeling for tracking dynamic dialog states
CN114155854A (en) Voice data processing method and device
CN111640423B (en) Word boundary estimation method and device and electronic equipment
CN111866289A (en) Outbound number state detection method and device and intelligent outbound method and system
CN111063338B (en) Audio signal identification method, device, equipment, system and storage medium
CN113724698B (en) Training method, device, equipment and storage medium of voice recognition model
CN114399992B (en) Voice instruction response method, device and storage medium
JP2013064951A (en) Sound model adaptation device, adaptation method thereof and program
CN115294974A (en) Voice recognition method, device, equipment and storage medium

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant