CN105529030A - Speech recognition processing method and device - Google Patents
Speech recognition processing method and device
Info
- Publication number
- CN105529030A, CN201511016852.0A
- Authority
- CN
- China
- Prior art keywords
- voice
- represent
- user
- speech recognition
- feedback
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/22—Procedures used during a speech recognition process, e.g. man-machine dialogue
- G10L2015/225—Feedback of the input speech
Landscapes
- Engineering & Computer Science (AREA)
- Computational Linguistics (AREA)
- Health & Medical Sciences (AREA)
- Audiology, Speech & Language Pathology (AREA)
- Human Computer Interaction (AREA)
- Physics & Mathematics (AREA)
- Acoustics & Sound (AREA)
- Multimedia (AREA)
- Machine Translation (AREA)
Abstract
The invention provides a speech recognition processing method and device. The speech recognition processing method comprises: receiving speech signals; extracting multiple pieces of feature information from the speech signals; calculating a feedback function according to the multiple pieces of feature information in the speech signals; and establishing a decision model of speech recognition according to the feedback function. By adopting the speech recognition processing method, the speech recognition accuracy can be improved, the smoothness of speech interaction between a user and a speech recognition system is improved, and the user experience is promoted.
Description
Technical field
The present invention relates to the technical field of speech recognition, and in particular to a speech recognition processing method and device.
Background
In human-machine voice interaction, a speech recognition system has to process a wide variety of voice requests, and its goal is to give the user the most comfortable feedback. Because speech signals and the external environment are so diverse, however, the way the system gives feedback also has to adapt to the situation.
At present, after receiving a user's voice request, a speech recognition system usually performs speech recognition and semantic recognition on the request and, once the user's intention has been identified, performs the corresponding operation. The problem is that if the system fails to identify the user's intention from the voice request, the user has to perform an operation and then enter the voice request again. This makes the system cumbersome to use, the recognition accuracy is low, the voice interaction is not smooth enough, and the user experience is poor.
Summary of the invention
The present invention aims to solve, at least to some extent, one of the technical problems in the related art.
To this end, a first object of the present invention is to propose a speech recognition processing method that can improve the accuracy of speech recognition, make voice interaction between the user and the speech recognition system smoother, and improve the user experience.
A second object of the present invention is to propose a speech recognition processing device.
To achieve the above objects, an embodiment of the first aspect of the present invention proposes a speech recognition processing method comprising the following steps: receiving a speech signal; extracting multiple pieces of feature information from the speech signal; calculating a feedback function according to the multiple pieces of feature information in the speech signal; and establishing a decision model of speech recognition according to the feedback function.
With the speech recognition processing method of the embodiment of the present invention, information such as the recognition result, the speech analysis result and the dialogue state is extracted from the received speech signal to construct rejection rules, and the decision model is trained in a data-driven manner. When performing speech recognition, the speech recognition system can therefore give feedback according to the expected feedback produced by the decision model: every input that the decision model deems valid receives explicit feedback instead of being treated as noise. This improves the accuracy of speech recognition, makes voice interaction between the user and the speech recognition system smoother, and improves the user experience.
To achieve the above objects, an embodiment of the second aspect of the present invention proposes a speech recognition processing device comprising: a receiving module configured to receive a speech signal; an extraction module configured to extract multiple pieces of feature information from the speech signal; a computing module configured to calculate a feedback function according to the multiple pieces of feature information in the speech signal; and an establishing module configured to establish a decision model of speech recognition according to the feedback function.
With the speech recognition processing device of the embodiment of the present invention, information such as the recognition result, the speech analysis result and the dialogue state is extracted from the received speech signal to construct rejection rules, and the decision model is trained in a data-driven manner. When performing speech recognition, the speech recognition system can therefore give feedback according to the expected feedback produced by the decision model: every input that the decision model deems valid receives explicit feedback instead of being treated as noise. This improves the accuracy of speech recognition, makes voice interaction between the user and the speech recognition system smoother, and improves the user experience.
Additional aspects and advantages of the present invention will be set forth in part in the following description, and in part will become apparent from the description or be learned through practice of the invention.
Brief description of the drawings
The above and/or additional aspects and advantages of the present invention will become apparent and readily understood from the following description of the embodiments taken in conjunction with the accompanying drawings, in which:
Fig. 1 is a flowchart of a speech recognition processing method according to one embodiment of the present invention;
Fig. 2 is a flowchart of a speech recognition processing method according to another embodiment of the present invention;
Fig. 3 is a schematic structural diagram of a speech recognition processing device according to one embodiment of the present invention; and
Fig. 4 is a schematic structural diagram of a speech recognition processing device according to another embodiment of the present invention.
Detailed description
Embodiments of the present invention are described in detail below. Examples of the embodiments are shown in the drawings, in which identical or similar reference numerals denote identical or similar elements, or elements with identical or similar functions, throughout. The embodiments described below with reference to the drawings are exemplary; they are intended to explain the present invention and are not to be construed as limiting it.
In addition, the terms "first" and "second" are used for descriptive purposes only and are not to be understood as indicating or implying relative importance or implicitly indicating the number of the technical features referred to. A feature qualified by "first" or "second" may therefore explicitly or implicitly include one or more such features. In the description of the present invention, "multiple" means two or more, unless otherwise expressly and specifically defined.
Any process or method described in a flowchart or otherwise described herein may be understood as representing a module, fragment or portion of code that comprises one or more executable instructions for implementing a specific logical function or step of the process. The scope of the preferred embodiments of the present invention includes other implementations in which functions may be performed out of the order shown or discussed, including substantially simultaneously or in reverse order depending on the functions involved, as will be understood by those skilled in the art to which the embodiments of the present invention belong.
The speech recognition processing method and device according to embodiments of the present invention are described below with reference to the drawings.
Fig. 1 is a flowchart of a speech recognition processing method according to one embodiment of the present invention.
As shown in Fig. 1, the speech recognition processing method comprises:
S101, receiving a speech signal.
Specifically, the speech signal input by the user is received; the user may input the speech signal through a device such as a microphone.
S102, extracting multiple pieces of feature information from the speech signal.
The multiple pieces of feature information include a rejection flag, a semantic parsing result, a semantic parsing confidence and a language model confidence.
Specifically, the speech signal input by the user is first divided into multiple short speech segments, the silence in these segments is removed, and the segments are then input separately to the speech recognition engine. The speech recognition engine dynamically selects a language model according to the context of the voice interaction dialogue and processes each segment to obtain either a recognition result or a rejection flag. The recognition result can then be input to a semantic parser for context-dependent semantic parsing, which yields the corresponding semantic parsing result. In addition, once processing of the speech signal is complete, feature information such as the speech analysis confidence and the language model confidence obtained during speech analysis is also acquired.
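As a minimal sketch of how the four features named in S102 might be collected for one short speech segment, the following Python snippet encodes them with the values the description later assigns to the rejection flag and the semantic parsing result. The names `ParseStatus`, `UtteranceFeatures` and `encode_features` are hypothetical, and the rule that a rejected segment is scored as a parse failure is an assumption made for illustration, not something the patent states.

```python
from dataclasses import dataclass
from enum import Enum


class ParseStatus(Enum):
    """Hypothetical labels for the three semantic parsing outcomes."""
    IN_CONTEXT = 1        # parsed correctly and fits the dialogue context
    OUT_OF_CONTEXT = -1   # parsed correctly but does not fit the context
    FAILED = -2           # semantic parsing failed


@dataclass(frozen=True)
class UtteranceFeatures:
    n_rej: int    # rejection flag: 1 = recognized normally, -1 = rejected
    n_sem: int    # semantic parsing result: 1, -1 or -2
    f_sem: float  # semantic parsing confidence
    f_lm: float   # language model confidence


def encode_features(recognized: bool, status: ParseStatus,
                    f_sem: float, f_lm: float) -> UtteranceFeatures:
    """Turn recognizer and parser outcomes into numeric feature values."""
    n_rej = 1 if recognized else -1
    # Assumption: a rejected segment never reaches the parser,
    # so it is scored as a parse failure.
    n_sem = status.value if recognized else ParseStatus.FAILED.value
    return UtteranceFeatures(n_rej, n_sem, f_sem, f_lm)
```

A segment that was recognized but parsed out of context would be encoded as `encode_features(True, ParseStatus.OUT_OF_CONTEXT, 0.8, 0.6)`.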
S103, calculating a feedback function according to the multiple pieces of feature information in the speech signal.
In one embodiment of the present invention, the feedback function is calculated according to the following formula:
$$R = -\left(w_i n_i + w_e n_e + w_f n_f + w_{rej} n_{rej} + w_{s1} n_{sem} + w_{s2} f_{sem} + w_{lm} f_{lm}\right)$$
where $R$ denotes the feedback function, $n_i$ the number of dialogue turns, $n_e$ the number of errors, $n_f$ the number of known slots, $n_{rej}$ the rejection flag, $n_{sem}$ the semantic parsing result, $f_{sem}$ the semantic parsing confidence, $f_{lm}$ the language model confidence, and each $w$ a parameter.
Specifically, the feedback function is calculated from all of the available feature information. That is, while the speech recognition system recognizes the speech signal input by the user, the user's feedback is scored: the user's interactive input is judged and scored on, for example, how far the interactive dialogue has been completed and whether the user provided cooperative information.
To capture accurately the feedback the user gives while the speech recognition system recognizes the input speech signal, where feedback may be positive or negative, a well-designed feedback function is needed, such as the formula shown above. In that formula, $n_e$ denotes the number of errors and is a default value in the speech recognition system. $n_{rej}$ is the rejection flag and can be 1 or -1: 1 means that the speech signal was recognized normally, and -1 means that it was rejected. $n_{sem}$ is the semantic parsing result and can be 1, -1 or -2: 1 means that semantic parsing of the speech signal produced a correct parse that fits the context, -1 means that it produced a correct parse that does not fit the context, and -2 means that semantic parsing failed. The feedback function can thus be calculated from parameters such as the rejection flag $n_{rej}$, the semantic parsing result $n_{sem}$, the semantic parsing confidence $f_{sem}$ and the language model confidence $f_{lm}$ with reference to the above formula, and the value of $R$ indicates whether the user's feedback is positive or negative.
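The feedback function can be sketched in a few lines of Python. The formula itself is taken from the description; the `Weights` class, the function name `feedback` and every weight value below are illustrative assumptions, since the patent obtains the parameters through training and gives no numbers. The signs are chosen here so that more turns and more errors lower R, while successful recognition and high confidence raise it.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Weights:
    """Illustrative weights only; the patent learns these by training."""
    w_i: float = 0.1     # dialogue turns
    w_e: float = 1.0     # errors
    w_f: float = -0.5    # known slots
    w_rej: float = -1.0  # rejection flag
    w_s1: float = -1.0   # semantic parsing result
    w_s2: float = -0.5   # semantic parsing confidence
    w_lm: float = -0.5   # language model confidence


def feedback(w: Weights, n_i: int, n_e: int, n_f: int, n_rej: int,
             n_sem: int, f_sem: float, f_lm: float) -> float:
    """R = -(w_i n_i + w_e n_e + w_f n_f + w_rej n_rej
             + w_s1 n_sem + w_s2 f_sem + w_lm f_lm)."""
    return -(w.w_i * n_i + w.w_e * n_e + w.w_f * n_f + w.w_rej * n_rej
             + w.w_s1 * n_sem + w.w_s2 * f_sem + w.w_lm * f_lm)


# A cooperative turn: recognized, parsed in context, one slot filled.
good_turn = feedback(Weights(), n_i=2, n_e=0, n_f=1, n_rej=1,
                     n_sem=1, f_sem=0.9, f_lm=0.8)
# An uncooperative turn: rejected, parse failed, one error.
bad_turn = feedback(Weights(), n_i=2, n_e=1, n_f=0, n_rej=-1,
                    n_sem=-2, f_sem=0.1, f_lm=0.2)
```

With these assumed weights the cooperative turn yields a positive R and the uncooperative turn a negative one, which is how the sign of R separates positive from negative feedback.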
S104, establishing a decision model of speech recognition according to the feedback function.
In one embodiment of the present invention, the decision model of speech recognition is established according to the following formula:
$$Q(s,a) = R(s,a) + r \sum_{s'} P(s' \mid s,a) \max_{a'} Q(s',a')$$
where $Q$ denotes the expected feedback, $s$ and $s'$ denote system state nodes, $a$ and $a'$ denote decision actions, and $P$ denotes the transition probability between states under a decision action.
Specifically, after the feedback function has been calculated from the feedback given by the user, positive feedback from the user is rewarded with added points and negative feedback is penalized with deducted points. A Markov decision algorithm is then used, that is, the decision model is established according to the above formula. For the objective function, the standard value iteration algorithm can be used to solve for the parameters, and the parameters of the feedback function and the state transition probabilities can be obtained through training.
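The value iteration step can be illustrated with a small tabular sketch. It repeatedly applies the formula above until the Q values stop changing. Several things here are assumptions: the text does not define r, and it is treated below as the discount factor of the standard Bellman equation; R and P are passed in as fixed tables, whereas the patent obtains them through training; and the two-state toy problem is invented purely to exercise the function.

```python
from typing import Dict, Sequence, Tuple

State = str
Action = str
QTable = Dict[Tuple[State, Action], float]


def value_iteration(states: Sequence[State], actions: Sequence[Action],
                    R: QTable,
                    P: Dict[Tuple[State, Action], Dict[State, float]],
                    r: float = 0.9, tol: float = 1e-9,
                    max_iter: int = 10_000) -> QTable:
    """Iterate Q(s,a) = R(s,a) + r * sum_s' P(s'|s,a) * max_a' Q(s',a')."""
    Q: QTable = {(s, a): 0.0 for s in states for a in actions}
    for _ in range(max_iter):
        new_Q: QTable = {}
        delta = 0.0
        for s in states:
            for a in actions:
                # Expected best value over the possible next states.
                expected = sum(p * max(Q[(s2, a2)] for a2 in actions)
                               for s2, p in P.get((s, a), {}).items())
                new_Q[(s, a)] = R[(s, a)] + r * expected
                delta = max(delta, abs(new_Q[(s, a)] - Q[(s, a)]))
        Q = new_Q
        if delta < tol:  # stop once no entry changed noticeably
            break
    return Q


# Hypothetical toy problem: guiding an unclear user ends the dialogue,
# ignoring them leaves the dialogue in the same unclear state.
states = ["unclear", "done"]
actions = ["guide", "ignore"]
R = {("unclear", "guide"): 1.0, ("unclear", "ignore"): -1.0,
     ("done", "guide"): 0.0, ("done", "ignore"): 0.0}
P = {("unclear", "guide"): {"done": 1.0},
     ("unclear", "ignore"): {"unclear": 1.0},
     ("done", "guide"): {"done": 1.0},
     ("done", "ignore"): {"done": 1.0}}
Q = value_iteration(states, actions, R, P)
```

In this toy problem the converged table ranks "guide" above "ignore" in the "unclear" state, so a policy that picks the action with the largest Q value would guide the user.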
With the speech recognition processing method of the embodiment of the present invention, information such as the recognition result, the speech analysis result and the dialogue state is extracted from the received speech signal to construct rejection rules, and the decision model is trained in a data-driven manner. When performing speech recognition, the speech recognition system can therefore give feedback according to the expected feedback produced by the decision model: every input that the decision model deems valid receives explicit feedback instead of being treated as noise. This improves the accuracy of speech recognition, makes voice interaction between the user and the speech recognition system smoother, and improves the user experience.
Fig. 2 is a flowchart of a speech recognition processing method according to another embodiment of the present invention.
As shown in Fig. 2, the speech recognition processing method comprises:
S201, receiving a speech signal.
Specifically, the speech signal input by the user is received; the user may input the speech signal through a device such as a microphone.
S202, extracting multiple pieces of feature information from the speech signal.
The multiple pieces of feature information include a rejection flag, a semantic parsing result, a semantic parsing confidence and a language model confidence.
Specifically, the speech signal input by the user is first divided into multiple short speech segments, the silence in these segments is removed, and the segments are then input separately to the speech recognition engine. The speech recognition engine dynamically selects a language model according to the context of the voice interaction dialogue and processes each segment to obtain either a recognition result or a rejection flag. The recognition result can then be input to a semantic parser for context-dependent semantic parsing, which yields the corresponding semantic parsing result. In addition, once processing of the speech signal is complete, feature information such as the speech analysis confidence and the language model confidence obtained during speech analysis is also acquired.
S203, calculating a feedback function according to the multiple pieces of feature information in the speech signal.
In one embodiment of the present invention, the feedback function is calculated according to the following formula:
$$R = -\left(w_i n_i + w_e n_e + w_f n_f + w_{rej} n_{rej} + w_{s1} n_{sem} + w_{s2} f_{sem} + w_{lm} f_{lm}\right)$$
where $R$ denotes the feedback function, $n_i$ the number of dialogue turns, $n_e$ the number of errors, $n_f$ the number of known slots, $n_{rej}$ the rejection flag, $n_{sem}$ the semantic parsing result, $f_{sem}$ the semantic parsing confidence, $f_{lm}$ the language model confidence, and each $w$ a parameter.
Specifically, the feedback function is calculated from all of the available feature information. That is, while the speech recognition system recognizes the speech signal input by the user, the user's feedback is scored: the user's interactive input is judged and scored on, for example, how far the interactive dialogue has been completed and whether the user provided cooperative information.
To capture accurately the feedback the user gives while the speech recognition system recognizes the input speech signal, where feedback may be positive or negative, a well-designed feedback function is needed, such as the formula shown above. In that formula, $n_e$ denotes the number of errors and is a default value in the speech recognition system. $n_{rej}$ is the rejection flag and can be 1 or -1: 1 means that the speech signal was recognized normally, and -1 means that it was rejected. $n_{sem}$ is the semantic parsing result and can be 1, -1 or -2: 1 means that semantic parsing of the speech signal produced a correct parse that fits the context, -1 means that it produced a correct parse that does not fit the context, and -2 means that semantic parsing failed. The feedback function can thus be calculated from parameters such as the rejection flag $n_{rej}$, the semantic parsing result $n_{sem}$, the semantic parsing confidence $f_{sem}$ and the language model confidence $f_{lm}$ with reference to the above formula, and the value of $R$ indicates whether the user's feedback is positive or negative.
S204, establishing a decision model of speech recognition according to the feedback function.
In one embodiment of the present invention, the decision model of speech recognition is established according to the following formula:
$$Q(s,a) = R(s,a) + r \sum_{s'} P(s' \mid s,a) \max_{a'} Q(s',a')$$
where $Q$ denotes the expected feedback, $s$ and $s'$ denote system state nodes, $a$ and $a'$ denote decision actions, and $P$ denotes the transition probability between states under a decision action.
Specifically, after the feedback function has been calculated from the feedback given by the user, positive feedback from the user is rewarded with added points and negative feedback is penalized with deducted points. A Markov decision algorithm is then used, that is, the decision model is established according to the above formula. For the objective function, the standard value iteration algorithm can be used to solve for the parameters, and the parameters of the feedback function and the state transition probabilities can be obtained through training.
S205, acquiring voice interaction information input by the user, processing it according to the decision model, and selecting a corresponding interaction strategy for voice interaction with the user.
The interaction strategies may include, for example, a guiding strategy, an ignoring strategy and a clarifying strategy. When the speech recognition system identifies the user's voice interaction information as noise, it can actively guide the user to express themselves clearly; when the identified voice interaction information is ambiguous or unclear, it should ask for confirmation. In other words, each dialogue turn between the user and the speech recognition system may contain noise, an unclear answer, ambiguous semantics or a complete response, and the speech recognition system can choose among strategies such as guiding, ignoring and clarifying.
For example, the voice interaction engine outputs the speech "In which city would you like to book a hotel?" and the user inputs "Um, um, ...". After recognizing the user's input on the basis of the decision model, the voice interaction engine deems it noise, so it selects the strategy of guiding the user and outputs "Please say the name of the city you want to stay in." The user then inputs "How is the weather in Beijing?". After recognition based on the decision model, the engine deems that this input is not noise but that the city name is still ambiguous, so it selects the strategy of confirming the user's intention and outputs "Do you want to book a hotel in Beijing?" The user then inputs "Yes". After recognition based on the decision model, the engine deems this an affirmative result and goes on to output "Where in Beijing would you like to book a hotel?", thereby continuing to guide the interaction with the speech recognition system according to the user's intention.
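The strategy choice in the hotel example can be sketched as picking, for the current dialogue state, the strategy with the largest expected feedback Q. This is only one plausible reading of "processing according to the decision model"; the state labels, the function `select_strategy` and the Q values in the table are all hypothetical and hand-filled for illustration, where a real system would take them from the trained model.

```python
from typing import Dict, Tuple

# The three strategies named in the description.
STRATEGIES = ("guide", "ignore", "clarify")


def select_strategy(q_table: Dict[Tuple[str, str], float], state: str) -> str:
    """Return the strategy with the highest expected feedback in this state."""
    return max(STRATEGIES, key=lambda action: q_table[(state, action)])


# Hypothetical Q values; not taken from the patent.
Q_TABLE = {
    ("noise", "guide"): 0.8,
    ("noise", "ignore"): 0.1,
    ("noise", "clarify"): -0.3,
    ("ambiguous", "guide"): 0.2,
    ("ambiguous", "ignore"): -0.9,
    ("ambiguous", "clarify"): 0.7,
}

# Replay the first two user turns of the hotel dialogue:
# "Um, um, ..." is deemed noise, the weather question is deemed ambiguous.
chosen = [select_strategy(Q_TABLE, state) for state in ("noise", "ambiguous")]
```

With this table the noisy turn maps to the guiding strategy and the ambiguous turn to the clarifying strategy, matching the two system responses in the example.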
With the speech recognition processing method of the embodiment of the present invention, the voice information input by the user is processed on the basis of the decision model, and all voice information identified as valid input receives explicit feedback instead of being treated as noise. The voice interaction system can thus give the user the most comfortable feedback, which makes voice interaction between the user and the speech recognition system smoother and improves the user experience.
To implement the above embodiments, the present invention also proposes a speech recognition processing device.
Fig. 3 is a schematic structural diagram of a speech recognition processing device according to one embodiment of the present invention.
As shown in Fig. 3, the speech recognition processing device comprises a receiving module 10, an extraction module 20, a computing module 30 and an establishing module 40.
The receiving module 10 is configured to receive a speech signal. Specifically, the receiving module 10 receives the speech signal input by the user; the user may input the speech signal through a device such as a microphone.
The extraction module 20 is configured to extract multiple pieces of feature information from the speech signal. The multiple pieces of feature information include a rejection flag, a semantic parsing result, a semantic parsing confidence and a language model confidence. Specifically, the speech signal input by the user is first divided into multiple short speech segments, the silence in these segments is removed, and the segments are then input separately to the extraction module 20. The extraction module 20 dynamically selects a language model according to the context of the voice interaction dialogue and processes each segment to obtain either a recognition result or a rejection flag. The recognition result can then be input to a semantic parser for context-dependent semantic parsing, which yields the corresponding semantic parsing result. In addition, once processing of the speech signal is complete, the extraction module 20 also acquires feature information such as the speech analysis confidence and the language model confidence obtained during speech analysis.
The computing module 30 is configured to calculate a feedback function according to the multiple pieces of feature information in the speech signal.
In one embodiment of the present invention, the feedback function is calculated according to the following formula:
$$R = -\left(w_i n_i + w_e n_e + w_f n_f + w_{rej} n_{rej} + w_{s1} n_{sem} + w_{s2} f_{sem} + w_{lm} f_{lm}\right)$$
where $R$ denotes the feedback function, $n_i$ the number of dialogue turns, $n_e$ the number of errors, $n_f$ the number of known slots, $n_{rej}$ the rejection flag, $n_{sem}$ the semantic parsing result, $f_{sem}$ the semantic parsing confidence, $f_{lm}$ the language model confidence, and each $w$ a parameter. Specifically, the computing module 30 calculates the feedback function from all of the available feature information. That is, while the speech recognition system recognizes the speech signal input by the user, the computing module 30 scores the user's feedback: the user's interactive input is judged and scored on, for example, how far the interactive dialogue has been completed and whether the user provided cooperative information.
To capture accurately the feedback the user gives while the speech recognition system recognizes the input speech signal, where feedback may be positive or negative, a well-designed feedback function is needed, such as the formula shown above. In that formula, $n_e$ denotes the number of errors and is a default value in the speech recognition system. $n_{rej}$ is the rejection flag and can be 1 or -1: 1 means that the speech signal was recognized normally, and -1 means that it was rejected. $n_{sem}$ is the semantic parsing result and can be 1, -1 or -2: 1 means that semantic parsing of the speech signal produced a correct parse that fits the context, -1 means that it produced a correct parse that does not fit the context, and -2 means that semantic parsing failed. The computing module 30 can thus calculate the feedback function from parameters such as the rejection flag $n_{rej}$, the semantic parsing result $n_{sem}$, the semantic parsing confidence $f_{sem}$ and the language model confidence $f_{lm}$ with reference to the above formula, and the value of $R$ indicates whether the user's feedback is positive or negative.
The establishing module 40 is configured to establish a decision model of speech recognition according to the feedback function.
In one embodiment of the present invention, the decision model of speech recognition is established according to the following formula:
$$Q(s,a) = R(s,a) + r \sum_{s'} P(s' \mid s,a) \max_{a'} Q(s',a')$$
where $Q$ denotes the expected feedback, $s$ and $s'$ denote system state nodes, $a$ and $a'$ denote decision actions, and $P$ denotes the transition probability between states under a decision action.
Specifically, after the computing module 30 has calculated the feedback function from the feedback given by the user, the establishing module 40 rewards positive feedback from the user with added points and penalizes negative feedback with deducted points. The establishing module 40 then uses a Markov decision algorithm, that is, it establishes the decision model according to the above formula. For the objective function, the standard value iteration algorithm can be used to solve for the parameters, and the parameters of the feedback function and the state transition probabilities can be obtained through training.
With the speech recognition processing device of the embodiment of the present invention, information such as the recognition result, the speech analysis result and the dialogue state is extracted from the received speech signal to construct rejection rules, and the decision model is trained in a data-driven manner. When performing speech recognition, the speech recognition system can therefore give feedback according to the expected feedback produced by the decision model: every input that the decision model deems valid receives explicit feedback instead of being treated as noise. This improves the accuracy of speech recognition, makes voice interaction between the user and the speech recognition system smoother, and improves the user experience.
Fig. 4 is a schematic structural diagram of a speech recognition processing device according to another embodiment of the present invention.
As shown in Fig. 4, the speech recognition processing device comprises a receiving module 10, an extraction module 20, a computing module 30, an establishing module 40, an acquisition module 50 and a processing module 60.
The acquisition module 50 is configured to acquire the voice interaction information input by the user. The processing module 60 is configured to process the voice interaction information input by the user according to the decision model and to select a corresponding interaction strategy for voice interaction with the user. The interaction strategies may include, for example, a guiding strategy, an ignoring strategy and a clarifying strategy. When the speech recognition system identifies the user's voice interaction information as noise, it can actively guide the user to express themselves clearly; when the identified voice interaction information is ambiguous or unclear, it should ask for confirmation. In other words, each dialogue turn between the user and the speech recognition system may contain noise, an unclear answer, ambiguous semantics or a complete response, and the speech recognition system can choose among strategies such as guiding, ignoring and clarifying.
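The division into six modules can be pictured as a simple composition of callables, one per module. This is a structural sketch only: the class name `SpeechRecognitionDevice`, the method names and the type signatures are assumptions introduced here, and the patent does not prescribe any particular software interface for the modules.

```python
from dataclasses import dataclass
from typing import Any, Callable


@dataclass
class SpeechRecognitionDevice:
    """Hypothetical wiring of modules 10 to 60 as plain callables."""
    receive: Callable[[], Any]          # receiving module 10
    extract: Callable[[Any], Any]       # extraction module 20
    compute: Callable[[Any], float]     # computing module 30
    establish: Callable[[float], Any]   # establishing module 40
    acquire: Callable[[], Any]          # acquisition module 50
    process: Callable[[Any, Any], str]  # processing module 60

    def build_model(self) -> Any:
        """Chain modules 10 to 40: signal, features, feedback, model."""
        signal = self.receive()
        features = self.extract(signal)
        reward = self.compute(features)
        return self.establish(reward)

    def interact(self, model: Any) -> str:
        """Modules 50 and 60: acquire user input and pick a strategy."""
        return self.process(model, self.acquire())
```

Keeping each module behind a callable makes it possible to swap a stub for a real recognizer or parser without touching the rest of the pipeline.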
The voice recognition processing device of the embodiment of the present invention, based on decision model, the voice messaging that user inputs is processed, clear and definite feedback is all given to the voice messaging being identified as effectively input, instead of be interpreted as noise, thus the feedback making voice interactive system to feed back to user the most comfortable is mutual, raising user and speech recognition system carry out smoothness during interactive voice, improve the experience of user.
It should be understood that each part of the present invention can be implemented in hardware, software, firmware, or a combination thereof. In the above embodiments, multiple steps or methods can be implemented in software or firmware that is stored in a memory and executed by a suitable instruction execution system. For example, if implemented in hardware, as in another embodiment, they can be implemented by any one or a combination of the following technologies well known in the art: a discrete logic circuit having logic gates for implementing logic functions on data signals, an application-specific integrated circuit (ASIC) having suitable combinational logic gates, a programmable gate array (PGA), a field-programmable gate array (FPGA), and the like.
In the present invention, unless otherwise expressly specified and limited, terms such as "mounted", "connected", and "coupled" should be interpreted broadly. For example, a connection may be fixed, detachable, or integral; it may be mechanical or electrical; it may be direct or indirect through an intermediary; and it may be an internal connection between two elements or an interaction between two elements, unless otherwise expressly limited. A person of ordinary skill in the art can understand the specific meaning of these terms in the present invention according to the specific circumstances.
In this specification, a description referring to the terms "one embodiment", "some embodiments", "example", "specific example", or "some examples" means that a specific feature, structure, material, or characteristic described in connection with that embodiment or example is included in at least one embodiment or example of the present invention. Schematic uses of these terms in this specification do not necessarily refer to the same embodiment or example. Moreover, the specific features, structures, materials, or characteristics described may be combined in any suitable manner in any one or more embodiments or examples. In addition, provided they do not conflict, a person skilled in the art may combine the different embodiments or examples described in this specification and the features of those embodiments or examples.
Although embodiments of the present invention have been shown and described above, it should be understood that these embodiments are exemplary and shall not be construed as limiting the present invention. A person of ordinary skill in the art may make changes, modifications, substitutions, and variations to the above embodiments within the scope of the present invention.
Claims (10)
1. A voice recognition processing method, characterized by comprising the following steps:
Receiving a voice signal;
Extracting multiple pieces of characteristic information from the voice signal;
Calculating a feedback function according to the multiple pieces of characteristic information in the voice signal; and
Establishing a decision model for speech recognition according to the feedback function.
2. The voice recognition processing method according to claim 1, characterized in that the multiple pieces of characteristic information comprise a rejection score, a semantic analysis result, a semantic parsing confidence, and a language model confidence.
3. The voice recognition processing method according to claim 1 or 2, characterized in that the feedback function is calculated according to the following formula:
$$R = -\left(w_{i} n_{i} + w_{e} n_{e} + w_{f} n_{f} + w_{rej} n_{rej} + w_{s1} n_{sem} + w_{s2} f_{sem} + w_{lm} s_{lm}\right),$$
wherein $R$ represents the feedback function, $n_{i}$ represents the number of dialogue turns, $n_{e}$ represents the number of errors, $n_{f}$ represents the number of known slots, $n_{rej}$ represents the rejection score, $n_{sem}$ represents the semantic analysis result, $f_{sem}$ represents the semantic parsing confidence, $s_{lm}$ represents the language model confidence, and each $w$ represents a parameter.
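As a hedged illustration of the formula in claim 3, the following Python sketch evaluates the feedback function as the negated weighted sum of the seven quantities. The container `TurnFeatures`, the function `feedback`, the weight keys, and all numeric values are hypothetical; the claim does not specify how the parameters w are obtained.

```python
from dataclasses import dataclass


@dataclass
class TurnFeatures:
    n_i: float    # number of dialogue turns
    n_e: float    # number of errors
    n_f: float    # number of known slots
    n_rej: float  # rejection score
    n_sem: float  # semantic analysis result
    f_sem: float  # semantic parsing confidence
    s_lm: float   # language model confidence


def feedback(x, w):
    """Evaluate R = -(w_i n_i + w_e n_e + w_f n_f + w_rej n_rej
    + w_s1 n_sem + w_s2 f_sem + w_lm s_lm)."""
    total = (
        w["i"] * x.n_i
        + w["e"] * x.n_e
        + w["f"] * x.n_f
        + w["rej"] * x.n_rej
        + w["s1"] * x.n_sem
        + w["s2"] * x.f_sem
        + w["lm"] * x.s_lm
    )
    return -total


# Illustrative values only: unit weights and an arbitrary feature vector.
weights = {"i": 1.0, "e": 1.0, "f": 1.0, "rej": 1.0, "s1": 1.0, "s2": 1.0, "lm": 1.0}
features = TurnFeatures(n_i=2, n_e=1, n_f=3, n_rej=0.5, n_sem=1.0, f_sem=0.25, s_lm=0.25)
r = feedback(features, weights)
```

With unit weights the weighted sum of these illustrative features is 8.0, so the sketch yields a feedback value of -8.0.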
4. The voice recognition processing method according to claim 3, wherein the decision model for speech recognition is established according to the following formula:
$$Q(s,a) = R(s,a) + r \sum_{s'} P(s' \mid s, a) \max_{a'} Q(s', a'),$$
wherein $Q$ represents the feedback expectation, $s$ and $s'$ represent system state nodes, $a$ and $a'$ represent decision actions, and $P$ represents the transition probability between states under a decision action.
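The recursion in claim 4 can be solved by repeated substitution on a small tabular model, as in the following Python sketch. This is only one possible way of computing Q and is not stated in the claim. The sketch assumes that r acts as a discount factor between 0 and 1 (the claim does not define r); the function `q_iteration` and the two-state toy model are hypothetical.

```python
def q_iteration(states, actions, R, P, discount, sweeps=100):
    """Iterate Q(s,a) = R(s,a) + r * sum_s' P(s'|s,a) * max_a' Q(s',a').

    R maps (state, action) to the immediate feedback; missing entries are 0.
    P maps (state, action) to a dict {next_state: probability}; a missing
    entry means no transition is defined, so the state behaves as terminal.
    """
    Q = {(s, a): 0.0 for s in states for a in actions}
    for _ in range(sweeps):
        new_Q = {}
        for s in states:
            for a in actions:
                expected = sum(
                    p * max(Q[(s2, a2)] for a2 in actions)
                    for s2, p in P.get((s, a), {}).items()
                )
                new_Q[(s, a)] = R.get((s, a), 0.0) + discount * expected
        Q = new_Q
    return Q


# Hypothetical toy model: from "ask", action "confirm" ends the dialogue,
# while action "repeat" stays in the same state at a smaller immediate cost.
states = ["ask", "end"]
actions = ["confirm", "repeat"]
R = {("ask", "confirm"): -1.0, ("ask", "repeat"): -0.25}
P = {("ask", "confirm"): {"end": 1.0}, ("ask", "repeat"): {"ask": 1.0}}
Q = q_iteration(states, actions, R, P, discount=0.5)
```

In this toy model the value of "confirm" settles at -1.0 and the value of "repeat" converges to -0.5, the fixed point of x = -0.25 + 0.5x.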
5. The voice recognition processing method according to any one of claims 1-4, characterized in that, after the decision model for speech recognition is established according to the feedback function, the method further comprises:
Obtaining the interactive voice information input by the user, processing the interactive voice information input by the user according to the decision model, and selecting a corresponding interactive strategy for carrying out voice interaction with the user.
6. A voice recognition processing device, characterized by comprising:
A receiver module, configured to receive a voice signal;
An extraction module, configured to extract multiple pieces of characteristic information from the voice signal;
A computing module, configured to calculate a feedback function according to the multiple pieces of characteristic information in the voice signal; and
A set-up module, configured to establish a decision model for speech recognition according to the feedback function.
7. The voice recognition processing device according to claim 6, characterized in that the multiple pieces of characteristic information comprise a rejection score, a semantic analysis result, a semantic parsing confidence, and a language model confidence.
8. The voice recognition processing device according to claim 6 or 7, characterized in that the computing module calculates the feedback function according to the following formula:
$$R = -\left(w_{i} n_{i} + w_{e} n_{e} + w_{f} n_{f} + w_{rej} n_{rej} + w_{s1} n_{sem} + w_{s2} f_{sem} + w_{lm} f_{lm}\right),$$
wherein $R$ represents the feedback function, $n_{i}$ represents the number of dialogue turns, $n_{e}$ represents the number of errors, $n_{f}$ represents the number of known slots, $n_{rej}$ represents the rejection score, $n_{sem}$ represents the semantic analysis result, $f_{sem}$ represents the semantic parsing confidence, $f_{lm}$ represents the language model confidence, and each $w$ represents a parameter.
9. The voice recognition processing device according to claim 8, wherein the set-up module establishes the decision model for speech recognition according to the following formula:
$$Q(s,a) = R(s,a) + r \sum_{s'} P(s' \mid s, a) \max_{a'} Q(s', a'),$$
wherein $Q$ represents the feedback expectation, $s$ and $s'$ represent system state nodes, $a$ and $a'$ represent decision actions, and $P$ represents the transition probability between states under a decision action.
10. The voice recognition processing device according to any one of claims 6-9, characterized by further comprising:
An acquisition module, configured to obtain the interactive voice information input by the user;
A processing module, configured to process the interactive voice information input by the user according to the decision model, and to select a corresponding interactive strategy for carrying out voice interaction with the user.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201511016852.0A CN105529030B (en) | 2015-12-29 | 2015-12-29 | Voice recognition processing method and device |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201511016852.0A CN105529030B (en) | 2015-12-29 | 2015-12-29 | Voice recognition processing method and device |
Publications (2)
Publication Number | Publication Date |
---|---|
CN105529030A (en) | 2016-04-27 |
CN105529030B (en) | 2020-03-03 |
Family
ID=55771207
Family Applications (1)
Application Number | Status | Publication | Priority Date | Filing Date | Title |
---|---|---|---|---|---|
CN201511016852.0A | Active | CN105529030B (en) | 2015-12-29 | 2015-12-29 | Voice recognition processing method and device |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN105529030B (en) |
Citations (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20060069560A1 (en) * | 2004-08-31 | 2006-03-30 | Christopher Passaretti | Method and apparatus for controlling recognition results for speech recognition applications |
CN1763843A (en) * | 2005-11-18 | 2006-04-26 | 清华大学 | Pronunciation quality evaluating method for language learning machine |
CN102376182A (en) * | 2010-08-26 | 2012-03-14 | 财团法人工业技术研究院 | Language learning system, language learning method and program product thereof |
CN103035243A (en) * | 2012-12-18 | 2013-04-10 | 中国科学院自动化研究所 | Real-time feedback method and system of long voice continuous recognition and recognition result |
CN104795065A (en) * | 2015-04-30 | 2015-07-22 | 北京车音网科技有限公司 | Method for increasing speech recognition rate and electronic device |
CN105070288A (en) * | 2015-07-02 | 2015-11-18 | 百度在线网络技术(北京)有限公司 | Vehicle-mounted voice instruction recognition method and device |
Cited By (17)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN107342081A (en) * | 2016-04-28 | 2017-11-10 | 通用汽车环球科技运作有限责任公司 | Use relative and absolute time slot data speech recognition system and method |
CN107665708B (en) * | 2016-07-29 | 2021-06-08 | 科大讯飞股份有限公司 | Intelligent voice interaction method and system |
CN107665708A (en) * | 2016-07-29 | 2018-02-06 | 科大讯飞股份有限公司 | Intelligent sound exchange method and system |
CN106970993A (en) * | 2017-03-31 | 2017-07-21 | 百度在线网络技术(北京)有限公司 | Mining model update method and device |
CN106970993B (en) * | 2017-03-31 | 2020-09-18 | 百度在线网络技术(北京)有限公司 | Mining model updating method and device |
CN107316643A (en) * | 2017-07-04 | 2017-11-03 | 科大讯飞股份有限公司 | Voice interactive method and device |
CN107316643B (en) * | 2017-07-04 | 2021-08-17 | 科大讯飞股份有限公司 | Voice interaction method and device |
CN109785838B (en) * | 2019-01-28 | 2021-08-31 | 百度在线网络技术(北京)有限公司 | Voice recognition method, device, equipment and storage medium |
CN109785838A (en) * | 2019-01-28 | 2019-05-21 | 百度在线网络技术(北京)有限公司 | Audio recognition method, device, equipment and storage medium |
WO2020238341A1 (en) * | 2019-05-31 | 2020-12-03 | 华为技术有限公司 | Speech recognition method, apparatus and device, and computer-readable storage medium |
US12087289B2 (en) | 2019-05-31 | 2024-09-10 | Huawei Technologies Co., Ltd. | Speech recognition method, apparatus, and device, and computer-readable storage medium |
CN111292746A (en) * | 2020-02-07 | 2020-06-16 | 普强时代(珠海横琴)信息技术有限公司 | Voice input conversion system based on human-computer interaction |
CN111899728A (en) * | 2020-07-23 | 2020-11-06 | 海信电子科技(武汉)有限公司 | Training method and device for intelligent voice assistant decision strategy |
CN111899728B (en) * | 2020-07-23 | 2024-05-28 | 海信电子科技(武汉)有限公司 | Training method and device for intelligent voice assistant decision strategy |
CN112002321A (en) * | 2020-08-11 | 2020-11-27 | 海信电子科技(武汉)有限公司 | Display device, server and voice interaction method |
CN112002321B (en) * | 2020-08-11 | 2023-09-19 | 海信电子科技(武汉)有限公司 | Display device, server and voice interaction method |
WO2023124960A1 (en) * | 2021-12-27 | 2023-07-06 | 广州小鹏汽车科技有限公司 | Speech interaction method, vehicle, server, and computer readable storage medium |
Also Published As
Publication number | Publication date |
---|---|
CN105529030B (en) | 2020-03-03 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
C06 | Publication | ||
PB01 | Publication | ||
C10 | Entry into substantive examination | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||