CN107342085A - Speech processing method and device - Google Patents
- Publication number
- CN107342085A (application CN201710606704.7A)
- Authority
- CN
- China
- Prior art keywords
- identification result
- voice
- voice identification
- information type
- result
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/22—Procedures used during a speech recognition process, e.g. man-machine dialogue
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/22—Procedures used during a speech recognition process, e.g. man-machine dialogue
- G10L15/222—Barge in, i.e. overridable guidance for interrupting prompts
Landscapes
- Engineering & Computer Science (AREA)
- Computational Linguistics (AREA)
- Health & Medical Sciences (AREA)
- Audiology, Speech & Language Pathology (AREA)
- Human Computer Interaction (AREA)
- Physics & Mathematics (AREA)
- Acoustics & Sound (AREA)
- Multimedia (AREA)
- Telephonic Communication Services (AREA)
Abstract
The present invention relates to a speech processing method and device. The speech processing method includes: receiving first speech data and performing speech recognition on it to obtain a first speech recognition result; if second speech data is received while the terminal device is executing the first speech recognition result, performing speech recognition on it to obtain a second speech recognition result; determining the information types contained in the first speech recognition result and the second speech recognition result, respectively; and determining the execution mode of the first and second speech recognition results according to the first information type contained in the first speech recognition result, the second information type contained in the second speech recognition result, and a preset behavior interruption rule. With this technical solution, the user does not have to wait a long time when conversing with the terminal: the user can speak again without waiting for the terminal to finish its announcement, which reduces the user's waiting time and improves the user experience.
Description
Technical field
The present invention relates to the technical field of speech recognition, and in particular to a speech processing method and device.
Background
Fig. 1 shows a conventional dialogue mode. As shown in Fig. 1, the conventional dialogue mode has the following shortcomings:
1) Sense of pressure: after the device prompts the user to speak, the user must decide on a sentence as quickly as possible and say it before the VAD (voice activity detection) timeout, without pausing in the middle. If the user pauses, the device may judge that the user has finished speaking, and whatever is said afterward will not be heard or parsed by the device. This pressure is the worst part of the user experience in voice interaction and the part with the highest learning cost when a user starts using voice functions.
2) Forced waiting: during the dialogue, even if the user has already decided what to say, they must wait for the device to finish speaking before they can speak. For example, in Fig. 1 the user can already see that the first navigation result is the one they want, but still has to wait for the device to finish its lengthy prompt ("you.... which ...") before answering.
3) Poor robustness: the time window in which the user may speak is generally determined by the local VAD, yet VAD is the least intelligent part of the whole intelligent voice dialogue. When the user is about to speak, nearby noise (such as people chatting nearby or a television) often causes the VAD to misjudge, so that the recording window closes too early or too late.
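Shortcomings 1) and 3) both stem from closing the recording window on a silence timeout. The sketch below assumes a simple energy-threshold endpointer (real VAD is often model-based, and the frame energies, threshold, and timeout are illustrative assumptions, not values from this patent): once speech starts, `silence_timeout` consecutive quiet frames close the window, so anything said after a long enough pause is never captured.

```python
from typing import List, Optional, Tuple


def endpoint(energies: List[float], threshold: float,
             silence_timeout: int) -> Optional[Tuple[int, int]]:
    """Return the recording window as (start, end) frame indices, end exclusive.

    A frame is treated as speech when its energy exceeds `threshold`. After
    speech starts, `silence_timeout` consecutive non-speech frames close the
    window. Returns None if no speech frame is ever seen.
    """
    start: Optional[int] = None
    silent = 0  # consecutive non-speech frames since the last speech frame
    for i, energy in enumerate(energies):
        if energy > threshold:
            if start is None:
                start = i
            silent = 0
        elif start is not None:
            silent += 1
            if silent >= silence_timeout:
                # Window closes; the last speech frame was i - silent.
                return start, i - silent + 1
    if start is None:
        return None
    return start, len(energies) - silent
```

For example, `endpoint([0, 5, 5, 0, 0, 0, 5, 5], 1.0, 3)` returns `(1, 3)`: the speech in frames 6 and 7 arrives after the window has already closed, which is exactly the lost trailing speech described in shortcoming 1).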
Summary of the invention
Embodiments of the present invention provide a speech processing method and device that allow the user and the device to hold a streaming dialogue, thereby reducing the user's waiting time, improving the robustness of the speech recognition system, and improving the user experience.
According to a first aspect of the embodiments of the present invention, a speech processing method is provided, including:
receiving first speech data and performing speech recognition to obtain a first speech recognition result;
if second speech data is received while the terminal device is executing the first speech recognition result, performing speech recognition to obtain a second speech recognition result;
determining the information types contained in the first speech recognition result and the second speech recognition result, respectively;
determining the execution mode of the first speech recognition result and the second speech recognition result according to the first information type contained in the first speech recognition result, the second information type contained in the second speech recognition result, and a preset behavior interruption rule.
In this embodiment, if second speech data is received while the first speech recognition result is being executed, then after the second speech recognition result is obtained, the execution mode of the two results is determined according to their information types and the preset behavior interruption rule, i.e., it is decided whether to interrupt the first speech recognition result and start executing the second. In this way, the user does not have to wait a long time when conversing with the terminal: the user can speak again without waiting for the terminal to finish its announcement, which reduces the user's waiting time and improves the user experience.
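The overall flow can be sketched as a small session object. This is a hypothetical illustration, not the patent's implementation: `recognize`, `classify`, and `decide` are assumed hooks standing in for the speech recognizer, the information-type determination, and the preset behavior interruption rule, and the session only records the chosen execution mode instead of driving real playback.

```python
from dataclasses import dataclass, field
from typing import Callable, FrozenSet, List, Optional


@dataclass
class Session:
    """Sketch of the four method steps; the three callables are hypothetical hooks."""
    recognize: Callable[[bytes], str]                        # speech data -> text
    classify: Callable[[str], FrozenSet[str]]                # text -> information types
    decide: Callable[[FrozenSet[str], FrozenSet[str]], str]  # interruption rule -> mode
    current: Optional[str] = None                            # result being executed
    log: List[str] = field(default_factory=list)

    def on_speech(self, audio: bytes) -> None:
        text = self.recognize(audio)
        if self.current is None:
            # Nothing is executing: start on this (first) result right away.
            self.current = text
            self.log.append(f"execute: {text}")
            return
        # New speech arrived while the first result is still executing:
        # determine both information types, then let the rule pick a mode.
        mode = self.decide(self.classify(self.current), self.classify(text))
        self.log.append(f"{mode}: {self.current} -> {text}")
        self.current = text  # simplification: track only the newest result

    def on_finished(self) -> None:
        """Called when the current result has finished executing."""
        self.current = None
```

The key design point is that incoming speech is never ignored while a result is executing; it is recognized and handed to the rule, which is what removes the forced wait.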
In one embodiment, the information types include: voice broadcast, action execution, and media playback.
In this embodiment, a speech recognition result has mainly three information types. The first is voice broadcast, i.e., speech output by a virtual character in the terminal device, such as reporting the weather or chatting with the user. The second is action execution, such as turning on a light, navigating, or adjusting the temperature; these actions take almost none of the user's time. The third is media playback, such as playing music or the radio.
In one embodiment, the preset behavior interruption rule includes:
When both the first information type and the second information type include voice broadcast, or both include media playback, stop executing the first speech recognition result and start executing the second speech recognition result;
In this embodiment, if both speech recognition results include voice broadcast, or both include media playback, the two necessarily conflict, so execution of the first speech recognition result can be interrupted and execution of the second started. For example, if the first speech recognition result is "navigate to Lujiazui" and the second is "navigate to Lujiazui East Road", the terminal first announces "Navigating to Lujiazui for you" and starts navigating to Lujiazui, but then interrupts the first announcement, announces "Navigating to Lujiazui East Road" instead, and sends the navigation application an instruction to navigate to Lujiazui East Road.
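A minimal sketch of this first rule, using the navigation example above. The word-by-word `announce` generator is an assumed stand-in for a TTS engine, and representing each result as an (announcement, destination) pair is also an assumption; the point is only that the first announcement stops where the second result arrives and the second navigation instruction supersedes the first.

```python
from typing import Iterator, List, Tuple


def announce(text: str) -> Iterator[str]:
    """Stand-in for a TTS engine: yields the announcement one word at a time."""
    yield from text.split()


def interrupt_and_replace(first: Tuple[str, str], second: Tuple[str, str],
                          barge_in_at: int) -> Tuple[List[str], List[str]]:
    """Rule 1 sketch: both results contain a voice broadcast.

    Each result is (announcement, destination). The second result arrives
    after `barge_in_at` words of the first announcement have been spoken.
    Returns (words actually spoken, navigation instructions in send order).
    """
    spoken: List[str] = []
    nav_instructions = [first[1]]        # first result starts its navigation action
    for i, word in enumerate(announce(first[0])):
        if i == barge_in_at:             # second result recognized at this point
            break                        # stop executing the first result
        spoken.append(word)
    nav_instructions.append(second[1])   # second result supersedes the first route
    spoken.extend(announce(second[0]))   # start executing the second result
    return spoken, nav_instructions
```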
When the first information type includes media playback and the second information type includes voice broadcast, lower the volume of the media playback, start executing the second speech recognition result, and restore the media playback volume after the second speech recognition result has been executed;
In this embodiment, if the first speech recognition result includes media playback and the second includes voice broadcast, the media volume can be lowered while the voice broadcast corresponding to the second speech recognition result is performed, and restored after the broadcast ends. For example, if the first speech recognition result is "play the song Plant the Sun" and the second is "navigate to Lujiazui East Road", the terminal first plays "Plant the Sun", then lowers its volume, announces "Navigating to Lujiazui East Road" at normal volume, and sends the navigation application an instruction to navigate to Lujiazui East Road; after the announcement finishes, it restores the normal volume and continues playing "Plant the Sun".
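Ducking maps naturally onto a context manager that lowers the volume on entry and restores it on exit. `MediaPlayer` and the 0.2 duck level are assumptions for illustration; the patent does not say how far the volume is lowered.

```python
from contextlib import contextmanager
from dataclasses import dataclass, field
from typing import Iterator, List


@dataclass
class MediaPlayer:
    """Hypothetical media player that records every volume change."""
    volume: float = 1.0
    history: List[float] = field(default_factory=list)

    def set_volume(self, value: float) -> None:
        self.volume = value
        self.history.append(value)


@contextmanager
def ducked(player: MediaPlayer, level: float = 0.2) -> Iterator[None]:
    """Rule 2 sketch: lower the media volume while the body runs (the voice
    broadcast), then restore the previous volume, even if the body raises."""
    previous = player.volume
    player.set_volume(level)
    try:
        yield
    finally:
        player.set_volume(previous)
```

Usage: with `player = MediaPlayer(volume=0.8)`, the announcement "Navigating to Lujiazui East Road" is spoken inside `with ducked(player): ...`, and the song returns to 0.8 afterwards. The `finally` clause is the design choice that matters: a failed announcement must not leave the music stuck at the ducked level.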
When the first information type includes voice broadcast and the second information type includes media playback, start executing the second speech recognition result after the first speech recognition result has been executed;
In this embodiment, if the first speech recognition result includes voice broadcast and the second includes media playback, the media can start playing after the voice broadcast ends. For example, if the first speech recognition result is "navigate to Lujiazui East Road" and the second is "play the song Plant the Sun", the terminal first announces "Navigating to Lujiazui East Road" while sending the navigation application an instruction to navigate to Lujiazui East Road, and then plays the song "Plant the Sun".
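The third rule amounts to queueing the media behind the announcement on a single audio output. The `AudioChannel` class below is a hypothetical sketch; the navigation instruction itself is an action and would be sent immediately, so only the audio items go through the queue.

```python
from collections import deque
from typing import Deque, List, Optional


class AudioChannel:
    """Hypothetical single audio output: one item plays at a time and
    later items wait in FIFO order (rule 3 sketch)."""

    def __init__(self) -> None:
        self.playing: Optional[str] = None
        self.pending: Deque[str] = deque()
        self.finished: List[str] = []

    def submit(self, item: str) -> None:
        if self.playing is None:
            self.playing = item          # channel idle: start immediately
        else:
            self.pending.append(item)    # wait for the current broadcast to end

    def on_complete(self) -> None:
        """Called when the current item ends; starts the next one, if any."""
        if self.playing is not None:
            self.finished.append(self.playing)
        self.playing = self.pending.popleft() if self.pending else None
```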
When neither the first information type nor the second information type includes voice broadcast or media playback, execute the first speech recognition result and the second speech recognition result in sequence.
In this embodiment, if neither speech recognition result includes voice broadcast or media playback, the two results can be executed in sequence. For example, if the first speech recognition result is "open the car door" and the second is "turn on the air conditioning", the terminal sends an open command to the car door and then sends a turn-on command to the air conditioner.
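The four rules can be collected into one decision function over the sets of information types each result contains. This sketch rests on two assumptions the text does not state: the rules are checked in the order listed, and combinations that none of the rules covers (for example, an action followed by a voice broadcast) return `None` so that the caller can decide.

```python
from enum import Enum
from typing import FrozenSet, Optional


class InfoType(Enum):
    VOICE_BROADCAST = "N"
    ACTION = "A"
    MEDIA = "M"


class Mode(Enum):
    INTERRUPT = "stop the first result, start the second"
    DUCK = "lower media volume, run the second, then restore volume"
    AFTER_FIRST = "start the second after the first finishes"
    SEQUENTIAL = "execute the first, then the second"


N, A, M = InfoType.VOICE_BROADCAST, InfoType.ACTION, InfoType.MEDIA


def execution_mode(first: FrozenSet[InfoType],
                   second: FrozenSet[InfoType]) -> Optional[Mode]:
    """Apply the four interruption rules in the order they are listed."""
    if (N in first and N in second) or (M in first and M in second):
        return Mode.INTERRUPT      # rule 1: the two results conflict
    if M in first and N in second:
        return Mode.DUCK           # rule 2
    if N in first and M in second:
        return Mode.AFTER_FIRST    # rule 3
    if not ((first | second) & {N, M}):
        return Mode.SEQUENTIAL     # rule 4: no broadcast or media on either side
    return None                    # not covered by the four rules (assumption)
```

With the examples above, navigation (`{N, A}`) followed by navigation yields `Mode.INTERRUPT`, and "open the car door" (`{A}`) followed by "turn on the air conditioning" (`{A}`) yields `Mode.SEQUENTIAL`.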
In one embodiment, determining the information types contained in the first speech recognition result and the second speech recognition result, respectively, includes:
converting the first speech recognition result and the second speech recognition result into executable first operation instruction information and second operation instruction information, respectively, according to a preset operation guide;
determining the information types contained in the first operation instruction information and the second operation instruction information.
In this embodiment, the terminal has a mapping table (or operation guide). Using this mapping table, the terminal interprets the semantic parsing result and decides which action to perform (for example, by how many degrees to raise the air-conditioner temperature) and which voice broadcast to give (for example, which audio to play after the air conditioner has been adjusted). According to this logic, the terminal converts the received semantic parsing result into locally executable actions, voice broadcasts, or media playback.
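The mapping table (operation guide) can be sketched as a dictionary from a parsed intent to instruction templates, from which the information types follow directly. The intent names, the `(intent, slot)` shape of the semantic parsing result, and instruction strings such as `nav_app.route(...)` are all hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, FrozenSet, List, Tuple


@dataclass(frozen=True)
class Instruction:
    kind: str      # "N" voice broadcast, "A" action execution, "M" media playback
    payload: str   # what the terminal will say, do, or play


# Hypothetical operation guide: intent -> list of (kind, template) pairs.
OPERATION_GUIDE: Dict[str, List[Tuple[str, str]]] = {
    "navigate": [("N", "Navigating to {slot} for you"),
                 ("A", "nav_app.route({slot!r})")],
    "play_song": [("M", "play {slot!r}")],
    "raise_ac_temp": [("A", "ac.raise_temp({slot})"),
                      ("N", "Air conditioner raised by {slot} degrees")],
    "open_door": [("A", "door.open()")],
}


def to_instructions(intent: str, slot: str = "") -> List[Instruction]:
    """Convert a parsed (intent, slot) result into executable instructions."""
    return [Instruction(kind, template.format(slot=slot))
            for kind, template in OPERATION_GUIDE[intent]]


def info_types(instructions: List[Instruction]) -> FrozenSet[str]:
    """The information types of a result are the kinds of its instructions."""
    return frozenset(ins.kind for ins in instructions)
```

Deriving the types from the converted instructions, rather than from the raw text, means one result can carry several types at once (navigation is both a broadcast and an action), which is what the interruption rules test against.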
In one embodiment, the method further includes:
receiving an input behavior interruption rule setting command;
setting the preset behavior interruption rule according to the behavior interruption rule setting command.
In this embodiment, a user or manufacturer can set the preset behavior interruption rule as needed, so that conflicts between two speech recognition results are handled according to their own settings.
According to a second aspect of the embodiments of the present invention, a speech processing device is provided, including:
a first recognition module, configured to receive first speech data and perform speech recognition to obtain a first speech recognition result;
a second recognition module, configured to perform speech recognition to obtain a second speech recognition result if second speech data is received while the terminal device is executing the first speech recognition result;
a first determining module, configured to determine the information types contained in the first speech recognition result and the second speech recognition result, respectively;
a second determining module, configured to determine the execution mode of the first speech recognition result and the second speech recognition result according to the first information type contained in the first speech recognition result, the second information type contained in the second speech recognition result, and a preset behavior interruption rule.
In one embodiment, the information types include: voice broadcast, action execution, and media playback.
In one embodiment, the preset behavior interruption rule includes:
When both the first information type and the second information type include voice broadcast, or both include media playback, stop executing the first speech recognition result and start executing the second speech recognition result;
When the first information type includes media playback and the second information type includes voice broadcast, lower the volume of the media playback, start executing the second speech recognition result, and restore the media playback volume after the second speech recognition result has been executed;
When the first information type includes voice broadcast and the second information type includes media playback, start executing the second speech recognition result after the first speech recognition result has been executed;
When neither the first information type nor the second information type includes voice broadcast or media playback, execute the first speech recognition result and the second speech recognition result in sequence.
In one embodiment, the first determining module includes:
a conversion submodule, configured to convert the first speech recognition result and the second speech recognition result into executable first operation instruction information and second operation instruction information, respectively, according to a preset operation guide;
a determination submodule, configured to determine the information types contained in the first operation instruction information and the second operation instruction information.
In one embodiment, the device further includes:
a receiving module, configured to receive an input behavior interruption rule setting command;
a setting module, configured to set the preset behavior interruption rule according to the behavior interruption rule setting command.
It should be understood that the above general description and the following detailed description are merely exemplary and explanatory and do not limit the present invention.
Other features and advantages of the present invention will be set forth in the following description, and will in part become apparent from the description or be understood by practicing the present invention. The objectives and other advantages of the present invention can be realized and obtained by the structures particularly pointed out in the written description, the claims, and the accompanying drawings.
The technical solutions of the present invention are described in further detail below with reference to the accompanying drawings and embodiments.
Brief description of the drawings
The accompanying drawings, which are incorporated into and constitute a part of this specification, illustrate embodiments consistent with the present invention and, together with the description, serve to explain the principles of the present invention.
Fig. 1 is a schematic diagram of a speech processing method in the related art.
Fig. 2 is a flowchart of a speech processing method according to an exemplary embodiment.
Fig. 3 is a flowchart of step S203 in a speech processing method according to an exemplary embodiment.
Fig. 4 is a flowchart of another speech processing method according to an exemplary embodiment.
Fig. 5 is a schematic diagram of a behavior interruption table according to an exemplary embodiment.
Fig. 6 is a block diagram of a speech processing device according to an exemplary embodiment.
Fig. 7 is a block diagram of the first determining module in a speech processing device according to an exemplary embodiment.
Fig. 8 is a block diagram of another speech processing device according to an exemplary embodiment.
Detailed description of the embodiments
Exemplary embodiments are described in detail here, and examples of them are illustrated in the accompanying drawings. Where the following description refers to the drawings, the same numbers in different drawings denote the same or similar elements unless otherwise indicated. The implementations described in the following exemplary embodiments do not represent all implementations consistent with the present invention; rather, they are merely examples of devices and methods consistent with some aspects of the present invention as detailed in the appended claims.
Fig. 2 is a flowchart of a speech processing method according to an exemplary embodiment. The speech processing method is applied to a terminal device, which may be any device with a voice control function, such as a mobile phone, a computer, a digital broadcast terminal, a messaging device, a game console, a tablet device, a medical device, fitness equipment, or a personal digital assistant.
As shown in Fig. 2, the method includes steps S201-S204:
In step S201, first speech data is received, and speech recognition is performed to obtain a first speech recognition result.
Speech recognition on the speech data may be performed by the terminal itself, or the terminal may send the speech data to a server, which returns the result to the terminal after completing recognition. A speech recognition result is the text corresponding to the speech data.
In step S202, if second speech data is received while the terminal device is executing the first speech recognition result, speech recognition is performed to obtain a second speech recognition result.
In step S203, the information types contained in the first speech recognition result and the second speech recognition result are determined, respectively.
In one embodiment, the information types include: voice broadcast, action execution, and media playback.
In this embodiment, a speech recognition result has mainly three information types. The first is voice broadcast, i.e., speech output by a virtual character in the terminal device, such as reporting the weather or chatting with the user. The second is action execution, such as turning on a light, navigating, or adjusting the temperature; these actions take almost none of the user's time. The third is media playback, such as playing music or the radio.
In step S204, the execution mode of the first speech recognition result and the second speech recognition result is determined according to the first information type contained in the first speech recognition result, the second information type contained in the second speech recognition result, and the preset behavior interruption rule.
In this embodiment, if second speech data is received while the first speech recognition result is being executed, then after the second speech recognition result is obtained, the execution mode of the two results is determined according to their information types and the preset behavior interruption rule, i.e., it is decided whether to interrupt the first speech recognition result and start executing the second. In this way, the user does not have to wait a long time when conversing with the terminal: the user can speak again without waiting for the terminal to finish its announcement, which reduces the user's waiting time and improves the user experience.
In one embodiment, the preset behavior interruption rule includes:
When both the first information type and the second information type include voice broadcast, or both include media playback, stop executing the first speech recognition result and start executing the second speech recognition result;
In this embodiment, if both speech recognition results include voice broadcast, or both include media playback, the two necessarily conflict, so execution of the first speech recognition result can be interrupted and execution of the second started. For example, if the first speech recognition result is "navigate to Lujiazui" and the second is "navigate to Lujiazui East Road", the terminal first announces "Navigating to Lujiazui for you" and starts navigating to Lujiazui, but then interrupts the first announcement, announces "Navigating to Lujiazui East Road" instead, and sends the navigation application an instruction to navigate to Lujiazui East Road.
When the first information type includes media playback and the second information type includes voice broadcast, lower the volume of the media playback, start executing the second speech recognition result, and restore the media playback volume after the second speech recognition result has been executed;
In this embodiment, if the first speech recognition result includes media playback and the second includes voice broadcast, the media volume can be lowered while the voice broadcast corresponding to the second speech recognition result is performed, and restored after the broadcast ends. For example, if the first speech recognition result is "play the song Plant the Sun" and the second is "navigate to Lujiazui East Road", the terminal first plays "Plant the Sun", then lowers its volume, announces "Navigating to Lujiazui East Road" at normal volume, and sends the navigation application an instruction to navigate to Lujiazui East Road; after the announcement finishes, it restores the normal volume and continues playing "Plant the Sun".
When the first information type includes voice broadcast and the second information type includes media playback, start executing the second speech recognition result after the first speech recognition result has been executed;
In this embodiment, if the first speech recognition result includes voice broadcast and the second includes media playback, the media can start playing after the voice broadcast ends. For example, if the first speech recognition result is "navigate to Lujiazui East Road" and the second is "play the song Plant the Sun", the terminal first announces "Navigating to Lujiazui East Road" while sending the navigation application an instruction to navigate to Lujiazui East Road, and then plays the song "Plant the Sun".
When neither the first information type nor the second information type includes voice broadcast or media playback, execute the first speech recognition result and the second speech recognition result in sequence.
In this embodiment, if neither speech recognition result includes voice broadcast or media playback, the two results can be executed in sequence. For example, if the first speech recognition result is "open the car door" and the second is "turn on the air conditioning", the terminal sends an open command to the car door and then sends a turn-on command to the air conditioner.
Fig. 3 is a flowchart of step S203 in a speech processing method according to an exemplary embodiment.
As shown in Fig. 3, in one embodiment, step S203 in Fig. 2 includes steps S301-S302:
In step S301, the first speech recognition result and the second speech recognition result are converted into executable first operation instruction information and second operation instruction information, respectively, according to a preset operation guide.
In step S302, the information types contained in the first operation instruction information and the second operation instruction information are determined.
In this embodiment, the terminal has a mapping table (or operation guide). Using this mapping table, the terminal interprets the semantic parsing result and decides which action to perform (for example, by how many degrees to raise the air-conditioner temperature) and which voice broadcast to give (for example, which audio to play after the air conditioner has been adjusted). According to this logic, the terminal converts the received semantic parsing result into locally executable actions, voice broadcasts, or media playback.
Fig. 4 is a flowchart of another speech processing method according to an exemplary embodiment.
As shown in Fig. 4, in one embodiment, the above method further includes steps S401-S402:
receiving an input behavior interruption rule setting command;
setting the preset behavior interruption rule according to the behavior interruption rule setting command.
In this embodiment, a user or manufacturer can set the preset behavior interruption rule as needed, so that conflicts between two speech recognition results are handled according to their own settings.
As shown in Fig. 5, a behavior interruption table can be set as the preset behavior interruption rule, where N, A, and M denote voice broadcast, action execution, and media playback, respectively, P(n-1) denotes the first speech recognition result, and P(n) denotes the second speech recognition result. See Fig. 5 for the specific settings.
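A behavior interruption table of this shape can be sketched as a dictionary keyed by (type of P(n-1), type of P(n)), together with a handler for the rule setting command of steps S401-S402. The cell values below are filled in only from the four rules stated in the text, with `None` where the text is silent; they are not a reproduction of Fig. 5. The `"A,N=after_first"` command syntax and the mode names are hypothetical.

```python
from typing import Dict, Optional, Tuple

Cell = Tuple[str, str]  # (type of P(n-1), type of P(n)); N, A, M as in Fig. 5
MODES = {"interrupt", "duck", "after_first", "sequential"}


def default_table() -> Dict[Cell, Optional[str]]:
    """Fill only the cells the four rules in the text determine; None elsewhere."""
    table: Dict[Cell, Optional[str]] = {(p, q): None for p in "NAM" for q in "NAM"}
    table[("N", "N")] = "interrupt"
    table[("M", "M")] = "interrupt"
    table[("M", "N")] = "duck"
    table[("N", "M")] = "after_first"
    table[("A", "A")] = "sequential"
    return table


def apply_setting_command(table: Dict[Cell, Optional[str]], command: str) -> None:
    """Parse a command such as "A,N=after_first" (hypothetical syntax) and
    update the matching cell; reject unknown types or modes."""
    cell_text, sep, mode = command.partition("=")
    parts = [p.strip().upper() for p in cell_text.split(",")]
    mode = mode.strip()
    if not sep or len(parts) != 2 or tuple(parts) not in table or mode not in MODES:
        raise ValueError(f"invalid behavior interruption rule setting: {command!r}")
    table[(parts[0], parts[1])] = mode
```

Keeping the rule as data rather than as branching code is what lets a user or manufacturer change it at run time without touching the decision logic.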
The following are device embodiments of the present invention, which can be used to perform the method embodiments of the present invention.
Fig. 6 is a block diagram of a speech processing device according to an exemplary embodiment. The device may be implemented by software, hardware, or a combination of both as part or all of a terminal device. As shown in Fig. 6, the speech processing device includes:
a first recognition module 61, configured to receive first speech data and perform speech recognition to obtain a first speech recognition result;
a second recognition module 62, configured to perform speech recognition to obtain a second speech recognition result if second speech data is received while the terminal device is executing the first speech recognition result;
a first determining module 63, configured to determine the information types contained in the first speech recognition result and the second speech recognition result, respectively;
a second determining module 64, configured to determine the execution mode of the first speech recognition result and the second speech recognition result according to the first information type contained in the first speech recognition result, the second information type contained in the second speech recognition result, and the preset behavior interruption rule.
In one embodiment, the information types include: voice broadcast, action execution, and media playback.
In one embodiment, the preset behavior interruption rule includes:
When both the first information type and the second information type include voice broadcast, or both include media playback, stop executing the first speech recognition result and start executing the second speech recognition result;
When the first information type includes media playback and the second information type includes voice broadcast, lower the volume of the media playback, start executing the second speech recognition result, and restore the media playback volume after the second speech recognition result has been executed;
When the first information type includes voice broadcast and the second information type includes media playback, start executing the second speech recognition result after the first speech recognition result has been executed;
When neither the first information type nor the second information type includes voice broadcast or media playback, execute the first speech recognition result and the second speech recognition result in sequence.
Fig. 7 is a block diagram of the first determining module in a speech processing device according to an exemplary embodiment.
As shown in Fig. 7, in one embodiment, the first determining module 63 includes:
a conversion submodule 71, configured to convert the first speech recognition result and the second speech recognition result into executable first operation instruction information and second operation instruction information, respectively, according to a preset operation guide;
a determination submodule 72, configured to determine the information types contained in the first operation instruction information and the second operation instruction information.
Fig. 8 is a block diagram of another speech processing device according to an exemplary embodiment.
As shown in Fig. 8, in one embodiment, the device further includes:
a receiving module 81, configured to receive an input behavior interruption rule setting command;
a setting module 82, configured to set the preset behavior interruption rule according to the behavior interruption rule setting command.
Those skilled in the art should understand that embodiments of the present invention may be provided as a method, a system, or a computer program product. Therefore, the present invention may take the form of an entirely hardware embodiment, an entirely software embodiment, or an embodiment combining software and hardware aspects. Moreover, the present invention may take the form of a computer program product implemented on one or more computer-usable storage media (including but not limited to magnetic disk storage and optical storage) containing computer-usable program code.
The present invention is described with reference to flowcharts and/or block diagrams of methods, devices (systems), and computer program products according to embodiments of the present invention. It should be understood that each flow and/or block in the flowcharts and/or block diagrams, and combinations of flows and/or blocks in the flowcharts and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general-purpose computer, special-purpose computer, embedded processor, or other programmable data processing device to produce a machine, such that the instructions executed by the processor of the computer or other programmable data processing device produce means for implementing the functions specified in one or more flows of the flowchart and/or one or more blocks of the block diagram.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing device to operate in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means that implement the functions specified in one or more flows of the flowchart and/or one or more blocks of the block diagram.
These computer program instructions may also be loaded onto a computer or other programmable data processing device, so that a series of operational steps is performed on the computer or other programmable device to produce computer-implemented processing. The instructions executed on the computer or other programmable device thereby provide steps for implementing the functions specified in one or more flows of the flowchart and/or one or more blocks of the block diagram.
Obviously, those skilled in the art can make various changes and modifications to the present invention without departing from its spirit and scope. Thus, if these modifications and variations fall within the scope of the claims of the present invention and their technical equivalents, the present invention is intended to include them as well.
Claims (10)
- 1. A speech processing method for a terminal device, characterized by comprising: receiving first speech data information and performing speech recognition on it to obtain a first speech recognition result; if second speech data information is received while the terminal device is executing the first speech recognition result, performing speech recognition on it to obtain a second speech recognition result; determining the information types contained in the first speech recognition result and the second speech recognition result, respectively; and determining an execution mode of the first speech recognition result and the second speech recognition result according to a first information type contained in the first speech recognition result, a second information type contained in the second speech recognition result, and a preset behavior interruption rule.
- 2. The method according to claim 1, characterized in that the information types include: voice broadcast, action execution, and media playback.
- 3. The method according to claim 2, characterized in that the preset behavior interruption rule includes: when the first information type and the second information type both include voice broadcast, or both include media playback, stopping execution of the first speech recognition result and starting execution of the second speech recognition result; when the first information type includes media playback and the second information type includes voice broadcast, lowering the volume of the media playback, starting execution of the second speech recognition result, and restoring the volume of the media playback after the second speech recognition result has finished executing; when the first information type includes voice broadcast and the second information type includes media playback, starting execution of the second speech recognition result after the first speech recognition result has finished executing; and when neither the first information type nor the second information type includes voice broadcast or media playback, executing the first speech recognition result and the second speech recognition result in sequence.
- 4. The method according to claim 2, characterized in that determining the information types contained in the first speech recognition result and the second speech recognition result, respectively, includes: converting the first speech recognition result and the second speech recognition result, according to a preset operation guide, into executable first operation instruction information and second operation instruction information, respectively; and determining the information types contained in the first operation instruction information and the second operation instruction information.
- 5. The method according to claim 1, characterized in that the method further includes: receiving an input behavior interruption rule setting command; and setting the preset behavior interruption rule according to the behavior interruption rule setting command.
- 6. A speech processing apparatus for a terminal device, characterized by comprising: a first recognition module, configured to receive first speech data information and perform speech recognition on it to obtain a first speech recognition result; a second recognition module, configured to, if second speech data information is received while the terminal device is executing the first speech recognition result, perform speech recognition on it to obtain a second speech recognition result; a first determining module, configured to determine the information types contained in the first speech recognition result and the second speech recognition result, respectively; and a second determining module, configured to determine an execution mode of the first speech recognition result and the second speech recognition result according to a first information type contained in the first speech recognition result, a second information type contained in the second speech recognition result, and a preset behavior interruption rule.
- 7. The apparatus according to claim 6, characterized in that the information types include: voice broadcast, action execution, and media playback.
- 8. The apparatus according to claim 7, characterized in that the preset behavior interruption rule includes: when the first information type and the second information type both include voice broadcast, or both include media playback, stopping execution of the first speech recognition result and starting execution of the second speech recognition result; when the first information type includes media playback and the second information type includes voice broadcast, lowering the volume of the media playback, starting execution of the second speech recognition result, and restoring the volume of the media playback after the second speech recognition result has finished executing; when the first information type includes voice broadcast and the second information type includes media playback, starting execution of the second speech recognition result after the first speech recognition result has finished executing; and when neither the first information type nor the second information type includes voice broadcast or media playback, executing the first speech recognition result and the second speech recognition result in sequence.
- 9. The apparatus according to claim 7, characterized in that the first determining module includes: a conversion submodule, configured to convert the first speech recognition result and the second speech recognition result, according to a preset operation guide, into executable first operation instruction information and second operation instruction information, respectively; and a determination submodule, configured to determine the information types contained in the first operation instruction information and the second operation instruction information.
- 10. The apparatus according to claim 6, characterized in that the apparatus further includes: a receiving module, configured to receive an input behavior interruption rule setting command; and a setting module, configured to set the preset behavior interruption rule according to the behavior interruption rule setting command.
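Claims 3 and 8 also state what each execution mode does on the device once it has been chosen. The following minimal Python sketch shows a terminal applying a chosen mode to a newly recognized second result. `SimulatedTerminal`, the mode strings, and the ducked volume level (one third of the original, at least 1) are hypothetical, and audio output is only simulated by log entries.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class SimulatedTerminal:
    """Hypothetical terminal that records what it does instead of playing audio."""
    media_volume: int = 10
    current: Optional[str] = None
    pending: list[str] = field(default_factory=list)
    log: list[str] = field(default_factory=list)

    def start(self, task: str) -> None:
        self.current = task
        self.log.append(f"start {task}")

    def finish_current(self) -> None:
        """Called when the running task completes; starts the next queued task, if any."""
        if self.current is not None:
            self.log.append(f"finish {self.current}")
        self.current = None
        if self.pending:
            self.start(self.pending.pop(0))

    def apply(self, mode: str, second: str) -> None:
        """Apply an execution mode chosen by the interruption rule to the second result."""
        if mode == "interrupt":
            # Stop the first result and start the second immediately.
            if self.current is not None:
                self.log.append(f"stop {self.current}")
            self.start(second)
        elif mode == "duck":
            # Lower media volume, run the broadcast over it, then restore the volume.
            restored = self.media_volume
            self.media_volume = max(1, restored // 3)  # lowered level is an assumption
            self.log.append(f"volume {restored} -> {self.media_volume}")
            self.log.append(f"run {second}")
            self.media_volume = restored
            self.log.append(f"volume restored to {restored}")
        elif mode in ("wait", "sequential"):
            # Queue the second result until the first one finishes.
            self.pending.append(second)
        else:
            raise ValueError(f"unknown execution mode: {mode}")
```

Because the sketch runs synchronously, the ducked broadcast "completes" at once and the volume is restored immediately; a real device would presumably restore the volume only when the broadcast actually ends.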
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201710606704.7A CN107342085A (en) | 2017-07-24 | 2017-07-24 | Method of speech processing and device |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201710606704.7A CN107342085A (en) | 2017-07-24 | 2017-07-24 | Method of speech processing and device |
Publications (1)
Publication Number | Publication Date |
---|---|
CN107342085A true CN107342085A (en) | 2017-11-10 |
Family
ID=60216442
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201710606704.7A Pending CN107342085A (en) | 2017-07-24 | 2017-07-24 | Method of speech processing and device |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN107342085A (en) |
Cited By (10)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN108270928A (en) * | 2018-04-20 | 2018-07-10 | 维沃移动通信有限公司 | The method and mobile terminal of a kind of speech recognition |
CN109887483A (en) * | 2019-01-04 | 2019-06-14 | 平安科技(深圳)有限公司 | Self-Service processing method, device, computer equipment and storage medium |
CN109903758A (en) * | 2017-12-08 | 2019-06-18 | 阿里巴巴集团控股有限公司 | Audio-frequency processing method, device and terminal device |
CN110125946A (en) * | 2019-04-23 | 2019-08-16 | 北京淇瑀信息科技有限公司 | Automatic call method, device, electronic equipment and computer-readable medium |
CN110534108A (en) * | 2019-09-25 | 2019-12-03 | 北京猎户星空科技有限公司 | A kind of voice interactive method and device |
CN110867197A (en) * | 2019-10-23 | 2020-03-06 | 吴杰 | Method and equipment for interrupting voice robot in real time in voice interaction process |
CN111415642A (en) * | 2020-03-31 | 2020-07-14 | 广东美的制冷设备有限公司 | Voice broadcast method and device of electric equipment, air conditioner and storage medium |
CN111540349A (en) * | 2020-03-27 | 2020-08-14 | 北京捷通华声科技股份有限公司 | Voice interruption method and device |
CN112637431A (en) * | 2020-12-10 | 2021-04-09 | 出门问问(苏州)信息科技有限公司 | Voice interaction method and device and computer readable storage medium |
CN112965687A (en) * | 2021-03-19 | 2021-06-15 | 成都启英泰伦科技有限公司 | Multi-user voice recognition product development platform and development method |
Citations (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20070150289A1 (en) * | 2005-12-21 | 2007-06-28 | Kyocera Mita Corporation | Electronic apparatus and computer readable medium recorded voice operating program |
CN105138110A (en) * | 2014-05-29 | 2015-12-09 | 中兴通讯股份有限公司 | Voice interaction method and voice interaction device |
CN105206260A (en) * | 2015-08-31 | 2015-12-30 | 努比亚技术有限公司 | Terminal voice broadcasting method, device and terminal voice operation method |
CN105845136A (en) * | 2015-01-13 | 2016-08-10 | 中兴通讯股份有限公司 | Voice control method and device, and terminal |
CN106653021A (en) * | 2016-12-27 | 2017-05-10 | 上海智臻智能网络科技股份有限公司 | Voice wake-up control method and device and terminal |
CN106814639A (en) * | 2015-11-27 | 2017-06-09 | 富泰华工业(深圳)有限公司 | Speech control system and method |
- 2017-07-24: CN application CN201710606704.7A filed, published as CN107342085A (status: Active, Pending)
Cited By (13)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN109903758A (en) * | 2017-12-08 | 2019-06-18 | 阿里巴巴集团控股有限公司 | Audio-frequency processing method, device and terminal device |
CN108270928A (en) * | 2018-04-20 | 2018-07-10 | 维沃移动通信有限公司 | The method and mobile terminal of a kind of speech recognition |
CN108270928B (en) * | 2018-04-20 | 2020-11-20 | 维沃移动通信有限公司 | Voice recognition method and mobile terminal |
CN109887483A (en) * | 2019-01-04 | 2019-06-14 | 平安科技(深圳)有限公司 | Self-Service processing method, device, computer equipment and storage medium |
CN110125946B (en) * | 2019-04-23 | 2021-08-27 | 北京淇瑀信息科技有限公司 | Automatic call method, automatic call device, electronic equipment and computer readable medium |
CN110125946A (en) * | 2019-04-23 | 2019-08-16 | 北京淇瑀信息科技有限公司 | Automatic call method, device, electronic equipment and computer-readable medium |
CN110534108A (en) * | 2019-09-25 | 2019-12-03 | 北京猎户星空科技有限公司 | A kind of voice interactive method and device |
CN110867197A (en) * | 2019-10-23 | 2020-03-06 | 吴杰 | Method and equipment for interrupting voice robot in real time in voice interaction process |
CN111540349A (en) * | 2020-03-27 | 2020-08-14 | 北京捷通华声科技股份有限公司 | Voice interruption method and device |
CN111540349B (en) * | 2020-03-27 | 2023-10-10 | 北京捷通华声科技股份有限公司 | Voice breaking method and device |
CN111415642A (en) * | 2020-03-31 | 2020-07-14 | 广东美的制冷设备有限公司 | Voice broadcast method and device of electric equipment, air conditioner and storage medium |
CN112637431A (en) * | 2020-12-10 | 2021-04-09 | 出门问问(苏州)信息科技有限公司 | Voice interaction method and device and computer readable storage medium |
CN112965687A (en) * | 2021-03-19 | 2021-06-15 | 成都启英泰伦科技有限公司 | Multi-user voice recognition product development platform and development method |
Similar Documents
Publication | Title | Publication Date |
---|---|---|
CN107342085A (en) | Method of speech processing and device | |
JP5381988B2 (en) | Dialogue speech recognition system, dialogue speech recognition method, and dialogue speech recognition program | |
US11183187B2 (en) | Dialog method, dialog system, dialog apparatus and program that gives impression that dialog system understands content of dialog | |
US9583102B2 (en) | Method of controlling interactive system, method of controlling server, server, and interactive device | |
US20140036022A1 (en) | Providing a conversational video experience | |
CN107403011B (en) | Virtual reality environment language learning implementation method and automatic recording control method | |
WO2017200072A1 (en) | Dialog method, dialog system, dialog device, and program | |
JP2011209787A (en) | Information processor, information processing method, and program | |
JP2011209786A (en) | Information processor, information processing method, and program | |
JP6970413B2 (en) | Dialogue methods, dialogue systems, dialogue devices, and programs | |
WO2017200080A1 (en) | Intercommunication method, intercommunication device, and program | |
CN110600013B (en) | Training method and device for non-parallel corpus voice conversion data enhancement model | |
WO2017175351A1 (en) | Information processing device | |
WO2017200076A1 (en) | Dialog method, dialog system, dialog device, and program | |
WO2017200079A1 (en) | Dialog method, dialog system, dialog device, and program | |
CN110008481A (en) | Translated speech generation method, device, computer equipment and storage medium | |
JP2016126294A (en) | Voice interaction control device, control method of voice interaction control device, and voice interactive device | |
KR102197387B1 (en) | Natural Speech Recognition Method and Apparatus | |
WO2021077528A1 (en) | Method for interrupting human-machine conversation | |
KR20210123545A (en) | Method and apparatus for conversation service based on user feedback | |
JP6601625B2 (en) | Dialogue method, dialogue system, dialogue apparatus, and program | |
WO2017200077A1 (en) | Dialog method, dialog system, dialog device, and program | |
CN111354351B (en) | Control device, voice interaction device, voice recognition server, and storage medium | |
WO2013181633A1 (en) | Providing a converstional video experience | |
JP6610965B2 (en) | Dialogue method, dialogue system, dialogue apparatus, and program |
Legal Events
Code | Title | Description |
---|---|---|
PB01 | Publication | |
SE01 | Entry into force of request for substantive examination | |
RJ01 | Rejection of invention patent application after publication | Application publication date: 20171110 |