CN107342085A - Method of speech processing and device - Google Patents


Info

Publication number
CN107342085A
Authority
CN
China
Prior art keywords
identification result
voice
voice identification
information type
result
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201710606704.7A
Other languages
Chinese (zh)
Inventor
李霄寒
全刚
谢政彪
李鹏
刘升平
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shenzhen Cloud Known Sound Information Technology Co Ltd
Shenzhen Yunzhisheng Information Technology Co Ltd
Original Assignee
Shenzhen Cloud Known Sound Information Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shenzhen Cloud Known Sound Information Technology Co Ltd
Priority to CN201710606704.7A
Publication of CN107342085A
Current legal status: Pending

Classifications

    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 Speech recognition
    • G10L15/22 Procedures used during a speech recognition process, e.g. man-machine dialogue
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 Speech recognition
    • G10L15/22 Procedures used during a speech recognition process, e.g. man-machine dialogue
    • G10L15/222 Barge in, i.e. overridable guidance for interrupting prompts

Landscapes

  • Engineering & Computer Science (AREA)
  • Computational Linguistics (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Human Computer Interaction (AREA)
  • Physics & Mathematics (AREA)
  • Acoustics & Sound (AREA)
  • Multimedia (AREA)
  • Telephonic Communication Services (AREA)

Abstract

The present invention relates to a speech processing method and device. The speech processing method includes: receiving first voice data information and performing speech recognition to obtain a first speech recognition result; while a terminal device is executing the first speech recognition result, if second voice data information is received, performing speech recognition to obtain a second speech recognition result; determining the information types contained in the first speech recognition result and the second speech recognition result respectively; and determining the execution mode of the first speech recognition result and the second speech recognition result according to a first information type contained in the first speech recognition result, a second information type contained in the second speech recognition result, and a preset behavior interruption rule. With this technical solution, the user does not have to wait a long time when talking with the terminal: the user need not wait until the terminal has finished its broadcast before speaking again, which reduces the user's waiting time and improves the user experience.

Description

Method of speech processing and device
Technical field
The present invention relates to the technical field of speech recognition, and in particular to a speech processing method and device.
Background
Fig. 1 shows a conventional dialogue mode. As shown in Fig. 1, the conventional dialogue mode has the following shortcomings: 1) Sense of pressure: after the device prompts the user to speak, the user must decide what to say as quickly as possible and say it before the VAD (voice activity detection) times out, without pausing in the middle. If there is a pause, the device may judge that the user has finished speaking, and anything said afterwards will not be heard or parsed by the device. This sense of pressure is the worst part of the user's experience in voice interaction, and it is also where the learning cost of using the voice function is highest. 2) Forced waiting: during a dialogue, even if the user has already decided what to say, the user must wait until the device has finished speaking. For example, in the example of Fig. 1 the user can already see that the first navigation result is the one wanted, but has to wait for the device to finish a lengthy prompt such as "you... which" before speaking. 3) Poor robustness: the time window in which the user may speak is generally determined by local VAD, but VAD is one of the least intelligent parts of an intelligent voice dialogue system. When the user is about to speak, nearby noise (such as other people chatting or television noise) often causes the VAD to misjudge, so that the recording window closes too early or too late.
Summary of the invention
Embodiments of the present invention provide a speech processing method and device that allow the user and the device to carry on a streaming dialogue, thereby reducing the user's waiting time, improving the robustness of the speech recognition system, and improving the user experience.
According to a first aspect of the embodiments of the present invention, a speech processing method is provided, including:
receiving first voice data information, and performing speech recognition to obtain a first speech recognition result;
while the terminal device is executing the first speech recognition result, if second voice data information is received, performing speech recognition to obtain a second speech recognition result;
determining the information types contained in the first speech recognition result and the second speech recognition result respectively;
determining the execution mode of the first speech recognition result and the second speech recognition result according to the first information type contained in the first speech recognition result, the second information type contained in the second speech recognition result, and a preset behavior interruption rule.
In this embodiment, if second voice information is received while the first speech recognition result is being executed, then after the second speech recognition result is obtained, the execution mode of the two results is determined according to their information types and the preset behavior interruption rule; that is, it is determined whether to interrupt the first speech recognition result and start executing the second. In this way, the user does not have to wait a long time when talking with the terminal: the user need not wait until the terminal has finished its broadcast before speaking again, which reduces the user's waiting time and improves the user experience.
In one embodiment, the information types include: voice broadcast, action execution, and media playback.
In this embodiment, a speech recognition result mainly has three information types. The first is voice broadcast, that is, speech output by the virtual character in the terminal device, such as reporting the weather or chatting with the user. The second is action execution, such as turning on a light, navigating, or adjusting the temperature; these actions take almost none of the user's time. The third is media playback, such as playing music or the radio.
In one embodiment, the preset behavior interruption rule includes:
when the first information type and the second information type both include voice broadcast, or both include media playback, stopping execution of the first speech recognition result and starting execution of the second speech recognition result;
In this embodiment, if both speech recognition results include voice broadcast, or both include media playback, the two necessarily conflict. In that case, execution of the first speech recognition result can be interrupted and execution of the second started. For example, if the earlier speech recognition result is "navigate to Lujiazui" and the one immediately following is "navigate to Lujiazui East Road", the first action is to broadcast "Navigating to Lujiazui for you" while starting to navigate to Lujiazui; the first broadcast is then immediately interrupted and replaced by the broadcast "Navigating to Lujiazui East Road", and an instruction to navigate to Lujiazui East Road is sent to the navigation application.
when the first information type includes media playback and the second information type includes voice broadcast, lowering the volume of the media playback, starting execution of the second speech recognition result, and restoring the volume of the media playback after the second speech recognition result has been executed;
In this embodiment, if the first speech recognition result includes media playback and the second includes voice broadcast, the volume of the media playback can be lowered, the voice broadcast corresponding to the second speech recognition result can begin, and the volume of the media playback is restored once the voice broadcast ends. For example, if the earlier speech recognition result is "play the song Planting the Sun" and the one immediately following is "navigate to Lujiazui East Road", the first action is to play "Planting the Sun"; the volume of "Planting the Sun" is then immediately lowered, "Navigating to Lujiazui East Road" is broadcast at normal volume, and an instruction to navigate to Lujiazui East Road is sent to the navigation application. After the broadcast has finished, the song "Planting the Sun" continues to play at normal volume.
when the first information type includes voice broadcast and the second information type includes media playback, starting execution of the second speech recognition result after the first speech recognition result has been executed;
In this embodiment, if the first speech recognition result includes voice broadcast and the second includes media playback, the media can start playing after the voice broadcast ends. For example, if the earlier speech recognition result is "navigate to Lujiazui East Road" and the one immediately following is "play the song Planting the Sun", the first action is to broadcast "Navigating to Lujiazui East Road" while sending an instruction to navigate to Lujiazui East Road to the navigation application; the song "Planting the Sun" is played immediately afterwards.
when neither the first information type nor the second information type includes voice broadcast or media playback, executing the first speech recognition result and the second speech recognition result in sequence.
In this embodiment, if neither the first nor the second speech recognition result includes voice broadcast or media playback, the two results can be executed in sequence. For example, if the earlier speech recognition result is "open the car door" and the one immediately following is "turn on the air conditioner", an open instruction is sent to the car door and then a turn-on instruction is sent to the air conditioner.
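The four rules above can be sketched as a single decision function. This is a minimal illustration, not the patented implementation: the names `InfoType`, `ExecutionMode`, and `decide_execution` are hypothetical, each recognition result is assumed to carry a set of information types, and any combination the rules do not mention (for example, action execution followed by voice broadcast) is assumed to fall back to sequential execution.

```python
from enum import Enum
from typing import Set


class InfoType(Enum):
    VOICE_BROADCAST = "voice broadcast"
    ACTION = "action execution"
    MEDIA_PLAY = "media playback"


class ExecutionMode(Enum):
    INTERRUPT_FIRST = "stop the first result, start the second"
    DUCK_MEDIA = "lower media volume, run the second, restore volume"
    QUEUE_SECOND = "finish the first result, then start the second"
    SEQUENTIAL = "execute the first and second results in order"


def decide_execution(first: Set[InfoType], second: Set[InfoType]) -> ExecutionMode:
    """Apply the four rules in the order the text lists them."""
    voice, media = InfoType.VOICE_BROADCAST, InfoType.MEDIA_PLAY
    # Rule 1: both results contain voice broadcast, or both contain media playback.
    if (voice in first and voice in second) or (media in first and media in second):
        return ExecutionMode.INTERRUPT_FIRST
    # Rule 2: media is playing and a voice broadcast arrives.
    if media in first and voice in second:
        return ExecutionMode.DUCK_MEDIA
    # Rule 3: a voice broadcast is running and media playback arrives.
    if voice in first and media in second:
        return ExecutionMode.QUEUE_SECOND
    # Rule 4, plus (as an assumption) every combination the text leaves unspecified.
    return ExecutionMode.SEQUENTIAL
```

For the first example in the text, both navigation requests contain a voice broadcast and an action, so `decide_execution` returns `ExecutionMode.INTERRUPT_FIRST`.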
In one embodiment, determining the information types contained in the first speech recognition result and the second speech recognition result respectively includes:
converting the first speech recognition result and the second speech recognition result into executable first operation instruction information and second operation instruction information respectively, according to a preset operation guide;
determining the information types contained in the first operation instruction information and the second operation instruction information.
In this embodiment, the terminal holds a mapping table (or operation guide). Using this mapping table, the terminal interprets the result of semantic parsing and decides which action to perform (for example, by how many degrees to raise the air conditioner temperature) and then which voice broadcast to make (for example, which audio to play after the air conditioner has been adjusted). In other words, the terminal has a set of logic that converts the received semantic parsing result into a locally executable action, voice broadcast, or media playback.
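One way to picture the operation guide is as a lookup from a parsed intent to a list of instruction templates, each tagged with its information type. The sketch below assumes that semantic parsing has already produced an intent name and a target; the intent names, the `Instruction` type, and the table contents are all hypothetical and are not taken from the patent.

```python
from dataclasses import dataclass
from typing import Dict, List, Set


@dataclass(frozen=True)
class Instruction:
    info_type: str  # "voice_broadcast", "action", or "media_play"
    payload: str    # what to say, do, or play


# Hypothetical operation guide: intent name -> instruction templates.
OPERATION_GUIDE: Dict[str, List[Instruction]] = {
    "navigate": [
        Instruction("voice_broadcast", "Navigating to {target}"),
        Instruction("action", "route to {target}"),
    ],
    "play_song": [Instruction("media_play", "{target}")],
    "open_device": [Instruction("action", "open {target}")],
}


def to_instructions(intent: str, target: str) -> List[Instruction]:
    """Convert a parsed intent into executable operation instructions."""
    return [
        Instruction(t.info_type, t.payload.format(target=target))
        for t in OPERATION_GUIDE[intent]
    ]


def info_types(instructions: List[Instruction]) -> Set[str]:
    """Collect the information types contained in the instructions."""
    return {i.info_type for i in instructions}
```

With this table, `info_types(to_instructions("navigate", "Lujiazui East Road"))` yields both `"voice_broadcast"` and `"action"`, which matches the navigation examples in the text, where one request triggers a broadcast and a navigation action together.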
In one embodiment, the method further includes:
receiving an input behavior interruption rule setting command;
setting the preset behavior interruption rule according to the behavior interruption rule setting command.
In this embodiment, the user or the manufacturer can set the preset behavior interruption rule as needed, so that conflicts between two speech recognition results are handled according to their own settings.
According to a second aspect of the embodiments of the present invention, a speech processing device is provided, including:
a first recognition module, configured to receive first voice data information and perform speech recognition to obtain a first speech recognition result;
a second recognition module, configured to, while the terminal device is executing the first speech recognition result, perform speech recognition to obtain a second speech recognition result if second voice data information is received;
a first determining module, configured to determine the information types contained in the first speech recognition result and the second speech recognition result respectively;
a second determining module, configured to determine the execution mode of the first speech recognition result and the second speech recognition result according to the first information type contained in the first speech recognition result, the second information type contained in the second speech recognition result, and a preset behavior interruption rule.
In one embodiment, the information types include: voice broadcast, action execution, and media playback.
In one embodiment, the preset behavior interruption rule includes:
when the first information type and the second information type both include voice broadcast, or both include media playback, stopping execution of the first speech recognition result and starting execution of the second speech recognition result;
when the first information type includes media playback and the second information type includes voice broadcast, lowering the volume of the media playback, starting execution of the second speech recognition result, and restoring the volume of the media playback after the second speech recognition result has been executed;
when the first information type includes voice broadcast and the second information type includes media playback, starting execution of the second speech recognition result after the first speech recognition result has been executed;
when neither the first information type nor the second information type includes voice broadcast or media playback, executing the first speech recognition result and the second speech recognition result in sequence.
In one embodiment, the first determining module includes:
a conversion submodule, configured to convert the first speech recognition result and the second speech recognition result into executable first operation instruction information and second operation instruction information respectively, according to a preset operation guide;
a determining submodule, configured to determine the information types contained in the first operation instruction information and the second operation instruction information.
In one embodiment, the device further includes:
a receiving module, configured to receive an input behavior interruption rule setting command;
a setting module, configured to set the preset behavior interruption rule according to the behavior interruption rule setting command.
It should be understood that the above general description and the following detailed description are only exemplary and explanatory, and do not limit the present invention.
Other features and advantages of the present invention will be set forth in the description that follows, and in part will become apparent from the description or may be learned by practicing the invention. The objectives and other advantages of the invention may be realized and attained by the structure particularly pointed out in the written description, the claims, and the accompanying drawings.
The technical solution of the present invention is described in further detail below with reference to the drawings and embodiments.
Brief description of the drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the present invention and, together with the specification, serve to explain the principles of the invention.
Fig. 1 is a schematic diagram of a speech processing method in the related art.
Fig. 2 is a flowchart of a speech processing method according to an exemplary embodiment.
Fig. 3 is a flowchart of step S203 in a speech processing method according to an exemplary embodiment.
Fig. 4 is a flowchart of another speech processing method according to an exemplary embodiment.
Fig. 5 is a schematic diagram of a behavior interruption table according to an exemplary embodiment.
Fig. 6 is a block diagram of a speech processing device according to an exemplary embodiment.
Fig. 7 is a block diagram of the first determining module in a speech processing device according to an exemplary embodiment.
Fig. 8 is a block diagram of another speech processing device according to an exemplary embodiment.
Detailed description
Exemplary embodiments are described in detail here, and examples of them are shown in the accompanying drawings. Where the following description refers to the drawings, the same numerals in different drawings denote the same or similar elements unless otherwise indicated. The implementations described in the following exemplary embodiments do not represent all implementations consistent with the present invention. Rather, they are merely examples of devices and methods consistent with some aspects of the invention as detailed in the appended claims.
Fig. 2 is a flowchart of a speech processing method according to an exemplary embodiment. The speech processing method is applied in a terminal device, which may be any device with a voice control function, such as a mobile phone, computer, digital broadcast terminal, messaging device, game console, tablet device, medical device, fitness device, or personal digital assistant. As shown in Fig. 2, the method includes steps S201 to S204:
In step S201, first voice data information is received, and speech recognition is performed to obtain a first speech recognition result;
Speech recognition of the voice data information may be performed by the terminal itself, or the terminal may send the data to a server, which returns the result to the terminal once recognition is complete. A speech recognition result is the text information corresponding to the voice data information.
In step S202, while the terminal device is executing the first speech recognition result, if second voice data information is received, speech recognition is performed to obtain a second speech recognition result;
In step S203, the information types contained in the first speech recognition result and the second speech recognition result are determined respectively;
In one embodiment, the information types include: voice broadcast, action execution, and media playback.
In this embodiment, a speech recognition result mainly has three information types. The first is voice broadcast, that is, speech output by the virtual character in the terminal device, such as reporting the weather or chatting with the user. The second is action execution, such as turning on a light, navigating, or adjusting the temperature; these actions take almost none of the user's time. The third is media playback, such as playing music or the radio.
In step S204, the execution mode of the first speech recognition result and the second speech recognition result is determined according to the first information type contained in the first speech recognition result, the second information type contained in the second speech recognition result, and a preset behavior interruption rule.
In this embodiment, if second voice information is received while the first speech recognition result is being executed, then after the second speech recognition result is obtained, the execution mode of the two results is determined according to their information types and the preset behavior interruption rule; that is, it is determined whether to interrupt the first speech recognition result and start executing the second. In this way, the user does not have to wait a long time when talking with the terminal: the user need not wait until the terminal has finished its broadcast before speaking again, which reduces the user's waiting time and improves the user experience.
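The flow of steps S201 to S204 can be sketched as a small session object that remembers which result is currently being executed. Everything here is an assumption made for illustration: the class name `DialogueSession`, the injected `recognize`, `classify`, and `decide` callables, and the simplification that the newest result always becomes the one tracked as executing.

```python
from typing import Callable, Optional, Set, Tuple


class DialogueSession:
    """Tracks the result being executed and decides how to handle a new one."""

    def __init__(
        self,
        recognize: Callable[[bytes], str],          # speech recognition (S201/S202)
        classify: Callable[[str], Set[str]],        # information types (S203)
        decide: Callable[[Set[str], Set[str]], str],  # interruption rule (S204)
    ) -> None:
        self.recognize = recognize
        self.classify = classify
        self.decide = decide
        self.executing: Optional[str] = None

    def on_speech(self, audio: bytes) -> Tuple[str, Optional[str]]:
        """Return the recognized text and, if something was running, the chosen mode."""
        text = self.recognize(audio)
        if self.executing is None:
            # Nothing is running: simply start executing this result.
            self.executing = text
            return text, None
        # A result is still running: classify both and apply the rule.
        mode = self.decide(self.classify(self.executing), self.classify(text))
        # Simplification: track only the newest result from here on.
        self.executing = text
        return text, mode

    def on_finished(self) -> None:
        """Called when the currently executing result has completed."""
        self.executing = None
```

Injecting the three callables keeps the sketch independent of where recognition runs, which fits the text's note that recognition may happen on the terminal or on a server.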
In one embodiment, the preset behavior interruption rule includes:
when the first information type and the second information type both include voice broadcast, or both include media playback, stopping execution of the first speech recognition result and starting execution of the second speech recognition result;
In this embodiment, if both speech recognition results include voice broadcast, or both include media playback, the two necessarily conflict. In that case, execution of the first speech recognition result can be interrupted and execution of the second started. For example, if the earlier speech recognition result is "navigate to Lujiazui" and the one immediately following is "navigate to Lujiazui East Road", the first action is to broadcast "Navigating to Lujiazui for you" while starting to navigate to Lujiazui; the first broadcast is then immediately interrupted and replaced by the broadcast "Navigating to Lujiazui East Road", and an instruction to navigate to Lujiazui East Road is sent to the navigation application.
when the first information type includes media playback and the second information type includes voice broadcast, lowering the volume of the media playback, starting execution of the second speech recognition result, and restoring the volume of the media playback after the second speech recognition result has been executed;
In this embodiment, if the first speech recognition result includes media playback and the second includes voice broadcast, the volume of the media playback can be lowered, the voice broadcast corresponding to the second speech recognition result can begin, and the volume of the media playback is restored once the voice broadcast ends. For example, if the earlier speech recognition result is "play the song Planting the Sun" and the one immediately following is "navigate to Lujiazui East Road", the first action is to play "Planting the Sun"; the volume of "Planting the Sun" is then immediately lowered, "Navigating to Lujiazui East Road" is broadcast at normal volume, and an instruction to navigate to Lujiazui East Road is sent to the navigation application. After the broadcast has finished, the song "Planting the Sun" continues to play at normal volume.
when the first information type includes voice broadcast and the second information type includes media playback, starting execution of the second speech recognition result after the first speech recognition result has been executed;
In this embodiment, if the first speech recognition result includes voice broadcast and the second includes media playback, the media can start playing after the voice broadcast ends. For example, if the earlier speech recognition result is "navigate to Lujiazui East Road" and the one immediately following is "play the song Planting the Sun", the first action is to broadcast "Navigating to Lujiazui East Road" while sending an instruction to navigate to Lujiazui East Road to the navigation application; the song "Planting the Sun" is played immediately afterwards.
when neither the first information type nor the second information type includes voice broadcast or media playback, executing the first speech recognition result and the second speech recognition result in sequence.
In this embodiment, if neither the first nor the second speech recognition result includes voice broadcast or media playback, the two results can be executed in sequence. For example, if the earlier speech recognition result is "open the car door" and the one immediately following is "turn on the air conditioner", an open instruction is sent to the car door and then a turn-on instruction is sent to the air conditioner.
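The media-then-broadcast case (lower the volume, broadcast, restore the volume) maps naturally onto a context manager. The sketch below is only an illustration: `MediaPlayer`, `ducked`, and `announce_over_media` are hypothetical names, the volume levels are arbitrary, and `speak` stands in for whatever text-to-speech call a real terminal would use.

```python
from contextlib import contextmanager
from typing import Callable, Iterator


class MediaPlayer:
    """Stand-in for a media player with an adjustable volume."""

    def __init__(self, track: str, volume: int = 10) -> None:
        self.track = track
        self.volume = volume


@contextmanager
def ducked(player: MediaPlayer, ducked_volume: int = 3) -> Iterator[MediaPlayer]:
    """Lower the player's volume for the duration of the block, then restore it."""
    original = player.volume
    player.volume = ducked_volume
    try:
        yield player
    finally:
        # Restore the volume even if the broadcast fails part-way through.
        player.volume = original


def announce_over_media(
    player: MediaPlayer, text: str, speak: Callable[[str], None]
) -> None:
    """Broadcast `text` while the media keeps playing at reduced volume."""
    with ducked(player):
        speak(text)
```

Using `try`/`finally` is a design choice of this sketch rather than something the text specifies; it ensures the song returns to normal volume even when the broadcast raises an error.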
Fig. 3 is a flowchart of step S203 in a speech processing method according to an exemplary embodiment.
As shown in Fig. 3, in one embodiment, step S203 in Fig. 2 includes steps S301 and S302:
In step S301, the first speech recognition result and the second speech recognition result are converted into executable first operation instruction information and second operation instruction information respectively, according to a preset operation guide;
In step S302, the information types contained in the first operation instruction information and the second operation instruction information are determined.
In this embodiment, the terminal holds a mapping table (or operation guide). Using this mapping table, the terminal interprets the result of semantic parsing and decides which action to perform (for example, by how many degrees to raise the air conditioner temperature) and then which voice broadcast to make (for example, which audio to play after the air conditioner has been adjusted). In other words, the terminal has a set of logic that converts the received semantic parsing result into a locally executable action, voice broadcast, or media playback.
Fig. 4 is a flowchart of another speech processing method according to an exemplary embodiment.
As shown in Fig. 4, in one embodiment, the above method further includes steps S401 and S402:
In step S401, an input behavior interruption rule setting command is received;
In step S402, the preset behavior interruption rule is set according to the behavior interruption rule setting command.
In this embodiment, the user or the manufacturer can set the preset behavior interruption rule as needed, so that conflicts between two speech recognition results are handled according to their own settings.
As shown in Fig. 5, a behavior interruption table may be set as the preset behavior interruption rule, where N, A, and M denote voice broadcast, action execution, and media playback respectively, P(n-1) denotes the first speech recognition result, and P(n) denotes the second speech recognition result. See Fig. 5 for the specific settings.
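A behavior interruption table of this kind can be sketched as a dictionary keyed by the pair of types in P(n-1) and P(n). Fig. 5 itself is not reproduced in this text, so the entries below are derived only from the four rules stated in the description; the mode names, the `InterruptionRules` class, and the fallback for pairs the description does not mention are assumptions.

```python
from typing import Dict, Tuple

TYPES = {"N", "A", "M"}  # voice broadcast, action execution, media playback
MODES = {"interrupt", "duck", "queue", "sequential"}

# (type in P(n-1), type in P(n)) -> execution mode, from the rules in the text.
DEFAULT_TABLE: Dict[Tuple[str, str], str] = {
    ("N", "N"): "interrupt",
    ("M", "M"): "interrupt",
    ("M", "N"): "duck",
    ("N", "M"): "queue",
    ("A", "A"): "sequential",
}
FALLBACK = "sequential"  # assumption for pairs the text does not specify


class InterruptionRules:
    """Holds the table and applies rule setting commands to it."""

    def __init__(self) -> None:
        self._table = dict(DEFAULT_TABLE)

    def set_rule(self, prev: str, nxt: str, mode: str) -> None:
        """Apply a rule setting command from the user or manufacturer."""
        if prev not in TYPES or nxt not in TYPES or mode not in MODES:
            raise ValueError("unknown information type or execution mode")
        self._table[(prev, nxt)] = mode

    def lookup(self, prev: str, nxt: str) -> str:
        """Return the execution mode for a P(n-1), P(n) type pair."""
        return self._table.get((prev, nxt), FALLBACK)
```

For instance, a manufacturer that prefers music to cut off a running broadcast could call `set_rule("N", "M", "interrupt")`, after which `lookup("N", "M")` returns `"interrupt"` instead of `"queue"`.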
The following are device embodiments of the present invention, which can be used to carry out the method embodiments of the present invention.
Fig. 6 is a block diagram of a speech processing device according to an exemplary embodiment. The device can be implemented as part or all of a terminal device by software, hardware, or a combination of the two. As shown in Fig. 6, the speech processing device includes:
a first recognition module 61, configured to receive first voice data information and perform speech recognition to obtain a first speech recognition result;
a second recognition module 62, configured to, while the terminal device is executing the first speech recognition result, perform speech recognition to obtain a second speech recognition result if second voice data information is received;
a first determining module 63, configured to determine the information types contained in the first speech recognition result and the second speech recognition result respectively;
a second determining module 64, configured to determine the execution mode of the first speech recognition result and the second speech recognition result according to the first information type contained in the first speech recognition result, the second information type contained in the second speech recognition result, and a preset behavior interruption rule.
In one embodiment, the information types include: voice broadcast, action execution, and media playback.
In one embodiment, the preset behavior interruption rule includes:
when the first information type and the second information type both include voice broadcast, or both include media playback, stopping execution of the first speech recognition result and starting execution of the second speech recognition result;
when the first information type includes media playback and the second information type includes voice broadcast, lowering the volume of the media playback, starting execution of the second speech recognition result, and restoring the volume of the media playback after the second speech recognition result has been executed;
when the first information type includes voice broadcast and the second information type includes media playback, starting execution of the second speech recognition result after the first speech recognition result has been executed;
when neither the first information type nor the second information type includes voice broadcast or media playback, executing the first speech recognition result and the second speech recognition result in sequence.
Fig. 7 is a block diagram of the first determining module in a speech processing device according to an exemplary embodiment.
As shown in Fig. 7, in one embodiment, the first determining module 63 includes:
a conversion submodule 71, configured to convert the first speech recognition result and the second speech recognition result into executable first operation instruction information and second operation instruction information respectively, according to a preset operation guide;
a determining submodule 72, configured to determine the information types contained in the first operation instruction information and the second operation instruction information.
Fig. 8 is a block diagram of another speech processing device according to an exemplary embodiment.
As shown in Fig. 8, in one embodiment, the device further includes:
a receiving module 81, configured to receive an input behavior interruption rule setting command;
a setting module 82, configured to set the preset behavior interruption rule according to the behavior interruption rule setting command.
It should be understood by those skilled in the art that, embodiments of the invention can be provided as method, system or computer program Product.Therefore, the present invention can use the reality in terms of complete hardware embodiment, complete software embodiment or combination software and hardware Apply the form of example.Moreover, the present invention can use the computer for wherein including computer usable program code in one or more The shape for the computer program product that usable storage medium is implemented on (including but is not limited to magnetic disk storage and optical memory etc.) Formula.
The present invention is described with reference to flowcharts and/or block diagrams of the method, device (system), and computer program product according to embodiments of the present invention. It should be understood that each flow and/or block in the flowcharts and/or block diagrams, and combinations of flows and/or blocks in the flowcharts and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to the processor of a general-purpose computer, a special-purpose computer, an embedded processor, or another programmable data processing device to produce a machine, such that the instructions executed by the processor of the computer or other programmable data processing device produce means for implementing the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.
These computer program instructions may also be stored in a computer-readable memory capable of directing a computer or other programmable data processing device to operate in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means, the instruction means implementing the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.
These computer program instructions may also be loaded onto a computer or other programmable data processing device, so that a series of operational steps is performed on the computer or other programmable device to produce computer-implemented processing, whereby the instructions executed on the computer or other programmable device provide steps for implementing the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.
Obviously, those skilled in the art can make various changes and modifications to the present invention without departing from its spirit and scope. Thus, if these changes and modifications fall within the scope of the claims of the present invention and their technical equivalents, the present invention is also intended to include them.

Claims (10)

  1. A speech processing method for a terminal device, characterized by comprising:
    receiving first speech data information, and performing speech recognition to obtain a first speech recognition result;
    if second speech data information is received while the terminal device is executing the first speech recognition result, performing speech recognition to obtain a second speech recognition result;
    determining the information types contained in the first speech recognition result and in the second speech recognition result, respectively;
    determining the execution mode of the first speech recognition result and the second speech recognition result according to a first information type contained in the first speech recognition result, a second information type contained in the second speech recognition result, and a preset behavior interruption rule.
  2. The method according to claim 1, characterized in that the information types include: voice broadcast, action execution, and media playback.
  3. The method according to claim 2, characterized in that the preset behavior interruption rule includes:
    when the first information type and the second information type both include voice broadcast, or both include media playback, stopping execution of the first speech recognition result and starting execution of the second speech recognition result;
    when the first information type includes media playback and the second information type includes voice broadcast, lowering the volume of the media playback, starting execution of the second speech recognition result, and restoring the volume of the media playback after the second speech recognition result has finished executing;
    when the first information type includes voice broadcast and the second information type includes media playback, starting execution of the second speech recognition result after the first speech recognition result has finished executing;
    when neither the first information type nor the second information type includes voice broadcast or media playback, executing the first speech recognition result and the second speech recognition result in order.
  4. The method according to claim 2, characterized in that determining the information types contained in the first speech recognition result and in the second speech recognition result, respectively, comprises:
    converting the first speech recognition result and the second speech recognition result, according to a preset operation guide, into executable first operation instruction information and second operation instruction information, respectively;
    determining the information types contained in the first operation instruction information and the second operation instruction information.
  5. The method according to claim 1, characterized in that the method further comprises:
    receiving an input behavior interruption rule setting command;
    setting the preset behavior interruption rule according to the behavior interruption rule setting command.
  6. A speech processing device for a terminal device, characterized by comprising:
    a first recognition module, configured to receive first speech data information and perform speech recognition to obtain a first speech recognition result;
    a second recognition module, configured to, if second speech data information is received while the terminal device is executing the first speech recognition result, perform speech recognition to obtain a second speech recognition result;
    a first determining module, configured to determine the information types contained in the first speech recognition result and in the second speech recognition result, respectively;
    a second determining module, configured to determine the execution mode of the first speech recognition result and the second speech recognition result according to a first information type contained in the first speech recognition result, a second information type contained in the second speech recognition result, and a preset behavior interruption rule.
  7. The device according to claim 6, characterized in that the information types include: voice broadcast, action execution, and media playback.
  8. The device according to claim 7, characterized in that the preset behavior interruption rule includes:
    when the first information type and the second information type both include voice broadcast, or both include media playback, stopping execution of the first speech recognition result and starting execution of the second speech recognition result;
    when the first information type includes media playback and the second information type includes voice broadcast, lowering the volume of the media playback, starting execution of the second speech recognition result, and restoring the volume of the media playback after the second speech recognition result has finished executing;
    when the first information type includes voice broadcast and the second information type includes media playback, starting execution of the second speech recognition result after the first speech recognition result has finished executing;
    when neither the first information type nor the second information type includes voice broadcast or media playback, executing the first speech recognition result and the second speech recognition result in order.
  9. The device according to claim 7, characterized in that the first determining module includes:
    a conversion submodule, configured to convert the first speech recognition result and the second speech recognition result, according to a preset operation guide, into executable first operation instruction information and second operation instruction information, respectively;
    a determination submodule, configured to determine the information types contained in the first operation instruction information and the second operation instruction information.
  10. The device according to claim 6, characterized in that the device further includes:
    a receiving module, configured to receive an input behavior interruption rule setting command;
    a setting module, configured to set the preset behavior interruption rule according to the behavior interruption rule setting command.
CN201710606704.7A 2017-07-24 2017-07-24 Method of speech processing and device Pending CN107342085A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201710606704.7A CN107342085A (en) 2017-07-24 2017-07-24 Method of speech processing and device

Publications (1)

Publication Number Publication Date
CN107342085A true CN107342085A (en) 2017-11-10

Family

ID=60216442

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201710606704.7A Pending CN107342085A (en) 2017-07-24 2017-07-24 Method of speech processing and device

Country Status (1)

Country Link
CN (1) CN107342085A (en)

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication (application publication date: 20171110)