CN107342085A

CN107342085A - Speech processing method and device

Info

Publication number: CN107342085A
Application number: CN201710606704.7A
Authority: CN
Inventors: 李霄寒; 全刚; 谢政彪; 李鹏; 刘升平
Original assignee: Shenzhen Cloud Known Sound Information Technology Co Ltd
Current assignee: Shenzhen Cloud Known Sound Information Technology Co Ltd; Shenzhen Yunzhisheng Information Technology Co Ltd
Priority date: 2017-07-24
Filing date: 2017-07-24
Publication date: 2017-11-10

Abstract

The present invention relates to a speech processing method and device. The speech processing method includes the following steps. First speech data is received, and speech recognition is performed to obtain a first speech recognition result. If second speech data is received while the terminal device is executing the first speech recognition result, speech recognition is performed on it to obtain a second speech recognition result. The information types contained in the first and second speech recognition results are determined, respectively. Finally, the execution mode of the two results is determined from three inputs: the first information type contained in the first result, the second information type contained in the second result, and a preset behavior interruption rule. With this technical solution, the user does not have to wait a long time when conversing with the terminal. The user can start speaking again without waiting for the terminal to finish its announcement, which reduces the user's waiting time and improves the user experience.

Description

Speech processing method and device

Technical field

The present invention relates to the technical field of speech recognition, and in particular to a speech processing method and device.

Background

Fig. 1 shows a conventional conversational mode. As shown in Fig. 1, the conventional mode has the following shortcomings:

1) Pressure. After the device prompts the user to speak, the user must quickly decide on a sentence and finish saying it before the VAD (voice activity detection) timeout, with no pause in the middle. The device may treat any pause as the end of speech, and then it will not hear or parse anything said afterwards. This pressure is the worst part of the user experience in voice interaction. It is also the aspect of voice features with the highest learning cost.

2) Forced waiting. During the dialogue, the user must wait for the device to finish speaking, even after deciding what to say. For example, in Fig. 1 the user can already see that the first navigation result is the one needed. The user must still wait for the device to finish a lengthy prompt such as "You ... which one" before answering.

3) Poor robustness. The time window in which the user may speak is usually set by local VAD. However, VAD is the least intelligent part of the whole intelligent voice dialogue. When the user is about to speak, background noise (such as nearby people chatting or a television) often makes the VAD misjudge. The recording window then closes too early or too late.
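The first and third shortcomings come from how the recording window is closed. The sketch below assumes a simple energy-threshold endpointer. The threshold, the frame-based silence timeout and the function name `endpoint` are hypothetical and do not come from the patent. It shows how a mid-sentence pause longer than the timeout cuts off everything said after it:

```python
from typing import List


def endpoint(frame_energies: List[float], threshold: float = 0.1,
             max_silent_frames: int = 3) -> int:
    """Return how many frames are kept before the VAD closes the recording window.

    Hypothetical energy-threshold VAD: once speech has started, the window closes
    after `max_silent_frames` consecutive frames whose energy is below `threshold`.
    """
    started = False
    silent = 0
    for i, energy in enumerate(frame_energies):
        if energy >= threshold:
            started = True   # voiced frame: speech is (still) going on
            silent = 0
        elif started:
            silent += 1      # silence after speech counts toward the timeout
            if silent >= max_silent_frames:
                # Keep frames up to and including the last voiced frame.
                return i + 1 - silent
    return len(frame_energies)


# Two voiced frames, a three-frame pause, then two more voiced frames.
utterance = [0.5, 0.6, 0.0, 0.0, 0.0, 0.7, 0.8]
print(endpoint(utterance))                       # 2: the words after the pause are lost
print(endpoint(utterance, max_silent_frames=4))  # 7: a longer timeout keeps them
```

A longer timeout keeps the second half of this utterance. The cost is that the window stays open longer after the user stops, which gives background noise more chance to hold it open.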

Summary of the invention

Embodiments of the present invention provide a speech processing method and device that allow streaming dialogue between the user and the device. This reduces the user's waiting time, improves the robustness of the speech recognition system, and improves the user experience.

According to a first aspect of the embodiments of the present invention, a speech processing method is provided, including:

Receiving first speech data and performing speech recognition to obtain a first speech recognition result;

If second speech data is received while the terminal device is executing the first speech recognition result, performing speech recognition on it to obtain a second speech recognition result;

Determining the information types contained in the first speech recognition result and the second speech recognition result, respectively; and

Determining the execution mode of the first speech recognition result and the second speech recognition result according to the first information type contained in the first speech recognition result, the second information type contained in the second speech recognition result, and a preset behavior interruption rule.

In this embodiment, second speech data may arrive while the first speech recognition result is being executed. In that case, once the second speech recognition result is obtained, the device determines the execution mode of both results. It does so according to their information types and the preset behavior interruption rule. In other words, it decides whether to interrupt the first result and start executing the second. As a result, the user does not have to wait a long time when conversing with the terminal. The user can start speaking again without waiting for the terminal to finish its announcement, which reduces the user's waiting time and improves the user experience.
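Expressed as code, the four steps amount to a small dispatch around whichever result is currently executing. The sketch below is illustrative only. The name `handle_utterance`, the string type labels and the toy `recognize`, `classify` and `decide` functions are assumptions. They stand in for the real recognition engine, the type determination and the rule table:

```python
from typing import FrozenSet, Callable, Optional, Tuple

Types = FrozenSet[str]  # information types of one result, e.g. {"broadcast", "action"}


def handle_utterance(
    audio: bytes,
    running: Optional[Types],
    recognize: Callable[[bytes], str],
    classify: Callable[[str], Types],
    decide: Callable[[Types, Types], str],
) -> Tuple[str, Types, str]:
    """Run the method steps for one incoming utterance.

    `running` holds the information types of the result the terminal is currently
    executing, or None when it is idle. Returns (text, types, execution mode).
    """
    text = recognize(audio)   # receive speech data and recognize it
    types = classify(text)    # determine the information types it contains
    if running is None:
        return text, types, "execute"  # nothing to arbitrate against
    # A result is still executing: let the preset interruption rule decide.
    return text, types, decide(running, types)


# Toy stand-ins (hypothetical) for the recognizer, the classifier and the rule.
def recognize(audio: bytes) -> str:
    return audio.decode("utf-8")


def classify(text: str) -> Types:
    if text.startswith("Play"):
        return frozenset({"media"})
    return frozenset({"broadcast", "action"})


def decide(first: Types, second: Types) -> str:
    return "duck media" if "media" in first and "broadcast" in second else "other"


_, playing, mode1 = handle_utterance(b"Play Planting the Sun", None,
                                     recognize, classify, decide)
_, _, mode2 = handle_utterance(b"Navigate to Lujiazui East Road", playing,
                               recognize, classify, decide)
print(mode1, "|", mode2)  # execute | duck media
```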

In one embodiment, the information types include: voice broadcast, action execution and media playback.

In this embodiment, a speech recognition result has three main information types. The first is voice broadcast: speech output by the virtual assistant in the terminal device, such as a weather report or chatting with the user. The second is action execution, such as turning on a light, navigating or adjusting the temperature. These actions take up hardly any of the user's time. The third is media playback, such as playing music or the radio.
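One result can carry more than one type. For example, the navigation example in the rules below both speaks and acts, so a bit-flag set is a natural representation. The sketch below uses Python's `enum.Flag`. The class name and the example classifications are assumptions made for illustration:

```python
from enum import Flag, auto


class InfoType(Flag):
    """The three information types; a single result may contain several of them."""
    BROADCAST = auto()  # the virtual assistant speaks (weather report, chat)
    ACTION = auto()     # the device acts (turn on a light, navigate, set temperature)
    MEDIA = auto()      # media playback (music, radio)


# Hypothetical classification of example results taken from this document.
EXAMPLES = {
    "Open the car door": InfoType.ACTION,
    "Navigate to Lujiazui": InfoType.BROADCAST | InfoType.ACTION,
    "Play the song Planting the Sun": InfoType.MEDIA,
}

for text, types in EXAMPLES.items():
    names = [t.name for t in InfoType if t in types]
    print(f"{text}: {names}")
```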

In one embodiment, the preset behavior interruption rule includes:

When the first information type and the second information type both include voice broadcast, or both include media playback, stopping execution of the first speech recognition result and starting execution of the second speech recognition result;

In this embodiment, if both speech recognition results include voice broadcast, or both include media playback, the two will inevitably conflict. In that case, execution of the first result can be interrupted and execution of the second result started. For example, suppose the first result is "Navigate to Lujiazui" and the second is "Navigate to Lujiazui East Road". The device first announces "Navigating to Lujiazui for you" and starts navigating to Lujiazui. It then interrupts the first announcement and announces "Navigating to Lujiazui East Road" instead. At the same time, it sends the navigation application an instruction to navigate to Lujiazui East Road.
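This rule requires a broadcast that can be cut off part-way. The sketch below assumes the text-to-speech output can be consumed in small chunks, here one word at a time from a generator. The names `Speaker` and `speak_words` are hypothetical and are not a real TTS API:

```python
from typing import Iterator, List, Optional


def speak_words(text: str) -> Iterator[str]:
    """Stand-in for TTS playback: yields one word at a time so it can be cut off."""
    for word in text.split():
        yield word


class Speaker:
    def __init__(self) -> None:
        self.current: Optional[Iterator[str]] = None
        self.spoken: List[str] = []

    def start(self, text: str) -> None:
        """Start a new broadcast, interrupting any broadcast still in progress."""
        self.interrupt()
        self.current = speak_words(text)

    def step(self) -> bool:
        """Play the next chunk; return False once nothing is left to play."""
        if self.current is None:
            return False
        word = next(self.current, None)
        if word is None:
            self.current = None
            return False
        self.spoken.append(word)
        return True

    def interrupt(self) -> None:
        if self.current is not None:
            self.current.close()  # stop the generator; the rest is never spoken
            self.current = None


speaker = Speaker()
speaker.start("Navigating to Lujiazui for you")
speaker.step()
speaker.step()                                     # first announcement is cut off here
speaker.start("Navigating to Lujiazui East Road")  # second result interrupts it
while speaker.step():
    pass
print(" ".join(speaker.spoken))
# Navigating to Navigating to Lujiazui East Road
```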

When the first information type includes media playback and the second information type includes voice broadcast, lowering the volume of the media playback, starting execution of the second speech recognition result, and restoring the volume of the media playback after the second speech recognition result has been executed;

In this embodiment, the first result may include media playback while the second includes voice broadcast. The media volume can then be lowered while the voice broadcast for the second result plays, and restored after the broadcast ends. For example, suppose the first result is "Play the song Planting the Sun" and the second is "Navigate to Lujiazui East Road". The device first plays "Planting the Sun" and then lowers its volume. It announces "Navigating to Lujiazui East Road" at normal volume and sends the navigation application an instruction to navigate to Lujiazui East Road. Once the announcement is finished, it restores the normal volume and continues playing "Planting the Sun".
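This rule amounts to "ducking" the music for the duration of the announcement. In the sketch below, `MediaPlayer`, `run_with_ducking` and the 20% ducking level are hypothetical choices. The text only says that the volume is lowered and later restored:

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class MediaPlayer:
    """Toy media player that only tracks its volume (0.0 to 1.0)."""
    volume: float = 1.0
    log: List[str] = field(default_factory=list)


def run_with_ducking(player: MediaPlayer, run_second: Callable[[], None],
                     duck_to: float = 0.2) -> None:
    """Lower the media volume, execute the second result, then restore the volume."""
    saved = player.volume
    player.volume = min(saved, duck_to)
    player.log.append(f"ducked to {player.volume}")
    try:
        run_second()  # e.g. announce "Navigating to Lujiazui East Road"
    finally:
        # Restore even if the second result fails, so music is not left quiet.
        player.volume = saved
        player.log.append(f"restored to {saved}")


player = MediaPlayer()
heard_at: List[float] = []
run_with_ducking(player, lambda: heard_at.append(player.volume))
print(heard_at, player.volume, player.log)
# [0.2] 1.0 ['ducked to 0.2', 'restored to 1.0']
```

The `try`/`finally` is a design choice: the original volume comes back even when the announcement raises an error.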

When the first information type includes voice broadcast and the second information type includes media playback, starting execution of the second speech recognition result after the first speech recognition result has been executed;

In this embodiment, the first result may include voice broadcast while the second includes media playback. The media can then start playing after the voice broadcast ends. For example, suppose the first result is "Navigate to Lujiazui East Road" and the second is "Play the song Planting the Sun". The device first announces "Navigating to Lujiazui East Road" and sends the navigation application an instruction to navigate to Lujiazui East Road. It then plays "Planting the Sun".
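Here the second result waits instead of cutting in. The sketch below uses hypothetical names (`BroadcastSession`, `run_after_broadcast`). It assumes the TTS engine reports when an announcement has finished:

```python
from typing import Callable, List


class BroadcastSession:
    """Holds tasks that must wait until the current announcement ends."""

    def __init__(self) -> None:
        self.speaking = False
        self.deferred: List[Callable[[], None]] = []

    def begin(self) -> None:
        self.speaking = True

    def run_after_broadcast(self, task: Callable[[], None]) -> None:
        if self.speaking:
            self.deferred.append(task)  # wait for the announcement to finish
        else:
            task()                      # nothing is being said: start right away

    def finish(self) -> None:
        """Called when the announcement ends; starts the queued tasks in order."""
        self.speaking = False
        tasks, self.deferred = self.deferred, []
        for task in tasks:
            task()


log: List[str] = []
session = BroadcastSession()
session.begin()
log.append("announce: Navigating to Lujiazui East Road")
session.run_after_broadcast(lambda: log.append("play: Planting the Sun"))
print(log)  # the song has not started yet
session.finish()
print(log)  # now it has
```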

When neither the first information type nor the second information type includes voice broadcast or media playback, executing the first speech recognition result and the second speech recognition result in sequence.

In this embodiment, if neither result includes voice broadcast or media playback, the two results can be executed in sequence. For example, suppose the first result is "Open the car door" and the second is "Turn on the air conditioner". The device sends an open instruction to the car door and then an on instruction to the air conditioner.
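Together, the four rules form a small decision function over the two results' information types. The sketch below uses `enum.Flag` and checks the rules in the order listed. The rules leave some combinations open, such as a broadcast followed by a pure action. Running those in sequence is an assumption and is not part of the described method:

```python
from enum import Enum, Flag, auto


class InfoType(Flag):
    BROADCAST = auto()
    ACTION = auto()
    MEDIA = auto()


class Mode(Enum):
    INTERRUPT = "stop the first result, execute the second"
    DUCK = "lower media volume, execute the second, restore volume"
    QUEUE = "finish the first result, then execute the second"
    SEQUENTIAL = "execute the first and second results in order"


def decide(first: InfoType, second: InfoType) -> Mode:
    common = first & second
    # Rule 1: both contain a broadcast, or both contain media playback.
    if common & InfoType.BROADCAST or common & InfoType.MEDIA:
        return Mode.INTERRUPT
    # Rule 2: media is playing and the new result speaks.
    if first & InfoType.MEDIA and second & InfoType.BROADCAST:
        return Mode.DUCK
    # Rule 3: a broadcast is in progress and the new result plays media.
    if first & InfoType.BROADCAST and second & InfoType.MEDIA:
        return Mode.QUEUE
    # Rule 4 covers results with neither broadcast nor media (pure actions).
    # Assumption: any combination the rules leave open is also run in order.
    return Mode.SEQUENTIAL


NAV = InfoType.BROADCAST | InfoType.ACTION  # e.g. "Navigate to Lujiazui East Road"
print(decide(NAV, NAV))                          # Mode.INTERRUPT
print(decide(InfoType.MEDIA, NAV))               # Mode.DUCK
print(decide(NAV, InfoType.MEDIA))               # Mode.QUEUE
print(decide(InfoType.ACTION, InfoType.ACTION))  # Mode.SEQUENTIAL
```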

In one embodiment, determining the information types contained in the first speech recognition result and the second speech recognition result, respectively, includes:

Converting the first speech recognition result and the second speech recognition result into executable first operation instruction information and second operation instruction information, respectively, according to a preset operation guide; and

Determining the information types contained in the first operation instruction information and the second operation instruction information.

In this embodiment, the terminal holds a mapping table (the operation guide). Using this table, it interprets the result of semantic parsing and determines which action to perform, for example by how many degrees to raise the air-conditioner temperature. It also determines which voice broadcast to give, for example which audio prompt to play once the air conditioner has been adjusted. In other words, the terminal has a set of logic for converting the received semantic parsing result into locally executable actions, voice broadcasts or media playback.
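The conversion step can be pictured as a lookup from the parsed intent to a list of typed operations. The information types are then simply the set of operation kinds. In this sketch, the intent names, slot names and instruction strings in `OPERATION_GUIDE` are hypothetical, because the patent does not specify the table's format:

```python
from dataclasses import dataclass
from typing import Dict, FrozenSet, List, Tuple


@dataclass(frozen=True)
class Operation:
    kind: str     # "broadcast", "action" or "media"
    payload: str  # the concrete instruction or the text to speak


# Hypothetical operation guide: semantic intent -> operation templates.
OPERATION_GUIDE: Dict[str, List[Tuple[str, str]]] = {
    "navigate": [("broadcast", "Navigating to {dest} for you"),
                 ("action", "navigation.go {dest}")],
    "raise_ac_temperature": [("action", "ac.raise {delta}"),
                             ("broadcast", "Temperature raised by {delta} degrees")],
    "play_song": [("media", "player.play {song}")],
}


def to_operations(intent: str, slots: Dict[str, str]) -> List[Operation]:
    """Convert a semantic parsing result into executable operation instructions."""
    return [Operation(kind, template.format(**slots))
            for kind, template in OPERATION_GUIDE[intent]]


def info_types(operations: List[Operation]) -> FrozenSet[str]:
    """Information types contained in a list of operation instructions."""
    return frozenset(op.kind for op in operations)


ops = to_operations("raise_ac_temperature", {"delta": "2"})
print([op.payload for op in ops])  # ['ac.raise 2', 'Temperature raised by 2 degrees']
print(sorted(info_types(ops)))     # ['action', 'broadcast']
```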

In one embodiment, the method further includes:

Receiving an input behavior interruption rule setting command; and

Setting the preset behavior interruption rule according to the behavior interruption rule setting command.

In this embodiment, a user or manufacturer can set the preset behavior interruption rule as needed. Conflicts between two speech recognition results are then handled according to their own settings.

According to a second aspect of the embodiments of the present invention, a speech processing device is provided, including:

A first recognition module, configured to receive first speech data and perform speech recognition to obtain a first speech recognition result;

A second recognition module, configured to perform speech recognition to obtain a second speech recognition result if second speech data is received while the terminal device is executing the first speech recognition result;

A first determining module, configured to determine the information types contained in the first speech recognition result and the second speech recognition result, respectively; and

A second determining module, configured to determine the execution mode of the first speech recognition result and the second speech recognition result according to the first information type contained in the first speech recognition result, the second information type contained in the second speech recognition result, and a preset behavior interruption rule.

In one embodiment, the preset behavior interruption rule includes the same rules as described above for the method of the first aspect.

In one embodiment, the first determining module includes:

A conversion submodule, configured to convert the first speech recognition result and the second speech recognition result into executable first operation instruction information and second operation instruction information, respectively, according to a preset operation guide; and

A determining submodule, configured to determine the information types contained in the first operation instruction information and the second operation instruction information.

In one embodiment, the device further includes:

A receiving module, configured to receive an input behavior interruption rule setting command; and

A setting module, configured to set the preset behavior interruption rule according to the behavior interruption rule setting command.

It should be understood that the foregoing general description and the following detailed description are merely exemplary and explanatory, and do not limit the present invention.

Other features and advantages of the present invention are set forth in the following description. Some will become apparent from the description, and others will be understood by practicing the invention. The objectives and other advantages of the invention can be realized and obtained by the structures particularly pointed out in the written description, the claims and the accompanying drawings.

The technical solution of the present invention is described in further detail below with reference to the accompanying drawings and embodiments.

Brief description of the drawings

The accompanying drawings are incorporated in and form part of this specification. They illustrate embodiments consistent with the present invention and, together with the description, serve to explain the principles of the invention.

Fig. 1 is a schematic diagram of a speech processing method in the related art.

Fig. 2 is a flowchart of a speech processing method according to an exemplary embodiment.

Fig. 3 is a flowchart of step S203 in a speech processing method according to an exemplary embodiment.

Fig. 4 is a flowchart of another speech processing method according to an exemplary embodiment.

Fig. 5 is a schematic diagram of a behavior interruption table according to an exemplary embodiment.

Fig. 6 is a block diagram of a speech processing device according to an exemplary embodiment.

Fig. 7 is a block diagram of the first determining module in a speech processing device according to an exemplary embodiment.

Fig. 8 is a block diagram of another speech processing device according to an exemplary embodiment.

Detailed description of the embodiments

Exemplary embodiments are described in detail here, and examples of them are illustrated in the accompanying drawings. In the following description, the same numbers in different drawings denote the same or similar elements unless otherwise indicated. The implementations described in the following exemplary embodiments do not represent all implementations consistent with the present invention. Rather, they are merely examples of devices and methods consistent with some aspects of the invention, as detailed in the appended claims.

Fig. 2 is a flowchart of a speech processing method according to an exemplary embodiment. The speech processing method is applied to a terminal device. The terminal device may be any device with a voice control function, such as a mobile phone, computer, digital broadcast terminal, messaging device, game console, tablet device, medical device, fitness device or personal digital assistant. As shown in Fig. 2, the method includes steps S201 to S204:

In step S201, first speech data is received, and speech recognition is performed to obtain a first speech recognition result.

Speech recognition of the speech data may be performed by the terminal. Alternatively, the terminal may send the data to a server, which performs the recognition and returns the result to the terminal. The speech recognition result is the text corresponding to the speech data.

In step S202, if second speech data is received while the terminal device is executing the first speech recognition result, speech recognition is performed to obtain a second speech recognition result.

In step S203, the information types contained in the first speech recognition result and the second speech recognition result are determined, respectively.

In step S204, the execution mode of the first speech recognition result and the second speech recognition result is determined according to the first information type contained in the first speech recognition result, the second information type contained in the second speech recognition result, and the preset behavior interruption rule.

In one embodiment, the preset behavior interruption rule includes the rules described above in the summary of the invention.

As shown in Fig. 3, in one embodiment, step S203 in Fig. 2 includes steps S301 and S302:

In step S301, the first speech recognition result and the second speech recognition result are converted into executable first operation instruction information and second operation instruction information, respectively, according to a preset operation guide.

In step S302, the information types contained in the first operation instruction information and the second operation instruction information are determined.

As shown in Fig. 4, in one embodiment, the above method further includes steps S401 and S402:

In step S401, an input behavior interruption rule setting command is received.

As shown in Fig. 5, a behavior interruption table can be set as the preset behavior interruption rule. In this table, N, A and M denote voice broadcast, action execution and media playback, respectively. P(n-1) denotes the first speech recognition result and P(n) denotes the second speech recognition result. The specific settings are shown in Fig. 5.
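The behavior interruption table of Fig. 5 can be held as a 3-by-3 grid keyed by the (P(n-1), P(n)) type pair. This layout also gives the setting command of steps S401 and S402 something concrete to modify. The actual entries of Fig. 5 are not reproduced in this text. The sketch therefore fills only the cells implied by the four rules in the summary and leaves the rest empty. The command syntax, the behavior labels and the function names are assumptions:

```python
from typing import Dict, Optional, Tuple

CODES = ("N", "A", "M")  # N = voice broadcast, A = action execution, M = media playback
BEHAVIORS = {"interrupt", "duck", "queue", "sequential"}

# Keys are (P(n-1), P(n)): the result being executed and the newly recognized one.
# Only cells that follow from the four rules stated in the text are filled in;
# the rest stay None because the actual entries of Fig. 5 are not given here.
TABLE: Dict[Tuple[str, str], Optional[str]] = {(p, n): None for p in CODES for n in CODES}
TABLE.update({
    ("N", "N"): "interrupt",
    ("M", "M"): "interrupt",
    ("M", "N"): "duck",
    ("N", "M"): "queue",
    ("A", "A"): "sequential",
})


def behavior(prev: str, new: str) -> Optional[str]:
    """Look up the behavior for P(n-1) = prev and P(n) = new (None = not specified)."""
    if prev not in CODES or new not in CODES:
        raise ValueError(f"unknown type code: {prev!r}, {new!r}")
    return TABLE[(prev, new)]


def set_rule(command: str) -> None:
    """Apply a rule setting command of the hypothetical form 'PREV,NEW=BEHAVIOR'."""
    key, sep, value = command.partition("=")
    prev, comma, new = key.partition(",")
    prev, new, value = prev.strip(), new.strip(), value.strip()
    if not (sep and comma and prev in CODES and new in CODES and value in BEHAVIORS):
        raise ValueError(f"invalid rule setting command: {command!r}")
    TABLE[(prev, new)] = value


set_rule("N, A = queue")  # e.g. a manufacturer fills in an open cell
print("P(n-1)\\P(n)", *CODES, sep="\t")
for p in CODES:
    print(p, *(TABLE[(p, n)] or "-" for n in CODES), sep="\t")
```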

The following are device embodiments of the present invention, which can be used to perform the method embodiments of the present invention.

Fig. 6 is a block diagram of a speech processing device according to an exemplary embodiment. The device may be implemented as part or all of a terminal device through software, hardware or a combination of both. As shown in Fig. 6, the speech processing device includes:

A first recognition module 61, configured to receive first speech data and perform speech recognition to obtain a first speech recognition result;

A second recognition module 62, configured to perform speech recognition to obtain a second speech recognition result if second speech data is received while the terminal device is executing the first speech recognition result;

A first determining module 63, configured to determine the information types contained in the first speech recognition result and the second speech recognition result, respectively; and

A second determining module 64, configured to determine the execution mode of the first speech recognition result and the second speech recognition result according to the first information type contained in the first speech recognition result, the second information type contained in the second speech recognition result, and a preset behavior interruption rule.

In one embodiment, the preset behavior interruption rule includes the same rules as described above for the method.

As shown in Fig. 7, in one embodiment, the first determining module 63 includes:

A conversion submodule 71, configured to convert the first speech recognition result and the second speech recognition result into executable first operation instruction information and second operation instruction information, respectively, according to a preset operation guide; and

A determining submodule 72, configured to determine the information types contained in the first operation instruction information and the second operation instruction information.

As shown in Fig. 8, in one embodiment, the device further includes:

A receiving module 81, configured to receive an input behavior interruption rule setting command; and

A setting module 82, configured to set the preset behavior interruption rule according to the behavior interruption rule setting command.

Those skilled in the art should understand that the embodiments of the present invention may be provided as a method, a system or a computer program product. Therefore, the present invention may take the form of an entirely hardware embodiment, an entirely software embodiment, or an embodiment combining software and hardware aspects. Moreover, the present invention may take the form of a computer program product implemented on one or more computer-usable storage media that contain computer-usable program code. Such media include, but are not limited to, magnetic disk storage and optical storage.

The present invention is described with reference to flowcharts and/or block diagrams of methods, devices (systems) and computer program products according to embodiments of the invention. Computer program instructions can implement each flow and/or block in the flowcharts and/or block diagrams. They can also implement combinations of flows and/or blocks in the flowcharts and/or block diagrams. These computer program instructions may be provided to the processor of a general-purpose computer, special-purpose computer, embedded processor or other programmable data processing device to produce a machine. The instructions executed by the processor of the computer or other programmable data processing device then produce means for implementing the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.

These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing device to operate in a particular manner. The instructions stored in the computer-readable memory then produce an article of manufacture that includes instruction means. The instruction means implement the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.

These computer program instructions may also be loaded onto a computer or other programmable data processing device, so that a series of operational steps is performed on it to produce a computer-implemented process. The instructions executed on the computer or other programmable device then provide steps for implementing the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.

Obviously, those skilled in the art can make various changes and modifications to the present invention without departing from its spirit and scope. If these modifications and variations fall within the scope of the claims of the present invention and their technical equivalents, the present invention is also intended to include them.

Claims

1. A speech processing method for a terminal device, characterized by comprising:

Receiving first speech data and performing speech recognition to obtain a first speech recognition result;

If second speech data is received while the terminal device is executing the first speech recognition result, performing speech recognition on it to obtain a second speech recognition result;

Determining the information types contained in the first speech recognition result and the second speech recognition result, respectively; and

Determining the execution mode of the first speech recognition result and the second speech recognition result according to the first information type contained in the first speech recognition result, the second information type contained in the second speech recognition result, and a preset behavior interruption rule.
2. The method according to claim 1, characterized in that the information types include: voice broadcast, action execution and media playback.
3. The method according to claim 2, characterized in that the preset behavior interruption rule includes:

When the first information type and the second information type both include voice broadcast, or both include media playback, stopping execution of the first speech recognition result and starting execution of the second speech recognition result;

When the first information type includes media playback and the second information type includes voice broadcast, lowering the volume of the media playback, starting execution of the second speech recognition result, and restoring the volume of the media playback after the second speech recognition result has been executed;

When the first information type includes voice broadcast and the second information type includes media playback, starting execution of the second speech recognition result after the first speech recognition result has been executed; and

When neither the first information type nor the second information type includes voice broadcast or media playback, executing the first speech recognition result and the second speech recognition result in sequence.
4. The method according to claim 2, characterized in that determining the information types contained in the first speech recognition result and the second speech recognition result, respectively, includes:

Converting the first speech recognition result and the second speech recognition result into executable first operation instruction information and second operation instruction information, respectively, according to a preset operation guide; and

Determining the information types contained in the first operation instruction information and the second operation instruction information.
5. The method according to claim 1, characterized in that the method further includes:

Receiving an input behavior interruption rule setting command; and

Setting the preset behavior interruption rule according to the behavior interruption rule setting command.
6. A speech processing device for a terminal device, characterized by comprising:

A first recognition module, configured to receive first speech data and perform speech recognition to obtain a first speech recognition result;

A second recognition module, configured to perform speech recognition to obtain a second speech recognition result if second speech data is received while the terminal device is executing the first speech recognition result;

A first determining module, configured to determine the information types contained in the first speech recognition result and the second speech recognition result, respectively; and

A second determining module, configured to determine the execution mode of the first speech recognition result and the second speech recognition result according to the first information type contained in the first speech recognition result, the second information type contained in the second speech recognition result, and a preset behavior interruption rule.
7. The device according to claim 6, characterized in that the information types include: voice broadcast, action execution and media playback.
8. The device according to claim 7, characterized in that the preset behavior interruption rule includes:

When the first information type and the second information type both include voice broadcast, or both include media playback, stopping execution of the first speech recognition result and starting execution of the second speech recognition result;

When the first information type includes media playback and the second information type includes voice broadcast, lowering the volume of the media playback, starting execution of the second speech recognition result, and restoring the volume of the media playback after the second speech recognition result has been executed;

When the first information type includes voice broadcast and the second information type includes media playback, starting execution of the second speech recognition result after the first speech recognition result has been executed; and

When neither the first information type nor the second information type includes voice broadcast or media playback, executing the first speech recognition result and the second speech recognition result in sequence.
9. The device according to claim 7, characterized in that the first determining module includes:

A conversion submodule, configured to convert the first speech recognition result and the second speech recognition result into executable first operation instruction information and second operation instruction information, respectively, according to a preset operation guide; and

A determining submodule, configured to determine the information types contained in the first operation instruction information and the second operation instruction information.
10. The device according to claim 6, characterized in that the device further includes:

A receiving module, configured to receive an input behavior interruption rule setting command; and

A setting module, configured to set the preset behavior interruption rule according to the behavior interruption rule setting command.