Detailed Description of the Embodiments
To make the objects, technical solutions, and advantages of the embodiments of the invention clearer, the technical solutions in the embodiments are described below clearly and completely with reference to the accompanying drawings. The described embodiments are evidently only some, not all, of the embodiments of the invention. All other embodiments obtained by a person of ordinary skill in the art on the basis of these embodiments without creative effort fall within the protection scope of the invention.
Figure 1 is a flow chart of a voice dialogue processing method for a voice dialogue platform according to an embodiment of the invention. The method includes the following steps:
S11: obtaining, through speech recognition and understanding, the n most probable semantic results for the voice data input by the user;
S12: when n > 1, determining the domain of each semantic result and judging whether the semantic slots of each semantic result include a key semantic slot of that domain;
S13: adding the m semantic results that contain a key semantic slot to a disambiguation candidate list, where m ≤ n;
S14: when m > 1, automatically disambiguating the disambiguation candidate list according to existing resources to obtain l semantic results, where the existing resources include history context information, history disambiguation records, voice dialogue platform resources, and/or a customized disambiguation rule base.
In this embodiment, the method can be applied to devices with voice interaction, such as smart speakers and smartphones. For example, when a user wants to play an audio clip through a smart speaker, the user can speak to the smart speaker directly.
For step S11, the voice data input by the user is processed by ASR (Automatic Speech Recognition) and the corresponding NLU (Natural Language Understanding) to obtain the n most probable ASR hypotheses (also called n-best input or top-n input) and their corresponding semantic parsing results.
For step S12, the domain of each semantic result is determined, and it is judged whether the semantic slots of each semantic result include a key semantic slot of that domain. Here, a key semantic slot is a slot that carries major significance for a given domain when an utterance is parsed into a semantic result. Because a semantic result can contain a great deal of information, a large number of inaccurate semantic parsing results would be produced if key semantic slots were not used as a restriction.
For example, take "play five minutes of Zhou Jielun's music".
The parse in the music domain is:
"singer" = "Zhou Jielun", "operation" = "play", "play duration" = "five minutes"
The utterance can also be parsed in the radio domain as:
"operation" = "play", "column" = "five minutes", "keyword" = "Zhou Jielun"
The radio-domain parse is formally valid. However, column names in the radio domain are extremely varied, so when slot importance is configured, the "column" slot is given lower importance than the "singer" slot of the music domain. The "column" slot therefore does not meet the key-semantic-slot requirement for an ambiguity candidate, whereas "singer" is a key semantic slot of the music domain. The music-domain parse is retained, and the radio-domain parse is filtered out directly.
As another example, take "play Little Red Cap".
The parse in the music domain is:
"operation" = "play", "song title" = "Little Red Cap"
The parse in the story domain is:
"operation" = "play", "story name" = "Little Red Cap"
Because "story name" is a key semantic slot of the story domain and "song title" is a key semantic slot of the music domain, both semantic parsing results pass the preliminary ambiguity detection.
For step S13, in the "play Little Red Cap" example of step S12, both semantic parsing results pass the preliminary ambiguity detection. There are n = 2 most probable ASR hypotheses, and m = 2 remain after preliminary ambiguity detection, so both semantic parsing results are added to the ambiguity candidate list. At this point, m = n = 2.
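The key-slot check of steps S12 and S13 can be sketched as follows. This is a minimal illustration, not the claimed implementation: the dictionary layout of a semantic result, the KEY_SLOTS table, and the function names are all assumptions made for the example, and the radio-domain key slot "station_name" is hypothetical.

```python
# Hypothetical per-domain table of key semantic slots (illustrative values only).
KEY_SLOTS = {
    "music": {"singer", "song_title"},
    "story": {"story_name"},
    "radio": {"station_name"},  # "column" is deliberately not a key slot
}


def has_key_slot(result):
    """Return True if the parse fills at least one key slot of its own domain."""
    key_slots = KEY_SLOTS.get(result["domain"], set())
    return any(slot in key_slots for slot in result["slots"])


def build_candidate_list(results):
    """Keep the m (<= n) semantic results that contain a key semantic slot."""
    return [r for r in results if has_key_slot(r)]


# First example: the radio parse only fills non-key slots and is dropped.
music = {"domain": "music",
         "slots": {"singer": "Zhou Jielun", "operation": "play",
                   "play_duration": "five minutes"}}
radio = {"domain": "radio",
         "slots": {"operation": "play", "column": "five minutes",
                   "keyword": "Zhou Jielun"}}

# Second example: both parses fill a key slot, so both stay (m = n = 2).
song = {"domain": "music",
        "slots": {"operation": "play", "song_title": "Little Red Cap"}}
story = {"domain": "story",
         "slots": {"operation": "play", "story_name": "Little Red Cap"}}

first_candidates = build_candidate_list([music, radio])
second_candidates = build_candidate_list([song, story])
```

With this table, only the music parse survives in the first example, while both parses survive in the second and move on to automatic disambiguation.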
For step S14, if m = 1 is determined in step S13, the preliminary disambiguation has already yielded a single semantic result, and the corresponding instruction is executed directly.
If m > 1 is determined in step S13, the disambiguation candidate list is disambiguated according to the existing resources. Automatic disambiguation can draw on the multi-turn history context information, on whether the data service contains the relevant resource, on the customized automatic disambiguation rule base, and on the history disambiguation records, and uses them to eliminate semantic results from the disambiguation candidate list. When only one semantic result remains in the disambiguation candidate list, it is taken as the semantic result of the user's voice input, and the corresponding operation is performed.
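The control flow of step S14 can be sketched as a chain of filters applied while more than one candidate remains. The function names, the representation of each existing resource as a filter function, and the choice to ignore a filter that would remove every candidate are assumptions of this sketch; the candidate list is assumed to be non-empty.

```python
def auto_disambiguate(candidates, filters):
    """Apply each automatic filter in turn while more than one candidate remains."""
    remaining = list(candidates)
    for apply_filter in filters:
        if len(remaining) <= 1:
            break
        narrowed = apply_filter(remaining)
        # Assumption: a filter that would remove every candidate is ignored.
        if narrowed:
            remaining = narrowed
    return remaining


def handle_candidates(candidates, filters):
    """Execute directly when one result is left, otherwise hand over to the user."""
    remaining = auto_disambiguate(candidates, filters)
    if len(remaining) == 1:
        return ("execute", remaining[0])
    return ("ask_user", remaining)
```

Each filter takes the current list and returns a narrowed list, so history context, rule base, history records, and platform resources can be plugged in in any configured order.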
As this embodiment shows, dialogue ambiguity detection makes full use of the semantic slot information contained in the semantic parsing results. Semantic slots are assigned different importance in different semantic domains, and the automatic disambiguation mechanism is invoked only when the ambiguity involves key semantic slots, so false ambiguities introduced by semantic parsing are filtered out automatically.
As an implementation, in this embodiment the method further includes:
when automatic disambiguation of the disambiguation candidate list according to the existing resources yields more than one semantic result, feeding the remaining semantic results back to the user for confirmation;
when the user inputs a confirmation instruction corresponding to the feedback, determining the semantic result corresponding to the voice data input by the user and executing the corresponding operation;
when the user inputs an abnormal instruction, feeding back abnormality prompt information.
In this embodiment, when more than one semantic result remains in the disambiguation candidate list after automatic disambiguation, the results are fed back to the user.
Take the "play Little Red Cap" example from the embodiment above. After automatic disambiguation has finished, two semantic results that still need disambiguation remain in the ambiguity candidate list. The ambiguity detection module then sets the disambiguation flag to "yes" and calls the disambiguation processing module. Based on the status flag set by the detection module, the disambiguation processing module returns the TTS (Text To Speech) prompt to be broadcast, "I found a children's story and music for you. Which would you like to hear?", to guide the user to choose, and enters the listening state.
When a new round of user input arrives, the disambiguation processing module judges whether the input is a "selection" or a "new task". If it is a new task, the module exits the disambiguation operation directly and executes the new task.
If the user makes a selection, the module judges whether the user's answer is abnormal. If it is abnormal, the abnormality prompt flow is entered: the user is prompted to choose again, and the listening state is entered again. If it is not abnormal, the semantic result selected by the user is taken as the final semantic result, and the related operation is executed.
Because the user's input is speech, the user may well not answer in the way the prompt suggests when guided by the disambiguation module, so further handling is needed.
For example, when the TTS content is "I found a children's story and music for you. Which would you like to hear?", the user may reply:
"I want to listen to the children's story", "children's story", "story", "kids' story", "music", "song", "the first one", "the second one", "the former", "the latter", "not the story", "not the music".
The user may also give an abnormal reply such as "joke" or "the sixth one".
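Mapping such free-form replies onto the offered options can be sketched as below. This is only an illustration under stated assumptions: the replies are matched as English text, the ORDINALS table and the alias lists are hypothetical, and a real system would rely on its NLU output rather than substring matching.

```python
# Hypothetical ordinal vocabulary; a reply naming a position beyond the
# number of offered options is treated as abnormal.
ORDINALS = {"first": 0, "second": 1, "third": 2,
            "fourth": 3, "fifth": 4, "sixth": 5}


def resolve_reply(reply, options):
    """Return the index of the chosen option, or None for an abnormal reply.

    `options` holds one list of alias keywords per offered candidate.
    """
    text = reply.lower()
    # Ordinal replies such as "the first one".
    for word, index in ORDINALS.items():
        if word in text:
            return index if index < len(options) else None
    # Keyword replies such as "story" or "song".
    matched = [i for i, aliases in enumerate(options)
               if any(alias in text for alias in aliases)]
    if len(matched) != 1:
        return None
    # Negated replies such as "not the story" pick the other option,
    # which is only well defined when exactly two options are offered.
    if "not" in text.split():
        others = [i for i in range(len(options)) if i != matched[0]]
        return others[0] if len(others) == 1 else None
    return matched[0]


options = [["story"], ["music", "song"]]
```

A return value of None corresponds to the abnormal replies listed above and would lead into the abnormality prompt flow.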
Meanwhile being had an impact to guarantee that disambiguation module will not be intended to switching to user, when user directly says similar " broadcast
Put lustily water " new task when, disambiguation module can be jumped according to the semantic results of voice newly inputted.
When the user gives an abnormal reply, the system generates an abnormality TTS prompt (for example, "I didn't understand. Do you want to hear music or a story?") and keeps listening. It also records the number of abnormal replies. When that number exceeds a set threshold (for example, twice), the system prompts the user with a sentence such as "I still didn't catch that. Try saying it another way."
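The abnormal-reply counter can be sketched as a small session object. The class name, the default threshold of two, and the English prompt strings are assumptions for illustration; the sketch switches to the second prompt once the count exceeds the threshold.

```python
class SelectionSession:
    """Tracks abnormal replies while the system waits for the user's choice."""

    def __init__(self, max_abnormal=2):
        self.max_abnormal = max_abnormal  # threshold, e.g. twice
        self.abnormal_count = 0

    def on_abnormal_reply(self):
        """Record one abnormal reply and return the TTS prompt to broadcast."""
        self.abnormal_count += 1
        if self.abnormal_count > self.max_abnormal:
            return "I still didn't catch that. Try saying it another way."
        return "I didn't understand. Do you want to hear music or a story?"
```

With the default threshold, the first two abnormal replies trigger the re-prompt and the third triggers the rephrase prompt; listening continues in both cases.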
While waiting for the user's selection, the user may not answer for a long time or may already have left. To handle this, the disambiguation system sets a validity period for listening. When the user does not answer within the validity period, the system performs an operation chosen by prior configuration, for example:
selecting the NLU result with the highest probability by default and prompting the user with a TTS message such as "About to play the story Little Red Cap for you"; or
prompting the user to rephrase and closing the listening state.
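The two configured timeout behaviours can be sketched as follows. The policy names, the "score" field that stands in for the NLU probability, and the prompt wording are assumptions of this sketch.

```python
def on_selection_timeout(candidates, policy="pick_best"):
    """Decide what to do when the listening validity period expires.

    Returns a tuple (action, chosen_result, tts_prompt).
    """
    if policy == "pick_best":
        # Default to the semantic result with the highest NLU probability.
        best = max(candidates, key=lambda c: c["score"])
        prompt = f"About to play the {best['domain']} {best['title']} for you"
        return "execute", best, prompt
    # Any other policy: ask the user to rephrase and stop listening.
    return "stop_listening", None, "Please try saying it another way."
```

Which branch runs is a configuration decision; the sketch simply treats any policy other than "pick_best" as the rephrase-and-close option.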
The overall flow is as follows:
As this embodiment shows, the choice is handed to the user only when a "true ambiguity" that cannot be resolved automatically arises. A corresponding handling method is provided for each kind of reply the user may give, which keeps the system running stably.
As an implementation, in this embodiment, when the existing resources include at least the history context information and/or the customized disambiguation rule base:
the history context information and/or the customized disambiguation rule base is queried for information corresponding to each semantic result in the disambiguation candidate list;
each semantic result in the disambiguation candidate list is disambiguated according to the corresponding information.
In this embodiment, take "I want to listen to Little Red Cap" as a continuing example. If the user clearly says "I want to listen to a story" in the first round of dialogue and then says "I want to listen to Little Red Cap" in the second round, the semantic result of the story domain is selected for the user automatically.
Some program names end with a domain name, as in "play the story of spring" (in the original word order, the domain name "story" comes last). For such input, the customized disambiguation rule base checks whether the value in the key slot ends with "story" (here, "song title" = "the story of spring" satisfies the rule), and semantic parsing results that do not satisfy the condition are discarded automatically. As another example, for "I want to listen to the story of the kite", if the semantic parsing module mistakenly parses "kite" as a song title, automatic disambiguation filters out this wrong parse because the song title does not end with "story".
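One rule of this kind can be sketched as a suffix check. Because English word order differs from the original example, the sketch uses phrasings in which the domain word comes last ("spring story", "kite story"); the function name, the "key_value" field, and the exemption of the domain's own parse are assumptions made for illustration.

```python
def apply_suffix_rule(utterance, candidates, domain_word="story",
                      own_domain="story"):
    """Hypothetical rule: when the utterance ends with a domain word, a parse
    from another domain is kept only if its key-slot value also ends with
    that word."""
    if not utterance.rstrip().endswith(domain_word):
        return list(candidates)  # the rule does not apply to this utterance
    return [c for c in candidates
            if c["domain"] == own_domain
            or c["key_value"].endswith(domain_word)]


# "kite" parsed as a song title does not end with "story" and is dropped.
kite = [{"domain": "music", "key_value": "kite"},
        {"domain": "story", "key_value": "kite"}]
# A song title that itself ends with "story" satisfies the rule and is kept.
spring = [{"domain": "music", "key_value": "spring story"}]

kite_kept = apply_suffix_rule("I want to listen to the kite story", kite)
spring_kept = apply_suffix_rule("play spring story", spring)
```

A customized rule base would hold many such rules; this shows only the single rule discussed above.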
As an implementation, in this embodiment, when the existing resources include at least the history disambiguation records:
it is queried whether a history disambiguation record exists for the voice data input by the user within a preset time range;
when a history disambiguation record exists, the semantic result corresponding to the voice data input by the user is determined according to that record.
In this embodiment, the automatic disambiguation processing module can also use previously stored history disambiguation records. When the user sends the same request requiring disambiguation again within a short time, the previous disambiguation result is selected for the user directly.
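A history disambiguation record with a validity period can be sketched as a small time-limited cache. The class name, the keying by utterance text, the default period of 300 seconds, and the injectable clock are assumptions of this sketch.

```python
import time


class DisambiguationHistory:
    """Stores the user's past choices and forgets them after a validity period."""

    def __init__(self, ttl_seconds=300.0, clock=time.monotonic):
        self._ttl = ttl_seconds
        self._clock = clock  # injectable so the expiry logic can be tested
        self._records = {}

    def remember(self, utterance, chosen_domain):
        """Record which domain the user chose for this utterance."""
        self._records[utterance] = (chosen_domain, self._clock())

    def lookup(self, utterance):
        """Return the earlier choice if it is still valid, otherwise None."""
        record = self._records.get(utterance)
        if record is None:
            return None
        chosen_domain, stored_at = record
        if self._clock() - stored_at > self._ttl:
            del self._records[utterance]  # expired: drop the stale record
            return None
        return chosen_domain
```

When lookup returns a domain, the matching candidate can be executed without asking the user again; when it returns None, the normal disambiguation flow continues.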
As an implementation, in this embodiment, when the existing resources include at least the voice dialogue platform resources:
the voice dialogue platform resource corresponding to each semantic result in the disambiguation candidate list is queried;
semantic results that have no corresponding voice dialogue platform resource are eliminated.
In this embodiment, the voice dialogue platform contains the multimedia resources that the user wants to query or play. For example, the platform resource of the music domain is a song library, and the platform resource of the story domain is a story library. A specific resource is an audio file or a video file; the song "heart is too soft" and the story "Little Red Cap" are each one resource.
Because semantic parsing does not know whether the data service of the voice dialogue platform contains the relevant resource, the automatic disambiguation processing module can also combine the semantic results with a resource search, and semantic results for which no resource is found are eliminated.
For example, the user says "I want to listen to 'I Am a Singer'". Because the crosstalk data service provider does not include a crosstalk piece called "I Am a Singer", the automatic disambiguation module filters out the crosstalk-domain parse automatically. This avoids the situation where the user selects crosstalk and is then told "crosstalk 'I Am a Singer' not found".
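Filtering by platform resources can be sketched as a lookup of each candidate's title in a per-domain catalog. The in-memory catalogs dictionary stands in for the data service query and is an assumption of this sketch, as are the field names.

```python
def filter_by_resources(candidates, catalogs):
    """Drop semantic results for which the platform holds no matching resource.

    `catalogs` maps a domain name to the set of titles its data service offers.
    """
    return [c for c in candidates
            if c["title"] in catalogs.get(c["domain"], set())]


# Hypothetical catalogs: the crosstalk provider has no "I Am a Singer".
catalogs = {
    "music": {"I Am a Singer"},
    "crosstalk": set(),
}
candidates = [{"domain": "music", "title": "I Am a Singer"},
              {"domain": "crosstalk", "title": "I Am a Singer"}]
available = filter_by_resources(candidates, catalogs)
```

Only the music-domain result is left, so the user is never offered a crosstalk option that would end in a "not found" message.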
As this implementation shows, automatic disambiguation is based on multi-turn history context information and data service query results, combined with a highly customizable automatic disambiguation rule base, so invalid semantic results can be eliminated automatically even when several semantic parsing results contain key semantic slots. In addition, the user's past selections are stored with a validity period. When the user requests the same content again within a short time, the disambiguation module reads the history record automatically, so the user does not have to make the same ambiguity selection repeatedly.
An embodiment of the invention provides a structural schematic diagram of a voice dialogue processing system for a voice dialogue platform. The technical solution of this embodiment is applicable to the voice dialogue processing method for a voice dialogue platform on a device. The system can execute the voice dialogue processing method for a voice dialogue platform described in any of the above embodiments and is configured in a terminal.
The voice dialogue processing system for a voice dialogue platform provided in this embodiment includes a semantic understanding acquisition program module 11, a key semantic slot determination program module 12, a disambiguation candidate list determination program module 13, and an automatic disambiguation program module 14.
The semantic understanding acquisition program module 11 is configured to obtain, through speech recognition and understanding, the n most probable semantic results for the voice data input by the user. The key semantic slot determination program module 12 is configured to, when n > 1, determine the domain of each semantic result and judge whether the semantic slots of each semantic result include a key semantic slot of that domain. The disambiguation candidate list determination program module 13 is configured to add the m semantic results that contain a key semantic slot to a disambiguation candidate list, where m ≤ n. The automatic disambiguation program module 14 is configured to, when m > 1, automatically disambiguate the disambiguation candidate list according to existing resources to obtain l semantic results, where the existing resources include history context information, history disambiguation records, voice dialogue platform resources, and/or a customized disambiguation rule base.
Further, the system also includes a user analysis program module, configured to:
when automatic disambiguation of the disambiguation candidate list according to the existing resources yields more than one semantic result, feed the remaining semantic results back to the user for confirmation;
when the user inputs a confirmation instruction corresponding to the feedback, determine the semantic result corresponding to the voice data input by the user and execute the corresponding operation;
when the user inputs an abnormal instruction, feed back abnormality prompt information.
Further, when the existing resources include at least the history context information and/or the customized disambiguation rule base:
the history context information and/or the customized disambiguation rule base is queried for information corresponding to each semantic result in the disambiguation candidate list;
each semantic result in the disambiguation candidate list is disambiguated according to the corresponding information.
Further, when the existing resources include at least the history disambiguation records:
it is queried whether a history disambiguation record exists for the voice data input by the user within a preset time range;
when a history disambiguation record exists, the semantic result corresponding to the voice data input by the user is determined according to that record.
Further, when the existing resources include at least the voice dialogue platform resources:
the voice dialogue platform resource corresponding to each semantic result in the disambiguation candidate list is queried;
semantic results that have no corresponding voice dialogue platform resource are eliminated.
An embodiment of the invention further provides a non-volatile computer storage medium. The computer storage medium stores computer-executable instructions that can execute the voice dialogue processing method for a voice dialogue platform in any of the above method embodiments.
As an implementation, the non-volatile computer storage medium of the invention stores computer-executable instructions, and the computer-executable instructions are configured to:
obtain, through speech recognition and understanding, the n most probable semantic results for the voice data input by the user;
when n > 1, determine the domain of each semantic result and judge whether the semantic slots of each semantic result include a key semantic slot of that domain;
add the m semantic results that contain a key semantic slot to a disambiguation candidate list, where m ≤ n;
when m > 1, automatically disambiguate the disambiguation candidate list according to existing resources to obtain l semantic results, where the existing resources include history context information, history disambiguation records, voice dialogue platform resources, and/or a customized disambiguation rule base.
As a non-volatile computer-readable storage medium, it can be used to store non-volatile software programs and non-volatile computer-executable programs and modules, such as the program instructions/modules corresponding to the method of the test software in the embodiments of the invention. One or more program instructions are stored in the non-volatile computer-readable storage medium and, when executed by a processor, perform the voice dialogue processing method for a voice dialogue platform in any of the above method embodiments.
The non-volatile computer-readable storage medium may include a program storage area and a data storage area. The program storage area can store an operating system and an application program required by at least one function; the data storage area can store data created through use of the test software device, and the like. In addition, the non-volatile computer-readable storage medium may include high-speed random access memory and may also include non-volatile memory, such as at least one magnetic disk storage device, flash memory device, or other non-volatile solid-state storage device. In some embodiments, the non-volatile computer-readable storage medium optionally includes memory located remotely from the processor, and such remote memory can be connected to the test software device through a network. Examples of such networks include, but are not limited to, the Internet, intranets, local area networks, mobile communication networks, and combinations thereof.
An embodiment of the invention further provides an electronic device, including at least one processor and a memory communicatively connected to the at least one processor. The memory stores instructions executable by the at least one processor, and the instructions are executed by the at least one processor so that the at least one processor can perform the steps of the voice dialogue processing method for a voice dialogue platform of any embodiment of the invention.
The client of the embodiments of the application exists in a variety of forms, including but not limited to:
(1) Mobile communication devices: devices of this kind are characterized by mobile communication functions, with voice and data communication as their main purpose. Such terminals include smartphones (for example, iPhone), multimedia phones, feature phones, and low-end phones.
(2) Ultra-mobile personal computer devices: devices of this kind belong to the category of personal computers, have computing and processing functions, and generally also support mobile Internet access. Such terminals include PDA, MID, and UMPC devices, for example, iPad.
(3) Portable entertainment devices: devices of this kind can display and play multimedia content. Such devices include audio and video players (for example, iPod), handheld devices, e-book readers, smart toys, and portable in-vehicle navigation devices.
(4) Other electronic devices with data processing functions.
Herein, relational terms such as "first" and "second" are used only to distinguish one entity or operation from another, and do not necessarily require or imply any actual relationship or order between these entities or operations. Moreover, the terms "include" and "comprise" cover not only the listed elements but also other elements not explicitly listed, or elements inherent to such a process, method, article, or device. Without further limitation, an element defined by the phrase "including ..." does not exclude the presence of other identical elements in the process, method, article, or device that includes the element.
The apparatus embodiments described above are merely illustrative. The units described as separate components may or may not be physically separate, and components shown as units may or may not be physical units; they may be located in one place or distributed over multiple network units. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of this embodiment. A person of ordinary skill in the art can understand and implement this without creative effort.
From the description of the above implementations, a person skilled in the art can clearly understand that each implementation can be realized by software plus a necessary general-purpose hardware platform, and of course also by hardware. Based on this understanding, the above technical solutions, in essence or in the part that contributes over the prior art, can be embodied in the form of a software product. The computer software product can be stored in a computer-readable storage medium such as ROM/RAM, a magnetic disk, or an optical disc, and includes several instructions that cause a computer device (which may be a personal computer, a server, a network device, or the like) to execute the methods described in the embodiments or in certain parts of the embodiments.
Finally, it should be noted that the above embodiments are intended only to illustrate the technical solutions of the invention, not to limit them. Although the invention has been described in detail with reference to the foregoing embodiments, a person of ordinary skill in the art should understand that the technical solutions described in the foregoing embodiments can still be modified, or some of their technical features can be equivalently replaced, and that such modifications or replacements do not cause the essence of the corresponding technical solutions to depart from the spirit and scope of the technical solutions of the embodiments of the invention.