CN108962233A - Voice dialogue processing method and system for voice dialogue platform - Google Patents

Voice dialogue processing method and system for voice dialogue platform

Publication number
CN108962233A
Authority
CN
China
Prior art keywords
semantic
user
disambiguation
history
voice dialogue
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201810835994.7A
Other languages
Chinese (zh)
Other versions
CN108962233B (en)
Inventor
林永楷
周伟达
樊帅
李春
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Sipic Technology Co Ltd
Original Assignee
AI Speech Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by AI Speech Ltd
Priority to CN201810835994.7A
Publication of CN108962233A
Application granted
Publication of CN108962233B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 Speech recognition
    • G10L15/08 Speech classification or search
    • G10L15/18 Speech classification or search using natural language modelling
    • G10L15/183 Speech classification or search using natural language modelling using context dependencies, e.g. language models
    • G10L15/19 Grammatical context, e.g. disambiguation of the recognition hypotheses based on word sequence rules

Abstract

An embodiment of the present invention provides a voice dialogue processing method for a voice dialogue platform. The method comprises: obtaining, through speech recognition and understanding, the n most probable semantic results for the voice data input by a user; when n > 1, determining the domain to which each semantic result belongs and judging whether each semantic result contains a key semantic slot of that domain; adding the m semantic results that contain a key semantic slot to a disambiguation candidate list; and, when m > 1, automatically disambiguating the candidate list according to existing resources to obtain l semantic results, where the existing resources include historical context information, historical disambiguation records, voice dialogue platform resources and/or a customized disambiguation rule base. An embodiment of the present invention also provides a voice dialogue processing system for a voice dialogue platform. By assigning different importance to the semantic slots of different semantic domains, the embodiments automatically filter out the false ambiguity introduced by semantic parsing, and automatic disambiguation further improves the quality of voice interaction.

Description

Voice dialogue processing method and system for voice dialogue platform
Technical field
The present invention relates to the field of voice dialogue, and in particular to a voice dialogue processing method and system for a voice dialogue platform.
Background
With the development of artificial intelligence speech technology, more and more devices can carry out instructions given by the user's voice. For example, when the user says "check tomorrow's weather", the device can tell the user what tomorrow's weather will be, which makes the device easier to operate.
However, the same words often have different meanings, so a single sentence may correspond to different operations. For example, when the user says "play 'I Am a Singer'", the user may intend to play the variety show "I Am a Singer", or may intend to play the crosstalk piece "I Am a Singer" by Yue Yunpeng and Sun Yue. In such cases, the system usually either asks the user to confirm which one to play or plays one of them at random.
In the course of realizing the present invention, the inventors found at least the following problems in the related art:
Because an effective automatic disambiguation mechanism is lacking, asking the user to confirm every time an ambiguity appears damages the user experience of the voice interaction environment catastrophically. If, when several ambiguous results exist, only the most probable result is used, no disambiguation is needed because only one result remains. From a technical standpoint this sidesteps the disambiguation scenario in order to avoid possible difficulties, at the cost of the experience of some users. Because closely similar speech recognition hypotheses and the other semantic parsing results that may exist are discarded, both the overall effectiveness of the intelligent voice dialogue system and the user experience decline. And if one of the results is chosen and played at random, what is played is not necessarily the "I Am a Singer" that the user actually wants.
Summary of the invention
The invention aims to solve at least the following problems in the prior art: excessive requests for the user to confirm ambiguities damage the voice interaction environment, and existing disambiguation cannot satisfy more complex usage scenarios.
In a first aspect, an embodiment of the present invention provides a voice dialogue processing method for a voice dialogue platform, comprising:
obtaining, through speech recognition and understanding, the n most probable semantic results for the voice data input by a user;
when n > 1, determining the domain to which each semantic result belongs, and judging whether the semantic slot corresponding to each semantic result is a key semantic slot in that domain;
adding the m semantic results that have a key semantic slot to a disambiguation candidate list, where m ≤ n;
when m > 1, automatically disambiguating the disambiguation candidate list according to existing resources to obtain l semantic results, where the existing resources include historical context information, historical disambiguation records, voice dialogue platform resources and/or a customized disambiguation rule base.
In a second aspect, an embodiment of the present invention provides a voice dialogue processing system for a voice dialogue platform, comprising:
a semantic understanding acquisition program module, configured to obtain, through speech recognition and understanding, the n most probable semantic results for the voice data input by a user;
a key semantic slot determination program module, configured to, when n > 1, determine the domain to which each semantic result belongs and judge whether the semantic slot corresponding to each semantic result is a key semantic slot in that domain;
a disambiguation candidate list determination program module, configured to add the m semantic results that have a key semantic slot to a disambiguation candidate list, where m ≤ n;
an automatic disambiguation program module, configured to, when m > 1, automatically disambiguate the disambiguation candidate list according to existing resources to obtain l semantic results, where the existing resources include historical context information, historical disambiguation records, voice dialogue platform resources and/or a customized disambiguation rule base.
In a third aspect, an electronic device is provided, comprising at least one processor and a memory communicatively connected to the at least one processor, where the memory stores instructions executable by the at least one processor, and the instructions are executed by the at least one processor so that the at least one processor can perform the steps of the voice dialogue processing method for a voice dialogue platform of any embodiment of the present invention.
In a fourth aspect, an embodiment of the present invention provides a storage medium on which a computer program is stored, where the program, when executed by a processor, implements the steps of the voice dialogue processing method for a voice dialogue platform of any embodiment of the present invention.
The beneficial effects of the embodiments of the present invention are as follows. Dialogue ambiguity detection makes full use of the semantic slot information contained in the semantic parsing results, and different importance is assigned to the semantic slots of different semantic domains. The automatic disambiguation mechanism is introduced only when the ambiguity involves key semantic slots, so the false ambiguity introduced by semantic parsing is filtered out automatically.
In addition, the automatic disambiguation mechanism draws on multi-turn historical context information and data service query results, combined with a highly customizable automatic disambiguation rule base, so that invalid semantic results can be eliminated automatically when several semantic parsing results contain key semantic slots. The user's past selections are stored with a validity period; when the user requests the same content again within a short time, the disambiguation module reads the historical record automatically, which avoids asking the user to resolve the same ambiguity repeatedly. This improves the quality of voice interaction, and automatic disambiguation also satisfies more complex usage scenarios.
Brief description of the drawings
To explain the technical solutions of the embodiments of the present invention or of the prior art more clearly, the drawings needed in the description of the embodiments or the prior art are briefly introduced below. The drawings described below show some embodiments of the present invention; those of ordinary skill in the art can obtain other drawings from them without creative effort.
Fig. 1 is a flowchart of a voice dialogue processing method for a voice dialogue platform provided by an embodiment of the present invention;
Fig. 2 is a schematic structural diagram of a voice dialogue processing system for a voice dialogue platform provided by an embodiment of the present invention.
Detailed description of the embodiments
To make the objects, technical solutions and advantages of the embodiments of the present invention clearer, the technical solutions in the embodiments are described clearly and completely below with reference to the drawings. The described embodiments are some, not all, of the embodiments of the present invention. All other embodiments obtained by those of ordinary skill in the art from these embodiments without creative effort fall within the protection scope of the present invention.
Fig. 1 is a flowchart of a voice dialogue processing method for a voice dialogue platform provided by an embodiment of the present invention. The method includes the following steps:
S11: obtaining, through speech recognition and understanding, the n most probable semantic results for the voice data input by a user;
S12: when n > 1, determining the domain to which each semantic result belongs, and judging whether the semantic slot corresponding to each semantic result is a key semantic slot in that domain;
S13: adding the m semantic results that have a key semantic slot to a disambiguation candidate list, where m ≤ n;
S14: when m > 1, automatically disambiguating the disambiguation candidate list according to existing resources to obtain l semantic results, where the existing resources include historical context information, historical disambiguation records, voice dialogue platform resources and/or a customized disambiguation rule base.
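The control flow of steps S11 to S14 can be sketched roughly as follows. This is a minimal sketch under stated assumptions, not the patented implementation: the names `disambiguate`, `has_key_slot` and `resources` are hypothetical, semantic results are modelled as plain dictionaries, and keeping the previous candidates when one resource would eliminate all of them is an added assumption that the text does not specify.

```python
from typing import Callable, Dict, List

SemanticResult = Dict[str, object]          # e.g. {"domain": "music", "slots": {...}}
KeySlotCheck = Callable[[SemanticResult], bool]
Resource = Callable[[List[SemanticResult]], List[SemanticResult]]


def disambiguate(results: List[SemanticResult],
                 has_key_slot: KeySlotCheck,
                 resources: List[Resource]) -> List[SemanticResult]:
    """Take the n results of S11 and return the results left after S12-S14."""
    if len(results) <= 1:                   # n <= 1: nothing to disambiguate
        return list(results)
    # S12-S13: only results with a key semantic slot become candidates (m <= n)
    candidates = [r for r in results if has_key_slot(r)]
    if len(candidates) <= 1:                # m <= 1: the ambiguity was a false one
        return candidates
    # S14: apply each existing resource (context, history, rules, platform data)
    for apply_resource in resources:
        remaining = apply_resource(candidates)
        if remaining:                       # assumption: never filter down to nothing
            candidates = remaining
        if len(candidates) == 1:
            break
    return candidates
```

If more than one result is returned, the remaining candidates would be handed to the user for confirmation, as the later embodiment describes.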
In this embodiment, the method can be deployed on devices with voice interaction, such as smart speakers and smartphones. For example, when a user wants to play a piece of audio through a smart speaker, the user can speak to the smart speaker directly.
For step S11, the voice data input by the user is processed by ASR (Automatic Speech Recognition) and the corresponding NLU (Natural Language Understanding), thereby obtaining the n most probable ASR hypotheses (also called n-best input or top-n input) and their corresponding semantic parsing results.
For step S12, the domain to which each semantic result belongs is determined, and it is judged whether the semantic slot corresponding to each semantic result is a key semantic slot in that domain. A key semantic slot is a slot that has major significance for a given domain when a sentence is parsed into a semantic result. Because a semantic result can contain a great deal of information, failing to restrict attention to key semantic slots would produce a large number of inaccurate semantic parsing results.
For example, "play five minutes of Zhou Jielun's music".
In the music domain it is parsed as:
"singer" = "Zhou Jielun", "operation" = "play", "play duration" = "five minutes"
In the radio domain it may be parsed as:
"operation" = "play", "column" = "five minutes", "keyword" = "Zhou Jielun"
The radio-domain parse is also a valid parse. However, because column (program) names in the radio domain are extremely varied, the "column" slot is assigned a lower importance than the "singer" slot of the music domain when slot importance is configured. The "column" slot of the radio domain therefore does not meet the key-semantic-slot requirement for an ambiguity candidate, whereas "singer" is a key semantic slot of the music domain. The music-domain parse is retained and the radio-domain parse is filtered out directly.
As another example, "play Little Red Riding Hood".
In the music domain it is parsed as:
"operation" = "play", "song title" = "Little Red Riding Hood"
In the story domain it is parsed as:
"operation" = "play", "story name" = "Little Red Riding Hood"
Because "story name" is a key semantic slot for the story domain and "song title" is a key semantic slot for the music domain, both semantic parsing results pass the preliminary ambiguity detection.
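The key-slot check used in the two examples can be illustrated with a small table of slot importance. This is only a sketch: the weights, the threshold and the names `SLOT_IMPORTANCE`, `KEY_THRESHOLD`, `has_key_slot` and `candidate_list` are hypothetical, since the text says only that slots of different domains are given different importance.

```python
from typing import Dict, List

# Hypothetical importance weights per domain and slot (illustrative values only).
SLOT_IMPORTANCE: Dict[str, Dict[str, float]] = {
    "music": {"singer": 0.9, "song_title": 0.9, "play_duration": 0.2, "operation": 0.1},
    "radio": {"column": 0.3, "keyword": 0.2, "operation": 0.1},
    "story": {"story_name": 0.9, "operation": 0.1},
}
KEY_THRESHOLD = 0.5  # assumed cut-off above which a slot counts as "key"


def has_key_slot(domain: str, slots: Dict[str, str]) -> bool:
    """True if the parse fills at least one slot that is key in its own domain."""
    weights = SLOT_IMPORTANCE.get(domain, {})
    return any(weights.get(name, 0.0) >= KEY_THRESHOLD for name in slots)


def candidate_list(parses: List[dict]) -> List[dict]:
    """Keep only the parses that pass the preliminary ambiguity detection."""
    return [p for p in parses if has_key_slot(p["domain"], p["slots"])]


zhou = [
    {"domain": "music", "slots": {"singer": "Zhou Jielun", "operation": "play",
                                  "play_duration": "five minutes"}},
    {"domain": "radio", "slots": {"operation": "play", "column": "five minutes",
                                  "keyword": "Zhou Jielun"}},
]
hood = [
    {"domain": "music", "slots": {"operation": "play", "song_title": "Little Red Riding Hood"}},
    {"domain": "story", "slots": {"operation": "play", "story_name": "Little Red Riding Hood"}},
]
print([p["domain"] for p in candidate_list(zhou)])  # radio parse is filtered out
print([p["domain"] for p in candidate_list(hood)])  # both parses remain
```

With these assumed weights the first utterance leaves one candidate (a false ambiguity), while the second leaves two and goes on to automatic disambiguation.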
For step S13, take the "play Little Red Riding Hood" example from step S12. Both semantic parsing results pass the preliminary ambiguity detection: there are n = 2 most probable ASR hypotheses and m = 2 results after preliminary ambiguity detection, so both semantic parsing results are added to the ambiguity candidate list. Here m = n = 2.
For step S14, if step S13 determines that m = 1, preliminary disambiguation has already yielded a single semantic result, and the corresponding instruction is executed directly.
If step S13 determines that m > 1, the disambiguation candidate list is disambiguated according to existing resources. Automatic disambiguation can draw on the multi-turn historical context information, whether the data service contains the relevant resource, the customized automatic disambiguation rule base, and the historical disambiguation records among the existing resources, and uses them to disambiguate the semantic results in the candidate list. When only one semantic result remains in the candidate list, it is taken as the semantic result corresponding to the user's voice input, and the corresponding operation is performed.
As this embodiment shows, dialogue ambiguity detection makes full use of the semantic slot information contained in the semantic parsing results, and different importance is assigned to the semantic slots of different semantic domains. The automatic disambiguation mechanism is introduced only when the ambiguity involves key semantic slots, so the false ambiguity introduced by semantic parsing is filtered out automatically.
As an implementation, in this embodiment the method further comprises:
when automatic disambiguation of the disambiguation candidate list according to existing resources yields more than one semantic result, feeding those semantic results back to the user for confirmation;
when the user inputs a confirmation instruction corresponding to the feedback, determining the semantic result corresponding to the voice data input by the user and executing the corresponding operation;
when the user inputs an abnormal instruction, feeding back abnormal prompt information.
In this embodiment, when more than one semantic result remains in the disambiguation candidate list after automatic disambiguation, the results are fed back to the user.
For example, take "play Little Red Riding Hood" from the embodiment above. After automatic disambiguation has finished, two semantic results in the ambiguity candidate list still need to be disambiguated. The ambiguity detection module sets the needs-disambiguation flag to "yes" and then calls the disambiguation processing module. Based on the flag set by the detection module, the disambiguation processing module returns the TTS (Text To Speech) prompt to be broadcast, "I found a children's story and music for you, which would you like to hear?", to guide the user's selection, and enters the listening state.
When the user's next turn of input arrives, the disambiguation processing module judges whether the input is a "selection" or a "new task". If it is a new task, the module exits the disambiguation operation directly and executes the new task.
If the user makes a selection, the module must judge whether the user's answer is abnormal. If it is abnormal, the module enters the abnormal prompt flow, prompts the user to choose again, and re-enters the listening state. If it is not abnormal, the semantic result selected by the user is taken as the final semantic result and the related operation is executed.
Because the user's input is speech, the user may well not reply in the way the prompt suggests when guided by the disambiguation module, so further handling is needed.
For example, when the TTS content is "I found a children's story and music for you, which would you like to hear?", the user may reply:
"I want to hear the children's story", "children's story", "story", "kids' story", "music", "song", "the first one", "the second one", "the former", "the latter", "not the story", "not music"
The user may also give abnormal replies such as "joke" or "the sixth one".
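One way to map such replies onto a candidate is sketched below. It is a deliberately simple illustration, assuming English replies and a hypothetical keyword list per candidate; the function name `interpret_reply` and the ordinal table are not from the original text, and a production system would rely on the NLU result rather than on word matching.

```python
import re
from typing import Optional, Sequence

ORDINALS = {"first": 0, "second": 1, "third": 2, "fourth": 3, "fifth": 4, "sixth": 5}


def interpret_reply(reply: str,
                    option_keywords: Sequence[Sequence[str]]) -> Optional[int]:
    """Return the index of the chosen candidate, or None for an abnormal reply.

    option_keywords[i] lists the words that refer to candidate i, in the
    order in which the TTS prompt read the candidates out.
    """
    words = re.findall(r"[a-z]+", reply.lower())
    mentioned = set()
    for word in words:
        if word in ORDINALS:
            if ORDINALS[word] >= len(option_keywords):
                return None                 # e.g. "the sixth one" with two options
            mentioned.add(ORDINALS[word])
    for index, keywords in enumerate(option_keywords):
        if any(keyword in words for keyword in keywords):
            mentioned.add(index)
    if "not" in words:                      # "not the story" selects the other option
        mentioned = set(range(len(option_keywords))) - mentioned
    return mentioned.pop() if len(mentioned) == 1 else None


options = [["story", "stories"], ["music", "song"]]
for text in ["I want to hear the children's story", "song", "the second one",
             "not the story", "joke", "the sixth one"]:
    print(text, "->", interpret_reply(text, options))
```

Replies that name exactly one candidate, directly or by exclusion, return its index; everything else is treated as abnormal and left to the prompt flow.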
Meanwhile being had an impact to guarantee that disambiguation module will not be intended to switching to user, when user directly says similar " broadcast Put lustily water " new task when, disambiguation module can be jumped according to the semantic results of voice newly inputted.
When user says abnormal saying, in addition to generating abnormal T TS prompt, (such as " my unreadable, you are to listen Music or story? "), at this time other than maintenance is monitored, it also will record abnormal number, when frequency of abnormity is more than one When determining number, (such as twice), system will be prompted to user's " still not hearing, be said differently and have a try again " language similar in this way Sentence.
When waiting user's selection, prevent user from not answering or user has been moved off for a long time, disambiguating system can be to prison Listen one effective time of setting.When user does not answer in effective time, system can execute different according to preparatory configuration Operation, such as:
The highest NLU of default choice probability, and the TTS for prompting user's " will play story small red cap for you " similar.Or Person prompts user to be said differently and close monitoring.
The overall process is as follows:
As this embodiment shows, the choice is handed to the user only when a "true ambiguity" that cannot be resolved automatically arises. For the different replies the user may give, corresponding handling is provided to keep the system running stably.
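The confirmation behaviour described in this embodiment (selection, new task, abnormal reply counting and listening timeout) can be summarised as a small state holder. The class `ConfirmationSession`, its method names and the returned strings are hypothetical; the sketch also assumes the candidates are ordered with the most probable NLU result first.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class ConfirmationSession:
    candidates: List[str]          # assumed ordered by NLU probability, best first
    max_abnormal: int = 2          # "a set count (for example, twice)"
    abnormal_count: int = 0
    listening: bool = True

    def on_reply(self, choice: Optional[int], is_new_task: bool = False) -> str:
        """Handle one user turn; choice is None when the reply is abnormal."""
        if is_new_task:                      # intent switch: leave disambiguation
            self.listening = False
            return "execute new task"
        if choice is None:                   # abnormal reply: prompt and keep listening
            self.abnormal_count += 1
            if self.abnormal_count > self.max_abnormal:
                return "I still didn't catch that, try saying it another way."
            return "I didn't understand. Do you want " + " or ".join(self.candidates) + "?"
        self.listening = False               # valid selection ends the session
        return "execute " + self.candidates[choice]

    def on_timeout(self, default_to_top: bool = True) -> str:
        """Called when the listening validity period expires without an answer."""
        self.listening = False
        if default_to_top:
            return "execute " + self.candidates[0]
        return "Please try saying it another way."
```

Whether a timeout falls back to the top candidate or simply closes listening is a configuration choice, mirrored here by the `default_to_top` flag.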
As an implementation, in this embodiment, when the existing resources include at least historical context information and/or a customized disambiguation rule base:
the historical context information and/or the customized disambiguation rule base is queried for information corresponding to each semantic result in the disambiguation candidate list;
each semantic result in the disambiguation candidate list is disambiguated according to the corresponding information.
In this embodiment, continuing with the "I want to hear Little Red Riding Hood" example: if the user clearly says "I want to hear a story" in the first turn of dialogue and then says "I want to hear Little Red Riding Hood" in the second turn, the semantic result of the story domain is selected for the user automatically.
Some program names end with a domain name, for example "play the story of spring" (in the Chinese original, the word for "story" comes last). The customized disambiguation rule base checks whether the value in the key slot ends with "story" (here "song title" = "the story of spring", which satisfies the rule); semantic parsing results that do not satisfy the condition are discarded automatically. As another example, for "I want to hear the story of the kite", if the semantic parsing module unexpectedly parses "kite" as a song title, automatic disambiguation filters out this wrong parse because the song title does not end with "story".
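The two mechanisms of this embodiment can be sketched as list filters. Everything here is illustrative: the function names and the dictionary layout are hypothetical, and the English stand-in strings place the domain word last ("spring story", "kite story") to mirror the word order of the Chinese titles that the suffix rule was written for.

```python
from typing import Dict, List, Set

Parse = Dict[str, str]   # {"domain": ..., "value": <key slot value>}


def filter_by_context(candidates: List[Parse], recent_domains: Set[str]) -> List[Parse]:
    """Prefer parses whose domain the user named in earlier turns."""
    preferred = [c for c in candidates if c["domain"] in recent_domains]
    return preferred or candidates          # no match: leave the list unchanged


def filter_by_suffix_rule(candidates: List[Parse], utterance: str,
                          domain_word: str, home_domain: str) -> List[Parse]:
    """Customized rule: if the utterance ends with a domain word such as "story",
    a parse from another domain survives only if its key-slot value also ends
    with that word (i.e. the word is part of the title itself)."""
    if not utterance.endswith(domain_word):
        return candidates
    return [c for c in candidates
            if c["domain"] == home_domain or c["value"].endswith(domain_word)]


kite_song = {"domain": "music", "value": "kite"}
kite_story = {"domain": "story", "value": "kite"}
spring_song = {"domain": "music", "value": "spring story"}

# "kite" alone is not a song title ending in "story", so the music parse is dropped.
print(filter_by_suffix_rule([kite_song, kite_story], "I want to hear kite story",
                            "story", "story"))
# A song whose title itself ends in "story" is kept.
print(filter_by_suffix_rule([spring_song, kite_story], "play spring story",
                            "story", "story"))
```

Both filters return a list, so they can be chained with the other existing resources in any order the platform configures.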
As an implementation, in this embodiment, when the existing resources include at least historical disambiguation records:
it is queried whether a historical disambiguation record exists for the voice data input by the user within a preset time range;
when a historical disambiguation record exists, the semantic result corresponding to the voice data input by the user is determined according to that record.
In this embodiment, the automatic disambiguation processing module can also use previously stored historical disambiguation records. When the user sends the same request that needs disambiguation again within a short time, the previous disambiguation result is selected for the user directly.
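A historical disambiguation record with a validity period behaves like a small cache with expiry. The sketch below is an assumption-laden illustration: the class `DisambiguationHistory`, the key made of user id and utterance, and the default of 300 seconds are all hypothetical, since the text specifies only a "preset time range".

```python
import time
from typing import Dict, Optional, Tuple


class DisambiguationHistory:
    """Remembers which domain a user chose for an utterance, for a limited time."""

    def __init__(self, ttl_seconds: float = 300.0):
        self.ttl_seconds = ttl_seconds
        self._records: Dict[Tuple[str, str], Tuple[str, float]] = {}

    def remember(self, user_id: str, utterance: str, chosen_domain: str,
                 now: Optional[float] = None) -> None:
        now = time.time() if now is None else now
        self._records[(user_id, utterance)] = (chosen_domain, now)

    def lookup(self, user_id: str, utterance: str,
               now: Optional[float] = None) -> Optional[str]:
        """Return the stored choice, or None if absent or past its validity period."""
        now = time.time() if now is None else now
        record = self._records.get((user_id, utterance))
        if record is None:
            return None
        domain, stored_at = record
        if now - stored_at > self.ttl_seconds:
            del self._records[(user_id, utterance)]   # expired: forget it
            return None
        return domain


history = DisambiguationHistory(ttl_seconds=300.0)
history.remember("user-1", "play Little Red Riding Hood", "story", now=1000.0)
print(history.lookup("user-1", "play Little Red Riding Hood", now=1100.0))  # within validity
print(history.lookup("user-1", "play Little Red Riding Hood", now=2000.0))  # expired
```

Passing `now` explicitly keeps the example deterministic; in normal use the arguments can be omitted and the current time is used.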
As an implementation, in this embodiment, when the existing resources include at least voice dialogue platform resources:
the voice dialogue platform resource corresponding to each semantic result in the disambiguation candidate list is queried;
semantic results that have no corresponding voice dialogue platform resource are eliminated.
In this embodiment, the voice dialogue platform contains the multimedia resources that the user wants to query or play. For example, the platform resource of the music domain is a song library and the platform resource of the story domain is a story library. A specific resource is an audio file or a video file; the song "Heart Is Too Soft" and the story "Little Red Riding Hood" are each one resource.
Because semantic parsing does not know whether the data service of the voice dialogue platform contains the relevant resource, the automatic disambiguation processing module can also combine the semantic results with a resource search, and semantic results for which no resource is found are likewise eliminated.
Take the earlier example: the user says "I want to hear 'I Am a Singer'". Because the crosstalk data service provider has not included the crosstalk piece "I Am a Singer", the automatic disambiguation module automatically filters out the semantic parse of the crosstalk domain. This avoids the situation where the user selects crosstalk and is then told "crosstalk 'I Am a Singer' was not found".
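Resource-based filtering amounts to looking each candidate up in its domain's data service and dropping the ones that return nothing. In this sketch the in-memory `CATALOGUE` stands in for the platform's real data services, and the names `has_resource` and `filter_by_resource` are hypothetical.

```python
from typing import Dict, List, Set

# Hypothetical in-memory catalogues standing in for the platform's data services.
CATALOGUE: Dict[str, Set[str]] = {
    "variety_show": {"I Am a Singer"},
    "crosstalk": set(),            # the provider has not included this piece
}


def has_resource(domain: str, title: str) -> bool:
    """True if the domain's data service contains a resource with this title."""
    return title in CATALOGUE.get(domain, set())


def filter_by_resource(candidates: List[Dict[str, str]]) -> List[Dict[str, str]]:
    """Drop semantic results for which the platform has nothing to play."""
    return [c for c in candidates if has_resource(c["domain"], c["title"])]


candidates = [
    {"domain": "variety_show", "title": "I Am a Singer"},
    {"domain": "crosstalk", "title": "I Am a Singer"},
]
print(filter_by_resource(candidates))   # only the variety show parse remains
```

Since only one candidate remains, no confirmation prompt is needed and the user is never offered an option that cannot be played.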
As this implementation shows, automatic disambiguation draws on multi-turn historical context information and data service query results, combined with a highly customizable automatic disambiguation rule base, so that invalid semantic results are eliminated automatically when several semantic parsing results contain key semantic slots. The user's past selections are stored with a validity period; when the user requests the same content again within a short time, the disambiguation module reads the historical record automatically, which avoids asking the user to resolve the same ambiguity repeatedly.
Fig. 2 is a schematic structural diagram of a voice dialogue processing system for a voice dialogue platform provided by an embodiment of the present invention. The technical solution of this embodiment is applicable to the voice dialogue processing method for a voice dialogue platform on a device; the system can execute the voice dialogue processing method for a voice dialogue platform described in any of the above embodiments and is configured in a terminal.
The voice dialogue processing system for a voice dialogue platform provided in this embodiment includes: a semantic understanding acquisition program module 11, a key semantic slot determination program module 12, a disambiguation candidate list determination program module 13, and an automatic disambiguation program module 14.
The semantic understanding acquisition program module 11 is configured to obtain, through speech recognition and understanding, the n most probable semantic results for the voice data input by a user. The key semantic slot determination program module 12 is configured to, when n > 1, determine the domain to which each semantic result belongs and judge whether the semantic slot corresponding to each semantic result is a key semantic slot in that domain. The disambiguation candidate list determination program module 13 is configured to add the m semantic results that have a key semantic slot to a disambiguation candidate list, where m ≤ n. The automatic disambiguation program module 14 is configured to, when m > 1, automatically disambiguate the disambiguation candidate list according to existing resources to obtain l semantic results, where the existing resources include historical context information, historical disambiguation records, voice dialogue platform resources and/or a customized disambiguation rule base.
Further, the system also includes a user analysis program module, configured to:
when automatic disambiguation of the disambiguation candidate list according to existing resources yields more than one semantic result, feed those semantic results back to the user for confirmation;
when the user inputs a confirmation instruction corresponding to the feedback, determine the semantic result corresponding to the voice data input by the user and execute the corresponding operation;
when the user inputs an abnormal instruction, feed back abnormal prompt information.
Further, when the existing resources include at least historical context information and/or a customized disambiguation rule base:
the historical context information and/or the customized disambiguation rule base is queried for information corresponding to each semantic result in the disambiguation candidate list;
each semantic result in the disambiguation candidate list is disambiguated according to the corresponding information.
Further, when the existing resources include at least historical disambiguation records:
it is queried whether a historical disambiguation record exists for the voice data input by the user within a preset time range;
when a historical disambiguation record exists, the semantic result corresponding to the voice data input by the user is determined according to that record.
Further, when the existing resources include at least voice dialogue platform resources:
the voice dialogue platform resource corresponding to each semantic result in the disambiguation candidate list is queried;
semantic results that have no corresponding voice dialogue platform resource are eliminated.
An embodiment of the present invention also provides a non-volatile computer storage medium. The computer storage medium stores computer-executable instructions that can perform the voice dialogue processing method for a voice dialogue platform of any of the above method embodiments.
As an implementation, the non-volatile computer storage medium of the present invention stores computer-executable instructions configured to:
obtain, through speech recognition and understanding, the n most probable semantic results for the voice data input by a user;
when n > 1, determine the domain to which each semantic result belongs, and judge whether the semantic slot corresponding to each semantic result is a key semantic slot in that domain;
add the m semantic results that have a key semantic slot to a disambiguation candidate list, where m ≤ n;
when m > 1, automatically disambiguate the disambiguation candidate list according to existing resources to obtain l semantic results, where the existing resources include historical context information, historical disambiguation records, voice dialogue platform resources and/or a customized disambiguation rule base.
As a non-volatile computer-readable storage medium, it can be used to store non-volatile software programs and non-volatile computer-executable programs and modules, such as the program instructions/modules corresponding to the method of the test software in the embodiments of the present invention. One or more program instructions are stored in the non-volatile computer-readable storage medium and, when executed by a processor, perform the voice dialogue processing method for a voice dialogue platform of any of the above method embodiments.
The non-volatile computer-readable storage medium may include a program storage area and a data storage area. The program storage area can store an operating system and the application program required by at least one function; the data storage area can store data created through use of the test software device, and so on. In addition, the non-volatile computer-readable storage medium may include high-speed random access memory and may also include non-volatile memory, such as at least one magnetic disk storage device, flash memory device, or other non-volatile solid-state storage device. In some embodiments, the non-volatile computer-readable storage medium optionally includes memory located remotely from the processor, and such remote memory can be connected to the test software device through a network. Examples of such networks include, but are not limited to, the Internet, intranets, local area networks, mobile communication networks, and combinations thereof.
An embodiment of the present invention also provides an electronic device, comprising at least one processor and a memory communicatively connected to the at least one processor, where the memory stores instructions executable by the at least one processor, and the instructions are executed by the at least one processor so that the at least one processor can perform the steps of the voice dialogue processing method for a voice dialogue platform of any embodiment of the present invention.
The client of the embodiment of the present application exists in a variety of forms, including but not limited to:
(1) Mobile communication devices: these devices are characterized by mobile communication capability and have voice and data communication as their main purpose. Such terminals include smartphones (such as the iPhone), multimedia phones, feature phones, and low-end phones.
(2) Ultra-mobile personal computer devices: these devices belong to the category of personal computers, have computing and processing capability, and generally also support mobile Internet access. Such terminals include PDA, MID, and UMPC devices, such as the iPad.
(3) Portable entertainment devices: these devices can display and play multimedia content. They include audio and video players (such as the iPod), handheld devices, e-book readers, smart toys, and portable in-vehicle navigation devices.
(4) other electronic devices having data processing function.
Herein, relational terms such as "first" and "second" are used merely to distinguish one entity or operation from another, and do not necessarily require or imply any actual relationship or order between these entities or operations. Moreover, the terms "include" and "comprise" cover not only the listed elements but also other elements not explicitly listed, or elements inherent to such a process, method, article, or device. Without further limitation, an element defined by the phrase "including ..." does not exclude the presence of other identical elements in the process, method, article, or device that includes the element.
The apparatus embodiments described above are merely illustrative. The units described as separate components may or may not be physically separate, and the components shown as units may or may not be physical units; that is, they may be located in one place or distributed over multiple network units. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of this embodiment. Those of ordinary skill in the art can understand and implement it without creative effort.
From the above description of the embodiments, those skilled in the art can clearly understand that each embodiment can be implemented by means of software plus a necessary general-purpose hardware platform, and of course also by hardware. Based on this understanding, the above technical solution in essence, or the part that contributes over the prior art, can be embodied in the form of a software product. The computer software product may be stored in a computer-readable storage medium, such as ROM/RAM, a magnetic disk, or an optical disc, and includes several instructions that cause a computer device (which may be a personal computer, a server, a network device, or the like) to perform the methods described in each embodiment or in certain parts of an embodiment.
Finally, it should be noted that the above embodiments are intended only to illustrate the technical solutions of the present invention, not to limit them. Although the present invention has been described in detail with reference to the foregoing embodiments, those of ordinary skill in the art should understand that the technical solutions described in the foregoing embodiments may still be modified, or some of their technical features may be replaced with equivalents, and that such modifications or replacements do not cause the essence of the corresponding technical solutions to depart from the spirit and scope of the technical solutions of the embodiments of the present invention.

Claims (10)

1. A voice dialogue processing method for a voice dialogue platform, comprising:
obtaining, through speech recognition and understanding, the n most probable semantic results for the voice data input by a user;
when n > 1, determining the domain involved in each semantic result, and judging whether the semantic slot corresponding to each semantic result is a key semantic slot in that domain;
adding the m semantic results that have a key semantic slot to a disambiguation candidate list, where m ≤ n;
when m > 1, automatically disambiguating the disambiguation candidate list according to existing resources to obtain l semantic results, wherein the existing resources include history context information, history disambiguation records, voice dialogue platform resources, and/or a customized disambiguation rule base.
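The candidate-list step of claim 1 can be sketched as follows. This is a minimal illustration, not the patented implementation: the `SemanticResult` type, the `KEY_SLOTS` table, and the function name `build_candidate_list` are all hypothetical, and the behavior for n ≤ 1 (returning the input unchanged) is an assumption the claim does not spell out.

```python
from dataclasses import dataclass

# Hypothetical table of key semantic slots per domain (illustrative values only).
KEY_SLOTS = {
    "music": {"song", "artist"},
    "navigation": {"destination"},
}


@dataclass
class SemanticResult:
    domain: str   # domain the result belongs to
    slots: dict   # slot name -> value filled by semantic understanding


def build_candidate_list(results):
    """Return the results that fill at least one key slot of their domain."""
    if len(results) <= 1:
        # Assumption: with n <= 1 there is nothing to disambiguate.
        return list(results)
    return [
        r for r in results
        if KEY_SLOTS.get(r.domain, set()) & set(r.slots)
    ]
```

If the returned list has more than one entry (m > 1), the automatic disambiguation step would then be applied to it.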
2. The method according to claim 1, wherein the method further comprises:
when automatic disambiguation of the disambiguation candidate list according to the existing resources yields l or more semantic results, feeding the l or more semantic results back to the user for confirmation;
when the user inputs a confirmation instruction corresponding to the feedback, determining the semantic result corresponding to the voice data input by the user and performing the corresponding operation;
when the user inputs an abnormal instruction, feeding back abnormality prompt information.
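The confirmation flow of claim 2 might look like the sketch below. It assumes, purely for illustration, that the user's reply has already been normalized to a string and that a valid confirmation is the 1-based index of one of the candidates; the function name, the return tuples, and the prompt text are hypothetical.

```python
def handle_user_reply(candidates, reply):
    """Resolve a user's reply to a confirmation prompt.

    candidates: semantic results that were fed back to the user.
    reply: hypothetical normalized reply, e.g. "1" or "2".
    Returns ("execute", chosen_result) or ("error", prompt_text).
    """
    try:
        index = int(reply)
    except ValueError:
        index = 0  # not a number: treated as an abnormal instruction
    if 1 <= index <= len(candidates):
        # Confirmation instruction: pick the matching semantic result.
        return ("execute", candidates[index - 1])
    # Abnormal instruction: feed back a prompt instead of acting.
    return ("error", "Sorry, I did not understand. Please choose again.")
```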
3. The method according to claim 1, wherein, when the existing resources include at least history context information and/or a customized disambiguation rule base:
querying the history context information and/or the customized disambiguation rule base for information corresponding to each semantic result in the disambiguation candidate list;
disambiguating the semantic results in the disambiguation candidate list according to the corresponding information.
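One possible reading of claim 3 is sketched below. The representation is an assumption: the history context is reduced to the domain of the previous turn (`last_domain`), and the customized rule base is reduced to a priority-ordered list of preferred domains (`rules`). Consulting the rules before the context is also an illustrative choice, since the claim does not fix an order.

```python
def disambiguate_by_context(candidates, last_domain=None, rules=None):
    """Narrow a candidate list using customized rules, then history context.

    candidates: list of dicts, each with a "domain" key.
    last_domain: hypothetical history context (domain of the previous turn).
    rules: hypothetical rule base, preferred domains in priority order.
    """
    # Customized rules: the first preferred domain present among candidates wins.
    for preferred in rules or []:
        matched = [c for c in candidates if c["domain"] == preferred]
        if matched:
            return matched
    # History context: prefer candidates that continue the previous domain.
    if last_domain is not None:
        matched = [c for c in candidates if c["domain"] == last_domain]
        if matched:
            return matched
    # No matching information: the list is left unchanged.
    return list(candidates)
```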
4. The method according to claim 1, wherein, when the existing resources include at least history disambiguation records:
querying whether a history disambiguation record exists, within a preset time range, for the voice data input by the user;
when a history disambiguation record exists, determining the semantic result corresponding to the voice data input by the user according to the history disambiguation record.
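The history lookup of claim 4 can be illustrated with a small time-limited cache. This is a sketch under stated assumptions: records are keyed by the recognized utterance text, the preset time range is a fixed number of seconds (`ttl_seconds`), and the class and method names are hypothetical.

```python
import time


class DisambiguationHistory:
    """Remembers which semantic result an utterance was resolved to."""

    def __init__(self, ttl_seconds=300.0):
        self.ttl_seconds = ttl_seconds  # assumed form of the preset time range
        self._records = {}              # utterance -> (chosen result, timestamp)

    def remember(self, utterance, chosen, now=None):
        stamp = time.time() if now is None else now
        self._records[utterance] = (chosen, stamp)

    def lookup(self, utterance, now=None):
        """Return the earlier choice if it is still within the time range."""
        now = time.time() if now is None else now
        record = self._records.get(utterance)
        if record is None:
            return None
        chosen, stamp = record
        return chosen if now - stamp <= self.ttl_seconds else None
```

The optional `now` argument exists only so the time window can be exercised deterministically; in normal use the current time is taken from `time.time()`.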
5. The method according to claim 1, wherein, when the existing resources include at least voice dialogue platform resources:
querying the voice dialogue platform resource corresponding to each semantic result in the disambiguation candidate list;
eliminating from the disambiguation candidate list the semantic results that have no corresponding voice dialogue platform resource.
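The resource check of claim 5 amounts to filtering the candidate list by whether the platform can actually serve each result. In the sketch below the platform resources are modeled as a hypothetical in-memory `CATALOG` of (domain, value) pairs; a real platform would presumably query its own content services instead.

```python
# Hypothetical platform resources: (domain, value) pairs the platform can serve.
CATALOG = {
    ("music", "Blue"),
    ("navigation", "Blue Street"),
}


def has_resource(candidate):
    """Check whether the platform holds a resource for this candidate."""
    return (candidate["domain"], candidate["value"]) in CATALOG


def filter_by_platform_resources(candidates, resource_check=has_resource):
    """Drop candidates for which no platform resource exists."""
    return [c for c in candidates if resource_check(c)]
```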
6. A voice dialogue processing system for a voice dialogue platform, comprising:
a semantic understanding acquisition program module, configured to obtain, through speech recognition and understanding, the n most probable semantic results for the voice data input by a user;
a key semantic slot determination program module, configured to, when n > 1, determine the domain involved in each semantic result and judge whether the semantic slot corresponding to each semantic result is a key semantic slot in that domain;
a disambiguation candidate list determination program module, configured to add the m semantic results that have a key semantic slot to a disambiguation candidate list, where m ≤ n;
an automatic disambiguation program module, configured to, when m > 1, automatically disambiguate the disambiguation candidate list according to existing resources to obtain l semantic results, wherein the existing resources include history context information, history disambiguation records, voice dialogue platform resources, and/or a customized disambiguation rule base.
7. The system according to claim 6, wherein the system further comprises a user analysis program module configured to:
when automatic disambiguation of the disambiguation candidate list according to the existing resources yields l or more semantic results, feed the l or more semantic results back to the user for confirmation;
when the user inputs a confirmation instruction corresponding to the feedback, determine the semantic result corresponding to the voice data input by the user and perform the corresponding operation;
when the user inputs an abnormal instruction, feed back abnormality prompt information.
8. The system according to claim 6, wherein, when the existing resources include at least history context information and/or a customized disambiguation rule base:
the history context information and/or the customized disambiguation rule base is queried for information corresponding to each semantic result in the disambiguation candidate list;
the semantic results in the disambiguation candidate list are disambiguated according to the corresponding information.
9. The system according to claim 6, wherein, when the existing resources include at least history disambiguation records:
it is queried whether a history disambiguation record exists, within a preset time range, for the voice data input by the user;
when a history disambiguation record exists, the semantic result corresponding to the voice data input by the user is determined according to the history disambiguation record.
10. The system according to claim 6, wherein, when the existing resources include at least voice dialogue platform resources:
the voice dialogue platform resource corresponding to each semantic result in the disambiguation candidate list is queried;
the semantic results that have no corresponding voice dialogue platform resource are eliminated from the disambiguation candidate list.
CN201810835994.7A 2018-07-26 2018-07-26 Voice conversation processing method and system for voice conversation platform Active CN108962233B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201810835994.7A CN108962233B (en) 2018-07-26 2018-07-26 Voice conversation processing method and system for voice conversation platform

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201810835994.7A CN108962233B (en) 2018-07-26 2018-07-26 Voice conversation processing method and system for voice conversation platform

Publications (2)

Publication Number Publication Date
CN108962233A true CN108962233A (en) 2018-12-07
CN108962233B CN108962233B (en) 2020-11-17

Family

ID=64463950

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201810835994.7A Active CN108962233B (en) 2018-07-26 2018-07-26 Voice conversation processing method and system for voice conversation platform

Country Status (1)

Country Link
CN (1) CN108962233B (en)

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20120059658A1 (en) * 2010-09-08 2012-03-08 Nuance Communications, Inc. Methods and apparatus for performing an internet search
US20140136212A1 (en) * 2012-11-14 2014-05-15 Electronics And Telecommunications Research Institute Spoken dialog system based on dual dialog management using hierarchical dialog task library
CN104299623A (en) * 2013-07-15 2015-01-21 国际商业机器公司 Automated confirmation and disambiguation modules in voice applications
CN106228983A (en) * 2016-08-23 2016-12-14 北京谛听机器人科技有限公司 Scene process method and system during a kind of man-machine natural language is mutual
US20170228366A1 (en) * 2016-02-05 2017-08-10 Adobe Systems Incorporated Rule-based dialog state tracking
CN107785018A (en) * 2016-08-31 2018-03-09 科大讯飞股份有限公司 More wheel interaction semantics understanding methods and device
CN108231080A (en) * 2018-01-05 2018-06-29 广州蓝豹智能科技有限公司 Voice method for pushing, device, smart machine and storage medium

Cited By (19)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109462546A (en) * 2018-12-28 2019-03-12 苏州思必驰信息科技有限公司 A kind of voice dialogue history message recording method, apparatus and system
CN111831795A (en) * 2019-04-11 2020-10-27 北京猎户星空科技有限公司 Multi-turn conversation processing method and device, electronic equipment and storage medium
CN111831795B (en) * 2019-04-11 2023-10-27 北京猎户星空科技有限公司 Multi-round dialogue processing method and device, electronic equipment and storage medium
CN110570867A (en) * 2019-09-12 2019-12-13 安信通科技(澳门)有限公司 Voice processing method and system for locally added corpus
CN110705267A (en) * 2019-09-29 2020-01-17 百度在线网络技术(北京)有限公司 Semantic parsing method, semantic parsing device and storage medium
CN110808051A (en) * 2019-10-30 2020-02-18 腾讯科技(深圳)有限公司 Skill selection method and related device
CN111125346B (en) * 2019-12-26 2022-07-08 思必驰科技股份有限公司 Semantic resource updating method and system
CN111125346A (en) * 2019-12-26 2020-05-08 苏州思必驰信息科技有限公司 Semantic resource updating method and system
CN111274819A (en) * 2020-02-13 2020-06-12 北京声智科技有限公司 Resource acquisition method and device
CN112148847A (en) * 2020-08-27 2020-12-29 出门问问(苏州)信息科技有限公司 Voice information processing method and device
CN112148847B (en) * 2020-08-27 2024-03-12 出门问问创新科技有限公司 Voice information processing method and device
CN112634888A (en) * 2020-12-11 2021-04-09 广州橙行智动汽车科技有限公司 Voice interaction method, server, voice interaction system and readable storage medium
CN112486844A (en) * 2020-12-18 2021-03-12 苏州思必驰信息科技有限公司 Data increment testing method and system for resource type data
CN112486844B (en) * 2020-12-18 2022-07-08 思必驰科技股份有限公司 Data increment testing method and system for resource type data
WO2022135493A1 (en) * 2020-12-25 2022-06-30 广州橙行智动汽车科技有限公司 Voice interaction method, server, voice interaction system, and storage medium
CN112685535A (en) * 2020-12-25 2021-04-20 广州橙行智动汽车科技有限公司 Voice interaction method, server, voice interaction system and storage medium
CN113918701B (en) * 2021-10-20 2022-04-15 北京亿信华辰软件有限责任公司 Billboard display method and device
CN113918701A (en) * 2021-10-20 2022-01-11 北京亿信华辰软件有限责任公司 Billboard display method and device
CN115019787A (en) * 2022-06-02 2022-09-06 中国第一汽车股份有限公司 Interactive homophonic and heteronym word disambiguation method, system, electronic equipment and storage medium

Also Published As

Publication number Publication date
CN108962233B (en) 2020-11-17

Similar Documents

Publication Publication Date Title
CN108962233A (en) Voice dialogue processing method and system for voice dialogue platform
US11398236B2 (en) Intent-specific automatic speech recognition result generation
US11520471B1 (en) Systems and methods for identifying a set of characters in a media file
US11437041B1 (en) Speech interface device with caching component
US11915707B1 (en) Outcome-oriented dialogs on a speech recognition platform
JP6588637B2 (en) Learning personalized entity pronunciation
US9495956B2 (en) Dealing with switch latency in speech recognition
US10917758B1 (en) Voice-based messaging
US7143037B1 (en) Spelling words using an arbitrary phonetic alphabet
US8635243B2 (en) Sending a communications header with voice recording to send metadata for use in speech recognition, formatting, and search mobile search application
US8949266B2 (en) Multiple web-based content category searching in mobile search application
US8972260B2 (en) Speech recognition using multiple language models
US20110153322A1 (en) Dialog management system and method for processing information-seeking dialogue
CN106796787A (en) The linguistic context carried out using preceding dialog behavior in natural language processing is explained
US20110054900A1 (en) Hybrid command and control between resident and remote speech recognition facilities in a mobile voice-to-speech application
US10860289B2 (en) Flexible voice-based information retrieval system for virtual assistant
US20110054894A1 (en) Speech recognition through the collection of contact information in mobile dictation application
US20110054898A1 (en) Multiple web-based content search user interface in mobile search application
US9922650B1 (en) Intent-specific automatic speech recognition result generation
KR20200130352A (en) Voice wake-up method and apparatus
US9311914B2 (en) Method and apparatus for enhanced phonetic indexing and search
CN110784768B (en) Multimedia resource playing method, storage medium and electronic equipment
CN109979450B (en) Information processing method and device and electronic equipment
JP7342286B2 (en) Voice function jump method, electronic equipment and storage medium for human-machine interaction
CN111753061A (en) Multi-turn conversation processing method and device, electronic equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant
CP01 Change in the name or title of a patent holder
    Address after: 215123 14 Tengfei Innovation Park, 388 Xinping street, Suzhou Industrial Park, Suzhou, Jiangsu.
    Patentee after: Sipic Technology Co.,Ltd.
    Address before: 215123 14 Tengfei Innovation Park, 388 Xinping street, Suzhou Industrial Park, Suzhou, Jiangsu.
    Patentee before: AI SPEECH Co.,Ltd.
PE01 Entry into force of the registration of the contract for pledge of patent right
    Denomination of invention: A Speech Conversation Processing Method and System for a Speech Conversation Platform
    Effective date of registration: 20230726
    Granted publication date: 20201117
    Pledgee: CITIC Bank Limited by Share Ltd. Suzhou branch
    Pledgor: Sipic Technology Co.,Ltd.
    Registration number: Y2023980049433