CN108962233B - Voice conversation processing method and system for voice conversation platform - Google Patents

Voice conversation processing method and system for voice conversation platform

Info

Publication number
CN108962233B
CN108962233B (application CN201810835994.7A)
Authority
CN
China
Prior art keywords
semantic
disambiguation
voice
user
historical
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201810835994.7A
Other languages
Chinese (zh)
Other versions
CN108962233A (en)
Inventor
林永楷
周伟达
樊帅
李春
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Sipic Technology Co Ltd
Original Assignee
AI Speech Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by AI Speech Ltd filed Critical AI Speech Ltd
Priority to CN201810835994.7A priority Critical patent/CN108962233B/en
Publication of CN108962233A publication Critical patent/CN108962233A/en
Application granted granted Critical
Publication of CN108962233B publication Critical patent/CN108962233B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G — PHYSICS
    • G10 — MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L — SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 — Speech recognition
    • G10L15/08 — Speech classification or search
    • G10L15/18 — Speech classification or search using natural language modelling
    • G10L15/183 — Speech classification or search using natural language modelling using context dependencies, e.g. language models
    • G10L15/19 — Grammatical context, e.g. disambiguation of the recognition hypotheses based on word sequence rules

Landscapes

  • Engineering & Computer Science (AREA)
  • Artificial Intelligence (AREA)
  • Computational Linguistics (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Human Computer Interaction (AREA)
  • Physics & Mathematics (AREA)
  • Acoustics & Sound (AREA)
  • Multimedia (AREA)
  • Machine Translation (AREA)

Abstract

An embodiment of the invention provides a voice conversation processing method for a voice conversation platform. The method comprises the following steps: acquiring the n most likely semantic results for voice data input by a user, based on speech recognition and understanding; when n is greater than 1, determining the domain related to each semantic result and judging the key semantic slot corresponding to each semantic result; adding the m semantic results having key semantic slots to a disambiguation candidate list; and when m is greater than 1, automatically disambiguating the disambiguation candidate list according to existing resources to obtain l semantic results, wherein the existing resources comprise historical context information, historical disambiguation records, voice conversation platform resources and/or a customized disambiguation rule base. An embodiment of the invention also provides a voice conversation processing system for the voice conversation platform. According to the embodiments of the invention, different importance is set for semantic slots in different semantic domains, so that false ambiguity caused by semantic parsing is automatically filtered out, and the voice interaction effect is further improved through automatic disambiguation.

Description

Voice conversation processing method and system for voice conversation platform
Technical Field
The present invention relates to the field of voice dialog, and in particular, to a voice dialog processing method and system for a voice dialog platform.
Background
With the development of artificial intelligence voice technology, more and more devices can execute corresponding instructions through the user's voice. For example, when the user says "check tomorrow's weather", the corresponding device can report tomorrow's weather to the user, making the user's operation simpler and more convenient.
However, the same words often carry different meanings, so the same utterance may correspond to different operations. For example, when the user says "play I Am a Singer", the intention may be to play the variety show "I Am a Singer", or to play the crosstalk piece "I Am a Singer" performed by Yue Yunpeng and Sun Yue. In this case, the system usually either asks the user to confirm which one to play, or randomly plays one of them.
In the process of implementing the invention, the inventor finds that at least the following problems exist in the related art:
Due to the lack of an effective automatic disambiguation mechanism, the user must confirm whenever an ambiguity occurs, and the resulting voice interaction experience is disastrous. If, when multiple ambiguities exist, only the most likely result is used, then no disambiguation is needed because there is only one result; in effect, to sidestep the technical difficulty, the experience of some users is sacrificed and the scenarios that genuinely need disambiguation are bypassed. But because some very close speech recognition results and other plausible semantic parses are discarded, the overall effect of the intelligent voice dialog system degrades along with the user experience. And if one of the results is randomly selected and played, it is not always the "I Am a Singer" that the user really wants.
Disclosure of Invention
The method and device aim to at least solve the problems in the prior art that requiring the user to confirm too many ambiguities damages the voice interaction experience, and that existing disambiguation cannot handle usage scenarios of higher complexity.
In a first aspect, an embodiment of the present invention provides a voice dialog processing method for a voice dialog platform, including:
acquiring n semantic results with highest possibility of voice data input by a user according to voice recognition and understanding;
when n is greater than 1, determining the field related to each semantic result, and judging whether the semantic slot corresponding to each semantic result is a key semantic slot in the field;
adding m semantic results with key semantic slots to a disambiguation candidate list, wherein m is less than or equal to n;
and when m is greater than 1, automatically disambiguating the disambiguation candidate list according to the existing resources to obtain l semantic results, wherein the existing resources comprise historical context information, historical disambiguation records, voice conversation platform resources and/or a customized disambiguation rule base.
In a second aspect, an embodiment of the present invention provides a voice dialog processing system for a voice dialog platform, including:
the semantic understanding acquisition program module is used for acquiring n semantic results with highest possibility of voice data input by a user according to voice recognition and understanding;
the key semantic slot determining program module is used for determining the domain related to each semantic result when n is greater than 1, and judging whether the semantic slot corresponding to each semantic result is a key semantic slot in the domain;
a disambiguation candidate list determining program module for adding m semantic results having a key semantic slot to a disambiguation candidate list, wherein m is less than or equal to n;
and the automatic disambiguation program module is used for automatically disambiguating the disambiguation candidate list according to the existing resources to obtain l semantic results when m is greater than 1, wherein the existing resources comprise historical context information, historical disambiguation records, voice conversation platform resources and/or a customized disambiguation rule base.
In a third aspect, an electronic device is provided, comprising: the apparatus includes at least one processor, and a memory communicatively coupled to the at least one processor, wherein the memory stores instructions executable by the at least one processor to cause the at least one processor to perform the steps of the method for voice dialog processing for a voice dialog platform of any of the embodiments of the present invention.
In a fourth aspect, an embodiment of the present invention provides a storage medium, on which a computer program is stored, where the computer program is configured to, when executed by a processor, implement the steps of the voice conversation processing method for a voice conversation platform according to any embodiment of the present invention.
The embodiment of the invention has the beneficial effects that: the semantic slot position information contained in the semantic parsing result is fully utilized through the dialogue ambiguity detection, different importance is set for semantic slots in different semantic fields, and an automatic disambiguation mechanism is introduced only when the key semantic slot is ambiguous, so that the false ambiguity caused by the semantic parsing can be automatically filtered.
Meanwhile, the automatic disambiguation mechanism, based on historical multi-round context information and data service query results and matched with a highly customizable automatic disambiguation rule base, can automatically eliminate invalid semantic results even when multiple semantic parsing results all contain key semantic slots. An expiration date is set by storing the user's historical selection records; when the user requests the same content again within a short time, the disambiguation module automatically reads the history record, preventing the user from making ambiguity selections on the same question multiple times. The effect of voice interaction is improved, and automatic disambiguation also satisfies usage scenarios of higher complexity.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art will be briefly described below, and it is obvious that the drawings in the following description are some embodiments of the present invention, and those skilled in the art can also obtain other drawings according to the drawings without creative efforts.
Fig. 1 is a flowchart of a voice conversation processing method for a voice conversation platform according to an embodiment of the present invention;
fig. 2 is a schematic structural diagram of a voice dialog processing system for a voice dialog platform according to an embodiment of the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the embodiments of the present invention clearer, the technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are some, but not all, embodiments of the present invention. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
Fig. 1 is a flowchart of a voice dialog processing method for a voice dialog platform according to an embodiment of the present invention, which includes the following steps:
s11: acquiring n semantic results with highest possibility of voice data input by a user according to voice recognition and understanding;
s12: when n is greater than 1, determining the field related to each semantic result, and judging whether the semantic slot corresponding to each semantic result is a key semantic slot in the field;
s13: adding m semantic results with key semantic slots to a disambiguation candidate list, wherein m is less than or equal to n;
s14: and when m is greater than 1, automatically disambiguating the disambiguation candidate list according to the existing resources to obtain l semantic results, wherein the existing resources comprise historical context information, historical disambiguation records, voice conversation platform resources and/or a customized disambiguation rule base.
In this embodiment, the method may be adapted to a device with voice interaction, for example, a smart speaker, a smart phone, and the like. For example, when a user wants to play a piece of audio through the smart speaker, the user can directly input voice to the smart speaker.
For step S11, the speech data input by the user is processed by ASR (Automatic Speech Recognition) and the corresponding NLU (Natural Language Understanding), to obtain the n most likely ASR hypotheses (also called the n-best or top-n input) and their corresponding semantic parsing results.
In step S12, the domain to which each semantic result relates is determined, and it is judged whether the semantic slot corresponding to each semantic result is a key semantic slot in that domain. A key semantic slot is a slot that carries the primary meaning for a given domain when a sentence is parsed into a semantic result. Since semantics can contain a great deal of information, without defining key semantic slots a large number of inaccurate semantic parsing results would be produced.
Take, for example, "play five minutes of Jay Chou's music".
The parse in the music domain is:
"singer" = "Zhou Jielun (Jay Chou)", "operation" = "play", "duration" = "five minutes"
while in the radio domain it is parsed as:
"operation" = "play", "column" = "five minutes", "keyword" = "Zhou Jielun"
Although the parse in the radio domain is also correct, the column names in the radio domain are of many kinds, so the column slot is assigned a relatively low importance when semantic slot importance is configured. Compared with the singer name in the music domain, the column in the radio domain does not meet the key-semantic-slot requirement for an ambiguity candidate, while the singer is a key semantic slot in the music domain; therefore the music-domain parse is retained and the radio-domain parse is directly filtered out.
For another example, "play Little Red Riding Hood":
The parse in the music domain is:
"operation" = "play", "song name" = "Little Red Riding Hood"
The parse in the story domain is:
"operation" = "play", "story name" = "Little Red Riding Hood"
Because "story name" is a key semantic slot for the story domain and "song name" is a key semantic slot for the music domain, both semantic parsing results pass the initial ambiguity detection.
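The key-semantic-slot check in these two examples can be sketched in a few lines of Python. The domain names, slot names, and the key-slot table below are illustrative assumptions, not the patent's actual configuration:

```python
# Illustrative key-slot table: which slots count as "key" per domain.
# (Assumed values; the patent configures slot importance per domain.)
KEY_SLOTS = {
    "music": {"singer", "song_name"},
    "story": {"story_name"},
    "radio": set(),  # the "column" slot is deliberately not a key slot here
}

def has_key_slot(parse):
    """True if any filled slot is a key semantic slot for the parse's domain."""
    return any(s in KEY_SLOTS.get(parse["domain"], set()) for s in parse["slots"])

def build_candidate_list(parses):
    """Step S13: keep only parses that carry a key semantic slot."""
    return [p for p in parses if has_key_slot(p)]

# "Play Little Red Riding Hood": the music and story parses both carry a
# key slot, while a radio parse without one is filtered out, so m == 2.
parses = [
    {"domain": "music", "slots": {"operation": "play", "song_name": "Little Red Riding Hood"}},
    {"domain": "story", "slots": {"operation": "play", "story_name": "Little Red Riding Hood"}},
    {"domain": "radio", "slots": {"operation": "play", "column": "five minutes"}},
]
candidates = build_candidate_list(parses)
```

Under this sketch, the radio parse is the "false ambiguity" that never reaches the disambiguation candidate list.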
For step S13: with "play Little Red Riding Hood" from step S12, both semantic parsing results pass the initial ambiguity detection, so n = 2 (the two most likely ASR hypotheses) and, after preliminary ambiguity detection, m = 2; both semantic parsing results are selected into the disambiguation candidate list.
For step S14: if m = 1 was determined in step S13, the preliminary ambiguity detection has already yielded a single semantic result, and the corresponding instruction is executed directly.
If m > 1 is determined in step S13, the disambiguation candidate list is processed according to the existing resources. Automatic disambiguation can be performed according to the historical multi-round context information among the existing resources, the data services containing the related resources, the customized automatic disambiguation rule base, and the historical disambiguation records, thereby disambiguating the semantic results in the disambiguation candidate list. When only one semantic result remains in the list, the semantic result corresponding to the user's voice input is determined and the corresponding operation is performed.
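One way to read step S14 is as a chain of filters over the candidate list, each backed by one of the existing resources. The strategy names and their ordering below are assumptions for illustration, not the patent's prescribed order:

```python
def auto_disambiguate(candidates, strategies):
    """Apply each filter strategy in turn, stopping once one candidate remains.
    A strategy that would eliminate every candidate is skipped, so the
    list never becomes empty."""
    for keep in strategies:
        if len(candidates) <= 1:
            break
        filtered = [c for c in candidates if keep(c)]
        if filtered:
            candidates = filtered
    return candidates

# Hypothetical strategies: a platform-resource check, and a preference
# derived from historical context (the user asked for a story earlier).
def resource_exists(c):
    return c.get("resource_found", True)

def context_prefers_story(c):
    return c["domain"] == "story"

candidates = [
    {"domain": "music", "resource_found": True},
    {"domain": "story", "resource_found": True},
]
result = auto_disambiguate(candidates, [resource_exists, context_prefers_story])
```

Only when the chain still leaves more than one candidate does the system fall back to asking the user, as described below.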
According to the embodiment, the semantic slot position information contained in the semantic parsing result is fully utilized through the dialogue ambiguity detection, different importance is set for the semantic slots in different semantic fields, and only when the key semantic slot is ambiguous, an automatic disambiguation mechanism is introduced, so that the false ambiguity caused by the semantic parsing can be automatically filtered.
As an implementation manner, in this embodiment, the method further includes:
when automatically disambiguating the disambiguation candidate list according to the existing resources still yields more than one semantic result (l > 1), feeding those l semantic results back to the user for confirmation;
when a user inputs a confirmation instruction corresponding to the feedback, determining a semantic result corresponding to the voice data input by the user, and executing corresponding operation;
and when the user inputs an abnormal instruction, feeding back abnormal prompt information.
In the present embodiment, after automatic disambiguation, if more than one semantic result remains in the disambiguation candidate list, feedback is given to the user.
For example, the above embodiment refers to "play Little Red Riding Hood". After automatic disambiguation is finished, if two semantic results in the ambiguity candidate list still need disambiguation, the ambiguity detection module sets the disambiguation flag information, guides the user to select with a prompt such as "Found a children's story and a piece of music for you; which would you like to listen to?", and enters the monitoring state; it then calls the disambiguation processing module, which returns the TTS (Text To Speech) to be broadcast according to the status flag bit set by the detection module.
When a new round of input arrives, the disambiguation processing module judges whether the user's input is a "selection" or a "new task"; if it is a new task, it directly jumps out of the disambiguation operation and executes the new task.
If the user makes a selection, it is necessary to determine whether the user's answer is abnormal. If it is abnormal, an abnormal prompt flow is entered, prompting the user to reselect, and the monitoring state is entered again; if it is not abnormal, the semantic result selected by the user is taken as the final semantic result and the related operation is executed.
Since the user is entering voice information, the user may well not reply as prompted when the disambiguation module offers guidance, so further handling is needed.
For example, when the TTS content is "Found a children's story and a piece of music for you; which would you like to listen to?",
the user may reply with:
"I want to hear the children's story", "the children's story", "music", "song", "the first one", "the second one", "the former", "the latter", "not the story", "not the music".
There is also the possibility of an unusual utterance such as "joke" or "the sixth one".
Meanwhile, in order to ensure that the disambiguation module does not interfere with the user's intention switching, when the user directly utters a new task instead of answering the prompt, the disambiguation module jumps according to the semantic result of the newly input voice.
When the user speaks an abnormal utterance, in addition to generating an abnormal TTS prompt (e.g., "I didn't catch that; do you want to listen to the music or the story?"), the number of exceptions is recorded; when it exceeds a certain count (e.g., two), the system prompts the user with a sentence such as "I still didn't catch that; please try again later".
While waiting for the user to select, the disambiguation system sets a validity duration for the monitoring, in case the user does not answer for a long time or has already left. When the user does not answer within the validity period, the system may perform different operations according to a predetermined configuration, for example: select the most likely NLU result by default and prompt the user with a TTS such as "About to play the story Little Red Riding Hood for you", or prompt the user to rephrase and turn off the monitoring.
The overall flow is illustrated in a flow diagram (figure omitted from this text).
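The confirmation loop just described (selection vs. new task, abnormal answers, an exception counter) can be sketched as follows; the prompt texts, threshold, and function names are hypothetical:

```python
MAX_EXCEPTIONS = 2  # assumed threshold; the text suggests e.g. two attempts

def match_choice(text, candidates):
    """Very crude selection matcher: look for a candidate's domain word in
    the reply. A real system would use the NLU result instead."""
    for c in candidates:
        if c["domain"] in text:
            return c
    return None

def handle_user_turn(reply, candidates, exception_count):
    """Decide what to do with one user turn while waiting for a selection."""
    if reply["type"] == "new_task":
        # The user switched intent: jump out of disambiguation entirely.
        return ("run_new_task", reply)
    choice = match_choice(reply["text"], candidates)
    if choice is not None:
        return ("execute", choice)
    if exception_count + 1 >= MAX_EXCEPTIONS:
        return ("give_up", "I still didn't catch that; please try again later.")
    return ("reprompt", "Do you want to listen to the music or the story?")

cands = [{"domain": "music"}, {"domain": "story"}]
action, payload = handle_user_turn({"type": "select", "text": "the story"}, cands, 0)
```

A validity timer around this loop (not shown) would implement the timeout behavior described above.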
As can be seen from this embodiment, the user is asked to select interactively only when an indistinguishable "true ambiguity" occurs. Corresponding handling is provided for the different utterances the user may answer with, so as to ensure stable operation of the system.
As an embodiment, in this embodiment, when the existing resource at least includes the historical context information and/or the customized disambiguation rule base:
querying the historical context information and/or the customized disambiguation rule base for information corresponding to each semantic result in the disambiguation candidate list;
and disambiguating each semantic result in the disambiguation candidate list according to the corresponding information.
In this embodiment, continuing with "I want to listen to Little Red Riding Hood" as an example: when the user explicitly expressed "I want to listen to a story" in the first round of conversation and expresses "I want to listen to Little Red Riding Hood" in the second round, the semantic result of the story domain will automatically be selected for the user.
Since some program names end with the domain name, for example "play the spring story", the customized disambiguation rule base is used to confirm whether a key slot has a value ending in "story" (here the "song name" value "spring story" satisfies the rule); semantic parsing results that do not satisfy the condition are automatically discarded. For instance, for "I want to listen to the story of the kite", if the semantic parsing module accidentally parses "kite" as a song name, automatic disambiguation will filter out the incorrect parse, since the song name does not end in "story".
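A rule of this kind — "a key slot value must end with the domain word" — could be represented as a small predicate. The rule encoding below is an assumption about how the customized rule base might be expressed:

```python
def suffix_rule(slot_name, suffix):
    """Build a rule that accepts a parse only if slot_name's value ends with suffix."""
    def rule(parse):
        value = parse["slots"].get(slot_name)
        return value is not None and value.endswith(suffix)
    return rule

# In the ambiguous "... story" utterances above, a song-name parse survives
# only if the song name itself ends with "story".
song_name_ends_with_story = suffix_rule("song_name", "story")

kept = song_name_ends_with_story({"domain": "music", "slots": {"song_name": "spring story"}})
dropped = song_name_ends_with_story({"domain": "music", "slots": {"song_name": "kite"}})
```

Because the rule is just a closure over a slot name and suffix, new rules can be registered per domain without touching the disambiguation code, which matches the "highly customizable" rule base the text describes.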
As an implementation manner, in this embodiment, when the existing resource at least includes the history disambiguation record:
inquiring whether the voice data input by the user has a historical disambiguation record within a preset time range;
and when the historical disambiguation record exists, determining a semantic result corresponding to the voice data input by the user according to the historical disambiguation record.
In this embodiment, the automatic disambiguation processing module also uses previously recorded historical disambiguation records: when the user sends the same ambiguous request again within a short time, the last disambiguation result is directly selected for the user.
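This history cache with an expiry, as described above, might look like the following; the class name, TTL value, and storage scheme are assumptions:

```python
import time

class DisambiguationHistory:
    """Hypothetical cache of past disambiguation choices with an expiry:
    the same request within the validity period reuses the previous
    choice instead of asking the user again."""

    def __init__(self, ttl_seconds=600):
        self.ttl = ttl_seconds
        self._records = {}  # request text -> (chosen result, timestamp)

    def remember(self, request, choice, now=None):
        self._records[request] = (choice, time.time() if now is None else now)

    def lookup(self, request, now=None):
        entry = self._records.get(request)
        if entry is None:
            return None
        choice, stamp = entry
        now = time.time() if now is None else now
        return choice if now - stamp <= self.ttl else None

history = DisambiguationHistory(ttl_seconds=600)
history.remember("play Little Red Riding Hood", {"domain": "story"}, now=0)
hit = history.lookup("play Little Red Riding Hood", now=100)    # within validity
miss = history.lookup("play Little Red Riding Hood", now=10000) # expired
```

The injectable `now` parameter is only there to make the expiry testable; production code would rely on the wall clock.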
As an implementation manner, in this embodiment, when the existing resources at least include the voice dialog platform resource:
querying voice conversation platform resources corresponding to each semantic result in the disambiguation candidate list;
disambiguating semantic results that do not have a corresponding voice dialog platform resource.
In this embodiment, the voice conversation platform contains the multimedia resources that the user may want to query or play. For example, the platform resource in the music domain is a song library and the platform resource in the story domain is a story library; a specific resource is an audio or video file, e.g., the song "Too Soft-Hearted" or the story "Little Red Riding Hood".
Since the semantic layer does not know whether the data services of the voice dialog platform contain the relevant resources, the automatic disambiguation processing module also combines the semantic results with a resource search, and disambiguates (removes) those semantic results for which no resources are found.
For example, as illustrated above, when the user says "I want to listen to I Am a Singer" but the audio data service provider has not included an audio collection of "I Am a Singer", the automatic disambiguation module automatically filters out the semantic parse in the audio domain, which avoids prompting the user with "audio not found" after the user has selected audio.
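The resource check can be sketched as a lookup against the platform's media catalogue; the catalogue contents below are invented for illustration (in particular, the audio catalogue deliberately lacks "I Am a Singer"):

```python
# Hypothetical platform catalogue, keyed by domain.
LIBRARY = {
    "story": {"Little Red Riding Hood"},
    "music": {"Too Soft-Hearted"},
    "audio": set(),              # provider has no "I Am a Singer" collection
    "tv":    {"I Am a Singer"},  # the variety show does exist here
}

def resource_available(parse):
    """True if the requested content exists in the domain's catalogue."""
    return parse["slots"]["name"] in LIBRARY.get(parse["domain"], set())

candidates = [
    {"domain": "audio", "slots": {"name": "I Am a Singer"}},
    {"domain": "tv",    "slots": {"name": "I Am a Singer"}},
]
# Drop parses with no backing resource, but never empty the list entirely.
survivors = [c for c in candidates if resource_available(c)] or candidates
```

The `or candidates` fallback reflects the constraint implied by the text: a resource miss should eliminate a candidate, not leave the user with no answer at all.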
According to this implementation, automatic disambiguation is based on historical multi-round context information and data service query results, matched with a highly customizable automatic disambiguation rule base, so that invalid semantic results can be automatically eliminated even when multiple semantic parsing results all contain key semantic slots. A validity period is set by storing the user's historical selection records; when the user requests the same content again within a short time, the disambiguation module automatically reads the history record, preventing the user from making ambiguity selections on the same question multiple times.
Fig. 2 is a schematic structural diagram of a voice dialog processing system for a voice dialog platform according to an embodiment of the present invention. The technical solution of this embodiment is applicable to the voice dialog processing method for a voice dialog platform of a device; the system can execute the voice dialog processing method for the voice dialog platform according to any of the above embodiments and is configured in a terminal.
The embodiment provides a voice dialogue processing system for a voice dialogue platform, which comprises: a semantic understanding acquisition program module 11, a key semantic slot determination program module 12, a disambiguation candidate list determination program module 13 and an automatic disambiguation program module 14.
The semantic understanding acquiring program module 11 is configured to acquire the n most likely semantic results for the voice data input by the user, based on voice recognition and understanding; the key semantic slot determining program module 12 is configured to determine, when n > 1, the domain to which each semantic result relates, and to judge whether the semantic slot corresponding to each semantic result is a key semantic slot in that domain; the disambiguation candidate list determining program module 13 is configured to add the m semantic results having key semantic slots to the disambiguation candidate list, where m ≤ n; the automatic disambiguation program module 14 is configured to, when m > 1, automatically disambiguate the disambiguation candidate list according to existing resources to obtain l semantic results, where the existing resources include historical context information, historical disambiguation records, voice dialog platform resources, and/or a customized disambiguation rule base.
Further, the system further comprises: a user confirmation program module, configured to:
when automatically disambiguating the disambiguation candidate list according to the existing resources still yields more than one semantic result (l > 1), feed those l semantic results back to the user for confirmation;
when a user inputs a confirmation instruction corresponding to the feedback, determining a semantic result corresponding to the voice data input by the user, and executing corresponding operation;
and when the user inputs an abnormal instruction, feeding back abnormal prompt information.
Further, when the existing resource includes at least historical context information and/or a custom disambiguation rule base:
querying the historical context information and/or the customized disambiguation rule base for information corresponding to each semantic result in the disambiguation candidate list;
and disambiguating each semantic result in the disambiguation candidate list according to the corresponding information.
Further, when the existing resource includes at least a historical disambiguation record:
inquiring whether the voice data input by the user has a historical disambiguation record within a preset time range;
and when the historical disambiguation record exists, determining a semantic result corresponding to the voice data input by the user according to the historical disambiguation record.
Further, when the existing resources include at least voice dialog platform resources:
querying voice conversation platform resources corresponding to each semantic result in the disambiguation candidate list;
disambiguating semantic results that do not have a corresponding voice dialog platform resource.
The embodiment of the invention also provides a nonvolatile computer storage medium, wherein the computer storage medium stores computer executable instructions which can execute the voice conversation processing method for the voice conversation platform in any method embodiment;
as one embodiment, a non-volatile computer storage medium of the present invention stores computer-executable instructions configured to:
acquiring n semantic results with highest possibility of voice data input by a user according to voice recognition and understanding;
when n is greater than 1, determining the field related to each semantic result, and judging whether the semantic slot corresponding to each semantic result is a key semantic slot in the field;
adding m semantic results with key semantic slots to a disambiguation candidate list, wherein m is less than or equal to n;
and when m is greater than 1, automatically disambiguating the disambiguation candidate list according to the existing resources to obtain l semantic results, wherein the existing resources comprise historical context information, historical disambiguation records, voice conversation platform resources and/or a customized disambiguation rule base.
As a non-volatile computer readable storage medium, it may be used to store non-volatile software programs, non-volatile computer executable programs, and modules, such as the program instructions/modules corresponding to the methods in the embodiments of the present invention. One or more program instructions are stored in the non-transitory computer readable storage medium and, when executed by a processor, perform the voice dialog processing method for a voice dialog platform in any of the method embodiments described above.
The non-volatile computer-readable storage medium may include a storage program area and a storage data area, where the storage program area may store an operating system and an application program required for at least one function, and the storage data area may store data created according to use of the device, and the like. Further, the non-volatile computer-readable storage medium may include high speed random access memory, and may also include non-volatile memory, such as at least one magnetic disk storage device, flash memory device, or other non-volatile solid state storage device. In some embodiments, the non-transitory computer readable storage medium optionally includes memory located remotely from the processor, which may be connected to the device over a network. Examples of such networks include, but are not limited to, the internet, intranets, local area networks, mobile communication networks, and combinations thereof.
An embodiment of the present invention further provides an electronic device, which includes: the apparatus includes at least one processor, and a memory communicatively coupled to the at least one processor, wherein the memory stores instructions executable by the at least one processor to cause the at least one processor to perform the steps of the method for voice dialog processing for a voice dialog platform of any of the embodiments of the present invention.
The client of the embodiment of the present application exists in various forms, including but not limited to:
(1) Mobile communication devices, which are characterized by mobile communication capabilities and are primarily aimed at providing voice and data communications. Such terminals include smart phones (e.g., the iPhone), multimedia phones, feature phones, and low-end phones.
(2) Ultra-mobile personal computer devices, which belong to the category of personal computers, have computing and processing functions, and generally also have mobile Internet access. Such terminals include PDA, MID, and UMPC devices (e.g., the iPad).
(3) Portable entertainment devices, which can display and play multimedia content. Such devices include audio and video players (e.g., the iPod), handheld game consoles, e-book readers, smart toys, and portable car navigation devices.
(4) Other electronic devices with data processing capabilities.
In this document, relational terms such as first and second may be used solely to distinguish one entity or action from another entity or action, without necessarily requiring or implying any actual such relationship or order between such entities or actions. The terms "comprises," "comprising," and any other variations thereof are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements includes not only those elements but may also include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising a ..." does not exclude the presence of other identical elements in the process, method, article, or apparatus that comprises the element.
The above-described embodiments of the apparatus are merely illustrative, and the units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of the present embodiment. One of ordinary skill in the art can understand and implement it without inventive effort.
Through the above description of the embodiments, those skilled in the art will clearly understand that each embodiment can be implemented by software plus a necessary general hardware platform, and certainly can also be implemented by hardware. With this understanding in mind, the above-described technical solutions may be embodied in the form of a software product, which can be stored in a computer-readable storage medium such as ROM/RAM, magnetic disk, optical disk, etc., and includes instructions for causing a computer device (which may be a personal computer, a server, or a network device, etc.) to execute the methods described in the embodiments or some parts of the embodiments.
Finally, it should be noted that: the above examples are only intended to illustrate the technical solution of the present invention, but not to limit it; although the present invention has been described in detail with reference to the foregoing embodiments, it will be understood by those of ordinary skill in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some technical features may be equivalently replaced; and such modifications or substitutions do not depart from the spirit and scope of the corresponding technical solutions of the embodiments of the present invention.

Claims (10)

1. A voice dialog processing method for a voice dialog platform, comprising:
acquiring, through speech recognition and understanding, the n most probable semantic results for voice data input by a user;
when n is greater than 1, determining the field related to each semantic result, and judging whether the semantic slot corresponding to each semantic result is a key semantic slot in the field;
adding m semantic results with key semantic slots to a disambiguation candidate list, wherein m is less than or equal to n;
and when m is greater than 1, automatically disambiguating the disambiguation candidate list according to the existing resources to obtain l semantic results, wherein the existing resources comprise historical context information, historical disambiguation records, voice conversation platform resources and/or a customized disambiguation rule base.
2. The method of claim 1, wherein the method further comprises:
when automatically disambiguating the disambiguation candidate list according to the existing resources yields more than one semantic result, feeding the semantic results back to the user for confirmation;
when a user inputs a confirmation instruction corresponding to the feedback, determining a semantic result corresponding to the voice data input by the user, and executing corresponding operation;
and when the user inputs an abnormal instruction, feeding back abnormal prompt information.
3. The method of claim 1, wherein, when the existing resources include at least historical context information and/or a custom disambiguation rule base:
querying the historical context information and/or the customized disambiguation rule base for information corresponding to each semantic result in the disambiguation candidate list;
and disambiguating each semantic result in the disambiguation candidate list according to the corresponding information.
4. The method of claim 1, wherein, when the existing resource includes at least a historical disambiguation record:
inquiring whether the voice data input by the user has a historical disambiguation record within a preset time range;
and when the historical disambiguation record exists, determining a semantic result corresponding to the voice data input by the user according to the historical disambiguation record.
5. The method of claim 1, wherein, when the existing resources include at least voice conversation platform resources:
querying voice conversation platform resources corresponding to each semantic result in the disambiguation candidate list;
disambiguating semantic results that do not have a corresponding voice dialog platform resource.
6. A voice dialog processing system for a voice dialog platform, comprising:
the semantic understanding acquisition program module is used for acquiring, through speech recognition and understanding, the n most probable semantic results for voice data input by a user;
the key semantic slot determining program module is used for determining the field related to each semantic result when n is greater than 1, and judging whether the semantic slot corresponding to each semantic result is a key semantic slot in the field;
a disambiguation candidate list determining program module for adding m semantic results having a key semantic slot to a disambiguation candidate list, wherein m is less than or equal to n;
and the automatic disambiguation program module is used for automatically disambiguating the disambiguation candidate list according to the existing resources to obtain l semantic results when m is greater than 1, wherein the existing resources comprise historical context information, historical disambiguation records, voice conversation platform resources and/or a customized disambiguation rule base.
7. The system of claim 6, wherein the system further comprises a user confirmation program module for:
when automatically disambiguating the disambiguation candidate list according to the existing resources yields more than one semantic result, feeding the semantic results back to the user for confirmation;
when a user inputs a confirmation instruction corresponding to the feedback, determining a semantic result corresponding to the voice data input by the user, and executing corresponding operation;
and when the user inputs an abnormal instruction, feeding back abnormal prompt information.
8. The system of claim 6, wherein when the existing resources include at least historical context information and/or a custom disambiguation rule base:
querying the historical context information and/or the customized disambiguation rule base for information corresponding to each semantic result in the disambiguation candidate list;
and disambiguating each semantic result in the disambiguation candidate list according to the corresponding information.
9. The system of claim 6, wherein, when the existing resource includes at least a historical disambiguation record:
inquiring whether the voice data input by the user has a historical disambiguation record within a preset time range;
and when the historical disambiguation record exists, determining a semantic result corresponding to the voice data input by the user according to the historical disambiguation record.
10. The system of claim 6, wherein when the existing resources include at least voice conversation platform resources:
querying voice conversation platform resources corresponding to each semantic result in the disambiguation candidate list;
disambiguating semantic results that do not have a corresponding voice dialog platform resource.
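For illustration only, the overall pipeline of claims 1 and 2 (n-best semantic results, key-slot filtering into a disambiguation candidate list, automatic disambiguation, and user-confirmation fallback) might be sketched as follows; `KEY_SLOTS` and the `disambiguate`/`ask_user` callables are assumed names introduced for the sketch, not the patented code:

```python
# Assumed per-domain key semantic slots (illustrative only).
KEY_SLOTS = {"music": {"song"}, "movie": {"title"}}

def process_nbest(nbest, disambiguate, ask_user):
    """nbest: list of (domain, slot_name, slot_value) semantic results,
    ordered by likelihood. Returns the single resolved result, or None."""
    if len(nbest) <= 1:
        return nbest[0] if nbest else None
    # Keep the m results whose slot is a key semantic slot in its domain.
    candidates = [r for r in nbest if r[1] in KEY_SLOTS.get(r[0], set())]
    if len(candidates) <= 1:
        return candidates[0] if candidates else None
    # m > 1: automatically disambiguate using the existing resources (claim 1).
    results = disambiguate(candidates)
    if len(results) == 1:
        return results[0]
    # More than one result survives: feed the list back to the user (claim 2).
    return ask_user(results)
```

For example, if both a song and a film titled "Hero" pass the key-slot check, `disambiguate` would try the historical records, platform resources, and rule base; only if those still leave several candidates would `ask_user` present the residual list for confirmation.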
CN201810835994.7A 2018-07-26 2018-07-26 Voice conversation processing method and system for voice conversation platform Active CN108962233B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201810835994.7A CN108962233B (en) 2018-07-26 2018-07-26 Voice conversation processing method and system for voice conversation platform


Publications (2)

Publication Number Publication Date
CN108962233A CN108962233A (en) 2018-12-07
CN108962233B true CN108962233B (en) 2020-11-17

Family

ID=64463950

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201810835994.7A Active CN108962233B (en) 2018-07-26 2018-07-26 Voice conversation processing method and system for voice conversation platform

Country Status (1)

Country Link
CN (1) CN108962233B (en)

Families Citing this family (13)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109462546A (en) * 2018-12-28 2019-03-12 苏州思必驰信息科技有限公司 A kind of voice dialogue history message recording method, apparatus and system
CN111831795B (en) * 2019-04-11 2023-10-27 北京猎户星空科技有限公司 Multi-round dialogue processing method and device, electronic equipment and storage medium
CN110570867A (en) * 2019-09-12 2019-12-13 安信通科技(澳门)有限公司 Voice processing method and system for locally added corpus
CN110705267B (en) * 2019-09-29 2023-03-21 阿波罗智联(北京)科技有限公司 Semantic parsing method, semantic parsing device and storage medium
CN110808051A (en) * 2019-10-30 2020-02-18 腾讯科技(深圳)有限公司 Skill selection method and related device
CN111125346B (en) * 2019-12-26 2022-07-08 思必驰科技股份有限公司 Semantic resource updating method and system
CN111274819A (en) * 2020-02-13 2020-06-12 北京声智科技有限公司 Resource acquisition method and device
CN112148847B (en) * 2020-08-27 2024-03-12 出门问问创新科技有限公司 Voice information processing method and device
CN112634888A (en) * 2020-12-11 2021-04-09 广州橙行智动汽车科技有限公司 Voice interaction method, server, voice interaction system and readable storage medium
CN112486844B (en) * 2020-12-18 2022-07-08 思必驰科技股份有限公司 Data increment testing method and system for resource type data
CN112685535A (en) * 2020-12-25 2021-04-20 广州橙行智动汽车科技有限公司 Voice interaction method, server, voice interaction system and storage medium
CN113918701B (en) * 2021-10-20 2022-04-15 北京亿信华辰软件有限责任公司 Billboard display method and device
CN115019787A (en) * 2022-06-02 2022-09-06 中国第一汽车股份有限公司 Interactive homophonic and heteronym word disambiguation method, system, electronic equipment and storage medium

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104299623A (en) * 2013-07-15 2015-01-21 国际商业机器公司 Automated confirmation and disambiguation modules in voice applications
CN106228983A (en) * 2016-08-23 2016-12-14 北京谛听机器人科技有限公司 Scene process method and system during a kind of man-machine natural language is mutual
CN107785018A (en) * 2016-08-31 2018-03-09 科大讯飞股份有限公司 More wheel interaction semantics understanding methods and device
CN108231080A (en) * 2018-01-05 2018-06-29 广州蓝豹智能科技有限公司 Voice method for pushing, device, smart machine and storage medium

Family Cites Families (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20120059658A1 (en) * 2010-09-08 2012-03-08 Nuance Communications, Inc. Methods and apparatus for performing an internet search
KR101709187B1 (en) * 2012-11-14 2017-02-23 한국전자통신연구원 Spoken Dialog Management System Based on Dual Dialog Management using Hierarchical Dialog Task Library
US10055403B2 (en) * 2016-02-05 2018-08-21 Adobe Systems Incorporated Rule-based dialog state tracking


Also Published As

Publication number Publication date
CN108962233A (en) 2018-12-07

Similar Documents

Publication Publication Date Title
CN108962233B (en) Voice conversation processing method and system for voice conversation platform
US20210201932A1 (en) Method of and system for real time feedback in an incremental speech input interface
CN107895578B (en) Voice interaction method and device
CN108268587B (en) Context-aware human-machine conversation
CN107146602B (en) Voice recognition method and device and electronic equipment
KR102222317B1 (en) Speech recognition method, electronic device, and computer storage medium
US11564090B1 (en) Audio verification
US11669300B1 (en) Wake word detection configuration
US10917758B1 (en) Voice-based messaging
CN106796787B (en) Context interpretation using previous dialog behavior in natural language processing
TWI585745B (en) Method for processing speech in a digital assistant, electronic device for processing speech, and computer readable storage medium for processing speech
US11763808B2 (en) Temporary account association with voice-enabled devices
CN111540349B (en) Voice breaking method and device
WO2017166650A1 (en) Voice recognition method and device
CN109979450B (en) Information processing method and device and electronic equipment
CN110223692B (en) Multi-turn dialogue method and system for voice dialogue platform cross-skill
CN110765270B (en) Training method and system of text classification model for spoken language interaction
US11721328B2 (en) Method and apparatus for awakening skills by speech
CN111832308A (en) Method and device for processing consistency of voice recognition text
CN111540356A (en) Correction method and system for voice conversation
CN111414764A (en) Method and system for determining skill field of dialog text
CN112182046A (en) Information recommendation method, device, equipment and medium
CN109273004B (en) Predictive speech recognition method and device based on big data
CN112988956A (en) Method and device for automatically generating conversation and method and device for detecting information recommendation effect
CN112786031B (en) Man-machine conversation method and system

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant
CP01 Change in the name or title of a patent holder

Address after: 215123 14 Tengfei Innovation Park, 388 Xinping street, Suzhou Industrial Park, Suzhou, Jiangsu.

Patentee after: Sipic Technology Co.,Ltd.

Address before: 215123 14 Tengfei Innovation Park, 388 Xinping street, Suzhou Industrial Park, Suzhou, Jiangsu.

Patentee before: AI SPEECH Co.,Ltd.

PE01 Entry into force of the registration of the contract for pledge of patent right

Denomination of invention: A Speech Conversation Processing Method and System for a Speech Conversation Platform

Effective date of registration: 20230726

Granted publication date: 20201117

Pledgee: CITIC Bank Limited by Share Ltd. Suzhou branch

Pledgor: Sipic Technology Co.,Ltd.

Registration number: Y2023980049433