WO2021098318A1 - Response method, terminal and storage medium - Google Patents

Response method, terminal and storage medium

Info

Publication number
WO2021098318A1
Authority
WO
WIPO (PCT)
Prior art keywords
intention
prediction
answer
moment
target text
Prior art date
Application number
PCT/CN2020/111150
Other languages
English (en)
French (fr)
Inventor
张文涛
Original Assignee
北京沃东天骏信息技术有限公司
北京京东世纪贸易有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 北京沃东天骏信息技术有限公司 and 北京京东世纪贸易有限公司
Priority to US 17/775,406 (published as US20220399013A1)
Priority to EP 20890060.5 (published as EP4053836A4)
Publication of WO2021098318A1

Classifications

    • G10L 15/1815 — Semantic context, e.g. disambiguation of the recognition hypotheses based on word meaning
    • G06F 40/30 — Semantic analysis
    • G10L 13/02 — Methods for producing synthetic speech; speech synthesisers
    • G10L 15/063 — Training of speech recognition systems, e.g. adaptation to the characteristics of the speaker's voice
    • G10L 15/18 — Speech classification or search using natural language modelling
    • G10L 15/183 — Natural language modelling using context dependencies, e.g. language models
    • G10L 15/197 — Probabilistic grammars, e.g. word n-grams
    • G10L 15/22 — Procedures used during a speech recognition process, e.g. man-machine dialogue
    • G10L 15/1822 — Parsing for meaning understanding
    • G10L 2015/225 — Feedback of the input speech
    • Y02D 30/70 — Reducing energy consumption in wireless communication networks

Definitions

  • This application relates to the field of terminal technology, and in particular to a response method, terminal and storage medium.
  • Speech recognition refers to the process of obtaining useful information from audio data and using related technologies to recognize the audio data and convert the audio data into text information.
  • The embodiments of the present application provide a response method, terminal, and storage medium, which not only improve response-processing efficiency but also overcome the defect of intent loss, thereby further improving the accuracy of response processing and making the terminal more intelligent.
  • an embodiment of the present application provides a response method, and the method includes:
  • response processing is performed according to the answer to be pushed.
  • an embodiment of the present application provides a terminal, and the terminal includes: a determination part, a judgment part, and a processing part,
  • the determining part is configured to determine the first target text corresponding to the first moment through voice recognition processing at the first moment;
  • the determining part is further configured to determine the first predicted intention and the answer to be pushed according to the first target text; wherein the answer to be pushed is used to respond to voice information;
  • The determining part is further configured to continue to determine, through the voice recognition processing, the second target text and the second predicted intention corresponding to the second moment; wherein the second moment is the next moment consecutive to the first moment;
  • the judging part is configured to judge whether a preset response condition is satisfied according to the first prediction intention and the second prediction intention;
  • the processing part is configured to perform response processing according to the answer to be pushed if it is determined that the preset response condition is satisfied.
  • an embodiment of the present application provides a terminal.
  • The terminal includes a processor and a memory storing instructions executable by the processor; when the instructions are executed by the processor, the above response method is implemented.
  • an embodiment of the present application provides a computer-readable storage medium with a program stored thereon and applied to a terminal.
  • When the program is executed by a processor, the above response method is implemented.
  • the embodiments of the present application provide a response method, terminal, and storage medium.
  • The terminal determines the first target text corresponding to the first moment through voice recognition processing at the first moment; determines the first predicted intention and the answer to be pushed according to the first target text, where the answer to be pushed is used to respond to the voice information; continues to determine, through voice recognition processing, the second target text and the second predicted intention corresponding to the second moment, where the second moment is the next moment consecutive to the first moment; judges, according to the first predicted intention and the second predicted intention, whether the preset response condition is satisfied; and, if the preset response condition is satisfied, performs response processing according to the answer to be pushed.
  • In this way, the terminal uses real-time voice recognition to continuously predict intent from the input voice information, assembles the answer in advance, and temporarily stores it; when the predicted intent is confirmed, the stored answer is pushed to complete the response. This not only improves the efficiency of response processing but also overcomes the defect of intent loss, further improving the accuracy of response processing and making the terminal more intelligent.
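The continuous predict-and-confirm flow described above can be sketched as a small loop. This is an illustrative sketch only; all function names (`predict_intent`, `assemble_answer`, `respond`) are hypothetical stand-ins for the patent's modules, not APIs named in the document.

```python
def respond_continuously(asr_chunks, predict_intent, assemble_answer, respond):
    """Predict intent on each partial transcript; push the pre-assembled
    answer once two consecutive predictions agree (the "preset response
    condition" of the method)."""
    text = ""
    prev_intent = None
    pending_answer = None
    for chunk in asr_chunks:            # real-time recognition results
        text += chunk                   # integrate with the previous target text
        intent = predict_intent(text.strip())
        if prev_intent is not None and intent == prev_intent:
            respond(pending_answer)     # preset response condition satisfied
            text, prev_intent, pending_answer = "", None, None
        else:
            prev_intent = intent
            pending_answer = assemble_answer(intent)  # pre-assemble and store
```

Because the answer is assembled while the user is still speaking, the push happens as soon as the prediction stabilises, rather than only after end-of-speech.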
  • FIG. 1 is a schematic structural diagram of a voice response system proposed in an embodiment of the application;
  • FIG. 2 is a first schematic flowchart of the response method proposed in an embodiment of the application;
  • FIG. 3 is a schematic diagram of real-time recognition integration proposed in an embodiment of the application;
  • FIG. 4 is a schematic diagram of satisfying the preset response condition proposed in an embodiment of the application;
  • FIG. 5 is a second schematic flowchart of the response method proposed in an embodiment of the application;
  • FIG. 6 is a schematic diagram of not satisfying the preset response condition proposed in an embodiment of the application;
  • FIG. 7 is a first schematic structural diagram of a terminal proposed in an embodiment of the application;
  • FIG. 8 is a second schematic structural diagram of a terminal proposed in an embodiment of the application.
  • Speech recognition (Automatic Speech Recognition, ASR) refers to the process of obtaining useful information from audio data, recognizing the audio data with related technologies, and converting it into text information; that is, it lets the machine recognize and understand speech and convert the speech signal into corresponding text or commands. It is analogous to the human ear.
  • Speech synthesis (Text-To-Speech, TTS) is a technology that generates artificial speech through mechanical and electronic methods; it converts text information, whether generated by the computer itself or input from outside, into intelligible and fluent spoken output, and is analogous to the human mouth. Speech synthesis and speech recognition are the two key technologies necessary to realize human-machine speech communication and to build a spoken-dialogue system capable of listening and speaking.
  • The response system is a system used to realize human-machine spoken dialogue. Its front end is a speech recognizer, which converts the input speech information into corresponding text through speech recognition processing, and then invokes Natural Language Understanding (NLU) technology for entity recognition and intent recognition. After intent comparison, inheritance, fusion, and similar schemes, the single intent with the highest score is obtained; the answer is assembled according to that intent, the assembled answer is processed by speech synthesis to convert the text into voice (read aloud), and the voice response is thereby realized.
  • The response mechanism in the related art typically performs voice recognition processing only after the user has finished speaking, that is, after recognizing the end of voice input, and performs intent recognition and answer assembly based on the obtained recognition result. Answer assembly therefore has an insurmountable starting point in time (the end of the user's speech), which seriously affects the efficiency of response processing; for long voice signals, the defect of low response efficiency is even more obvious. At the same time, when recognizing the intent of a long speech signal, the related-art response mechanism treats the voice input as a single complete sentence and retains only a single intent. Since a long speech signal rarely carries just one intent, recognizing a single intent causes loss of user intent and reduces the accuracy of response processing.
  • the embodiment of the present application provides a response method, in which the terminal performs continuous intent prediction on the input voice information through real-time voice recognition processing, pre-assembles the answer, and temporarily stores the answer.
  • When the predicted intent is confirmed, the stored answer is pushed to realize the response processing. This not only improves the efficiency of response processing but also overcomes the defect of intent loss, further improves the accuracy of response processing, and makes the terminal more intelligent.
  • FIG. 1 is a schematic structural diagram of the voice response system proposed in an embodiment of the application.
  • The voice response system 100 includes: a voice continuous recognition module 101, an intention prediction module 102, an intention management module 103, an answer assembly module 104, an answer decision module 105, and a voice response module 106.
  • The voice continuous recognition module 101 converts input voice information into corresponding text information in real time; the intention prediction module 102 and the intention management module 103 predict the intent and decide whether the predicted intent is correct; the answer assembly module 104 assembles the intent predicted by the intention prediction module to obtain the answer to be pushed; the answer decision module 105 temporarily stores the answer to be pushed, receives the intention management module's decision on the predicted intent, and accordingly sends or discards the answer to be pushed; the voice response module 106 converts the text information to be pushed into voice information.
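The division of labour among the six modules of FIG. 1 might be organised as follows. This is a minimal sketch under the assumption that each module can be modelled as a callable; the class and method names are illustrative, not taken from the patent.

```python
class VoiceResponseSystem:
    """Toy wiring of the six modules of the voice response system 100."""

    def __init__(self, recognize, predict, assemble, synthesize):
        self.recognize = recognize    # voice continuous recognition module 101
        self.predict = predict        # intention prediction module 102
        self.assemble = assemble      # answer assembly module 104
        self.synthesize = synthesize  # voice response module 106
        self.prev_intent = None       # intention management module 103 (state)
        self.pending_answer = None    # answer decision module 105 (temp store)

    def on_audio(self, audio_chunk):
        text = self.recognize(audio_chunk)   # speech -> text, in real time
        intent = self.predict(text)          # predict intent on partial text
        if intent == self.prev_intent and intent is not None:
            # decision: intent confirmed, send the stored answer
            return self.synthesize(self.pending_answer)
        # decision: intent not yet confirmed, pre-assemble and keep waiting
        self.prev_intent = intent
        self.pending_answer = self.assemble(intent)
        return None
```

The answer decision module's "send or discard" behaviour corresponds to either returning the stored answer or overwriting it with a freshly assembled one.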
  • Fig. 2 is a schematic diagram 1 of the implementation process of the response method proposed in the embodiment of the application.
  • the method for the terminal to perform response processing may include the following steps:
  • Step 1001 Determine the first target text corresponding to the first moment through voice recognition processing at the first moment.
  • the terminal may determine the first target text corresponding to the first moment through voice recognition processing at the first moment.
  • the terminal may be any device that has communication and storage functions and is provided with a voice response system.
  • For example, the terminal may be a tablet computer, a mobile phone, a smart speaker, a smart TV, a smart air purifier, a smart air conditioner, an e-book reader, an MP3 (Moving Picture Experts Group Audio Layer III) player, an MP4 (Moving Picture Experts Group Audio Layer IV) player, or a similar device.
  • The terminal determines the first target text according to the voice information acquired at the first moment. Specifically, the terminal obtains the first voice information corresponding to the first moment, performs voice recognition processing on it through the voice continuous recognition module, and converts the first voice information into the first target text, thereby determining the first target text corresponding to the first moment.
  • The first voice information acquired by the terminal can be collected by the terminal itself through an audio collection component, or it can be collected by other equipment through an audio collection component and sent to the terminal, where the other equipment is an electronic device independent of the terminal.
  • The voice continuous recognition module performs speech recognition processing on the first voice information based on an "acoustic model" and a "language model", thereby determining the first target text corresponding to the first voice information.
  • The acoustic model (AM) is obtained by training on speech data; its input is a feature vector and its output is phoneme information. The language model (LM) is obtained by training on a large amount of text, yielding the probabilities with which individual characters or words follow one another.
  • Specifically, the acquired first voice information corresponding to the first moment is first preprocessed to extract the feature information of the speech; the phoneme information corresponding to the feature information is then determined through the "acoustic model", that is, the phoneme information corresponding to the first voice information; a "dictionary" is then used to find all the characters or words corresponding to the phonemes, and the "language model" gives the probabilities with which those characters or words follow one another, so that "decoding" further determines the best target text corresponding to the first voice information, namely the first target text.
  • the terminal implements the continuous speech recognition processing through the continuous speech recognition module, and there is no need to look for the sentence end marker, and the whole conversation is no longer regarded as a whole.
  • the terminal collects voice information in real time, and converts the acquired voice information into target text in real time through the voice continuous recognition module.
  • the terminal collects the voice information corresponding to that time, and obtains a clear output through the voice continuous recognition module, that is, the target text corresponding to the voice information at that time.
  • For example, the voice continuous recognition module obtains, based on the AM, the phoneme information corresponding to the first voice information as "wodedingdan", and further determines, based on the LM, that the first target text corresponding to the first voice information at time T1 is "my order".
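The AM/LM decoding step above can be illustrated with a toy example: a dictionary maps phoneme strings to candidate words, and a (here trivial) language model scores candidate word sequences. All data in this sketch is made up for illustration; a real decoder uses trained statistical models.

```python
from itertools import product

# Hypothetical dictionary: phoneme string -> candidate words.
PHONEME_CANDIDATES = {
    "wode": ["my"],
    "dingdan": ["order", "booking"],
}

# Toy "language model": probability of each whole word sequence.
LM_PROB = {
    ("my", "order"): 0.9,
    ("my", "booking"): 0.1,
}

def decode(phonemes):
    """Pick the candidate word sequence with the highest LM probability."""
    candidates = [PHONEME_CANDIDATES[p] for p in phonemes]
    best = max(product(*candidates), key=lambda seq: LM_PROB.get(seq, 0.0))
    return " ".join(best)
```

Given the phonemes of "wodedingdan", the decoder prefers "my order" over "my booking" because the language model assigns it a higher probability.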
  • Further, the terminal may predict, according to the determined first target text, the intent corresponding to the voice information at the first moment, and determine the answer to be pushed.
  • Step 1002 Determine the first predicted intention and the answer to be pushed according to the first target text; wherein the answer to be pushed is used to respond to the voice information.
  • After determining the first target text corresponding to the first moment, the terminal can determine the first predicted intention and the answer to be pushed according to the first target text, where the answer to be pushed is used to respond to the voice information.
  • The terminal may use the intent prediction module to predict the intent of the determined first target text; that is, before the user finishes speaking, the terminal can predict the question that the user wants to ask based on the first target text already acquired, thereby determining the first prediction intention, and store the first prediction intention in the intention management module.
  • The answer assembly module assembles the answer in advance based on the determined first prediction intention to obtain the first answer corresponding to it, and this first answer is stored in the answer decision module as the answer to be pushed.
  • For example, if the first target text determined at the first moment is "my order", the first prediction intention is "logistics query". The answer is assembled in advance according to this prediction intention, yielding the first answer "logistics information", which is then temporarily stored as the answer to be pushed.
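The "my order" → "logistics query" → "logistics information" chain above can be sketched as a prediction-plus-pre-assembly step. The keyword rules and answer table here are hypothetical stand-ins for a real NLU model and answer store.

```python
# Hypothetical keyword rules standing in for an intent classifier.
INTENT_RULES = {"order": "logistics query", "sign": "signing abnormal"}

# Hypothetical answer store keyed by intent.
ANSWERS = {
    "logistics query": "logistics information",
    "signing abnormal": "signing-exception information",
}

def predict_intent(partial_text):
    """Guess the user's intent from a partial transcript."""
    for keyword, intent in INTENT_RULES.items():
        if keyword in partial_text:
            return intent
    return None

def pre_assemble(partial_text):
    """Return (predicted intent, answer to be pushed) before speech ends."""
    intent = predict_intent(partial_text)
    return intent, ANSWERS.get(intent)
```

The key point is that `pre_assemble` runs on a partial transcript, so the answer is ready before the user finishes the sentence.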
  • Further, the real-time voice recognition processing can be continued to determine the target text corresponding to the next moment consecutive to the first moment, and the corresponding prediction intent is determined.
  • Step 1003 Continue to use voice recognition processing to determine the second target text and the second predicted intention corresponding to the second moment; where the second moment is the next consecutive moment of the first moment.
  • After the first prediction intention is determined based on the first target text, the terminal continues voice recognition processing and determines the second target text and the second prediction intention corresponding to the second moment, which is consecutive to the first moment. Specifically, after determining the first predicted intent and the answer to be pushed, the terminal continues, through the voice recognition module, to acquire the voice information corresponding to the next moment, determines from it the target text corresponding to that moment, namely the second target text, and the intention prediction module determines the second prediction intent according to the second target text.
  • the second target text is a real-time recognition integration result of the first target text corresponding to the first moment and the real-time text corresponding to the second moment.
  • At each moment, the voice continuous recognition module performs real-time recognition and integration of the target text corresponding to the previous moment with the real-time text corresponding to the current moment, and uses the integration result as the target text for the current moment. That is, the first target text corresponding to the first moment is integrated with the real-time text corresponding to the second moment to determine the second target text, and the intention prediction module then determines the second predicted intention according to the second target text.
  • FIG. 3 is a schematic diagram of real-time recognition integration proposed in an embodiment of the application.
  • As shown in FIG. 3, the real-time text obtained by the voice continuous recognition module 101 at times T1, T2, and T3 is "my", "order", and "where is it?" respectively. The module performs real-time recognition and integration at each moment. At T1 no integration is needed, so the real-time text "my" is the target text for T1. At T2, integrating the T1 target text "my" with the T2 real-time text "order" gives the T2 target text "my order". At T3 the target text is "where is my order?". Further, the intention prediction module 102 predicts the user's intent from the target text "my order" corresponding to T2, and the second prediction intention can be determined as "logistics query".
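The incremental integration of FIG. 3 amounts to merging each new fragment into the previous target text. A minimal sketch (ignoring the word reordering a real recognizer performs, e.g. "my order where is it?" versus "where is my order?"):

```python
def integrate(previous_target, new_fragment):
    """Merge the previous moment's target text with the current fragment."""
    if not previous_target:
        return new_fragment            # T1: no integration needed
    return previous_target + " " + new_fragment

target = ""
for fragment in ["my", "order", "where is it?"]:   # fragments at T1, T2, T3
    target = integrate(target, fragment)           # target text at each moment
```

After each step, the running `target` is what the intention prediction module sees, so the prediction sharpens as the utterance grows.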
  • Further, the determined first prediction intention and second prediction intention can be used by the terminal to judge whether the preset response condition is satisfied.
  • Step 1004 According to the first predicted intention and the second predicted intention, it is judged whether the preset response condition is satisfied.
  • After the terminal determines the first prediction intention corresponding to the first moment and the second prediction intention corresponding to the second moment, it can judge, according to these two prediction intentions, whether the preset response condition is satisfied.
  • The first predicted intent is the predicted intent determined from the target text of the previous moment, and the second predicted intent is the predicted intent corresponding to the current moment, consecutive to the previous one. Both are stored in the intention management module, which judges from them whether the preset response condition is currently satisfied. Specifically, when the first prediction intention and the second prediction intention are consistent, the preset response condition is satisfied; when they are inconsistent, it is not.
  • For example, if the first prediction intention is "logistics query" and, after voice recognition continues and intent prediction is performed on the newly acquired voice information, the second predicted intent corresponding to the second moment is also "logistics query", the two intents are consistent and the intention management module determines that the preset response condition is currently satisfied. If instead the second predicted intent at the second moment is "signing abnormal", the first and second predicted intents are inconsistent, and the intention management module determines that the preset response condition is not currently satisfied.
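The preset response condition of step 1004 reduces to a simple predicate on two consecutive predictions; this one-line sketch uses illustrative names:

```python
def response_condition_met(first_intent, second_intent):
    """True when the predicted intent has stabilised across two
    consecutive moments, i.e. the preset response condition holds."""
    return first_intent is not None and first_intent == second_intent
```

For the examples in the text, ("logistics query", "logistics query") satisfies the condition while ("logistics query", "signing abnormal") does not.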
  • After the terminal judges, according to the first predicted intention and the second predicted intention, whether the preset response condition is satisfied, it may further determine, according to the judgment result, whether to perform response processing.
  • Step 1005 If it is determined that the preset response condition is satisfied, the response processing is performed according to the answer to be pushed.
  • After the terminal judges whether the preset response condition is satisfied according to the first predicted intention and the second predicted intention, if the condition is determined to be satisfied, the terminal performs response processing according to the temporarily stored answer to be pushed.
  • Specifically, the intention management module sends the decision result that the preset response condition is currently satisfied to the answer decision module, and the answer decision module sends the temporarily stored answer to be pushed to realize the response processing; that is, response processing is performed with the answer to be pushed that was determined in advance from the first prediction intention.
  • In other words, response processing is performed according to the first answer, which was determined from the first prediction intention.
  • determining that the preset response condition is satisfied indicates that the first predicted intention determined according to the first target text is the user's explicit intention, and the first answer corresponding to the first predicted intention is also the information that the user really wants to obtain.
  • the response process can be directly based on the first answer assembled in advance, that is, the answer to be pushed temporarily stored in the answer decision module.
  • FIG. 4 is a schematic diagram of meeting preset response conditions proposed by an embodiment of the application.
  • As shown in FIG. 4, the intention prediction module 102 predicts that the first prediction intention is "logistics query", and the second prediction intention is likewise "logistics query".
  • the first answer determined by the answer assembly module according to the first predicted intention is "logistics information”.
  • The "logistics information" is stored in the answer decision module 105 as the answer to be pushed.
  • the intention management module 103 determines that the first predicted intention is consistent with the second predicted intention, indicating that the preset response conditions are currently met, that is, the user's intention is clear.
  • The user's real intention is "logistics query", and the information the user wants to obtain is "logistics information".
  • At this time, the intention management module 103 sends the decision result that the preset response condition is currently satisfied to the answer decision module 105, which directly performs response processing according to the answer to be pushed stored in advance, that is, the first answer "logistics information".
  • The voice response is implemented based on speech synthesis technology. Specifically, the text information corresponding to the answer to be pushed is converted into a target voice through speech synthesis, and the target voice is then played through a device such as a speaker, thereby realizing a voice response to the acquired voice information.
  • Further, after completing one response, the terminal performs voice recognition processing through the voice continuous recognition module again at the next moment and continues the response processing. That is, the target text corresponding to the voice information at each subsequent moment is determined, the corresponding prediction intent is further determined, and its consistency with the prediction intent of the previous moment is checked. After the voice response module responds to the user's current intent, the voice continuous recognition module continues to perform voice recognition processing on the voice information input at the next moment, re-determines the target text corresponding to a new intent, and further predicts the new intent corresponding to that voice information, thereby continuing the response processing.
  • the terminal performs continuous intention prediction through real-time speech recognition processing, and each time a clear user intention is recognized, it immediately responds according to the answer to be pushed, and continues to predict and respond to the next intention.
  • That is, the voice response system does not end the response process after one response, but continues to perform voice recognition processing on the input voice information.
  • For example, if the target text determined by the voice continuous recognition module is "It will rain heavily in these two days, send it to me as soon as possible", the new predicted intent determined by the intention prediction module from that target text is "delivery", a new intent different from the already-answered intent "logistics query"; the system then responds to the new "delivery" intent.
  • FIG. 5 is the second implementation flowchart of the response method proposed in the embodiment of the present application. As shown in FIG. 5, after the terminal judges whether the preset response condition is satisfied according to the first predicted intention and the second predicted intention, that is, after step 1004, the terminal response method may further include the following steps:
  • Step 1006 If it is determined that the preset response condition is not satisfied, the second answer is determined according to the second predicted intention.
  • after the terminal judges whether the preset response condition is satisfied according to the first predicted intention and the second predicted intention, if the preset response condition is not met, the second answer is determined based on the second predicted intention.
  • the answer assembly module needs to reassemble the answer according to the determined second predicted intent to determine the second answer corresponding to the second predicted intent.
  • FIG. 6 is a schematic diagram of not satisfying the preset response condition proposed by an embodiment of the application.
  • the intent prediction module 102 predicts that the first predicted intent corresponding to the first moment is "logistics query", and the answer to be pushed determined according to the first predicted intent is "logistics information". When the second predicted intent determined at the second moment is "signature abnormality", the intention management module 103 determines that the first predicted intent is inconsistent with the second predicted intent, and further determines that the preset response condition is not currently met. This indicates that the user's real intention is to query abnormal sign-for information, not to query logistics, so the "logistics information" stored in the answer decision module 105 as the answer to be pushed is also wrong.
  • the intention management module 103 sends the current decision result, namely that the preset response condition is not met, to the answer decision module 105; the answer decision module 105 discards the answer to be pushed "logistics information"; and the answer assembly module re-assembles the answer according to the determined second predicted intent "signature abnormality", determines the corresponding second answer "signature information", and stores the second answer "signature information" in the answer decision module 105 as the answer to be pushed.
  • after the terminal determines the second answer according to the second predicted intention, the answer to be pushed corresponding to the second moment further needs to be determined.
  • Step 1007 Set the second answer as the answer to be pushed.
  • after the terminal determines that the preset response condition is not satisfied according to the first predicted intention and the second predicted intention, and determines the second answer according to the second predicted intention, the terminal sets the second answer as the answer to be pushed.
  • since the second target text contains more complete information, the second predicted intent determined according to the second target text is more likely to be the user's clear intention and is more accurate than the first predicted intent. Therefore, the second answer determined based on the second predicted intent is also more likely to be the answer the user really wants, and the answer to be pushed is replaced with the second answer determined according to the second predicted intention.
  • the answer assembly module determines the second answer "signature information" according to the determined second predicted intention "signature abnormality", stores "signature information" in the answer decision module as the answer to be pushed, and replaces the previous answer to be pushed, namely the "logistics information" determined according to the first predicted intention at the previous moment.
  • after the second answer is set as the answer to be pushed, voice recognition processing continues in order to determine the predicted intent corresponding to the next moment, so that whether the preset response condition is met can be judged according to the second predicted intent and the predicted intent at the next consecutive moment after the second moment.
  • Step 1008 Continue to use voice recognition processing to determine the third target text and the third predicted intent corresponding to the next moment, and judge again whether the preset response condition is met according to the second predicted intent and the third predicted intent, so as to continue response processing.
  • when voice recognition processing is continued, the third target text and the third predicted intention corresponding to the third moment can be determined, whether the preset response condition is met is judged again based on the second predicted intention and the third predicted intention, and response processing then continues.
  • when the intention management module determines that the first predicted intention is inconsistent with the second predicted intention, this indicates that the answer assembled in advance according to the first predicted intention, that is, the determined answer to be pushed, is not correct.
  • the answer assembly module will re-assemble the answer according to the second predicted intention, temporarily replace the answer to be pushed with the second answer determined according to the second predicted intention, and store it in the answer decision module.
  • the continuous speech recognition module then obtains the third target text corresponding to the next moment, the intention prediction module determines the third predicted intention, and the intention management module judges again, according to the second predicted intention and the third predicted intention, whether the preset response condition is met, thereby realizing response processing.
  • the intention prediction module determines the third predicted intention corresponding to the next moment, and the intention management module determines that the third predicted intention is consistent with the second predicted intention. In this case, the preset response condition is met at the next moment, and therefore response processing is performed according to the answer to be pushed, that is, the second answer.
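The discard-and-replace behavior of steps 1006 and 1007 can be sketched as follows. This is a schematic stand-in for the intention management and answer assembly modules, not the patented implementation; the function name and the `assemble` callback are illustrative.

```python
def update_pending_answer(prev_intent, new_intent, pending_answer, assemble):
    """Step 1006/1007 in sketch form: when two consecutive predicted
    intents disagree, discard the pending answer and re-assemble one
    for the newer intent; otherwise keep the pending answer.
    `assemble` stands in for the answer assembly module."""
    if prev_intent != new_intent:
        return assemble(new_intent)
    return pending_answer

# Toy answer table mirroring the example in the text.
answers = {"logistics query": "logistics information",
           "signature abnormality": "signature information"}

pending = update_pending_answer("logistics query", "signature abnormality",
                                "logistics information", answers.get)
print(pending)  # signature information
```

With a consistent intent stream the pending answer survives unchanged, which is exactly the case where step 1005 would push it.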
  • the embodiment of the present application provides a response method.
  • the terminal determines the first target text corresponding to the first moment through voice recognition processing at the first moment; determines the first predicted intention and the answer to be pushed according to the first target text, where the answer to be pushed is used to respond to the voice information; continues voice recognition processing to determine the second target text and the second predicted intention corresponding to the second moment, where the second moment is the next consecutive moment after the first moment; judges whether the preset response condition is satisfied according to the first predicted intention and the second predicted intention; and, if it is determined that the preset response condition is satisfied, performs response processing according to the answer to be pushed. In this way, the terminal uses real-time voice recognition processing to continuously predict intents from the input voice information, assembles the answer in advance, temporarily stores it, and pushes the answer to implement response processing. This not only improves the efficiency of response processing, but also overcomes the defect of intent loss, further improves the accuracy of response processing, and makes the terminal more intelligent.
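The overall flow recapped above — predict at each moment, push the pre-assembled answer once two consecutive predictions agree, otherwise re-assemble for the newer intent — can be summarized in a minimal sketch. Everything here (the function name, the intent stream, the `assemble` callback) is an illustrative assumption, not the claimed implementation:

```python
def respond(intent_stream, assemble):
    """Schematic main loop: walk the per-moment predicted intents and
    return the pre-assembled pending answer as soon as two consecutive
    moments agree; on disagreement, re-assemble for the newer intent."""
    prev_intent = next(intent_stream)
    pending = assemble(prev_intent)        # answer assembled in advance
    for intent in intent_stream:
        if intent == prev_intent:          # preset response condition met
            return pending
        prev_intent, pending = intent, assemble(intent)
    return None                            # no stable intent observed

# Toy answer table mirroring the running example.
answers = {"logistics query": "logistics information",
           "signature abnormality": "signature information"}

stream = iter(["logistics query", "signature abnormality", "signature abnormality"])
print(respond(stream, answers.get))  # signature information
```

The design point is that answer assembly happens before the condition check, so the response can be pushed immediately once the intent stabilizes.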
  • the method for the terminal to determine the first prediction intention and the answer to be pushed according to the first target text may include the following steps:
  • Step 201 Perform predicted-intent matching on the first target text through a preset prediction model, and determine N predicted intents corresponding to the first target text; wherein the preset prediction model is a model established based on deep learning, and N is an integer greater than 1.
  • the terminal uses the preset prediction model to perform predicted-intent matching on the determined first target text, thereby determining the N predicted intents matching the first target text; wherein the preset prediction model is a model established based on deep learning, and N is an integer greater than 1.
  • when the terminal matches predicted intents for the first target text through the preset prediction model, intent prediction is performed while the user is still speaking; that is, intent prediction is performed on the target text determined at each moment. At such a moment, the complete voice information corresponding to the user's intention has not yet been obtained, and the complete target text cannot be determined; the target text obtained at each moment is only part of the full target text. Therefore, when the intention prediction module predicts on this partial target text through the preset prediction model, multiple user intentions may be matched.
  • if the target text determined at a certain moment is "My order", the complete user input, that is, the complete target text, may be "Where is my order" or "What is my order number". Therefore, when the target text "My order" is matched with predicted intents through the preset prediction model, the obtained predicted intent may be "logistics query" or "order number query"; that is, multiple predicted intents will be matched.
  • a unique predicted intent may be further determined from the N predicted intents.
  • Step 202 Determine a first prediction intention from the N prediction intentions.
  • after the terminal matches the first target text through the preset prediction model and determines the N predicted intents corresponding to the first target text, it may further determine the first predicted intent from the N predicted intents.
  • when the terminal performs predicted-intent matching on the first target text through the preset prediction model, it can determine not only the N predicted intents corresponding to the first target text but also the weight value corresponding to each predicted intent. After determining the N predicted intents, the terminal needs to determine the predicted intent with the highest accuracy among them, that is, the first predicted intent. The accuracy is indicated by the weight corresponding to the predicted intent: the larger the weight value, the higher the accuracy.
  • the first prediction intention can be used to determine the answer to be pushed.
  • Step 203 Determine the first answer according to the first prediction intention, and use the first answer as the answer to be pushed.
  • the terminal may further determine the first answer according to the first predicted intention, and use the first answer as the answer to be pushed.
  • the terminal may assemble the answer in advance according to the first predicted intention, and determine the answer to be pushed.
  • the feature information corresponding to the first predicted intention is first extracted; the feature information may be a keyword. Then all the information corresponding to the keyword is acquired and assembled by a preset algorithm to obtain the first answer. The terminal sets the first answer as the answer to be pushed and stores it in the answer decision module.
  • after determining that the first predicted intent is "logistics query", the terminal extracts the feature information corresponding to the predicted intent, such as the keyword "logistics". The terminal then obtains all information corresponding to the keyword "logistics", such as the user's package entering and leaving warehouses and vehicles driving in various places, assembles all the information through a preset algorithm to obtain a complete logistics information list, and temporarily stores the complete logistics information list as the answer to be pushed.
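The pre-assembly described above (extract a keyword from the intent, gather the records associated with it, join them into one answer) can be sketched as follows. The keyword extraction rule, the info store, and the join are illustrative assumptions standing in for the unspecified "preset algorithm":

```python
def assemble_answer(predicted_intent, info_store):
    """Pre-assemble the answer to be pushed for a predicted intent:
    extract the keyword (feature information) from the intent, gather
    every record associated with that keyword, and join them into a
    single answer string."""
    keyword = predicted_intent.split()[0]   # e.g. "logistics query" -> "logistics"
    records = info_store.get(keyword, [])
    return "; ".join(records)               # stand-in for the preset assembly algorithm

# Toy records mirroring the logistics example in the text.
store = {
    "logistics": [
        "package left the warehouse",
        "vehicle en route to the delivery station",
    ],
}
pending_answer = assemble_answer("logistics query", store)
print(pending_answer)
```

An intent whose keyword has no stored records simply yields an empty answer, which the answer decision module would then have nothing to push.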
  • the method for the terminal to determine the first prediction intention from the N prediction intentions may include the following steps:
  • Step 202a Obtain N weights corresponding to the N prediction intentions; among them, one prediction intention corresponds to one weight.
  • Step 202b Determine the prediction intention corresponding to the weight with the largest value among the N weights as the first prediction intention.
  • after determining the N predicted intents corresponding to the first target text, the terminal further obtains the N weights corresponding to the N predicted intents, and determines the predicted intent corresponding to the largest of the N weights as the first predicted intent.
  • when the terminal performs predicted-intent matching on the first target text through the preset prediction model, it obtains not only the N predicted intents corresponding to the first target text but also the weight value corresponding to each predicted intent, and the weight value reflects the accuracy of the predicted intent. Since the unique predicted intent corresponding to the first target text, that is, the most probable and clear user intention, must be determined from the N predicted intents, the N weight values are compared, and the predicted intent corresponding to the largest weight value, that is, the predicted intent with the highest accuracy, is taken as the predicted intent corresponding to the first target text, namely the first predicted intent.
  • the intent prediction module determines multiple predicted intents such as "logistics query", "order number query", and "signature exception" according to the first target text "My order", and at the same time obtains a weight of 0.45 for "logistics query", 0.3 for "order number query", and 0.25 for "signature exception". The predicted intent "logistics query" corresponds to the largest weight value, indicating that the user's intention is most likely "logistics query". Therefore, among the multiple predicted intents corresponding to the first target text "My order", the predicted intent "logistics query" with the largest weight value is taken as the first predicted intent.
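The selection in steps 202a and 202b — take the predicted intent whose weight is largest — is a plain argmax. A minimal sketch, with the intent names and weights taken from the example above (the function name is illustrative):

```python
def select_first_intent(predictions):
    """Pick the predicted intent with the largest weight.
    `predictions` maps each candidate intent to the weight (accuracy
    score) returned by the preset prediction model."""
    return max(predictions, key=predictions.get)

# Weights from the example: "logistics query" carries the largest weight.
candidates = {
    "logistics query": 0.45,
    "order number query": 0.30,
    "signature exception": 0.25,
}
print(select_first_intent(candidates))  # logistics query
```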
  • the embodiment of the application provides a response method.
  • the terminal performs continuous intent prediction on the input voice information through real-time voice recognition processing, pre-assembles the answer, temporarily stores it, and pushes the answer once it determines that the preset response condition is met, thereby achieving response processing. This not only improves the efficiency of response processing, but also overcomes the defect of intent loss, further improves the accuracy of response processing, and makes the terminal more intelligent.
  • the method for the terminal to determine whether the preset response condition is satisfied according to the first predicted intention and the second predicted intention may include the following steps:
  • Step 301 Determine a first weight corresponding to the first prediction intention and a second weight corresponding to the second prediction intention.
  • the terminal may further determine the first weight corresponding to the first prediction intention and the second weight corresponding to the second prediction intention.
  • when the preset prediction model performs predicted-intent matching, both the predicted intent corresponding to the target text and its weight value can be obtained, and the weight value reflects the accuracy of the determined predicted intent. That is, the first weight is the weight value corresponding to the most accurate first predicted intent matched by the preset prediction model at the first moment, and correspondingly, the second weight is the weight value corresponding to the most accurate second predicted intent matched by the preset prediction model at the second moment.
  • after the terminal determines the first weight corresponding to the first predicted intention and the second weight corresponding to the second predicted intention, the two weights can be further used to judge whether the preset response condition is met.
  • Step 302 When the first predicted intention and the second predicted intention are the same, and the first weight and the second weight are both greater than the preset weight threshold, it is determined that the preset response condition is satisfied; wherein the preset weight threshold is used to determine the accuracy of the predicted intent.
  • after determining the first weight and the second weight, when the first predicted intention is the same as the second predicted intention, and both the first weight and the second weight are greater than the preset weight threshold, it is determined that the preset response condition is satisfied.
  • when the first predicted intention is the same as the second predicted intention, it cannot yet be determined that the preset response condition is satisfied; it must be further judged based on the first weight and the second weight. Specifically, when the first predicted intention and the second predicted intention are the same, and the first weight and the second weight are both greater than the preset weight threshold, it is determined that the preset response condition is satisfied.
  • the preset weight threshold is the minimum weight value required for the preset response condition to be satisfied.
  • assume the first predicted intent corresponding to the first moment is "logistics query" and the second predicted intent corresponding to the second moment is also "logistics query". Although the first predicted intention is consistent with the second predicted intention, it cannot yet be determined that the preset response condition is satisfied; the first weight and the second weight must also be checked. If the first weight is 0.75 and the second weight is 0.81, then not only are the two predicted intents consistent, but the first weight and the second weight are both greater than the preset weight threshold, so it can be determined that the preset response condition is satisfied.
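The decision rule of step 302 — same intent at both moments and both weights above the preset threshold — can be sketched directly. The threshold value of 0.6 is an assumption for illustration only; the original does not fix a number:

```python
def preset_response_condition(intent1, w1, intent2, w2, threshold=0.6):
    """True when the two consecutive predicted intents agree and both
    weights exceed the preset weight threshold (threshold is an
    assumed value, not one given by the source)."""
    return intent1 == intent2 and w1 > threshold and w2 > threshold

# Example from the text: both moments predict "logistics query"
# with weights 0.75 and 0.81, so the condition is met.
print(preset_response_condition("logistics query", 0.75,
                                "logistics query", 0.81))  # True
```

Matching intents with a low weight at either moment, or mismatched intents, both leave the condition unmet, which routes the flow into step 1006.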
  • otherwise, it is determined that the preset response condition is not satisfied.
  • the embodiment of the application provides a response method.
  • the terminal performs continuous intent prediction on the input voice information through real-time voice recognition processing, pre-assembles the answer, temporarily stores it, and pushes the answer once it determines that the preset response condition is met, thereby achieving response processing. This not only improves the efficiency of response processing, but also overcomes the defect of intent loss, further improves the accuracy of response processing, and makes the terminal more intelligent.
  • FIG. 7 is a schematic diagram 1 of the composition structure of the terminal proposed in the embodiment of the present application.
  • the terminal 20 proposed in the embodiment of the present application may include a determining part 21, a judging part 22, a processing part 23, a storage part 24, and a setting part 25.
  • the determining part 21 is configured to determine the first target text corresponding to the first moment through voice recognition processing at the first moment; determine the first predicted intention and the answer to be pushed according to the first target text, wherein the answer to be pushed is configured to respond to voice information; and continue the voice recognition processing to determine the second target text and the second predicted intention corresponding to the second moment, wherein the second moment is the next consecutive moment after the first moment;
  • the judgment part 22 is configured to judge whether a preset response condition is satisfied according to the first prediction intention and the second prediction intention;
  • the processing part 23 is configured to perform response processing according to the answer to be pushed if it is determined that the preset response condition is satisfied.
  • the determining part 21 is specifically configured to obtain the first voice information corresponding to the first moment; and to perform the voice recognition processing on the first voice information, so as to convert the first voice information into the first target text.
  • the determining part 21 is further specifically configured to perform predicted-intent matching on the first target text through a preset prediction model, and determine N predicted intents corresponding to the first target text, wherein the preset prediction model is a model established based on deep learning and N is an integer greater than 1; determine the first predicted intent from the N predicted intents; determine the first answer according to the first predicted intent; and use the first answer as the answer to be pushed.
  • the determining part 21 is further specifically configured to obtain N weights corresponding to the N prediction intentions; wherein one prediction intention corresponds to one weight; and among the N weights, The predicted intent corresponding to the weight with the largest value is determined as the first predicted intent.
  • the determining part 21 is further specifically configured to obtain feature information corresponding to the first prediction intention; and to determine the first answer according to the feature information and a preset algorithm, wherein The preset algorithm is configured to perform answer assembly based on the characteristic information.
  • the storage part 24 is configured to store the answer to be pushed after taking the first answer as the answer to be pushed.
  • the determining part 21 is further specifically configured to obtain the second voice information corresponding to the second moment; perform the voice recognition processing on the second voice information to determine the real-time text corresponding to the second voice information; and determine the second target text according to the first target text and the real-time text.
  • the judging part 22 is specifically configured to determine that the preset response condition is satisfied when the first predicted intention is the same as the second predicted intention; and to determine that the preset response condition is not satisfied when the first predicted intention is different from the second predicted intention.
  • the judgment part 22 is further specifically configured to determine a first weight corresponding to the first prediction intention and a second weight corresponding to the second prediction intention; and when the first prediction intention When the prediction intention is the same as the second prediction intention, and the first weight and the second weight are both greater than the preset weight threshold, it is determined that the preset response condition is satisfied; wherein, the preset weight threshold is used for The accuracy of the predicted intention is determined.
  • the determining part 21 is further configured to, after it is judged whether the preset response condition is satisfied according to the first predicted intention and the second predicted intention, determine the second answer according to the second predicted intention if it is determined that the preset response condition is not satisfied.
  • the setting part 25 is configured to set the second answer as the answer to be pushed.
  • the determining part 21 is further specifically configured to continue the voice recognition processing to determine the third target text and the third predicted intention corresponding to the next moment.
  • the judging part 22 is further configured to judge whether the preset response condition is satisfied again according to the second predicted intention and the third predicted intention, so as to continue to implement the response processing .
  • the processing part 23 is specifically configured to perform voice synthesis processing on the answer to be pushed to determine a target voice; and play the target voice to implement the response processing.
  • FIG. 8 is a second schematic diagram of the composition structure of the terminal proposed in the embodiment of the present application.
  • the terminal 20 proposed in the embodiment of the present application may further include a processor 26, a memory 27 storing instructions executable by the processor 26, a communication interface 28, and a bus 29 for connecting the processor 26, the memory 27, and the communication interface 28.
  • the above-mentioned processor 26 may be at least one of an Application Specific Integrated Circuit (ASIC), a Digital Signal Processor (DSP), a Digital Signal Processing Device (DSPD), a Programmable Logic Device (PLD), a Field Programmable Gate Array (FPGA), a Central Processing Unit (CPU), a controller, a microcontroller, and a microprocessor.
  • the terminal 20 may further include a memory 27, which may be connected to the processor 26, wherein the memory 27 is used to store executable program code, the program code includes computer operation instructions, the memory 27 may include a high-speed RAM memory, or may also include Non-volatile memory, for example, at least two disk memories.
  • the bus 29 is used to connect the communication interface 28, the processor 26, and the memory 27, and to enable mutual communication among these devices.
  • the memory 27 is used to store instructions and data.
  • the above-mentioned processor 26 is configured to determine the first target text corresponding to the first moment through speech recognition processing at the first moment; determine the first predicted intention and the answer to be pushed according to the first target text, wherein the answer to be pushed is used to respond to voice information; continue to use the voice recognition processing to determine the second target text and the second predicted intention corresponding to the second moment, wherein the second moment is the next consecutive moment after the first moment; judge, according to the first predicted intention and the second predicted intention, whether the preset response condition is satisfied; and, if it is determined that the preset response condition is satisfied, perform response processing according to the answer to be pushed.
  • the aforementioned memory 27 may be a volatile memory, such as a Random-Access Memory (RAM); or a non-volatile memory, such as a Read-Only Memory (ROM), a flash memory, a Hard Disk Drive (HDD), or a Solid-State Drive (SSD); or a combination of the above types of memory, and provides instructions and data to the processor 26.
  • the functional modules in this embodiment may be integrated into one processing unit, or each unit may exist alone physically, or two or more units may be integrated into one unit.
  • the above-mentioned integrated unit can be realized in the form of hardware or software function module.
  • the integrated unit When the integrated unit is implemented in the form of a software function module and is not sold or used as an independent product, it can be stored in a computer readable storage medium.
  • the technical solution of this embodiment, in essence, or the part that contributes to the related technology, or all or part of the technical solution, can be embodied in the form of a software product. The computer software product is stored in a storage medium and includes several instructions for causing a computer device (which may be a personal computer, a server, a network device, or the like) or a processor to execute all or part of the steps of the method in this embodiment.
  • the aforementioned storage media include: U disk, mobile hard disk, read only memory (Read Only Memory, ROM), random access memory (Random Access Memory, RAM), magnetic disk or optical disk and other media that can store program codes.
  • An embodiment of the present application provides a terminal. The terminal determines the first target text corresponding to the first moment through voice recognition processing at the first moment; determines the first predicted intention and the answer to be pushed according to the first target text, where the answer to be pushed is used to respond to voice information; continues voice recognition processing to determine the second target text and the second predicted intention corresponding to the second moment, where the second moment is the next consecutive moment after the first moment; judges, according to the first predicted intention and the second predicted intention, whether the preset response condition is satisfied; and, if it is determined that the preset response condition is satisfied, performs response processing according to the answer to be pushed.
  • the terminal uses real-time voice recognition processing to continuously predict intents from the input voice information, assembles the answer in advance, temporarily stores it, and pushes the answer to implement response processing. This not only improves the efficiency of response processing, but also overcomes the defect of intent loss, further improves the accuracy of response processing, and makes the terminal more intelligent.
  • the embodiment of the present application provides a computer-readable storage medium on which a program is stored, and when the program is executed by a processor, the above-mentioned response method is realized.
  • the program instructions corresponding to the response method in this embodiment can be stored on storage media such as optical disks, hard disks, and USB flash drives; when the program instructions corresponding to the response method in the storage medium are read or executed by an electronic device, the above response method is implemented, in which response processing is performed according to the answer to be pushed.
  • Those skilled in the art should understand that the embodiments of this application may be provided as a method, a system, or a computer program product. Accordingly, this application may take the form of a hardware embodiment, a software embodiment, or an embodiment combining software and hardware. Moreover, this application may take the form of a computer program product implemented on one or more computer-usable storage media (including but not limited to disk storage and optical storage) containing computer-usable program code.
  • These computer program instructions may also be stored in a computer-readable memory capable of directing a computer or other programmable data processing device to operate in a particular manner, so that the instructions stored in the computer-readable memory produce an article of manufacture including an instruction apparatus that implements the functions specified in one or more flows of the schematic flowchart and/or one or more blocks of the block diagram.
  • These computer program instructions may also be loaded onto a computer or other programmable data processing device, so that a series of operational steps are executed on the computer or other programmable device to produce computer-implemented processing; the instructions executed on the computer or other programmable device thus provide steps for implementing the functions specified in one or more flows of the schematic flowchart and/or one or more blocks of the block diagram.


Abstract

A response method, a terminal, and a storage medium. The response method includes: determining, at a first moment through speech recognition processing, a first target text corresponding to the first moment (1001); determining a first predicted intention and an answer to be pushed according to the first target text, where the answer to be pushed is used to respond to voice information (1002); continuing the speech recognition processing to determine a second target text and a second predicted intention corresponding to a second moment, where the second moment is the moment immediately following the first moment (1003); judging, according to the first predicted intention and the second predicted intention, whether a preset response condition is satisfied (1004); and, if the preset response condition is determined to be satisfied, performing response processing according to the answer to be pushed (1005).

Description

应答方法、终端及存储介质
相关申请的交叉引用
本申请基于申请号为201911147594.8、申请日为2019年11月21日的中国专利申请提出,并要求该中国专利申请的优先权,该中国专利申请的全部内容在此以全文引入的方式引入本申请。
技术领域
本申请涉及终端技术领域,尤其涉及一种应答方法、终端及存储介质。
背景技术
随着移动互联网的高速发展,语音识别及其相关技术成为最自然高效的人机交互手段之一,广泛应用于语音拨号、语音导航、智能家居控制、语音搜索、听写数据录入等场景。语音识别是指通过从音频数据中获取到有用的信息,并利用相关技术对音频数据进行识别,将音频数据转换为文字信息的过程。
在实际应用中,面向大篇幅的语音信号输入进行语音识别时,往往是在用户讲话结束后,才进行语音识别、意图识别以及答案组装等过程,然后再根据组装后的答案进行应答处理,严重影响了应答处理的处理效率;并且在进行意图识别时,单一的识别意图导致了用户意图丢失的缺陷,降低了应答处理的准确性。
发明内容
本申请实施例提供了一种应答方法、终端及存储介质,不仅提高了应答处理效率,同时,还克服了意图丢失的缺陷,进一步提高了应答处理的准确性,终端智能性更高。
本申请实施例的技术方案是这样实现的:
第一方面,本申请实施例提供了一种应答方法,所述方法包括:
在第一时刻通过语音识别处理确定所述第一时刻对应的第一目标文字;
根据所述第一目标文字确定第一预测意图和待推送答案;其中,所述待推送答案用于对语音信息进行应答;
继续通过所述语音识别处理,确定第二时刻对应的第二目标文字和第二预测意图;其中,所述第二时刻为所述第一时刻连续的下一个时刻;
根据所述第一预测意图和所述第二预测意图,判断是否满足预设应答条件;
若判定满足所述预设应答条件,则按照所述待推送答案进行应答处理。
第二方面,本申请实施例提供了一种终端,所述终端包括:确定部分,判断部分以及处理部分,
所述确定部分,配置为在第一时刻通过语音识别处理确定所述第一时刻对应的第一目标文字;
所述确定部分,还配置为根据所述第一目标文字确定第一预测意图和待推送答案; 其中,所述待推送答案用于对语音信息进行应答;
所述确定部分,还配置为继续通过所述语音识别处理,确定第二时刻对应的第二目标文字和第二预测意图;其中,所述第二时刻为所述第一时刻连续的下一个时刻;
所述判断部分,配置为根据所述第一预测意图和所述第二预测意图,判断是否满足预设应答条件;
所述处理部分,配置为若判定满足所述预设应答条件,则按照所述待推送答案进行应答处理。
第三方面,本申请实施例提供了一种终端,所述终端包括处理器、存储有所述处理器可执行指令的存储器,当所述指令被所述处理器执行时,实现如上所述的应答方法。
第四方面,本申请实施例提供了一种计算机可读存储介质,其上存储有程序,应用于终端中,所述程序被处理器执行时,实现如上所述的应答方法。
本申请实施例提供了一种应答方法、终端及存储介质,终端在第一时刻通过语音识别处理确定第一时刻对应的第一目标文字;根据第一目标文字确定第一预测意图和待推送答案;其中,待推送答案用于对语音信息进行应答;继续通过语音识别处理,确定第二时刻对应的第二目标文字和第二预测意图;其中,第二时刻为第一时刻连续的下一个时刻;根据第一预测意图和第二预测意图,判断是否满足预设应答条件;若判定满足预设应答条件,则按照待推送答案进行应答处理。也就是说,在本申请的实施例中,终端通过实时语音识别处理,对输入的语音信息进行连续意图预测,提前进行答案组装,并将答案暂存,在判定当前满足预设应答条件时,推送答案以实现应答处理。不仅提高了应答处理效率,同时,还克服了意图丢失的缺陷,进一步提高了应答处理的准确性,终端智能性更高。
附图说明
图1为本申请实施例提出的语音应答系统结构示意图;
图2为本申请实施例提出的应答方法的实现流程示意图一;
图3为本申请实施例提出的实时识别整合示意图;
图4为本申请实施例提出的满足预设应答条件的示意图;
图5为本申请实施例提出的应答方法的实现流程示意图二;
图6为本申请实施例提出的不满足预设应答条件的示意图;
图7为本申请实施例提出的终端的组成结构示意图一;
图8为本申请实施例提出的终端的组成结构示意图二。
具体实施方式
下面将结合本申请实施例中的附图,对本申请实施例中的技术方案进行清楚、完整地描述。可以理解的是,此处所描述的具体实施例仅用于解释相关申请,而非对该申请的限定。另外还需要说明的是,为了便于描述,附图中仅示出了与有关申请相关的部分。
随着移动互联网的高速发展,语音识别及其相关技术成为最自然高效的人机交互手段之一,广泛应用于语音拨号、语音导航、智能家居控制、语音搜索、听写数据录入等场景。语音识别(Automatic Speech Recognition,ASR)是指通过从音频数据中获取到有用的信息,并利用相关技术对音频数据进行识别,将音频数据转换为文字信息的过程,也就是让机器通过识别和理解过程把语言信号转变为相应的文本或命令的高技术,相当于人的耳朵以及嘴巴。
语音合成(Text-To-Speech,TTS)技术是通过机械的、电子的方法产生人造语音的技术。它是将计算机自己产生的、或外部输入的文字信息转变为可以听得懂的、流利的汉语口语输出的技术,类比于人类的嘴巴。进一步地,语音合成和语音识别技术是实现人机语音通信,建立一个有听和讲能力的口语系统所必需的两项关键技术。
应答系统是用于实现人机口语对话的系统,其前端是一个语音识别器,通过对输入的语音信息进行语音识别处理,将语音信息转换成对应的文本,然后调用自然语言理解(Natural Language Understanding,NLU)技术来进行实体识别以及意图识别,再经过意图比较、继承、融合等方案后,得出唯一得分最高的意图,并根据该得分最高的意图进行答案组装,以及进一步对组装后得到的答案进行语音合成处理,将文字信息转换为声音(朗读出来),进而实现语音应答。
在实际应用中,由于用户对系统的响应速度有着越来越高的要求,相关技术中的应答机制,往往是在用户讲话结束后,即识别到语音输入结束标识后,才确定进行语音识别处理,并根据获取到的识别结果进行意图识别以及答案组装等过程,使得答案组装存在一个不可逾越的时间起点(用户讲话结束),严重影响了应答处理的处理效率,且在面向大篇幅的语音信号输入进行语音识别时,该应答效率低下的缺陷更为明显;同时,在对大篇幅语音信号进行意图识别时,相关技术中的应答机制,会将该语音输入信息认为是完整的一句话,并且只会保留一个唯一的意图,由于大篇幅语音信号都不仅仅是一个单一的意图,因此,单一的识别意图将会造成用户意图丢失的缺陷,降低应答处理的准确性。
本申请实施例提供了一种应答方法,其中,终端通过实时语音识别处理,对输入的语音信息进行连续意图预测,提前进行答案组装,并将答案暂存,在判定当前满足预设应答条件时,推送答案以实现应答处理。不仅提高了应答处理效率,同时,还克服了意图丢失的缺陷,进一步提高了应答处理的准确性,终端智能性更高。
下面将结合本申请实施例中的附图,对本申请实施例中的技术方案进行清楚、完整地描述。
本申请一实施例提供了一种应答方法,该应答方法应用于终端。终端设置有语音应答系统,图1为本申请实施例提出的语音应答系统结构示意图,如图1所示,语音应答系统100包括:语音连续识别模块101、意图预测模块102、意图管理模块103、答案组装模块104、答案决策模块105以及语音应答模块106。其中,语音连续识别模块101将输入的语音信息实时转换成对应的文本信息;意图预测模块102和意图管理模块103进行意图的预测以及对预测意图是否正确进行决策;答案组装模块104根据意图预测模块预测出的意图进行答案组装,进而得到待推送答案;答案决策模块105提供待推送答案的暂存以及接收意图管理模块对预测意图的决策结果,并进一步根据意图管理模块对预测意图的决策结果实现待推送答案的发送或者丢弃;语音应答模块106将待推送答案对应的文本信息转换成语音信息。
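As a rough illustration of how the six modules in Figure 1 could cooperate, the sketch below wires interchangeable callables into a single pipeline. All class, method, and variable names (`VoiceResponseSystem`, `on_audio`, the toy recognizer and intent predictor in the usage example) are assumptions for illustration only, not part of the patent.

```python
class VoiceResponseSystem:
    """Hypothetical sketch of the Figure 1 pipeline: continuous recognition,
    intent prediction, intent management, answer assembly, answer decision,
    and voice response, collapsed into one class for illustration."""

    def __init__(self, recognize, predict_intent, assemble_answer, synthesize):
        self.recognize = recognize              # stands in for 语音连续识别模块
        self.predict_intent = predict_intent    # stands in for 意图预测模块
        self.assemble_answer = assemble_answer  # stands in for 答案组装模块
        self.synthesize = synthesize            # stands in for 语音应答模块
        self.prev_intent = None                 # 意图管理模块 state
        self.pending_answer = None              # 答案决策模块 state

    def on_audio(self, audio_chunk):
        """Feed one moment of input; return synthesized speech only once the
        preset response condition (two identical consecutive intents) holds."""
        text = self.recognize(audio_chunk)
        intent = self.predict_intent(text)
        if intent == self.prev_intent and self.pending_answer is not None:
            answer = self.pending_answer
            self.prev_intent, self.pending_answer = None, None
            return self.synthesize(answer)      # push the pre-assembled answer
        # first sighting of this intent: assemble and hold the answer
        self.prev_intent = intent
        self.pending_answer = self.assemble_answer(intent)
        return None


# Usage with toy stand-ins for the real recognizer/predictor/assembler:
system = VoiceResponseSystem(
    recognize=lambda a: a,
    predict_intent=lambda t: "物流查询" if "我的" in t else "其他",
    assemble_answer=lambda i: {"物流查询": "物流信息"}.get(i, ""),
    synthesize=lambda ans: "TTS:" + ans,
)
first = system.on_audio("我的")
second = system.on_audio("我的订单")
```

The first moment only stores a pending answer (`first` is `None`); the second moment confirms the same intent and pushes it.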
图2为本申请实施例提出的应答方法的实现流程示意图一,如图2所示,在本申请的实施例中,终端进行应答处理的方法可以包括以下步骤:
步骤1001、在第一时刻通过语音识别处理确定第一时刻对应的第一目标文字。
在本申请的实施例中,终端可以在第一时刻通过语音识别处理确定出第一时刻对应的第一目标文字。
需要说明的是,在本申请的实施例中,终端可以为任何具备通信和存储功能、且设置有语音应答系统的设备。例如:平板电脑、手机、智能音箱、智能电视、智能空气净化器、智能空调、电子书阅读器、MP3(Moving Picture Experts Group Audio Layer III,动态影像专家压缩标准音频层面3)播放器、MP4(Moving Picture Experts Group Audio Layer IV,动态影像专家压缩标准音频层面4)播放器等设备。
需要说明的是,在本申请的实施例中,在第一时刻,终端根据第一时刻获取到的语音信息确定第一目标文字。具体地,终端获取第一时刻对应的第一语音信息,通过语音连续识别模块对第一语音信息进行语音识别处理,将第一语音信息转换成第一目标文字,进而确定出第一时刻对应的第一目标文字。其中,终端获取到的第一语音信息可以是终端自身通过音频采集组件采集到的;或者,也可以是其他设备通过音频采集组件采集到后发送给终端的,其他设备是与终端相独立的电子设备。
需要说明的是,在本申请的实施例中,语音连续识别模块基于“声学模型”和“语言模型”对第一语音信息进行语音识别处理,从而确定出第一语音信息对应的第一目标文字,其中,声学模型(Acoustic model,AM)是通过对语音数据进行训练获得,输入是特征向量,输出为音素信息;语言模型(language model,LM)为通过对大量文本信息进行训练,得到单个字或者词相互关联的概率。
具体地,首先对获取到的第一时刻对应的第一语音信息进行预处理,提取语音的特征信息,然后通过“声学模型”确定出与该特征信息对应的音素信息,即第一语音信息对应的音素信息,继续通过“字典”找出该音素对应的所有字或者词,并通过“语言模型”得到该音素对应的字或者词相互关联的概率,从而进一步通过“解码”确定出第一语音信息对应的最佳目标文字,即第一目标文字。
需要说明的是,在本申请的实施例中,终端通过语音连续识别模块实现语音连续识别处理,不需要再去寻找句子结束标志,不再将整通会话看作一个整体。具体地,终端对语音信息进行实时采集,并通过语音连续识别模块将获取到的语音信息实时转换成目标文字。相应地,在每一个对应的时刻T,终端采集该时刻对应的语音信息,通过语音连续识别模块都会得到一个明确的输出,也就是该时刻语音信息对应的目标文字。例如,在T1时刻采集第一语音信息,语音连续识别模块基于AM得到第一语音信息对应的音素信息为“wodedingdan”,进一步基于LM确定出T1时刻第一语音信息对应的第一目标文字为“我的订单”。
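The AM → dictionary → LM chain described above can be sketched with a toy decoder. The pronunciation dictionary and bigram scores below are invented for illustration; a real system would use trained acoustic and language models rather than these hand-written tables.

```python
# Toy sketch of the decoding chain: phoneme string -> dictionary candidates
# -> language-model scoring -> best word sequence. All entries are invented.

PRONUNCIATION_DICT = {
    "wode": ["我的"],
    "dingdan": ["订单", "定单"],
}

BIGRAM_SCORE = {  # toy "language model": how strongly word2 follows word1
    ("我的", "订单"): 0.8,
    ("我的", "定单"): 0.2,
}

def decode(phonemes):
    """Greedily split the phoneme string on known dictionary keys, then pick
    the candidate word sequence with the highest bigram score."""
    segments, rest = [], phonemes
    while rest:
        for key in PRONUNCIATION_DICT:
            if rest.startswith(key):
                segments.append(key)
                rest = rest[len(key):]
                break
        else:
            raise ValueError("unknown phoneme segment: " + rest)

    best_seq, best_score = None, -1.0

    def expand(i, seq, score):
        nonlocal best_seq, best_score
        if i == len(segments):
            if score > best_score:
                best_seq, best_score = seq, score
            return
        for word in PRONUNCIATION_DICT[segments[i]]:
            nxt = score if not seq else score * BIGRAM_SCORE.get((seq[-1], word), 0.01)
            expand(i + 1, seq + [word], nxt)

    expand(0, [], 1.0)
    return "".join(best_seq)
```

Here `decode("wodedingdan")` prefers "订单" over the homophone "定单" because the toy bigram model scores it higher, mirroring the role of the LM in the paragraph above.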
进一步地,在本申请的实施例中,终端在确定出第一时刻对应的第一目标文字之后,可以根据确定出的第一目标文字进一步对第一时刻语音信息对应的预测意图以及待推送答案进行确定。
步骤1002、根据第一目标文字确定第一预测意图和待推送答案;其中,待推送答案用于对语音信息进行应答。
在本申请的实施例中,终端在确定出第一时刻对应的第一目标文字之后,可以根据第一目标文字确定出第一预测意图和待推送答案,其中,待推送答案用于对语音信息进行应答。
需要说明的是,在本申请的实施例中,终端在确定出第一时刻对应的第一目标文字之后,可以通过意图预测模块对确定出的第一目标文字进行意图预测,也就是说,在用户说完话之前,终端可以根据获取到的第一目标文字对用户想要咨询的问题进行预测,从而确定出第一预测意图,并将第一预测意图存放于意图管理模块,答案组装模块根据确定出的第一预测意图提前进行答案组装,得到第一预测意图对应的第一答案,并将该第一答案作为待推送答案,存储至答案决策模块。
示例性地,在第一时刻确定出的第一目标文字为“我的订单”的情况下,此时预测用户输入可能为“我的订单到哪了”,进一步可以确定出用户的第一预测意图为“物流查询”,根据确定出的第一预测意图提前进行答案组装,得到第一答案为“物流信息”,进而将“物流信息”作为待推送答案进行暂存。
进一步地,在本申请的实施例中,根据第一目标文字确定出第一预测意图和待推送答案之后,可以继续通过实时语音识别处理,对与第一时刻连续的下一时刻对应的目标文字以及预测意图进行确定。
步骤1003、继续通过语音识别处理,确定第二时刻对应的第二目标文字和第二预测意图;其中,第二时刻为第一时刻连续的下一个时刻。
在本申请的实施例中,在根据第一目标文字确定出第一预测意图以后,终端继续进行语音识别处理,进而确定出与第一时刻连续的第二时刻对应的第二目标文字以及第二预测意图。
需要说明的是,在本申请的实施例中,终端通过进行实时语音识别处理,在确定出第一目标文字对应的第一预测意图以及待推送答案之后,继续通过语音识别模块进行语音识别处理,获取与第一时刻连续的下一时刻对应的语音信息,并根据该语音信息确定出下一时刻对应的目标文字,即第二目标文字,意图预测模块根据第二目标文字确定出与第二目标文字对应的第二预测意图。
需要说明的是,在本申请的实施例中,第二目标文字为第一时刻对应的第一目标文字与第二时刻对应的实时文字的实时识别整合结果。具体地,语音连续识别模块在每个时刻都会将上一时刻对应的目标文字与当前时刻对应的实时文字进行实时识别整合,并将该实时识别整合结果作为当前时刻对应的目标文字,也就是说,将第一时刻对应的第一目标文字与第二时刻对应的实时文字进行实时识别整合,进而确定出第二时刻对应的第二目标文字,意图预测模块根据第二目标文字确定第二预测意图。
示例性地,图3为本申请实施例提出的实时识别整合示意图,如图3所示,语音连续识别模块101在T1、T2、T3时刻对应的实时文字分别为“我的”、“订单”、“到哪了”,语音连续识别模块101在每个时刻进行实时识别整合,T1时刻不需要进行实时识别整合,即T1时刻对应的实时文字“我的”也就是T1时刻对应的目标文字;T2时刻通过将T1时刻对应的目标文字“我的”和T2时刻对应的实时文字“订单”进行实时识别整合,得到T2时刻对应的目标文字为“我的订单”;同理,T3时刻对应的目标文字为“我的订单到哪了”。进一步地,意图预测模块102根据T2时刻对应的目标文字“我的订单”对用户意图进行预测,可以确定出第二预测意图为“物流查询”。
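The real-time integration step illustrated in Figure 3 — appending each moment's real-time text to the previous moment's target text — can be sketched as follows; the class and method names are assumptions for illustration.

```python
class ContinuousRecognizer:
    """Sketch of the per-moment integration: the target text at moment T is
    the previous moment's target text plus this moment's real-time text."""

    def __init__(self):
        self.target_text = ""

    def on_moment(self, realtime_text):
        # integrate previous target text with the current real-time text
        self.target_text += realtime_text
        return self.target_text


recognizer = ContinuousRecognizer()
t1 = recognizer.on_moment("我的")
t2 = recognizer.on_moment("订单")
t3 = recognizer.on_moment("到哪了")
```

Feeding the three moments from Figure 3 yields "我的", "我的订单", and "我的订单到哪了" in turn, matching the integration results in the text.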
进一步地,在本申请的实施例中,终端在确定出第二时刻对应的第二目标文字和第二预测意图之后,确定出的第一预测意图和第二预测意图,可以用于对终端是否满足应答条件进行判断。
步骤1004、根据第一预测意图和第二预测意图,判断是否满足预设应答条件。
在本申请的实施例中,终端在确定出第一时刻对应的第一预测意图以及第二时刻对应的第二预测意图之后,可以根据确定出的第一预测意图和第二预测意图,判断终端是否满足预设应答条件。
需要说明的是,在本申请的实施例中,通过语音连续识别,第一预测意图可以为根据上一时刻对应的目标文字确定出的预测意图,第二预测意图为与上一时刻连续的当前时刻对应的预测意图,并将确定出的第一预测意图以及第二预测意图存放于意图管理模块,意图管理模块根据确定出的第一预测意图和第二预测意图判断当前是否满足预设应答条件。具体地,在第一预测意图和第二预测意图一致的情况下,判定满足预设应答条件;在第一预测意图和第二预测意图不一致的情况下,判定不满足预设应答条件。
示例性地,在确定出第一时刻对应的第一预测意图为“物流查询”之后,继续进行语音识别处理,并对获取到的语音信息进行意图预测,在第二时刻对应的第二预测意图也是“物流查询”的情况下,第一预测意图与第二预测意图一致,也就是说,意图管理模块判定当前满足预设应答条件;在第二时刻对应的第二预测意图是“签收异常”的情况下,第一预测意图与第二预测意图不一致,此时,意图管理模块判定当前不满足预设应答条件。
进一步地,在本申请的实施例中,终端在根据第一预测意图和第二预测意图,对是否满足预设应答条件进行判断之后,可以进一步根据判定结果确定是否进行应答处理。
步骤1005、若判定满足预设应答条件,则按照待推送答案进行应答处理。
在本申请的实施例中,终端在根据确定出第一预测意图和第二预测意图,对是否满足预设应答条件进行判断之后,在判定满足预设应答条件的情况下,按照暂存的待推送答案进行应答处理。
需要说明的是,在本申请的实施例中,若判定出当前满足预设应答条件,意图管理模块将判定当前满足预设应答条件的决策结果发送至答案决策模块,那么答案决策模块将暂存的待推送答案进行发送,以实现应答处理,也就是说,将基于第一预测意图提前确定出的待推送答案进行应答处理。
具体地,在第一预测意图与第二预测意图一致,即判定当前满足预设应答条件的情况下,不再需要根据确定出的第二预测意图重新进行答案组装,而是直接按照暂存的待推送答案,也就是根据第一预测意图确定出的第一答案进行应答处理即可。也就是说,判定出满足预设应答条件表明了根据第一目标文字确定出的第一预测意图为用户明确的意图,第一预测意图对应的第一答案也是用户真正想要获取的信息,此时,直接基于提前组装的第一答案,也就是暂存在答案决策模块的待推送答案进行应答处理即可。
示例性地,图4为本申请实施例提出的满足预设应答条件的示意图,如图4所示,在意图预测模块102预测出第一预测意图为“物流查询”的情况下,将预测意图“物流查询”存储至意图管理模块103,答案组装模块根据第一预测意图确定出的第一答案为“物流信息”,现将“物流信息”作为待推送答案并存储在答案决策模块105,当确定出的第二预测意图也同样是“物流查询”时,此时意图管理模块103判断出第一预测意图与第二预测意图一致,表明当前满足预设应答条件,也就是说用户意图明确,用户真正的意图就是“物流查询”,想要获取的信息是“物流信息”。此时,不再需要根据第二预测意图“物流查询”再去确定与第一答案“物流信息”相同的第二答案,意图管理模块103将当前满足预设应答条件的决策结果发送至答案决策模块105,直接按照提前存储在答案决策模块105的待推送答案,也就是第一答案“物流信息”进行应答处理即可。
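A minimal sketch of the interplay between the intent management module and the answer decision module described above: the answer assembled from the first predicted intent is held, then either pushed (condition met) or discarded (condition not met). Class and method names are illustrative assumptions.

```python
class AnswerDecisionModule:
    """Sketch of the answer decision module: temporarily stores the
    pre-assembled answer and acts on the intent manager's decision."""

    def __init__(self):
        self.pending_answer = None

    def store(self, answer):
        self.pending_answer = answer

    def decide(self, condition_met):
        if condition_met:
            answer, self.pending_answer = self.pending_answer, None
            return answer           # push the pre-assembled answer
        self.pending_answer = None  # discard; reassemble from the new intent
        return None


module = AnswerDecisionModule()
module.store("物流信息")            # answer assembled from the first intent
pushed = module.decide(True)        # both predictions were 物流查询
```

With the decision `True` (both moments predicted "物流查询"), the stored "物流信息" is pushed immediately, with no re-assembly.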
需要说明的是,在本申请的实施例中,在按照待推送答案进行应答处理时,需要基于语音合成技术实现语音应答。具体地,通过语音合成技术将待推送答案对应的文字信息转换成目标语音,然后通过扬声器等设备播放该目标语音,从而实现了对获取到的语音信息进行语音应答。
进一步地,在当前满足预设应答条件,基于待推送答案进行应答处理之后,终端重新在下一个时刻通过语音连续识别模块进行语音识别处理,继续实现所述应答处理。
由于通过连续识别,在每一个对应的时刻T通过ASR都会确定出该时刻语音信息对应的目标文字,并且进一步的确定出该时刻对应的预测意图,在上一时刻与当前时刻对应的预测意图一致时,立刻对用户输入的语音信息进行响应,实现语音应答。由于下一时刻输入的语音信息可能会对应用户的另外一个意图,即与上一时刻已响应的用户意图不相同的新的意图,因此,语音应答模块在对当前用户意图进行响应过后,语音连续识别模块会继续对下一时刻输入的语音信息进行语音识别处理,重新确定新的意图对应的目标文字,以及进一步对下一时刻语音信息对应的新的意图进行预测,进而实现应答处理。也就是说,终端通过实时语音识别处理,进行连续意图预测,且每识别出一个明确的用户意图,即刻按照待推送答案进行应答处理,并继续进行下一个意图的预测和应答处理。
示例性地,终端在确定出语音信息“我的电视机到哪了”的预测意图为“物流查询”,并按照待推送答案“物流信息”进行应答处理之后,语音应答系统并不结束此次应答处理过程,而是继续对输入的语音信息进行语音识别处理,在下一时刻语音连续识别模块确定出的目标文字为“这两天要下暴雨,尽快帮我送过来”的情况下,意图预测模块根据目标文字确定出的新的预测意图为“催配送”,是与已响应的意图“物流查询”不相同的新的意图,那么终端将针对“催配送”这一新的意图进行应答处理。
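The continuous loop described above — respond as soon as two consecutive predictions agree, then start predicting the next, possibly different, intent — can be sketched as a simple scan over per-moment intents. The intent-to-answer mapping passed in is a hypothetical stand-in for the answer assembly module.

```python
def respond_stream(moment_intents, answers):
    """Sketch of continuous intent prediction and response: whenever the
    intent predicted at one moment matches the previous moment's, push the
    pre-assembled answer and reset to wait for the next, new intent."""
    responses = []
    prev = None
    for intent in moment_intents:
        if prev is not None and intent == prev:
            responses.append(answers[intent])
            prev = None  # responded; start predicting the next intent
        else:
            prev = intent
    return responses


# Hypothetical answer table; the second entry is invented for illustration.
answers = {"物流查询": "物流信息", "催配送": "已为您催促配送"}
pushed = respond_stream(["物流查询", "物流查询", "催配送", "催配送"], answers)
```

Two confirmations of "物流查询" trigger the first response, after which the loop picks up and answers the new "催配送" intent, as in the example above.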
基于上述实施例,在本申请的另一实施例中,图5为本申请实施例提出的应答方法的实现流程示意图二,如图5所示,终端在根据确定出的第一预测意图和第二预测意图,对是否满足预设应答条件进行判断之后,即步骤1004之后,终端应答方法还可以包括以下步骤:
步骤1006、若判定不满足预设应答条件,则根据第二预测意图确定第二答案。
在本申请的实施例中,终端在根据确定出的第一预测意图和第二预测意图,对是否满足预设应答条件进行判断之后,若判定不满足预设应答条件,则将根据第二预测意图确定出第二答案。
需要说明的是,本申请的实施例中,若判定出当前不满足预设应答条件,也就是说根据获取到的第一目标文字,得到的第一预测意图并不是正确的,表明了在用户说完话之前,对用户想要咨询的问题进行预测,预测得到的用户意图并不是用户真正想要咨询的问题,进而根据第一预测意图确定出的待推送答案也不是正确答案,即不是用户真正想要获取的信息,此时答案组装模块需重新根据确定出的第二预测意图进行答案组装,确定第二预测意图对应的第二答案。
示例性地,图6为本申请实施例提出的不满足预设应答条件的示意图,如图6所示,在意图预测模块102预测出第一时刻对应的第一预测意图为“物流查询”,且根据第一预测意图确定出的待推送答案为“物流信息”的情况下,当第二时刻确定出的第二预测意图为“签收异常”时,此时,意图管理模块103判断出第一预测意图与第二预测意图不一致,进一步判定出当前不满足预设应答条件,表明了用户真正的意图为查询签收异常信息,并不是查询物流,进而存储在答案决策模块105的“物流信息”这一待推送答案也是错误的,此时,意图管理模块103将当前不满足预设应答条件的决策结果发送至答案决策模块105,答案决策模块105丢弃“物流信息”这一待推送答案,答案组装模块重新根据确定出的第二预测意图,即“签收异常”,重新进行答案组装,确定预测意图“签收异常”对应的第二答案“签收信息”,并将第二答案“签收信息”作为待推送答案存储至答案决策模块105。
进一步地,在根据第一预测意图和第二预测意图,判定出不满足预设应答条件,以及终端根据第二预测意图确定出第二答案之后,需进一步确定第二时刻对应的待推送答案。
步骤1007、将第二答案设置为待推送答案。
在本申请实施例中,终端在根据第一预测意图和第二预测意图,判定出不满足预设应答条件,以及根据第二预测意图确定出第二答案之后,终端将第二答案设置为待推送答案。
需要说明的是,本申请的实施例中,根据第二预测意图确定出第二答案之后,由于第二时刻获取到的第二目标文字具有更多的有用信息,进而根据第二目标文字确定出的第二预测意图也更可能为用户明确的意图,相对于第一预测意图来说准确性更高,因此,根据第二预测意图确定出的第二答案也更可能是用户真正想要获得的信息,则将该第二答案设置为新的待推送答案。
示例性地,在第二时刻确定出的第二预测意图为“签收异常”时,表明了用户真正的意图可能是想查询订单签收异常,而不是第一预测意图“物流查询”,也就是说用户想要获得的信息是订单的签收信息,此时答案组装模块根据确定出的第二预测意图“签收异常”,确定出第二答案“签收信息”,并将“签收信息”作为待推送答案存储至答案决策模块,替换上一时刻根据第一预测意图确定出的“物流信息”这一待推送答案。
进一步地,在将第二答案设置为待推送答案之后,需要继续通过语音识别处理进一步确定下一时刻对应的预测意图,从而根据第二预测意图,以及与第二时刻连续的下一时刻对应的预测意图对是否满足预设应答条件进行判断。
步骤1008、继续通过语音识别处理,确定下一时刻对应的第三目标文字和第三预测意图,重新根据第二预测意图和第三预测意图,判断是否满足预设应答条件,以继续实现应答处理。
在本申请的实施例中,终端将第二答案设置为待推送答案之后,继续进行语音识别处理,可以确定出第三时刻对应的第三目标文字和第三预测意图,并且重新根据第二预测意图和第三预测意图判断是否满足预设应答条件,进而继续实现应答处理。
需要说明的是,本申请的实施例中,若意图管理模块判定第一预测意图与第二预测意图不一致,表明根据第一预测意图提前进行答案组装,确定出的待推送答案并不是正确的,此时答案组装模块将重新根据第二预测意图进行答案组装,并将根据第二预测意图确定出的第二答案暂时替换为待推送答案,存储至答案决策模块,为了进一步明确用户意图,继续通过语音连续识别模块获取下一时刻对应的第三目标文字以及通过意图预测模块确定第三预测意图,然后意图管理模块重新根据第二预测意图和第三预测意图,判断是否满足预设应答条件,进而实现应答处理。
需要说明的是,在本申请的实施例中,通过连续意图预测,意图预测模块确定出下一时刻对应的第三预测意图,在意图管理模块判定出第三预测意图与第二预测意图一致的情况下,表明下一时刻满足预设应答条件,因此,将按照待推送答案即第二答案进行应答处理。
本申请实施例提供了一种应答方法,终端在第一时刻通过语音识别处理确定第一时刻对应的第一目标文字;根据第一目标文字确定第一预测意图和待推送答案;其中,待推送答案用于对语音信息进行应答;继续通过语音识别处理,确定第二时刻对应的第二目标文字和第二预测意图;其中,第二时刻为第一时刻连续的下一个时刻;根据第一预测意图和第二预测意图,判断是否满足预设应答条件;若判定满足预设应答条件,则按照待推送答案进行应答处理。也就是说,在本申请的实施例中,终端通过实时语音识别处理,对输入的语音信息进行连续意图预测,提前进行答案组装,并将答案暂存,在判定当前满足预设应答条件时,推送答案以实现应答处理。不仅提高了应答处理效率,同时,还克服了意图丢失的缺陷,进一步提高了应答处理的准确性,终端智能性更高。
基于上述实施例,在本申请的另一实施例中,终端根据第一目标文字确定第一预测意图和待推送答案的方法可以包括以下步骤:
步骤201、通过预设预测模型对所述第一目标文字进行预测意图匹配,确定与第一目标文字对应的N个预测意图;其中,预设预测模型为基于深度学习建立的模型,N为大于1的整数。
在本申请的实施例中,终端在确定出第一语音信息对应的第一目标文字之后,通过预设预测模型对确定出的第一目标文字进行预测意图匹配,从而确定出与第一目标文字对应的N个预测意图,其中,该预设预测模型是基于深度学习建立的模型,N为大于1的整数。
需要说明的是,在本申请的实施例中,终端在通过预设预测模型对第一目标文字进行预测意图匹配时,由于是在用户讲话过程中进行意图预测,即通过在各时刻确定出的目标文字进行意图预测,此时,并没有获取用户意图对应的完整的语音信息,进而也不能确定出完整的目标文本信息,各时刻获取到的目标文字也是部分目标文本信息,因此,意图预测模块通过预设预测模型对目标文字,也就是部分目标文本进行意图预测时可能会预测出多个用户意图。
示例性地,在第一目标文字为“我的订单”的情况下,此时用户输入即完整的目标文本信息可能为“我的订单到哪了”,也可能是“我的订单编号是多少”,因此,在通过预设预测模型对目标文字“我的订单”进行预测意图匹配时,得到的预测意图可以为“物流查询”,也可以为“单号查询”,会匹配出多个预测意图。
进一步地,在通过预设预测模型匹配出与第一目标文字对应的N个预测意图之后,可以进一步从N个预测意图中确定唯一一个预测意图。
步骤202、从N个预测意图中确定第一预测意图。
在本申请的实施例中,终端在通过预设预测模型对第一目标文字进行匹配,并确定出第一目标文字对应的N个预测意图之后,可以进一步从N个预测意图中确定出第一预测意图。
需要说明的是,本申请的实施例中,终端通过预设预测模型对第一目标文字进行预测意图匹配时,不仅可以确定出第一目标文字对应的N个预测意图,同时也得到了N个预测意图对应的权重值。在确定出第一目标文字对应的N个预测意图之后,终端需要从多个预测意图中确定出一个准确性最高的预测意图,即第一预测意图。可选的,准确性可以通过预测意图对应的权重进行确定,权重值越大,准确性越高。
进一步地,终端从N个预测意图中确定出第一预测意图之后,第一预测意图可以用于对待推送答案进行确定。
步骤203、根据第一预测意图确定第一答案,并将第一答案作为待推送答案。
在本申请的实施例中,终端从N个预测意图中确定出第一预测意图之后,可以进一步根据第一预测意图确定出第一答案,并将第一答案作为待推送答案。
需要说明的是,在本申请的实施例中,在确定出第一预测意图之后,终端可以根据第一预测意图提前进行答案组装,确定待推送答案。具体地,提取第一预测意图对应的特征信息,可选的,特征信息可以为关键字;然后获取该关键字对应的所有信息,并通过预设算法对该关键字对应的所有信息进行答案组装,进而得到第一答案,终端将第一答案设置为待推送答案,并存储至答案决策模块。
示例性地,当确定出第一预测意图为“物流查询”之后,终端提取该预测意图对应的特征信息,例如关键字“物流”,随后终端获取关键字“物流”对应的该用户的包裹在各个地方进库、出库以及沿途车辆行驶的信息,并将所有信息通过预设算法进行组装,得到完整的物流信息列表,并将该完整的物流信息列表作为待推送答案进行暂存。
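A sketch of the answer-assembly step described above: extract the keyword behind the predicted intent, gather every record filed under it, and join the records into one pushable answer. The keyword table, knowledge base, and joining rule are assumptions for illustration, not the patent's preset algorithm.

```python
def assemble_answer(intent, knowledge_base):
    """Sketch of step 203: map the intent to its feature keyword, collect
    all records under that keyword, and assemble them into one answer."""
    keyword = {"物流查询": "物流", "签收异常": "签收"}.get(intent)
    records = knowledge_base.get(keyword, [])
    return "；".join(records)


# Hypothetical knowledge base of logistics records for one parcel.
kb = {"物流": ["包裹已揽收", "运输中"]}
answer = assemble_answer("物流查询", kb)
```

For the "物流查询" intent this joins the parcel's warehousing and transit records into a single logistics-information answer, ready for temporary storage as the answer to be pushed.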
进一步地,基于上述实施例,终端从N个预测意图中确定第一预测意图的方法可以包括以下步骤:
步骤202a、获取N个预测意图对应的N个权重;其中,一个预测意图对应一个权重。
步骤202b、将N个权重中的、数值最大的权重对应的预测意图,确定为第一预测意图。
在本申请的实施例中,终端在确定出第一目标文字对应的N个预测意图之后,进一步获取N个预测意图对应的N个权重,并将N个权重中的、数值最大的权重对应的预测意图,确定为第一预测意图。
需要说明的是,在本申请的实施例中,终端通过预设预测模型对第一目标文字进行预测意图匹配时,不仅可以确定出第一目标文字对应的N个预测意图,同时也得到了N个预测意图对应的权重值,该权重值反映了预测意图的准确性,进一步地,由于需要从得到的N个预测意图中确定出与第一目标文字对应的唯一一个预测意图,即可能性较大的明确的用户意图,因此,需要对N个预测意图对应的N个权重值进行比较,将最大权重值对应的预测意图,也就是准确性最高的预测意图作为第一目标文字对应的预测意图,即第一预测意图。
例如,意图预测模块根据第一目标文字“我的订单”确定出了“物流查询”、“单号查询”以及“签收异常”等多个预测意图,且同时得到了“物流查询”的权重为0.45,“单号查询”的权重为0.3,“签收异常”的权重为0.25,由此可见,预测意图“物流查询”对应的权重值最大,表示用户意图较可能为“物流查询”,因此,将第一目标文字“我的订单”对应的多个预测意图中,权重值最大的预测意图“物流查询”作为第一预测意图。
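Steps 202a and 202b reduce to picking the prediction with the largest weight. Using the example weights from the text:

```python
def pick_first_intent(weighted_intents):
    """Return the predicted intent whose weight is largest (steps 202a-202b)."""
    return max(weighted_intents, key=weighted_intents.get)


weights = {"物流查询": 0.45, "单号查询": 0.3, "签收异常": 0.25}
first_intent = pick_first_intent(weights)
```

With the weights above, "物流查询" wins and becomes the first predicted intent, matching the example in the paragraph.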
本申请实施例提供了一种应答方法,终端通过实时语音识别处理,对输入的语音信息进行连续意图预测,提前进行答案组装,并将答案暂存,在判定当前满足预设应答条件时,推送答案以实现应答处理。不仅提高了应答处理效率,同时,还克服了意图丢失的缺陷,进一步提高了应答处理的准确性,终端智能性更高。
基于上述实施例,在本申请的另一实施例中,终端根据第一预测意图和第二预测意图,判断是否满足预设应答条件的方法可以包括以下步骤:
步骤301、确定第一预测意图对应的第一权重,和第二预测意图对应的第二权重。
在本申请的实施例中,终端在确定出第一预测意图和第二预测意图之后,可以进一步确定出第一预测意图对应的第一权重,以及第二预测意图对应的第二权重。
需要说明的是,本申请的实施例中,在通过预设预测模型对目标文字进行预测意图匹配时,可以获取到目标文字对应的预测意图以及该预测意图的权重值,该权重值反映了确定出的预测意图的准确性。也就是说,第一权重为在第一时刻通过预设预测模型匹配出的,准确性最高的第一预测意图对应的权重值,相应地,第二权重为在第二时刻通过预设预测模型匹配出的,准确性最高的第二预测意图对应的权重值。
进一步地,终端确定出第一预测意图对应的第一权重,以及第二预测意图对应的第二权重之后,确定出的第一预测意图对应的第一权重以及第二预测意图对应的第二权重可以进一步用于对是否满足预设应答条件进行判断。
步骤302、当第一预测意图与第二预测意图相同,且第一权重和第二权重均大于预设权重阈值时,判定满足预设应答条件;其中,预设权重阈值用于对预测意图的准确性进行确定。
在本申请的实施例中,在终端确定出第一预测意图对应的第一权重和第二预测意图对应的第二权重之后,可以在第一预测意图与第二预测意图相同,且第一权重和第二权重均大于预设权重阈值时,判定出满足预设应答条件。
需要说明的是,本申请的实施例中,在第一预测意图与第二预测意图一致时,并不会判定满足预设应答条件,而是再进一步根据第一权重和第二权重判断是否满足预设应答条件。具体地,在第一预测意图与第二预测意图相同,且第一权重和第二权重均大于预设权重阈值时,才判定出满足预设应答条件。其中,预设权重阈值为满足预设应答条件的权重值。
示例性地,假定预设权重阈值为0.7,在第一时刻对应的第一预测意图为“物流查询”,第二时刻对应的第二预测意图也是“物流查询”的情况下,可以看出,此时第一预测意图与第二预测意图一致,但是并不能判定出满足预设应答条件,还需要进一步根据第一权重和第二权重判断是否满足预设应答条件,若第一权重为0.75,第二权重为0.81,此时,不仅第一预测意图与第二预测意图一致,第一权重和第二权重也均大于预设权重阈值,则可以判定满足预设应答条件。
进一步的,若第一权重和第二权重中存在至少一个小于预设权重阈值,则可以判定不满足预设应答条件。
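The weighted response condition of step 302 can be sketched directly; the 0.7 threshold follows the example in the text and is otherwise an assumption.

```python
def meets_condition(intent1, w1, intent2, w2, threshold=0.7):
    """Step 302: satisfied only when the two consecutive predicted intents
    are identical AND both weights exceed the preset weight threshold."""
    return intent1 == intent2 and w1 > threshold and w2 > threshold
```

With the example values from the text (weights 0.75 and 0.81 for two "物流查询" predictions) the condition holds; lowering either weight below 0.7, or changing either intent, fails it.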
本申请实施例提供了一种应答方法,终端通过实时语音识别处理,对输入的语音信息进行连续意图预测,提前进行答案组装,并将答案暂存,在判定当前满足预设应答条件时,推送答案以实现应答处理。不仅提高了应答处理效率,同时,还克服了意图丢失的缺陷,进一步提高了应答处理的准确性,终端智能性更高。
基于上述实施例,在本申请的另一实施例中,图7为本申请实施例提出的终端的组成结构示意图一,如图7所示,本申请实施例提出的终端20可以包括确定部分21,判断部分22,处理部分23,存储部分24以及设置部分25。
所述确定部分21,配置为在第一时刻通过语音识别处理确定所述第一时刻对应的第一目标文字;以及根据所述第一目标文字确定第一预测意图和待推送答案;其中,所述待推送答案配置为对语音信息进行应答;以及继续通过所述语音识别处理,确定第二时刻对应的第二目标文字和第二预测意图;其中,所述第二时刻为所述第一时刻连续的下一个时刻;
所述判断部分22,配置为根据所述第一预测意图和所述第二预测意图,判断是否满足预设应答条件;
所述处理部分23,配置为若判定满足所述预设应答条件,则按照所述待推送答案进行应答处理。
在本申请的实施例中,所述确定部分21,具体配置为获取所述第一时刻对应的第一语音信息;以及对所述第一语音信息进行所述语音识别处理,将所述第一语音信息转换成所述第一目标文字。
在本申请的实施例中,所述确定部分21,还具体配置为通过预设预测模型对所述第一目标文字进行预测意图匹配,确定与所述第一目标文字对应的N个预测意图;其中,所述预设预测模型为基于深度学习建立的模型,N为大于1的整数;以及从所述N个预测意图中确定所述第一预测意图;以及根据所述第一预测意图确定所述第一答案,并将所述第一答案作为所述待推送答案。
在本申请的实施例中,所述确定部分21,还具体配置为获取所述N个预测意图对应的N个权重;其中,一个预测意图对应一个权重;以及将所述N个权重中的、数值最大的权重对应的预测意图,确定为所述第一预测意图。
在本申请的实施例中,所述确定部分21,还具体配置为获取所述第一预测意图对应的特征信息;以及根据所述特征信息和预设算法确定所述第一答案,其中,所述预设算法配置为基于所述特征信息进行答案组装。
在本申请的实施例中,所述存储部分24,配置为将所述第一答案作为所述待推送答案之后,存储所述待推送答案。
在本申请的实施例中,所述确定部分21,还具体配置为获取所述第二时刻对应的第二语音信息;以及对所述第二语音信息进行所述语音识别处理,确定所述第二语音信息对应的实时文字;以及根据所述第一目标文字和所述实时文字确定所述第二目标文字。
在本申请的实施例中,所述判断部分22,具体配置为当所述第一预测意图与所述第二预测意图相同时,判定满足所述预设应答条件;以及当所述第一预测意图与所述第二预测意图不相同时,判定不满足所述预设应答条件。
在本申请的实施例中,所述判断部分22,还具体配置为确定所述第一预测意图对应的第一权重,和所述第二预测意图对应的第二权重;以及当所述第一预测意图与所述第二预测意图相同,且所述第一权重和所述第二权重均大于预设权重阈值时,判定满足所述预设应答条件;其中,所述预设权重阈值用于对所述预测意图的准确性进行确定。
在本申请的实施例中,所述确定部分21,还配置为根据所述第一预测意图和所述第二预测意图,判断是否满足预设应答条件之后,若判定不满足所述预设应答条件,则根据所述第二预测意图确定第二答案。
在本申请的实施例中,所述设置部分25,配置为将所述第二答案设置为所述待推送答案。
在本申请的实施例中,所述确定部分21,还具体配置为继续通过所述语音识别处理,确定下一时刻对应的第三目标文字和第三预测意图。
在本申请的实施例中,所述判断部分22,还配置为重新根据所述第二预测意图和所述第三预测意图,判断是否满足所述预设应答条件,以继续实现所述应答处理。
在本申请的实施例中,所述处理部分23,具体配置为对所述待推送答案进行语音合成处理,确定目标语音;以及播放所述目标语音,以实现所述应答处理。
在本申请的实施例中,进一步地,图8为本申请实施例提出的终端的组成结构示意图二,如图8所示,本申请实施例提出的终端20还可以包括处理器26、存储有处理器26可执行指令的存储器27,进一步地,终端20还可以包括通信接口28,和用于连接处理器26、存储器27以及通信接口28的总线29。
在本申请的实施例中,上述处理器26可以为特定用途集成电路(Application Specific Integrated Circuit,ASIC)、数字信号处理器(Digital Signal Processor,DSP)、数字信号处理装置(Digital Signal Processing Device,DSPD)、可编程逻辑装置(Programmable Logic Device,PLD)、现场可编程门阵列(Field Programmable Gate Array,FPGA)、中央处理器(Central Processing Unit,CPU)、控制器、微控制器、微处理器中的至少一种。可以理解地,对于不同的设备,用于实现上述处理器功能的电子器件还可以为其它,本申请实施例不作具体限定。终端20还可以包括存储器27,该存储器27可以与处理器26连接,其中,存储器27用于存储可执行程序代码,该程序代码包括计算机操作指令,存储器27可能包含高速RAM存储器,也可能还包括非易失性存储器,例如,至少两个磁盘存储器。
在本申请的实施例中,总线29用于连接通信接口28、处理器26以及存储器27以及这些器件之间的相互通信。
在本申请的实施例中,存储器27,用于存储指令和数据。
进一步地,在本申请的实施例中,上述处理器26,用于在第一时刻通过语音识别处理确定所述第一时刻对应的第一目标文字;根据所述第一目标文字确定第一预测意图和待推送答案;其中,所述待推送答案用于对语音信息进行应答;继续通过所述语音识别处理,确定第二时刻对应的第二目标文字和第二预测意图;其中,所述第二时刻为所述第一时刻连续的下一个时刻;根据所述第一预测意图和所述第二预测意图,判断是否满足预设应答条件;若判定满足所述预设应答条件,则按照所述待推送答案进行应答处理。
在实际应用中,上述存储器27可以是易失性存储器(volatile memory),例如随机存取存储器(Random-Access Memory,RAM);或者非易失性存储器(non-volatile memory),例如只读存储器(Read-Only Memory,ROM),快闪存储器(flash memory),硬盘(Hard Disk Drive,HDD)或固态硬盘(Solid-State Drive,SSD);或者上述种类的存储器的组合,并向处理器26提供指令和数据。
另外,在本实施例中的各功能模块可以集成在一个处理单元中,也可以是各个单元单独物理存在,也可以两个或两个以上单元集成在一个单元中。上述集成的单元既可以采用硬件的形式实现,也可以采用软件功能模块的形式实现。
集成的单元在以软件功能模块的形式实现并非作为独立的产品进行销售或使用的情况下,可以存储在一个计算机可读取存储介质中,基于这样的理解,本实施例的技术方案本质上或者说对相关技术中技术做出贡献的部分或者该技术方案的全部或部分可以以软件产品的形式体现出来,该计算机软件产品存储在一个存储介质中,包括若干指令用以使得一台计算机设备(可以是个人计算机,服务器,或者网络设备等)或processor(处理器)执行本实施例方法的全部或部分步骤。而前述的存储介质包括:U盘、移动硬盘、只读存储器(Read Only Memory,ROM)、随机存取存储器(Random Access Memory,RAM)、磁碟或者光盘等各种可以存储程序代码的介质。
本申请实施例提供的一种终端,该终端在第一时刻通过语音识别处理确定第一时刻对应的第一目标文字;根据第一目标文字确定第一预测意图和待推送答案;其中,待推送答案用于对语音信息进行应答;继续通过语音识别处理,确定第二时刻对应的第二目标文字和第二预测意图;其中,第二时刻为第一时刻连续的下一个时刻;根据第一预测意图和第二预测意图,判断是否满足预设应答条件;若判定满足预设应答条件,则按照待推送答案进行应答处理。也就是说,在本申请的实施例中,终端通过实时语音识别处理,对输入的语音信息进行连续意图预测,提前进行答案组装,并将答案暂存,在判定当前满足预设应答条件时,推送答案以实现应答处理。不仅提高了应答处理效率,同时,还克服了意图丢失的缺陷,进一步提高了应答处理的准确性,终端智能性更高。
本申请实施例提供一种计算机可读存储介质,其上存储有程序,该程序被处理器执行时实现如上所述的应答方法。
具体来讲,本实施例中的一种应答方法对应的程序指令可以被存储在光盘,硬盘,U盘等存储介质上,当存储介质中的与一种应答方法对应的程序指令被一电子设备读取或被执行时,包括如下步骤:
在第一时刻通过语音识别处理确定所述第一时刻对应的第一目标文字;
根据所述第一目标文字确定第一预测意图和待推送答案;其中,所述待推送答案用于对语音信息进行应答;
继续通过所述语音识别处理,确定第二时刻对应的第二目标文字和第二预测意图;其中,所述第二时刻为所述第一时刻连续的下一个时刻;
根据所述第一预测意图和所述第二预测意图,判断是否满足预设应答条件;
若判定满足所述预设应答条件,则按照所述待推送答案进行应答处理。
本领域内的技术人员应明白,本申请的实施例可提供为方法、系统、或计算机程序产品。因此,本申请可采用硬件实施例、软件实施例、或结合软件和硬件方面的实施例的形式。而且,本申请可采用在一个或多个其中包含有计算机可用程序代码的计算机可用存储介质(包括但不限于磁盘存储器和光学存储器等)上实施的计算机程序产品的形式。
本申请是参照根据本申请实施例的方法、设备(系统)、和计算机程序产品的实现流程示意图和/或方框图来描述的。应理解可由计算机程序指令实现流程示意图和/或方框图中的每一流程和/或方框、以及实现流程示意图和/或方框图中的流程和/或方框的结合。可提供这些计算机程序指令到通用计算机、专用计算机、嵌入式处理机或其他可编程数据处理设备的处理器以产生一个机器,使得通过计算机或其他可编程数据处理设备的处理器执行的指令产生用于实现在实现流程示意图一个流程或多个流程和/或方框图一个方框或多个方框中指定的功能的装置。
这些计算机程序指令也可存储在能引导计算机或其他可编程数据处理设备以特定方式工作的计算机可读存储器中,使得存储在该计算机可读存储器中的指令产生包括指令装置的制造品,该指令装置实现在实现流程示意图一个流程或多个流程和/或方框图一个方框或多个方框中指定的功能。
这些计算机程序指令也可装载到计算机或其他可编程数据处理设备上,使得在计算机或其他可编程设备上执行一系列操作步骤以产生计算机实现的处理,从而在计算机或其他可编程设备上执行的指令提供用于实现在实现流程示意图一个流程或多个流程和/或方框图一个方框或多个方框中指定的功能的步骤。
以上所述,仅为本申请的较佳实施例而已,并非用于限定本申请的保护范围。

Claims (17)

  1. 一种应答方法,所述方法包括:
    在第一时刻通过语音识别处理确定所述第一时刻对应的第一目标文字;
    根据所述第一目标文字确定第一预测意图和待推送答案;其中,所述待推送答案用于对语音信息进行应答;
    继续通过所述语音识别处理,确定第二时刻对应的第二目标文字和第二预测意图;其中,所述第二时刻为所述第一时刻连续的下一个时刻;
    根据所述第一预测意图和所述第二预测意图,判断是否满足预设应答条件;
    若判定满足所述预设应答条件,则按照所述待推送答案进行应答处理。
  2. 根据权利要求1所述的方法,其中,所述在第一时刻通过语音识别处理确定所述第一时刻对应的第一目标文字,包括:
    获取所述第一时刻对应的第一语音信息;
    对所述第一语音信息进行所述语音识别处理,将所述第一语音信息转换成所述第一目标文字。
  3. 根据权利要求1所述的方法,其中,所述根据所述第一目标文字确定第一预测意图和待推送答案,包括:
    通过预设预测模型对所述第一目标文字进行预测意图匹配,确定与所述第一目标文字对应的N个预测意图;其中,所述预设预测模型为基于深度学习建立的模型,N为大于1的整数;
    从所述N个预测意图中确定所述第一预测意图;
    根据所述第一预测意图确定所述第一答案,并将所述第一答案作为所述待推送答案。
  4. 根据权利要求3所述的方法,其中,所述从所述N个预测意图中确定所述第一预测意图,包括:
    获取所述N个预测意图对应的N个权重;其中,一个预测意图对应一个权重;
    将所述N个权重中的、数值最大的权重对应的预测意图,确定为所述第一预测意图。
  5. 根据权利要求3所述的方法,其中,所述根据所述第一预测意图确定所述第一答案,包括:
    获取所述第一预测意图对应的特征信息;
    根据所述特征信息和预设算法确定所述第一答案,其中,所述预设算法用于基于所述特征信息进行答案组装。
  6. 根据权利要求1所述的方法,其中,所述确定第二时刻对应的第二目标文字,包括:
    获取所述第二时刻对应的第二语音信息;
    对所述第二语音信息进行所述语音识别处理,确定所述第二语音信息对应的实时文字;
    根据所述第一目标文字和所述实时文字确定所述第二目标文字。
  7. 根据权利要求1所述的方法,其中,所述根据所述第一预测意图和所述第二预测意图,判断是否满足预设应答条件,包括:
    当所述第一预测意图与所述第二预测意图相同时,判定满足所述预设应答条件;
    当所述第一预测意图与所述第二预测意图不相同时,判定不满足所述预设应答条件。
  8. 根据权利要求1所述的方法,其中,所述根据所述第一预测意图和所述第二预测意图,判断是否满足预设应答条件,包括:
    确定所述第一预测意图对应的第一权重,和所述第二预测意图对应的第二权重;
    当所述第一预测意图与所述第二预测意图相同,且所述第一权重和所述第二权重均大于预设权重阈值时,判定满足所述预设应答条件;其中,所述预设权重阈值用于对所述预测意图的准确性进行确定。
  9. 根据权利要求1所述的方法,其中,所述根据所述第一预测意图和所述第二预测意图,判断是否满足预设应答条件之后,所述方法还包括:
    若判定不满足所述预设应答条件,则根据所述第二预测意图确定第二答案;
    将所述第二答案设置为所述待推送答案;
    继续通过所述语音识别处理,确定下一时刻对应的第三目标文字和第三预测意图,重新根据所述第二预测意图和所述第三预测意图,判断是否满足所述预设应答条件,以继续实现所述应答处理。
  10. 根据权利要求1所述的方法,其中,所述按照所述待推送答案进行应答处理,包括:
    对所述待推送答案进行语音合成处理,确定目标语音;
    播放所述目标语音,以实现所述应答处理。
  11. 一种终端,所述终端包括:确定部分,判断部分以及处理部分,
    所述确定部分,配置为在第一时刻通过语音识别处理确定所述第一时刻对应的第一目标文字;以及根据所述第一目标文字确定第一预测意图和待推送答案;其中,所述待推送答案用于对语音信息进行应答;以及继续通过所述语音识别处理,确定第二时刻对应的第二目标文字和第二预测意图;其中,所述第二时刻为所述第一时刻连续的下一个时刻;
    所述判断部分,配置为根据所述第一预测意图和所述第二预测意图,判断是否满足预设应答条件;
    所述处理部分,配置为若判定满足所述预设应答条件,则按照所述待推送答案进行应答处理。
  12. 根据权利要求11所述的终端,其中,
    所述确定部分,具体配置为通过预设预测模型对所述第一目标文字进行预测意图匹配,确定与所述第一目标文字对应的N个预测意图;其中,所述预设预测模型为基于深度学习建立的模型,N为大于1的整数;以及从所述N个预测意图中确定所述第一预测意图;以及根据所述第一预测意图确定所述第一答案,并将所述第一答案作为所述待推送答案。
  13. 根据权利要求12所述的终端,其中,
    所述确定部分,还具体配置为获取所述N个预测意图对应的N个权重;其中,一个预测意图对应一个权重;以及将所述N个权重中的、数值最大的权重对应的预测意图,确定为所述第一预测意图。
  14. 根据权利要求11所述的终端,其中,
    所述判断部分,具体配置为当所述第一预测意图与所述第二预测意图相同时,判定满足所述预设应答条件;以及当所述第一预测意图与所述第二预测意图不相同时,判定不满足所述预设应答条件。
  15. 根据权利要求11所述的终端,其中,
    所述判断部分,还具体配置为确定所述第一预测意图对应的第一权重,和所述第二预测意图对应的第二权重;以及当所述第一预测意图与所述第二预测意图相同,且所述第一权重和所述第二权重均大于预设权重阈值时,判定满足所述预设应答条件;其中,所述预设权重阈值用于对所述预测意图的准确性进行确定。
  16. 一种终端,所述终端包括处理器、存储有所述处理器可执行指令的存储器,当所述指令被所述处理器执行时,实现如权利要求1-10任一项所述的方法。
  17. 一种计算机可读存储介质,其上存储有程序,应用于终端中,所述程序被处理器执行时,实现如权利要求1-10任一项所述的方法。
PCT/CN2020/111150 2019-11-21 2020-08-25 应答方法、终端及存储介质 WO2021098318A1 (zh)

Priority Applications (2)

Application Number Priority Date Filing Date Title
US17/775,406 US20220399013A1 (en) 2019-11-21 2020-08-25 Response method, terminal, and storage medium
EP20890060.5A EP4053836A4 (en) 2019-11-21 2020-08-25 RESPONSE PROCEDURE, TERMINAL AND REGISTRATION MEDIA

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201911147594.8A CN111739506B (zh) 2019-11-21 2019-11-21 一种应答方法、终端及存储介质
CN201911147594.8 2019-11-21

Publications (1)

Publication Number Publication Date
WO2021098318A1 true WO2021098318A1 (zh) 2021-05-27

Family

ID=72645938

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2020/111150 WO2021098318A1 (zh) 2019-11-21 2020-08-25 应答方法、终端及存储介质

Country Status (4)

Country Link
US (1) US20220399013A1 (zh)
EP (1) EP4053836A4 (zh)
CN (1) CN111739506B (zh)
WO (1) WO2021098318A1 (zh)

Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112365888B (zh) * 2020-10-14 2023-12-22 深圳追一科技有限公司 意图识别方法、装置、计算机设备和存储介质
CN113779206A (zh) * 2020-11-11 2021-12-10 北京沃东天骏信息技术有限公司 一种数据处理方法、装置、电子设备和存储介质
CN113643696A (zh) * 2021-08-10 2021-11-12 阿波罗智联(北京)科技有限公司 语音处理方法、装置、设备、存储介质及程序

Citations (7)

Publication number Priority date Publication date Assignee Title
JP2015004928A (ja) * 2013-06-24 2015-01-08 日本電気株式会社 応答対象音声判定装置、応答対象音声判定方法および応答対象音声判定プログラム
CN106649694A (zh) * 2016-12-19 2017-05-10 北京云知声信息技术有限公司 语音交互中确定用户意图的方法及装置
CN107590120A (zh) * 2016-07-07 2018-01-16 深圳狗尾草智能科技有限公司 人工智能处理方法及装置
CN108257616A (zh) * 2017-12-05 2018-07-06 苏州车萝卜汽车电子科技有限公司 人机对话的检测方法以及装置
WO2019040167A1 (en) * 2017-08-25 2019-02-28 Microsoft Technology Licensing, Llc UNDERSTANDING CONTEXTUAL SPEECH LANGUAGE IN A SPEAKED DIALOGUE SYSTEM
CN109410948A (zh) * 2018-09-07 2019-03-01 北京三快在线科技有限公司 通信方法、装置、系统、计算机设备以及可读存储介质
CN109670020A (zh) * 2018-12-11 2019-04-23 苏州创旅天下信息技术有限公司 一种语音交互方法、系统及装置

Family Cites Families (7)

Publication number Priority date Publication date Assignee Title
US9318108B2 (en) * 2010-01-18 2016-04-19 Apple Inc. Intelligent automated assistant
US10297254B2 (en) * 2016-10-03 2019-05-21 Google Llc Task initiation using long-tail voice commands by weighting strength of association of the tasks and their respective commands based on user feedback
CN107146610B (zh) * 2017-04-10 2021-06-15 易视星空科技无锡有限公司 一种用户意图的确定方法及装置
CN109994108B (zh) * 2017-12-29 2023-08-29 微软技术许可有限责任公司 用于聊天机器人和人之间的会话交谈的全双工通信技术
JP6999230B2 (ja) * 2018-02-19 2022-01-18 アルパイン株式会社 情報処理システム及びコンピュータプログラム
CN110046221B (zh) * 2019-03-01 2023-12-22 平安科技(深圳)有限公司 一种机器对话方法、装置、计算机设备及存储介质
CN110060663A (zh) * 2019-04-28 2019-07-26 北京云迹科技有限公司 一种应答服务的方法、装置及系统

Also Published As

Publication number Publication date
EP4053836A1 (en) 2022-09-07
CN111739506A (zh) 2020-10-02
EP4053836A4 (en) 2022-12-28
CN111739506B (zh) 2023-08-04
US20220399013A1 (en) 2022-12-15

Similar Documents

Publication Publication Date Title
US20200258506A1 (en) Domain and intent name feature identification and processing
US11580991B2 (en) Speaker based anaphora resolution
US20240096323A1 (en) Privacy mode based on speaker identifier
US20210027785A1 (en) Conversational recovery for voice user interface
US10917758B1 (en) Voice-based messaging
CN110364143B (zh) 语音唤醒方法、装置及其智能电子设备
WO2021098318A1 (zh) 应答方法、终端及存储介质
US20240153505A1 (en) Proactive command framework
WO2019149108A1 (zh) 语音关键词的识别方法、装置、计算机可读存储介质及计算机设备
US8972260B2 (en) Speech recognition using multiple language models
JP6550068B2 (ja) 音声認識における発音予測
US9558740B1 (en) Disambiguation in speech recognition
WO2017076222A1 (zh) 语音识别方法及装置
US9070367B1 (en) Local speech recognition of frequent utterances
US11189277B2 (en) Dynamic gazetteers for personalized entity recognition
US10506088B1 (en) Phone number verification
CN110097870B (zh) 语音处理方法、装置、设备和存储介质
US10366690B1 (en) Speech recognition entity resolution
US10854191B1 (en) Machine learning models for data driven dialog management
CN111797632B (zh) 信息处理方法、装置及电子设备
US10504512B1 (en) Natural language speech processing application selection
US11410646B1 (en) Processing complex utterances for natural language understanding
US11348579B1 (en) Volume initiated communications
US11693622B1 (en) Context configurable keywords
CN111640423B (zh) 一种词边界估计方法、装置及电子设备

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 20890060

Country of ref document: EP

Kind code of ref document: A1

ENP Entry into the national phase

Ref document number: 2020890060

Country of ref document: EP

Effective date: 20220530

NENP Non-entry into the national phase

Ref country code: DE