CN112365892A - Man-machine interaction method, device, electronic device and storage medium - Google Patents


Info

Publication number
CN112365892A
CN112365892A (application CN202011245627.5A)
Authority
CN
China
Prior art keywords
information
response
state
text
model
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202011245627.5A
Other languages
Chinese (zh)
Inventor
陈粮阳
谢恩宁
曹宇慧
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Hangzhou Dasouche Auto Service Co ltd
Original Assignee
Hangzhou Dasouche Auto Service Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Hangzhou Dasouche Auto Service Co ltd filed Critical Hangzhou Dasouche Auto Service Co ltd
Priority to CN202011245627.5A
Publication of CN112365892A
Legal status: Pending

Classifications

    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 Speech recognition
    • G10L15/22 Procedures used during a speech recognition process, e.g. man-machine dialogue
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33 Querying
    • G06F16/332 Query formulation
    • G06F16/3329 Natural language query formulation or dialogue systems
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33 Querying
    • G06F16/3331 Query processing
    • G06F16/334 Query execution
    • G06F16/3344 Query execution using natural language analysis
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/30 Semantic analysis
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L13/00 Speech synthesis; Text to speech systems
    • G10L13/02 Methods for producing synthetic speech; Speech synthesisers
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 Speech recognition
    • G10L15/08 Speech classification or search
    • G10L15/18 Speech classification or search using natural language modelling
    • G10L15/1822 Parsing for meaning understanding
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 Speech recognition
    • G10L15/22 Procedures used during a speech recognition process, e.g. man-machine dialogue
    • G10L2015/225 Feedback of the input speech

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Computational Linguistics (AREA)
  • Human Computer Interaction (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Artificial Intelligence (AREA)
  • Health & Medical Sciences (AREA)
  • General Engineering & Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • General Physics & Mathematics (AREA)
  • Multimedia (AREA)
  • Acoustics & Sound (AREA)
  • Mathematical Physics (AREA)
  • Databases & Information Systems (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Evolutionary Computation (AREA)
  • General Health & Medical Sciences (AREA)
  • Evolutionary Biology (AREA)
  • Machine Translation (AREA)

Abstract

The application relates to a man-machine conversation method, a man-machine conversation device, an electronic device and a storage medium. The man-machine conversation method comprises the following steps: receiving the dialogue voice of the current turn of a user, and preprocessing the dialogue voice to obtain text information; processing the text information through a preset semantic analysis model to obtain intention information; acquiring historical response information, and determining the dialog state of the current turn according to the historical response information and the intention information; and configuring response information corresponding to the dialog state according to a preset response configuration model, and generating response voice corresponding to the response information. The application thereby solves the problems of low conversation efficiency and poor conversation effect of dialogue systems in the related art, realizes a fast and effective AI-robot outbound-call function in each scene, reduces labor cost, and improves conversation efficiency and conversation effect.

Description

Man-machine interaction method, device, electronic device and storage medium
Technical Field
The present application relates to the field of computer technologies, and in particular, to a human-machine interaction method, device, electronic device, and storage medium.
Background
In recent years, artificial intelligence technology has developed rapidly, and products based on intelligent voice technology have entered thousands of households. People are increasingly accustomed to talking to machines and hold ever higher expectations of a machine's ability to understand and answer. A speech-based dialogue system framework adopts an Automatic Speech Recognition (ASR) model and a Natural Language Understanding (NLU) model, and its workflow comprises the following steps: first, the user's speech is converted into text through the ASR model; then, the NLU model performs semantic analysis; finally, the user's intention is obtained.
Dialogue systems in the related art require a large amount of annotated dialogue corpora for model training and achieve a good dialogue effect only after long data accumulation. However, as the dialogue scenes to which a dialogue system is applied increase, the system must be updated and iterated at high frequency, and such a long development cycle no longer meets the requirements of the dialogue system.
At present, no effective solution has been proposed for the problems of low conversation efficiency and poor conversation effect of dialogue systems in the related art.
Disclosure of Invention
The embodiment of the application provides a man-machine conversation method, a man-machine conversation device, an electronic device and a storage medium, and aims to at least solve the problems of low conversation efficiency and poor conversation effect of a conversation system in the related art.
In a first aspect, an embodiment of the present application provides a man-machine interaction method, including: receiving conversation voice of a current turn of a user, and preprocessing the conversation voice to obtain text information, wherein the preprocessing comprises text conversion and text error correction; processing the text information through a preset semantic analysis model to obtain intention information, wherein the intention information at least comprises an intention corresponding to the current turn of the user; obtaining historical response information, and determining the dialog state of the current turn according to the historical response information and the intention information, wherein the historical response information comprises response information generated according to the dialog state of the dialog of the previous turn; configuring the response information corresponding to the dialogue state according to a preset response configuration model, and generating response voice corresponding to the response information, wherein the preset response configuration model at least comprises one of the following: a dialogue strategy learning model and a knowledge base question-and-answer model.
In some embodiments, preprocessing the conversational speech to obtain text information includes:
performing text conversion processing on the dialogue voice through an automatic voice recognition technology to obtain a text to be processed;
and inputting the text to be processed into a text error correction model for text error correction to obtain the text information, wherein the text error correction model is generated by training according to a first sample text of preset semantic information, a second sample text without text errors and a third sample text with text errors.
In some embodiments, the intention information further includes slot position information, and processing the text information through a preset semantic analysis model to obtain the intention information includes:
performing natural language understanding processing on the text information to obtain candidate intention data, wherein the candidate intention data comprises candidate intentions and candidate slot position information;
detecting first intention data in the candidate intention data according to a preset intention recognition model, wherein the preset intention recognition model at least comprises one of the following items: the method comprises the following steps of (1) a regular matching model, a pre-training semantic matching model and an intention slot position joint model;
in an instance in which the first intent data is detected, determining that the intent information includes the first intent data, wherein the first intent data includes an intent corresponding to the user's current turn and the slot information.
In some of these embodiments, determining the dialog state for the current turn based on the historical answer information and the intent information includes:
inputting the historical response information and the intention information into a dialogue state tracking model, and acquiring a first characteristic value, wherein the first characteristic value comprises a semantic characteristic value associated with the historical response information and the intention information;
and detecting a preset state characteristic value in the first characteristic value, and determining the corresponding state of the current turn according to the preset state characteristic value.
In some embodiments, configuring the response information corresponding to the dialog state according to a preset response configuration model includes:
extracting first state semantic information of the dialog state, wherein the first state semantic information at least comprises state semantics corresponding to the intention information;
inputting the first state semantic information into the preset response configuration model to obtain the response information, wherein the preset response configuration model comprises one of the following: a dialogue strategy learning model and a knowledge base question-and-answer model.
In some embodiments, the preset response configuration model includes a conversation strategy learning model and a knowledge base question-and-answer model, and configuring the response information corresponding to the conversation state according to the preset response configuration model includes:
extracting second state semantic information of the dialog state, wherein the second state semantic information at least comprises state semantics corresponding to the intention information;
inputting the second state semantic information into the dialogue strategy learning model, and inquiring robot speech information corresponding to the second state semantic information, wherein the dialogue strategy learning model is generated by training according to first preset state semantic information and the robot speech information corresponding to the first preset state semantic information;
and under the condition that the robot speech information corresponding to the second state semantic information is not queried, inputting the second state semantic information into the knowledge base question-answer model, acquiring response text information corresponding to the second state semantic information, and determining that the response information comprises the response text information, wherein the knowledge base question-answer model comprises second preset state semantic information and response text information corresponding to the second preset state semantic information.
In some embodiments, in a case where the robot speech information corresponding to the second state semantic information is queried, it is determined that the response information includes the robot speech information corresponding to the second state semantic information.
In some of these embodiments, generating the response voice corresponding to the response information includes: and carrying out voice conversion on the response information to generate the response voice.
In a second aspect, an embodiment of the present application provides a human-machine interaction device, including:
the conversion module is used for receiving the conversation voice of the current turn of the user and preprocessing the conversation voice to obtain text information, wherein the preprocessing comprises text conversion and text error correction;
the generating module is used for processing the text information through a preset semantic analysis model to obtain intention information, wherein the intention information at least comprises an intention corresponding to the current turn of the user;
the processing module is used for acquiring historical response information and determining the conversation state of the current turn according to the historical response information and the intention information, wherein the historical response information comprises response information generated according to the conversation state of the conversation of the previous turn;
a response module, configured to configure the response information corresponding to the dialog state according to a preset response configuration model, and generate a response voice corresponding to the response information, where the preset response configuration model at least includes one of: a dialogue strategy learning model and a knowledge base question-and-answer model.
In a third aspect, an embodiment of the present application provides an electronic device, which includes a memory and a processor, where the memory stores a computer program, and the processor is configured to execute the computer program to perform the human-machine interaction method according to the first aspect.
In a fourth aspect, the present application provides a storage medium, in which a computer program is stored, where the computer program is configured to execute the human-computer interaction method according to the first aspect when the computer program runs.
Compared with the related art, the man-machine conversation method, device, electronic device and storage medium provided by the embodiments of the application receive the dialogue voice of the current turn of the user and preprocess the dialogue voice to obtain text information; process the text information through a preset semantic analysis model to obtain intention information; acquire historical response information and determine the dialog state of the current turn according to the historical response information and the intention information; and configure response information corresponding to the dialog state according to a preset response configuration model and generate response voice corresponding to the response information. This solves the problems of low conversation efficiency and poor conversation effect of dialogue systems in the related art, realizes a fast and effective AI-robot outbound-call function in each scene, reduces labor cost, and improves conversation efficiency and conversation effect.
The details of one or more embodiments of the application are set forth in the accompanying drawings and the description below to provide a more thorough understanding of the application.
Drawings
The accompanying drawings, which are included to provide a further understanding of the application and are incorporated in and constitute a part of this application, illustrate embodiment(s) of the application and together with the description serve to explain the application and not to limit the application. In the drawings:
fig. 1 is a block diagram of the hardware structure of a terminal for the man-machine conversation method according to an embodiment of the present application;
FIG. 2 is a flow diagram of a human-machine dialog method according to an embodiment of the application;
fig. 3 is a block diagram of a human-machine interaction device according to an embodiment of the present application.
Detailed Description
In order to make the objects, technical solutions and advantages of the present application more apparent, the present application will be described and illustrated below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the present application and are not intended to limit the present application. All other embodiments obtained by a person of ordinary skill in the art based on the embodiments provided in the present application without any inventive step are within the scope of protection of the present application.
It is obvious that the drawings in the following description are only examples or embodiments of the present application, and that it is also possible for a person skilled in the art to apply the present application to other similar contexts on the basis of these drawings without inventive effort. Moreover, it should be appreciated that in the development of any such actual implementation, as in any engineering or design project, numerous implementation-specific decisions must be made to achieve the developers' specific goals, such as compliance with system-related and business-related constraints, which may vary from one implementation to another.
Reference in the specification to "an embodiment" means that a particular feature, structure, or characteristic described in connection with the embodiment can be included in at least one embodiment of the specification. The appearances of the phrase in various places in the specification are not necessarily all referring to the same embodiment, nor are separate or alternative embodiments mutually exclusive of other embodiments. Those of ordinary skill in the art will explicitly and implicitly appreciate that the embodiments described herein may be combined with other embodiments without conflict.
Unless defined otherwise, technical or scientific terms used herein shall have the ordinary meaning understood by those of ordinary skill in the art to which this application belongs. The words "a," "an," "the," and similar words in this application do not denote a limitation of quantity and may refer to the singular or the plural. The terms "including," "comprising," "having," and any variations thereof in this application are intended to cover non-exclusive inclusion; for example, a process, method, system, article, or apparatus that comprises a list of steps or modules (elements) is not limited to the listed steps or elements, but may include other steps or elements not expressly listed or inherent to such process, method, article, or apparatus. The words "connected," "coupled," and the like in this application are not limited to physical or mechanical connections, and may include electrical connections, whether direct or indirect. The term "plurality" as used herein means two or more. "And/or" describes an association relationship between associated objects, meaning that three relationships may exist; for example, "A and/or B" may mean: A exists alone, A and B exist simultaneously, or B exists alone. The character "/" generally indicates that the associated objects before and after it are in an "or" relationship. The terms "first," "second," "third," and the like herein merely distinguish similar objects and do not denote a particular ordering of the objects.
Various techniques described herein may be used for intention recognition, slot information acquisition, and dialog state confirmation in a dialogue system.
Before describing and explaining embodiments of the present application, a description will be given of the related art used in the present application as follows:
Automatic Speech Recognition (ASR) is a technology that converts human speech into text.
Natural Language Understanding (NLU) processes the sentence input by the user, or the result of speech recognition, and extracts the user's dialogue intention and the information the user conveys.
Dialog State Tracking (DST) infers the current dialog state and the user's goals from all dialogue history information.
Dialogue Policy Learning (DPL) selects the next appropriate action based on the current dialog state.
Knowledge Base Question Answering (KBQA): given a natural language question, the question is semantically understood and parsed, and a knowledge base is then used for query and reasoning to obtain the answer.
Text-To-Speech (TTS) is a technology that converts text into human speech.
bert denotes the pre-trained language representation model, and jointbert denotes the joint intent-slot model.
The man-machine conversation method provided by this embodiment may be executed on a terminal, a computer, or a similar computing platform. Taking operation on a terminal as an example, fig. 1 is a block diagram of the hardware structure of a terminal running the man-machine conversation method according to an embodiment of the present application. As shown in fig. 1, the terminal may include one or more processors 102 (only one is shown in fig. 1; the processor 102 may include, but is not limited to, a processing device such as a microprocessor MCU or a programmable logic device FPGA) and a memory 104 for storing data, and optionally a transmission device 106 for communication functions and an input-output device 108. It will be understood by those skilled in the art that the structure shown in fig. 1 is only an illustration and does not limit the structure of the terminal. For example, the terminal may also include more or fewer components than shown in fig. 1, or have a different configuration than shown in fig. 1.
The memory 104 may be used to store computer programs, for example, software programs and modules of application software, such as the computer program corresponding to the man-machine conversation method in the embodiments of the present application. The processor 102 executes various functional applications and data processing by running the computer programs stored in the memory 104, thereby implementing the above-mentioned method. The memory 104 may include high-speed random access memory and may also include non-volatile memory, such as one or more magnetic storage devices, flash memory, or other non-volatile solid-state memory. In some examples, the memory 104 may further include memory located remotely from the processor 102, which may be connected to the terminal over a network. Examples of such networks include, but are not limited to, the internet, intranets, local area networks, mobile communication networks, and combinations thereof.
The transmission device 106 is used to receive or transmit data via a network. Specific examples of the network described above may include a wireless network provided by a communication provider of the terminal. In one example, the transmission device 106 includes a Network adapter (NIC) that can be connected to other Network devices through a base station to communicate with the internet. In one example, the transmission device 106 may be a Radio Frequency (RF) module, which is used to communicate with the internet in a wireless manner.
This embodiment provides a man-machine conversation method. Fig. 2 is a flowchart of the man-machine conversation method according to an embodiment of the present application; as shown in fig. 2, the process includes the following steps:
step S201, receiving the current turn of dialogue voice of the user, and preprocessing the dialogue voice to obtain text information, wherein the preprocessing comprises text conversion and text error correction.
In this embodiment, the dialogue voices of the current turn include the robot's dialogue voice and the user's dialogue voice. What the dialogue system aims to accomplish in the dialogue is to acquire the corresponding intention from the user's dialogue voice and then take the action corresponding to that intention, where the corresponding action includes replying according to the user's dialogue voice.
In this embodiment, after the dialogue voice of the current turn of the user is obtained, the dialogue system recognizes the dialogue voice into text through ASR and performs text error correction. Recognition errors occur during ASR and can cause the generated text to differ greatly from the user's original semantics, so the text recognized by ASR must be corrected. For example: in response to the robot's inquiry, the user's answer is "buy and buy", but during ASR the text may be recognized as "good and good"; after text error correction, the text information becomes "buy and good".
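The ASR post-correction step can be sketched as follows. This is a minimal illustration only: the confusion sets, domain phrases, and function names are hypothetical stand-ins for the trained text-error-correction model described in this application, not its actual implementation.

```python
# Illustrative sketch of ASR post-correction via confusion-set
# candidate generation. All data here is hypothetical.
from itertools import product

# Hypothetical ASR confusion sets: each recognized token maps to
# plausible alternatives the user may actually have said.
CONFUSIONS = {
    "good": ["good", "bought"],
}

# Hypothetical in-domain phrases the corrector prefers.
DOMAIN_PHRASES = {"bought it", "not yet", "how much"}

def correct(asr_text: str) -> str:
    """Generate candidate rewrites from the confusion sets and return
    the first candidate matching a known in-domain phrase; otherwise
    return the ASR output unchanged."""
    tokens = asr_text.split()
    options = [CONFUSIONS.get(t, [t]) for t in tokens]
    for cand in product(*options):
        phrase = " ".join(cand)
        if phrase in DOMAIN_PHRASES:
            return phrase
    return asr_text

print(correct("good it"))      # corrected to "bought it"
print(correct("hello there"))  # unknown phrase passes through unchanged
```

A trained correction model would rank candidates by a language-model score rather than a fixed phrase list; the fallback-to-input behavior mirrors the requirement that correction never lose the user's utterance.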
Step S202, processing the text information through a preset semantic analysis model to obtain intention information, wherein the intention information at least comprises the intention corresponding to the current turn of the user.
In this embodiment, natural language understanding is performed on the text information after the text error correction, and a corresponding intention and slot are generated.
In specific embodiments, for example: natural language understanding (NLU recognition) is performed on "buy and buy" to obtain the intention "purchased a car"; in this case, the intention information corresponding to the "buy and buy" dialogue voice does not include a slot. For another example: in response to the robot's inquiry, the user answers "XX 320, how much is your XX 320?"; natural language understanding (NLU recognition) is performed on this answer to obtain the intention "asking the car price" and the slot "vehicle model: XX 320".
In this embodiment, the generated intention is determined by intention recognition, where the intention recognition includes at least one of: regular matching, bert semantic matching, and the jointbert model. Among these,
regular matching and bert semantic matching achieve intention recognition by merely configuring regular expressions and key sentences, and are applicable to the early cold-start stage and to scenes with newly added intentions.
After data has accumulated to a certain degree, multi-turn intention recognition is performed with a pre-trained jointbert model to improve intention recognition accuracy.
In this embodiment, the generated slot is obtained by slot extraction, where the slot extraction includes at least one of: regular matching, a bert entity labeling model, and the jointbert model. Among these,
the regular matching and bert entity labeling models are suitable for the early cold-start stage and for scenes with newly added slots, and the training data of the bert entity labeling model may adopt an open-source general-purpose data set.
When data has accumulated to a certain degree, a pre-trained jointbert model is used to perform multi-turn slot extraction and improve slot extraction accuracy.
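The cold-start regular-matching approach above, covering both intention recognition and slot extraction, can be sketched as follows. The rules, intention labels, and slot names are hypothetical examples, not this application's actual configuration; named regex groups play the role of slot extractors.

```python
# Illustrative cold-start intent/slot recognition via regular matching.
# Rules, intent labels, and slot names are hypothetical.
import re

# Each rule maps a pattern to an intent; named groups capture slots.
RULES = [
    (re.compile(r"how much .*?(?P<model>XX\s?\d{3})"), "ask_car_price"),
    (re.compile(r"\b(bought|purchased)\b"), "car_purchased"),
]

def recognize(text: str):
    """Return (intent, slots) from the first matching rule,
    or (None, {}) when no rule fires."""
    for pattern, intent in RULES:
        m = pattern.search(text)
        if m:
            return intent, {k: v for k, v in m.groupdict().items() if v}
    return None, {}

print(recognize("how much is your XX 320"))
print(recognize("I bought it already"))
```

When enough labeled dialogues accumulate, the same (intent, slots) interface would be served by the jointbert model instead of the rule table, so downstream state tracking is unaffected by the swap.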
Step S203, obtaining historical response information, and determining the dialog state of the current turn according to the historical response information and the intention information, wherein the historical response information comprises response information generated according to the dialog state of the dialog of the previous turn.
In this embodiment, confirmation of the dialog state of the current turn includes two approaches: configuring the dialog state through DST configuration, and generating the dialog state through the jointbert model.
The DST configuration comprises intention mapping, AI tracking, label reasoning, secondary label generation and the like, and is applicable to early cold-start scenes and to newly added states.
When data has accumulated to a certain degree, the pre-trained jointbert model is used to perform multi-turn state updating and improve state-updating accuracy.
In this embodiment, determining the dialog state of the current turn is based on the robot expression of the historical turns (the dialogue before the current turn, corresponding to the inquiry preceding the current turn) and the answer confirmed from the user's dialogue voice of the current turn. For example: the robot expression of the historical turn is "May I ask whether you have intended to buy a car recently", which carries the semantics of asking about car-buying intention; when the user's dialogue voice of the current turn indicates a purchase, the dialog state of the current turn is determined to be "purchased a car" based on those semantics and the answer corresponding to the user's dialogue voice. For another example: the robot expression of the historical turn is "What car did you buy?", which carries the semantics of asking the vehicle model; when the user's dialogue voice of the current turn is "XX 320, how much is your XX 320?", the dialog state of the current turn is determined to be "purchased a car; asking the car price; purchased model: XX 320".
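A minimal sketch of this state-update logic follows, assuming a simple dictionary representation of the dialog state; the field names (`intents`, `slots`) are hypothetical and stand in for the DST configuration or jointbert model described above.

```python
# Illustrative dialog-state tracking: the new state is the previous
# turn's state merged with the current intent and slots.
def update_state(prev_state: dict, intent: str, slots: dict) -> dict:
    state = dict(prev_state)
    intents = list(state.get("intents", []))
    if intent and intent not in intents:
        intents.append(intent)        # accumulate intents across turns
    state["intents"] = intents
    # later turns may add or overwrite slot values
    state["slots"] = {**state.get("slots", {}), **slots}
    return state

# Turn 1: user indicates a purchase; turn 2: asks the price of XX 320.
s1 = update_state({}, "car_purchased", {})
s2 = update_state(s1, "ask_car_price", {"model": "XX 320"})
print(s2)  # {'intents': ['car_purchased', 'ask_car_price'], 'slots': {'model': 'XX 320'}}
```

The two-turn trace mirrors the worked example above: after the second turn, the state simultaneously records "purchased a car", "asking the car price", and the model slot.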
Step S204, configuring response information corresponding to the conversation state according to a preset response configuration model, and generating response voice corresponding to the response information, wherein the preset response configuration model at least comprises one of the following: a dialogue strategy learning model and a knowledge base question-and-answer model.
In the present embodiment, configuring the response information corresponding to the dialog state includes configuration through a dialogue policy learning (DPL) model and configuration through a knowledge base question answering (KBQA) model, wherein:
the DPL configuration includes global matching, branch matching, affirmative/negative matching, prohibited-content handling, unmatched fallback, and objection-flow functions, and handles most robot-led dialogue content, for example by configuring response information for the robot's guided queries;
the KBQA configuration uses NLP2SQL to query the database and generate the corresponding robot reply for user-led query content.
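As a hedged sketch of this KBQA path, the snippet below stands in for the NLP2SQL step with a single template rule and queries an in-memory knowledge base; the table schema, the rule, and the reply format are assumptions for illustration, not the patent's implementation.

```python
import sqlite3

# A toy knowledge base standing in for the real vehicle database.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE cars (model TEXT, guide_price TEXT)")
conn.execute("INSERT INTO cars VALUES ('AA320', '380,000-400,000')")

def nlp2sql(question: str) -> tuple:
    # A real NLP2SQL model would parse arbitrary questions; here a single
    # template rule recognizes price questions about a known model name.
    if "how much" in question and "AA320" in question:
        return ("SELECT guide_price FROM cars WHERE model = ?", ("AA320",))
    raise ValueError("unrecognized question")

def kbqa_reply(question: str) -> str:
    """User-led query -> SQL -> database lookup -> robot reply text."""
    sql, params = nlp2sql(question)
    price = conn.execute(sql, params).fetchone()[0]
    return f"The official guide price of AA320 is about {price}."

print(kbqa_reply("AA320, how much money is your AA320?"))
```

The parameterized query keeps the generated SQL separate from the user text, which is also the safe pattern for a production NLP2SQL component.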
In this embodiment, after the response information is configured, the robot's response information is converted into response voice by TTS (text-to-speech) technology and transmitted back to the user terminal.
Through the steps S201 to S204, the dialogue voice of the user's current turn is received and preprocessed to obtain text information; the text information is processed through a preset semantic analysis model to obtain intention information; historical response information is acquired, and the dialog state of the current turn is determined according to the historical response information and the intention information; response information corresponding to the dialog state is configured according to a preset response configuration model, and response voice corresponding to the response information is generated. This solves the problems of low dialogue efficiency and poor dialogue effect of dialogue systems in the related art, quickly and effectively implements the AI-robot outbound-call function for each scenario, reduces labor costs, and improves dialogue efficiency and effect.
It should be noted that, in the embodiment of the present application, the NLU configuration and the NLU model are combined to improve the efficiency and effect of intent recognition and slot extraction; the DST configuration and the DST model are combined to improve the efficiency and effect of state updating; the DPL configuration and KBQA are combined to improve the efficiency and effect of script flow. When the amount of dialogue data is small, both the NLU model and the DST model adopt a BERT-based pre-trained model, and when the dialogue data has accumulated to a preset data threshold, a JointBERT model is adopted, which addresses intent recognition, slot extraction, and state updating while improving accuracy.
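The configuration-versus-model hand-over described above can be sketched as a simple threshold switch: rule-based configuration serves cold start, and a pretrained JointBERT-style model takes over once enough dialogue data has accumulated. The threshold value and backend names below are assumptions for illustration.

```python
DATA_THRESHOLD = 10_000  # assumed preset data threshold

def pick_dst_backend(num_dialogues: int) -> str:
    """Return which DST backend handles state updates at this data volume."""
    if num_dialogues < DATA_THRESHOLD:
        return "dst_configuration"   # intent mapping / label reasoning rules
    return "jointbert_model"         # pretrained joint model

print(pick_dst_backend(500))     # cold start -> rule configuration
print(pick_dst_backend(50_000))  # enough accumulated data -> model
```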
In some embodiments, the pre-processing of the conversational speech to obtain the textual information comprises the steps of:
Step 1, performing text conversion processing on the dialogue voice through automatic speech recognition (ASR) technology to obtain a text to be processed.
In this embodiment, after the dialogue speech of the user's current turn is acquired, the dialog system performs ASR to recognize the spoken speech into text, and the text recognized by ASR is the text to be processed.
Step 2, inputting the text to be processed into a text error correction model for text error correction to obtain the text information, wherein the text error correction model is generated by training on a first sample text with preset semantic information, a second sample text without text errors, and a third sample text with text errors.
In this embodiment, recognition errors may occur during ASR, causing the generated text to deviate greatly from the user's original semantics; the text recognized by ASR therefore requires error correction.
Specifically, in reply to the robot's inquiry, the user's original semantics may be "bought", but during ASR the utterance may be recognized as a similar-sounding word (rendered here as "good"); after text error correction, the text information becomes "bought". For the text error correction model, the first sample text carries the intended semantics ("bought"), the second sample text is the corresponding error-free text, and the third sample text contains erroneous texts associated with "bought".
In the above steps, text conversion processing is performed on the dialogue speech through automatic speech recognition to obtain the text to be processed, and the text to be processed is input into the text error correction model for error correction to obtain the text information, so that the text information of the user's dialogue speech can be obtained accurately.
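A minimal sketch of this correction step follows, with a hypothetical homophone-substitution table plus a context flag standing in for the trained error correction model; the table and the context signal are illustrative assumptions.

```python
HOMOPHONE_FIXES = {
    # ASR output -> intended word, valid when the robot just asked about buying
    "good": "bought",
}

def correct_text(asr_text: str, robot_asked_about_buying: bool) -> str:
    """Replace likely ASR homophone errors, gated on dialogue context."""
    if not robot_asked_about_buying:
        return asr_text
    words = asr_text.split()
    return " ".join(HOMOPHONE_FIXES.get(w, w) for w in words)

print(correct_text("good good", robot_asked_about_buying=True))   # bought bought
print(correct_text("good good", robot_asked_about_buying=False))  # good good
```

Gating on context mirrors the idea that the same ASR output should be corrected differently depending on what the robot just asked.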
In some embodiments, the intention information further includes slot position information, and processing the text information through a preset semantic analysis model to obtain the intention information includes the following steps:
step 1, natural language understanding processing is carried out on the text information to obtain candidate intention data, wherein the candidate intention data comprises candidate intentions and candidate slot position information.
In this embodiment, natural language understanding is performed on the text information after text error correction, and corresponding candidate intentions and candidate slot position information are generated.
Step 2, detecting first intention data in the candidate intention data according to a preset intention recognition model, wherein the preset intention recognition model at least comprises one of the following: a regular matching model, a pre-trained semantic matching model, and an intention slot position joint model.
Step 3, when the first intention data is detected, determining that the intention information includes the first intention data, wherein the first intention data includes the intention and the slot position information corresponding to the user's current turn.
In the above steps, natural language understanding processing is performed on the text information to obtain candidate intention data; first intention data is detected in the candidate intention data according to a preset intention recognition model; and when the first intention data is detected, the intention information is determined to include the first intention data, where the first intention data includes the intention and slot position information corresponding to the user's current turn. This implements intent recognition and slot extraction and improves their efficiency and effect.
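The candidate-intent detection above can be sketched with the regular-matching part of the preset recognition model; in the real system a pre-trained semantic matching or joint intent-slot model would also score the candidates. The patterns and label names below are illustrative assumptions.

```python
import re

INTENT_PATTERNS = [
    (r"\bbought\b", "purchased_car"),
    (r"how much", "ask_price"),
]
SLOT_PATTERNS = [(r"\b([A-Z]{2}\d{3})\b", "vehicle_model")]

def recognize(text: str) -> dict:
    """Regular-matching pass: return matched intents and extracted slots."""
    intents = [name for pat, name in INTENT_PATTERNS if re.search(pat, text)]
    slots = {name: m.group(1)
             for pat, name in SLOT_PATTERNS
             if (m := re.search(pat, text))}
    return {"intents": intents, "slots": slots}

print(recognize("AA320, how much money is your AA320?"))
```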
In some embodiments, determining the dialog state for the current turn based on the historical response information and the intent information includes the steps of:
step 1, inputting historical response information and intention information into a dialogue state tracking model, and obtaining a first characteristic value, wherein the first characteristic value comprises semantic characteristic values related to the historical response information and the intention information.
In this embodiment, the first feature value is determined according to the historical response information and the intention information of the current round, and at the same time, the first feature value includes a plurality of semantic feature values associated with the historical response information and the intention information.
Step 2, detecting a preset state feature value in the first feature value, and determining the corresponding state of the current turn according to the preset state feature value.
In this embodiment, the preset state feature value is a target state feature value strongly correlated with the state of the current turn. For example, if the target state feature value is "bought, how much money the AA320 costs", it can be determined that the state of the current turn includes at least the states of car purchased, asking price, and product model.
In the above steps, the historical response information and the intention information are input into the dialog state tracking model to obtain the first feature value; the preset state feature value is detected in the first feature value, and the corresponding state of the current turn is determined according to it. The dialog state of the current turn is thus determined from the historical response information and the intention information, and the dialog state tracking model improves the efficiency and effect of state updating.
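Step 2 above can be sketched as scanning the semantic feature values for preset target values and mapping each hit to a state; the feature-to-state table below is an assumption for illustration, not the model's learned mapping.

```python
TARGET_FEATURES = {
    "bought": "purchased",
    "how much": "asking_price",
    "AA320": "model:AA320",
}

def states_from_features(feature_values: list) -> set:
    """Detect preset (target) feature values and map them to dialog states."""
    states = set()
    for value in feature_values:
        for target, state in TARGET_FEATURES.items():
            if target in value:
                states.add(state)
    return states

print(states_from_features(["bought", "how much money the AA320 costs"]))
```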
In some embodiments, configuring the response information corresponding to the dialog state according to the preset response configuration model includes the following steps:
step 1, extracting first state semantic information of a conversation state, wherein the first state semantic information at least comprises state semantics corresponding to intention information.
In this embodiment, the first state semantic information is information for describing a current turn of dialog state, and the first state semantic information includes intention information and slot position information of the user.
Step 2, inputting the first state semantic information into a preset response configuration model to acquire response information, wherein the preset response configuration model comprises one of the following: a dialogue strategy learning model and a knowledge base question-and-answer model.
In this embodiment, configuring the response information corresponding to the dialog state takes the first state semantic information as the input data source, and configures the corresponding response information from that input through a dialogue policy learning (DPL) model and/or a knowledge base question answering (KBQA) model.
In this embodiment, the dialogue policy learning (DPL) model takes the first state semantic information as the data source and selects the action and reply suitable for the next dialogue turn, that is, the response information corresponding to the current turn's dialog state.
In this embodiment, the knowledge base question answering (KBQA) model takes the first state semantic information as the data basis for querying the knowledge base and reasoning out the response information, and the database query yields the response information corresponding to the current turn's dialog state.
In the above steps, the first state semantic information of the dialog state is extracted and input into the preset response configuration model to acquire the response information, thereby configuring the response information corresponding to the current turn's dialog state.
In some embodiments, the preset response configuration model includes a conversation strategy learning model and a knowledge base question-and-answer model, and configuring the response information corresponding to the conversation state according to the preset response configuration model includes the following steps:
step 1, extracting second state semantic information of the dialog state, wherein the second state semantic information at least comprises state semantics corresponding to the intention information.
In this embodiment, the second state semantic information is information describing the current turn's dialog state, and the second state semantic information includes the user's intention information and slot position information.
Step 2, inputting the second state semantic information into the dialogue strategy learning model, and querying the robot script information corresponding to the second state semantic information, wherein the dialogue strategy learning model is generated by training on first preset state semantic information and the robot script information corresponding to that first preset state semantic information.
In this embodiment, the dialogue strategy model takes the second state semantic information as the data source and selects the robot script information, that is, the response information corresponding to the current turn's dialog state.
Step 3, when no robot script information corresponding to the second state semantic information is found, inputting the second state semantic information into the knowledge base question-answer model, acquiring the response text information corresponding to the second state semantic information, and determining that the response information includes the response text information, wherein the knowledge base question-answer model includes second preset state semantic information and the response text information corresponding to that second preset state semantic information.
In this embodiment, when the dialogue policy learning (DPL) model cannot locate a configured script corresponding to the second state semantic information, a database query is performed through the knowledge base question answering (KBQA) model to obtain the response information corresponding to the current turn's dialog state.
In the above steps, the second state semantic information of the dialog state is extracted; it is input into the dialogue strategy learning model to query the corresponding robot script information; and when no corresponding robot script information is found, the second state semantic information is input into the knowledge base question-answer model to acquire the corresponding response text information, and the response information is determined to include that response text information. This configures the response information corresponding to the current turn's dialog state, and combining the dialogue policy learning (DPL) model with the knowledge base question-answering (KBQA) model improves the efficiency and effect of script flow.
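The DPL-first, KBQA-fallback flow in these steps can be sketched as follows; both lookup tables are illustrative stand-ins for the trained models, and the state keys are assumptions.

```python
DPL_SCRIPTS = {
    # dialog state -> configured robot script
    "purchased": "What car did you buy?",
}
KBQA_ANSWERS = {
    # (dialog state, model slot) -> knowledge-base answer
    ("asking_price", "AA320"): "The official guide price of AA320 is about 380,000-400,000.",
}

def configure_response(state_semantics: dict) -> str:
    # Step 2: try the dialogue strategy learning model first.
    script = DPL_SCRIPTS.get(state_semantics["state"])
    if script is not None:
        return script
    # Step 3: no script found -> fall back to the knowledge-base QA model.
    key = (state_semantics["state"], state_semantics.get("model"))
    return KBQA_ANSWERS[key]

print(configure_response({"state": "purchased"}))
print(configure_response({"state": "asking_price", "model": "AA320"}))
```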
In some embodiments, configuring the response information corresponding to the dialog state according to the preset response configuration model includes the following step: when the robot script information corresponding to the second state semantic information is queried, determining that the response information includes that robot script information.
In some embodiments, generating the response voice corresponding to the response information comprises the following step: performing voice conversion on the response information to generate the response voice.
The human-machine dialogue process is analyzed below with the dialogue of a specific embodiment.
An example dialogue is as follows:
Robot script: "May I ask whether you have intended to buy a car recently?"
User dialogue speech: "Bought, bought."
Robot script: "What car did you buy?"
User dialogue speech: "AA320, how much is your AA320?"
Robot script: "The official guide price of the AA320 is about 380,000-400,000."
The human-machine conversation process is analyzed as follows:
Step 1, the robot asks "May I ask whether you have intended to buy a car recently?" The dialogue speech replied by the user is recognized by ASR as the similar-sounding "good, good", which text error correction changes to "bought, bought".
Step 2, NLU recognition is performed on "bought, bought", obtaining the intention of "car purchased".
Step 3, the DST automatically updates the dialog state of the current turn to "car purchased".
Step 4, according to the DPL branch, the state "car purchased" is located, and the automatically replied response information (robot script) "What car did you buy?" is configured.
Step 5, TTS converts "What car did you buy?" into a voice reply to the user.
Step 6, the dialogue speech replied by the user is recognized by ASR and corrected by text error correction to become "AA320, how much is your AA320?".
Step 7, NLU recognition is performed on "AA320, how much is your AA320?", obtaining the intention of "asking car price" and the slot "vehicle model: AA320".
Step 8, the DST automatically updates the dialog state of the current turn to "car purchased, asking car price, purchased car model: AA320".
Step 9, the DPL configuration does not locate a script for "asking car price", so a database query is performed through KBQA, and the reply corresponding to the current turn's dialog state is obtained: "The official guide price of the AA320 is about 380,000-400,000."
Step 10, TTS converts "The official guide price of the AA320 is about 380,000-400,000." into a voice reply to the user.
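The ten analysis steps above can be sketched end to end in one hypothetical turn-handling function (corrected ASR text in, reply out); every rule and table below is an illustrative stand-in for the NLU, DST, DPL, and KBQA components, covering only this example dialogue.

```python
DPL = {"purchased": "What car did you buy?"}
KBQA = {"AA320": "The official guide price of AA320 is about 380,000-400,000."}

def handle_turn(state: dict, corrected_text: str) -> tuple:
    """One turn: toy NLU -> DST update -> DPL-first / KBQA-fallback reply."""
    # NLU: toy intent/slot extraction for this example dialogue only.
    if "buy" in corrected_text:
        state["state"] = "purchased"
    if "how much" in corrected_text:
        state["state"] = "asking_price"
        state["model"] = corrected_text.split(",")[0].strip()
    # Response configuration: DPL first, KBQA fallback (steps 4 and 9).
    reply = DPL.get(state["state"]) or KBQA[state["model"]]
    return state, reply

state = {}
state, r1 = handle_turn(state, "buy, buy.")
state, r2 = handle_turn(state, "AA320, how much money is your AA320?")
print(r1)
print(r2)
```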
It should be noted that the steps illustrated in the above-described flow diagrams or in the flow diagrams of the figures may be performed in a computer system, such as a set of computer-executable instructions, and that, although a logical order is illustrated in the flow diagrams, in some cases, the steps illustrated or described may be performed in an order different than here.
The embodiment also provides a man-machine interaction device, which is used for implementing the above embodiments and preferred embodiments, and the description of the device is omitted. As used hereinafter, the terms "module," "unit," "subunit," and the like may implement a combination of software and/or hardware for a predetermined function. Although the means described in the embodiments below are preferably implemented in software, an implementation in hardware, or a combination of software and hardware is also possible and contemplated.
Fig. 3 is a block diagram of a human-machine interaction device according to an embodiment of the present application, and as shown in fig. 3, the device includes:
the conversion module 31 is configured to receive a current turn of dialogue speech of a user, and preprocess the dialogue speech to obtain text information, where the preprocessing includes text conversion and text error correction;
the generating module 32 is coupled to the converting module 31 and configured to process the text information through a preset semantic analysis model to obtain intention information, where the intention information at least includes an intention corresponding to the current turn of the user;
the processing module 33 is coupled to the generating module 32, and is configured to acquire historical response information, and determine a dialog state of a current turn according to the historical response information and the intention information, where the historical response information includes response information generated according to a dialog state of a dialog of a previous turn;
the response module 34 is coupled to the processing module 33, and configured to configure response information corresponding to the dialog state according to a preset response configuration model, and generate a response voice corresponding to the response information, where the preset response configuration model at least includes one of: a dialogue strategy learning model and a knowledge base question-and-answer model.
In some embodiments, the conversion module 31 is configured to perform text conversion processing on the dialog speech through an automatic speech recognition technology to obtain a text to be processed; inputting the text to be processed into a text error correction model for text error correction to obtain text information, wherein the text error correction model is generated by training according to a first sample text of preset semantic information, a second sample text without text errors and a third sample text with text errors.
In some embodiments, the intention information further includes slot position information, and the generating module 32 is configured to perform natural language understanding processing on the text information to obtain candidate intention data, where the candidate intention data includes candidate intentions and candidate slot position information; detect first intention data in the candidate intention data according to a preset intention recognition model, where the preset intention recognition model at least comprises one of the following: a regular matching model, a pre-trained semantic matching model, and an intention slot position joint model; and, when the first intention data is detected, determine that the intention information includes the first intention data, where the first intention data includes the intention and slot position information corresponding to the user's current turn.
In some embodiments, the processing module 33 is configured to input the historical response information and the intention information into the dialog state tracking model, and obtain a first feature value, where the first feature value includes a semantic feature value associated with the historical response information and the intention information; and detecting a preset state characteristic value in the first characteristic value, and determining the corresponding state of the current turn according to the preset state characteristic value.
In some embodiments, the response module 34 is configured to extract first state semantic information of the dialog state, where the first state semantic information includes at least a state semantic corresponding to the intention information; inputting the first state semantic information into a preset response configuration model to acquire response information, wherein the preset response configuration model comprises one of the following: a dialogue strategy learning model and a knowledge base question-and-answer model.
In some embodiments, the preset response configuration model includes a dialogue strategy learning model and a knowledge base question-and-answer model, and the response module 34 is configured to extract second state semantic information of the dialogue state, where the second state semantic information at least includes state semantics corresponding to the intention information; input the second state semantic information into the dialogue strategy learning model, and query the robot script information corresponding to the second state semantic information, where the dialogue strategy learning model is generated by training on first preset state semantic information and the robot script information corresponding to it; and, when no robot script information corresponding to the second state semantic information is found, input the second state semantic information into the knowledge base question-answer model, acquire the response text information corresponding to the second state semantic information, and determine that the response information includes the response text information, where the knowledge base question-answer model includes second preset state semantic information and the response text information corresponding to it.
In some embodiments, the response module 34 is configured to determine that the response information includes the robot script information corresponding to the second state semantic information when that robot script information is queried.
In some embodiments, the response module 34 is configured to perform voice conversion on the response information to generate the response voice.
The above modules may be functional modules or program modules, and may be implemented by software or hardware. For a module implemented by hardware, the modules may be located in the same processor; or the modules can be respectively positioned in different processors in any combination.
The present embodiment also provides an electronic device comprising a memory having a computer program stored therein and a processor configured to execute the computer program to perform the steps of any of the above method embodiments.
Optionally, the electronic apparatus may further include a transmission device and an input/output device, wherein the transmission device is connected to the processor, and the input/output device is connected to the processor.
Optionally, in this embodiment, the processor may be configured to execute the following steps by a computer program:
and S1, receiving the current turn of dialogue voice of the user, and preprocessing the dialogue voice to obtain text information, wherein the preprocessing comprises text conversion and text error correction.
And S2, processing the text information through a preset semantic analysis model to obtain intention information, wherein the intention information at least comprises the intention corresponding to the current turn of the user.
And S3, acquiring historical response information, and determining the dialog state of the current turn according to the historical response information and the intention information, wherein the historical response information comprises response information generated according to the dialog state of the dialog of the previous turn.
S4, configuring response information corresponding to the dialogue state according to a preset response configuration model, and generating response voice corresponding to the response information, wherein the preset response configuration model at least comprises one of the following: a dialogue strategy learning model and a knowledge base question-and-answer model.
It should be noted that, for specific examples in this embodiment, reference may be made to examples described in the foregoing embodiments and optional implementations, and details of this embodiment are not described herein again.
In addition, in combination with the man-machine interaction method in the foregoing embodiments, the embodiments of the present application may provide a storage medium to implement. The storage medium having stored thereon a computer program; the computer program, when executed by a processor, implements any of the human-machine interaction methods of the above embodiments.
It should be understood by those skilled in the art that various features of the above embodiments can be combined arbitrarily, and for the sake of brevity, all possible combinations of the features in the above embodiments are not described, but should be considered as within the scope of the present disclosure as long as there is no contradiction between the combinations of the features.
The above examples only express several embodiments of the present application, and the description thereof is more specific and detailed, but not construed as limiting the scope of the invention. It should be noted that, for a person skilled in the art, several variations and modifications can be made without departing from the concept of the present application, which falls within the scope of protection of the present application. Therefore, the protection scope of the present patent shall be subject to the appended claims.

Claims (11)

1. A method for human-computer interaction, comprising:
receiving conversation voice of a current turn of a user, and preprocessing the conversation voice to obtain text information, wherein the preprocessing comprises text conversion and text error correction;
processing the text information through a preset semantic analysis model to obtain intention information, wherein the intention information at least comprises an intention corresponding to the current turn of the user;
obtaining historical response information, and determining the dialog state of the current turn according to the historical response information and the intention information, wherein the historical response information comprises response information generated according to the dialog state of the dialog of the previous turn;
configuring the response information corresponding to the dialogue state according to a preset response configuration model, and generating response voice corresponding to the response information, wherein the preset response configuration model at least comprises one of the following: a dialogue strategy learning model and a knowledge base question-and-answer model.
2. The human-computer interaction method of claim 1, wherein preprocessing the conversation speech to obtain text information comprises:
performing text conversion processing on the dialogue voice through an automatic speech recognition technology to obtain a text to be processed;
and inputting the text to be processed into a text error correction model for text error correction to obtain the text information, wherein the text error correction model is generated by training according to a first sample text of preset semantic information, a second sample text without text errors and a third sample text with text errors.
3. The human-computer interaction method according to claim 1, wherein the intention information further includes slot position information, and the processing the text information through a preset semantic analysis model to obtain the intention information includes:
performing natural language understanding processing on the text information to obtain candidate intention data, wherein the candidate intention data comprises candidate intentions and candidate slot position information;
detecting first intention data in the candidate intention data according to a preset intention recognition model, wherein the preset intention recognition model at least comprises one of the following: a regular matching model, a pre-trained semantic matching model, and an intention slot position joint model;
in an instance in which the first intent data is detected, determining that the intent information includes the first intent data, wherein the first intent data includes an intent corresponding to the user's current turn and the slot information.
4. The human-computer interaction method of claim 1, wherein determining a dialog state of a current turn based on the historical response information and the intention information comprises:
inputting the historical response information and the intention information into a dialogue state tracking model, and acquiring a first characteristic value, wherein the first characteristic value comprises a semantic characteristic value associated with the historical response information and the intention information;
and detecting a preset state characteristic value in the first characteristic value, and determining the corresponding state of the current turn according to the preset state characteristic value.
5. The human-computer interaction method of claim 1, wherein configuring the response information corresponding to the dialog state according to a preset response configuration model comprises:
extracting first state semantic information of the dialog state, wherein the first state semantic information at least comprises state semantics corresponding to the intention information;
inputting the first state semantic information into the preset response configuration model to obtain the response information, wherein the preset response configuration model comprises one of the following: a dialogue strategy learning model and a knowledge base question-and-answer model.
6. The human-computer interaction method of claim 1, wherein the preset response configuration model comprises a dialogue strategy learning model and a knowledge-base question-answering model, and configuring the response information corresponding to the dialogue state according to the preset response configuration model comprises:
extracting second state semantic information of the dialogue state, wherein the second state semantic information comprises at least state semantics corresponding to the intention information;
inputting the second state semantic information into the dialogue strategy learning model, and querying robot speech information corresponding to the second state semantic information, wherein the dialogue strategy learning model is generated by training on first preset state semantic information and the robot speech information corresponding to the first preset state semantic information;
in a case where no robot speech information corresponding to the second state semantic information is found, inputting the second state semantic information into the knowledge-base question-answering model, obtaining response text information corresponding to the second state semantic information, and determining that the response information comprises the response text information, wherein the knowledge-base question-answering model comprises second preset state semantic information and the response text information corresponding to the second preset state semantic information.
7. The human-computer interaction method of claim 6, wherein, in a case where the robot speech information corresponding to the second state semantic information is found, it is determined that the response information comprises the robot speech information corresponding to the second state semantic information.
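The two-stage response configuration of claims 6 and 7 amounts to a primary lookup with a fallback. The sketch below uses plain dictionaries as hypothetical stand-ins for the trained dialogue strategy learning model and the knowledge-base question-answering model; the states and utterances are invented for illustration.

```python
# Stand-in for the dialogue strategy learning model: maps state
# semantics to a scripted robot utterance (claim 7 path).
POLICY = {
    "quoting": "The SUV starts at 150,000 yuan. Would you like a test drive?",
    "opening": "Hello! How can I help you today?",
}

# Stand-in for the knowledge-base question-answering model: maps
# state semantics to response text information (claim 6 fallback).
KNOWLEDGE_BASE = {
    "warranty_term": "The standard warranty is three years or 100,000 km.",
}

def configure_response(state_semantics):
    """Query the strategy model first; only when no robot speech
    information is found, fall back to the knowledge-base QA model."""
    answer = POLICY.get(state_semantics)
    if answer is None:
        answer = KNOWLEDGE_BASE.get(state_semantics)
    return answer

print(configure_response("opening"))
print(configure_response("warranty_term"))
```

Returning `None` when both lookups miss leaves room for a default clarification prompt, a detail the claims do not specify.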
8. The human-computer interaction method of claim 1, wherein generating the response voice corresponding to the response information comprises: performing speech synthesis on the response information to generate the response voice.
9. A human-computer interaction device, comprising:
a conversion module, configured to receive the dialogue voice of the current turn of a user and preprocess the dialogue voice to obtain text information, wherein the preprocessing comprises text conversion and text error correction;
a generating module, configured to process the text information through a preset semantic analysis model to obtain intention information, wherein the intention information comprises at least an intent corresponding to the current turn of the user;
a processing module, configured to acquire historical response information and determine the dialogue state of the current turn according to the historical response information and the intention information, wherein the historical response information comprises response information generated according to the dialogue state of the previous turn;
a response module, configured to configure the response information corresponding to the dialogue state according to a preset response configuration model and generate a response voice corresponding to the response information, wherein the preset response configuration model comprises at least one of the following: a dialogue strategy learning model and a knowledge-base question-answering model.
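The four modules of claim 9 form a per-turn pipeline: ASR and error correction, semantic analysis, state tracking against history, then response configuration. The wiring below shows one way the modules might compose; every module body is a placeholder, since the claim specifies responsibilities, not implementations.

```python
class ConversionModule:
    def process(self, speech):
        # text conversion (ASR) plus text error correction, stubbed
        # here as simple normalization of an already-textual input
        return speech.lower().strip()

class GenerationModule:
    def process(self, text):
        # preset semantic analysis model -> intention information
        return {"intent": "ask_price" if "price" in text else "chitchat",
                "slots": {}}

class ProcessingModule:
    def __init__(self):
        self.history = []  # response information from previous turns

    def process(self, intention):
        # dialogue state from current intent + conversation history
        return f"{intention['intent']}_after_{len(self.history)}_turns"

class ResponseModule:
    def process(self, state):
        # response configuration; a real device would also run
        # speech synthesis on the configured text
        return f"[reply for state: {state}]"

class DialogueDevice:
    """Wires the four modules into one turn-by-turn pipeline."""
    def __init__(self):
        self.conv = ConversionModule()
        self.gen = GenerationModule()
        self.proc = ProcessingModule()
        self.resp = ResponseModule()

    def turn(self, speech):
        text = self.conv.process(speech)
        intention = self.gen.process(text)
        state = self.proc.process(intention)
        reply = self.resp.process(state)
        self.proc.history.append(reply)  # becomes historical response info
        return reply

device = DialogueDevice()
print(device.turn("What is the PRICE?"))
```

Keeping history inside the processing module mirrors the claim's requirement that the dialogue state of the current turn depend on response information from the previous turn.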
10. An electronic device, comprising a memory and a processor, wherein a computer program is stored in the memory, and the processor is configured to execute the computer program to perform the human-computer interaction method of any one of claims 1 to 8.
11. A storage medium, in which a computer program is stored, wherein the computer program is arranged to perform the human-computer interaction method of any one of claims 1 to 8 when executed.
CN202011245627.5A 2020-11-10 2020-11-10 Man-machine interaction method, device, electronic device and storage medium Pending CN112365892A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011245627.5A CN112365892A (en) 2020-11-10 2020-11-10 Man-machine interaction method, device, electronic device and storage medium


Publications (1)

Publication Number Publication Date
CN112365892A true CN112365892A (en) 2021-02-12

Family

ID=74508459

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011245627.5A Pending CN112365892A (en) 2020-11-10 2020-11-10 Man-machine interaction method, device, electronic device and storage medium

Country Status (1)

Country Link
CN (1) CN112365892A (en)

Cited By (15)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112860873A (en) * 2021-03-23 2021-05-28 北京小米移动软件有限公司 Intelligent response method, device and storage medium
CN112966077A (en) * 2021-02-26 2021-06-15 北京三快在线科技有限公司 Method, device and equipment for determining conversation state and storage medium
CN112988997A (en) * 2021-03-12 2021-06-18 中国平安财产保险股份有限公司 Response method and system of intelligent customer service, computer equipment and storage medium
CN113160813A (en) * 2021-02-24 2021-07-23 北京三快在线科技有限公司 Method and device for outputting response information, electronic equipment and storage medium
CN113270103A (en) * 2021-05-27 2021-08-17 平安普惠企业管理有限公司 Intelligent voice dialogue method, device, equipment and medium based on semantic enhancement
CN113360622A (en) * 2021-06-22 2021-09-07 中国平安财产保险股份有限公司 User dialogue information processing method and device and computer equipment
CN113656572A (en) * 2021-08-26 2021-11-16 支付宝(杭州)信息技术有限公司 Conversation processing method and system
CN113821731A (en) * 2021-11-23 2021-12-21 湖北亿咖通科技有限公司 Information push method, device and medium
CN113990302A (en) * 2021-09-14 2022-01-28 北京左医科技有限公司 Telephone follow-up voice recognition method, device and system
CN114490994A (en) * 2022-03-28 2022-05-13 北京沃丰时代数据科技有限公司 Conversation management method and device
CN114936561A (en) * 2022-04-11 2022-08-23 阿里巴巴(中国)有限公司 Voice text processing method and device, storage medium and processor
CN114970559A (en) * 2022-05-18 2022-08-30 马上消费金融股份有限公司 Intelligent response method and device
WO2022252946A1 (en) * 2021-06-03 2022-12-08 广州小鹏汽车科技有限公司 Voice control method, voice control device, server, and storage medium
CN116050427A (en) * 2022-12-30 2023-05-02 北京百度网讯科技有限公司 Information generation method, training device, electronic equipment and storage medium
CN117332072A (en) * 2023-12-01 2024-01-02 阿里云计算有限公司 Dialogue processing, voice abstract extraction and target dialogue model training method

Citations (22)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2003316801A (en) * 2002-04-25 2003-11-07 Nec Corp Answering system, answering device, answering method and answering program
CN106354835A (en) * 2016-08-31 2017-01-25 上海交通大学 Artificial dialogue auxiliary system based on context semantic understanding
CN106534548A (en) * 2016-11-17 2017-03-22 科大讯飞股份有限公司 Voice error correction method and device
CN106776578A (en) * 2017-01-03 2017-05-31 竹间智能科技(上海)有限公司 Talk with the method and device of performance for lifting conversational system
CN107369443A (en) * 2017-06-29 2017-11-21 北京百度网讯科技有限公司 Dialogue management method and device based on artificial intelligence
CN107562911A (en) * 2017-09-12 2018-01-09 北京首科长昊医疗科技有限公司 Multi-turn interaction probabilistic model training method and automatic answering method
US20180158459A1 (en) * 2016-12-06 2018-06-07 Panasonic Intellectual Property Management Co., Ltd. Information processing method, information processing apparatus, and non-transitory recording medium
CN108304561A (en) * 2018-02-08 2018-07-20 北京信息职业技术学院 Semantic understanding method, device and robot based on limited data
CN108369521A (en) * 2015-09-02 2018-08-03 埃丹帝弗有限公司 Intelligent virtual assistance system and correlation technique
CN109344242A (en) * 2018-09-28 2019-02-15 广东工业大学 Dialogue answering method, apparatus, device and storage medium
CN109492221A (en) * 2018-10-31 2019-03-19 广东小天才科技有限公司 Information reply method based on semantic analysis, and wearable device
CN109688281A (en) * 2018-12-03 2019-04-26 复旦大学 Intelligent voice interaction method and system
CN110083110A (en) * 2019-01-23 2019-08-02 艾肯特公司 End to end control method and control system based on natural intelligence
CN110196901A (en) * 2019-06-28 2019-09-03 北京百度网讯科技有限公司 Construction method, device, computer equipment and the storage medium of conversational system
WO2019174450A1 (en) * 2018-03-15 2019-09-19 北京京东尚科信息技术有限公司 Dialogue generation method and apparatus
CN110765252A (en) * 2019-10-18 2020-02-07 北京邮电大学 Knowledge-driven task-oriented dialogue management method and system easy to configure
CN110888966A (en) * 2018-09-06 2020-03-17 微软技术许可有限责任公司 Natural language question answering
CN111105782A (en) * 2019-11-27 2020-05-05 深圳追一科技有限公司 Session interaction processing method and device, computer equipment and storage medium
CN111477231A (en) * 2019-01-24 2020-07-31 科沃斯商用机器人有限公司 Man-machine interaction method, device and storage medium
WO2020177592A1 (en) * 2019-03-05 2020-09-10 京东方科技集团股份有限公司 Painting question answering method and device, painting question answering system, and readable storage medium
CN111858884A (en) * 2020-06-24 2020-10-30 南京美桥信息科技有限公司 Method and system for robot to learn real person deep dialogue content
CN111862955A (en) * 2020-06-23 2020-10-30 北京嘀嘀无限科技发展有限公司 Voice recognition method, terminal and computer readable storage medium


Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
GRIOL, D.: "Agent Simulation to Develop Interactive and User-Centered Conversational Agents", International Symposium on Distributed Computing and Artificial Intelligence, 8 April 2011 (2011-04-08) *
Zhen Jiangjie: "Research and Implementation of a Multi-level Semantic Model in Multi-turn Dialogue Systems", China Master's Theses Full-text Database (Information Science and Technology), 15 January 2019 (2019-01-15) *

Cited By (23)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113160813A (en) * 2021-02-24 2021-07-23 北京三快在线科技有限公司 Method and device for outputting response information, electronic equipment and storage medium
CN113160813B (en) * 2021-02-24 2022-12-27 北京三快在线科技有限公司 Method and device for outputting response information, electronic equipment and storage medium
CN112966077B (en) * 2021-02-26 2022-06-07 北京三快在线科技有限公司 Method, device and equipment for determining conversation state and storage medium
CN112966077A (en) * 2021-02-26 2021-06-15 北京三快在线科技有限公司 Method, device and equipment for determining conversation state and storage medium
CN112988997A (en) * 2021-03-12 2021-06-18 中国平安财产保险股份有限公司 Response method and system of intelligent customer service, computer equipment and storage medium
CN112860873A (en) * 2021-03-23 2021-05-28 北京小米移动软件有限公司 Intelligent response method, device and storage medium
CN112860873B (en) * 2021-03-23 2024-03-05 北京小米移动软件有限公司 Intelligent response method, device and storage medium
CN113270103A (en) * 2021-05-27 2021-08-17 平安普惠企业管理有限公司 Intelligent voice dialogue method, device, equipment and medium based on semantic enhancement
WO2022252946A1 (en) * 2021-06-03 2022-12-08 广州小鹏汽车科技有限公司 Voice control method, voice control device, server, and storage medium
CN113360622A (en) * 2021-06-22 2021-09-07 中国平安财产保险股份有限公司 User dialogue information processing method and device and computer equipment
CN113360622B (en) * 2021-06-22 2023-10-24 中国平安财产保险股份有限公司 User dialogue information processing method and device and computer equipment
CN113656572A (en) * 2021-08-26 2021-11-16 支付宝(杭州)信息技术有限公司 Conversation processing method and system
CN113990302A (en) * 2021-09-14 2022-01-28 北京左医科技有限公司 Telephone follow-up voice recognition method, device and system
CN113821731A (en) * 2021-11-23 2021-12-21 湖北亿咖通科技有限公司 Information push method, device and medium
CN114490994A (en) * 2022-03-28 2022-05-13 北京沃丰时代数据科技有限公司 Conversation management method and device
CN114490994B (en) * 2022-03-28 2022-06-28 北京沃丰时代数据科技有限公司 Conversation management method and device
CN114936561A (en) * 2022-04-11 2022-08-23 阿里巴巴(中国)有限公司 Voice text processing method and device, storage medium and processor
CN114970559A (en) * 2022-05-18 2022-08-30 马上消费金融股份有限公司 Intelligent response method and device
CN114970559B (en) * 2022-05-18 2024-02-02 马上消费金融股份有限公司 Intelligent response method and device
CN116050427B (en) * 2022-12-30 2023-10-27 北京百度网讯科技有限公司 Information generation method, training device, electronic equipment and storage medium
CN116050427A (en) * 2022-12-30 2023-05-02 北京百度网讯科技有限公司 Information generation method, training device, electronic equipment and storage medium
CN117332072A (en) * 2023-12-01 2024-01-02 阿里云计算有限公司 Dialogue processing, voice abstract extraction and target dialogue model training method
CN117332072B (en) * 2023-12-01 2024-02-13 阿里云计算有限公司 Dialogue processing, voice abstract extraction and target dialogue model training method

Similar Documents

Publication Publication Date Title
CN112365892A (en) Man-machine interaction method, device, electronic device and storage medium
CN109616108B (en) Multi-turn dialogue interaction processing method and device, electronic equipment and storage medium
US20200301954A1 (en) Reply information obtaining method and apparatus
US11315560B2 (en) Method for conducting dialog between human and computer
CN110019687B (en) Multi-intention recognition system, method, equipment and medium based on knowledge graph
CN111737987B (en) Intention recognition method, device, equipment and storage medium
CN110347863A Speech script recommendation method and device, and storage medium
CN112084317B (en) Method and apparatus for pre-training language model
CN112364622B (en) Dialogue text analysis method, device, electronic device and storage medium
CN108628908B (en) Method, device and electronic equipment for classifying user question-answer boundaries
CN117521675A (en) Information processing method, device, equipment and storage medium based on large language model
CN111178081A (en) Semantic recognition method, server, electronic device and computer storage medium
CN114155853A (en) Rejection method, device, equipment and storage medium
CN112256856A (en) Robot dialogue method, device, electronic device and storage medium
CN110517672B (en) User intention recognition method, user instruction execution method, system and equipment
CN113901837A (en) Intention understanding method, device, equipment and storage medium
CN114490955A (en) Intelligent dialogue method, device, equipment and computer storage medium
CN113342945A (en) Voice session processing method and device
CN116052646B (en) Speech recognition method, device, storage medium and computer equipment
CN115658875B (en) Data processing method based on chat service and related products
CN117370512A (en) Method, device, equipment and storage medium for replying to dialogue
CN111128127A (en) Voice recognition processing method and device
CN110222161B (en) Intelligent response method and device for conversation robot
CN111046149A (en) Content recommendation method and device, electronic equipment and storage medium
CN117493582B (en) Model result output method and device, electronic equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination