CN116050427B - Information generation method, training device, electronic equipment and storage medium - Google Patents

Information generation method, training device, electronic equipment and storage medium

Info

Publication number
CN116050427B
Authority
CN
China
Prior art keywords
information
sample
understanding
dialogue
query
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202211742317.3A
Other languages
Chinese (zh)
Other versions
CN116050427A (en)
Inventor
李彬
胡江鹭
孙辉丰
孙叔琦
常月
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Baidu Netcom Science and Technology Co Ltd
Original Assignee
Beijing Baidu Netcom Science and Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Baidu Netcom Science and Technology Co Ltd filed Critical Beijing Baidu Netcom Science and Technology Co Ltd
Priority to CN202211742317.3A
Publication of CN116050427A
Application granted
Publication of CN116050427B
Active legal status
Anticipated expiration legal status

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/30 Semantic analysis

Abstract

The present disclosure provides an information generation method, a training method, an information generation device, an electronic device and a storage medium, and relates to the technical field of artificial intelligence, in particular to the technical fields of natural language processing and deep learning. The specific implementation scheme is as follows: semantic understanding is performed on query information to obtain understanding information, wherein the query information includes historical dialogue information, and the understanding information includes object actions and dialogue states; in response to detecting an auxiliary request instruction, auxiliary request information is obtained according to the query information and the understanding information; and dialogue response information is generated according to the query information, the understanding information and the auxiliary request information.

Description

Information generation method, training method, apparatus, electronic device and storage medium
Technical Field
The present disclosure relates to the field of artificial intelligence, in particular to the fields of natural language processing and deep learning, and more particularly to an information generation method, a training method, an apparatus, an electronic device and a storage medium.
Background
With the development of artificial intelligence technology, task-based conversations (Task Oriented Dialogue, TOD) may be implemented using artificial intelligence technology.
A task-oriented dialogue may refer to a system that needs to guide a user to complete a conversational task and achieve a conversational goal within a limited number of dialogue turns while accessing an external database. For example, the conversational tasks may include at least one of: a query task, a recommendation task and a reservation task. The conversational goals may include at least one of: querying the weather, recommending scenic spots, reserving a hotel, and the like.
Disclosure of Invention
The disclosure provides an information generation method, a training method, an apparatus, an electronic device and a storage medium.
According to an aspect of the present disclosure, there is provided an information generating method including: performing semantic understanding on query information to obtain understanding information, wherein the query information includes historical dialogue information, and the understanding information includes object actions and dialogue states; in response to detecting an auxiliary request instruction, obtaining auxiliary request information according to the query information and the understanding information; and generating dialogue response information according to the query information, the understanding information and the auxiliary request information.
According to another aspect of the present disclosure, there is provided a training method of a pre-training model, including: performing semantic understanding on first sample query information to obtain first sample understanding information, wherein the first sample query information includes first sample historical dialogue information, and the first sample understanding information includes first sample object actions and first sample dialogue states; obtaining first sample auxiliary request information according to the first sample query information and the first sample understanding information; generating first sample dialogue response information according to the first sample query information, the first sample understanding information and the first sample auxiliary request information; and training a pre-training dialogue generation model by using the first sample query information, the first sample understanding information and the first sample dialogue response information to obtain an information generation model.
According to another aspect of the present disclosure, there is provided an information generating apparatus including: a first semantic understanding module, configured to perform semantic understanding on query information to obtain understanding information, wherein the query information includes historical dialogue information, and the understanding information includes object actions and dialogue states; a first obtaining module, configured to obtain auxiliary request information according to the query information and the understanding information in response to detecting an auxiliary request instruction; and a first generation module, configured to generate dialogue response information according to the query information, the understanding information and the auxiliary request information.
According to another aspect of the present disclosure, there is provided a training apparatus of a pre-training model, including: a second semantic understanding module, configured to perform semantic understanding on first sample query information to obtain first sample understanding information, wherein the first sample query information includes first sample historical dialogue information, and the first sample understanding information includes first sample object actions and first sample dialogue states; a second obtaining module, configured to obtain first sample auxiliary request information according to the first sample query information and the first sample understanding information; a second generating module, configured to generate first sample dialogue response information according to the first sample query information, the first sample understanding information and the first sample auxiliary request information; and a training module, configured to train a pre-training dialogue generation model by using the first sample query information, the first sample understanding information and the first sample dialogue response information to obtain an information generation model.
According to another aspect of the present disclosure, there is provided an electronic device including: at least one processor; and a memory communicatively coupled to the at least one processor; wherein the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the methods described in the present disclosure.
According to another aspect of the present disclosure, there is provided a non-transitory computer-readable storage medium storing computer instructions for causing a computer to perform the method described in the present disclosure.
According to another aspect of the present disclosure, there is provided a computer program product comprising a computer program which, when executed by a processor, implements a method as described in the present disclosure.
It should be understood that the description in this section is not intended to identify key or critical features of the embodiments of the disclosure, nor is it intended to be used to limit the scope of the disclosure. Other features of the present disclosure will become apparent from the following specification.
Drawings
The drawings are for a better understanding of the present solution and are not to be construed as limiting the present disclosure. Wherein:
FIG. 1 schematically illustrates an exemplary system architecture to which information generation methods, training methods of pre-training models, and apparatus may be applied, according to embodiments of the present disclosure;
FIG. 2 schematically illustrates a flow chart of an information generation method according to an embodiment of the disclosure;
FIG. 3A schematically illustrates an example schematic diagram of an information generation process according to an embodiment of the disclosure;
FIG. 3B schematically illustrates an example schematic diagram of an information generation process according to another embodiment of the present disclosure;
FIG. 4A schematically illustrates an example schematic diagram of generating dialog response information based on first fusion information in accordance with an embodiment of the disclosure;
FIG. 4B schematically illustrates an example schematic diagram of generating dialogue response information from first fusion information according to another embodiment of the disclosure;
FIG. 4C schematically illustrates an example schematic diagram of generating dialogue response information from first fusion information according to another embodiment of the disclosure;
FIG. 4D schematically illustrates an example schematic diagram of generating dialogue response information from first fusion information according to another embodiment of the present disclosure;
FIG. 5A schematically illustrates an example schematic diagram of semantic understanding of query information resulting in understanding information according to an embodiment of the present disclosure;
FIG. 5B schematically illustrates an example schematic diagram of semantic understanding of query information resulting in understanding information according to another embodiment of the present disclosure;
FIG. 6 schematically illustrates a flow chart of a training method of a pre-training model according to an embodiment of the disclosure;
FIG. 7A schematically illustrates an example schematic diagram of a method of generating a real corpus in accordance with an embodiment of the disclosure;
FIG. 7B schematically illustrates an example schematic diagram of a method of generating a simulated corpus in accordance with an embodiment of the disclosure;
fig. 8 schematically shows a block diagram of an information generating apparatus according to an embodiment of the present disclosure;
FIG. 9 schematically illustrates a block diagram of a training apparatus of a pre-training model according to an embodiment of the present disclosure; and
fig. 10 schematically illustrates a block diagram of an electronic device adapted to implement the information generation method and the training method of the pre-training model, according to an embodiment of the disclosure.
Detailed Description
Exemplary embodiments of the present disclosure are described below in conjunction with the accompanying drawings, which include various details of the embodiments of the present disclosure to facilitate understanding, and should be considered as merely exemplary. Accordingly, one of ordinary skill in the art will recognize that various changes and modifications of the embodiments described herein can be made without departing from the scope and spirit of the present disclosure. Also, descriptions of well-known functions and constructions are omitted in the following description for clarity and conciseness.
For this reason, the embodiment of the present disclosure proposes an information generation scheme. For example, semantic understanding is performed on the query information to obtain understanding information. The query information includes historical dialog information, and the understanding information includes object actions and dialog states. And responding to the detection of the auxiliary request instruction, and obtaining auxiliary request information according to the query information and the understanding information. And generating dialogue response information according to the query information, the understanding information and the auxiliary request information.
According to the embodiment of the disclosure, since the understanding information is obtained by semantically understanding the query information, the understanding information of the dialogue understanding can be obtained. Since the auxiliary request information is obtained from the query information and the understanding information in response to detecting the auxiliary request instruction, the external knowledge can be effectively utilized. On the basis, the dialogue response information is generated according to the query information, the understanding information and the auxiliary request information, so that the accuracy of the dialogue response information is improved.
In the technical solution of the present disclosure, the collection, storage, use, processing, transmission, provision, disclosure and other handling of the user's personal information involved all comply with the provisions of relevant laws and regulations and do not violate public order and good customs.
In the technical scheme of the disclosure, the authorization or consent of the user is obtained before the personal information of the user is obtained or acquired.
Fig. 1 schematically illustrates an exemplary system architecture to which information generation methods, training methods of pre-training models, and apparatuses may be applied, according to embodiments of the present disclosure.
It should be noted that fig. 1 is only an example of a system architecture to which embodiments of the present disclosure may be applied to assist those skilled in the art in understanding the technical content of the present disclosure, but does not mean that embodiments of the present disclosure may not be used in other devices, systems, environments, or scenarios. For example, in another embodiment, an exemplary system architecture to which the information generating method, the training method of the pre-training model, and the apparatus may be applied may include a terminal device, but the terminal device may implement the information generating method, the training method of the pre-training model, and the apparatus provided by the embodiments of the present disclosure without interacting with a server.
As shown in fig. 1, a system architecture 100 according to this embodiment may include a first terminal device 101, a second terminal device 102, a third terminal device 103, a network 104, and a server 105. The network 104 is a medium used to provide a communication link between the first terminal device 101, the second terminal device 102, the third terminal device 103, and the server 105. The network 104 may include various connection types, such as at least one of wired and wireless communication links, and the like. The terminal device may comprise at least one of the first terminal device 101, the second terminal device 102 and the third terminal device 103.
The user may interact with the server 105 through the network 104 using at least one of the first terminal device 101, the second terminal device 102, and the third terminal device 103 to receive or send messages or the like. At least one of the first terminal device 101, the second terminal device 102, and the third terminal device 103 may be installed with various communication client applications. For example, at least one of a knowledge reading class application, a web browser application, a search class application, an instant messaging tool, a mailbox client and social platform software, and the like.
The first terminal device 101, the second terminal device 102, and the third terminal device 103 may be various electronic devices having a display screen and supporting web browsing. For example, the electronic device may include at least one of a smart phone, a tablet computer, a laptop portable computer, a desktop computer, and the like.
The server 105 may be a server providing various services. For example, the server 105 may be a cloud server, also called a cloud computing server or a cloud host, which is a host product in a cloud computing service system and overcomes the defects of difficult management and weak service scalability in traditional physical hosts and VPS (Virtual Private Server) services.
Note that the information generating method provided by the embodiment of the present disclosure may be generally performed by one of the first terminal apparatus 101, the second terminal apparatus 102, and the third terminal apparatus 103. Accordingly, the information generating apparatus provided by the embodiment of the present disclosure may also be provided to one of the first terminal device 101, the second terminal device 102, and the third terminal device 103.
Alternatively, the information generation method provided by the embodiments of the present disclosure may also be generally performed by the server 105. Accordingly, the information generating apparatus provided by the embodiments of the present disclosure may be generally provided in the server 105. The information generating method provided by the embodiments of the present disclosure may also be performed by a server or a server cluster that is different from the server 105 and is capable of communicating with at least one of the first terminal device 101, the second terminal device 102, the third terminal device 103, and the server 105. Accordingly, the information generating apparatus provided by the embodiments of the present disclosure may also be provided in a server or a server cluster that is different from the server 105 and is capable of communicating with at least one of the first terminal device 101, the second terminal device 102, the third terminal device 103, and the server 105.
It should be noted that, the training method of the pre-training model provided by the embodiments of the present disclosure may also be generally performed by the server 105. Accordingly, the training apparatus of the pre-training model provided by the embodiments of the present disclosure may be generally disposed in the server 105. The training method of the pre-training model provided by the embodiments of the present disclosure may also be performed by a server or a server cluster that is different from the server 105 and is capable of communicating with at least one of the first terminal device 101, the second terminal device 102, the third terminal device 103, and the server 105. Accordingly, the training apparatus of the pre-training model provided by the embodiments of the present disclosure may also be provided in a server or a server cluster that is different from the server 105 and is capable of communicating with at least one of the first terminal device 101, the second terminal device 102, the third terminal device 103, and the server 105.
Alternatively, the training method of the pre-training model provided by the embodiments of the present disclosure may be generally performed by one of the first terminal device 101, the second terminal device 102, and the third terminal device 103. Accordingly, the training apparatus of the pre-training model provided in the embodiment of the present disclosure may also be provided in one of the first terminal device 101, the second terminal device 102, and the third terminal device 103.
It should be understood that the numbers of first terminal devices, second terminal devices, third terminal devices, networks and servers in fig. 1 are merely illustrative. There may be any number of terminal devices, networks and servers as desired for implementation.
It should be noted that the sequence numbers of the respective operations in the following methods are merely representative of the operations for the purpose of description, and should not be construed as representing the order of execution of the respective operations. The method need not be performed in the exact order shown unless explicitly stated.
Fig. 2 schematically shows a flowchart of an information generation method according to an embodiment of the present disclosure.
As shown in fig. 2, the method 200 includes operations S210 to S230.
In operation S210, semantic understanding is performed on the query information, and understanding information is obtained, wherein the query information includes historical dialog information, and the understanding information includes object actions and dialog states.
In response to detecting the assistance request instruction, assistance request information is obtained from the query information and the understanding information in operation S220.
In operation S230, dialogue response information is generated based on the query information, the understanding information, and the auxiliary request information.
According to embodiments of the present disclosure, in response to receiving a dialog instruction for an object, dialog response information may be generated. In the process of generating the dialogue response information, the query information may be acquired according to the dialogue instruction. The query information may be obtained by real-time acquisition. For example, the query information can be obtained by collecting the voice information of the object. Alternatively, the query information may be obtained from a data source. The data source may include at least one of: local databases, cloud databases, and network resources. For example, a data interface may be invoked, with which query information is obtained from a data source. Alternatively, the query information may be received from other terminal devices. The embodiment of the disclosure does not limit the acquisition mode of the query information.
According to embodiments of the present disclosure, the query information may include historical dialog information. The historical dialog information may originate from the interactive process between the system and the object. The historical dialog information may include at least one of: object history dialogue information and system history dialogue information up to the current system time. The object history dialogue information may refer to user history dialogue information. The system history dialogue information may refer to system history reply information.
According to embodiments of the present disclosure, after query information is obtained, the query information may be parsed into structured, machine-readable understanding information by semantic understanding of the query information. The manner of semantic understanding may include at least one of: natural language understanding (Natural Language Understanding, NLU), dialog state tracking (Dialog State Tracking, DST), dialog policy learning (Dialogue Policy Learning, DPL), and natural language generation (Natural Language Generation, NLG).
According to embodiments of the present disclosure, the understanding information may include at least one of: query domain (i.e., domain), query intent (i.e., intent), and query word slot (i.e., slot). The query field may refer to a semantic understanding scenario. The semantic understanding scenario may have corresponding query intent and query word slots. The semantic understanding scenario may include at least one of: chat, weather, map, radio, translation, stories, alarm clocks, people, news, music, and movies. Query intent may refer to the purpose an object is to express by interactive input. The query term slot may refer to a constraint imposed by an object under query intent.
According to embodiments of the present disclosure, natural language understanding may be used to understand query intent and query behavior of an object. Natural language understanding may include at least one of: segmentation processing, part-Of-Speech tagging (POS tag) processing, named entity recognition (Named Entity Recognition, NER) processing, syntax parsing processing, emotion analysis processing, keyword and abstract extraction processing and text analysis processing.
According to embodiments of the present disclosure, dialog state tracking may be used to extract entities and attributes from understood object semantic information to track completion of the current task, in order to determine the dialog state based on the completion. The dialog state may refer to a data structure including the historical dialog information from the initial time to the current time, the query field, the query intent, and the query word slots. The manner of dialog state tracking may include at least one of: rule-based dialog state tracking, generative model-based dialog state tracking, and discriminant model-based dialog state tracking.
According to embodiments of the present disclosure, dialog policy learning may be used to determine object actions corresponding to dialog states in a knowledge base from the dialog states. An object action may refer to the purpose that an object corresponding to a query intent is to express through interactive input. Natural language generation may be used to generate a system reply in the form of natural language.
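For illustration only, the structured understanding information and dialog state described above could be held in a small record like the Python sketch below; the field names and example values are hypothetical and merely mirror the elements just listed (historical dialogue information, query field, query intent, query word slots and object action), not a data format prescribed by the present disclosure.

```python
from dataclasses import dataclass, field
from typing import Dict, List

@dataclass
class DialogState:
    """Hypothetical record combining dialog state and understanding information."""
    history: List[str] = field(default_factory=list)     # historical dialogue information
    domain: str = ""                                      # query field, e.g. "weather"
    intent: str = ""                                      # query intent, e.g. "query_weather"
    slots: Dict[str, str] = field(default_factory=dict)   # query word slots, e.g. {"city": "Beijing"}
    object_action: str = ""                               # object action chosen by dialog policy learning

state = DialogState()
state.history.append("User: What is the weather in Beijing tomorrow?")
state.domain, state.intent = "weather", "query_weather"
state.slots.update({"city": "Beijing", "date": "tomorrow"})
state.object_action = "request_weather"
print(state)
```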
According to embodiments of the present disclosure, after obtaining the understanding information, the type of the request instruction may be determined. For example, the type of the request instruction may be determined from object annotation information in the dialog instruction. The type of the request instruction may include one of: a non-auxiliary request instruction and an auxiliary request instruction. The non-auxiliary request instruction may refer to a request instruction that does not require an external resource to be invoked in the process of generating the dialogue response information. In response to detecting the non-auxiliary request instruction, dialogue response information may be generated directly from the query information and the understanding information.
According to an embodiment of the present disclosure, the auxiliary request instruction may refer to a request instruction that needs to call an external resource in the process of generating the dialogue response information. In response to detecting the auxiliary request instruction, auxiliary request information can be obtained according to the query information and the understanding information. The auxiliary request information may include an interface request parameter corresponding to the external resource. On this basis, dialogue response information can be generated based on the query information, the understanding information, and the auxiliary request information.
According to the embodiment of the disclosure, since the understanding information is obtained by semantically understanding the query information, the understanding stage of the process is retained and the understanding information of the dialogue understanding can be obtained. Since the auxiliary request information is obtained from the query information and the understanding information in response to detecting the auxiliary request instruction, external knowledge can be effectively utilized. On this basis, the dialogue response information is generated according to the query information, the understanding information and the auxiliary request information, so that the accuracy of the dialogue response information is improved.
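To make the control flow of operations S210 to S230 easier to follow, the sketch below strings the three steps together in plain Python. Every function body is a placeholder assumption (the disclosure does not prescribe these implementations); only the branch on whether an auxiliary request instruction is detected mirrors the description above.

```python
from typing import Optional

def semantic_understanding(query_info: dict) -> dict:
    # S210: placeholder for semantic understanding (NLU / DST / DPL) of the query information
    return {"object_action": "request_weather",
            "dialog_state": {"city": "Beijing", "date": "tomorrow"}}

def auxiliary_request_detected(query_info: dict, understanding: dict) -> bool:
    # Stand-in for detecting an auxiliary request instruction (external resource needed)
    return understanding["object_action"].startswith("request_")

def build_auxiliary_request(query_info: dict, understanding: dict) -> dict:
    # S220: assumed interface request parameters for the external resource
    return {"api": "weather_lookup", "params": understanding["dialog_state"]}

def generate_response(query_info: dict, understanding: dict, aux_request: Optional[dict]) -> str:
    # S230: placeholder generator combining query, understanding and auxiliary information
    if aux_request is None:
        return "Reply generated from the query and understanding information only."
    return f"Reply generated with the help of external call '{aux_request['api']}'."

query_info = {"history": ["User: What is the weather in Beijing tomorrow?"]}
understanding = semantic_understanding(query_info)
aux_request = (build_auxiliary_request(query_info, understanding)
               if auxiliary_request_detected(query_info, understanding) else None)
print(generate_response(query_info, understanding, aux_request))
```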
The information generating method 200 according to an embodiment of the present disclosure is further described below with reference to fig. 3A, 3B, 4A, 4B, 4C, 4D, 5A, and 5B.
According to an embodiment of the present disclosure, the query information further includes a query word slot.
According to embodiments of the present disclosure, a slot may refer to the information that is required to complete the conversion of an initial object intent into an explicit object instruction during a multi-round dialog. The slot may comprise at least one slot position. At least one slot has a corresponding slot filling mode. The properties of the slots may include one of: word slots and interface slots. Word slots may refer to a slot filling manner in which information is obtained through keywords of a user conversation. Interface slots may refer to slot filling schemes that obtain information by other means. For example, a query word slot may belong to a word slot.
According to embodiments of the present disclosure, slot filling can be understood as a sequence tagging (Sequence Tagging) problem. Sequence tagging may refer to the process of, given an input sequence, labeling each position of the input sequence with a corresponding label using a first predetermined model, i.e., assigning each word in a continuous sequence to a corresponding semantic category label. The first predetermined model may be configured according to actual service requirements, and only needs to be capable of implementing the sequence tagging function, which is not limited herein.
For example, the first predetermined model may include at least one of: a first predetermined model based on the generative model and a first predetermined model based on the discriminant model. The first predetermined model based on the generative model may include at least one of: hidden Markov models (Hidden Markov Model, HMM) and hidden vector state (Hidden Vector State, HVS) models. The first predetermined model based on the discriminant model may include at least one of: conditional random field (Conditional Random Field, CRF) model, maximum entropy markov model (Maximum Entropy Markov Model, MEMM) and support vector machine (Support Vector Machine, SVM) model.
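For illustration only, the toy tagger below shows what the output of such sequence labelling looks like when each token of a query is assigned a semantic category label (a common BIO-style tag set is assumed); it is a hand-written stand-in, not one of the HMM/CRF/SVM models listed above.

```python
from typing import List, Tuple

# Toy gazetteers standing in for a trained sequence-labelling model.
CITY_WORDS = {"Beijing", "Shanghai"}
DATE_WORDS = {"tomorrow", "today"}

def tag_slots(tokens: List[str]) -> List[Tuple[str, str]]:
    """Assign a semantic category label to every position of the input sequence."""
    tagged = []
    for tok in tokens:
        if tok in CITY_WORDS:
            tagged.append((tok, "B-city"))     # beginning of a city word slot
        elif tok in DATE_WORDS:
            tagged.append((tok, "B-date"))     # beginning of a date word slot
        else:
            tagged.append((tok, "O"))          # outside any word slot
    return tagged

print(tag_slots("What is the weather in Beijing tomorrow".split()))
```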
According to an embodiment of the present disclosure, operation S230 may include the following operations.
Auxiliary response information corresponding to the auxiliary request information is determined from the data source. And generating dialogue response information according to the query information, the understanding information and the auxiliary response information.
According to an embodiment of the present disclosure, the data source may include at least one of: a database and a knowledge base. The Knowledge Base may refer to a structured, easy-to-operate, easy-to-use, and comprehensively organized knowledge cluster in knowledge engineering. A knowledge base is built to meet the need of solving problems in one or more domains. The knowledge base may comprise a collection of interrelated knowledge pieces stored, organized, managed, and used in computer memory in some way or ways of knowledge representation. The knowledge pieces in the knowledge piece set may include theoretical knowledge and fact data related to the domain. Alternatively, the knowledge pieces in the knowledge piece set may also include heuristic knowledge obtained from expert experience, such as definitions, theorems and algorithms related to a field, common sense knowledge, and so on.
According to the embodiment of the present disclosure, after the assistance request information is obtained, the assistance response information corresponding to the assistance request information may be determined from the data source according to the assistance request information. For example, in the case where the data source is a knowledge base, the auxiliary response information may be used to characterize knowledge base results corresponding to the auxiliary request information. In this case, the dialogue response information may be generated based on the knowledge base result, the query information, the object action, and the dialogue state.
According to an embodiment of the present disclosure, the auxiliary response information is determined from a data source according to the auxiliary request information, and the data source includes at least one of a database and a knowledge base, so that external knowledge can be effectively utilized. On this basis, since the dialogue response information is generated based on the query information, the understanding information, and the auxiliary response information, the update of knowledge can be completed without retraining the model.
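A minimal sketch of turning auxiliary request information into auxiliary response information by querying a local knowledge base follows; the dictionary-backed store and the key layout are assumptions used only to illustrate the lookup, not the database or knowledge-base interface of the present disclosure.

```python
# Toy knowledge base: (domain, city, date) -> stored fact.
KNOWLEDGE_BASE = {
    ("weather", "Beijing", "tomorrow"): {"condition": "sunny", "high_c": 25},
}

def fetch_auxiliary_response(aux_request: dict) -> dict:
    """Resolve the interface request parameters of the auxiliary request against the data source."""
    key = (aux_request["domain"], aux_request["params"]["city"], aux_request["params"]["date"])
    return KNOWLEDGE_BASE.get(key, {})

aux_request = {"domain": "weather", "params": {"city": "Beijing", "date": "tomorrow"}}
print(fetch_auxiliary_response(aux_request))   # {'condition': 'sunny', 'high_c': 25}
```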
Fig. 3A schematically illustrates an example schematic diagram of an information generation process according to an embodiment of the present disclosure.
As shown in fig. 3A, in 300A, query information 301 may be semantically understood to obtain understanding information 302. In response to detecting the assistance request instruction, assistance request information 303 may be derived from query information 301 and understanding information 302.
After the auxiliary request information 303 is obtained, auxiliary response information 304 corresponding to the auxiliary request information 303 may be determined from the data source. After the auxiliary response information 304 is obtained, the dialogue response information 305 may be generated from the query information 301, the understanding information 302, and the auxiliary response information 304.
According to an embodiment of the present disclosure, generating dialogue response information from query information, understanding information, and auxiliary response information may include the following operations.
And fusing the query information, the understanding information and the auxiliary response information to obtain first fused information. And generating dialogue response information according to the first fusion information.
According to the embodiment of the disclosure, after the auxiliary response information is obtained, the query information, the understanding information and the auxiliary response information can be fused to obtain first fused information. The fusion may include at least one of: splicing and adding. For example, the query information, the understanding information and the auxiliary response information may be spliced to obtain the first fusion information. Alternatively, the query information, the understanding information, and the auxiliary response information may be subjected to addition processing to obtain the first fusion information. Alternatively, the query information, the understanding information, and the auxiliary response information may be subjected to a splicing process and an adding process, to obtain the first fusion information.
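The splicing and adding operations can be illustrated with token sequences and fixed-length vectors; the representations below are deliberately simplified assumptions (an actual system would typically fuse embeddings or hidden states rather than raw strings).

```python
# Splicing: concatenate the token sequences, with assumed separator markers.
query_tokens = ["weather", "in", "Beijing", "tomorrow"]
understanding_tokens = ["[ACTION]", "request_weather", "[STATE]", "city=Beijing"]
aux_response_tokens = ["[KB]", "sunny", "25C"]
fused_by_splicing = query_tokens + ["[SEP]"] + understanding_tokens + ["[SEP]"] + aux_response_tokens

# Adding: element-wise sum of equally sized vector representations (toy 4-dimensional vectors).
query_vec = [0.1, 0.2, 0.3, 0.4]
understanding_vec = [0.0, 0.1, 0.0, 0.1]
aux_response_vec = [0.2, 0.0, 0.1, 0.0]
fused_by_adding = [q + u + a for q, u, a in zip(query_vec, understanding_vec, aux_response_vec)]

print(fused_by_splicing)
print(fused_by_adding)   # roughly [0.3, 0.3, 0.4, 0.5], up to floating-point rounding
```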
According to the embodiment of the disclosure, after the first fusion information is obtained, the first fusion information may be input into a second predetermined model to obtain the dialogue response information. The second predetermined model may include at least one of: a recurrent neural network (Recurrent Neural Networks, RNN) model, a long short-term memory (Long Short-Term Memory, LSTM) model, and a Transformer model. The second predetermined model may be configured according to actual service requirements, and only needs to be capable of implementing the function of generating the dialogue response information, which is not limited herein.
Fig. 3B schematically illustrates an example schematic diagram of an information generation process according to another embodiment of the present disclosure.
As shown in fig. 3B, in 300B, query information 306 may be semantically understood, resulting in understanding information 307. In response to detecting the assistance request instruction, assistance request information 308 may be derived from query information 306 and understanding information 307.
After obtaining the assistance request information 308, assistance reply information 309 corresponding to the assistance request information 308 may be determined from the data source. After obtaining the auxiliary response information 309, the query information 306, the understanding information 307, and the auxiliary response information 309 may be fused to obtain first fused information 310. After the first fused information 310 is obtained, dialogue response information 311 may be generated from the first fused information 310.
According to an embodiment of the present disclosure, generating dialogue response information according to the first fusion information may include the following operations.
And encoding the first fusion information to obtain first encoded information. And performing self-decoding on the first encoded information to obtain intermediate decoding information. And generating dialogue response information according to the first coding information and the intermediate decoding information.
According to the embodiment of the disclosure, after the first fusion information is obtained, the first fusion information may be input into a third predetermined model to obtain the dialogue response information. The third predetermined model may be configured according to an actual service requirement, and may only need to implement a function of generating the dialogue response information, which is not limited herein. For example, the third predetermined model may include a BoB (BERT over BERT) model. The third predetermined model may include a first encoder, a first decoder, and a second decoder. Multiple rounds of training may be performed on the first encoder, the first decoder, and the second decoder until a predetermined condition is met. The trained first encoder, first decoder and second decoder are determined as a third predetermined model.
According to an embodiment of the present disclosure, the first encoder may include at least one of: a first bidirectional long and short term memory network (Bidirectional Long Short Term Memory, biLSTM), a first gating loop unit (Gated Recurrent Unit, GRU), a first convolutional neural network (Convolutional Neural Networks, CNN), a first long and short term memory network, and a first recurrent neural network. The first encoder may include a first input layer and a first concealment layer. The first encoder may employ a linear transformation function. The first encoder may be configured to encode the first fusion information to obtain first encoded information. For example, the first fusion information may be encoded using a first input layer of a first encoder, resulting in a first intermediate vector. And processing the first intermediate vector by using a first hidden layer of the first encoder to obtain first encoded information.
According to an embodiment of the present disclosure, the first decoder may include at least one of: the system comprises a second bidirectional long-short-term memory network, a second gating circulation unit, a second convolution neural network, a second long-short-term memory network and a second circulation neural network. The first decoder may comprise an autoregressive decoder. The first decoder may include a second concealment layer and a first output layer. The first decoder may employ a linear transformation function. The first decoder may be configured to reconstruct the first encoded information to obtain intermediate decoded information to implement a response dialog reply. For example, the first encoded information may be self-decoded by using the second concealment layer of the first decoder to obtain the first auxiliary decoding information. And processing the first auxiliary decoding information by using a first output layer of the first decoder to obtain intermediate decoding information.
According to an embodiment of the present disclosure, the second decoder may include at least one of: a third two-way long-short-term memory network, a third gating circulation unit, a third convolution neural network, a third long-short-term memory network and a third circulation neural network. The second decoder may include a third concealment layer and a second output layer. The second decoder may employ an unlikelihood function. The second decoder may be configured to reconstruct the first encoded information and the intermediate decoded information to generate the dialogue response information, so as to achieve a consistent understanding. For example, the first encoded information and the intermediate decoded information may be processed by a third hidden layer of the second decoder to obtain the first auxiliary dialogue response information. The first auxiliary dialogue response information is processed by a second output layer of the second decoder to generate dialogue response information.
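The encode / self-decode / generate flow described above can be sketched with a small PyTorch module; the GRU layers, dimensions and the way the encoded and intermediate information are combined are illustrative assumptions and do not reproduce the BoB model or the third predetermined model of the present disclosure.

```python
import torch
import torch.nn as nn

class EncodeSelfDecodeGenerate(nn.Module):
    """Toy module: one encoder, a self-decoding first decoder, and a second decoder."""
    def __init__(self, vocab_size: int = 1000, hidden: int = 128):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, hidden)
        self.encoder = nn.GRU(hidden, hidden, batch_first=True)    # first encoder
        self.decoder1 = nn.GRU(hidden, hidden, batch_first=True)   # first decoder (self-decoding)
        self.decoder2 = nn.GRU(hidden, hidden, batch_first=True)   # second decoder
        self.out1 = nn.Linear(hidden, vocab_size)
        self.out2 = nn.Linear(hidden, vocab_size)

    def forward(self, fused_ids: torch.Tensor):
        x = self.embed(fused_ids)
        enc, h = self.encoder(x)                 # first encoded information
        mid, _ = self.decoder1(enc, h)           # intermediate decoding information
        mixed, _ = self.decoder2(enc + mid, h)   # combine encoded and intermediate information
        return self.out1(mid), self.out2(mixed)  # intermediate logits, dialogue response logits

model = EncodeSelfDecodeGenerate()
fused_ids = torch.randint(0, 1000, (1, 12))      # toy token ids of the first fusion information
intermediate_logits, response_logits = model(fused_ids)
print(response_logits.shape)                     # torch.Size([1, 12, 1000])
```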
Fig. 4A schematically illustrates an example schematic diagram of generating dialogue response information according to first fusion information according to an embodiment of the present disclosure.
As shown in fig. 4A, in 400A, after the first fusion information 401 is obtained, the first fusion information 401 may be encoded, resulting in first encoded information 402. After the first encoded information 402 is obtained, the first encoded information 402 may be self-decoded to obtain intermediate decoded information 403. After obtaining the intermediate decoding information 403, the dialogue response information 404 may be generated from the first encoding information 402 and the intermediate decoding information 403.
According to an embodiment of the present disclosure, generating dialogue response information according to the first fusion information may include the following operations.
And encoding the first fusion information to obtain second encoded information. And decoding the second encoded information to obtain the dialogue response information.
According to the embodiment of the disclosure, after the first fusion information is obtained, the first fusion information may be input into a fourth predetermined model to obtain the dialogue response information. The fourth predetermined model may be configured according to actual service requirements, and only needs to be capable of implementing the function of generating the dialogue response information, which is not limited herein. For example, the fourth predetermined model may include a model based on a Transformer-ED (i.e., Transformer Encoder-Decoder) structure. The fourth predetermined model may include a second encoder and a third decoder. Multiple rounds of training may be performed on the second encoder and the third decoder until a predetermined condition is met. The trained second encoder and third decoder are determined as the fourth predetermined model.
According to an embodiment of the present disclosure, the second encoder may include at least one of: the system comprises a fourth two-way long-short-period memory network, a fourth gating circulation unit, a fourth convolution neural network, a fourth long-period memory network and a fourth circulation neural network. The second encoder may include a second input layer and a fourth concealment layer. The second encoder may employ a linear transformation function. The second encoder may be configured to encode the first fusion information to obtain second encoded information. For example, the first fusion information may be encoded using a second input layer of a second encoder to obtain a second intermediate vector. And processing the second intermediate vector by using a fourth hidden layer of the second encoder to obtain second encoded information.
According to an embodiment of the present disclosure, the third decoder may include at least one of: a fifth two-way long-short-term memory network, a fifth gating loop unit, a fifth convolutional neural network, a fifth long-short-term memory network and a fifth recurrent neural network. The third decoder may include a fifth hidden layer and a third output layer. The third decoder may employ a linear transformation function. The third decoder may be configured to reconstruct the second encoded information to obtain the dialogue response information, so as to implement a responsive dialogue reply. For example, the second encoded information may be decoded using the fifth hidden layer of the third decoder to obtain second auxiliary decoding information. The second auxiliary decoding information is processed by the third output layer of the third decoder to obtain the dialogue response information.
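A compact encoder-then-decoder generation loop in PyTorch is sketched below for this Transformer-ED-style flow; the layer sizes, token ids, greedy decoding and the omission of attention masks are simplifying assumptions, not the fourth predetermined model of the present disclosure.

```python
import torch
import torch.nn as nn

d_model, vocab = 64, 1000
embed = nn.Embedding(vocab, d_model)
transformer = nn.Transformer(d_model=d_model, nhead=4,
                             num_encoder_layers=2, num_decoder_layers=2,
                             batch_first=True)
project = nn.Linear(d_model, vocab)

fused_ids = torch.randint(0, vocab, (1, 16))        # toy first fusion information (token ids)
memory = transformer.encoder(embed(fused_ids))      # second encoded information

generated = torch.tensor([[1]])                     # assumed begin-of-sequence id
for _ in range(8):                                  # greedy decoding of the dialogue response
    dec = transformer.decoder(embed(generated), memory)
    next_id = project(dec[:, -1]).argmax(-1, keepdim=True)
    generated = torch.cat([generated, next_id], dim=1)
print(generated)
```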
Fig. 4B schematically illustrates an example schematic diagram of generating dialogue response information according to first fusion information according to another embodiment of the present disclosure.
As shown in fig. 4B, after the first fused information 405 is obtained, the first fused information 405 may be encoded to obtain second encoded information 406 in 400B. After the second encoded information 406 is obtained, the second encoded information 406 may be decoded to obtain dialogue response information 407.
According to an embodiment of the present disclosure, generating dialogue response information according to the first fusion information may include the following operations.
At least one first candidate dialog response message is generated based on the first fusion message. And respectively fusing the at least one first candidate dialogue response information and the first fusion information to obtain at least one second fusion information. And determining dialogue response information from the at least one first candidate dialogue response information according to the at least one second fusion information.
According to the embodiment of the disclosure, after the first fusion information is obtained, the first fusion information may be input into a fifth predetermined model to obtain the dialogue response information. The fifth predetermined model may be configured according to actual service requirements, and only needs to be capable of implementing the function of generating the dialogue response information, which is not limited herein. For example, the fifth predetermined model may include a model based on a Transformer-Dec (i.e., Transformer Decoder) structure. The fifth predetermined model may include a first language sub-model (Language Model) and a first maximum mutual information scoring function sub-model (Maximum Mutual Information scoring function, MMI Model). The first language sub-model may include a third encoder and a fourth decoder. Multiple rounds of training may be performed on the first language sub-model and the first maximum mutual information scoring function sub-model until a predetermined condition is met. The trained first language sub-model and first maximum mutual information scoring function sub-model are determined as the fifth predetermined model.
According to an embodiment of the present disclosure, the third encoder may include at least one of: a sixth two-way long-short-term memory network, a sixth gating cyclic unit, a sixth convolutional neural network, a sixth long-short-term memory network and a sixth cyclic neural network. The third encoder may include a third input layer and a sixth hidden layer. The third encoder may be configured to encode the first fusion information to obtain third encoded information. For example, the first fusion information may be encoded using a third input layer of a third encoder, resulting in a third intermediate vector. And processing the third intermediate vector by using a sixth hidden layer of the third encoder to obtain third encoded information.
According to an embodiment of the present disclosure, the fourth decoder may include at least one of: a seventh two-way long-short-term memory network, a seventh gating loop unit, a seventh convolutional neural network, a seventh long-short-term memory network, and a seventh recurrent neural network. The fourth decoder may include a seventh hidden layer and a fourth output layer. The fourth decoder may be configured to reconstruct the third encoded information to generate at least one first candidate dialog response information. For example, the third encoded information may be decoded using the seventh hidden layer of the fourth decoder to obtain third auxiliary decoding information. The third auxiliary decoding information is processed by the fourth output layer of the fourth decoder to generate at least one first candidate dialog response information.
According to the embodiment of the disclosure, after the at least one first candidate dialog response information is obtained, the at least one first candidate dialog response information and the first fusion information may be respectively fused to obtain at least one second fusion information. The fusion may include at least one of: splicing and adding. After obtaining the at least one second fused information, dialog response information may be determined from the at least one first candidate dialog response information using the first maximum mutual information scoring function sub-model based on the at least one second fused information. For example, the at least one second fusion information may be processed by using a sub-model of a first maximum mutual information scoring function, so as to obtain a first maximum mutual information score corresponding to each of the at least one first candidate dialogue response information. The dialogue response information is determined from the at least one first candidate dialogue response information based on a first maximum mutual information score corresponding to each of the at least one first candidate dialogue response information.
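A minimal illustration of this rescoring step follows: each candidate reply is fused with the input, and the fused string is scored to pick the final response. The overlap-based scorer is a trivial stand-in for the first maximum mutual information scoring function sub-model and carries none of its statistics.

```python
def mmi_style_score(second_fusion: str) -> float:
    """Stand-in scorer over a fused (input [SEP] candidate) string: rewards content-word overlap."""
    context, candidate = second_fusion.split(" [SEP] ")
    context_words = set(context.lower().split())
    candidate_words = candidate.lower().split()
    overlap = sum(1 for word in candidate_words if word in context_words)
    return overlap / max(len(candidate_words), 1)

first_fusion_information = "weather Beijing tomorrow request_weather sunny 25C"
first_candidates = [
    "I am not sure.",
    "Tomorrow in Beijing it will be sunny with a high of 25C.",
    "Do you like Beijing?",
]
# Each first candidate is fused with the first fusion information to form the second fusion information.
second_fusions = [f"{first_fusion_information} [SEP] {candidate}" for candidate in first_candidates]
scores = [mmi_style_score(fusion) for fusion in second_fusions]
print(first_candidates[scores.index(max(scores))])
```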
Fig. 4C schematically illustrates an example schematic diagram of generating dialogue response information according to first fusion information according to another embodiment of the present disclosure.
As shown in fig. 4C, in 400C, after the first fused information 408 is obtained, at least one first candidate dialog response information 409 may be generated according to the first fused information 408. The at least one first candidate dialog response information 409 may include first candidate dialog response information 409_1, first candidate dialog response information 409_2, ..., and first candidate dialog response information 409_M. M may be an integer greater than or equal to 1, and m ∈ {1, 2, ..., (M-1), M}.
After obtaining the at least one first candidate dialog response information 409, the first candidate dialog response information 409_1, the first candidate dialog response information 409_2, ..., and the first candidate dialog response information 409_M may each be fused with the first fusion information 408 to obtain at least one second fusion information. The at least one second fusion information may include second fusion information 410_1, second fusion information 410_2, ..., and second fusion information 410_M.
After the at least one second fusion information is obtained, the dialog response information 411 may be determined from the first candidate dialog response information 409_1, the first candidate dialog response information 409_2, ..., and the first candidate dialog response information 409_M according to the second fusion information 410_1, the second fusion information 410_2, ..., and the second fusion information 410_M.
According to an embodiment of the present disclosure, generating dialogue response information according to the first fusion information may include the following operations.
And respectively fusing the at least one first hidden variable information and the first fusion information to obtain at least one third fusion information. At least one second candidate dialog response message is generated based on the at least one third fusion message. And determining dialogue response information from the at least one second candidate dialogue response information according to the evaluation value corresponding to the at least one second candidate dialogue response information.
According to the embodiment of the disclosure, after the first fusion information is obtained, the first fusion information may be input into a sixth predetermined model to obtain the dialogue response information. The sixth predetermined model may be configured according to actual service requirements, and only needs to be capable of implementing the function of generating the dialogue response information, which is not limited herein. For example, the sixth predetermined model may include a model based on the UniLM (i.e., Unified Language Model Pre-training for Natural Language Understanding and Generation) structure. The sixth predetermined model may include a fourth encoder, a fifth decoder, and a first evaluator. Multiple rounds of training may be performed on the fourth encoder and the fifth decoder until a predetermined condition is met. The trained fourth encoder and fifth decoder are determined as the sixth predetermined model.
According to an embodiment of the present disclosure, the first hidden variable information corresponding to each of the at least one dialog round may be generated according to the dialog content (i.e., Dialogue Context) and the dialog response (i.e., Response) corresponding to each of the at least one dialog round. The dialog content and dialog response corresponding to each of the at least one dialog round should be able to reflect the first hidden variable information corresponding to that dialog round. After obtaining the at least one first hidden variable information, the at least one first hidden variable information and the first fusion information may be fused respectively to obtain at least one third fusion information. The fusion may include at least one of: splicing and adding.
According to an embodiment of the present disclosure, the fourth encoder may include at least one of: an eighth two-way long-short-term memory network, an eighth gating cyclic unit, an eighth convolutional neural network, an eighth long-short-term memory network and an eighth cyclic neural network. The fourth encoder may include a fourth input layer and an eighth concealment layer. The fourth encoder may be configured to encode the at least one third fusion information respectively, to obtain fourth encoded information corresponding to each of the at least one third fusion information. For example, the third fusion information may be encoded using a fourth input layer of a fourth encoder to obtain a fourth intermediate vector. And processing the fourth intermediate vector by using an eighth hidden layer of the fourth encoder to obtain fourth coding information corresponding to each of the at least one third fusion information.
According to an embodiment of the present disclosure, the fifth decoder may include at least one of: a ninth two-way long-short-term memory network, a ninth gating loop unit, a ninth convolutional neural network, a ninth long-short-term memory network, and a ninth recurrent neural network. The fifth decoder may include a ninth hidden layer and a fifth output layer. The fifth decoder may be configured to reconstruct the fourth encoded information corresponding to each of the at least one third fusion information, respectively, to generate at least one second candidate dialogue response information. For example, the fourth encoded information corresponding to each of the at least one third fusion information may be decoded by using the ninth hidden layer of the fifth decoder to obtain fourth auxiliary decoding information corresponding to each of the at least one third fusion information. The fourth auxiliary decoding information corresponding to each of the at least one third fusion information is processed by using the fifth output layer of the fifth decoder to generate at least one second candidate dialogue response information.
According to an embodiment of the present disclosure, after the at least one second candidate dialog response information is obtained, the at least one second candidate dialog response information may be processed using the first evaluator to obtain evaluation values corresponding to the at least one second candidate dialog response information, respectively. The first evaluator may be trained based on NSP (i.e., next Sentence Prediction) tasks and MLM (i.e., mask Language Model) tasks. After the evaluation values respectively corresponding to the at least one second candidate dialog response information are obtained, dialog response information may be determined from the at least one second candidate dialog response information based on the evaluation values corresponding to the at least one second candidate dialog response information.
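The latent-variable branch can be illustrated in the same spirit: each hidden variable is fused with the input to produce one candidate, and an evaluator picks the highest-scoring one. The generator and the evaluator below are trivial stand-ins; they are not the sixth predetermined model nor its NSP/MLM-trained first evaluator.

```python
from typing import List, Tuple

def generate_candidate(third_fusion: Tuple[int, str]) -> str:
    """Stand-in generator: one candidate reply per (hidden variable, fused input) pair."""
    latent, _fused = third_fusion
    styles = {0: "It will be sunny in Beijing tomorrow.",
              1: "Sunny, 25C in Beijing tomorrow.",
              2: "Expect sunshine in Beijing tomorrow."}
    return styles[latent % 3]

def evaluate(candidate: str) -> float:
    """Stand-in evaluator score; the first evaluator of the disclosure is trained on NSP/MLM tasks."""
    return len(candidate.split()) / 10.0   # toy heuristic: longer replies score higher

first_fusion_information = "weather Beijing tomorrow request_weather sunny 25C"
hidden_variables: List[int] = [0, 1, 2]                                # at least one first hidden variable
third_fusions = [(z, first_fusion_information) for z in hidden_variables]
candidates = [generate_candidate(fusion) for fusion in third_fusions]   # second candidate responses
scores = [evaluate(candidate) for candidate in candidates]
print(candidates[scores.index(max(scores))])
```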
Fig. 4D schematically illustrates an example schematic diagram of generating dialogue response information according to first fusion information according to another embodiment of the present disclosure.
As shown in fig. 4D, in 400D, after the first fusion information is obtained, at least one first hidden variable information 412 may be determined. The at least one first hidden variable information 412 may include first hidden variable information 412_1, first hidden variable information 412_2, ..., first hidden variable information 412_N. N may be an integer greater than or equal to 1, and n ∈ {1, 2, ..., (N-1), N}.

After the at least one first hidden variable information 412 is obtained, the first hidden variable information 412_1, the first hidden variable information 412_2, ..., the first hidden variable information 412_N may each be fused with the first fusion information to obtain at least one third fusion information. The at least one third fusion information may include third fusion information 413_1, third fusion information 413_2, ..., third fusion information 413_N.

After the at least one third fusion information is obtained, at least one second candidate dialogue response information may be generated according to the third fusion information 413_1, the third fusion information 413_2, ..., the third fusion information 413_N. The at least one second candidate dialogue response information may include second candidate dialogue response information 414_1, second candidate dialogue response information 414_2, ..., second candidate dialogue response information 414_N.

After the at least one second candidate dialogue response information is obtained, evaluation values corresponding to the second candidate dialogue response information 414_1, the second candidate dialogue response information 414_2, ..., the second candidate dialogue response information 414_N may be determined. The at least one evaluation value may include evaluation value 415_1, evaluation value 415_2, ..., evaluation value 415_N.

After the evaluation values corresponding to the at least one second candidate dialogue response information are obtained, the dialogue response information 416 may be determined from the second candidate dialogue response information 414_1, the second candidate dialogue response information 414_2, ..., the second candidate dialogue response information 414_N according to the evaluation value 415_1, the evaluation value 415_2, ..., the evaluation value 415_N.
According to an embodiment of the present disclosure, operation S210 may include the following operations.
And respectively fusing the at least one second hidden variable information and the query information to obtain at least one fourth fused information. At least one first candidate understanding information is generated according to the at least one fourth fusion information. The understanding information is determined from the at least one first candidate understanding information based on the evaluation value corresponding to the at least one first candidate understanding information.
According to an embodiment of the present disclosure, after obtaining the query information, the query information may be input into a seventh predetermined model, resulting in the understanding information. The seventh predetermined model may be configured according to actual service requirements, and may be capable of implementing a function of determining understanding information, which is not limited herein. For example, the seventh predetermined model may include a fifth encoder, a sixth decoder, and a second evaluator. Multiple rounds of training may be performed on the fifth encoder and the sixth decoder until a predetermined condition is met. The trained fifth encoder and sixth decoder are determined as a seventh predetermined model.
According to an embodiment of the present disclosure, the second hidden variable information corresponding to each of the at least one dialogue round may be generated according to the dialogue content and the dialogue response corresponding to that dialogue round. The dialogue content and dialogue response corresponding to each dialogue round should be able to reflect the second hidden variable information corresponding to that dialogue round. After obtaining the at least one second hidden variable information, the at least one second hidden variable information and the query information may be fused, respectively, to obtain at least one fourth fusion information. The fusion may include at least one of: splicing and adding.
According to an embodiment of the present disclosure, the fifth encoder may include a fifth input layer and a tenth hidden layer. The fifth encoder may be configured to encode the at least one fourth fusion information respectively, to obtain fifth encoded information corresponding to each of the at least one fourth fusion information. For example, the fourth fusion information may be encoded using the fifth input layer of the fifth encoder to obtain a fifth intermediate vector. The fifth intermediate vector is then processed using the tenth hidden layer of the fifth encoder to obtain the fifth encoded information corresponding to each of the at least one fourth fusion information.
According to an embodiment of the present disclosure, the sixth decoder may include an eleventh hidden layer and a sixth output layer. The sixth decoder may be configured to reconstruct the fifth encoded information corresponding to each of the at least one fourth fusion information, respectively, to generate the at least one first candidate understanding information. For example, the fifth encoded information corresponding to each of the at least one fourth fusion information may be decoded using the eleventh hidden layer of the sixth decoder to obtain fifth auxiliary decoded information corresponding to each of the at least one fourth fusion information. The fifth auxiliary decoded information corresponding to each of the at least one fourth fusion information is then processed using the sixth output layer of the sixth decoder to generate the at least one first candidate understanding information.
According to the embodiment of the disclosure, after the at least one first candidate understanding information is obtained, the at least one first candidate understanding information may be processed by using the second evaluator to obtain evaluation values corresponding to the at least one first candidate understanding information, respectively. The second evaluator may be trained based on NSP tasks and MLM tasks. After the evaluation values corresponding to the at least one first candidate understanding information respectively are obtained, the understanding information may be determined from the at least one first candidate understanding information based on the evaluation values corresponding to the at least one first candidate understanding information.
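Because the second evaluator is described as being trained on NSP and MLM tasks, one plausible (but purely illustrative) realization is to score each first candidate understanding information against the query with a BERT-style next-sentence-prediction head, as sketched below; the Hugging Face checkpoint name and the example texts are assumptions, not the disclosed second evaluator.

```python
import torch
from transformers import BertForNextSentencePrediction, BertTokenizer

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
model = BertForNextSentencePrediction.from_pretrained("bert-base-uncased")
model.eval()

query = "user: book a table for two tonight"
candidates = [
    "user_action: book_restaurant belief: people=2 time=tonight",
    "user_action: play_music belief: genre=jazz",
]

scores = []
with torch.no_grad():
    for cand in candidates:
        enc = tokenizer(query, cand, return_tensors="pt")
        logits = model(**enc).logits           # [is_next, not_next] logits
        scores.append(logits[0, 0].item())     # higher "is_next" logit = better fit

understanding = candidates[max(range(len(scores)), key=scores.__getitem__)]
```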
Fig. 5A schematically illustrates an example schematic diagram for semantic understanding of query information, resulting in understanding information, according to an embodiment of the present disclosure.
As shown in fig. 5A, in 500A, after the query information is obtained, at least one second hidden variable information 501 may be determined. The at least one second hidden variable information 501 may include second hidden variable information 501_1, second hidden variable information 501_2, ..., second hidden variable information 501_P. P may be an integer greater than or equal to 1, and p ∈ {1, 2, ..., (P-1), P}.

After the at least one second hidden variable information 501 is obtained, the second hidden variable information 501_1, the second hidden variable information 501_2, ..., the second hidden variable information 501_P may each be fused with the query information to obtain at least one fourth fusion information. The at least one fourth fusion information may include fourth fusion information 502_1, fourth fusion information 502_2, ..., fourth fusion information 502_P.

After the at least one fourth fusion information is obtained, at least one first candidate understanding information may be generated according to the fourth fusion information 502_1, the fourth fusion information 502_2, ..., the fourth fusion information 502_P. The at least one first candidate understanding information may include first candidate understanding information 503_1, first candidate understanding information 503_2, ..., first candidate understanding information 503_P.

After the at least one first candidate understanding information is obtained, evaluation values corresponding to the first candidate understanding information 503_1, the first candidate understanding information 503_2, ..., the first candidate understanding information 503_P may be determined. The at least one evaluation value may include evaluation value 504_1, evaluation value 504_2, ..., evaluation value 504_P.

After the evaluation values corresponding to the at least one first candidate understanding information are obtained, the understanding information 505 may be determined from the first candidate understanding information 503_1, the first candidate understanding information 503_2, ..., the first candidate understanding information 503_P according to the evaluation value 504_1, the evaluation value 504_2, ..., the evaluation value 504_P.
According to an embodiment of the present disclosure, operation S210 may include the following operations.
At least one second candidate understanding information is generated according to the query information. And respectively fusing the at least one second candidate understanding information and the query information to obtain at least one fifth fused information. And determining understanding information from the at least one second candidate understanding information according to the at least one fifth fusion information.
According to an embodiment of the present disclosure, after obtaining the query information, the query information may be input into an eighth predetermined model, resulting in the understanding information. The eighth predetermined model may be configured according to actual service requirements, and may be capable of implementing a function of determining understanding information, which is not limited herein. For example, the eighth predetermined model may include a model based on a Transformer-Dec structure. The eighth predetermined model may include a second language sub-model and a second maximum mutual information scoring function sub-model. The second language sub-model may include a sixth encoder and a seventh decoder. Multiple rounds of training may be performed on the second language sub-model and the second maximum mutual information scoring function sub-model until a predetermined condition is met. The trained second language sub-model and the second maximum mutual information scoring function sub-model are determined as the eighth predetermined model.
According to an embodiment of the present disclosure, the sixth encoder may include a sixth input layer and a twelfth hidden layer. The sixth encoder may be configured to encode the query information to obtain sixth encoded information. For example, the query information may be encoded using a sixth input layer of a sixth encoder, resulting in a sixth intermediate vector. And processing the sixth intermediate vector by using a twelfth hidden layer of the sixth encoder to obtain sixth encoded information.
According to an embodiment of the present disclosure, the seventh decoder may include a thirteenth hidden layer and a seventh output layer. The seventh decoder may be configured to reconstruct the sixth encoded information to generate at least one second candidate understanding information. For example, the sixth encoded information may be decoded using the thirteenth hidden layer of the seventh decoder, resulting in sixth auxiliary decoded information. The sixth auxiliary decoded information is processed by the seventh output layer of the seventh decoder to generate the at least one second candidate understanding information.
According to the embodiment of the disclosure, after the at least one second candidate understanding information is obtained, the at least one second candidate understanding information and the query information may be respectively fused to obtain at least one fifth fused information. The fusion may include at least one of: splicing and adding. After obtaining the at least one fifth fused information, the understanding information may be determined from the at least one second candidate understanding information using a second maximum mutual information scoring function sub-model, based on the at least one fifth fused information. For example, the at least one fifth fusion information may be processed by using a second maximum mutual information scoring function sub-model, to obtain second maximum mutual information scores corresponding to the at least one second candidate understanding information, respectively. And determining understanding information from the at least one second candidate understanding information according to the second maximum mutual information scores respectively corresponding to the at least one second candidate understanding information.
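A maximum mutual information criterion is commonly realized by combining a forward likelihood with a backward likelihood, so that candidates that are both fluent and informative about the query are preferred. The sketch below assumes pre-computed log-likelihoods and a hypothetical interpolation weight; it is only an illustration of the idea, not the disclosed second maximum mutual information scoring function sub-model.

```python
def mmi_score(log_p_cand_given_query, log_p_query_given_cand, lam=0.5):
    """Illustrative MMI-style score trading off fluency against relevance."""
    return (1 - lam) * log_p_cand_given_query + lam * log_p_query_given_cand

# Assumed pre-computed log-likelihoods for three candidate understandings.
candidates = ["intent=book_flight", "intent=chitchat", "intent=book_hotel"]
forward = [-2.1, -1.8, -3.0]    # log p(candidate | query), hypothetical values
backward = [-1.0, -4.5, -2.2]   # log p(query | candidate), hypothetical values

scores = [mmi_score(f, b) for f, b in zip(forward, backward)]
understanding = candidates[max(range(len(scores)), key=scores.__getitem__)]
```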
According to embodiments of the present disclosure, the historical dialog information may include system historical dialog information and object historical dialog information. The system history dialogue information may include a system identification and history dialogue information corresponding to the system identification. The object history dialogue information may include an object identification and history dialogue information corresponding to the object identification. The understanding information may also include an action identifier corresponding to the object action and a state identifier corresponding to the dialog state.
In accordance with embodiments of the present disclosure, as shown in Table 1 below, the system identification may be characterized using "system". The object identification may be characterized using "user". The action identification may be characterized using "user_action". The state identification may be characterized using "belief".
TABLE 1
According to the embodiment of the disclosure, since the system identification can be used to represent the system history dialogue information and the object identification can be used to represent the object history dialogue information, a clear role distinction between the system and the object in a multi-round dialogue is achieved, which improves the coherence of the information generation method and further improves the accuracy of the dialogue response information.
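As a hypothetical illustration of these identifiers, query information and understanding information could be serialized as follows; the dialogue content and slot values are invented for this example and are not taken from Table 1.

```python
# Illustrative serialization of query information and understanding information
# using the "system", "user", "user_action", and "belief" identifiers.
turns = [
    ("system", "Hello, how can I help you?"),
    ("user", "I need a hotel in the city centre for two nights."),
]

query_information = " ".join(f"<{role}> {text}" for role, text in turns)
understanding_information = (
    "<user_action> inform_hotel_request "
    "<belief> hotel-area=centre hotel-stay=2"
)

model_input = query_information + " " + understanding_information
```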
Fig. 5B schematically illustrates an example schematic diagram of semantic understanding of query information resulting in understanding information according to another embodiment of the present disclosure.
As shown in fig. 5B, in 500B, after the query information 506 is obtained, at least one second candidate understanding information may be generated from the query information 506. The at least one second candidate understanding information may include second candidate understanding information 507_1, second candidate understanding information 507_2, ..., second candidate understanding information 507_Q. Q may be an integer greater than or equal to 1, and q ∈ {1, 2, ..., (Q-1), Q}.

After the at least one second candidate understanding information is obtained, the second candidate understanding information 507_1, the second candidate understanding information 507_2, ..., the second candidate understanding information 507_Q may each be fused with the query information 506 to obtain at least one fifth fusion information. The at least one fifth fusion information may include fifth fusion information 508_1, fifth fusion information 508_2, ..., fifth fusion information 508_Q.

After the at least one fifth fusion information is obtained, the understanding information 509 may be determined from the second candidate understanding information 507_1, the second candidate understanding information 507_2, ..., the second candidate understanding information 507_Q according to the fifth fusion information 508_1, the fifth fusion information 508_2, ..., the fifth fusion information 508_Q.
According to an embodiment of the present disclosure, the information generating method 200 may further include the following operations.
In response to detecting the non-auxiliary request instruction, dialogue response information is generated directly according to the query information and the understanding information.
According to an embodiment of the present disclosure, the non-auxiliary request instruction may refer to a request instruction that does not require an external resource to be invoked in the process of generating the dialogue response information. In the case of detecting a non-auxiliary request instruction, since no external resource needs to be invoked, the auxiliary request information may not need to be determined and may be identified as NULL. In this case, the dialogue response information may be generated directly from the query information and the understanding information.
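A minimal sketch of this branching is given below; the helper functions, the toy data source, and the use of None to stand in for NULL are assumptions for illustration, not the disclosed implementation.

```python
# Hypothetical helper assembling auxiliary request information from the understanding.
def build_auxiliary_request(understanding_info):
    return {"slots": understanding_info.get("belief", {})}

# Hypothetical stand-in for the response-generation model.
def generate_response(query_info, understanding_info, aux_response):
    return "Noted." if aux_response is None else f"I found: {aux_response}"

def respond(query_info, understanding_info, needs_external_resource, lookup=None):
    if needs_external_resource:                         # auxiliary request instruction detected
        aux_request = build_auxiliary_request(understanding_info)
        aux_response = lookup(aux_request)              # auxiliary response information from a data source
        return generate_response(query_info, understanding_info, aux_response)
    # Non-auxiliary request: auxiliary request information stays NULL (None here).
    return generate_response(query_info, understanding_info, None)

# Example usage with a toy in-memory data source.
print(respond({"text": "any cheap hotels?"},
              {"belief": {"pricerange": "cheap"}},
              needs_external_resource=True,
              lookup=lambda req: ["Hotel A", "Hotel B"]))
```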
Table 2 may be used to characterize an example illustrative process of generating dialogue response information in response to detecting a non-auxiliary request instruction, in accordance with an embodiment of the present disclosure. As shown in Table 2, the auxiliary request information may include an auxiliary request information identifier corresponding to the auxiliary request information, and the auxiliary request information identifier may be characterized using "callapi". The auxiliary response information may include an auxiliary response information identifier corresponding to the auxiliary response information, and the auxiliary response information identifier may be characterized using "kb". The dialogue response information may include a system action, a system action identifier corresponding to the system action, a system response, and a system response identifier corresponding to the system response. The system action identifier may be characterized using "system action", and the system response identifier may be characterized using "response".
TABLE 2
Table 3 may be used to characterize an example illustrative process of generating dialog response information in response to detecting an auxiliary request instruction, in accordance with an embodiment of the present disclosure.
TABLE 3
The above is only an exemplary embodiment, but is not limited thereto, and other information generation methods known in the art may be included as long as the accuracy of the dialogue response information can be improved.
Fig. 6 schematically illustrates a flowchart of a training method of a pre-training model according to an embodiment of the present disclosure.
As shown in fig. 6, the method 600 includes operations S610 to S640.
In operation S610, semantic understanding is performed on the first sample query information, resulting in first sample understanding information, wherein the first sample query information includes first sample history dialogue information, and the first sample understanding information includes first sample object actions and first sample dialogue states.
In operation S620, first sample assistance request information is obtained from the first sample query information and the first sample understanding information.
In operation S630, first sample dialogue response information is generated according to the first sample query information, the first sample understanding information, and the first sample auxiliary request information.
In operation S640, a pre-trained dialogue generation model is trained using the first sample query information, the first sample understanding information, and the first sample dialogue response information, resulting in an information generation model.
According to an embodiment of the present disclosure, after the information generation model is obtained, the information generation method 200 may be performed using the information generation model.
According to embodiments of the present disclosure, for the description of the first sample query information, the first sample understanding information, the first sample history dialogue information, the first sample object action, the first sample dialogue state, the first sample auxiliary request information, and the first sample dialogue response information, reference may be made to the relevant contents for the query information, the understanding information, the history dialogue information, the object action, the dialogue state, the auxiliary request information, and the dialogue response information, which are not repeated herein.
According to embodiments of the present disclosure, a first training sample may be constructed from the first sample historical dialogue information, the first sample object action, and the first sample dialogue state. For example, the first training sample may be "first sample historical dialogue information -> first sample object action + first sample dialogue state". A second training sample may be constructed based on the first sample historical dialogue information, the first sample object action, the first sample dialogue state, the first sample auxiliary request information, and the first sample dialogue response information. For example, the second training sample may be "first sample historical dialogue information + first sample object action + first sample dialogue state + first sample auxiliary request information -> first sample dialogue response information".
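The two training samples described above can be illustrated as flat text-to-text pairs, for example as below; the field contents, the angle-bracket markers, and the dictionary layout are hypothetical and serve only to make the input/target structure concrete.

```python
# Hypothetical field contents for one dialogue turn.
history = "<system> Any price preference? <user> Something cheap, please."
user_action = "<user_action> inform"
belief = "<belief> hotel-pricerange=cheap"
aux_request = "<callapi> query_hotel(pricerange=cheap)"
response = "<response> Okay, I found three cheap hotels."

# First training sample: history -> object action + dialogue state.
sample_1 = {"input": history, "target": f"{user_action} {belief}"}

# Second training sample: history + understanding + auxiliary request -> response.
sample_2 = {"input": f"{history} {user_action} {belief} {aux_request}", "target": response}
```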
According to the embodiment of the disclosure, since the first sample understanding information is obtained by semantically understanding the first sample query information, sample understanding information reflecting dialogue understanding can be obtained. Since the first sample auxiliary request information is obtained from the first sample query information and the first sample understanding information, external knowledge can be effectively utilized. In addition, the first sample dialogue response information is generated based on the first sample query information, the first sample understanding information, and the first sample auxiliary request information, and therefore the accuracy of the first sample dialogue response information is improved. On this basis, the pre-training dialogue generation model is trained using the first sample query information, the first sample understanding information, and the first sample dialogue response information to obtain an integrated information generation model. Since the information generation model integrates semantic understanding and information generation, parameters can be shared among different tasks, so that the representation learning capability of the information generation model is improved, and the information generation capability of the information generation model is further improved.
A training method 600 of a pre-training model according to an embodiment of the present disclosure is further described below with reference to fig. 7A and 7B.
The training method 600 of the pre-training model may further include the following operations according to embodiments of the present disclosure.
Obtaining sample historical dialog information from a corpus, wherein the corpus comprises at least one of: a real corpus and a simulated corpus.
According to embodiments of the present disclosure, a real corpus may be used to characterize a corpus obtained from a public dataset. The simulated corpus may be used to characterize an artificially generated corpus. Original sample historical dialogue information may be obtained from the corpus. After the original sample historical dialogue information is obtained, it may be processed to obtain the sample historical dialogue information.
According to an embodiment of the disclosure, the real corpus may include a first real corpus and a second real corpus, where the second real corpus is obtained by translating a third real corpus, and languages of the first real corpus and the second real corpus are the same.
According to embodiments of the present disclosure, the simulated corpus may be generated in at least one of the following ways: generated based on predetermined text parameters, or generated by processing predetermined random noise data with a generative adversarial network model.
According to embodiments of the present disclosure, a first real corpus and a third real corpus may be obtained from a public dataset. The first real corpus and the third real corpus are different in language. After the third real corpus is obtained, translation may be performed on the third real corpus to obtain a second real corpus identical to the first real corpus in language.
According to embodiments of the present disclosure, a simulated corpus may be generated programmatically based on predetermined text parameters to expand the coverage of the simulated corpus. Alternatively, the simulated corpus may be obtained by inputting predetermined random noise data into a generative adversarial network model. The generative adversarial network model may include a deep convolutional generative adversarial network model, a generative adversarial network model based on the earth mover's distance (Wasserstein distance), a conditional generative adversarial network model, or the like. The generative adversarial network model may include a generator and a discriminator. The generator and the discriminator may each comprise a neural network model. The generator may be used to generate the simulated corpus; by continuously training the generator, it learns the data distribution of the simulated corpus, so that samples conforming to that data distribution can be generated from scratch, confusing the discriminator as far as possible. The discriminator may be used to distinguish the simulated corpus from the real corpus.
According to an embodiment of the present disclosure, the convergence condition of the generative adversarial network model may include the generator converging, both the generator and the discriminator converging, or the iteration reaching a termination condition. The termination condition may include the number of iterations being equal to a predetermined number of iterations.
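A compact sketch of one generator/discriminator training step of the kind described above is given below, operating on fixed-size feature vectors rather than raw text; the network sizes, optimizer settings, and the single alternating step are illustrative assumptions rather than the disclosed training procedure.

```python
import torch
import torch.nn as nn

noise_dim, feat_dim = 32, 128   # assumed sizes

generator = nn.Sequential(nn.Linear(noise_dim, 256), nn.ReLU(), nn.Linear(256, feat_dim))
discriminator = nn.Sequential(nn.Linear(feat_dim, 256), nn.ReLU(), nn.Linear(256, 1))

bce = nn.BCEWithLogitsLoss()
g_opt = torch.optim.Adam(generator.parameters(), lr=2e-4)
d_opt = torch.optim.Adam(discriminator.parameters(), lr=2e-4)

real_features = torch.randn(16, feat_dim)        # stand-in for real-corpus features

# Discriminator step: distinguish the real corpus from the simulated corpus.
fake_features = generator(torch.randn(16, noise_dim)).detach()
d_loss = bce(discriminator(real_features), torch.ones(16, 1)) + \
         bce(discriminator(fake_features), torch.zeros(16, 1))
d_opt.zero_grad(); d_loss.backward(); d_opt.step()

# Generator step: make simulated features indistinguishable from real ones.
fake_features = generator(torch.randn(16, noise_dim))
g_loss = bce(discriminator(fake_features), torch.ones(16, 1))
g_opt.zero_grad(); g_loss.backward(); g_opt.step()
```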
According to the embodiment of the disclosure, since the sample historical dialogue information is obtained from the corpus, and the corpus comprises at least one of a real corpus and a simulated corpus, the training corpus can be expanded, so that the cross-type dialogue capability of the information generation model is improved, and the universality of the information generation model is improved.
Fig. 7A schematically illustrates an example schematic diagram of a method of generating a real corpus according to an embodiment of the disclosure.
As shown in fig. 7A, in 700A, a third real corpus 701 may be obtained. After the third real corpus 701 is obtained, translation may be performed on the third real corpus 701 to obtain a second real corpus 702.
After obtaining the second real corpus 702, a first real corpus 703 having the same language as the second real corpus 702 may be determined according to the second real corpus 702. After obtaining the first real corpus 703, a real corpus 704 may be determined from the first real corpus 703 and the second real corpus 702.
Fig. 7B schematically illustrates an example schematic diagram of a method of generating a simulated corpus according to an embodiment of the disclosure.
As shown in fig. 7B, in 700B, a predetermined text parameter 705 and predetermined random noise data 706 may be preset. A first simulated corpus 707 may be generated based on predetermined text parameters 705. A second simulated corpus 708 may be generated based on the predetermined random noise data 706.
After obtaining the first simulated corpus 707 and the second simulated corpus 708, a simulated corpus 709 may be determined from the first simulated corpus 707 and the second simulated corpus 708.
According to an embodiment of the present disclosure, operation S640 may include the following operations.
And training the dialogue generating model by using the positive sample and the negative sample to obtain the information generating model.
According to embodiments of the present disclosure, the positive samples may include first sample query information, first sample understanding tag information, first sample dialogue response information, and first sample dialogue response tag information.
According to embodiments of the present disclosure, the negative samples may include first sample query information, first sample understanding information, second sample understanding tag information, first sample dialogue response information, and second sample dialogue response tag information.
According to embodiments of the present disclosure, the first sample understanding tag information may be used to characterize normal tag information. The first sample dialogue response tag information may be used to characterize normal tag information. At least one of the second sample understanding tag information and the second sample dialogue response tag information may be abnormal tag information.
According to an embodiment of the present disclosure, in contrastive learning, a child sample obtained by applying data enhancement to a parent sample is considered a positive sample for that parent sample, because the child sample and the parent sample belong to the same category and maintain the same semantic information. A parent sample may refer to a sample that is the subject of data enhancement processing. For the same parent sample, data enhancement may be performed multiple times, resulting in multiple child samples. Although these child samples come from the same parent sample, there are slight differences between them, i.e., the child samples are not completely identical. A negative sample may refer to another sample whose category differs from that of the parent sample. Positive samples in embodiments of the present disclosure may include parent samples and positive samples resulting from data enhancement of parent samples.
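The following sketch illustrates, under an assumed token-dropout enhancement and invented texts, how a positive sample and a negative sample could be derived from a parent sample; it is only an example of the contrastive setup described above, not the disclosed sample construction.

```python
import random

def enhance(text, drop_prob=0.1):
    """Hypothetical data enhancement: randomly drop a few tokens."""
    tokens = text.split()
    kept = [t for t in tokens if random.random() > drop_prob]
    return " ".join(kept) if kept else text

parent = {
    "query": "<user> I want a cheap hotel",
    "understanding": "<user_action> inform <belief> pricerange=cheap",
    "response": "<response> Sure, three cheap hotels found.",
}

# Positive sample: enhanced copy of the parent keeps the same semantics and labels.
positive = {**parent, "query": enhance(parent["query"]), "label": 1}

# Negative sample: the same query paired with an unrelated (abnormal) label.
negative = {**parent, "understanding": "<user_action> request <belief> food=italian", "label": 0}
```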
According to the embodiment of the disclosure, the dialogue generating model is trained by utilizing the positive sample and the negative sample, so that the information generating model is obtained, and the learning capability of the information generating model is improved.
The above is only an exemplary embodiment, but is not limited thereto, and other training methods of the pre-training model known in the art may be included as long as the information generating capability of the information generating model can be improved.
Fig. 8 schematically shows a block diagram of an information generating apparatus according to an embodiment of the present disclosure.
As shown in fig. 8, the information generating apparatus 800 may include a first semantic understanding module 810, a first obtaining module 820, and a first generating module 830.
The first semantic understanding module 810 is configured to perform semantic understanding on query information to obtain understanding information, where the query information includes historical dialogue information, and the understanding information includes object actions and dialogue states.
The first obtaining module 820 is configured to obtain the auxiliary request information according to the query information and the understanding information in response to detecting the auxiliary request instruction.
The first generation module 830 is configured to generate dialogue response information according to the query information, the understanding information, and the auxiliary request information.
According to an embodiment of the present disclosure, the first generation module 830 may include a first determination sub-module and a first generation sub-module.
And the first determination submodule is used for determining auxiliary response information corresponding to the auxiliary request information from the data source.
And the first generation sub-module is used for generating dialogue response information according to the query information, the understanding information and the auxiliary response information.
According to an embodiment of the present disclosure, the first generation sub-module may include a fusion unit and a generation unit.
And the fusion unit is used for fusing the query information, the understanding information and the auxiliary response information to obtain first fusion information.
And the generating unit is used for generating dialogue response information according to the first fusion information.
According to an embodiment of the present disclosure, the generating unit may include a first encoding subunit, a first decoding subunit, and a first generating subunit.
And the first coding subunit is used for coding the first fusion information to obtain first coding information.
And the first decoding subunit is used for self-decoding the first coding information to obtain intermediate decoding information.
And the first generation subunit is used for generating dialogue response information according to the first coding information and the intermediate decoding information.
According to an embodiment of the present disclosure, the generating unit may include a second encoding subunit and a second decoding subunit.
And the second coding subunit is used for coding the first fusion information to obtain second coding information.
And the second decoding subunit is used for decoding the second coding information to obtain dialogue response information.
According to an embodiment of the present disclosure, the generating unit may include a second generating subunit, a first fusing subunit, and a first determining subunit.
And the second generation subunit is used for generating at least one first candidate dialogue response message according to the first fusion message.
And the first fusion subunit is used for respectively fusing the at least one first candidate dialogue response information and the first fusion information to obtain at least one second fusion information.
And the first determination subunit is used for determining the dialogue response information from the at least one first candidate dialogue response information according to the at least one second fusion information.
According to an embodiment of the present disclosure, the generating unit may include a second fusing subunit, a third generating subunit, and a second determining subunit.
And the second fusion subunit is used for respectively fusing the at least one first hidden variable information and the first fusion information to obtain at least one third fusion information.
And the third generation subunit is used for generating at least one second candidate dialogue response message according to the at least one third fusion message.
And a second determination subunit configured to determine dialogue response information from the at least one second candidate dialogue response information according to the evaluation value corresponding to the at least one second candidate dialogue response information.
According to an embodiment of the present disclosure, the first semantic understanding module 810 may include a first fusion sub-module, a first generation sub-module, and a second determination sub-module.
And the first fusion sub-module is used for respectively fusing the at least one second hidden variable information and the query information to obtain at least one fourth fusion information.
The first generation sub-module is used for generating at least one first candidate understanding information according to at least one fourth fusion information.
And the second determination submodule is used for determining the understanding information from the at least one first candidate understanding information according to the evaluation value corresponding to the at least one first candidate understanding information.
According to an embodiment of the present disclosure, the first semantic understanding module 810 may include a second generation sub-module, a second fusion sub-module, and a third determination sub-module.
And the second generation sub-module is used for generating at least one piece of second candidate understanding information according to the query information.
And the second fusion sub-module is used for respectively fusing the at least one second candidate understanding information and the query information to obtain at least one fifth fusion information.
And a third determining sub-module for determining understanding information from the at least one second candidate understanding information according to the at least one fifth fusion information.
According to an embodiment of the present disclosure, the information generating apparatus 800 may further include a third generating module.
And the third generation module is used for responding to the detection of the non-auxiliary request instruction and directly generating dialogue response information according to the query information and the understanding information.
According to an embodiment of the present disclosure, the query information further includes a query word slot.
According to an embodiment of the present disclosure, the history dialogue information includes system history dialogue information and object history dialogue information, the system history dialogue information including a system identification and history dialogue information corresponding to the system identification, and the object history dialogue information including an object identification and history dialogue information corresponding to the object identification.
According to an embodiment of the present disclosure, the understanding information further includes an action identifier corresponding to the object action and a state identifier corresponding to the dialog state.
Fig. 9 schematically illustrates a block diagram of a training apparatus of a pre-training model according to an embodiment of the present disclosure.
As shown in fig. 9, the training apparatus 900 of the pre-training model may include a second semantic understanding module 910, a second obtaining module 920, a second generating module 930, and a training module 940.
The second semantic understanding module 910 is configured to perform semantic understanding on the first sample query information to obtain first sample understanding information, where the first sample query information includes first sample history dialogue information, and the first sample understanding information includes a first sample object action and a first sample dialogue state.
The second obtaining module 920 is configured to obtain the first sample auxiliary request information according to the first sample query information and the first sample understanding information.
The second generating module 930 is configured to generate the first sample dialogue response information according to the first sample query information, the first sample understanding information, and the first sample auxiliary request information.
The training module 940 is configured to train the pre-training dialogue generation model by using the first sample query information, the first sample understanding information and the first sample dialogue response information to obtain an information generation model.
According to an embodiment of the present disclosure, the training apparatus 900 of the pre-training model may further comprise an acquisition module.
The acquisition module is used for acquiring sample historical dialogue information from a corpus, wherein the corpus comprises at least one of the following: a real corpus and a simulated corpus.
According to the embodiment of the disclosure, the real corpus comprises a first real corpus and a second real corpus, wherein the second real corpus is obtained by translating a third real corpus, and languages of the first real corpus and the second real corpus are the same.
According to an embodiment of the present disclosure, the simulated corpus is generated based on at least one of: generated based on predetermined text parameters and generated based on generating the countermeasure network model processing predetermined random noise data.
According to embodiments of the present disclosure, training module 940 may include a training sub-module.
And the training sub-module is used for training the dialogue generating model by utilizing the positive sample and the negative sample to obtain the information generating model.
According to an embodiment of the present disclosure, the positive sample includes first sample query information, first sample understanding tag information, first sample dialogue response information, and first sample dialogue response tag information.
According to an embodiment of the present disclosure, the negative sample includes first sample query information, first sample understanding information, second sample understanding tag information, first sample dialogue response information, and second sample dialogue response tag information, at least one of the second sample understanding tag information and the second sample dialogue response tag information being abnormal tag information.
According to embodiments of the present disclosure, the present disclosure also provides an electronic device, a readable storage medium and a computer program product.
According to an embodiment of the present disclosure, an electronic device includes: at least one processor; and a memory communicatively coupled to the at least one processor; wherein the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the methods as described in the present disclosure.
According to an embodiment of the present disclosure, a non-transitory computer-readable storage medium storing computer instructions for causing a computer to perform a method as described in the present disclosure.
According to an embodiment of the present disclosure, a computer program product comprising a computer program which, when executed by a processor, implements a method as described in the present disclosure.
Fig. 10 schematically illustrates a block diagram of an electronic device adapted to implement the information generation method and the training method of the pre-training model, according to an embodiment of the disclosure. Electronic devices are intended to represent various forms of digital computers, such as laptops, desktops, workstations, personal digital assistants, servers, blade servers, mainframes, and other appropriate computers. The electronic device may also represent various forms of mobile devices, such as personal digital processing, cellular telephones, smartphones, wearable devices, and other similar computing devices. The components shown herein, their connections and relationships, and their functions, are meant to be exemplary only, and are not meant to limit implementations of the disclosure described and/or claimed herein.
As shown in fig. 10, the electronic device 1000 includes a computing unit 1001 that can perform various appropriate actions and processes according to a computer program stored in a Read Only Memory (ROM) 1002 or a computer program loaded from a storage unit 1008 into a Random Access Memory (RAM) 1003. In the RAM 1003, various programs and data required for the operation of the electronic apparatus 1000 can also be stored. The computing unit 1001, the ROM 1002, and the RAM 1003 are connected to each other by a bus 1004. An input/output (I/O) interface 1005 is also connected to bus 1004.
Various components in the electronic device 1000 are connected to the I/O interface 1005, including: an input unit 1006 such as a keyboard, a mouse, and the like; an output unit 1007 such as various types of displays, speakers, and the like; a storage unit 1008 such as a magnetic disk, an optical disk, or the like; and communication unit 1009 such as a network card, modem, wireless communication transceiver, etc. Communication unit 1009 allows electronic device 1000 to exchange information/data with other devices via a computer network, such as the internet, and/or various telecommunications networks.
The computing unit 1001 may be a variety of general and/or special purpose processing components having processing and computing capabilities. Some examples of computing unit 1001 include, but are not limited to, a Central Processing Unit (CPU), a Graphics Processing Unit (GPU), various specialized Artificial Intelligence (AI) computing chips, various computing units running machine learning model algorithms, a Digital Signal Processor (DSP), and any suitable processor, controller, microcontroller, etc. The computing unit 1001 performs the respective methods and processes described above, such as the information generating method and the training method of the pre-training model. For example, in some embodiments, the information generation method and the training method of the pre-training model may be implemented as a computer software program tangibly embodied on a machine-readable medium, such as the storage unit 1008. In some embodiments, part or all of the computer program may be loaded and/or installed onto the electronic device 1000 via the ROM 1002 and/or the communication unit 1009. When the computer program is loaded into the RAM 1003 and executed by the computing unit 1001, one or more steps of the information generating method and the training method of the pre-training model described above may be performed. Alternatively, in other embodiments, the computing unit 1001 may be configured to perform the information generation method and the training method of the pre-training model in any other suitable way (e.g. by means of firmware).
Various implementations of the systems and techniques described above may be implemented in digital electronic circuitry, integrated circuit systems, Field Programmable Gate Arrays (FPGAs), Application Specific Integrated Circuits (ASICs), Application Specific Standard Products (ASSPs), Systems on Chip (SOCs), Complex Programmable Logic Devices (CPLDs), computer hardware, firmware, software, and/or combinations thereof. These various embodiments may include: being implemented in one or more computer programs, where the one or more computer programs may be executed and/or interpreted on a programmable system including at least one programmable processor, which may be a special purpose or general-purpose programmable processor, that may receive data and instructions from, and transmit data and instructions to, a storage system, at least one input device, and at least one output device.
Program code for carrying out methods of the present disclosure may be written in any combination of one or more programming languages. These program code may be provided to a processor or controller of a general purpose computer, special purpose computer, or other programmable data processing apparatus such that the program code, when executed by the processor or controller, causes the functions/operations specified in the flowchart and/or block diagram to be implemented. The program code may execute entirely on the machine, partly on the machine, as a stand-alone software package, partly on the machine and partly on a remote machine or entirely on the remote machine or server.
In the context of this disclosure, a machine-readable medium may be a tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. The machine-readable medium may be a machine-readable signal medium or a machine-readable storage medium. The machine-readable medium may include, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any suitable combination of the foregoing. More specific examples of a machine-readable storage medium would include an electrical connection based on one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.
To provide for interaction with a user, the systems and techniques described here can be implemented on a computer having: a display device (e.g., a CRT (cathode ray tube) or LCD (liquid crystal display) monitor) for displaying information to a user; and a keyboard and pointing device (e.g., a mouse or trackball) by which a user can provide input to the computer. Other kinds of devices may also be used to provide for interaction with a user; for example, feedback provided to the user may be any form of sensory feedback (e.g., visual feedback, auditory feedback, or tactile feedback); and input from the user may be received in any form, including acoustic input, speech input, or tactile input.
The systems and techniques described here can be implemented in a computing system that includes a back-end component (e.g., a data server), or that includes a middleware component (e.g., an application server), or that includes a front-end component (e.g., a user computer having a graphical user interface or a web browser through which a user can interact with an implementation of the systems and techniques described here), or any combination of such back-end, middleware, or front-end components. The components of the system can be interconnected by any form or medium of digital data communication (e.g., a communication network). Examples of communication networks include: Local Area Networks (LANs), Wide Area Networks (WANs), and the Internet.
The computer system may include a client and a server. The client and server are typically remote from each other and typically interact through a communication network. The relationship of client and server arises by virtue of computer programs running on the respective computers and having a client-server relationship to each other. The server may be a cloud server, a server of a distributed system, or a server incorporating a blockchain.
It should be appreciated that various forms of the flows shown above may be used to reorder, add, or delete steps. For example, the steps recited in the present disclosure may be performed in parallel or sequentially or in a different order, provided that the desired results of the technical solutions of the present disclosure are achieved, and are not limited herein.
The above detailed description should not be taken as limiting the scope of the present disclosure. It will be apparent to those skilled in the art that various modifications, combinations, sub-combinations and alternatives are possible, depending on design requirements and other factors. Any modifications, equivalent substitutions and improvements made within the spirit and principles of the present disclosure are intended to be included within the scope of the present disclosure.

Claims (20)

1. An information generation method, comprising:
semantic understanding is carried out on query information and second hidden variable information corresponding to at least one dialogue round number respectively to obtain understanding information, wherein the query information comprises historical dialogue information, the understanding information comprises object actions and dialogue states, and the second hidden variable information is generated according to dialogue contents and dialogue responses corresponding to the at least one dialogue round number respectively;
Responding to the detection of an auxiliary request instruction, and obtaining auxiliary request information according to the query information and the understanding information; and
generating dialogue response information according to the query information, the understanding information and the auxiliary request information;
the semantic understanding of the query information and the second hidden variable information corresponding to at least one dialogue round number respectively is carried out to obtain understanding information, and the method comprises the following steps:
respectively fusing at least one second hidden variable information and the query information to obtain at least one fourth fused information;
generating at least one first candidate understanding information according to the at least one fourth fusion information; and
and determining the understanding information from the at least one first candidate understanding information according to the evaluation value corresponding to the at least one first candidate understanding information.
2. The method of claim 1, wherein the generating dialogue response information from the query information, the understanding information, and the auxiliary request information comprises:
determining auxiliary response information corresponding to the auxiliary request information from a data source; and
and generating the dialogue response information according to the query information, the understanding information and the auxiliary response information.
3. The method of claim 2, wherein the generating the dialogue response information from the query information, the understanding information, and the auxiliary response information comprises:
fusing the query information, the understanding information and the auxiliary response information to obtain first fused information; and
and generating the dialogue response information according to the first fusion information.
4. A method according to claim 3, wherein said generating said dialogue response information from said first fusion information comprises:
encoding the first fusion information to obtain first encoded information;
performing self-decoding on the first coding information to obtain intermediate decoding information; and
and generating the dialogue response information according to the first coding information and the intermediate decoding information.
5. A method according to claim 3, wherein said generating said dialogue response information from said first fusion information comprises:
encoding the first fusion information to obtain second encoded information; and
and decoding the second encoded information to obtain the dialogue response information.
6. A method according to claim 3, wherein said generating said dialogue response information from said first fusion information comprises:
generating at least one first candidate dialogue response information according to the first fusion information;
respectively fusing the at least one first candidate dialogue response information and the first fusion information to obtain at least one second fusion information; and
and determining the dialogue response information from the at least one first candidate dialogue response information according to the at least one second fusion information.
7. A method according to claim 3, wherein said generating said dialogue response information from said first fusion information comprises:
respectively fusing at least one first hidden variable information and the first fusion information to obtain at least one third fusion information;
generating at least one second candidate dialogue response information according to the at least one third fusion information; and
and determining the dialogue response information from the at least one second candidate dialogue response information according to the evaluation value corresponding to the at least one second candidate dialogue response information.
8. The method as recited in claim 1, further comprising:
generating at least one piece of second candidate understanding information according to the query information;
respectively fusing the at least one second candidate understanding information and the query information to obtain at least one fifth fused information; and
and determining the understanding information from the at least one second candidate understanding information according to the at least one fifth fusion information.
9. The method of claim 1, further comprising:
and responding to the detection of the non-auxiliary request instruction, and generating the dialogue response information directly according to the query information and the understanding information.
10. The method of claim 1, wherein the query information further comprises a query word slot.
11. The method of any of claims 1-10, wherein the historical dialog information includes system historical dialog information and object historical dialog information, the system historical dialog information including a system identification and historical dialog information corresponding to the system identification, the object historical dialog information including an object identification and historical dialog information corresponding to the object identification;
the understanding information further includes an action identifier corresponding to the object action and a state identifier corresponding to the dialog state.
12. A training method of a pre-trained model, comprising:
performing semantic understanding on first sample query information and second sample hidden variable information respectively corresponding to at least one dialogue turn to obtain first sample understanding information, wherein the first sample query information comprises first sample historical dialogue information, the first sample understanding information comprises first sample object actions and first sample dialogue states, and the second sample hidden variable information is generated according to the dialogue content and dialogue responses respectively corresponding to the at least one dialogue turn;
obtaining first sample auxiliary request information according to the first sample query information and the first sample understanding information;
generating first sample dialogue response information according to the first sample query information, the first sample understanding information and the first sample auxiliary request information; and
training a pre-trained dialogue generation model by using the first sample query information, the first sample understanding information and the first sample dialogue response information to obtain an information generation model;
wherein the performing semantic understanding on the first sample query information and the second sample hidden variable information respectively corresponding to the at least one dialogue turn to obtain the first sample understanding information comprises:
respectively fusing at least one second sample hidden variable information and the first sample query information to obtain at least one fourth sample fusion information;
generating at least one first candidate sample understanding information according to the at least one fourth sample fusion information; and
determining the first sample understanding information from the at least one first candidate sample understanding information according to the evaluation value corresponding to the at least one first candidate sample understanding information.
13. The method of claim 12, further comprising:
obtaining the first sample historical dialogue information from a corpus, wherein the corpus comprises at least one of: a real corpus and a simulated corpus.
14. The method of claim 13, wherein the real corpus comprises a first real corpus and a second real corpus, wherein the second real corpus is obtained by translating a third real corpus, and languages of the first real corpus and the second real corpus are the same;
the simulated corpus is generated based on at least one of: predetermined text parameters, and processing predetermined random noise data by using a generative adversarial network model.
15. The method of any of claims 12-14, wherein the training a pre-trained dialogue generation model by using the first sample query information, the first sample understanding information and the first sample dialogue response information to obtain an information generation model comprises:
training the pre-trained dialogue generation model by using a positive sample and a negative sample to obtain the information generation model;
wherein the positive sample includes the first sample query information, the first sample understanding information, first sample understanding tag information, the first sample dialogue response information, and first sample dialogue response tag information;
wherein the negative sample includes the first sample query information, the first sample understanding information, second sample understanding tag information, the first sample dialogue response information, and second sample dialogue response tag information, at least one of the second sample understanding tag information and the second sample dialogue response tag information being abnormal tag information.
16. An information generating apparatus comprising:
the first semantic understanding module is used for performing semantic understanding on query information and second hidden variable information respectively corresponding to at least one dialogue turn to obtain understanding information, wherein the query information comprises historical dialogue information, the understanding information comprises object actions and dialogue states, and the second hidden variable information is generated according to the dialogue content and dialogue responses respectively corresponding to the at least one dialogue turn;
the first obtaining module is used for responding to the detection of the auxiliary request instruction and obtaining auxiliary request information according to the query information and the understanding information; and
the first generation module is used for generating dialogue response information according to the query information, the understanding information and the auxiliary request information;
wherein the performing semantic understanding on the query information and the second hidden variable information respectively corresponding to the at least one dialogue turn to obtain the understanding information comprises:
respectively fusing at least one second hidden variable information and the query information to obtain at least one fourth fused information;
generating at least one first candidate understanding information according to the at least one fourth fusion information; and
determining the understanding information from the at least one first candidate understanding information according to the evaluation value corresponding to the at least one first candidate understanding information.
17. A training device for a pre-trained model, comprising:
the second semantic understanding module is used for performing semantic understanding on first sample query information and second sample hidden variable information respectively corresponding to at least one dialogue turn to obtain first sample understanding information, wherein the first sample query information comprises first sample historical dialogue information, the first sample understanding information comprises first sample object actions and first sample dialogue states, and the second sample hidden variable information is generated according to the dialogue content and dialogue responses respectively corresponding to the at least one dialogue turn;
the second obtaining module is used for obtaining first sample auxiliary request information according to the first sample query information and the first sample understanding information;
the second generation module is used for generating first sample dialogue response information according to the first sample query information, the first sample understanding information and the first sample auxiliary request information; and
the training module is used for training a pre-trained dialogue generation model by using the first sample query information, the first sample understanding information and the first sample dialogue response information to obtain an information generation model;
wherein the performing semantic understanding on the first sample query information and the second sample hidden variable information respectively corresponding to the at least one dialogue turn to obtain the first sample understanding information comprises:
respectively fusing at least one second sample hidden variable information and the first sample query information to obtain at least one fourth sample fusion information;
generating at least one first candidate sample understanding information according to the at least one fourth sample fusion information; and
determining the first sample understanding information from the at least one first candidate sample understanding information according to the evaluation value corresponding to the at least one first candidate sample understanding information.
18. An electronic device, comprising:
at least one processor; and
a memory communicatively coupled to the at least one processor; wherein
the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the method of any one of claims 1 to 11 or claims 12 to 15.
19. A non-transitory computer readable storage medium storing computer instructions for causing a computer to perform the method of any one of claims 1-11 or claims 12-15.
20. A computer program product comprising a computer program which, when executed by a processor, implements the method according to any one of claims 1 to 11 or claims 12 to 15.
CN202211742317.3A 2022-12-30 2022-12-30 Information generation method, training device, electronic equipment and storage medium Active CN116050427B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211742317.3A CN116050427B (en) 2022-12-30 2022-12-30 Information generation method, training device, electronic equipment and storage medium

Publications (2)

Publication Number Publication Date
CN116050427A CN116050427A (en) 2023-05-02
CN116050427B CN116050427B (en) 2023-10-27

Family

ID=86123256

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202211742317.3A Active CN116050427B (en) 2022-12-30 2022-12-30 Information generation method, training device, electronic equipment and storage medium

Country Status (1)

Country Link
CN (1) CN116050427B (en)

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
KR20200092455A (en) * 2019-01-04 2020-08-04 주식회사 케이티 Server, method and computer program for predicting intention of user
CN111666385A (en) * 2019-03-07 2020-09-15 南京邮电大学 Customer service question-answering system based on deep learning and implementation method
CN112365892A (en) * 2020-11-10 2021-02-12 杭州大搜车汽车服务有限公司 Man-machine interaction method, device, electronic device and storage medium
WO2022057712A1 (en) * 2020-09-15 2022-03-24 华为技术有限公司 Electronic device and semantic parsing method therefor, medium, and human-machine dialog system
CN114490985A (en) * 2022-01-25 2022-05-13 北京百度网讯科技有限公司 Dialog generation method and device, electronic equipment and storage medium

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US8275803B2 (en) * 2008-05-14 2012-09-25 International Business Machines Corporation System and method for providing answers to questions
CN107612814A (en) * 2017-09-08 2018-01-19 北京百度网讯科技有限公司 Method and apparatus for generating candidate's return information

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant