CN114817467A - Intention recognition response method, device, equipment and storage medium - Google Patents

Intention recognition response method, device, equipment and storage medium

Info

Publication number
CN114817467A
Authority
CN
China
Prior art keywords
knowledge
query vector
representation
intention
sequence
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202210415455.4A
Other languages
Chinese (zh)
Inventor
黄健
张友根
王敏
程永靖
闫凯
谢伟
袁山洞
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
National University of Defense Technology
Original Assignee
National University of Defense Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by National University of Defense Technology filed Critical National University of Defense Technology
Priority to CN202210415455.4A priority Critical patent/CN114817467A/en
Publication of CN114817467A publication Critical patent/CN114817467A/en
Pending legal-status Critical Current

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33Querying
    • G06F16/3331Query processing
    • G06F16/334Query execution
    • G06F16/3344Query execution using natural language analysis
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33Querying
    • G06F16/332Query formulation
    • G06F16/3329Natural language query formulation or dialogue systems
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00Handling natural language data
    • G06F40/20Natural language analysis
    • G06F40/205Parsing
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00Handling natural language data
    • G06F40/20Natural language analysis
    • G06F40/279Recognition of textual entities
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00Handling natural language data
    • G06F40/30Semantic analysis

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Artificial Intelligence (AREA)
  • Computational Linguistics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • General Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Health & Medical Sciences (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • Mathematical Physics (AREA)
  • Human Computer Interaction (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The application discloses an intention recognition response method, which comprises the following steps: parsing each acquired dialog pair into a dialog sequence consisting of a single question sequence and a single response sequence, and performing multiple rounds of encoding to generate context hidden states and query vectors; parsing each piece of knowledge in a constructed external knowledge base into a knowledge triple; performing multi-hop querying in the external knowledge base, updating the query vector, and generating a global pointer; calculating the similarity between the query vector, the screened knowledge triples and the model deterministic knowledge, capturing target knowledge, and generating an intention perception representation; and performing fine-grained decoding and coarse-grained decoding according to the intention perception representation to output the response sequence with the maximum probability. The invention also discloses an intention recognition response apparatus, corresponding equipment and a storage medium. By capturing target knowledge through the intention reasoning network, an intention perception representation of concept tokens is obtained, so that the user's intention can be effectively recognized and an accurate response generated.

Description

Intention recognition response method, device, equipment and storage medium
Technical Field
The present application relates to the field of end-to-end intent recognition response technologies, and in particular, to an intent recognition response method, apparatus, device, and storage medium.
Background
Since the beginning of artificial intelligence research, efforts have been made to develop highly intelligent human-machine dialogue systems. The first generation of dialogue systems were mainly rule-based: their internal logic is transparent and easy to analyze and debug, but they depend heavily on manual expert intervention and offer poor flexibility and extensibility. With the rise of big data technology, a data-driven second generation of dialogue systems appeared, based on statistical methods and incorporating reinforcement learning; these are modular systems that avoid the heavy dependence on experts, but their models are difficult to maintain and their extensibility is relatively limited. In recent years, with major breakthroughs of deep learning in the fields of images, speech and text, a third generation of dialogue systems has emerged that takes deep learning as its main method; such systems still follow the framework of the statistical dialogue system, but each module adopts a neural network model, and the dialogue state is no longer obtained by Bayesian posterior inference but by directly computing the maximum conditional probability. Deep reinforcement learning models have also begun to be used to optimize dialogue strategies.
Task-based conversations are typically intended to satisfy users with specific goals, such as checking data usage or call charges, ordering, booking tickets, consulting, and so on. Because user requirements are relatively complex, they usually need to be resolved over many rounds of interaction, and users continually revise and refine their requirements during the conversation, so a task-based robot needs to help users clarify their purpose through inquiry, clarification and confirmation. As users' expectations of product experience gradually rise, actual conversation scenarios become more complex. How to effectively cope with changes in user behavior on the basis of the prior art, and how to improve the system's ability to recognize user intentions and reason over deterministic knowledge, are problems that urgently need to be solved in task-based dialogue systems.
Task-based dialog can be understood as a sequential decision-making process: the machine needs to update and maintain an internal dialog state by understanding the user's statements, and then select the next best action (e.g., confirm the requirement, ask about a constraint, provide the result) according to the current dialog state, thereby completing the task. Structurally, task-based dialog systems can be divided into two types. One is the pipeline system, which adopts a modular structure comprising a natural language understanding module, a dialog management module and a natural language generation module; the modular architecture is highly interpretable and easy to put into practice, and most practical task-based dialog systems in industry adopt this structure. Its drawbacks are that it is not flexible enough: the modules are relatively independent, difficult to optimize jointly, and hard to adapt to changing application scenarios. Moreover, since errors between modules accumulate layer by layer, upgrading a single module may require adjusting the entire system. The other realization of the task-based dialog system is the end-to-end system, which takes plain text as input and directly outputs the system response, and is a more popular direction in recent academic work; this structure aims to train an overall mapping from natural language input on the user side to natural language output on the machine side, is flexible and highly extensible, reduces the labor cost of the design process, and breaks the isolation between traditional modules. In recent years, with the development of end-to-end neural generation models, end-to-end trainable frameworks have been constructed for task-oriented dialog systems.
Unlike the traditional pipeline model, the end-to-end model uses a single module and interacts with a structured external database. In the field of dialog systems, there are several reasons for using end-to-end systems:
firstly, what a dialog needs to do is very similar to a sequence-input, sequence-output (Seq2Seq) problem;
secondly, the pipeline of a dialog system connects four modules in series, so errors in an earlier module are amplified by the later modules with no filtering or denoising in between; an end-to-end system provides local or global noise reduction over the whole process, which may help improve accuracy;
and thirdly, the design of an end-to-end system is less complex, which is beneficial for training and use.
However, the task-based dialog system requires strong "informativeness": it must not only logically strive to complete a specific service but also interact with structured information, and it requires a large amount of training data, so the quality of the responses generated by existing models is still limited.
Therefore, in order to meet the requirement of expandability of the task-based dialog system and improve the quality of response generated by the system in practical application, a task-based dialog model which can perform fine-grained reasoning on deterministic knowledge, can capture concept transition in a cross-task scene and can identify the true intention of a user needs to be provided. Many memory-enhanced end-to-end models that have been proposed to date, using dialog history and domain-specific Knowledge Bases (KB) to consolidate KB information and perform knowledge-based reasoning, can achieve better performance, but still have two major limitations. On the one hand, the model relies heavily on a soft attention mechanism to generate responses by taking as an output representation an embedded weighted sum of memory triples (from the dialog history and external KB). Since the representation obtained in this way is dispersed by context, it is difficult to model certain conceptual tokens of deterministic knowledge. On the other hand, the soft attention mechanism is not suitable for performing fine-grained (label-level) multi-hop reasoning itself, which makes it difficult to capture the true intent of a user to generate an accurate response, especially in complex cross-task scenarios where concept transfer may occur. Existing attention-based models are generally unable to perform such label-level multi-hop reasoning, which prevents them from obtaining accurate responses.
Disclosure of Invention
In view of at least one of the drawbacks or needs for improvement of the related art, the present invention provides an intention recognition response method, apparatus, device, and storage medium for capturing concept transformations involved in task-oriented dialogs, with the purpose of efficiently recognizing user intentions and generating accurate responses.
To achieve the above object, according to one aspect of the present invention, there is provided an intention recognition response method including the steps of:
acquiring each round of conversation, and analyzing each round of conversation into an initial semantic representation of a conversation sequence comprising a single question sequence and a single response sequence; performing multiple rounds of encoding on the initial semantic representation to generate a context hidden state and a query vector of each pair of dialog sequences; wherein the query vector of the previous pair of conversational sequences acts on the code of the initial semantic representation of the next pair of conversational sequences;
constructing an external knowledge base based on multi-field conversation history, merging user key information according to the conversation history, and analyzing each piece of knowledge in the external knowledge base into a knowledge triple;
taking the context hidden state of the last pair of conversation sequences as an initial query vector to perform multi-hop query in the external knowledge base and update the query vector, calculating the similarity between the query vector and each piece of knowledge in the external knowledge base, and classifying the similarity to generate a global pointer;
carrying out knowledge screening in an external knowledge base according to the global pointer, calculating similarity among a query vector, a screened knowledge triple and model deterministic knowledge, capturing target knowledge and generating intention perception representation;
and performing fine grain decoding and coarse grain decoding according to the intention perception representation to respectively obtain respective context perception response sequences and corresponding probabilities thereof, and taking the response sequence with the maximum probability as an output response sequence.
Further, the intention-recognition response method further includes:
and representing each knowledge in the external knowledge base as a knowledge triple comprising a head entity, a relation and a tail entity, and embedding each knowledge triple into a memory network to obtain a corresponding knowledge embedding matrix.
Further, in the above intention identifying and responding method, the multi-hop query and the update of the query vector in the external knowledge base with the context hidden state of the last dialog sequence as the initial query vector specifically include:
performing cyclic multi-hop in the knowledge embedding matrix according to the initial query vector, and respectively calculating attention weight values corresponding to the knowledge embedding matrix of each hop;
and reading the memory network according to the weighted sum of the attention weight value corresponding to the knowledge embedding matrix of the current hop and the knowledge embedding matrix of the next hop, and obtaining the query vector of the next hop.
Further, in the above intention identifying response method, the calculating the similarity between the query vector and each piece of knowledge in the external knowledge base specifically includes:
and performing dot product on the query vector and the knowledge triples to obtain the similarity between the query vector and each piece of knowledge in an external knowledge base.
Further, in the intention recognition response method, the similarity between the query vector, the screened knowledge triple and the model deterministic knowledge is calculated, the target knowledge is captured, and an intention perception representation is generated, specifically:
and calculating the similarity between the query vector, the head entity of the knowledge triple and the model deterministic knowledge, and integrating the tail entity of the related knowledge triple according to the similarity to obtain the target knowledge and generate the intention perception representation.
Further, in the above intention identifying response method, the coarse-grained decoding specifically includes:
and calculating the conversation history attention representation in the hidden state, and connecting the hidden state with the conversation history attention representation to obtain an output representation of coarse-grained context sensing.
Further, in the above intention identifying response method, the fine-grained decoding specifically includes:
and calculating conversation history attention representation in a hidden state, connecting the hidden state with the conversation history attention representation and fusing the intention perception representation to obtain output representation of fine-grained context perception.
According to a second aspect of the present invention, there is also provided an intention-recognition responding apparatus comprising:
the context dialogue history editor is used for acquiring each round of dialogue and analyzing each round of dialogue into an initial semantic representation of a dialogue sequence comprising a single question sequence and a single response sequence; performing multiple rounds of encoding on the initial semantic representation to generate a context hidden state and a query vector of each pair of dialog sequences; wherein the query vector of the previous pair of conversational sequences acts on the code of the initial semantic representation of the next pair of conversational sequences;
the external knowledge base is constructed based on multi-field conversation history, key information of a user is blended according to the conversation history, and each piece of knowledge information in the external knowledge base is analyzed into a knowledge triple;
taking the context hidden state of the last pair of conversation sequences as an initial query vector to perform multi-hop query in the external knowledge base and update the query vector, calculating the similarity between the query vector and each piece of knowledge in the external knowledge base, and classifying the similarity to generate global pointers;
the intention reasoning module is used for screening knowledge in an external knowledge base according to the global pointer, calculating similarity among a query vector, a screened knowledge triple and model deterministic knowledge, capturing target knowledge and generating intention perception representation;
and the layered reply decoder is used for performing fine-grained decoding and coarse-grained decoding according to the intention perception representation to respectively obtain the response sequence of the respective context perception and the corresponding probability thereof, and taking the response sequence with the maximum probability as an output response sequence.
According to a third aspect of the present invention, there is also provided an intention-recognition responding device comprising at least one processing unit, and at least one memory unit, wherein the memory unit stores a computer program which, when executed by the processing unit, causes the processing unit to carry out the steps of any of the methods described above.
According to a fourth aspect of the present invention, there is also provided a storage medium storing a computer program executable by an access authentication apparatus, the computer program causing the access authentication apparatus to perform the steps of any one of the methods described above when run on the access authentication apparatus.
In general, compared with the prior art, the above technical solution contemplated by the present invention can achieve the following beneficial effects:
(1) according to the intention recognition response method, the device, the equipment and the storage medium, the similarity between the query vector, the knowledge triple and the model deterministic knowledge is calculated, the deterministic knowledge is subjected to modeling specific conceptual marking, the intention perception representation is generated, the intention recognition response method can be used for capturing concept conversion related in a task-oriented dialogue, context information is not dispersed, and therefore the intention of a user is effectively recognized and a more accurate response is generated;
(2) according to the intention recognition response method, the intention recognition response device, the intention recognition response equipment and the intention recognition response storage medium, each pair of dialogs is analyzed into a dialog sequence comprising a single question sequence and a single response sequence, the problem of modeling by using a long dialog text is solved, the context hidden state and the query vector of each pair of dialog sequence are obtained through multiple rounds of coding, bidirectional semantic dependence can be captured better, and more accurate response is generated.
Drawings
In order to more clearly illustrate the technical solutions in the embodiments of the present application, the drawings needed to be used in the embodiments will be briefly described below, and it is obvious that the drawings in the following description are only some embodiments of the present application, and it is obvious for those skilled in the art to obtain other drawings without creative efforts.
FIG. 1 is a flow chart of an intention identification response method according to the present embodiment;
fig. 2 is a structural diagram of an intention recognition responding apparatus according to the present embodiment.
Detailed Description
The technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application.
The terms "first," "second," "third," and the like in the description and claims of this application and in the above-described drawings are used for distinguishing between different objects and not for describing a particular order. Furthermore, the terms "include" and "have," as well as any variations thereof, are intended to cover a non-exclusive inclusion. For example, a process, method, system, article, or apparatus that comprises a list of steps or elements is not limited to only those steps or elements listed, but may alternatively include other steps or elements not listed, or inherent to such process, method, article, or apparatus.
In other instances, well-known or widely used techniques, elements, structures and processes may not have been described or shown in detail to avoid obscuring the understanding of the present invention by the skilled artisan. Although the drawings represent exemplary embodiments of the present invention, the drawings are not necessarily to scale and certain features may be exaggerated or omitted in order to better illustrate and explain the present invention.
Aiming at the defects of dispersed context, high difficulty in modeling specific conceptual labels of deterministic knowledge, incapability of executing label-level multi-hop reasoning, concept transfer in a complex cross-task scene and the like of the existing task-based dialogue, the application provides an end-to-end task-oriented dialogue scheme based on an intention reasoning network, which can effectively utilize the dialogue history and external knowledge to accurately model the jump relation of an entity among multiple domains, and further accurately identify the intention of a user and generate an entity knowledge reply with intention perception. The method overcomes the defect that effective high-quality response can not be generated due to frequent occurrence of concept and intention conversion in daily conversation scenes in the conventional conversation system.
The meaning of the modules and parameters referred to in this application is first explained below:
(1) Y is the response sequence, Y = {y_1, y_2, ..., y_n};
(2) each round of dialog sequence is (Q_p, Y_p), where Q_p and Y_p respectively denote the p-th question sequence (with m tokens) and response sequence (with n tokens);
(3) the external memory module is M = [X; B] = (m_1, m_2, ..., m_l), where X is the dialog history, B is the knowledge base, and each entry in M is represented by a triple, i.e., m_i = (h, r, t);
(4) H^QY_p denotes the representation of a dialog sequence, in which h^Q_p denotes the representation of the question sequence and h^Y_p the representation of the response sequence;
(5) the context hidden states are h_enc = (h_enc,1, h_enc,2, ..., h_enc,i);
(6) the knowledge embedding matrix is C = (C^1, ..., C^(k+1));
(7) q^1 is the initial query vector and q^k is the query vector at the k-th hop.
in one aspect, the present embodiment provides an intention identification response method, and fig. 1 is a flowchart of the intention identification response method provided in the present embodiment, please refer to fig. 1, where the method includes:
(1) acquiring each round of conversation, and analyzing each round of conversation into an initial semantic representation of a conversation sequence comprising a single question sequence and a single response sequence; performing multiple rounds of encoding on the initial semantic representation to generate a context hidden state and a query vector of each pair of dialog sequences; wherein the query vector of the previous pair of conversational sequences acts on the code of the initial semantic representation of the next pair of conversational sequences;
Unlike conventional dialog-system processing, the representation of the input is not obtained over all dialog sequences at once, but per dialog sequence composed of a single question sequence Q_p and a single response sequence Y_p, written as (Q_p, Y_p), where Q_p and Y_p respectively denote the p-th question sequence (with m tokens) and response sequence (with n tokens).
The response sequence Y can be represented by formula (1),
Y = {y_1, y_2, ..., y_n}    (1)
In this embodiment, the semantic role labeling (SRL) tool concatenates each round of the dialog sequence as "[CLS] Q_p [SEP] Y_p [SEP]", where the [CLS] token is placed at the beginning of the sentence and the [SEP] tokens are used to separate the input sequences. "[CLS] Q_p [SEP] Y_p [SEP]" is input into the BERT model to obtain the initial semantic representation of each round of dialog sequence, see formula (2),
H^QY_p = BERT([CLS] Q_p [SEP] Y_p [SEP]) = (h^Q_p,1, ..., h^Q_p,m, h^Y_p,1, ..., h^Y_p,n)    (2)
where H^QY_p is the representation of the dialog sequence, h^Q_p,* is the representation of the question sequence, h^Y_p,* is the representation of the response sequence, and m and n are natural numbers.
In a specific example, the initial semantic representation of the dialog sequence is input into a bidirectional long short-term memory network (BiLSTM) for multiple rounds of encoding to generate the full context hidden states h_enc,i of each dialog sequence, which can be represented by formula (3),
h_enc = (h_enc,1, h_enc,2, ..., h_enc,i)    (3)
Each round of dialog sequence generates a query vector that acts on the initial semantic representation of the next round of dialog sequence; finally, a query vector q^1 based on all dialog sequences is obtained (formula (4)).
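As a concrete illustration of this encoding step, the following is a minimal sketch of how one dialog turn could be mapped to token-level context hidden states and a turn-level query vector along the lines of formulas (2)-(4). It assumes PyTorch and the HuggingFace transformers package; the class name, hidden size and the use of the last hidden state as the query vector are illustrative choices, not taken from the patent.

```python
# Minimal sketch of the contextual dialog-history encoder (step (1)).
# Assumptions: PyTorch + HuggingFace transformers; names and sizes are illustrative.
import torch
import torch.nn as nn
from transformers import BertModel, BertTokenizer

class TurnEncoder(nn.Module):
    def __init__(self, bert_name="bert-base-uncased", hidden=256):
        super().__init__()
        self.tok = BertTokenizer.from_pretrained(bert_name)
        self.bert = BertModel.from_pretrained(bert_name)
        # BiLSTM over the BERT token representations of "[CLS] Q_p [SEP] Y_p [SEP]".
        self.bilstm = nn.LSTM(self.bert.config.hidden_size, hidden,
                              batch_first=True, bidirectional=True)

    def forward(self, question: str, response: str):
        enc = self.tok(question, response, return_tensors="pt")  # adds [CLS]/[SEP]
        with torch.no_grad():
            h_init = self.bert(**enc).last_hidden_state   # initial semantic representation (formula (2))
        h_enc, _ = self.bilstm(h_init)                     # context hidden states h_enc (formula (3))
        q = h_enc[:, -1, :]                                # last state as the turn's query vector (cf. formula (4))
        return h_enc, q
```

In a full implementation, the query vector of one turn would additionally condition the encoding of the next turn, as described above.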
(2) constructing an external knowledge base based on multi-field conversation history, merging user key information according to the conversation history, and analyzing each piece of knowledge in the external knowledge base into a knowledge triple;
In one specific example, to perform fine-grained representation learning, each entity in the external knowledge base is parsed by the BERT network model into a knowledge triple comprising a head entity, a relation and a tail entity, which can be represented by formula (5),
m_i = BERT(h, r, t)    (5)
where h represents the head entity, r represents the relation between the head entity and the tail entity, and t represents the tail entity.
The memory module of the external knowledge base is M, which is composed of knowledge triples, i.e., (h, r, t) ∈ M, and can be expressed as formula (6),
M = [X; B] = (m_1, m_2, ..., m_l)    (6)
where X represents the dialog history and B represents the knowledge base.
In order to better encode each piece of knowledge in the external knowledge base and make it more suitable for multi-hop reasoning and vector calculation, the knowledge triples are embedded into a word vector space carrying strong entity-relation and semantic translation information to obtain the corresponding memory embedding (e_h, e_r, e_t), see formula (7), which is then stored in the external matrix to obtain the corresponding knowledge embedding matrix C, see formula (8),
m_i = (h, r, t) → (e_h, e_r, e_t)    (7)
C = (C^1, ..., C^(k+1))    (8)
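Under the same illustrative assumptions (PyTorch; a plain nn.Embedding lookup standing in for the BERT/translation-style triple embedding of formulas (5)-(8)), the memory module M and the per-hop knowledge embedding matrices C^1, ..., C^(k+1) could be materialized as in the sketch below. The class and parameter names are hypothetical.

```python
# Sketch of the external memory built from knowledge triples (formulas (5)-(8)).
# The embedding scheme is a simplification of the patent's richer triple embedding.
import torch
import torch.nn as nn

class TripleMemory(nn.Module):
    def __init__(self, vocab_size, dim, hops=3):
        super().__init__()
        # One embedding matrix C^k per hop plus one extra C^(k+1), as in formula (8).
        self.C = nn.ModuleList([nn.Embedding(vocab_size, dim) for _ in range(hops + 1)])

    def embed(self, triples, hop, word2id):
        # Each memory entry m_i = (h, r, t) is embedded as the sum of its parts (e_h, e_r, e_t).
        ids = torch.tensor([[word2id[h], word2id[r], word2id[t]] for h, r, t in triples])
        return self.C[hop](ids).sum(dim=1)   # shape (l, dim): one row per knowledge entry
```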
(3) Taking the context hidden state of the last pair of conversation sequences as an initial query vector to perform multi-hop query in the external knowledge base and update the query vector, calculating the similarity between the query vector and each piece of knowledge in the external knowledge base, and classifying the similarity to generate a global pointer;
Specifically, cyclic multi-hop querying is carried out in the knowledge embedding matrix according to the initial query vector, and the attention weight values corresponding to the knowledge embedding matrix of each hop are respectively calculated;
the memory network is then read according to the weighted sum of the attention weight values corresponding to the knowledge embedding matrix of the current hop and the knowledge embedding matrix of the next hop, and the query vector of the next hop is obtained.
The context hidden state h_enc,m+n of the last pair of dialog sequences is taken as the initial query vector q^1.
In this embodiment, given a natural number k, cyclic k-hop querying is performed in the knowledge embedding matrix based on the query vector q^k, and the attention weight p^k_i corresponding to the current knowledge embedding matrix C^k_i is calculated, which can be represented by formula (9),
p^k_i = Softmax((q^k)^T C^k_i)    (9)
The memory network is read as the weighted sum of the attention weights p^k_i of the current hop and the knowledge embedding matrix C^(k+1)_i of the next hop, giving the read-out o^k, which can be represented by formula (10),
o^k = Σ_i p^k_i C^(k+1)_i    (10)
The query vector of the next hop is obtained from the current query vector q^k and the memory read-out o^k, which can be represented by formula (11),
q^(k+1) = q^k + o^k    (11)
The query vector q^k is then dot-multiplied with the knowledge triple vectors C^k_i in the constructed external knowledge base to obtain the similarity between the query vector and each piece of knowledge, from which possible results are retrieved in the constructed external knowledge base.
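Reusing the illustrative TripleMemory sketch above, the k-hop query loop of formulas (9)-(11) and the final dot-product similarity could look as follows; this is a simplified sketch, not the patent's exact implementation.

```python
# Sketch of the multi-hop memory query (formulas (9)-(11)) and the
# dot-product similarity with each knowledge entry.
import torch

def multi_hop_query(memory, triples, word2id, q, hops=3):
    for k in range(hops):
        C_k  = memory.embed(triples, k, word2id)       # C^k
        C_k1 = memory.embed(triples, k + 1, word2id)   # C^(k+1)
        p_k  = torch.softmax(C_k @ q, dim=0)           # attention weights (formula (9))
        o_k  = (p_k.unsqueeze(1) * C_k1).sum(dim=0)    # memory read-out (formula (10))
        q    = q + o_k                                 # next-hop query vector (formula (11))
    sim = C_k1 @ q                                     # similarity of the final query to each triple
    return q, sim
```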
According to the query vector
q^K and the knowledge embedding matrix C^K, the similarity between them is used to generate a binary external-knowledge global pointer G, namely formulas (12) and (13); the external-knowledge global pointer filters out worthless external knowledge, and the several pieces of knowledge with the highest value are selected for further decoding,
G = (g_1, ..., g_l)    (12)
g_i = Sigmoid((q^K)^T C^K_i)    (13)
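A minimal sketch of this filtering step follows, assuming the similarity scores sim returned by the query loop above and an illustrative top-n cutoff (the patent does not fix how many entries are kept); as the next paragraph notes, a Sigmoid turns each score into a per-entry keep/drop decision.

```python
# Sketch of the external-knowledge global pointer G (formulas (12)-(13)).
import torch

def global_pointer(sim, top_n=4):
    g = torch.sigmoid(sim)                                  # one keep/drop score g_i per knowledge entry
    keep = torch.topk(g, k=min(top_n, g.numel())).indices   # indices of the most valuable knowledge
    return g, keep
```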
In a specific embodiment, a Sigmoid function is used to convert the multi-tag classification problem into a binary problem of a single tag so as to obtain an external knowledge global pointer.
(4) Carrying out knowledge screening in the external knowledge base according to the global pointer, calculating the similarity among the query vector, the screened knowledge triples and the model deterministic knowledge, capturing the target knowledge and generating the intention perception representation;
Specifically, in order to model specific concept tokens of deterministic knowledge, the similarity between the query vector, the head-entity tokens of the screened knowledge triples and the model deterministic knowledge is calculated, and the tail-entity tokens of the related knowledge triples are integrated, so that the context information is not scattered. Token-level joint reasoning and multi-hop reasoning are responsible for capturing the specific target information in breadth and in depth, respectively, to generate the intention perception representation.
(5) Performing fine-grained decoding and coarse-grained decoding according to the intention perception representation to respectively obtain the context-aware response sequences and their corresponding probabilities, and taking the response sequence with the maximum probability as the output response sequence.
For coarse-grained decoding, the dialog-history attention representation h'_dec,t of the context hidden state h_dec,t is computed, and the context hidden state h_dec,t is concatenated with the dialog-history attention representation h'_dec,t to obtain the coarse-grained context-aware output representation o_t, which can be represented by formula (14),
o_t = W[h_dec,t, h'_dec,t]    (14)
For fine-grained decoding, the dialog-history attention representation h'_dec,t of the context hidden state h_dec,t is computed; the hidden state h_dec,t is concatenated with h'_dec,t and fused with the intention perception representation I_dec,t to obtain the fine-grained context-aware output representation o_c,t, which can be represented by formula (15),
o_c,t = W[h_dec,t, h'_dec,t, I_dec,t]    (15)
The intention inference network IR-Net provided in this embodiment is essentially a memory-augmented Seq2Seq model whose aim is to generate the most likely response sequence. The probabilities of the response sequences obtained by fine-grained decoding and coarse-grained decoding are calculated respectively, and the probability of the response sequence Y is defined as in formula (16);
the response sequence corresponding to the larger of the two probabilities is taken as the final output of the decoder, i.e., the finally generated response sequence.
It should be noted that although in the above-described embodiments, the operations of the methods of the embodiments of the present specification are described in a particular order, this does not require or imply that these operations must be performed in this particular order, or that all of the illustrated operations must be performed, to achieve desirable results. Rather, the steps depicted in the flowcharts may change the order of execution. Additionally or alternatively, certain steps may be omitted, multiple steps combined into one step execution, and/or one step broken down into multiple step executions.
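Returning to the hierarchical decoder of step (5), the sketch below illustrates formulas (14)-(15) and the probability comparison of formula (16) under illustrative assumptions: W is realised as two separate linear layers for the two granularities, followed by a shared vocabulary projection; the sequence-level selection is left to the caller.

```python
# Sketch of the hierarchical reply decoder outputs (formulas (14)-(15)).
import torch
import torch.nn as nn

class HierarchicalHead(nn.Module):
    def __init__(self, dim, vocab_size):
        super().__init__()
        self.W_coarse = nn.Linear(2 * dim, dim)   # W[h_dec,t, h'_dec,t]           (formula (14))
        self.W_fine   = nn.Linear(3 * dim, dim)   # W[h_dec,t, h'_dec,t, I_dec,t]  (formula (15))
        self.out      = nn.Linear(dim, vocab_size)

    def forward(self, h_dec, h_hist, intent):
        o_coarse = self.W_coarse(torch.cat([h_dec, h_hist], dim=-1))
        o_fine   = self.W_fine(torch.cat([h_dec, h_hist, intent], dim=-1))
        # Token log-probabilities for the two decoding paths; the decoder keeps the
        # whole response sequence whose accumulated probability is larger (formula (16)).
        return self.out(o_coarse).log_softmax(-1), self.out(o_fine).log_softmax(-1)
```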
On the other hand, the embodiment further provides an intention identification responding apparatus, fig. 2 is a structural diagram of the intention identification responding apparatus provided by the embodiment, please refer to fig. 2, the apparatus includes:
a contextual dialog history editor for obtaining each turn of dialog and parsing each turn of dialog into an initial semantic representation of a dialog sequence comprising a single question sequence and a single response sequence; performing multiple rounds of encoding on the initial semantic representation to generate a context hidden state and a query vector of each pair of dialog sequences; wherein the query vector of the previous pair of conversational sequences acts on the code of the initial semantic representation of the next pair of conversational sequences;
in a specific embodiment, the context dialog history encoder encodes history dialogues for multiple times one by one through a bidirectional long-short term memory network BiSTM to capture bidirectional semantic dependence; the initial semantic representation of the input is obtained by taking a dialogue sequence consisting of a single question sequence and a response sequence as a unit, and the defect of modeling a long dialogue text is overcome.
The external knowledge base is constructed based on multi-field conversation history, key information of a user is blended according to the conversation history, and each piece of knowledge information in the external knowledge base is analyzed into a knowledge triple;
taking the context hidden state of the last pair of conversation sequences as an initial query vector to perform multi-hop query in the external knowledge base and update the query vector, calculating the similarity between the query vector and each piece of knowledge in the external knowledge base, and classifying the similarity to generate a global pointer;
the external knowledge base module is a key part of a task-oriented dialog system as a basis for knowledge query. The external knowledge base adopts a Memory Network (MN) to store global cross-domain knowledge shared by the encoder and the encoder, fully applies the encoder and the decoder, is constructed based on multi-domain conversation history, and is dependent on the conversation history to blend key information of a user into the external knowledge base so as to obtain more accurate response.
The intention reasoning module is used for screening knowledge in an external knowledge base according to the global pointer, calculating similarity among a query vector, the screened knowledge triple and model deterministic knowledge, capturing target knowledge and generating intention perception representation;
the integrity and accuracy of the generated response are improved by mining potential inference chains at a fine granularity. Meanwhile, the module can well deal with concept transfer which may occur in a complex cross-task scene, and can clearly excavate entity jump relations among multiple fields to generate replies with intention perception.
And the layered reply decoder is used for performing fine-grained decoding and coarse-grained decoding according to the intention perception representation, respectively obtaining the response sequence of the respective context perception and the corresponding probability thereof, and taking the response sequence with the maximum probability as an output response sequence.
This module decodes the response sequence with a hierarchical mechanism: when decoding a dialog sequence, a coarse-grained LSTM decoder and a fine-grained LSTM decoder compute probabilities simultaneously, and the sequence corresponding to the larger probability obtained by the two decoders is taken as the final output of the decoder, i.e., the finally generated response sequence.
In one particular embodiment, there are the following dialogs:
User:Please check the temperature for me today.
IR-Net:?
the specific treatment process comprises the following steps:
(1) the current turn of dialog is input into the context dialog history encoder module to generate the query vector and the initial semantic representation;
(2) inputting the initial semantic representation into an external knowledge base for query and selection to generate a global pointer G;
(3) the generated global pointer G is combined with the external knowledge base to obtain the triple representations of the related knowledge (Monday, low_temp, 20f) and (Monday, high_temp, 30f), and joint reasoning and multi-hop reasoning are combined to obtain the corresponding intention perception representation;
(4) the above intention perception representation is input into the hierarchical reply decoder, and the final response sequence (today, temperature, 20f-30f) obtained by coarse-grained and fine-grained decoding is selected and output, namely:
User:Please check the temperature for me today.
IR-Net: Today's temperature is 20f-30f.
the embodiment also provides an intention identification response device, which includes at least one processor and at least one memory, where the memory stores a computer program, and when the computer program is executed by the processor, the processor executes the steps of the intention identification response method in the first embodiment, and specific steps refer to the first embodiment and are not described herein again; in this embodiment, the types of the processor and the memory are not particularly limited, for example: the processor may be a microprocessor, digital information processor, on-chip programmable logic system, or the like; the memory may be volatile memory, non-volatile memory, a combination thereof, or the like.
The intent recognition response device may also communicate with one or more external devices (e.g., keyboard, pointing terminal, display, etc.), with one or more terminals that enable a user to interact with the intent recognition response device, and/or with any terminals (e.g., network card, modem, etc.) that enable the intent recognition response device to communicate with one or more other computing terminals. Such communication may be through an input/output (I/O) interface. Also, the intent recognition response device may also communicate with one or more networks (e.g., a Local Area Network (LAN), Wide Area Network (WAN), and/or a public Network, such as the internet) via a Network adapter.
The present application also provides a computer-readable storage medium having stored thereon a computer program which, when executed by a processor, performs the steps of the above-described method. The computer-readable storage medium may include, but is not limited to, any type of disk including floppy disks, optical disks, DVD, CD-ROMs, microdrive, and magneto-optical disks, ROMs, RAMs, EPROMs, EEPROMs, DRAMs, VRAMs, flash memory devices, magnetic or optical cards, nanosystems (including molecular memory ICs), or any type of media or device suitable for storing instructions and/or data.
It should be noted that, for simplicity of description, the above-mentioned method embodiments are described as a series of acts or combination of acts, but those skilled in the art will recognize that the present application is not limited by the order of acts described, as some steps may occur in other orders or concurrently depending on the application. Further, those skilled in the art should also appreciate that the embodiments described in the specification are preferred embodiments and that the acts and modules referred to are not necessarily required in this application.
In the foregoing embodiments, the descriptions of the respective embodiments have respective emphasis, and for parts that are not described in detail in a certain embodiment, reference may be made to related descriptions of other embodiments.
In the embodiments provided in the present application, it should be understood that the disclosed apparatus may be implemented in other manners. For example, the above-described embodiments of the apparatus are merely illustrative, and for example, the division of the units is only one type of division of logical functions, and there may be other divisions when actually implementing, for example, a plurality of units or components may be combined or may be integrated into another system, or some features may be omitted, or not implemented. In addition, the shown or discussed mutual coupling or direct coupling or communication connection may be an indirect coupling or communication connection of some service interfaces, devices or units, and may be an electrical or other form.
The units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the units can be selected according to actual needs to achieve the purpose of the solution of the embodiment.
In addition, functional units in the embodiments of the present application may be integrated into one processing unit, or each unit may exist alone physically, or two or more units are integrated into one unit. The integrated unit can be realized in a form of hardware, and can also be realized in a form of a software functional unit.
The integrated unit, if implemented in the form of a software functional unit and sold or used as a stand-alone product, may be stored in a computer readable memory. Based on such understanding, the technical solution of the present application may be substantially implemented or a part of or all or part of the technical solution contributing to the prior art may be embodied in the form of a software product stored in a memory, and including several instructions for causing a computer device (which may be a personal computer, a server, or a network device) to execute all or part of the steps of the method described in the embodiments of the present application. And the aforementioned memory comprises: various media capable of storing program codes, such as a usb disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a removable hard disk, a magnetic disk, or an optical disk.
Those skilled in the art will appreciate that all or part of the steps in the methods of the above embodiments may be implemented by a program, which is stored in a computer-readable memory, and the memory may include: flash disks, Read-Only memories (ROMs), Random Access Memories (RAMs), magnetic or optical disks, and the like.
The above description is only an exemplary embodiment of the present disclosure, and the scope of the present disclosure should not be limited thereby. That is, all equivalent changes and modifications made in accordance with the teachings of the present disclosure are intended to be included within the scope of the present disclosure. Embodiments of the present disclosure will be readily apparent to those skilled in the art from consideration of the specification and practice of the disclosure herein. This application is intended to cover any variations, uses, or adaptations of the disclosure following, in general, the principles of the disclosure and including such departures from the present disclosure as come within known or customary practice within the art to which the disclosure pertains. It is intended that the specification and examples be considered as exemplary only, with a true scope and spirit of the disclosure being indicated by the following claims.
The technical features of the above embodiments can be arbitrarily combined, and for the sake of brevity, all possible combinations of the technical features in the above embodiments are not described, but should be considered as the scope of the present specification as long as there is no contradiction between the combinations of the technical features.
It will be understood by those skilled in the art that the foregoing is only a preferred embodiment of the present invention, and is not intended to limit the invention, and that any modification, equivalent replacement, or improvement made within the spirit and principle of the present invention should be included in the scope of the present invention.

Claims (10)

1. An intent recognition response method, comprising the steps of:
acquiring each round of conversation, and analyzing each round of conversation into an initial semantic representation of a conversation sequence comprising a single question sequence and a single response sequence; performing multiple rounds of encoding on the initial semantic representation to generate a context hidden state and a query vector of each pair of dialog sequences; wherein the query vector of the previous pair of conversational sequences acts on the code of the initial semantic representation of the next pair of conversational sequences;
constructing an external knowledge base based on multi-field conversation history, merging user key information according to the conversation history, and analyzing each piece of knowledge in the external knowledge base into a knowledge triple;
taking the context hidden state of the last pair of conversation sequences as an initial query vector to perform multi-hop query in the external knowledge base and update the query vector, calculating the similarity between the query vector and each piece of knowledge in the external knowledge base, and classifying the similarity to generate a global pointer;
carrying out knowledge screening in an external knowledge base according to the global pointer, calculating similarity among a query vector, a screened knowledge triple and model deterministic knowledge, capturing target knowledge and generating intention perception representation;
and performing fine grain decoding and coarse grain decoding according to the intention perception representation to respectively obtain respective context perception response sequences and corresponding probabilities thereof, and taking the response sequence with the maximum probability as an output response sequence.
2. The intention-recognition response method of claim 1, further comprising:
and representing each knowledge in the external knowledge base as a knowledge triple comprising a head entity, a relation and a tail entity, and embedding each knowledge triple into a memory network to obtain a corresponding knowledge embedding matrix.
3. The method for recognizing and responding as claimed in claim 2, wherein the multi-hop query and updating the query vector in the external knowledge base using the context hidden state of the last pair of dialog sequences as the initial query vector are specifically:
performing circulating multi-hop in a knowledge embedding matrix according to the initial query vector, and respectively calculating attention weight values corresponding to the knowledge embedding matrix of each hop;
and reading the memory network according to the weighted sum of the attention weight value corresponding to the knowledge embedding matrix of the current hop and the knowledge embedding matrix of the next hop, and obtaining the query vector of the next hop.
4. The intent recognition response method according to claim 1, wherein the calculating the similarity between the query vector and each piece of knowledge in the external knowledge base is specifically:
and performing dot product on the query vector and the knowledge triples to obtain the similarity between the query vector and each piece of knowledge in an external knowledge base.
5. The intent recognition response method according to claim 1, wherein the similarity between the query vector, the screened knowledge triples and the model deterministic knowledge is calculated, the target knowledge is captured, and the intent perception representation is generated, specifically:
and calculating the similarity between the query vector, the head entity of the knowledge triple and the model deterministic knowledge, and integrating the tail entity of the related knowledge triple according to the similarity to obtain the target knowledge and generate the intention perception representation.
6. The intent recognition response method of claim 1, wherein the coarse-grained decoding is specifically:
and calculating the conversation history attention representation in the hidden state, and connecting the hidden state with the conversation history attention representation to obtain an output representation of coarse-grained context sensing.
7. The intent recognition response method of claim 1, wherein the fine-grained decoding is specifically:
and calculating conversation history attention representation in a hidden state, connecting the hidden state with the conversation history attention representation and fusing the intention perception representation to obtain output representation of fine-grained context perception.
8. An intent recognition response device, comprising:
the context dialogue history editor is used for acquiring each round of dialogue and analyzing each round of dialogue into an initial semantic representation of a dialogue sequence comprising a single question sequence and a single response sequence; performing multiple rounds of encoding on the initial semantic representation to generate a context hidden state and a query vector of each pair of dialog sequences; wherein the query vector of the previous pair of conversational sequences acts on the code of the initial semantic representation of the next pair of conversational sequences;
the external knowledge base is constructed based on multi-field conversation history, key information of a user is merged according to the conversation history, and each piece of knowledge information in the external knowledge base is analyzed into a knowledge triple;
taking the context hidden state of the last pair of conversation sequences as an initial query vector to perform multi-hop query in the external knowledge base and update the query vector, calculating the similarity between the query vector and each piece of knowledge in the external knowledge base, and classifying the similarity to generate a global pointer;
the intention reasoning module is used for screening knowledge in an external knowledge base according to the global pointer, calculating similarity among a query vector, a screened knowledge triple and model deterministic knowledge, capturing target knowledge and generating intention perception representation;
and the layered reply decoder is used for performing fine-grained decoding and coarse-grained decoding according to the intention perception representation to respectively obtain the response sequence of the respective context perception and the corresponding probability thereof, and taking the response sequence with the maximum probability as an output response sequence.
9. An intention-recognition-response device, comprising at least one processing unit and at least one memory unit, wherein the memory unit stores a computer program which, when executed by the processing unit, causes the processing unit to carry out the steps of the method according to any one of claims 1 to 7.
10. A storage medium storing a computer program executable by an access authentication device, the computer program causing the access authentication device to perform the steps of the method of any one of claims 1 to 7 when run on the access authentication device.
CN202210415455.4A 2022-04-20 2022-04-20 Intention recognition response method, device, equipment and storage medium Pending CN114817467A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210415455.4A CN114817467A (en) 2022-04-20 2022-04-20 Intention recognition response method, device, equipment and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210415455.4A CN114817467A (en) 2022-04-20 2022-04-20 Intention recognition response method, device, equipment and storage medium

Publications (1)

Publication Number Publication Date
CN114817467A true CN114817467A (en) 2022-07-29

Family

ID=82505343

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210415455.4A Pending CN114817467A (en) 2022-04-20 2022-04-20 Intention recognition response method, device, equipment and storage medium

Country Status (1)

Country Link
CN (1) CN114817467A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115438170A (en) * 2022-11-09 2022-12-06 北京红棉小冰科技有限公司 Dialog model generation method, dialog model application method, dialog model generation system, dialog model application system, dialog model generation equipment and dialog model application equipment
CN115545853A (en) * 2022-12-02 2022-12-30 云筑信息科技(成都)有限公司 Searching method for searching suppliers


Similar Documents

Publication Publication Date Title
US11657230B2 (en) Referring image segmentation
CN110334339B (en) Sequence labeling model and labeling method based on position perception self-attention mechanism
US11381651B2 (en) Interpretable user modeling from unstructured user data
CN112270379A (en) Training method of classification model, sample classification method, device and equipment
CN114817467A (en) Intention recognition response method, device, equipment and storage medium
CN112307168B (en) Artificial intelligence-based inquiry session processing method and device and computer equipment
CN113268609A (en) Dialog content recommendation method, device, equipment and medium based on knowledge graph
US20180365594A1 (en) Systems and methods for generative learning
CN111783873B (en) User portrait method and device based on increment naive Bayes model
CN112200664A (en) Repayment prediction method based on ERNIE model and DCNN model
CN111145914B (en) Method and device for determining text entity of lung cancer clinical disease seed bank
CN111563161B (en) Statement identification method, statement identification device and intelligent equipment
CN115034201A (en) Augmenting textual data for sentence classification using weakly supervised multi-reward reinforcement learning
CN115186147B (en) Dialogue content generation method and device, storage medium and terminal
CN116484024A (en) Multi-level knowledge base construction method based on knowledge graph
CN111966811A (en) Intention recognition and slot filling method and device, readable storage medium and terminal equipment
Shin et al. End-to-end task dependent recurrent entity network for goal-oriented dialog learning
CN112183062B (en) Spoken language understanding method based on alternate decoding, electronic equipment and storage medium
CN112560440A (en) Deep learning-based syntax dependence method for aspect-level emotion analysis
CN116702765A (en) Event extraction method and device and electronic equipment
CN116362242A (en) Small sample slot value extraction method, device, equipment and storage medium
CN115495566A (en) Dialog generation method and system for enhancing text features
CN113849634B (en) Method for improving interpretability of depth model recommendation scheme
CN117371447A (en) Named entity recognition model training method, device and storage medium
CN112328774A (en) Method for realizing task type man-machine conversation task based on multiple documents

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination