CN116010583A - Cascade-coupled knowledge-enhanced dialogue generation method

Cascade-coupled knowledge-enhanced dialogue generation method

Info

Publication number: CN116010583A (application CN202310260375.0A)
Authority: CN (China)
Prior art keywords: entity, knowledge, dialogue, model, attribute
Prior art date: 2023-03-17
Legal status: Granted
Application number: CN202310260375.0A
Other languages: Chinese (zh)
Other versions: CN116010583B (en)
Inventors: 周世奉, 程典, 孙晓
Current and Original Assignee: Institute of Artificial Intelligence of Hefei Comprehensive National Science Center
Priority date: 2023-03-17; Filing date: 2023-03-17; Publication date: 2023-04-25
Application filed by Institute of Artificial Intelligence of Hefei Comprehensive National Science Center filed Critical Institute of Artificial Intelligence of Hefei Comprehensive National Science Center
Priority to CN202310260375.0A
Publication of CN116010583A
Application granted
Publication of CN116010583B
Status: Active

Classifications

    • Y: General tagging of new technological developments; general tagging of cross-sectional technologies spanning over several sections of the IPC; technical subjects covered by former USPC cross-reference art collections [XRACs] and digests
    • Y02: Technologies or applications for mitigation or adaptation against climate change
    • Y02D: Climate change mitigation technologies in information and communication technologies [ICT], i.e. information and communication technologies aiming at the reduction of their own energy use
    • Y02D 10/00: Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Machine Translation (AREA)

Abstract

The invention relates to the field of natural language processing and discloses a cascade-coupled knowledge-enhanced dialogue generation method. The entity recognition module, entity selection module, and entity attribute selection module designed by the invention are formally independent, with no overlapping dependencies, and can each independently serve entity recognition, entity selection, and similar tasks; working together, the three modules produce real background knowledge for the dialogue task, increasing the information content of the generated dialogue.

Description

Cascade-coupled knowledge-enhanced dialogue generation method
Technical Field
The invention relates to the field of natural language processing, and in particular to a cascade-coupled knowledge-enhanced dialogue generation method.
Background
The basic task of a dialogue system is to simulate everyday human communication and reply to the user with appropriate sentences. Because such systems are automated, they can replace expensive manual service, freeing up labor and reducing operating costs. Early dialogue systems generated replies according to grammar rules designed by experts; these systems lacked flexibility and could not pass the Turing test. With the rapid development of the internet industry, platforms such as Weibo and WeChat provide virtual channels for communication, and a large proportion of conversations have shifted to PC and mobile devices, making massive conversational data available. Around 2015, deep learning began to excel in computer vision, natural language processing, and recommendation and search, clearly surpassing traditional machine learning methods on many tasks. The combination of massive dialogue data and rapidly advancing deep learning gave rise to the end-to-end seq2seq (Sequence to Sequence) model, which is entirely data-driven and requires no hand-crafted rules or features, and which has become the mainstream technology for automatic dialogue generation.
A neural-network dialogue generation system is entirely data-driven, requires no grammar rules for a specific language, and is trained on a large-scale text corpus by optimizing the maximum likelihood estimate. Experiments have found that generic replies occur very frequently in the output of this approach, for example texts such as "haha" and "me too". Although such generic replies may count as acceptable in experimental evaluation, in practical scenarios they tend to lack semantic relevance to the preceding dialogue and the capacity for empathetic expression, severely harming user experience and engagement until the user eventually loses patience and interest. While sequence-to-sequence dialogue generation systems can produce grammatically correct answers, those answers tend to be generic and devoid of substantial information rather than specific and empathetic. How to increase the information content of dialogue replies while maintaining the relevance of the generated dialogue is a key difficulty in current natural language generation research.
In current seq2seq-based dialogue models, the training process actually learns the distribution probability of words under specific conditions; the model has no autonomous cognitive ability and relies on co-occurrence relations between words in the training corpus to learn the word distributions it generates. The absence of background information and related knowledge seriously degrades the quality of the dialogue the model generates, so that it differs greatly from human dialogue, and the model often produces low-information, repetitive, and tedious replies. A knowledge graph has rich information and a clear structure, containing real-world entities and their various attributes; it provides knowledge data support for recommendation systems, natural language understanding, computer vision, and other fields, and can improve a model's performance and interpretability in the corresponding knowledge domain. A knowledge graph can therefore provide powerful data support for a dialogue model and improve the model's cognitive ability.
Given the dialogue context and a vast knowledge graph, finding the knowledge triples that fit the content of the conversation and generating corresponding dialogue replies on that basis is a complex and difficult task.
Disclosure of Invention
To solve the above technical problem, the invention provides a cascade-coupled knowledge-enhanced dialogue generation method comprising an entity recognition module, an entity selection module, an entity attribute selection module, and a knowledge dialogue generation module. The four modules are mutually independent and operate together as a knowledge-driven dialogue pipeline, finally outputting a knowledge-driven dialogue reply that takes the knowledge graph as background information.
In order to solve the technical problems, the invention adopts the following technical scheme:
A cascade-coupled knowledge-enhanced dialogue generation method. Given the dialogue context $U=\{u_1,u_2,\dots,u_N\}$, where $u_i$ denotes the $i$-th sentence in the dialogue context $U$ and $N$ is the total number of sentences in $U$, and a knowledge graph $K$ composed of multiple knowledge triples $k=(e,a,v)$, where $e$ denotes the entity name, $a$ the attribute name, and $v$ the attribute value, the dialogue generation model searches the knowledge graph, according to the dialogue context of the current dialogue, for the knowledge triple $k^{*}$ related to the current dialogue intention and takes it as background information for dialogue generation, generating a knowledge-driven dialogue reply;
the dialogue generation model includes: an entity recognition module, an entity selection module, an entity attribute selection module, and a knowledge dialogue generation module;
the entity recognition module is used for recognizing all possible entities in the sentences of the dialogue as candidate entities; the entity recognition module comprises a knowledge-entity-recognition NER main model, which identifies the entities in the dialogue through a BERT pre-training model and a conditional random field and takes them as candidate entities;
the entity selection module selects from the candidate entities the best entity reflecting the topic intention of the current dialogue, specifically: the current dialogue text, a candidate entity, and all attribute names of that candidate entity are concatenated and fed into a BERT pre-training model; the matching score between the current dialogue text and each candidate entity is output from the CLS vector through one fully connected layer, and the candidate entity with the highest matching score is taken as the best entity reflecting the topic intention of the current dialogue;
the entity attribute selection module concatenates the current dialogue text, from which the entity name of the best entity has been removed, with each entity attribute of the best entity, feeds the result into the BERT pre-training model, outputs the matching score between the current dialogue text and each entity attribute from the CLS vector through one fully connected layer, and takes the entity attribute with the highest matching score as the best entity attribute; an entity attribute comprises an attribute name and an attribute value of the entity;
the knowledge dialogue generation module concatenates the dialogue context with the knowledge triple $k^{*}$ formed by the best entity and the best entity attribute, and inputs the result into a BART pre-training model to generate the dialogue reply.
Specifically, the training process of the knowledge-entity-recognition NER main model includes the following steps:
Step 1: input training data into the knowledge-entity-recognition NER main model; the training data comprise a text $T$ and the correct entity $e$;
Step 2: BIOS-encode the text according to the position of the correct entity $e$ in the text $T$, and take the encoded output as the training label $y$;
Step 3: convert the encoded sequence of the text $T$ through the BERT pre-training model to obtain the corresponding hidden vector $v$: $v=\mathrm{BERT}(T)$;
Step 4: output the final distribution probability $P=\mathrm{CRF}(v)$ through a conditional random field;
Step 5: compute the loss function $L=\mathrm{CE}(P,y)$ and iteratively optimize the weights of the BERT pre-training model and the conditional random field by gradient descent, where $\mathrm{CE}$ is the cross-entropy loss function.
Specifically, the entity recognition module further comprises a jieba word-segmentation screening recognition model and a rule screening recognition model. The jieba model sets, in advance, the part of speech of all entity nouns of the knowledge graph to "entity" in the jieba dictionary, and words in the dialogue whose part of speech is "entity" are screened out by jieba segmentation as candidate entities. The rule model takes nouns inside Chinese title marks and quotation marks in the dialogue context as candidate entities.
Specifically, before training the BERT pre-training model of the entity selection module, the required negative samples are determined by a hybrid negative sampling method, specifically:
(1) all candidate entities in the dialogue output by the entity recognition module, after removal of the correct entity, are added to the negative sample set;
(2) L entities are randomly sampled from the entity set in the same domain as the positive sample and added to the negative sample set.
Specifically, the BERT pre-training model adopted in the entity selection module and the entity attribute selection module is the ERNIE pre-training model, with loss function $L=\max(0,\ margin - s^{+} + s^{-})$, where $s^{+}$ denotes the positive-sample score, $s^{-}$ denotes the negative-sample score, and $margin$ is the model's preset gap between positive and negative samples.
Specifically, before the knowledge dialogue generation module generates a dialogue reply, the dialogue context and the knowledge triple $k^{*}$ are concatenated and input into the BART pre-training model to generate the dialogue reply only when it is judged that the topic intention of the dialogue requires adding knowledge.
Compared with the prior art, the invention has the following beneficial technical effects:
Through the multipath entity recognition module, the entity selection module optimized by hybrid negative sampling, and the entity attribute selection module of the dialogue generation model, the invention can output the intended entity and the specific entity attribute of the dialogue and provide background knowledge data support for the subsequent knowledge dialogue generation module. The entity recognition, entity selection, and entity attribute selection modules are formally independent, with no overlapping dependencies, and can each independently serve entity recognition, entity selection, and similar tasks; working together, the three modules produce real background knowledge for the dialogue task, increasing the information content of the generated dialogue. Moreover, the pipeline knowledge-extraction framework formed by the three modules can serve as a plug-and-play component: given only a knowledge graph dataset for the relevant domain, it can provide reference external knowledge for any dialogue text and help the model generate dialogue with richer information.
Drawings
FIG. 1 is a flow chart of a dialog generation model of the present invention;
FIG. 2 is a schematic diagram of an entity selection module according to the present invention;
FIG. 3 is a schematic diagram of an entity attribute selection module according to the present invention;
FIG. 4 is a schematic diagram of an intent discrimination model of the present invention;
FIG. 5 is a schematic diagram of the knowledge dialogue generation module of the present invention.
Detailed Description
A preferred embodiment of the present invention will be described in detail with reference to the accompanying drawings.
Problem definition: given dialog context information for a current user dialog
Figure SMS_26
,/>
Figure SMS_27
Representing dialog context +.>
Figure SMS_31
N is the dialogue context +.>
Figure SMS_25
The total number of sentences in (a); knowledge pattern->
Figure SMS_28
,/>
Figure SMS_34
For knowledge graph->
Figure SMS_36
The total amount of the middle knowledge triples, wherein the knowledge graph is +.>
Figure SMS_23
From multiple knowledge triples->
Figure SMS_29
Composition, triplet->
Figure SMS_32
The name of the entity is indicated and,
Figure SMS_35
representing attribute names,/>
Figure SMS_24
Representing attribute values; it is necessary to find knowledge triples about the user's intention from a huge knowledge graph according to the dialog context of the current user dialog>
Figure SMS_30
Will->
Figure SMS_33
As background information for dialog generation, a knowledge-driven dialog reply Y is generated.
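For illustration only (this is not part of the original patent text), the inputs and outputs above can be sketched as plain data structures; all names in the sketch are assumptions:

```python
# Illustrative sketch of the problem definition; all names are assumptions.
from dataclasses import dataclass
from typing import List

@dataclass(frozen=True)
class KnowledgeTriple:
    entity: str      # e: entity name
    attr_name: str   # a: attribute name
    attr_value: str  # v: attribute value

# Dialogue context U = {u_1, ..., u_N}: the N sentences said so far.
DialogContext = List[str]

# The task: given U and a knowledge graph K (a large set of KnowledgeTriple),
# select the triple k* matching the user's intention and generate a reply Y
# conditioned on both U and k*.
```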
In order to enhance the information content and semantic relevance in a dialogue system, the invention designs a dialogue generation model whose flow chart is shown in FIG. 1. The dialogue generation model consists of the following four parts: (1) an entity recognition module; (2) an entity selection module; (3) an entity attribute selection module; (4) a knowledge dialogue generation module.
The entity identification module is used for identifying all possible topic entities in the conversation as candidate entities;
the entity selection module is used for selecting the most suitable entity (the best entity) from candidate entities;
the entity attribute selection module is used for selecting the most suitable entity attribute (the best entity attribute) from the attribute relation set of the best entity;
the knowledge dialogue generating module firstly judges whether the current dialogue needs knowledge or not, if not, the knowledge dialogue generating module generates a boring dialogue reply, and if so, the knowledge-driven dialogue reply is generated.
The specific implementation and mechanism of action of each module will be described in detail in turn.
1. Entity recognition module
The purpose of entity recognition is to find all possible candidate entities in the sentences of the dialogue context. The invention provides a multipath entity recognition module that can rapidly screen entities while ensuring the recall rate of correct entities.
The multipath entity recognition module includes a knowledge-entity-recognition NER main model and two auxiliary rule recognition models, which jointly output all possible candidate entities in the dialogue context. The NER main model identifies the knowledge entities in the dialogue through a BERT pre-training model and a conditional random field.
The training process of the knowledge-entity-recognition NER main model comprises the following steps:
Step 1: input the training data, comprising a text $T$ and the correct entity $e$;
Step 2: BIOS-encode the text according to the position of the correct entity $e$ in the text $T$, and take the encoded output as the training label $y$;
Step 3: convert the encoded sequence of the text $T$ through the BERT pre-training model into the corresponding hidden vector $v$: $v=\mathrm{BERT}(T)$;
Step 4: output the final distribution probability $P=\mathrm{CRF}(v)$ through a conditional random field;
Step 5: compute the loss function $L=\mathrm{CE}(P,y)$ and iteratively optimize the weights of the BERT pre-training model and the conditional random field by gradient descent, where $\mathrm{CE}$ is the cross-entropy loss function. A sketch of this model follows.
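The following is a minimal sketch under stated assumptions: it uses the HuggingFace transformers library and the pytorch-crf package, "bert-base-chinese" is an assumed stand-in checkpoint, and the CRF negative log-likelihood stands in for the cross-entropy loss described in Step 5:

```python
# Minimal sketch of the NER main model (BERT encoder + CRF head).
# Assumes: transformers, pytorch-crf; "bert-base-chinese" is an assumed checkpoint.
import torch.nn as nn
from torchcrf import CRF
from transformers import BertModel

NUM_TAGS = 4  # B, I, S, O of the BIOS scheme

class BertCrfNer(nn.Module):
    def __init__(self):
        super().__init__()
        self.bert = BertModel.from_pretrained("bert-base-chinese")
        self.emit = nn.Linear(self.bert.config.hidden_size, NUM_TAGS)
        self.crf = CRF(NUM_TAGS, batch_first=True)

    def forward(self, input_ids, attention_mask, tags=None):
        v = self.bert(input_ids, attention_mask=attention_mask).last_hidden_state
        emissions = self.emit(v)                 # hidden vectors -> per-tag scores
        mask = attention_mask.bool()
        if tags is not None:
            # Training (Step 5): CRF negative log-likelihood plays the role of
            # the loss over the CRF output distribution.
            return -self.crf(emissions, tags, mask=mask)
        return self.crf.decode(emissions, mask=mask)  # inference: best tag path
```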
BIOS encoding is a standard encoding scheme for entity recognition. Its rule is to assign a coding symbol to each character in the text: the initial character of an entity fragment is labeled B (Begin), the other positions of the entity are labeled I (Inside), a single-character entity is labeled S (Single), and all fragments other than entities are labeled O (Other). For example, the text "Do you know the composer Zhang San?" with entity "Zhang San" can be encoded at the character level of the original Chinese as "OOOOOBIOO".
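A small illustrative sketch of the BIOS rule above, assuming character-level labeling of a single entity span:

```python
# Character-level BIOS encoding of one entity span in a text (sketch).
def bios_encode(text: str, entity: str) -> list:
    labels = ["O"] * len(text)
    start = text.find(entity)            # position of the correct entity e in T
    if start != -1:
        if len(entity) == 1:
            labels[start] = "S"          # single-character entity
        else:
            labels[start] = "B"          # first character of the entity
            for i in range(start + 1, start + len(entity)):
                labels[i] = "I"          # remaining characters of the entity
    return labels

# e.g. bios_encode("你知道作曲家张三吗", "张三")
# -> ['O','O','O','O','O','O','B','I','O']
```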
Meanwhile, the invention notes that although most entities in the knowledge graph are longer than two characters, single-character knowledge entities do occur in real dialogue, for example the single-character main entity "甲" ("nail") in a question such as "Can you recite the first poem of 甲?". Because most entities in the training set are longer than two characters, the trained model is insensitive to single-character entities and recognizes them unsatisfactorily. Likewise, for entities with nested relations the model may output only part of the entity: for "Have you read the book 'Zhang San Wen Ji' (Zhang San's collected works)?", the model outputs the wrong entity "Zhang San" and ignores the correct entity "Zhang San Wen Ji". The invention requires the model to accurately screen out knowledge entities of various lengths so as to guarantee the recall rate of correct entities, so two auxiliary rule recognition models are additionally designed to backstop the knowledge-entity-recognition NER main model and prevent it from missing correct entities.
The auxiliary rule recognition model comprises a jieba word segmentation screening recognition model and a rule screening recognition model:
(1) The jieba word-segmentation screening recognition model screens entities via jieba segmentation. jieba is currently the most popular Chinese word segmentation tool; the part of speech of all entity nouns of the knowledge graph is set to "entity" in the jieba dictionary in advance, and words whose part of speech is "entity" are screened out by jieba segmentation as candidate entities.
(2) The rule screening recognition model screens entities by rules: because entities such as films, novels, and music are generally enclosed in special symbols such as Chinese title marks (《》) and quotation marks in Chinese dialogue, nouns inside title marks and quotation marks are screened out as candidate entities. A combined sketch of the two auxiliary recognizers follows.
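A combined sketch of the two auxiliary recognizers, assuming the jieba library; the custom "entity" POS tag and the regular expressions are assumptions consistent with the description:

```python
# Sketch of the two auxiliary rule recognizers (jieba POS screening + symbol rules).
import re
import jieba
import jieba.posseg as pseg

def register_entities(entity_names):
    # Pre-set all knowledge-graph entity nouns with the custom POS tag "entity".
    for name in entity_names:
        jieba.add_word(name, tag="entity")

def jieba_candidates(text: str) -> list:
    # Keep words whose part of speech is the custom tag "entity".
    return [word for word, pos in pseg.cut(text) if pos == "entity"]

def rule_candidates(text: str) -> list:
    # Titles of films/novels/music are usually wrapped in Chinese title marks
    # or quotation marks in Chinese dialogue.
    return re.findall(r"《([^》]+)》", text) + re.findall(r"[“\"]([^”\"]+)[”\"]", text)
```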
2. Entity selection module optimized by hybrid negative sampling
The entity recognition module screens out all possible candidate entities, and the entity selection module selects from them the entity that determines the topic intention of the current dialogue. The difficulty faced by the entity selection module is that humans converse freely: a dialogue often involves multiple entities, and phenomena such as jumping trains of thought, topic switching, and returning to earlier topics occur frequently. Moreover, the dataset contains only dialogue texts and their corresponding correct knowledge triples, which means the dataset for this task has only positive samples and no negative samples. Deep learning training usually requires large numbers of positive and negative samples for optimization iterations to give the model the ability to distinguish them, so the invention must sample a certain number of negative samples for training. Since the quality of the negative samples is an important factor determining the upper limit of the task model, how to design a negative sampling strategy for entities, and at what scale to sample, are of great concern to the invention.
The invention designs a hybrid negative sampling method for the entity selection module. The negative samples consist of two parts. The first part comes from the dialogue data of the training set: all candidate entities of a dialogue are output by the multipath entity recognition module described in the previous section, and the candidate entities other than the positive entity are taken as negative samples. This simulates the negative sample set the model must distinguish at inference time; to a certain extent, the training-set negative samples and the interfering samples encountered at inference are produced by the same entity recognition model and thus belong to the same distribution, and negative samples distributed like those seen at inference maximize the accuracy of entity selection during inference. The second part randomly samples L entities from the entity set in the same domain as the positive sample as negative samples; same-domain negatives are generally the samples most easily confused at inference, and adding such hard samples improves the stability and generalization of the model. It should be noted that the domain of each entity cannot be read off the knowledge graph directly, so entities sharing the same attribute as the correct entity are found in reverse from the correct attribute and used as the domain entity set. Generating negative samples with these two methods together facilitates model training; a sketch follows.
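A minimal sketch of the hybrid negative sampling; `recognize_candidates` and `entities_with_attribute` are hypothetical helpers standing in for the multipath recognizer and the attribute-based domain lookup described above:

```python
# Sketch of hybrid negative sampling for the entity selection module.
# `recognize_candidates` is the multipath recognizer described above;
# `entities_with_attribute` reverse-looks-up entities sharing an attribute
# with the gold entity (the "domain" set). Both are assumed helpers.
import random

def sample_negatives(dialog, gold_entity, gold_attr_name, L=5):
    # Part 1: same-distribution negatives -- all recognized candidates
    # except the correct (positive) entity.
    negatives = {e for e in recognize_candidates(dialog) if e != gold_entity}

    # Part 2: hard negatives -- L random entities from the same domain,
    # where "domain" means entities sharing the gold entity's attribute.
    domain = [e for e in entities_with_attribute(gold_attr_name) if e != gold_entity]
    negatives.update(random.sample(domain, min(L, len(domain))))
    return list(negatives)
```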
The entity selection module is shown in FIG. 2. After the positive and negative samples are determined, the sequence [<CLS> current dialogue text <SEP> entity and all attribute names of the entity] is fed into the BERT pre-training model; the matching score between the current dialogue text and the entity is output from the CLS vector through one fully connected layer, and the entity with the highest matching score is taken as the best entity. Here <CLS> is the classification symbol of the BERT pre-training model and <SEP> is the separator between the two texts. The BERT pre-training model maps the input text sequence to vectors E, and the final representation vectors T are obtained through multi-layer encoding. After training, the model can infer the matching score (Score) of a text.
In FIG. 2, E denotes the initial vectors of the dialogue text, the entity name, and the attribute names; T denotes the corresponding final representation vectors; E_[CLS] and E_[SEP] denote the initial vectors of the symbols <CLS> and <SEP>; C denotes the final classification vector; and T_[SEP] denotes the final representation vector of <SEP>.
The BERT model adopted in this task is the ERNIE model open-sourced by Baidu, which adds a Knowledge Masking pre-training objective to the BERT pre-training mechanism; it is sensitive to knowledge entities and well matched to the current entity selection task. The loss function uses a pair-wise MarginLoss: compared with point-wise loss functions, which focus on fitting the scores of individual samples, the pair-wise mode focuses on the ordering relation between positive and negative samples, which is more relevant and suitable for the entity-ranking task performed at inference. The loss function is $L=\max(0,\ margin - s^{+} + s^{-})$, where $s^{+}$ denotes the positive-sample score, $s^{-}$ denotes the negative-sample score, and $margin$ is the preset gap between positive and negative samples; the larger the margin value, the more strongly the model separates positive from negative samples. A minimal scoring-and-loss sketch follows.
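A minimal sketch of the scorer and the pair-wise margin loss; the ERNIE checkpoint name and the margin value are assumptions:

```python
# Sketch of the entity selection scorer: [CLS] dialog [SEP] entity + attr names,
# CLS vector -> one fully connected layer -> matching score.
import torch
import torch.nn as nn
from transformers import AutoModel, AutoTokenizer

CKPT = "nghuyong/ernie-3.0-base-zh"  # assumed ERNIE checkpoint
tok = AutoTokenizer.from_pretrained(CKPT)

class EntityScorer(nn.Module):
    def __init__(self):
        super().__init__()
        self.encoder = AutoModel.from_pretrained(CKPT)
        self.fc = nn.Linear(self.encoder.config.hidden_size, 1)

    def score(self, dialog: str, entity: str, attr_names) -> torch.Tensor:
        enc = tok(dialog, entity + "：" + "、".join(attr_names),
                  return_tensors="pt", truncation=True)
        cls = self.encoder(**enc).last_hidden_state[:, 0]  # CLS vector
        return self.fc(cls).squeeze(-1)                    # matching score

# Pair-wise MarginLoss: L = max(0, margin - s_pos + s_neg).
margin_loss = nn.MarginRankingLoss(margin=0.5)  # margin value is an assumption
# loss = margin_loss(s_pos, s_neg, torch.ones_like(s_pos))
```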
3. Entity attribute selection module
The core idea of the entity attribute selection module is to match the similarity between the dialogue and the knowledge triples and to obtain the best entity attribute from the ranking of similarity scores. The main task of this module is to match the current dialogue text with entity attributes, so it should focus on the attributes; however, the invention found that in a certain proportion of dialogue samples the entity name appears in the current dialogue text, which interferes with the module's performance. Therefore, the invention removes the entity name from the knowledge triple and masks the entity name in the current dialogue text, so that the module focuses on matching the dialogue against the entity attribute. The entity attribute selection module is shown in FIG. 3; its flow is to feed [the current dialogue text with the entity name removed <SEP> attribute name and attribute value] into the BERT pre-training model, output the matching score (Score) between the current dialogue text and the entity attribute from the CLS vector through one fully connected layer, and select the entity attribute with the highest matching score as the best entity attribute. The loss function is the same MarginLoss used by the entity selection module; a sketch of the masking-based input construction is given after the figure legend below.
The best entity and the best entity attribute compose the knowledge triple $k^{*}=(e^{*},a^{*},v^{*})$ related to the user's intention.
In FIG. 3, E_value denotes the initial vector of the attribute value and T_value denotes the final representation vector of the attribute value.
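A small sketch of the attribute-matching input construction with the entity name masked out; function and variable names are assumptions:

```python
# Sketch: build the attribute-matching input with the entity name removed,
# so the model focuses on dialogue-vs-attribute matching.
def build_attr_input(dialog: str, entity: str, attr_name: str, attr_value: str):
    masked_dialog = dialog.replace(entity, "")   # mask the entity name in the text
    # Triple side keeps only the attribute name and value (entity name removed).
    return masked_dialog, attr_name + "：" + attr_value

# The pair is then fed to the same BERT/ERNIE CLS + fully-connected scorer,
# and the attribute with the highest score becomes the best entity attribute.
```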
4. Knowledge dialogue generation module
The knowledge dialogue generation module takes the obtained knowledge triple $k^{*}$ as background information for the dialogue system and generates a knowledge-driven dialogue reply.
The knowledge dialogue generation module cannot blindly incorporate background knowledge, because dialogues such as "thank you" and "goodbye" carry no substantial intent information and require no background knowledge. The invention therefore first designs an intent discrimination model, shown in FIG. 4: a classification model capable of judging user intent is trained to decide whether to add knowledge. The intent discrimination model outputs a probability P; when P is smaller than a set threshold, knowledge is added in the knowledge dialogue generation module, and otherwise no knowledge is added. A minimal sketch of this gate follows.
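A minimal sketch of the intent gate, assuming a binary BERT classifier; the checkpoint, the class indexing, and the threshold are assumptions:

```python
# Hypothetical sketch of the intent gate: a binary classifier decides whether
# background knowledge should be attached before generation.
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

GATE_CKPT = "bert-base-chinese"  # assumed checkpoint, fine-tuned for the gate
gate_tok = AutoTokenizer.from_pretrained(GATE_CKPT)
gate_clf = AutoModelForSequenceClassification.from_pretrained(GATE_CKPT, num_labels=2)

def needs_knowledge(dialog_text: str, threshold: float = 0.5) -> bool:
    enc = gate_tok(dialog_text, return_tensors="pt", truncation=True)
    with torch.no_grad():
        probs = gate_clf(**enc).logits.softmax(-1)
    p = probs[0, 0].item()  # assumed: class 0 = "no knowledge needed" (chitchat)
    # The patent adds knowledge when the output probability P is below the threshold.
    return p < threshold
```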
The knowledge dialogue generation module is shown in FIG. 5. The invention adopts the BART pre-training model open-sourced by Fudan University, which uses a full-Transformer pre-training architecture particularly suitable for dialogue generation tasks. The dialogue context $U$ and the knowledge triple $k^{*}$ are concatenated to obtain $X=[\langle s\rangle,\ U,\ \langle sep\rangle,\ k^{*}]$, where $\langle s\rangle$ denotes the sentence-start symbol and $\langle sep\rangle$ the text separator symbol. After encoding by the encoder, the decoder generates words in turn; when the decoding process encounters the termination symbol $\langle /s\rangle$, the dialogue reply $Y=\{y_1,y_2,\dots,y_T\}$ is finally obtained, where $y_i$ denotes the $i$-th word in the dialogue reply and $T$ is the text length of the reply. To enhance the diversity of the generated results, the top-k and top-p decoding strategies are adopted, as sketched below.
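A minimal generation sketch; the fnlp/bart-base-chinese checkpoint is an assumption (a publicly available Chinese BART attributed to Fudan), as are the decoding hyperparameters:

```python
# Sketch of knowledge-grounded reply generation with a Chinese BART.
from transformers import AutoTokenizer, BartForConditionalGeneration

CKPT = "fnlp/bart-base-chinese"  # assumed checkpoint (Fudan's open-source BART)
tok = AutoTokenizer.from_pretrained(CKPT)
bart = BartForConditionalGeneration.from_pretrained(CKPT)

def generate_reply(context_sentences, triple):
    e, a, v = triple
    # X = <s> U <sep> k*  -- the tokenizer's own special tokens stand in for
    # the sentence-start and separator symbols described in the patent.
    x = tok.sep_token.join(context_sentences) + tok.sep_token + f"{e} {a} {v}"
    ids = tok(x, return_tensors="pt", truncation=True).input_ids
    out = bart.generate(ids, do_sample=True, top_k=50, top_p=0.9,
                        max_new_tokens=64)  # top-k and top-p sampling
    return tok.decode(out[0], skip_special_tokens=True)
```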
The performance of the invention was verified on the KdConv dialogue dataset. As can be seen from Table 1, the invention scores higher than the other algorithms on the BLEU and generation_F1 automatic metrics. BLEU-1, BLEU-2, BLEU-4, and generation_F1 are common automatic metrics measuring the overlap between the dialogue reply generated by the model and the reference reply; the higher the BLEU and generation_F1 values, the better the generated dialogue.
Table 1. Comparison of automatic evaluation metrics between the invention and the better-performing baseline algorithms
(Table 1 appears as an image in the original document.)
Daily conversations are diverse, flexible, and variable in form, with no unified standard, and overlap-based automatic metrics cannot necessarily assess the performance of a dialogue generation model accurately. The invention therefore also conducted a manual evaluation: several researchers with a background in natural language processing were invited to score the semantic relevance and expression coherence of the generated dialogues on a scale of [0, 1, 2], where 0 indicates failure, 1 an average effect, and 2 a very good effect. The manual evaluation results are shown in Table 2; the model of the invention outperforms the baseline models in both semantic relevance and coherence.
Table 2. Manual evaluation results
(Table 2 appears as an image in the original document.)
Table 3 shows some generated samples. A purely data-driven encoder-decoder algorithm can generate fluent and reasonable conversations, but the replies lack substantial information; even when some replies appear to contain knowledge, that information is randomly produced by the model, is inaccurate, and tends to mislead the user. Replies that lack information or contain errors degrade the user experience. The dialogue generation model of the invention extracts reasonable knowledge triples from the knowledge graph and generates corresponding knowledge dialogue replies, so the generated dialogue is smooth and coherent, and the conveyed semantic information accords with common sense and the logic of the dialogue context.
Table 3. Dialogue samples generated by different models
(Table 3 appears as an image in the original document.)
In summary, the cascade-coupled knowledge-enhanced dialogue generation method provided by the invention combines the intention of the dialogue, selects the corresponding knowledge triples as background knowledge, and generates knowledge-driven dialogue, so that the generated dialogue is semantically relevant, rich in content, and coherently expressed.
The method can be widely applied to human-machine dialogue scenarios, such as chatbots, knowledge-based question answering, and intelligent customer service platforms.
In the figures, Current query denotes the current dialogue text, attr_names denotes the attribute names, attr_values denotes the attribute values, knowledge denotes the knowledge triples, Encoder denotes the encoder, and Decoder denotes the decoder.
It will be evident to those skilled in the art that the invention is not limited to the details of the foregoing illustrative embodiments, and that the present invention may be embodied in other specific forms without departing from the spirit or essential characteristics thereof. The present embodiments are therefore to be considered in all respects as illustrative and not restrictive, the scope of the invention being indicated by the appended claims rather than by the foregoing description, and all changes which come within the meaning and range of equivalency of the claims are therefore intended to be embraced therein.
Furthermore, it should be understood that although the present specification is described in terms of embodiments, not every embodiment contains only a single independent technical solution; this manner of description is adopted merely for clarity, and those skilled in the art should take the specification as a whole, the technical solutions in the embodiments being combinable as appropriate to form other implementations understandable to those skilled in the art.

Claims (6)

1. A cascade-coupled knowledge-enhanced dialogue generation method, characterized in that: given the dialogue context $U=\{u_1,u_2,\dots,u_N\}$, where $u_i$ denotes the $i$-th sentence in the dialogue context $U$ and $N$ is the total number of sentences in $U$, and a knowledge graph $K$ composed of multiple knowledge triples $k=(e,a,v)$, where $e$ denotes the entity name, $a$ the attribute name, and $v$ the attribute value, a dialogue generation model searches the knowledge graph, according to the dialogue context of the current dialogue, for the knowledge triple $k^{*}$ related to the current dialogue intention and takes it as background information for dialogue generation, generating a knowledge-driven dialogue reply;
the dialogue generation model includes: an entity recognition module, an entity selection module, an entity attribute selection module, and a knowledge dialogue generation module;
the entity recognition module is used for recognizing all possible entities in the sentences of the dialogue as candidate entities; the entity recognition module comprises a knowledge-entity-recognition NER main model, which identifies the entities in the dialogue through a BERT pre-training model and a conditional random field and takes them as candidate entities;
the entity selection module selects from the candidate entities the best entity reflecting the topic intention of the current dialogue, specifically: the current dialogue text, a candidate entity, and all attribute names of that candidate entity are concatenated and fed into a BERT pre-training model; the matching score between the current dialogue text and each candidate entity is output from the CLS vector through one fully connected layer, and the candidate entity with the highest matching score is taken as the best entity reflecting the topic intention of the current dialogue;
the entity attribute selection module concatenates the current dialogue text, from which the entity name of the best entity has been removed, with each entity attribute of the best entity, feeds the result into the BERT pre-training model, outputs the matching score between the current dialogue text and each entity attribute from the CLS vector through one fully connected layer, and takes the entity attribute with the highest matching score as the best entity attribute; an entity attribute comprises an attribute name and an attribute value of the entity;
the knowledge dialogue generation module concatenates the dialogue context with the knowledge triple $k^{*}$ formed by the best entity and the best entity attribute, and inputs the result into a BART pre-training model to generate the dialogue reply.
2. The cascade-coupled knowledge-enhanced dialogue generation method of claim 1, characterized in that the training process of the knowledge-entity-recognition NER main model comprises the following steps:
Step 1: input training data into the knowledge-entity-recognition NER main model; the training data comprise a text $T$ and the correct entity $e$;
Step 2: BIOS-encode the text according to the position of the correct entity $e$ in the text $T$, and take the encoded output as the training label $y$;
Step 3: convert the encoded sequence of the text $T$ through the BERT pre-training model into the corresponding hidden vector $v$: $v=\mathrm{BERT}(T)$;
Step 4: output the final distribution probability $P=\mathrm{CRF}(v)$ through a conditional random field;
Step 5: compute the loss function $L=\mathrm{CE}(P,y)$ and iteratively optimize the weights of the BERT pre-training model and the conditional random field by gradient descent, where $\mathrm{CE}$ is the cross-entropy loss function.
3. The cascade-coupled knowledge-enhanced dialogue generation method of claim 1, characterized in that the entity recognition module further comprises a jieba word-segmentation screening recognition model and a rule screening recognition model; the jieba model sets, in advance, the part of speech of all entity nouns of the knowledge graph to "entity" in the jieba dictionary, and words in the dialogue whose part of speech is "entity" are screened out by jieba segmentation as candidate entities; the rule model takes nouns inside Chinese title marks and quotation marks in the dialogue context as candidate entities.
4. The cascade-coupled knowledge-enhanced dialogue generation method of claim 1, characterized in that, before training the BERT pre-training model of the entity selection module, the required negative samples are determined by a hybrid negative sampling method, specifically comprising:
(1) all candidate entities in the dialogue output by the entity recognition module, after removal of the correct entity, are added to the negative sample set;
(2) L entities are randomly sampled from the entity set in the same domain as the positive sample and added to the negative sample set.
5. The cascade-coupled knowledge-enhanced dialogue generation method of claim 1, characterized in that the BERT pre-training model in the entity selection module and the entity attribute selection module uses the ERNIE pre-training model, with loss function $L=\max(0,\ margin - s^{+} + s^{-})$, where $s^{+}$ denotes the positive-sample score, $s^{-}$ denotes the negative-sample score, and $margin$ is the model's preset gap between positive and negative samples.
6. The cascade-coupled knowledge-enhanced dialogue generation method of claim 1, characterized in that, before the knowledge dialogue generation module generates a dialogue reply, the dialogue context and the knowledge triple $k^{*}$ are concatenated and input into the BART pre-training model to generate the dialogue reply only when it is judged that the topic intention of the dialogue requires adding knowledge.
CN202310260375.0A 2023-03-17 2023-03-17 Cascade coupling knowledge enhancement dialogue generation method Active CN116010583B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310260375.0A CN116010583B (en) 2023-03-17 2023-03-17 Cascade coupling knowledge enhancement dialogue generation method

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202310260375.0A CN116010583B (en) 2023-03-17 2023-03-17 Cascade coupling knowledge enhancement dialogue generation method

Publications (2)

Publication Number Publication Date
CN116010583A 2023-04-25
CN116010583B CN116010583B (en) 2023-07-18

Family

ID=86021288

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310260375.0A Active CN116010583B (en) 2023-03-17 2023-03-17 Cascade coupling knowledge enhancement dialogue generation method

Country Status (1)

Country Link
CN (1) CN116010583B (en)


Patent Citations (18)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110083831A (en) * 2019-04-16 2019-08-02 武汉大学 A kind of Chinese name entity recognition method based on BERT-BiGRU-CRF
WO2021196520A1 (en) * 2020-03-30 2021-10-07 西安交通大学 Tax field-oriented knowledge map construction method and system
CN111950285A (en) * 2020-07-31 2020-11-17 合肥工业大学 Intelligent automatic construction system and method of medical knowledge map based on multi-modal data fusion
CN112100351A (en) * 2020-09-11 2020-12-18 陕西师范大学 Method and equipment for constructing intelligent question-answering system through question generation data set
CN112131404A (en) * 2020-09-19 2020-12-25 哈尔滨工程大学 Entity alignment method in four-risk one-gold domain knowledge graph
CN112149421A (en) * 2020-09-23 2020-12-29 云南师范大学 Software programming field entity identification method based on BERT embedding
CN112487202A (en) * 2020-11-27 2021-03-12 厦门理工学院 Chinese medical named entity recognition method and device fusing knowledge map and BERT
CN113377961A (en) * 2020-12-07 2021-09-10 北京理工大学 Intention-semantic slot joint recognition model based on knowledge graph and user theme
CN114969275A (en) * 2021-02-19 2022-08-30 深圳市奥拓电子股份有限公司 Conversation method and system based on bank knowledge graph
CN113420113A (en) * 2021-06-21 2021-09-21 平安科技(深圳)有限公司 Semantic recall model training and recall question and answer method, device, equipment and medium
CN114077673A (en) * 2021-06-21 2022-02-22 南京邮电大学 Knowledge graph construction method based on BTBC model
CN113590965A (en) * 2021-08-05 2021-11-02 合肥综合性国家科学中心人工智能研究院(安徽省人工智能实验室) Video recommendation method integrating knowledge graph and emotion analysis
US20230061906A1 (en) * 2021-08-09 2023-03-02 Samsung Electronics Co., Ltd. Dynamic question generation for information-gathering
CN113947085A (en) * 2021-10-22 2022-01-18 南京邮电大学 Named entity identification method for intelligent question-answering system
CN113961687A (en) * 2021-10-25 2022-01-21 山东新一代信息产业技术研究院有限公司 Multi-turn question-answering system intention classification and named entity identification research method
CN114943230A (en) * 2022-04-17 2022-08-26 西北工业大学 Chinese specific field entity linking method fusing common knowledge
CN115186102A (en) * 2022-07-08 2022-10-14 大连民族大学 Dynamic knowledge graph complementing method based on double-flow embedding and deep neural network
CN115309879A (en) * 2022-08-05 2022-11-08 中国石油大学(华东) Multi-task semantic parsing model based on BART

Non-Patent Citations (4)

* Cited by examiner, † Cited by third party
Title
丁锋, 孙晓: "Negative emotion opinion target extraction based on attention mechanism and BiLSTM-CRF", Computer Science (《计算机科学》), 2022, pp. 223-230 *
吴俊; 程垚; 郝瀚; 艾力亚尔·艾则孜; 刘菲雪; 苏亦坡: "Research on Chinese technical term extraction based on a BERT-embedded BiLSTM-CRF model", Journal of the China Society for Scientific and Technical Information (《情报学报》), 2020, pp. 409-418 *
时雨涛, 孙晓: "A question generation method based on a conversation understanding model", Computer Science (《计算机科学》), 2022, pp. 232-238 *
王堃; 林民; 李艳玲: "A survey of joint intent and semantic-slot recognition in end-to-end dialogue systems", Computer Engineering and Applications (《计算机工程与应用》), no. 14 *

Also Published As

Publication number Publication date
CN116010583B (en) 2023-07-18

Similar Documents

Publication Publication Date Title
Manjavacas et al. Adapting vs. pre-training language models for historical languages
CN110442880B (en) Translation method, device and storage medium for machine translation
US20230394247A1 (en) Human-machine collaborative conversation interaction system and method
CN114676255A (en) Text processing method, device, equipment, storage medium and computer program product
CN115392259B (en) Microblog text sentiment analysis method and system based on confrontation training fusion BERT
CN112328800A (en) System and method for automatically generating programming specification question answers
CN112115242A (en) Intelligent customer service question-answering system based on naive Bayes classification algorithm
CN117390409A (en) Method for detecting reliability of answer generated by large-scale language model
CN113971394A (en) Text repeat rewriting system
CN111666374A (en) Method for integrating additional knowledge information into deep language model
CN114492460A (en) Event causal relationship extraction method based on derivative prompt learning
CN117332789A (en) Semantic analysis method and system for dialogue scene
Yu et al. Assessing the potential of AI-assisted pragmatic annotation: The case of apologies
Zdebskyi et al. Investigation of Transitivity Relation in Natural Language Inference.
CN114970563B (en) Chinese question generation method and system fusing content and form diversity
CN116010583B (en) Cascade coupling knowledge enhancement dialogue generation method
Lee Natural Language Processing: A Textbook with Python Implementation
CN114443818A (en) Dialogue type knowledge base question-answer implementation method
CN114238595A (en) Metallurgical knowledge question-answering method and system based on knowledge graph
Akhter et al. A Study of Implementation of Deep Learning Techniques for Text Summarization
CN117909494B (en) Abstract consistency assessment model training method and device
CN113743126B (en) Intelligent interaction method and device based on user emotion
CN117453895B (en) Intelligent customer service response method, device, equipment and readable storage medium
CN111368526B (en) Sequence labeling method and system
CN113343668B (en) Method and device for solving selected questions, electronic equipment and readable storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant