CN116010583A - Cascade coupling knowledge enhancement dialogue generation method - Google Patents
- Publication number
- CN116010583A (application CN202310260375.0A)
- Authority
- CN
- China
- Prior art keywords
- entity
- knowledge
- dialogue
- model
- attribute
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y02—TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
- Y02D—CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
- Y02D10/00—Energy efficient computing, e.g. low power processors, power management or thermal management
Landscapes
- Machine Translation (AREA)
Abstract
The invention relates to the field of natural language processing and discloses a cascade-coupled knowledge-enhanced dialogue generation method. The entity identification module, entity selection module and entity attribute selection module designed by the invention are structurally independent and have no overlapping dependencies on one another, so each can serve tasks such as entity identification and entity selection on its own. Working together, the three modules supply factual background knowledge for dialogue tasks, which increases the information content of the generated dialogue.
Description
Technical Field
The invention relates to the field of natural language processing, in particular to a cascade coupling knowledge enhancement dialogue generation method.
Background
The basic task of a dialogue system is to simulate everyday human communication by replying to the user with appropriate sentences. Because it is automated, such a system can replace expensive manual service, free up labor and reduce operating costs. Early dialogue systems generated replies according to specific grammar rules designed by experts; they lacked flexibility and could not pass the Turing test. With the rapid growth of the internet industry, platforms such as Weibo and WeChat have given people virtual communication spaces, and a large share of conversations has moved to PCs and mobile devices, making massive amounts of conversational data available. From around 2015, deep learning began to excel in computer vision, natural language processing, recommendation and search, clearly surpassing traditional machine learning methods on many tasks. The combination of massive dialogue data and rapidly advancing deep learning gave rise to end-to-end seq2seq (Sequence to Sequence) models, which are entirely data-driven and need no hand-designed rules or features, and which have become the mainstream technology for automatic dialogue generation.
A neural dialogue generation system is entirely data-driven, requires no grammar rules for a specific language, and trains the model on a large-scale text corpus by maximizing the likelihood. Experiments show that generic replies such as "haha" and "me too" occur very frequently in the output of such methods. Although these generic replies are considered acceptable in experimental evaluation, in practice they tend to lack semantic relevance to the preceding dialogue and empathetic expression, which seriously harms user experience and engagement and eventually causes users to lose patience and interest. Sequence-to-sequence dialogue systems can generate grammatically correct replies, but these replies tend to be generic and lack substantive information rather than being specific and empathetic. How to increase the information content of dialogue replies while keeping them relevant is one of the difficulties of current natural language generation research.
In current seq2seq-based dialogue models, training in effect learns the probability distribution of words under specific conditions, but the model has no autonomous cognitive ability; it relies mainly on co-occurrence relationships between words in the training corpus to learn how to generate that distribution. Without awareness of background information and related knowledge, the quality of the generated dialogue degrades seriously: it differs greatly from human dialogue, and the model often produces replies that are low in information, repetitive and dull. A knowledge graph is information-rich and clearly structured; it contains a wide variety of real-world entities together with their attribute information, provides knowledge support for fields such as recommendation systems, natural language understanding and computer vision, and improves model performance and interpretability in the corresponding knowledge domain. A knowledge graph can therefore provide strong data support for a dialogue model and further improve its cognitive ability.
Given the dialogue context and a large knowledge graph, finding the knowledge triples that fit the dialogue content and generating corresponding dialogue replies on that basis is a complex and difficult task.
Disclosure of Invention
In order to solve the technical problems, the invention provides a cascade-coupled knowledge-enhanced dialogue generation method, which comprises an entity identification module, an entity selection module, an entity attribute selection module and a knowledge dialogue generation module. The four modules are mutually independent and operate together as a knowledge-driven dialogue pipeline, which finally outputs a knowledge-driven dialogue reply that uses the knowledge graph as background information.
In order to solve the technical problems, the invention adopts the following technical scheme:
a cascade-coupled knowledge-enhanced dialogue generation method: given a dialogue context $C = \{u_1, u_2, \dots, u_n\}$, where $u_i$ denotes the $i$-th sentence of the dialogue context $C$ and $n$ is the total number of sentences in $C$, and a knowledge graph $G$ composed of multiple knowledge triples $k = (e, a, v)$, where $e$ denotes the entity name, $a$ the attribute name and $v$ the attribute value, the dialogue generation model searches the knowledge graph, according to the dialogue context of the current dialogue, for the knowledge triple related to the current dialogue intention, uses it as background information for dialogue generation, and generates a knowledge-driven dialogue reply;
the dialogue generation model includes: an entity identification module, an entity selection module, an entity attribute selection module and a knowledge dialogue generation module;
the entity recognition module is used for recognizing all possible entities in sentences in the dialogue as candidate entities; the entity identification module comprises a knowledge entity identification NER main model, and the knowledge entity identification NER main model identifies the entity in the dialogue through the BERT pre-training model and the conditional random field as a candidate entity;
the entity selection module selects, from the candidate entities, the best entity reflecting the topic intention of the current dialogue, specifically: the current dialogue text is concatenated with a candidate entity and all attribute names of that candidate entity, the concatenation is fed into a BERT pre-training model, the CLS vector is passed through one fully connected layer to output the matching score between the current dialogue text and each candidate entity, and the candidate entity with the highest matching score is taken as the best entity reflecting the topic intention of the current dialogue;
the entity attribute selection module concatenates the current dialogue text, with the entity name of the best entity removed, with each entity attribute of the best entity, feeds the concatenation into the BERT pre-training model, passes the CLS vector through one fully connected layer to output the matching score between the current dialogue text and each entity attribute, and takes the entity attribute with the highest matching score as the best entity attribute; an entity attribute comprises the attribute name and the attribute value of the entity;
the knowledge dialogue generation module concatenates the dialogue context with the knowledge triple $k = (e, a, v)$ formed by the best entity $e$ and the best entity attribute (attribute name $a$, attribute value $v$), and feeds the result into a BART pre-training model to generate the dialogue reply.
Specifically, the training process for knowledge entity recognition NER main model includes the following steps:
step one, the training data, comprising a text T and its correct entity e, are input to the knowledge entity recognition NER main model;
step two, the text T is BIOS-encoded according to the position of the correct entity e in T, and the encoded output is used as the training label y;
step three, the encoded sequence of text T is converted by the BERT pre-training model into the corresponding hidden vectors v: $v = \mathrm{BERT}(T)$;
step four, the final distribution probability is output through the conditional random field: $\hat{y} = \mathrm{CRF}(v)$;
step five, the loss function $\mathcal{L} = \mathrm{CE}(\hat{y}, y)$ is calculated, where $\mathrm{CE}$ is the cross-entropy loss function, and the weights of the BERT pre-training model and the conditional random field are iteratively optimized by gradient descent.
Specifically, the entity recognition module further comprises a jieba word segmentation screening recognition model and a rule screening recognition model; the jieba word segmentation screening recognition model sets, in advance, the part of speech of every entity noun of the knowledge graph to 'entity' in the jieba dictionary, and words in the dialogue whose part of speech is 'entity' are screened out by jieba segmentation as candidate entities; the rule screening recognition model takes nouns enclosed in Chinese title marks (《》) and quotation marks in the dialogue context as candidate entities.
Specifically, before training the BERT pre-training model of the entity selection module, determining a required negative sample by a hybrid negative sampling method specifically includes:
(1) removing the correct entity from all candidate entities of the dialogue output by the entity recognition module, and adding the remaining candidate entities to the negative sample set;
(2) randomly sampling L entities from the set of entities in the same domain as the positive sample, and adding them to the negative sample set.
Specifically, the BERT pre-training model used in the entity selection module and in the entity attribute selection module is the ERNIE pre-training model, and its loss function is
$$\mathcal{L} = \max(0,\ \gamma - s^{+} + s^{-})$$
where $s^{+}$ denotes the positive sample score, $s^{-}$ the negative sample score, and $\gamma$ the margin between positive and negative samples preset for the model.
Specifically, before the knowledge dialogue generation module generates a dialogue reply, the dialogue context and the knowledge triple are concatenated and input into the BART pre-training model to generate the reply only when the topic intention of the dialogue is judged to require added knowledge.
Compared with the prior art, the invention has the beneficial technical effects that:
according to the invention, through the multi-path entity identification module, the entity selection module optimized by hybrid negative sampling and the entity attribute selection module of the dialogue generation model, the intended entity of the dialogue and its specific entity attribute can be output, providing background knowledge for the subsequent knowledge dialogue generation module. The entity identification module, entity selection module and entity attribute selection module designed by the invention are structurally independent and have no overlapping dependencies on one another, so each can serve tasks such as entity identification and entity selection on its own; working together, the three modules supply factual background knowledge for dialogue tasks, which increases the information content of the generated dialogue. Moreover, the pipelined knowledge extraction framework formed by the three modules can be used as a plug-and-play component: given only a knowledge graph dataset for the relevant domain, it can provide reference external knowledge for any dialogue text and help the model generate more informative dialogue.
Drawings
FIG. 1 is a flow chart of a dialog generation model of the present invention;
FIG. 2 is a schematic diagram of an entity selection module according to the present invention;
FIG. 3 is a schematic diagram of an entity attribute selection module according to the present invention;
FIG. 4 is a schematic diagram of an intent discrimination model of the present invention;
FIG. 5 is a schematic diagram of the knowledge dialogue generation module of the present invention.
Detailed Description
A preferred embodiment of the present invention will be described in detail with reference to the accompanying drawings.
Problem definition: given the dialogue context $C = \{u_1, u_2, \dots, u_n\}$ of the current user dialogue, where $u_i$ denotes the $i$-th sentence of $C$ and $n$ is the total number of sentences in $C$, and a knowledge graph $G = \{k_1, k_2, \dots, k_m\}$, where $m$ is the total number of knowledge triples in $G$ and each triple $k_j = (e_j, a_j, v_j)$ consists of an entity name $e_j$, an attribute name $a_j$ and an attribute value $v_j$, the task is to find in this large knowledge graph, according to the dialogue context of the current user dialogue, the knowledge triple $k = (e, a, v)$ related to the user's intention, use $k$ as background information for dialogue generation, and generate a knowledge-driven dialogue reply $Y$.
To increase the information content and semantic relevance of a dialogue system, the invention designs a dialogue generation model, whose flow chart is shown in FIG. 1. The dialogue generation model consists of the following four parts: (1) an entity identification module; (2) an entity selection module; (3) an entity attribute selection module; (4) a knowledge dialogue generation module.
The entity identification module is used for identifying all possible topic entities in the conversation as candidate entities;
the entity selection module is used for selecting the most suitable entity (the best entity) from candidate entities;
the entity attribute selection module is used for selecting the most suitable entity attribute (the best entity attribute) from the attribute relation set of the best entity;
the knowledge dialogue generation module first judges whether the current dialogue needs knowledge; if not, it generates a chit-chat reply, and if so, it generates a knowledge-driven dialogue reply.
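The cascade can be pictured as a chain of callables. The sketch below is a minimal illustration under stated assumptions, not the patent's implementation: `CascadePipeline` and its five fields are hypothetical names, each field stands in for a trained model, and the toy lambdas exist only so that the control flow runs. Following the description, the knowledge/no-knowledge decision is made in the generation stage, so a chit-chat turn receives no triple even if one was retrieved.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple

# (entity name e, attribute name a, attribute value v)
Triple = Tuple[str, str, str]


@dataclass
class CascadePipeline:
    # Hypothetical stand-ins for the trained modules.
    recognize: Callable[[List[str]], List[str]]             # context -> candidate entities
    select_entity: Callable[[List[str], List[str]], str]    # context, candidates -> best entity
    select_attribute: Callable[[List[str], str], Triple]    # context, best entity -> best triple
    needs_knowledge: Callable[[List[str]], bool]            # intent discrimination model
    generate: Callable[[List[str], Optional[Triple]], str]  # context, optional triple -> reply

    def reply(self, context: List[str]) -> str:
        triple: Optional[Triple] = None
        candidates = self.recognize(context)
        if candidates:
            entity = self.select_entity(context, candidates)
            triple = self.select_attribute(context, entity)
        # The generation stage decides whether this turn needs knowledge at all.
        if not self.needs_knowledge(context):
            triple = None
        return self.generate(context, triple)


# Toy usage with a one-triple "knowledge graph" and trivial stubs.
kg = {"Zhang San": [("Zhang San", "occupation", "composer")]}
pipe = CascadePipeline(
    recognize=lambda ctx: [e for e in kg if any(e in u for u in ctx)],
    select_entity=lambda ctx, cands: cands[0],
    select_attribute=lambda ctx, e: kg[e][0],
    needs_knowledge=lambda ctx: "thank" not in ctx[-1].lower(),
    generate=lambda ctx, t: f"{t[0]}'s {t[1]} is {t[2]}." if t else "You're welcome!",
)
print(pipe.reply(["Who is Zhang San?"]))  # Zhang San's occupation is composer.
print(pipe.reply(["Thank you!"]))         # You're welcome!
```

Keeping each stage behind its own callable mirrors the claim that the modules are independent and can be swapped or reused separately.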
The specific implementation and mechanism of action of each module will be described in detail in turn.
1. Entity recognition module
The purpose of entity identification is to find all possible candidate entities in all sentences of the dialogue context. The invention provides a multi-path entity identification module that screens entities quickly while ensuring the recall of the correct entity.
The multi-path entity recognition module includes a knowledge entity recognition NER main model and two auxiliary rule-based recognition models, which together output all possible candidate entities in the dialogue context. The NER main model identifies the knowledge entities in the dialogue through a BERT pre-training model and a conditional random field.
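The training process of the knowledge entity recognition NER main model comprises the following steps:
Step 1, the training data, comprising a text T and its correct entity e, are input to the NER main model;
Step 2, the text T is BIOS-encoded according to the position of the correct entity e in T, and the encoded output is used as the training label y;
Step 3, the encoded sequence of text T is converted by the BERT pre-training model into the corresponding hidden vectors v: $v = \mathrm{BERT}(T)$;
Step 4, the final distribution probability is output through the conditional random field: $\hat{y} = \mathrm{CRF}(v)$;
Step 5, the loss function $\mathcal{L} = \mathrm{CE}(\hat{y}, y)$ is calculated, where $\mathrm{CE}$ is the cross-entropy loss function, and the weights of the BERT pre-training model and the conditional random field are iteratively optimized by gradient descent.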
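To make Step 4 concrete: at inference time a linear-chain CRF picks the globally best tag sequence rather than the best tag per character. The following is a minimal Viterbi-decoding sketch in plain Python. The emission scores (which in the model would come from the BERT hidden vectors v) and the transition matrix are made-up numbers, the hard `NEG` penalties for invalid BIOS transitions are an assumption (a trained CRF learns its transition scores), and training with the Step 5 loss is not shown.

```python
from typing import Dict, List

TAGS = ["B", "I", "O", "S"]
NEG = -1e4  # large penalty standing in for a forbidden transition (assumption)


def viterbi(emissions: List[Dict[str, float]],
            transitions: Dict[str, Dict[str, float]]) -> List[str]:
    """Highest-scoring tag path of a linear-chain CRF.

    emissions[t][tag]   -- score of `tag` at character t (from the encoder)
    transitions[p][tag] -- score of moving from tag p to `tag`
    """
    best = dict(emissions[0])  # best path score ending in each tag so far
    backpointers: List[Dict[str, str]] = []
    for scores in emissions[1:]:
        new_best: Dict[str, float] = {}
        pointers: Dict[str, str] = {}
        for tag in TAGS:
            prev = max(TAGS, key=lambda p: best[p] + transitions[p][tag])
            new_best[tag] = best[prev] + transitions[prev][tag] + scores[tag]
            pointers[tag] = prev
        best = new_best
        backpointers.append(pointers)
    # Trace back from the best final tag.
    tag = max(TAGS, key=lambda t: best[t])
    path = [tag]
    for pointers in reversed(backpointers):
        tag = pointers[tag]
        path.append(tag)
    return path[::-1]


# Hypothetical BIOS constraints: I may only follow B or I, and B must be followed by I.
transitions = {p: {t: 0.0 for t in TAGS} for p in TAGS}
for p in ("O", "S"):
    transitions[p]["I"] = NEG
for t in ("B", "O", "S"):
    transitions["B"][t] = NEG

# Made-up emission scores for a 4-character text; a greedy argmax would give O I I O.
emissions = [
    {"B": 0.0, "I": 0.0, "O": 2.0, "S": 0.0},
    {"B": 1.0, "I": 1.5, "O": 0.0, "S": 0.0},
    {"B": 0.0, "I": 2.0, "O": 0.5, "S": 0.0},
    {"B": 0.0, "I": 0.0, "O": 2.0, "S": 0.0},
]
print(viterbi(emissions, transitions))  # ['O', 'B', 'I', 'O']
```

The example shows why the CRF layer is useful: the greedy per-character choice produces an invalid O to I transition, while joint decoding returns a well-formed entity span.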
BIOS encoding is a standard labeling scheme for entity recognition: each character of the text is assigned a tag, where the first character of an entity span is labeled B (Begin), the other characters of the entity are labeled I (Intermediate), a single-character entity is labeled S (Single), and all characters outside entities are labeled O (Other). For example, the Chinese text "Do you know the composer Zhang San?" with the entity "Zhang San" is encoded as "OOOOOBIOO", where B and I mark the two characters of "Zhang San".
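A minimal sketch of the BIOS labeling rule, assuming character-level tokens and that every occurrence of the entity string is labeled (the text only says the label follows the position of the correct entity e in T). `bios_encode` is a hypothetical helper name, and the example sentences are illustrative rather than taken from the patent's data.

```python
from typing import List


def bios_encode(text: str, entity: str) -> List[str]:
    """Label each character of `text` with B/I/O/S for every occurrence of `entity`."""
    labels = ["O"] * len(text)
    if not entity:
        return labels
    start = text.find(entity)
    while start != -1:
        if len(entity) == 1:
            labels[start] = "S"  # single-character entity
        else:
            labels[start] = "B"  # first character of the entity
            for i in range(start + 1, start + len(entity)):
                labels[i] = "I"  # remaining characters of the entity
        start = text.find(entity, start + len(entity))
    return labels


print("".join(bios_encode("你认识张三吗？", "张三")))  # OOOBIOO
print("".join(bios_encode("你会背《甲》吗", "甲")))    # OOOOSOO
```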
The invention also observes that although most entities in the knowledge graph are longer than two characters, single-character knowledge entities do occur in real dialogue, such as the main entity "Jia" (甲) in "Can you recite the first poem of 《Jia》?". Because most entities in the training set are longer than two characters, the trained model is insensitive to single-character entities and recognizes them poorly. In addition, the model can be careless with nested entities: for "Have you read the book 《Zhang San Wenji》 (Collected Works of Zhang San)?", it outputs a wrong, shorter entity nested inside the correct one and ignores the correct entity "Zhang San Wenji". The invention aims for the model to screen out knowledge entities of all lengths accurately and so ensure the recall of the correct entity; therefore two auxiliary rule-based recognition models are additionally designed as a backstop for the NER main model, preventing it from missing correct entities.
The auxiliary rule recognition model comprises a jieba word segmentation screening recognition model and a rule screening recognition model:
(1) The jieba word segmentation screening recognition model screens entities through jieba segmentation. jieba is currently one of the most popular Chinese word segmentation tools; the part of speech of every entity noun in the knowledge graph is set to 'entity' in the jieba dictionary in advance, and words tagged 'entity' after jieba segmentation are screened out as candidate entities.
(2) The rule screening recognition model screens entities by rules: because entities such as films, novels and music are usually enclosed in special symbols such as title marks (《》) and quotation marks in Chinese dialogue, nouns inside title marks and quotation marks are screened out as candidate entities.
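The two auxiliary paths and their merge with the NER main model can be sketched as follows. This is an assumption-laden illustration: the regular expression covers 《》, “” and ASCII double quotes; `lexicon_candidates` is a plain substring scan standing in for the jieba path (a real implementation would presumably register each knowledge-graph entity name in jieba's dictionary with the tag `entity`, for example through `jieba.add_word(name, tag="entity")`, and keep tokens from `jieba.posseg.cut` whose flag is `entity`); and `ner_output` is a hard-coded placeholder for the main model's prediction. All function names are hypothetical.

```python
import re
from typing import Iterable, List, Set

# Spans inside Chinese title marks 《》, Chinese quotes “”, or ASCII double quotes.
QUOTED = re.compile(r'《([^》]+)》|“([^”]+)”|"([^"]+)"')


def rule_candidates(text: str) -> List[str]:
    """Rule path: every span enclosed in title marks or quotation marks."""
    return [next(g for g in m.groups() if g) for m in QUOTED.finditer(text)]


def lexicon_candidates(text: str, kg_entities: Set[str]) -> List[str]:
    """Stand-in for the jieba path: every knowledge-graph entity name that occurs in
    the text (a plain substring scan, not real segmentation + POS tagging)."""
    return [e for e in sorted(kg_entities) if e in text]


def merge_candidates(*paths: Iterable[str]) -> List[str]:
    """Union of all recognition paths, keeping first-seen order."""
    merged: List[str] = []
    for path in paths:
        for entity in path:
            if entity not in merged:
                merged.append(entity)
    return merged


text = "你读过《张三文集》这本书吗？"
ner_output = ["张三"]  # hypothetical main-model output that missed the nested entity
kg_entities = {"张三", "张三文集", "李四"}
print(merge_candidates(ner_output, lexicon_candidates(text, kg_entities), rule_candidates(text)))
# ['张三', '张三文集']
```

The example shows the backstop effect: the longer nested entity missed by the main model is recovered by the other two paths, and choosing between the candidates is left to the entity selection module.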
2. Entity selection module for hybrid negative sampling optimization
The entity identification module screens out all possible candidate entities, and the entity selection module selects from them the entity that determines the topic intention of the current dialogue. The difficulty for the entity selection module is that humans converse freely and without constraint: many entities are often involved, and intention shifts such as jumps of thought, topic switches and returns to earlier topics occur frequently. Moreover, the dataset contains only dialogue texts and their corresponding correct knowledge triples, which means that the task's dataset has only positive samples and no negative samples. Deep learning usually needs a large number of positive and negative samples for iterative training and optimization so that the model learns to distinguish them, so a certain number of negative samples must be sampled for training. Since the quality of the negative samples is an important factor determining the upper limit of the task model, how to design a negative sampling strategy for entities, and how to set the proportion of samples, is a central concern of the invention.
The invention designs a hybrid negative sampling method for the entity selection module. The negative samples consist of two parts. The first part comes from the dialogue data of the training set: all candidate entities of each dialogue are output by the multi-path entity recognition module described in the previous section, and the candidate entities other than the positive (correct) entity are taken as negative samples. This simulates the set of negatives the model must distinguish at inference time: to a certain extent, the training-set negatives and the distractors met at inference are produced by the same entity recognition model and therefore follow the same distribution, and negatives drawn from the inference-time distribution maximize the accuracy of entity selection during inference. The second part randomly samples L entities from the entity set of the positive sample's domain as negative samples; same-domain negatives are usually the samples the model confuses most easily at inference, and learning from these hard samples improves the stability and generalization of the model. Note that the knowledge graph does not record which domain each entity belongs to, so the entities that share the correct attribute with the correct entity are found by reverse lookup from that attribute and used as the domain entity set. Generating negative samples with both methods together benefits model training.
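The two-part negative sampling can be sketched as below, assuming triples are `(entity, attribute name, attribute value)` tuples and that the domain of the correct entity is approximated, as the text describes, by reverse lookup of all entities sharing the correct triple's attribute name. The function names, the de-duplication between the two parts, and the toy knowledge graph are assumptions for illustration.

```python
import random
from typing import List, Set, Tuple

Triple = Tuple[str, str, str]  # (entity, attribute name, attribute value)


def domain_entities(kg: List[Triple], correct: Triple) -> Set[str]:
    """Reverse lookup: entities carrying the same attribute name as the correct triple
    approximate the positive entity's domain (the KG has no explicit domain labels)."""
    return {e for e, a, _ in kg if a == correct[1]}


def hybrid_negatives(candidates: List[str], correct: Triple, kg: List[Triple],
                     num_random: int, rng: random.Random) -> List[str]:
    positive = correct[0]
    # Part 1: recognizer candidates other than the positive entity
    # (same distribution as the distractors seen at inference).
    negatives = [c for c in candidates if c != positive]
    # Part 2: L = num_random entities sampled from the positive entity's domain (hard negatives).
    pool = sorted(domain_entities(kg, correct) - {positive} - set(negatives))
    negatives += rng.sample(pool, min(num_random, len(pool)))
    return negatives


kg = [("Zhang San", "occupation", "composer"), ("Li Si", "occupation", "singer"),
      ("Wang Wu", "occupation", "actor"), ("The Wandering Earth", "director", "Guo Fan")]
negs = hybrid_negatives(["Zhang San", "The Wandering Earth"], kg[0], kg, 1, random.Random(0))
print(negs)
```

Sorting the pool before sampling keeps the draw reproducible for a fixed seed, since set iteration order for strings can vary between runs.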
The entity selection module is shown in FIG. 2. After the positive and negative samples are determined, [<CLS> current dialogue text <SEP> entity and all attribute names of the entity] is fed into the BERT pre-training model, the CLS vector is passed through one fully connected layer to output the matching score between the current dialogue text and the entity, and the entity with the highest matching score is taken as the best entity, where <CLS> is the classification symbol of the BERT pre-training model and <SEP> is the separator between the two texts. The BERT pre-training model maps the input text sequence to vectors E, and the final representation vectors T are obtained through multi-layer encoding. After training, the model can infer the matching score (Score) of a text pair.
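The input layout and the argmax selection can be sketched as below. This is a hypothetical illustration: `entity_pair` and `select_best_entity` are made-up names, and `score` stands in for the fine-tuned BERT/ERNIE matcher with its fully connected head, replaced here by a lookup table. With a BERT-style tokenizer, a text pair is typically encoded as <CLS> A <SEP> B <SEP>, so the pair only needs to be passed as two segments.

```python
from typing import Callable, Dict, List, Tuple


def entity_pair(dialogue: str, entity: str, attr_names: List[str]) -> Tuple[str, str]:
    """Segment A = current dialogue text; segment B = entity name + all its attribute names.
    A BERT-style tokenizer would add <CLS> before A and <SEP> after each segment."""
    return dialogue, " ".join([entity, *attr_names])


def select_best_entity(dialogue: str, candidates: List[str],
                       attributes: Dict[str, List[str]],
                       score: Callable[[str, str], float]) -> str:
    """Score every candidate with the matcher and keep the argmax."""
    return max(candidates,
               key=lambda e: score(*entity_pair(dialogue, e, attributes.get(e, []))))


# Toy usage: a lookup table stands in for the fine-tuned scorer.
fake_scores = {"Zhang San occupation works": 0.9, "Li Si occupation": 0.2}
best = select_best_entity("Do you know the composer Zhang San?", ["Zhang San", "Li Si"],
                          {"Zhang San": ["occupation", "works"], "Li Si": ["occupation"]},
                          lambda a, b: fake_scores[b])
print(best)  # Zhang San
```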
In FIG. 2, $E_{d}$ denotes the initial vectors of the dialogue text, $E_{e}$ the initial vectors of the entity name, $E_{a}$ the initial vectors of the attribute names, $T_{d}$ the final representation vectors of the dialogue text, $T_{e}$ the final representation vectors of the entity name, $T_{a}$ the final representation vectors of the attribute names, $E_{\langle CLS\rangle}$ the initial vector of the classification symbol <CLS>, $E_{\langle SEP\rangle}$ the initial vector of the separator <SEP>, $T_{\langle CLS\rangle}$ the final classification vector, and $T_{\langle SEP\rangle}$ the final representation vector of the separator <SEP>.
The BERT model used in this task is the ERNIE model open-sourced by Baidu, which adds a Knowledge Masking pre-training strategy to the BERT pre-training mechanism; this makes it sensitive to knowledge entities and well suited to the current entity selection task. The loss function is a pair-wise MarginLoss. Compared with a point-wise loss, which focuses on fitting the score of each individual sample, the pair-wise loss focuses on the ranking relation between positive and negative samples, which better matches the ranking of entity scores performed at inference time and therefore suits the current task. The loss function is
$$\mathcal{L} = \max(0,\ \gamma - s^{+} + s^{-})$$
where $s^{+}$ denotes the positive sample score, $s^{-}$ the negative sample score, and $\gamma$ the margin; the larger the margin, the more strongly the model is pushed to separate positive from negative samples.
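The pair-wise MarginLoss can be written directly from the formula. A minimal plain-Python sketch follows; pairing every positive with every negative of the same dialogue and averaging over pairs is an assumption, since the text does not say how pairs are formed or reduced. In PyTorch, `torch.nn.MarginRankingLoss(margin=gamma)` with target `y = 1` computes the same per-pair hinge max(0, γ - s⁺ + s⁻).

```python
from typing import Sequence


def margin_loss(pos_scores: Sequence[float], neg_scores: Sequence[float],
                gamma: float) -> float:
    """Pair-wise hinge loss max(0, gamma - s_pos + s_neg), averaged over all
    (positive, negative) pairs."""
    pairs = [(sp, sn) for sp in pos_scores for sn in neg_scores]
    return sum(max(0.0, gamma - sp + sn) for sp, sn in pairs) / len(pairs)


# One positive scored 0.8 against two negatives; only the 0.7 negative sits inside
# the margin, contributing 0.5 - 0.8 + 0.7 = 0.4 before averaging over 2 pairs.
print(margin_loss([0.8], [0.3, 0.7], gamma=0.5))  # ≈ 0.2
```

Pairs already separated by at least γ contribute zero, so training effort concentrates on negatives that still score too close to the positive.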
3. Entity attribute selection module
The core idea of the entity attribute selection module is to measure the similarity between the dialogue and each knowledge triple and to obtain the best entity attribute by ranking the similarity scores. The module's main task is to match the current dialogue text against entity attributes, so it should focus on the attributes; however, the invention found that in a certain proportion of dialogue samples the entity name appears in the current dialogue text, which interferes with the module's performance. The invention therefore removes the entity name from the knowledge triple and masks the entity name in the current dialogue text, so that the module focuses on matching the dialogue against the entity attribute. The entity attribute selection module is shown in FIG. 3: [<CLS> current dialogue text with the entity name removed <SEP> attribute name and attribute value] is fed into the BERT pre-training model, the CLS vector is passed through one fully connected layer to output the matching score (Score) between the current dialogue text and the entity attribute, and the entity attribute with the highest matching score is selected as the best entity attribute. The loss function of the entity attribute selection module is the same MarginLoss as that of the entity selection module.
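The masking step and the attribute ranking can be sketched as follows, assuming `[MASK]` as the covering token and a toy word-overlap scorer in place of the fine-tuned BERT/ERNIE matcher; `mask_entity`, `select_best_attribute` and `word_overlap` are hypothetical names.

```python
from typing import Callable, List, Tuple

Triple = Tuple[str, str, str]  # (entity, attribute name, attribute value)


def mask_entity(dialogue: str, entity: str, mask: str = "[MASK]") -> str:
    """Hide the entity name so the matcher cannot lean on it."""
    return dialogue.replace(entity, mask)


def select_best_attribute(dialogue: str, entity: str, kg: List[Triple],
                          score: Callable[[str, str], float]) -> Triple:
    """Pair the masked dialogue with 'attribute name + attribute value' (entity name
    dropped) for each triple of the best entity, and keep the highest-scoring one."""
    masked = mask_entity(dialogue, entity)
    own = [t for t in kg if t[0] == entity]
    return max(own, key=lambda t: score(masked, f"{t[1]} {t[2]}"))


def word_overlap(a: str, b: str) -> float:
    """Toy scorer standing in for the fine-tuned matcher."""
    return float(len(set(a.lower().split()) & set(b.lower().split())))


kg = [("The Wandering Earth", "director", "Guo Fan"),
      ("The Wandering Earth", "release year", "2019"),
      ("Li Si", "occupation", "singer")]
question = "Who is the director of The Wandering Earth"
print(mask_entity(question, "The Wandering Earth"))  # Who is the director of [MASK]
print(select_best_attribute(question, "The Wandering Earth", kg, word_overlap))
# ('The Wandering Earth', 'director', 'Guo Fan')
```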
In FIG. 3, $E_{v}$ denotes the initial vectors of the attribute values and $T_{v}$ the final representation vectors of the attribute values.
4. Knowledge dialogue generation module
The knowledge dialogue generation module uses the obtained knowledge triple $k = (e, a, v)$ (entity name $e$, attribute name $a$, attribute value $v$) as background information for the dialogue system and generates a knowledge-driven dialogue reply.
The knowledge dialogue generation module cannot blindly incorporate background knowledge, because utterances such as "thank you" and "bye" carry no substantive intent and need no background knowledge. The invention therefore first designs an intent discrimination model, shown in FIG. 4: a classification model trained to judge the user's intent, which decides whether knowledge is added. The intent discrimination model outputs a probability P; when P is smaller than a set threshold, knowledge is added in the knowledge dialogue generation module, and otherwise no knowledge is added.
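The gate and the resulting encoder input can be sketched as below. Assumptions: the threshold value 0.5, the `[CLS]`/`[SEP]` token strings, separating individual utterances with `[SEP]`, and joining the triple with spaces are all illustrative choices; the text states only that knowledge is added when P is below a set threshold and that context and triple are concatenated. `build_generation_input` is a hypothetical name.

```python
from typing import List, Optional, Tuple

Triple = Tuple[str, str, str]  # (entity, attribute name, attribute value)


def build_generation_input(context: List[str], triple: Optional[Triple], p: float,
                           threshold: float = 0.5,
                           cls: str = "[CLS]", sep: str = "[SEP]") -> str:
    """Encoder input for the generator. Knowledge is appended only when the intent
    model's probability p is below the threshold (threshold value is an assumption)."""
    text = cls + sep.join(context)
    if triple is not None and p < threshold:
        text += sep + " ".join(triple)
    return text


ctx = ["Hi", "Who is Zhang San?"]
fact = ("Zhang San", "occupation", "composer")
print(build_generation_input(ctx, fact, p=0.2))
# [CLS]Hi[SEP]Who is Zhang San?[SEP]Zhang San occupation composer
print(build_generation_input(ctx, fact, p=0.9))
# [CLS]Hi[SEP]Who is Zhang San?
```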
The knowledge dialogue generation module is shown in FIG. 5. The invention adopts the BART pre-training model open-sourced by Fudan University, which is pre-trained as a full Transformer encoder-decoder network and is particularly suitable for dialogue generation tasks. The dialogue context $C = \{u_1, \dots, u_n\}$ and the knowledge triple $k = (e, a, v)$ are concatenated to obtain $X = \langle CLS\rangle \oplus C \oplus \langle SEP\rangle \oplus k$, where $\oplus$ denotes concatenation, <CLS> is the sentence-start symbol and <SEP> is the text separator. After the encoder encodes $X$, the decoder generates the reply word by word until it produces the termination symbol, yielding the dialogue reply $Y = (y_1, y_2, \dots, y_{|Y|})$, where $y_i$ denotes the $i$-th word of the reply and $|Y|$ is its text length. To increase the diversity of the generated results, top-k and top-p decoding strategies are used.
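Top-k and top-p (nucleus) filtering for one decoding step can be sketched in plain Python as below. This is an illustration, not the patent's decoder: the word distribution is made up, and applying top-k first, renormalising, and then top-p is one common ordering. In practice the same effect is usually obtained through a library call such as Hugging Face Transformers' `generate(do_sample=True, top_k=..., top_p=...)`.

```python
import random
from typing import Dict


def top_k_top_p_filter(probs: Dict[str, float], k: int, p: float) -> Dict[str, float]:
    """Keep the k most likely tokens, renormalise, then keep the smallest prefix whose
    cumulative probability reaches p (the nucleus), and renormalise again."""
    ranked = sorted(probs.items(), key=lambda kv: kv[1], reverse=True)[:k]
    z = sum(prob for _, prob in ranked)
    kept, total = [], 0.0
    for token, prob in ranked:
        prob /= z
        kept.append((token, prob))
        total += prob
        if total >= p:
            break
    return {token: prob / total for token, prob in kept}


def sample_next(probs: Dict[str, float], k: int, p: float, rng: random.Random) -> str:
    """Draw the next word from the filtered, renormalised distribution."""
    filtered = top_k_top_p_filter(probs, k, p)
    tokens, weights = zip(*filtered.items())
    return rng.choices(tokens, weights=weights, k=1)[0]


next_word = {"composer": 0.5, "singer": 0.3, "actor": 0.15, "dancer": 0.05}
print(top_k_top_p_filter(next_word, k=3, p=0.8))  # composer ≈ 0.625, singer ≈ 0.375
print(sample_next(next_word, k=3, p=0.8, rng=random.Random(0)))
```

Top-k caps the candidate set at a fixed size, while top-p adapts it to how peaked the distribution is; combining them trims the unlikely tail and still leaves some randomness for diversity.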
The performance of the invention was verified on the KdConv dialogue dataset. As Table 1 shows, the invention outperforms the other algorithms on the automatic metrics BLEU and generation_F1. BLEU-1, BLEU-2, BLEU-4 and generation_F1 are common automatic metrics that measure the overlap between the model-generated reply and the reference reply; the higher the BLEU and generation_F1 values, the better the generated dialogue.
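The text does not define generation_F1. A common choice for Chinese dialogue benchmarks is character-level F1 between the generated and the reference reply; the sketch below assumes that definition (with whitespace ignored) and is not necessarily the exact metric behind Table 1. `char_f1` is a hypothetical name.

```python
from collections import Counter


def char_f1(hypothesis: str, reference: str) -> float:
    """Character-level F1 between a generated reply and the reference reply
    (whitespace ignored); an assumed definition of generation_F1."""
    hyp = Counter(hypothesis.replace(" ", ""))
    ref = Counter(reference.replace(" ", ""))
    overlap = sum((hyp & ref).values())  # clipped character matches
    if overlap == 0:
        return 0.0
    precision = overlap / sum(hyp.values())
    recall = overlap / sum(ref.values())
    return 2 * precision * recall / (precision + recall)


# 6 shared characters: precision 6/6, recall 6/8, so F1 = 6/7.
print(char_f1("张三是作曲家", "张三是一位作曲家"))  # ≈ 0.857
```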
Table 1 Comparison of automatic evaluation metrics between the invention and better-performing algorithms
Everyday conversation is diverse and flexible in form, and it has no single standard. Overlap-based automatic metrics therefore do not necessarily give an accurate assessment of a dialogue generation model. For this reason the invention was also verified by manual evaluation. Several people with a research background in natural language processing scored the semantic relevance and expressive consistency of the generated dialogues on a scale of {0, 1, 2}, where 0 means poor, 1 means fair and 2 means very good. The manual evaluation results are shown in Table 2. The model of the invention outperforms the baseline models on both dialogue semantic relevance and consistency.
Table 2 Manual evaluation results
Table 3 shows some of the generated samples. A data-driven encoder-decoder algorithm can generate fluent and reasonable dialogue, but its replies lack substantive information. Even where some replies appear to contain knowledge, that information is produced by the model at random, is inaccurate, and can mislead the user. Missing or incorrect information in a dialogue reply degrades the user experience. The dialogue generation model of the invention extracts a suitable knowledge triple from the knowledge graph and generates a corresponding knowledge-grounded reply. As a result, the generated dialogue is fluent and coherent, and the semantic information it conveys agrees with common sense and with the logic of the dialogue context.
Table 3 Dialogue samples generated by different models
In summary, the cascade-coupled knowledge-enhanced dialogue generation method of the invention selects the corresponding knowledge triple as background knowledge according to the intent of the dialogue and generates a knowledge-driven dialogue. The generated dialogue is therefore semantically relevant, rich in content and coherent in expression.
The method can be widely applied in human-machine dialogue scenarios such as chatbots, knowledge question answering and intelligent customer service platforms.
In the figures, current query denotes the current dialogue text, attr_names the attribute names, attr_values the attribute values, knowledges the knowledge triples, encodings the encoder and decoders the decoder.
It will be evident to those skilled in the art that the invention is not limited to the details of the foregoing illustrative embodiments, and that the present invention may be embodied in other specific forms without departing from the spirit or essential characteristics thereof. The present embodiments are therefore to be considered in all respects as illustrative and not restrictive, the scope of the invention being indicated by the appended claims rather than by the foregoing description, and all changes which come within the meaning and range of equivalency of the claims are therefore intended to be embraced therein.
Furthermore, it should be understood that although this specification is described in terms of embodiments, not every embodiment contains only one independent technical solution. This manner of description is adopted only for clarity. Those skilled in the art should treat the specification as a whole, and the technical solutions in the embodiments may be suitably combined to form other embodiments that those skilled in the art can understand.
Claims (6)
1. A cascade-coupled knowledge-enhanced dialogue generation method, characterized in that it takes the following inputs:
- a dialogue context $C = \{u_1, u_2, \dots, u_N\}$, where $u_i$ denotes the $i$-th sentence of the dialogue context $C$ and $N$ is the total number of sentences in $C$;
- a knowledge graph $G$ composed of multiple knowledge triples $k = (e, a, v)$, where $e$ denotes the entity name, $a$ the attribute name and $v$ the attribute value.
A dialogue generation model searches the knowledge graph, according to the dialogue context of the current dialogue, for the knowledge triple $k$ related to the current dialogue intent. It uses this triple as background information for dialogue generation and generates a knowledge-driven dialogue reply;
the dialogue generation model comprises an entity recognition module, an entity selection module, an entity attribute selection module and a knowledge dialogue generation module;
the entity recognition module recognizes all possible entities in the sentences of the dialogue as candidate entities. It comprises a knowledge-entity-recognition NER main model, which uses a BERT pre-training model and a conditional random field (CRF) to recognize entities in the dialogue as candidate entities;
the entity selection module selects, from the candidate entities, the best entity reflecting the topic intent of the current dialogue. Specifically, the current dialogue text, a candidate entity and all attribute names of that candidate entity are concatenated and fed into a BERT pre-training model. The [CLS] vector, followed by one fully connected layer, outputs a matching score between the current dialogue text and that candidate entity. The candidate entity with the highest matching score is taken as the best entity reflecting the topic intent of the current dialogue;
the entity attribute selection module first removes the entity name of the best entity from the current dialogue text. It then concatenates the resulting text with each entity attribute of the best entity and feeds the result into a BERT pre-training model. The [CLS] vector, followed by one fully connected layer, outputs a matching score between the current dialogue text and that entity attribute, and the entity attribute with the highest matching score is taken as the best entity attribute. An entity attribute comprises the attribute name and the attribute value of the entity;
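The two selection steps of claim 1 share one pattern: build a text pair, score it, and take the argmax. The sketch below illustrates that cascade in Python under stated assumptions. `overlap_score` is a hypothetical word-overlap function standing in for the fine-tuned BERT model (the [CLS] vector plus one fully connected layer). The two-entity knowledge graph is toy data, and all names are illustrative.

```python
from typing import Callable, Dict, List, Tuple

KnowledgeGraph = Dict[str, Dict[str, str]]  # entity name -> {attribute name: attribute value}
Scorer = Callable[[str, str], float]        # (dialogue text, candidate text) -> matching score


def select_entity(query: str, candidates: List[str], kg: KnowledgeGraph, score: Scorer) -> str:
    """Entity selection: pair the query with 'entity + all of its attribute names'."""
    def candidate_text(entity: str) -> str:
        return " ".join([entity, *kg.get(entity, {}).keys()])
    return max(candidates, key=lambda e: score(query, candidate_text(e)))


def select_attribute(query: str, entity: str, kg: KnowledgeGraph,
                     score: Scorer) -> Tuple[str, str, str]:
    """Attribute selection: drop the entity name from the query, then score each (name, value)."""
    stripped = query.replace(entity, "")
    name, value = max(kg[entity].items(), key=lambda nv: score(stripped, f"{nv[0]} {nv[1]}"))
    return entity, name, value  # the knowledge triple (e, a, v)


# Hypothetical stand-in for the fine-tuned BERT + fully connected scorer.
def overlap_score(text: str, candidate: str) -> float:
    return float(len(set(text.lower().split()) & set(candidate.lower().split())))


KG: KnowledgeGraph = {
    "Titanic": {"director": "James Cameron", "release year": "1997"},
    "Avatar": {"director": "James Cameron", "genre": "science fiction"},
}

if __name__ == "__main__":
    q = "who is the director of Titanic"
    best = select_entity(q, ["Titanic", "Avatar"], KG, overlap_score)
    print(select_attribute(q, best, KG, overlap_score))
```

Removing the entity name before attribute scoring, as the claim specifies, keeps the entity mention from dominating the match, so the score reflects which attribute the user is asking about.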
2. The cascade-coupled knowledge-enhanced dialogue generation method of claim 1, wherein the training process of the knowledge-entity-recognition NER main model comprises:
Step one: training data comprising a text T and its correct entity e are input into the NER main model;
Step two: the text T is BIOS-encoded according to the position of the correct entity e in T, and the encoded output is used as the training labels;
Step three: the encoded sequence of text T is converted by the BERT pre-training model into the corresponding hidden vector v.
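The BIOS encoding in step two can be sketched at character level, which matches how Chinese BERT models usually tokenise text. That alignment, the choice to label every occurrence of e, and the function name are assumptions made for illustration.

```python
from typing import List


def bios_tags(text: str, entity: str) -> List[str]:
    """Character-level BIOS labels for every occurrence of `entity` in `text`.

    B = first character of a multi-character entity, I = inside,
    S = single-character entity, O = outside.
    """
    if not entity:
        raise ValueError("entity must be non-empty")
    tags = ["O"] * len(text)
    start = text.find(entity)
    while start != -1:
        if len(entity) == 1:
            tags[start] = "S"
        else:
            tags[start] = "B"
            for i in range(start + 1, start + len(entity)):
                tags[i] = "I"
        start = text.find(entity, start + len(entity))  # next non-overlapping match
    return tags


if __name__ == "__main__":
    print(list(zip("我喜欢泰坦尼克号", bios_tags("我喜欢泰坦尼克号", "泰坦尼克号"))))
```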
3. The cascade-coupled knowledge-enhanced dialogue generation method of claim 1, wherein the entity recognition module further comprises a jieba word-segmentation screening model and a rule screening model. The jieba word-segmentation screening model sets, in advance, the part of speech of every entity noun in the knowledge graph to "entity" in the jieba dictionary. It then segments the dialogue with jieba and takes the words whose part of speech is "entity" as candidate entities. The rule screening model takes the nouns enclosed in book-title marks (《》) and quotation marks in the dialogue context as candidate entities.
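The rule screening model can be sketched with regular expressions. Which quotation styles count, and the decision to keep the whole enclosed span rather than filtering it for nouns, are assumptions made here for illustration.

```python
import re
from typing import List

# Assumed delimiter set: book-title marks 《》, full-width quotes “”, ASCII quotes "".
_ENCLOSED = [
    re.compile(r"《([^《》]+)》"),
    re.compile(r"“([^“”]+)”"),
    re.compile(r'"([^"]+)"'),
]


def rule_candidates(context: List[str]) -> List[str]:
    """Return de-duplicated spans enclosed in book-title marks or quotation marks."""
    seen, found = set(), []
    for sentence in context:
        for pattern in _ENCLOSED:
            for match in pattern.finditer(sentence):
                span = match.group(1).strip()
                if span and span not in seen:
                    seen.add(span)
                    found.append(span)
    return found


if __name__ == "__main__":
    print(rule_candidates(["你看过《流浪地球》吗？", "我更喜欢“三体”这本书"]))
```

The jieba branch could plausibly be built by registering each knowledge-graph entity name with `jieba.add_word(name, tag="entity")`. It would then keep the tokens from `jieba.posseg.cut(sentence)` whose `flag` equals `"entity"`. That branch is not shown here because it depends on the third-party jieba package.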
4. The cascade-coupled knowledge-enhanced dialogue generation method of claim 1, wherein, before the BERT pre-training model of the entity selection module is trained, the required negative samples are determined by a hybrid negative sampling method, specifically comprising:
(1) removing the correct entity from all candidate entities in the dialogue output by the entity recognition module, and adding the remaining candidate entities to a negative sample set;
(2) randomly sampling L entities from the entity set of the domain to which the positive sample belongs, and adding them to the negative sample set.
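The hybrid negative sampling of claim 4 is sketched below in Python, assuming candidates and domain entities are plain strings. Two further choices are assumptions: the correct entity and already-chosen negatives are excluded from the random pool, and a seeded `random.Random` is used. The claim itself only specifies the two sources of negatives and the count L (`num_random` below).

```python
import random
from typing import List, Sequence


def hybrid_negatives(candidates: Sequence[str], correct: str, domain_entities: Sequence[str],
                     num_random: int, rng: random.Random) -> List[str]:
    """Hybrid negative sampling for the entity selection model.

    (1) Hard negatives: recognised candidates with the correct entity removed.
    (2) Random negatives: up to `num_random` (L in the claim) entities drawn from
        the positive sample's domain. Excluding the correct entity and entities
        already chosen is an assumption, not stated in the claim.
    """
    negatives = [c for c in dict.fromkeys(candidates) if c != correct]  # ordered de-dup
    chosen = set(negatives)
    pool = [e for e in dict.fromkeys(domain_entities) if e != correct and e not in chosen]
    negatives.extend(rng.sample(pool, min(num_random, len(pool))))
    return negatives


if __name__ == "__main__":
    print(hybrid_negatives(["Titanic", "Avatar", "Cameron"], "Titanic",
                           ["Titanic", "Avatar", "Aliens", "Terminator", "Alita"],
                           num_random=2, rng=random.Random(0)))
```

Mixing the two sources gives the selection model hard negatives, which are entities that really co-occur in the dialogue, plus easy random ones from the same domain.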
5. The cascade-coupled knowledge-enhanced dialogue generation method of claim 1, wherein the BERT pre-training model in the entity selection module and in the entity attribute selection module is an ERNIE pre-training model, and the loss function
6. The cascade-coupled knowledge-enhanced dialogue generation method of claim 1, wherein, before generating the dialogue reply, the knowledge dialogue generation module concatenates the dialogue context with the knowledge triple and inputs the result into a BART pre-training model to generate the dialogue reply. It does so only when it is determined that the topic intent of the dialogue requires knowledge to be added.
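The splicing in claim 6 can be sketched as string construction. Several details are assumptions: the `[SEP]` separator, leaving the sentence-start symbol to the tokenizer, joining the triple with spaces, and the function name. Passing `None` models the case where the intent judging model decides that no knowledge is needed.

```python
from typing import List, Optional, Tuple

Triple = Tuple[str, str, str]  # (entity name, attribute name, attribute value)


def build_bart_input(context: List[str], triple: Optional[Triple], sep: str = "[SEP]") -> str:
    """Splice the dialogue context with the knowledge triple for the BART encoder.

    `triple=None` means no knowledge is to be added, so only the context is used.
    The sentence-start symbol is assumed to be added later by the tokenizer.
    """
    segments = list(context)
    if triple is not None:
        segments.append(" ".join(triple))
    return sep.join(segments)


if __name__ == "__main__":
    ctx = ["Hi!", "Have you seen Titanic?"]
    print(build_bart_input(ctx, ("Titanic", "director", "James Cameron")))
    print(build_bart_input(ctx, None))
```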
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202310260375.0A CN116010583B (en) | 2023-03-17 | 2023-03-17 | Cascade coupling knowledge enhancement dialogue generation method |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202310260375.0A CN116010583B (en) | 2023-03-17 | 2023-03-17 | Cascade coupling knowledge enhancement dialogue generation method |
Publications (2)
Publication Number | Publication Date |
---|---|
CN116010583A true CN116010583A (en) | 2023-04-25 |
CN116010583B CN116010583B (en) | 2023-07-18 |
Family
ID=86021288
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202310260375.0A Active CN116010583B (en) | 2023-03-17 | 2023-03-17 | Cascade coupling knowledge enhancement dialogue generation method |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN116010583B (en) |
Citations (18)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110083831A (en) * | 2019-04-16 | 2019-08-02 | 武汉大学 | A kind of Chinese name entity recognition method based on BERT-BiGRU-CRF |
CN111950285A (en) * | 2020-07-31 | 2020-11-17 | 合肥工业大学 | Intelligent automatic construction system and method of medical knowledge map based on multi-modal data fusion |
CN112100351A (en) * | 2020-09-11 | 2020-12-18 | 陕西师范大学 | Method and equipment for constructing intelligent question-answering system through question generation data set |
CN112131404A (en) * | 2020-09-19 | 2020-12-25 | 哈尔滨工程大学 | Entity alignment method in four-risk one-gold domain knowledge graph |
CN112149421A (en) * | 2020-09-23 | 2020-12-29 | 云南师范大学 | Software programming field entity identification method based on BERT embedding |
CN112487202A (en) * | 2020-11-27 | 2021-03-12 | 厦门理工学院 | Chinese medical named entity recognition method and device fusing knowledge map and BERT |
CN113377961A (en) * | 2020-12-07 | 2021-09-10 | 北京理工大学 | Intention-semantic slot joint recognition model based on knowledge graph and user theme |
CN113420113A (en) * | 2021-06-21 | 2021-09-21 | 平安科技(深圳)有限公司 | Semantic recall model training and recall question and answer method, device, equipment and medium |
WO2021196520A1 (en) * | 2020-03-30 | 2021-10-07 | 西安交通大学 | Tax field-oriented knowledge map construction method and system |
CN113590965A (en) * | 2021-08-05 | 2021-11-02 | 合肥综合性国家科学中心人工智能研究院(安徽省人工智能实验室) | Video recommendation method integrating knowledge graph and emotion analysis |
CN113947085A (en) * | 2021-10-22 | 2022-01-18 | 南京邮电大学 | Named entity identification method for intelligent question-answering system |
CN113961687A (en) * | 2021-10-25 | 2022-01-21 | 山东新一代信息产业技术研究院有限公司 | Multi-turn question-answering system intention classification and named entity identification research method |
CN114077673A (en) * | 2021-06-21 | 2022-02-22 | 南京邮电大学 | Knowledge graph construction method based on BTBC model |
CN114943230A (en) * | 2022-04-17 | 2022-08-26 | 西北工业大学 | Chinese specific field entity linking method fusing common knowledge |
CN114969275A (en) * | 2021-02-19 | 2022-08-30 | 深圳市奥拓电子股份有限公司 | Conversation method and system based on bank knowledge graph |
CN115186102A (en) * | 2022-07-08 | 2022-10-14 | 大连民族大学 | Dynamic knowledge graph complementing method based on double-flow embedding and deep neural network |
CN115309879A (en) * | 2022-08-05 | 2022-11-08 | 中国石油大学(华东) | Multi-task semantic parsing model based on BART |
US20230061906A1 (en) * | 2021-08-09 | 2023-03-02 | Samsung Electronics Co., Ltd. | Dynamic question generation for information-gathering |
Non-Patent Citations (4)
Title |
---|
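丁锋, 孙晓: "Negative emotion opinion target extraction based on attention mechanism and BiLSTM-CRF", Computer Science, 2022, pages 223-230 * |
吴俊, 程垚, 郝瀚, 艾力亚尔·艾则孜, 刘菲雪, 苏亦坡: "Research on Chinese terminology extraction based on a BERT-embedded BiLSTM-CRF model", Journal of the China Society for Scientific and Technical Information, 2020, pages 409-418 * |
时雨涛, 孙晓: "A question generation method for a conversation understanding model", Computer Science, 2022, pages 232-238 * |
王堃, 林民, 李艳玲: "A survey of joint intent and semantic slot recognition in end-to-end dialogue systems", Computer Engineering and Applications, no. 14 * |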
Also Published As
Publication number | Publication date |
---|---|
CN116010583B (en) | 2023-07-18 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||
GR01 | Patent grant |