CN114443827A - Local information perception dialogue method and system based on pre-training language model - Google Patents
Local information perception dialogue method and system based on pre-training language model
- Publication number
- CN114443827A · CN202210109478.2A
- Authority
- CN
- China
- Prior art keywords
- reply
- training
- context
- language model
- dialogue
- Prior art date
- Legal status
- Pending
Classifications
- G06F16/3329: Information retrieval of unstructured textual data; querying; query formulation; natural language query formulation or dialogue systems
- G06F16/3344: Information retrieval of unstructured textual data; querying; query processing; query execution using natural language analysis
- G06N3/045: Computing arrangements based on biological models; neural networks; architecture; combinations of networks
Abstract
The invention relates to a local information perception dialogue method and system based on a pre-training language model, wherein the method comprises the following steps: step A: collecting multi-turn dialogue texts of a specific scene, labeling the category to which each multi-turn dialogue reply belongs, and constructing a training set D with positive and negative category labels; step B: using the training set D to train a local information perception deep learning network model PLIP based on the pre-training language model, for selecting the reply corresponding to a given multi-turn dialogue context; step C: inputting the multi-turn dialogue context and the reply set into the trained local information perception deep learning network model PLIP to obtain the most appropriate reply corresponding to the multi-turn dialogue context. The method and the system can effectively improve the accuracy of multi-turn dialogue reply selection.
Description
Technical Field
The invention belongs to the field of natural language processing, and particularly relates to a local information perception dialogue method and system based on a pre-training language model.
Background
In recent years, with the development of machine learning and deep learning networks, great progress has been made in intelligent human-computer dialogue, and dialogue systems have gradually come into public view. Dialogue systems have important research value for both industry and academia and can be widely applied in many fields. Current dialogue system algorithms mainly fall into two categories: generative dialogue and retrieval-based dialogue. A generative dialogue model can generate an answer word by word for a question at inference time without relying on any corpus; the generated answers have the advantage of diversity, but they are often weak in logic and sometimes fall into the trap of safe replies. A retrieval-based dialogue model lets the algorithm find the most appropriate answer in a corpus for a specific question, extracting the information relevant to the correct reply from the question and inferring the appropriate answer from that information. Retrieval-based dialogue models are widely used in multi-turn dialogue systems such as Microsoft XiaoIce and, compared with generative dialogue models, are more reliable and more practical.
Lowe et al. constructed two baseline models for the reply selection task in retrieval-based multi-turn dialogue, based respectively on the recurrent neural network (RNN) and the long short-term memory network (LSTM). When encoding text, these two baseline models memorize the text features of the previous moment by means of the RNN hidden units, which introduces temporal information into the model and overcomes the shortcoming of the bag-of-words models used in earlier algorithms. However, in multi-turn dialogue the conversation history can be lengthy and not all of its content is related to the reply; the two baseline models encode the whole dialogue directly, so they cannot extract important information from the dialogue in a targeted way and bring unnecessary noise into the model. To extract important information from long text, researchers proposed extracting it by matching the context with the reply, decomposing the reply selection task into three steps: first, extract features from each utterance and from the reply with an RNN-based algorithm; second, match the extracted utterance features with the reply features; third, extract the information needed to compute the score from the matching matrix with a method such as a CNN. However, the semantic information an RNN can extract is limited. RNN encoding assumes that the data are sequentially correlated, whereas topics in dialogue data are dynamic and two distant passages may also be highly related, which an RNN finds hard to learn accurately; moreover, when the encoded passage is long, RNN encoding may suffer from vanishing gradients and cannot capture long-range dependencies well. These limitations of the RNN mean that the above method may already lose important information in its first step. The Transformer architecture proposed by Vaswani et al. in 2017 can fully capture global dependency information through extensive self-attention and interactive attention operations without being limited by sequence distance. Researchers adapted the Transformer encoder for the encoding module of the model, strengthening the model's ability to extract information; influenced by the multi-head attention mechanism of the Transformer, this line of work also used multi-head attention in the matching stage to construct semantic information of various granularities, enriching the feature representation of the model and achieving a clear improvement. However, the above models still have the following problems. First, global sequence information is insufficiently considered. These models mainly use methods such as RNNs to encode all utterance representations after matching is finished, and important information may already be lost in the encoding and matching stages. Second, the word vector representations used do not take the context into account. These models mainly use static word vectors such as Word2vec, which have difficulty handling polysemy and cannot express semantic information accurately according to different contexts, thereby introducing noise in the encoding stage.
In 2018, Google proposed the well-known BERT (Bidirectional Encoder Representations from Transformers) model. When encoding dialogue data, BERT applies a deep self-attention mechanism at the word granularity over the whole sequence and, combined with its position embeddings, can effectively address the two problems of insufficient consideration of global sequence information and context-independent word vector representations. Research on the reply selection task in multi-turn dialogue has therefore largely turned to methods based on pre-trained language models. The basic procedure of such methods is to first encode the whole dialogue with a pre-trained language model composed of multi-layer Transformer encoders, and then feed the output representation at the [CLS] position, which can represent the global information, into a classification layer for prediction. On this basis, researchers strengthen the adaptability of the pre-trained language model to the dialogue task through a post-training strategy that lets the model learn from domain data before fine-tuning. In addition, some works try data augmentation methods such as generating additional dialogue text or generating speaker embeddings, and some works extract the output of the pre-trained language model for further fine-grained matching to filter out noise irrelevant to the reply; these works add new data or modules on top of the pre-trained language model and achieve good results. However, such methods simply feed the concatenation of context and reply into the pre-trained language model, so some latent features in the dialogue data cannot be learned; the additional modules greatly increase the number of model parameters while the improvement is limited, and the potential of the pre-trained language model is not mined efficiently. Recently, some works have introduced multi-task learning into the multi-turn dialogue reply selection task, designing multiple subtasks according to characteristics of dialogue data such as continuity and consistency. The subtasks share the parameters of the pre-trained language model with the main task, and additional loss functions are designed to optimize the pre-trained language model in a targeted way. Compared with previous frameworks, methods combining pre-trained language models with multi-task learning show remarkable results on the multi-turn dialogue reply selection task: they further exploit the potential of the pre-trained language model, reduce the number of model parameters, learn latent features in the dialogue data more efficiently, and greatly improve the pre-trained language model's understanding of dialogue data.
In summary, although models for multi-turn dialogue reply selection based on pre-trained language models have progressed, the following problems remain. First, the designed subtasks are not efficient enough. Designing multiple subtasks for the pre-trained language model means the model must be fine-tuned many times during training, at the cost of considerable time and computing resources, and the more subtasks are used, the higher the training cost. Meanwhile, the optimization objectives of some subtasks differ greatly from that of the main task; for example, a subtask that deletes an utterance and predicts its position may introduce noise into the main task. Second, the semantic understanding capability of the pre-trained language model is not fully exploited. These methods use only the [CLS] token, which represents the global information, for prediction in the main task and ignore the large amount of information output at positions other than [CLS]. Meanwhile, although the model can understand dialogue semantics from different angles and learn valuable dialogue information in several auxiliary tasks, the main task only performs a simple classification with the output at the [CLS] position and does not extract the information learned in the auxiliary tasks in a targeted way, so the model cannot make full use of the important semantic information learned in the auxiliary tasks.
Disclosure of Invention
The invention aims to provide a local information perception dialogue method and system based on a pre-training language model, which are beneficial to improving the accuracy of multi-turn dialogue reply selection.
In order to achieve the purpose, the invention adopts the technical scheme that: a local information perception dialogue method based on a pre-training language model comprises the following steps:
step A: collecting multi-turn dialog texts of a specific scene, labeling the category to which each multi-turn dialog reply belongs, and constructing a training set D with positive and negative category labels;
step B: training a local information perception deep learning network model PLIP based on the pre-training language model by using the training set D, for selecting the reply corresponding to a given multi-turn dialogue context;
step C: inputting the multi-turn dialogue context and the reply set into the trained local information perception deep learning network model PLIP to obtain the most appropriate reply corresponding to the multi-turn dialogue context.
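To make step C concrete, the following is a minimal, hypothetical sketch of the reply selection procedure (it is not part of the patent disclosure); `score_reply` is an assumed stand-in for the trained PLIP scoring model described in step B.

```python
# Hypothetical sketch of step C: rank a candidate reply set for one multi-turn
# context with a trained scoring function and return the best reply.
# `score_reply(context, reply)` is assumed to wrap the trained PLIP network and
# return the rationality score g(c, r) in [0, 1].

from typing import Callable, List

def select_reply(context: List[str],
                 candidates: List[str],
                 score_reply: Callable[[List[str], str], float]) -> str:
    """Return the candidate reply with the highest context-reply score."""
    scores = [score_reply(context, r) for r in candidates]
    best = max(range(len(candidates)), key=lambda i: scores[i])
    return candidates[best]

if __name__ == "__main__":
    ctx = ["how do I reset my password ?", "which system are you using ?", "the web portal"]
    cands = ["click the forgot-password link on the portal login page",
             "I like pizza",
             "the weather is nice today"]
    # dummy scorer (lexical overlap), standing in for the trained PLIP score g(c, r)
    dummy = lambda c, r: float(len(set(" ".join(c).split()) & set(r.split())))
    print(select_reply(ctx, cands, dummy))
```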
Further, the step B specifically includes the following steps:
step B1: inputting each sample of the training set D into the deep learning network model in the form of a triplet (c, r, y), wherein c = {u_1, u_2, ..., u_m} denotes a dialogue context containing m utterances, the t-th utterance u_t consisting of l_t words; r is a candidate reply consisting of l_r words; and y ∈ {0, 1} is the sample label, y = 1 indicating that the candidate reply is a reasonable reply for the current context and y = 0 indicating that it is unreasonable;
the deep learning network model PLIP encodes and computes the triplet and outputs an evaluation score reflecting the degree of correlation between the context and the reply; the deep learning network model uses the multi-layer attention mechanism of the pre-training language model to learn contextualized semantic representations and adopts a multi-task learning strategy: while optimizing the main task, i.e. the multi-turn dialogue reply selection task, it strengthens the pre-training language model's learning of the local context of the multi-turn dialogue in an auxiliary task, prompting the characterization vector to understand the global information, learning the degree of correlation between context and reply, and fully exploiting the semantic understanding capability of the pre-training language model;
step B2: in the auxiliary task part, the deep learning network model PLIP uses a random sliding window reply prediction task to further strengthen the pre-training language model's understanding of the local context of the multi-turn dialogue;
the random sliding window reply prediction task samples dialogue context data at different starting positions in the multi-turn dialogue context to obtain dialogue segments, encodes the dialogue segments with the pre-training language model, and predicts the reply of the window, so that the pre-training language model fully learns the semantic information of local contexts;
step B3: in the multi-turn dialogue reply selection task, the deep learning network model PLIP adopts a local information perception module to prompt the pre-training language model to generate local semantic information, fuses the global information with the local semantic information, computes the rationality score between the multi-turn dialogue context and the reply, and evaluates whether the current reply corresponds to the given multi-turn dialogue context; finally, according to the target loss function, the gradient of each parameter in the deep learning network model is computed by back propagation and the parameters are updated by stochastic gradient descent;
step B4: terminating the training of the deep learning network model when the iterative change of the loss value produced by the deep learning network model PLIP is smaller than a set threshold or the maximum number of iterations is reached.
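A minimal sketch of the stopping rule in step B4, under the assumption that `train_step` performs one optimization pass over the combined loss and returns the current loss value; the threshold and iteration cap below are illustrative only.

```python
# Sketch of the step-B4 stopping rule: stop when the change in loss falls below
# a set threshold or when the maximum number of iterations is reached.
# `train_step` is a stand-in for one forward/backward/SGD pass.

def train_until_converged(train_step, threshold: float = 1e-4,
                          max_iterations: int = 100_000) -> float:
    previous_loss = float("inf")
    for _ in range(max_iterations):
        loss = train_step()                        # forward, backward, parameter update
        if abs(previous_loss - loss) < threshold:  # loss change smaller than threshold
            break
        previous_loss = loss
    return loss

# Example usage with a dummy, decaying loss sequence:
losses = iter([1.0, 0.5, 0.49, 0.48999])
print(train_until_converged(lambda: next(losses)))
```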
Further, the step B1 specifically includes the following steps:
step B11: splicing the words and the replies in the conversation context to obtain an input x of the deep learning network model;
x = {[CLS], u_1, [EOT], u_2, [EOT], …, [EOT], u_m, [SEP], r, [SEP]}
wherein, x is a long text obtained by splicing, [ SEP ] is a separator, [ CLS ] is a mark used for learning global features by a deep learning network model, and [ EOT ] is a special mark used for learning local information by the deep learning network model;
step B12: mapping x into a form of a number sequence through a dictionary of a pre-training language model, wherein each number is an id of a word in a word list, inputting the id sequence into an embedding layer in the pre-training language model, and mapping the id sequence into word embedding representation, position embedding representation and paragraph embedding representation according to three initialized embedding matrixes;
X = Embedding_word(x) + Embedding_pos(x_pos) + Embedding_type(x_type)
wherein Embedding_word denotes the word-embedding mapping, which maps the input sequence to word vectors according to the vocabulary; Embedding_pos denotes the position-embedding mapping, which maps the position of each word to the corresponding position embedding matrix; Embedding_type denotes the paragraph-embedding mapping, which maps the context and the reply to different vector spaces; the three word vectors thus obtained are added to give the fused word vector X, where l is the number of words in x and [CLS], [SEP] and [EOT] are each treated as one word;
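For illustration, a hedged sketch of steps B11-B12 using the HuggingFace transformers library; the bert-base-uncased checkpoint, the specific tokenizer calls and the placement of [EOT] after every utterance are assumptions made for this sketch, since the text does not name a particular implementation.

```python
# Sketch of steps B11-B12 (assumptions: HuggingFace `transformers` and the
# `bert-base-uncased` checkpoint). [EOT] is registered as an extra special token
# so that it receives its own vocabulary id and embedding row.

from transformers import BertModel, BertTokenizer

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
tokenizer.add_special_tokens({"additional_special_tokens": ["[EOT]"]})

def build_input(context, reply):
    """Splice utterances and reply into x = [CLS] u1 [EOT] u2 [EOT] ... [SEP] r [SEP]."""
    spliced = " [EOT] ".join(context) + " [EOT]"   # [EOT] after each utterance (assumed)
    return tokenizer(spliced, reply,               # pair input -> paragraph (token_type) ids
                     return_tensors="pt",
                     truncation=True,
                     max_length=512)

encoder = BertModel.from_pretrained("bert-base-uncased")
encoder.resize_token_embeddings(len(tokenizer))    # account for the new [EOT] row

batch = build_input(["hello", "hi , how can I help ?"], "I need a refund")
hidden = encoder(**batch).last_hidden_state        # word + position + paragraph embeddings,
                                                   # encoded by the multi-layer Transformer
print(hidden.shape)                                # (1, sequence_length, 768)
```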
step B13: adding the word embedded representation, the sentence representation and the position representation of each word to obtain a fused embedded representation, and coding by using a multilayer Transformer network to obtain high-level semantic feature representation of a sequence;
the multi-layer Transformer network is formed by stacking a plurality of Transformer coding blocks; each Transformer coding block comprises a multi-head self-attention mechanism and a forward feedback layer, and a residual error connection and normalization layer is arranged behind each sublayer; x is firstly mapped into three vectors, namely a query vector Q, a key vector K and a value vector V, and the calculation formula is as follows:
Q = XW_Q + b_Q
K = XW_K + b_K
V = XW_V + b_V
wherein W_Q, W_K, W_V, b_Q, b_K and b_V denote trainable parameters;
step B14: feeding the Q, K and V vectors into the multi-head self-attention mechanism, splitting each of them along the word-vector dimension d into h sub-vectors of dimension d/h, feeding the sub-vectors into the self-attention mechanism separately for training, and finally concatenating the h sub-vectors into a d-dimensional output vector C; in order to prevent overfitting, make the vectors more holistic and accelerate network convergence, residual connection and normalization are added to the multi-head self-attention sublayer to obtain the vector T, with the following calculation formulas:
C = Concat(head_1, head_2, ..., head_h)W_C + b_C
T = LayerNorm(X + C)
wherein head_i denotes the self-attention output of the i-th sub-vector, W_C and b_C denote trainable parameters, Concat denotes the concatenation operation, and LayerNorm is the layer normalization transformation;
step B15: feeding the vector T into a fully connected feed-forward sublayer, which performs two linear transformations on T to obtain the comprehensive features FFN of the sequence; T and FFN are then connected by a residual connection and layer normalization is applied to obtain the final high-level features H of the sequence, with the following calculation formulas:
FFN = (W_F T + b_F)W_N + b_N
H = LayerNorm(T + FFN)
wherein W_F, W_N, b_F and b_N denote trainable parameters.
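The following is a minimal PyTorch sketch of one Transformer encoding block as described in steps B13-B15; it is an illustrative re-implementation, not the patent's code. Note that the standard Transformer inserts a non-linearity between the two linear maps of the feed-forward sublayer, which the formula above leaves implicit; a GELU is assumed here.

```python
# Minimal sketch of one encoder block: Q/K/V projections, multi-head
# self-attention, residual + LayerNorm, feed-forward, residual + LayerNorm.

import math
import torch
import torch.nn as nn

class EncoderBlock(nn.Module):
    def __init__(self, d: int = 768, h: int = 12, d_ff: int = 3072):
        super().__init__()
        assert d % h == 0
        self.h, self.d_head = h, d // h
        self.W_Q = nn.Linear(d, d)          # Q = X W_Q + b_Q
        self.W_K = nn.Linear(d, d)          # K = X W_K + b_K
        self.W_V = nn.Linear(d, d)          # V = X W_V + b_V
        self.W_C = nn.Linear(d, d)          # projection after Concat(head_1..head_h)
        self.norm1 = nn.LayerNorm(d)        # T = LayerNorm(X + C)
        self.ff = nn.Sequential(nn.Linear(d, d_ff), nn.GELU(), nn.Linear(d_ff, d))
        self.norm2 = nn.LayerNorm(d)        # H = LayerNorm(T + FFN)

    def forward(self, X: torch.Tensor) -> torch.Tensor:          # X: (batch, len, d)
        b, n, d = X.shape
        split = lambda t: t.view(b, n, self.h, self.d_head).transpose(1, 2)
        Q, K, V = split(self.W_Q(X)), split(self.W_K(X)), split(self.W_V(X))
        att = torch.softmax(Q @ K.transpose(-2, -1) / math.sqrt(self.d_head), dim=-1)
        heads = (att @ V).transpose(1, 2).reshape(b, n, d)        # concatenate the h heads
        T = self.norm1(X + self.W_C(heads))                       # residual + layer norm
        return self.norm2(T + self.ff(T))                         # residual + layer norm

X = torch.randn(2, 16, 768)
print(EncoderBlock()(X).shape)   # torch.Size([2, 16, 768])
```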
Further, the step B2 specifically includes the following steps:
step B21: in the auxiliary-task random sliding window reply prediction, the model sets the length and position of the sliding window at random, samples from the dialogue context a large amount of local dialogue context data falling within the sliding window, and inserts a special tag [EOT] after each utterance of the local dialogue context data, as shown in the following formula:
wherein x′ is the input of the subtask; unlike the main task, x′ retains only the information inside the window and the other information is replaced by [PAD]; i is the starting position of the sliding window, w denotes the size of the current window, m denotes the number of utterances of the current context, and κ is a hyper-parameter denoting the minimum window size;
step B22: the dialogue window data is encoded using the pre-training language model BERT, the formula is as follows:
E=BERT(x′)
step B23: the vector E obtained in step B22 contains all the semantic representations of the dialogue segment encoded by the pre-training language model BERT, and the semantic representation that best represents the current dialogue segment is further selected from E to optimize the auxiliary task; in order not to disturb the [CLS] representation in the pre-training language model, which represents the global information, the model selects only the [EOT] representation E_[EOT] closest to the window reply in the output of the pre-training language model as the final characterization vector of the random sliding window reply prediction task; the auxiliary task evaluates the reasonableness of the window data, and the [EOT] tags in BERT learn information from different segments and different moments of the conversation, enriching the ability of the [EOT] representations to understand local information;
step B24: after obtaining the final characterization vector E_[EOT], it is fed into the classification layer to compute the score, with the following calculation formula:
g(w_c, w_r) = σ(W_w^T E_[EOT] + b_w)
wherein w_c and w_r denote the context and the reply within the sliding window, W_w is a trainable parameter of the prediction layer, and σ(·) denotes the sigmoid activation function;
step B25: the random sliding window reply prediction task is optimized with respect to its objective function by gradient descent; the objective function uses a cross-entropy loss to evaluate the difference between the current label and the true dialogue window label, with the following formula:
where D' represents a window data set.
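A hedged sketch of the window sampling in step B21 follows; the exact layout of x′ is not reproduced in the text above, so the placement of [PAD] and [EOT] below is an assumption.

```python
# Hedged sketch of step B21: sample a random window of at least `kappa`
# consecutive utterances, keep only the utterances inside the window (each
# followed by [EOT]), and replace everything outside the window with [PAD].

import random
from typing import List, Tuple

def sample_window(context: List[str], kappa: int = 2,
                  rng: random.Random = random.Random(0)) -> Tuple[str, int, int]:
    m = len(context)
    w = rng.randint(min(kappa, m), m)        # random window size, at least kappa
    i = rng.randint(0, m - w)                # random starting position
    pieces = []
    for t, utt in enumerate(context):
        if i <= t < i + w:
            pieces.append(utt + " [EOT]")    # utterance kept, [EOT] appended
        else:
            pieces.append("[PAD]")           # content outside the window masked out
    return " ".join(pieces), i, w

ctx = ["hi", "hello , what can I do for you ?", "my order is late", "which order id ?"]
x_window, start, width = sample_window(ctx)
print(x_window)
# The window reply (a true continuation, or a sampled negative) is then appended
# after [SEP] and scored with g(w_c, w_r) as in steps B22-B24.
```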
Further, the step B3 specifically includes the following steps:
step B31: the local information perception module embeds a special tag [EOT] behind each sentence in the dialogue context, as shown in the following formula:
x = {[CLS], u_1, [EOT], u_2, [EOT], …, [EOT], u_m, [SEP], r, [SEP]}
under the combined action of the pre-training language model's deep attention mechanism and position embeddings, the special tag [EOT] at each position can learn the interaction information with the surrounding text at that specific position; meanwhile, during the optimization of the random sliding window reply prediction task, the last [EOT] tag in the window is used to establish the classification task and gradually learns the ability to identify the window reply; thus the representation of the [EOT] tag gradually learns the correct representation of its sentence and pays more attention to the text of the local region;
step B32: in the feature fusion stage, the local information perception module selects from the output of the pre-training language model the n local semantic representations closest to the reply as local information of multiple granularities, and aggregates this local information into a whole by concatenation, with the following formula:
wherein, l represents the entry closest to the reply, and n is a hyper-parameter used for representing the number of [ EOT ] representations to be taken out;
step B33: the local information perception module integrally fuses local information and global information to obtain a final characterization vector of a main task, and the aggregation process is as follows:
step B34: inputting the aggregated characterization vectors into a classification layer to calculate the rationality score between the current multi-turn conversation context and the reply, wherein the formula is as follows:
g(c, r) = σ(W^T E_ensemble + b)
where W is a trainable parameter, σ(·) denotes the sigmoid activation function, and b is the bias term of the current classification layer;
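A hedged PyTorch sketch of steps B32-B34 follows; the aggregation formulas are not reproduced in the text above, so concatenating the [CLS] state with the n [EOT] states closest to the reply before the classification layer is an assumed reading.

```python
# Hedged sketch of steps B32-B34. H is the encoder output; eot_positions are the
# indices of the [EOT] tokens in the input sequence, ordered from the first to
# the last utterance, so the last n entries are the ones closest to the reply.

import torch
import torch.nn as nn

class LocalInfoHead(nn.Module):
    def __init__(self, d: int = 768, n: int = 3):
        super().__init__()
        self.n = n
        self.classifier = nn.Linear((n + 1) * d, 1)   # fuse global [CLS] + n local [EOT]

    def forward(self, H: torch.Tensor, eot_positions: torch.Tensor) -> torch.Tensor:
        # H: (batch, seq_len, d); eot_positions: (batch, num_eot)
        cls_vec = H[:, 0, :]                                      # global representation
        nearest = eot_positions[:, -self.n:]                      # n [EOT]s nearest the reply
        idx = nearest.unsqueeze(-1).expand(-1, -1, H.size(-1))    # (batch, n, d)
        local = torch.gather(H, 1, idx).flatten(1)                # concatenated local info
        fused = torch.cat([cls_vec, local], dim=-1)               # assumed E_ensemble
        return torch.sigmoid(self.classifier(fused)).squeeze(-1)  # rationality score g(c, r)

H = torch.randn(2, 64, 768)
eots = torch.tensor([[5, 11, 20, 29], [4, 9, 15, 33]])
print(LocalInfoHead()(H, eots))    # two scores in (0, 1)
```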
step B35: the PLIP model updates parameters in the learning model in a gradient descending mode, and meanwhile, cross entropy is adopted as a loss function for a multi-round dialogue reply selection task, and the specific formula is as follows:
and combining the optimization target of the auxiliary task, wherein the final loss function of the model is as follows:
Loss = Loss_main + αLoss_window
wherein α is a hyper-parameter used to control the influence of the auxiliary random sliding window reply prediction task on the model.
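The explicit forms of Loss_main and Loss_window are not reproduced in the text above; assuming standard binary cross-entropy over the training set D and the window data set D′, a plausible reconstruction is:

```latex
% Assumed explicit forms (binary cross-entropy); reconstructions, not the
% patent's own formula images.
\begin{aligned}
\mathrm{Loss}_{\mathrm{window}} &= -\sum_{(w_c,\,w_r,\,y')\in D'} \Big[\, y'\log g(w_c,w_r) + (1-y')\log\big(1-g(w_c,w_r)\big) \Big] \\
\mathrm{Loss}_{\mathrm{main}}   &= -\sum_{(c,\,r,\,y)\in D} \Big[\, y\log g(c,r) + (1-y)\log\big(1-g(c,r)\big) \Big] \\
\mathrm{Loss} &= \mathrm{Loss}_{\mathrm{main}} + \alpha\,\mathrm{Loss}_{\mathrm{window}}
\end{aligned}
```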
The invention also provides a local information perception dialogue system adopting the above method, which comprises:
the data collection module is used for collecting multi-round conversation samples in a specific field, labeling answer positive and negative labels corresponding to each question in the multi-round conversation data, and constructing a multi-round conversation reply selection training set D with the positive and negative labels;
the pre-training language model coding module, which is mainly composed of an embedding layer and a multi-layer multi-head attention mechanism, and is used for sending each triplet-form training sample of the training set D into the pre-training language model BERT and learning contextualized semantic representations by means of the multi-layer attention mechanism of the pre-training language model, while fully exploiting the semantic understanding capability of the pre-training language model through multi-task learning;
the auxiliary task module is used for exporting parameters of the pre-training language model BERT, and replying a prediction task by using a random sliding window to further enhance the comprehension capability of the pre-training language model on local dialogue information; the random sliding window replying prediction task samples window data with different positions and sizes in a multi-turn conversation context, a derived pre-training language model is used for coding a conversation window, and the pre-training language model is made to fully learn local language characteristics of different conversation stages and conversation lengths by utilizing the reply of a newly added special label [ EOT ] prediction window;
the local information perception module is used for promoting a pre-training language model BERT to generate multi-granularity local semantic information by adopting the local information perception module in a multi-round dialogue reply selection task, meanwhile, global information and the local semantic information are fused to perform classification score calculation, and whether the current reply corresponds to a given multi-round dialogue context is evaluated; finally, calculating the gradient of each parameter in the deep learning network model by using a back propagation method according to the target loss function, and updating the parameters by using a random gradient descent method; and
and the network training module, which is used for terminating the training of the deep learning network model when the iterative change of the loss value produced by the deep learning network model is smaller than a set threshold and no longer decreases, or when the maximum number of iterations is reached.
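Purely as an illustration of how the five modules above could be composed, the following is a hypothetical skeleton; all class and method names are invented for this sketch and do not appear in the patent.

```python
# Hypothetical composition of the five modules; the encoder is shared between
# the auxiliary task and the local-information-aware main task.

class PLIPSystem:
    def __init__(self, data_module, encoder_module, aux_task_module,
                 local_info_module, trainer_module):
        self.data = data_module              # builds training set D with pos/neg labels
        self.encoder = encoder_module        # pre-training language model (BERT) encoder
        self.aux_task = aux_task_module      # random sliding-window reply prediction
        self.local_info = local_info_module  # [CLS]/[EOT] fusion + classification layer
        self.trainer = trainer_module        # loss-threshold / max-iteration stopping

    def train(self):
        dataset = self.data.build()
        self.trainer.fit(dataset, self.encoder, self.aux_task, self.local_info)

    def respond(self, context, candidate_replies):
        scores = [self.local_info.score(self.encoder, context, r)
                  for r in candidate_replies]
        return candidate_replies[max(range(len(scores)), key=scores.__getitem__)]
```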
Compared with the prior art, the invention has the following beneficial effects: the method and system adopt a multi-task learning strategy; while optimizing the main task, i.e. the multi-turn dialogue reply selection task, and learning the degree of correlation between context and reply, they strengthen the pre-training language model's learning of the local regions of the multi-turn dialogue in an auxiliary task and fully exploit its semantic understanding capability, thereby fusing global information with local semantic information and obtaining the most appropriate reply corresponding to the multi-turn dialogue context. Therefore, the invention can effectively improve the accuracy of multi-turn dialogue reply selection and has strong practicability and broad application prospects.
Drawings
FIG. 1 is a flow chart of a method implementation of an embodiment of the present invention;
FIG. 2 is a diagram of a local information-aware deep learning network model architecture based on a pre-trained language model according to an embodiment of the present invention;
FIG. 3 is a diagram illustrating random sliding window reply prediction in accordance with an embodiment of the present invention;
FIG. 4 is a structural diagram of the random sliding window reply prediction according to an embodiment of the present invention;
FIG. 5 is a block diagram of a local information awareness module according to an embodiment of the present invention.
Detailed Description
The invention is further explained below with reference to the drawings and the embodiments.
It should be noted that the following detailed description is exemplary and is intended to provide further explanation of the disclosure. Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this application belongs.
It is noted that the terminology used herein is for the purpose of describing particular embodiments only and is not intended to be limiting of example embodiments according to the present application. As used herein, the singular forms "a", "an" and "the" are intended to include the plural forms as well, and it should be understood that when the terms "comprises" and/or "comprising" are used in this specification, they specify the presence of stated features, steps, operations, devices, components, and/or combinations thereof, unless the context clearly indicates otherwise.
As shown in fig. 1, the present embodiment provides a local information-aware dialogue method based on a pre-trained language model, which includes the following steps:
step A: and collecting multi-turn dialog texts of a specific scene, labeling the category to which each multi-turn dialog reply belongs, and constructing a training set D with positive and negative category labels.
Step B: training a local information perception deep learning network model PLIP based on the pre-training language model by using the training set D, for selecting the reply corresponding to the given multi-turn dialogue context.
The step B specifically comprises the following steps:
Step B1: inputting each sample of the training set D into the deep learning network model in the form of a triplet (c, r, y), wherein c = {u_1, u_2, ..., u_m} denotes a dialogue context containing m utterances, the t-th utterance u_t consisting of l_t words; r is a candidate reply consisting of l_r words; and y ∈ {0, 1} is the sample label, where y = 1 indicates that the candidate reply is a reasonable reply for the current context and y = 0 indicates that it is unreasonable.
The deep learning network model PLIP encodes and computes the triplet and outputs an evaluation score reflecting the degree of correlation between the context and the reply. The deep learning network model uses the multi-layer attention mechanism of the pre-training language model to learn contextualized semantic representations and adopts a multi-task learning strategy: while optimizing the main task, i.e. the multi-turn dialogue reply selection task, it strengthens the pre-training language model's learning of the local context of the multi-turn dialogue in an auxiliary task, prompting the characterization vector to understand the global information, learning the degree of correlation between context and reply, and fully exploiting the semantic understanding capability of the pre-training language model. The architecture of the deep learning network model is shown in fig. 2.
The step B1 specifically includes the following steps:
step B11: splicing the words and the replies in the conversation context to obtain an input x of the deep learning network model;
x = {[CLS], u_1, [EOT], u_2, [EOT], …, [EOT], u_m, [SEP], r, [SEP]}
wherein, x is a long text obtained by splicing, [ SEP ] is a separator, [ CLS ] is a mark used by the deep learning network model for learning global features, and [ EOT ] is a special mark used by the deep learning network model for learning local information.
Step B12: mapping x into a form of a number sequence through a dictionary of a pre-training language model, wherein each number is an id of a word in a word list, inputting the id sequence into an embedding layer in the pre-training language model, and mapping the id sequence into word embedding representation, position embedding representation and paragraph embedding representation according to three initialized embedding matrixes;
X = Embedding_word(x) + Embedding_pos(x_pos) + Embedding_type(x_type)
wherein Embedding_word denotes the word-embedding mapping, which maps the input sequence to word vectors according to the vocabulary; Embedding_pos denotes the position-embedding mapping, which maps the position of each word to the corresponding position embedding matrix; Embedding_type denotes the paragraph-embedding mapping, which maps the context and the reply to different vector spaces; the three word vectors thus obtained are added to give the fused word vector X, where l is the number of words in x and [CLS], [SEP] and [EOT] are each treated as one word.
Step B13: adding the word embedded representation, the sentence representation and the position representation of each word to obtain a fused embedded representation, and coding by using a multilayer Transformer network to obtain the high-level semantic feature representation of the sequence.
The multi-layer Transformer network is formed by stacking a plurality of Transformer coding blocks; each Transformer coding block comprises a multi-head self-attention mechanism and a forward feedback layer, and a residual error connection and normalization layer is arranged behind each sublayer; x is firstly mapped into three vectors, namely a query vector Q, a key vector K and a value vector V, and the calculation formula is as follows:
Q = XW_Q + b_Q
K = XW_K + b_K
V = XW_V + b_V
wherein W_Q, W_K, W_V, b_Q, b_K and b_V denote trainable parameters.
Step B14: sending Q, K, V vectors into a multi-head self-attention machine system, dividing h sub-vectors on the word vector dimension d of the vectors, wherein the dimension of each sub-vector is d/h, respectively sending the sub-vectors into the self-attention machine system for training, and finally splicing the h self-attention sub-vectors to obtain a d-dimensional output vector C again; in order to prevent overfitting, make the vector more integral and accelerate network convergence, residual connection and normalization are added to the multi-head self-attention mechanism sublayer to obtain a vector T, and the calculation formula is as follows:
C = Concat(head_1, head_2, ..., head_h)W_C + b_C
T = LayerNorm(X + C)
wherein head_i denotes the self-attention output of the i-th sub-vector, W_C and b_C denote trainable parameters, Concat denotes the concatenation operation, and LayerNorm is the layer normalization transformation.
Step B15: feeding the vector T into a fully connected feed-forward sublayer, which performs two linear transformations on T to obtain the comprehensive features FFN of the sequence; T and FFN are then connected by a residual connection and layer normalization is applied to obtain the final high-level features H of the sequence, with the following calculation formulas:
FFN = (W_F T + b_F)W_N + b_N
H = LayerNorm(T + FFN)
wherein W_F, W_N, b_F and b_N denote trainable parameters.
Step B2: in the auxiliary task part, the PLIP deep learning network model uses a random sliding window to reply to a prediction task to further strengthen the comprehensibility of the pre-training language model on the local context of the multi-turn dialogue.
The random sliding window reply prediction task samples dialogue context data at different starting positions in the multi-turn dialogue context to obtain dialogue segments, encodes the dialogue segments with the pre-training language model, and predicts the reply of the window, so that the pre-training language model fully learns the semantic information of local contexts. The process and structure of the random sliding window reply prediction are shown in fig. 3 and fig. 4.
The step B2 specifically includes the following steps:
Step B21: in the auxiliary-task random sliding window reply prediction, the model sets the length and position of the sliding window at random, samples from the dialogue context a large amount of local dialogue context data falling within the sliding window, and inserts a special tag [EOT] after each utterance of the local dialogue context data, as shown in the following formula:
wherein x′ is the input of the subtask; unlike the main task, x′ retains only the information inside the window and the other information is replaced by [PAD]; i is the starting position of the sliding window, w denotes the size of the current window, m denotes the number of utterances of the current context, and κ is a hyper-parameter denoting the minimum window size.
Step B22: the dialog window data is encoded using the pre-training language model BERT, the formula being as follows:
E=BERT(x′)
Step B23: the vector E obtained in step B22 contains all the semantic representations of the dialogue segment encoded by the pre-training language model BERT, and the semantic representation that best represents the current dialogue segment is further selected from E to optimize the auxiliary task; in order not to disturb the [CLS] representation in the pre-training language model, which represents the global information, the model selects only the [EOT] representation E_[EOT] closest to the window reply in the output of the pre-training language model as the final characterization vector of the random sliding window reply prediction task; the auxiliary task evaluates the reasonableness of the window data, and the [EOT] tags in BERT learn information from different segments and different moments of the conversation, enriching the ability of the [EOT] representations to understand local information.
Step B24: after obtaining the final characterization vector E_[EOT], it is fed into the classification layer to compute the score, with the following calculation formula:
g(w_c, w_r) = σ(W_w^T E_[EOT] + b_w)
wherein w_c and w_r denote the context and the reply within the sliding window, W_w is a trainable parameter of the prediction layer, and σ(·) denotes the sigmoid activation function.
Step B25: the random sliding window reply prediction task is optimized with respect to its objective function by gradient descent; the objective function uses a cross-entropy loss to evaluate the difference between the current label and the true dialogue window label, with the following formula:
where D' represents a window data set.
Step B3: in the multi-round dialogue reply selection task, a local information perception module shown in figure 5 is adopted by a deep learning network model PLIP to promote a pre-training language model to generate local semantic information, global information and the local semantic information are fused at the same time, the rationality score between the multi-round dialogue context and the reply is calculated, whether the current reply corresponds to the given multi-round dialogue context is evaluated, finally, the gradient of each parameter in the deep learning network model is calculated by using a back propagation method according to a target loss function, and the parameter is updated by using a random gradient descent method.
The step B3 specifically includes the following steps:
Step B31: the local information perception module embeds a special tag [EOT] behind each sentence in the dialogue context, as shown in the following formula:
x = {[CLS], u_1, [EOT], u_2, [EOT], …, [EOT], u_m, [SEP], r, [SEP]}
Under the combined action of the pre-training language model's deep attention mechanism and position embeddings, the special tag [EOT] at each position can learn the interaction information with the surrounding text at that specific position; meanwhile, during the optimization of the random sliding window reply prediction task, the last [EOT] tag in the window is used to establish the classification task and gradually learns the ability to identify the window reply; thus the representation of the [EOT] tag gradually learns the correct representation of its sentence and pays more attention to the text of the local region.
Step B32: in the feature fusion stage, the local information perception module selects from the output of the pre-training language model the n local semantic representations closest to the reply as local information of multiple granularities, and aggregates this local information into a whole by concatenation, with the following formula:
where l represents the entry closest to the reply, and n is a hyper-parameter representing the number of [EOT] representations to be fetched.
Step B33: the local information perception module integrally fuses local information and global information to obtain a final characterization vector of a main task, and the aggregation process is as follows:
step B34: inputting the aggregated characterization vectors into a classification layer to calculate the rationality score between the current multi-turn conversation context and the reply, wherein the formula is as follows:
g(c, r) = σ(W^T E_ensemble + b)
where W is a trainable parameter, σ(·) denotes the sigmoid activation function, and b is the bias term of the current classification layer.
Step B35: the PLIP model updates parameters in the learning model in a gradient descending mode, and meanwhile, cross entropy is adopted as a loss function for a multi-round dialogue reply selection task, and the specific formula is as follows:
combining the optimization objective of the auxiliary task, the final loss function of the model is:
Loss = Loss_main + αLoss_window
wherein α is a hyper-parameter used to control the influence of the auxiliary random sliding window reply prediction task on the model.
Step B4: and terminating the training of the deep learning network model when the iterative change of the loss value generated by the PLIP of the deep learning network model is smaller than a set threshold value or reaches the maximum iteration times.
Step C: inputting the multi-turn dialogue context and the reply set into the trained local information perception deep learning network model PLIP to obtain the most appropriate reply corresponding to the multi-turn dialogue context.
This embodiment also provides a local information perception dialogue system adopting the above method, which comprises a data collection module, a pre-training language model coding module, an auxiliary task module, a local information perception module and a network training module.
The data collection module is used for collecting multi-round conversation samples in a specific field, labeling answer positive and negative labels corresponding to each question in the multi-round conversation data, and constructing a multi-round conversation reply selection training set D with the positive and negative labels.
The pre-training language model coding module comprises a pre-training language model, and the pre-training language model mainly comprises an embedded layer and a multi-layer multi-head attention mechanism; sending each training sample in the form of a triplet of the training set D into a pre-training language model BERT, and learning to combine context semantic representation by utilizing a multi-layer attention mechanism of the pre-training language model; meanwhile, the model fully excavates the semantic understanding ability of the pre-training language model in a multi-task learning mode.
In an auxiliary task module, parameters of a pre-training language model BERT are exported by the model, and the comprehension capability of the pre-training language model on conversation local information is further enhanced by replying a prediction task by using a random sliding window; the random sliding window replying prediction task samples window data with different positions and sizes in a multi-turn conversation context, encodes a conversation window by using a derived pre-training language model, and makes the pre-training language model fully learn local language characteristics of different conversation stages and conversation lengths by using the replying of a newly-added special label [ EOT ] prediction window.
In the multi-round dialog reply selection task, the model adopts a local information perception module to promote a pre-training language model BERT to generate multi-granularity local semantic information, meanwhile, global information and the local semantic information are fused to perform classification score calculation, and whether the current reply corresponds to a given multi-round dialog context is evaluated; and finally, calculating the gradient of each parameter in the deep learning network model by using a back propagation method according to the target loss function, and updating the parameter by using a random gradient descent method.
The network training module is used for training the network model; when the iterative change of the loss value produced by the deep learning network model is smaller than a set threshold and no longer decreases, or when the maximum number of iterations is reached, the training of the deep learning network model is terminated.
The foregoing is directed to preferred embodiments of the present invention, other and further embodiments of the invention may be devised without departing from the basic scope thereof, and the scope thereof is determined by the claims that follow. However, any simple modification, equivalent change and modification of the above embodiments according to the technical essence of the present invention are within the protection scope of the technical solution of the present invention.
Claims (6)
1. A local information perception dialogue method based on a pre-training language model is characterized by comprising the following steps:
step A: collecting multi-turn dialog texts of a specific scene, labeling the category to which each multi-turn dialog reply belongs, and constructing a training set D with positive and negative category labels;
step B: training a local information perception deep learning network model PLIP based on the pre-training language model by using the training set D, for selecting the reply corresponding to a given multi-turn dialogue context;
step C: inputting the multi-turn dialogue context and the reply set into the trained local information perception deep learning network model PLIP to obtain the most appropriate reply corresponding to the multi-turn dialogue context.
2. The local information perception dialogue method based on the pre-trained language model according to claim 1, wherein the step B specifically comprises the following steps:
step B1: inputting each sample of the training set D into the deep learning network model in the form of a triplet (c, r, y), wherein c = {u_1, u_2, ..., u_m} denotes a dialogue context containing m utterances, the t-th utterance u_t consisting of l_t words; r is a candidate reply consisting of l_r words; and y ∈ {0, 1} is the sample label, y = 1 indicating that the candidate reply is a reasonable reply for the current context and y = 0 indicating that it is unreasonable;
the deep learning network model PLIP encodes and computes the triplet and outputs an evaluation score reflecting the degree of correlation between the context and the reply; the deep learning network model uses the multi-layer attention mechanism of the pre-training language model to learn contextualized semantic representations and adopts a multi-task learning strategy: while optimizing the main task, i.e. the multi-turn dialogue reply selection task, it strengthens the pre-training language model's learning of the local context of the multi-turn dialogue in an auxiliary task, prompting the characterization vector to understand the global information, learning the degree of correlation between context and reply, and fully exploiting the semantic understanding capability of the pre-training language model;
step B2: in the auxiliary task part, the deep learning network model PLIP uses a random sliding window reply prediction task to further strengthen the pre-training language model's understanding of the local context of the multi-turn dialogue;
the random sliding window reply prediction task samples dialogue context data at different starting positions in the multi-turn dialogue context to obtain dialogue segments, encodes the dialogue segments with the pre-training language model, and predicts the reply of the window, so that the pre-training language model fully learns the semantic information of local contexts;
step B3: in the multi-turn dialogue reply selection task, the deep learning network model PLIP adopts a local information perception module to prompt the pre-training language model to generate local semantic information, fuses the global information with the local semantic information, computes the rationality score between the multi-turn dialogue context and the reply, and evaluates whether the current reply corresponds to the given multi-turn dialogue context; finally, according to the target loss function, the gradient of each parameter in the deep learning network model is computed by back propagation and the parameters are updated by stochastic gradient descent;
step B4: terminating the training of the deep learning network model when the iterative change of the loss value produced by the deep learning network model PLIP is smaller than a set threshold or the maximum number of iterations is reached.
3. The local information perception dialogue method based on the pre-trained language model of claim 2, wherein the step B1 specifically comprises the following steps:
step B11: splicing the words and the replies in the conversation context to obtain an input x of the deep learning network model;
x = {[CLS], u_1, [EOT], u_2, [EOT], …, [EOT], u_m, [SEP], r, [SEP]}
wherein, x is a long text obtained by splicing, [ SEP ] is a separator, [ CLS ] is a mark used for learning global features of a deep learning network model, and [ EOT ] is a special mark used for learning local information of the deep learning network model;
step B12: mapping x into a digital sequence through the dictionary of the pre-trained language model, each number being the id of a word in the vocabulary; inputting the id sequence into the embedding layer of the pre-trained language model, where it is mapped into a word embedding representation, a position embedding representation and a paragraph embedding representation according to three initialized embedding matrices;
X = Embedding_word(x) + Embedding_pos(x_pos) + Embedding_type(x_type)
wherein Embedding_word denotes the word embedding mapping, which maps the input sequence into word vectors according to the vocabulary; Embedding_pos denotes the position embedding mapping, which maps each word to the corresponding position embedding according to its position in the sequence; Embedding_type denotes the paragraph embedding mapping, which maps the context and the reply into different vector spaces; the three word vectors are added to obtain the word vector X, where l is the number of words in x and [CLS], [SEP] and [EOT] are each treated as a single word;
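A minimal sketch of the three-way embedding lookup of step B12 (illustrative only; the vocabulary size, maximum length and hidden dimension are assumed values taken from a typical BERT configuration):

```python
import torch
import torch.nn as nn

class InputEmbedding(nn.Module):
    def __init__(self, vocab_size=30522, max_len=512, num_types=2, hidden=768):
        super().__init__()
        self.word = nn.Embedding(vocab_size, hidden)   # Embedding_word
        self.pos = nn.Embedding(max_len, hidden)       # Embedding_pos
        self.type = nn.Embedding(num_types, hidden)    # Embedding_type: context vs. reply

    def forward(self, ids, type_ids):
        # ids, type_ids: (batch, l) tensors of token ids and segment ids
        positions = torch.arange(ids.size(1), device=ids.device).unsqueeze(0)
        # X = Embedding_word(x) + Embedding_pos(x_pos) + Embedding_type(x_type)
        return self.word(ids) + self.pos(positions) + self.type(type_ids)
```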
step B13: adding the word embedding representation, the paragraph embedding representation and the position embedding representation of each word to obtain a fused embedding representation, and encoding it with a multi-layer Transformer network to obtain a high-level semantic feature representation of the sequence;
the multi-layer Transformer network is formed by stacking multiple Transformer encoding blocks; each Transformer encoding block contains a multi-head self-attention mechanism and a feed-forward layer, and each sublayer is followed by a residual connection and a normalization layer; X is first mapped into three vectors, namely a query vector Q, a key vector K and a value vector V, with the following calculation formulas:
Q = X W_Q + b_Q
K = X W_K + b_K
V = X W_V + b_V
wherein W_Q, W_K, W_V, b_Q, b_K, b_V are trainable parameters;
step B14: sending the Q, K, V vectors into the multi-head self-attention mechanism, which splits the word-vector dimension d into h sub-vectors of dimension d/h each, feeds the sub-vectors into the self-attention mechanism separately, and finally concatenates the h self-attention sub-vectors to obtain a d-dimensional output vector C; to prevent overfitting, keep the vector coherent and accelerate network convergence, a residual connection and normalization are added after the multi-head self-attention sublayer to obtain a vector T, with the following calculation formulas:
C = Concat(head_1, head_2, ..., head_h) W_C + b_C
T = LayerNorm(X + C)
wherein head_i denotes the self-attention score of the i-th sub-vector, W_C and b_C are trainable parameters, Concat denotes the concatenation operation, and LayerNorm denotes the layer normalization transformation;
step B15: sending the vector T into a fully-connected feed-forward sublayer, which applies two linear transformations to T to obtain the comprehensive feature FFN of the sequence; T and FFN are then combined through a residual connection and layer normalization to obtain the final high-level feature H of the sequence, with the following calculation formulas:
FFN = (W_F T + b_F) W_N + b_N
H = LayerNorm(T + FFN)
wherein W_F, W_N, b_F, b_N are trainable parameters.
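Steps B13 to B15 describe a standard Transformer encoder block; a minimal PyTorch sketch with the same structure is given below for illustration (the GELU activation in the feed-forward sublayer follows common BERT practice and is an assumption, since the claim writes the sublayer as two linear transformations):

```python
import torch.nn as nn

class EncoderBlock(nn.Module):
    def __init__(self, hidden=768, heads=12, ff_dim=3072, dropout=0.1):
        super().__init__()
        # multi-head self-attention: the d-dimensional vectors are split into h sub-vectors
        self.attn = nn.MultiheadAttention(hidden, heads, dropout=dropout, batch_first=True)
        self.norm1 = nn.LayerNorm(hidden)       # residual connection + LayerNorm -> T
        self.ff = nn.Sequential(                # two linear transformations -> FFN
            nn.Linear(hidden, ff_dim), nn.GELU(), nn.Linear(ff_dim, hidden)
        )
        self.norm2 = nn.LayerNorm(hidden)       # residual connection + LayerNorm -> H

    def forward(self, x):
        c, _ = self.attn(x, x, x)               # C = Concat(head_1 .. head_h) W_C + b_C
        t = self.norm1(x + c)                   # T = LayerNorm(X + C)
        return self.norm2(t + self.ff(t))       # H = LayerNorm(T + FFN)
```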
4. The local information perception dialogue method based on the pre-trained language model of claim 3, wherein the step B2 specifically comprises the following steps:
step B21: in the auxiliary random sliding window reply prediction task, the model sets the length and the starting position of the sliding window at random, samples a large amount of local dialogue context data falling inside the sliding window from the dialogue context, and inserts a special tag [EOT] after each utterance of the local dialogue context data to form the sub-task input x';
wherein x' is the input of the sub-task; unlike the main-task input, x' only retains the information inside the window, all other information being replaced by [PAD]; i is the starting position of the sliding window, w denotes the size of the current window, m denotes the number of utterances of the current context, and k is a hyper-parameter denoting the minimum window size;
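A sketch of the random window sampling of step B21 (illustrative only; the exact arrangement of the [PAD] tokens and the splicing of the window reply are assumptions consistent with the description above):

```python
import random
from typing import List

def build_window_input(context: List[str], window_reply: str, k: int = 2) -> List[str]:
    """Sample a random window of the context and build the sub-task input x'."""
    m = len(context)
    w = random.randint(min(k, m), m)        # window size w, at least the minimum window k
    i = random.randint(0, m - w)            # random starting position i
    tokens = ["[CLS]"]
    for t, utterance in enumerate(context):
        if i <= t < i + w:
            tokens.extend(utterance.split())
            tokens.append("[EOT]")          # [EOT] follows each utterance inside the window
        else:
            tokens.append("[PAD]")          # information outside the window is discarded
    tokens += ["[SEP]"] + window_reply.split() + ["[SEP]"]
    return tokens
```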
step B22: encoding the dialogue window data using the pre-trained language model BERT, with the following formula:
E=BERT(x′)
step B23: the vector E obtained in step B22 contains the semantic representations of the entire dialogue segment encoded by the pre-trained language model BERT; the representation most characteristic of the current dialogue segment is selected from E to optimize the auxiliary task; in order not to disturb the [CLS] representation, which carries the global information in the pre-trained language model, the model only selects the [EOT] representation E_[EOT] closest to the window reply from the output of the pre-trained language model as the final characterization vector of the random sliding window reply prediction task; because the auxiliary task judges the rationality of the window data, the [EOT] tag in BERT learns information from different segments and different moments of the dialogue, enriching the ability of [EOT] to understand local information;
step B24: after obtaining the final characterization vector E_[EOT], inputting it into the classification layer to calculate a score, with the following calculation formula:
g(w_c, w_r) = σ(W_w^T E_[EOT] + b_w)
wherein w_c and w_r denote the context and the reply within the sliding window, W_w and b_w are trainable parameters of the prediction layer, and σ(·) denotes the sigmoid activation function;
step B25: the random sliding window reply prediction task is optimized by gradient descent on its objective function; the objective function uses a cross-entropy loss to measure the difference between the predicted score and the true label of the dialogue window over the window data set D'.
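A sketch of the auxiliary prediction head and cross-entropy objective of steps B24 and B25 (illustrative; the `eot_index` argument assumes the position of the [EOT] closest to the window reply has already been located):

```python
import torch
import torch.nn as nn

class WindowReplyHead(nn.Module):
    """Computes g(w_c, w_r) = sigmoid(W_w^T E_[EOT] + b_w)."""
    def __init__(self, hidden=768):
        super().__init__()
        self.linear = nn.Linear(hidden, 1)     # W_w, b_w

    def forward(self, encoder_output, eot_index):
        # encoder_output: (batch, seq_len, hidden) BERT output over x'
        # eot_index: (batch,) position of the [EOT] closest to the window reply
        e_eot = encoder_output[torch.arange(encoder_output.size(0)), eot_index]
        return torch.sigmoid(self.linear(e_eot)).squeeze(-1)

def window_loss(scores, labels):
    """Binary cross entropy over the window data set D' (Loss_window)."""
    return nn.functional.binary_cross_entropy(scores, labels.float())
```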
5. The local information perception dialogue method according to claim 4, wherein the step B3 specifically comprises the following steps:
step B31: the local information perception module embeds a special tag [EOT] after each sentence in the dialogue context, as shown in the following formula:
x={[CLS],u1,[EOT],u2,[EOT],…,[EOT],um,[SEP],r,[SEP]}
under the combined action of the deep attention mechanism of the pre-trained language model and the position embeddings, the special tag [EOT] at each position learns interaction information with the surrounding text at that position; meanwhile, during the optimization of the random sliding window reply prediction task, the last [EOT] tag in the window is used to establish a classification task and gradually learns the ability to identify the window reply; thus the representation of the [EOT] tag gradually learns a correct representation of its sentence and focuses more on the text of the local region;
step B32: in the feature fusion stage, the local information perception module selects, from the output of the pre-trained language model, the n local semantic representations closest to the reply as multi-granularity local information, and aggregates them into a whole by splicing, where l denotes the [EOT] representation closest to the reply and n is a hyper-parameter denoting the number of [EOT] representations to be taken;
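A sketch of the multi-granularity local-information selection of step B32 (illustrative; it assumes the [EOT] positions in the main-task input are known in advance):

```python
import torch

def gather_local_info(encoder_output, eot_positions, n=3):
    """Concatenate the n [EOT] representations closest to the reply.

    encoder_output: (batch, seq_len, hidden) BERT output over x
    eot_positions:  (batch, num_eot) indices of the [EOT] tokens, left to right;
                    the right-most one (index l) is the [EOT] closest to the reply.
    """
    nearest = eot_positions[:, -n:]                                # positions l-n+1 .. l
    batch_idx = torch.arange(encoder_output.size(0)).unsqueeze(1)  # (batch, 1)
    local = encoder_output[batch_idx, nearest]                     # (batch, n, hidden)
    return local.reshape(local.size(0), -1)                        # spliced local information
```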
step B33: the local information perception module fuses the local information and the global information into a whole to obtain the final characterization vector E_ensemble of the main task;
step B34: inputting the aggregated characterization vector into the classification layer to calculate the rationality score between the current multi-round dialogue context and the reply, with the following formula:
g(c, r) = σ(W^T E_ensemble + b)
where W is a trainable parameter, σ(·) denotes the sigmoid activation function, and b is the bias term of the current classification layer;
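Steps B33 and B34 fuse the local information with the global [CLS] representation and score the (context, reply) pair; a minimal sketch is given below (the concatenation-based fusion is an assumption consistent with the splicing strategy of step B32):

```python
import torch
import torch.nn as nn

class ReplySelectionHead(nn.Module):
    """Computes g(c, r) = sigmoid(W^T E_ensemble + b) for the main task."""
    def __init__(self, hidden=768, n_local=3):
        super().__init__()
        self.linear = nn.Linear(hidden * (1 + n_local), 1)   # W, b over the fused vector

    def forward(self, cls_vector, local_vector):
        # cls_vector:   (batch, hidden)            global information from [CLS]
        # local_vector: (batch, n_local * hidden)  output of gather_local_info
        e_ensemble = torch.cat([cls_vector, local_vector], dim=-1)
        return torch.sigmoid(self.linear(e_ensemble)).squeeze(-1)
```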
step B35: the PLIP model updates the parameters of the model by gradient descent; the multi-round dialogue reply selection task adopts cross entropy as its loss function Loss_main; combining the optimization target of the auxiliary task, the final loss function of the model is:
Loss = Loss_main + α Loss_window
wherein α is a hyper-parameter used to control the influence of the random sliding window reply prediction auxiliary task on the model.
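A sketch of the joint objective of step B35 (illustrative; the default value of `alpha` is an assumption):

```python
import torch.nn as nn

def joint_loss(main_scores, main_labels, window_scores, window_labels, alpha=0.5):
    """Loss = Loss_main + alpha * Loss_window, both binary cross entropy."""
    bce = nn.functional.binary_cross_entropy
    loss_main = bce(main_scores, main_labels.float())
    loss_window = bce(window_scores, window_labels.float())
    return loss_main + alpha * loss_window
```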
6. A local information-aware dialog system employing the method of any one of claims 1 to 5, comprising:
the data collection module is used for collecting multi-round dialogue samples in a specific field, labeling each question in the multi-round dialogue data with positive and negative answer labels, and constructing a multi-round dialogue reply selection training set D with positive and negative labels;
the pre-trained language model coding module, mainly composed of an embedding layer and a multi-layer multi-head attention mechanism, sends each training sample of the training set D, in triplet form, into the pre-trained language model BERT and learns context-aware semantic representations using the multi-layer attention mechanism of the pre-trained language model; meanwhile, the module fully exploits the semantic understanding ability of the pre-trained language model through multi-task learning;
the auxiliary task module is used for deriving the parameters of the pre-trained language model BERT and using a random sliding window reply prediction task to further enhance the pre-trained language model's understanding of local dialogue information; the random sliding window reply prediction task samples window data of different positions and sizes from the multi-turn dialogue context, encodes the dialogue window with the derived pre-trained language model, and predicts the window reply using the newly added special tag [EOT], so that the pre-trained language model fully learns the local language characteristics of different dialogue stages and dialogue lengths;
the local information perception module is used for prompting the pre-trained language model BERT to generate multi-granularity local semantic information in the multi-round dialogue reply selection task, calculating the rationality score between the multi-round dialogue context and the reply, and evaluating whether the current reply corresponds to the given multi-round dialogue context; finally, according to the target loss function, the gradient of each parameter in the deep learning network model is calculated by back propagation and the parameters are updated by stochastic gradient descent; and
the network training module, which terminates the training of the deep learning network model when the change of the loss value produced by the deep learning network model between iterations is smaller than a set threshold or the maximum number of iterations is reached.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202210109478.2A CN114443827A (en) | 2022-01-28 | 2022-01-28 | Local information perception dialogue method and system based on pre-training language model |
Publications (1)
Publication Number | Publication Date |
---|---|
CN114443827A true CN114443827A (en) | 2022-05-06 |
Family
ID=81370746
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202210109478.2A Pending CN114443827A (en) | 2022-01-28 | 2022-01-28 | Local information perception dialogue method and system based on pre-training language model |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN114443827A (en) |
Patent Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
WO2021077974A1 (en) * | 2019-10-24 | 2021-04-29 | 西北工业大学 | Personalized dialogue content generating method |
CN111274375A (en) * | 2020-01-20 | 2020-06-12 | 福州大学 | Multi-turn dialogue method and system based on bidirectional GRU network |
CN112818105A (en) * | 2021-02-05 | 2021-05-18 | 江苏实达迪美数据处理有限公司 | Multi-turn dialogue method and system fusing context information |
CN113806508A (en) * | 2021-09-17 | 2021-12-17 | 平安普惠企业管理有限公司 | Multi-turn dialogue method and device based on artificial intelligence and storage medium |
Non-Patent Citations (2)
Title |
---|
ZELIN CHEN 等: ""Improving BERT with local context comprehension for multi-turn response selection in retrieval-based dialogue systems"", 《COMPUTER SPEECH&LANGUAGE》, vol. 82, 31 July 2023 (2023-07-31), pages 1 - 15 * |
廖彬 等: ""一种局部信息增强与对话结构感知的多轮对话模型"", 《小型微型计算机系统》, vol. 44, no. 11, 30 November 2023 (2023-11-30), pages 2408 - 2415 * |
Cited By (17)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN115129838A (en) * | 2022-06-08 | 2022-09-30 | 阿里巴巴(中国)有限公司 | Model training method, dialogue data processing method, device, equipment and storage medium |
CN115081437A (en) * | 2022-07-20 | 2022-09-20 | 中国电子科技集团公司第三十研究所 | Machine-generated text detection method and system based on linguistic feature contrast learning |
CN115081437B (en) * | 2022-07-20 | 2022-12-09 | 中国电子科技集团公司第三十研究所 | Machine-generated text detection method and system based on linguistic feature contrast learning |
CN115310429A (en) * | 2022-08-05 | 2022-11-08 | 厦门靠谱云股份有限公司 | Data compression and high-performance calculation method in multi-turn listening dialogue model |
CN115129824B (en) * | 2022-08-15 | 2024-09-13 | 山东交通学院 | Search type multi-round dialogue method and system |
CN115129824A (en) * | 2022-08-15 | 2022-09-30 | 山东交通学院 | Search type multi-turn dialogue method and system |
CN115048944A (en) * | 2022-08-16 | 2022-09-13 | 之江实验室 | Open domain dialogue reply method and system based on theme enhancement |
CN115329062A (en) * | 2022-10-17 | 2022-11-11 | 中邮消费金融有限公司 | Dialogue model training method under low-data scene and computer equipment |
CN115617971A (en) * | 2022-11-14 | 2023-01-17 | 湖南君安科技有限公司 | Dialog text generation method based on ALBERT-Coref model |
CN116932703A (en) * | 2023-09-19 | 2023-10-24 | 苏州元脑智能科技有限公司 | User controllable content generation method, device, equipment and medium |
CN116957047B (en) * | 2023-09-19 | 2024-01-23 | 苏州元脑智能科技有限公司 | Sampling network updating method, device, equipment and medium |
CN116932703B (en) * | 2023-09-19 | 2024-01-23 | 苏州元脑智能科技有限公司 | User controllable content generation method, device, equipment and medium |
CN116957047A (en) * | 2023-09-19 | 2023-10-27 | 苏州元脑智能科技有限公司 | Sampling network updating method, device, equipment and medium |
CN117875434A (en) * | 2024-03-13 | 2024-04-12 | 中国科学技术大学 | Financial large model length extrapolation method for expanding input context length |
CN117875434B (en) * | 2024-03-13 | 2024-06-04 | 中国科学技术大学 | Financial large model length extrapolation method for expanding input context length |
CN118227796A (en) * | 2024-05-23 | 2024-06-21 | 国家计算机网络与信息安全管理中心 | Automatic classification and threshold optimization method and system for long text specific content |
CN118227796B (en) * | 2024-05-23 | 2024-07-19 | 国家计算机网络与信息安全管理中心 | Automatic classification and threshold optimization method and system for long text specific content |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN114443827A (en) | Local information perception dialogue method and system based on pre-training language model | |
CN111783462A (en) | Chinese named entity recognition model and method based on dual neural network fusion | |
CN111460176B (en) | Multi-document machine reading and understanding method based on hash learning | |
CN111274375A (en) | Multi-turn dialogue method and system based on bidirectional GRU network | |
CN114490991A (en) | Dialog structure perception dialog method and system based on fine-grained local information enhancement | |
CN113673535B (en) | Image description generation method of multi-modal feature fusion network | |
CN111145914B (en) | Method and device for determining text entity of lung cancer clinical disease seed bank | |
CN112364148B (en) | Deep learning method-based generative chat robot | |
CN116484024A (en) | Multi-level knowledge base construction method based on knowledge graph | |
CN114168754A (en) | Relation extraction method based on syntactic dependency and fusion information | |
CN118132674A (en) | Text information extraction method based on large language model and high-efficiency parameter fine adjustment | |
CN111597816A (en) | Self-attention named entity recognition method, device, equipment and storage medium | |
CN113887836A (en) | Narrative event prediction method fusing event environment information | |
CN117933226A (en) | Context-aware dialogue information extraction system and method | |
CN115422388B (en) | Visual dialogue method and system | |
CN111813907A (en) | Question and sentence intention identification method in natural language question-answering technology | |
CN114548090B (en) | Fast relation extraction method based on convolutional neural network and improved cascade labeling | |
CN115422945A (en) | Rumor detection method and system integrating emotion mining | |
CN113626537B (en) | Knowledge graph construction-oriented entity relation extraction method and system | |
CN114564568A (en) | Knowledge enhancement and context awareness based dialog state tracking method and system | |
CN114860908A (en) | Task-based dialogue state tracking method fusing slot association and semantic association | |
CN115169363A (en) | Knowledge-fused incremental coding dialogue emotion recognition method | |
CN114648005A (en) | Multi-fragment machine reading understanding method and device for multitask joint learning | |
CN111158640B (en) | One-to-many demand analysis and identification method based on deep learning | |
CN118114667B (en) | Named entity recognition model based on multitask learning and attention mechanism |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||