CN114443827A - Local information perception dialogue method and system based on pre-training language model - Google Patents

Publication number
CN114443827A
Authority
CN
China
Prior art keywords
reply
training
context
language model
dialogue
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202210109478.2A
Other languages
Chinese (zh)
Inventor
陈羽中
陈泽林
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Fuzhou University
Original Assignee
Fuzhou University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Fuzhou University
Priority to CN202210109478.2A
Publication of CN114443827A

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33 Querying
    • G06F16/332 Query formulation
    • G06F16/3329 Natural language query formulation or dialogue systems
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33 Querying
    • G06F16/3331 Query processing
    • G06F16/334 Query execution
    • G06F16/3344 Query execution using natural language analysis
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Artificial Intelligence (AREA)
  • Mathematical Physics (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • General Physics & Mathematics (AREA)
  • Databases & Information Systems (AREA)
  • Biomedical Technology (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • General Health & Medical Sciences (AREA)
  • Evolutionary Computation (AREA)
  • Biophysics (AREA)
  • Software Systems (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Human Computer Interaction (AREA)
  • Health & Medical Sciences (AREA)
  • Machine Translation (AREA)

Abstract

The invention relates to a local information perception dialogue method and system based on a pre-training language model. The method comprises the following steps: step A: collecting multi-turn dialogue texts of a specific scene, labeling the category to which each multi-turn dialogue reply belongs, and constructing a training set D with positive and negative category labels; step B: using the training set D to train a local information perception deep learning network model PLIP based on a pre-training language model, the model being used to select the reply corresponding to a given multi-turn dialogue context; and step C: inputting a multi-turn dialogue context and a reply set into the trained PLIP model to obtain the most appropriate reply for that context. The method and system can effectively improve the accuracy of multi-turn dialogue reply selection.

Description

Local information perception dialogue method and system based on pre-training language model
Technical Field
The invention belongs to the field of natural language processing, and particularly relates to a local information perception dialogue method and system based on a pre-training language model.
Background
In recent years, with the development of machine learning and deep learning, great progress has been made in intelligent dialogue between humans and computers, and dialogue systems have gradually come into public view. Dialogue systems have important research value for both industry and academia and can be applied in many fields. Current dialogue system algorithms fall mainly into two types, generative dialogue and retrieval-based dialogue. A generative dialogue model generates an answer word by word from the question and does not depend on any corpus at the inference stage; its answers have the advantage of diversity, but they are often weak in logic and sometimes fall into the trap of safe, generic replies. A retrieval-based dialogue model finds the most appropriate answer in a corpus for a specific question: it extracts from the question the information related to the correct reply and infers the appropriate answer from that information. Retrieval-based dialogue models are widely used in multi-turn dialogue systems such as Microsoft XiaoIce, and are more reliable and more practical than generative dialogue models.
For the reply selection task in retrieval-based multi-turn dialogue, Lowe et al. constructed two baseline models, based respectively on recurrent neural networks (RNNs) and long short-term memory networks (LSTMs). When encoding text, both baseline models rely on the hidden units of the RNN to memorize the text features of the previous time step, which introduces temporal information into the models and overcomes the shortcomings of the bag-of-words models used in earlier algorithms. However, in multi-turn dialogue the dialogue history may be lengthy and not all of its content is related to the reply; because the two baseline models encode the whole dialogue directly, they cannot extract important information in a targeted manner and they introduce unnecessary noise. To extract important information from long texts, researchers proposed matching the context against the reply and decomposed the reply selection task into three steps: first, extract features from each utterance and from the reply using an RNN-based algorithm; second, match the extracted utterance features with the reply features; third, extract from the matching matrix the information needed to compute the score, using a method such as a CNN. However, the semantic information that an RNN can extract is limited. RNN encoding assumes that data are related in sequence order, but topics in dialogue data are dynamic and two distant passages may be highly related, a relationship that RNN encoding has difficulty learning accurately. In addition, RNN encoding may suffer from vanishing gradients when the encoded passage is long, so long-distance dependencies are not captured well.
Because of these limitations of RNNs, the above methods may already lose important information in the first step. The Transformer architecture proposed by Vaswani et al. in 2017 can fully capture global dependency information through extensive self-attention and cross-attention operations and is not limited by sequence distance. Researchers adapted the encoder part of the Transformer and applied it to the encoding module of the model, which strengthened the model's ability to extract information. Inspired by the multi-head attention mechanism in the Transformer, these works also used multi-head attention in the matching stage to construct semantic information at multiple granularities, which enriched the feature representation of the model and brought a clear improvement. However, these models have the following problems. First, global sequence information is insufficiently considered. The models mainly use methods such as RNNs to encode all utterance representations only after matching is finished, and the utterance representations may lose important information in the encoding and matching stages. Second, the word vector representations used do not take the context into account. The models mainly use static word vectors such as Word2vec, which struggle with polysemy and cannot express semantic information accurately under different contexts, and therefore introduce noise in the encoding stage.
In 2018, Google proposed the well-known BERT (Bidirectional Encoder Representations from Transformers) model. When encoding dialogue data, BERT applies deep self-attention at word granularity over the whole input and, combined with its position embeddings, can effectively address the two problems above, namely that global sequence information is insufficiently considered and that word vector representations ignore context. Research on reply selection in multi-turn dialogue has therefore turned mainly to methods based on pre-training language models. The basic procedure is first to encode the whole dialogue with a pre-training language model that has a multi-layer Transformer encoder, and then to feed the output representation at the [CLS] position, which represents global information, into a classification layer for prediction. On this basis, researchers strengthen the adaptability of the pre-training language model to dialogue tasks through a post-training strategy, in which the model learns from domain data before fine-tuning. In addition, some works try data augmentation methods such as generating additional dialogue texts or adding speaker embeddings, and some works take the output of the pre-training language model and perform further fine-grained matching to filter out noise irrelevant to the reply. These works add new data or modules on top of the pre-training language model and achieve good results.
However, these methods simply input the concatenation of the context and the reply into the pre-training language model, so some latent features in the dialogue data cannot be learned; the additional modules greatly increase the number of model parameters while the improvement is limited, and the potential of the pre-training language model is not exploited efficiently. Recently, some works have introduced multi-task learning into the multi-turn dialogue reply selection task and designed several subtasks according to characteristics of dialogue data such as continuity and consistency. The subtasks share the parameters of the pre-training language model with the main task, and additional loss functions are designed to optimize the pre-training language model in a targeted manner. Compared with previous frameworks, methods combining a pre-training language model with multi-task learning show a remarkable effect in the multi-turn dialogue reply selection task: they further exploit the potential of the pre-training language model, reduce the number of model parameters, learn latent features in the dialogue data more efficiently, and greatly improve the pre-training language model's understanding of dialogue data.
In summary, although multi-turn dialogue reply selection models based on pre-training language models have made progress, the following problems remain. First, the designed subtasks are not efficient enough. Designing multiple subtasks for the pre-training language model means that the model must be fine-tuned multiple times during training, which costs a large amount of time and computing resources, and the more subtasks are used, the higher the training cost. Moreover, the optimization objectives of some subtasks differ greatly from that of the main task; for example, a subtask that predicts the position of a deleted utterance may introduce noise into the main task. Second, the semantic understanding capability of the pre-training language model is not fully exploited. These methods use only the [CLS] representation, which represents global information, for prediction in the main task and ignore the large amount of information output at positions other than [CLS]. Furthermore, although the model can understand dialogue semantics from different angles in the auxiliary tasks and learn valuable dialogue information, the main task uses only the [CLS] output for simple classification and does not extract the information learned in the auxiliary tasks in a targeted manner, so the model cannot make full use of the important semantic information learned there.
Disclosure of Invention
The invention aims to provide a local information perception dialogue method and system based on a pre-training language model, which are beneficial to improving the accuracy of multi-turn dialogue reply selection.
In order to achieve the purpose, the invention adopts the technical scheme that: a local information perception dialogue method based on a pre-training language model comprises the following steps:
step A: collecting multi-turn dialog texts of a specific scene, labeling the category to which each multi-turn dialog reply belongs, and constructing a training set D with positive and negative category labels;
step B: training a local information perception deep learning network model PLIP based on a pre-training language model by using the training set D, the model being used to select the reply corresponding to a given multi-turn dialogue context;
step C: inputting the multi-turn dialogue context and the reply set into the trained local information perception deep learning network model PLIP to obtain the most appropriate reply corresponding to the multi-turn dialogue context.
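Step C amounts to scoring every candidate in the reply set against the context and keeping the highest-scoring one. The following is a minimal Python sketch of that selection, not the patented implementation: `select_reply` and `toy_score` are hypothetical names, and `toy_score` is a simple word-overlap stand-in for the trained PLIP model, used only so that the example runs.

```python
from typing import Callable, List, Sequence, Tuple


def select_reply(
    context: Sequence[str],
    candidates: Sequence[str],
    score_fn: Callable[[Sequence[str], str], float],
) -> Tuple[str, List[float]]:
    """Return the highest-scoring candidate reply together with all scores."""
    if not candidates:
        raise ValueError("candidate reply set must not be empty")
    scores = [score_fn(context, reply) for reply in candidates]
    best_index = max(range(len(candidates)), key=scores.__getitem__)
    return candidates[best_index], scores


def toy_score(context: Sequence[str], reply: str) -> float:
    """Hypothetical stand-in for the trained model: word-overlap ratio."""
    context_words = set(" ".join(context).lower().split())
    reply_words = reply.lower().split()
    if not reply_words:
        return 0.0
    return sum(word in context_words for word in reply_words) / len(reply_words)


context = ["my laptop will not boot", "did you try holding the power button"]
candidates = ["i like pizza", "holding the power button did not help"]
best, scores = select_reply(context, candidates, toy_score)
print(best, scores)
```

In practice `score_fn` would wrap the trained model's rationality score g(c, r); the selection logic itself does not depend on how that score is produced.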
Further, the step B specifically includes the following steps:
step B1: inputting each sample of the training set D into the deep learning network model in the form of a triplet (c, r, y), wherein $c = \{u_1, u_2, \ldots, u_m\}$ denotes a dialogue context containing m utterances, and the t-th utterance in the context is $u_t = \{w^t_1, w^t_2, \ldots, w^t_{l_t}\}$, wherein $l_t$ is the number of words in the t-th utterance; r is a candidate reply, $r = \{w^r_1, w^r_2, \ldots, w^r_{l_r}\}$, wherein $l_r$ is the number of words in the reply; and $y \in \{0, 1\}$ is the sample label, y = 1 indicating that the candidate reply is a reasonable reply to the current context and y = 0 indicating that it is not;
the deep learning network model PLIP encodes the triplet and outputs an evaluation score reflecting the degree of correlation between the context and the reply. The model uses the multi-layer attention mechanism of the pre-training language model to learn context-aware semantic representations and adopts a multi-task learning strategy: while the main task, namely multi-turn dialogue reply selection, is optimized to learn the correlation between context and reply, an auxiliary task strengthens the pre-training language model's learning of the local context of the multi-turn dialogue and helps the representation vectors capture global information, so that the semantic understanding capability of the pre-training language model is fully exploited;
step B2: in the auxiliary task part, the PLIP model uses a random sliding window reply prediction task to further strengthen the pre-training language model's understanding of the local context of the multi-turn dialogue;
the random sliding window reply prediction task samples dialogue context data at different starting positions in the multi-turn dialogue context to obtain dialogue segments, encodes the dialogue segments with the pre-training language model, and predicts the window reply, so that the pre-training language model can fully learn the semantic information of local contexts;
step B3: in the multi-turn dialogue reply selection task, the deep learning network model PLIP uses a local information perception module to prompt the pre-training language model to generate local semantic information, fuses the global information with the local semantic information, calculates a rationality score between the multi-turn dialogue context and the reply, and evaluates whether the current reply corresponds to the given multi-turn dialogue context; finally, according to the target loss function, the gradient of each parameter in the deep learning network model is calculated by back propagation, and the parameters are updated by stochastic gradient descent;
step B4: terminating the training of the deep learning network model PLIP when the change in the loss value between iterations is smaller than a set threshold or the maximum number of iterations is reached.
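The termination rule of step B4 can be sketched as a small training loop. This is an illustrative sketch under stated assumptions: `train_until_converged` and `toy_step` are hypothetical names, `step_fn` stands for one training iteration that returns the loss, and the toy loss 1/iteration replaces a real model so the stopping behaviour can be checked by hand.

```python
from typing import Callable, Tuple


def train_until_converged(
    step_fn: Callable[[int], float],
    threshold: float,
    max_iterations: int,
) -> Tuple[int, float]:
    """Run step_fn until the loss change drops below threshold or the cap is hit.

    step_fn(iteration) is a hypothetical callback that performs one training
    iteration and returns the resulting loss value.
    """
    if max_iterations < 1:
        raise ValueError("max_iterations must be at least 1")
    previous_loss = None
    loss = float("inf")
    for iteration in range(1, max_iterations + 1):
        loss = step_fn(iteration)
        # Stop as soon as two consecutive losses differ by less than threshold.
        if previous_loss is not None and abs(previous_loss - loss) < threshold:
            return iteration, loss
        previous_loss = loss
    # The maximum number of iterations was reached without convergence.
    return max_iterations, loss


def toy_step(iteration: int) -> float:
    """Toy loss curve used only for illustration."""
    return 1.0 / iteration


stopped_at, final_loss = train_until_converged(toy_step, threshold=0.01, max_iterations=100)
capped_at, capped_loss = train_until_converged(toy_step, threshold=0.01, max_iterations=5)
print(stopped_at, final_loss, capped_at, capped_loss)
```

With the toy curve the first call stops on the threshold and the second on the iteration cap, which are the two exit conditions named in step B4.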
Further, the step B1 specifically includes the following steps:
step B11: concatenating the utterances of the dialogue context and the reply to obtain the input x of the deep learning network model;
x={[CLS],u1,[EOT],u2,[EOT],…,[EOT],um,[SEP],r,[SEP]}
wherein x is the long text obtained by concatenation, [SEP] is a separator, [CLS] is a token used by the deep learning network model to learn global features, and [EOT] is a special token used by the deep learning network model to learn local information;
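The concatenation in step B11 can be illustrated with a short Python sketch. It follows the layout of the formula for x literally ([EOT] between utterances, [SEP] after the last utterance and after the reply); the function name `build_input` and the representation of utterances as lists of words are assumptions made for the example.

```python
from typing import List, Sequence

CLS, EOT, SEP = "[CLS]", "[EOT]", "[SEP]"


def build_input(utterances: Sequence[Sequence[str]], reply: Sequence[str]) -> List[str]:
    """Build x = [CLS] u1 [EOT] u2 [EOT] ... um [SEP] r [SEP] as a token list."""
    if not utterances:
        raise ValueError("the dialogue context must contain at least one utterance")
    tokens = [CLS]
    last = len(utterances) - 1
    for t, utterance in enumerate(utterances):
        tokens.extend(utterance)
        # [EOT] closes every utterance except the last, which is closed by [SEP].
        tokens.append(EOT if t < last else SEP)
    tokens.extend(reply)
    tokens.append(SEP)
    return tokens


x = build_input([["hi"], ["hello", "there"]], ["bye"])
print(x)
```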
step B12: mapping x into a sequence of numbers through the dictionary of the pre-training language model, wherein each number is the id of a word in the vocabulary; inputting the id sequence into the embedding layer of the pre-training language model, and mapping it into a word embedding representation, a position embedding representation and a segment embedding representation according to three initialized embedding matrices;
$X = \mathrm{Embedding}_{word}(x) + \mathrm{Embedding}_{pos}(x_{pos}) + \mathrm{Embedding}_{type}(x_{type})$
wherein $\mathrm{Embedding}_{word}$ denotes the word embedding mapping, which maps the input sequence to word vectors according to the vocabulary; $\mathrm{Embedding}_{pos}$ denotes the position embedding mapping, which maps each word to the corresponding row of the position embedding matrix according to its position; and $\mathrm{Embedding}_{type}$ denotes the segment embedding mapping, which maps the context and the reply to different vector spaces; the three resulting vectors are added to obtain $X \in \mathbb{R}^{l \times d}$, wherein l is the number of words in x and d is the word vector dimension; [CLS], [SEP] and [EOT] are each regarded as one word;
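The sum of the three embeddings in step B12 can be sketched in plain Python. This is only an illustration: the tables are small random matrices rather than pre-trained BERT weights, the names `make_table` and `embed` are hypothetical, and assigning segment 0 up to the first [SEP] and segment 1 afterwards is an assumption that follows the usual BERT convention.

```python
import random
from typing import Dict, List, Sequence

Matrix = List[List[float]]


def make_table(rows: int, dim: int, rng: random.Random) -> Matrix:
    """Create a randomly initialized embedding table of shape rows x dim."""
    return [[rng.uniform(-0.1, 0.1) for _ in range(dim)] for _ in range(rows)]


def embed(
    tokens: Sequence[str],
    vocab: Dict[str, int],
    word_table: Matrix,
    pos_table: Matrix,
    type_table: Matrix,
) -> Matrix:
    """Return X, the element-wise sum of word, position and segment embeddings."""
    first_sep = tokens.index("[SEP]")
    embedded = []
    for position, token in enumerate(tokens):
        # Assumption: context (through the first [SEP]) is segment 0, reply is 1.
        segment = 0 if position <= first_sep else 1
        word_vec = word_table[vocab[token]]
        pos_vec = pos_table[position]
        type_vec = type_table[segment]
        embedded.append([w + p + s for w, p, s in zip(word_vec, pos_vec, type_vec)])
    return embedded


rng = random.Random(0)
tokens = ["[CLS]", "hi", "[EOT]", "hello", "[SEP]", "bye", "[SEP]"]
vocab = {token: index for index, token in enumerate(sorted(set(tokens)))}
d = 4
word_table = make_table(len(vocab), d, rng)
pos_table = make_table(len(tokens), d, rng)
type_table = make_table(2, d, rng)
X = embed(tokens, vocab, word_table, pos_table, type_table)
print(len(X), len(X[0]))
```

The result has one d-dimensional row per token, matching the shape l × d described for X.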
step B13: adding the word embedding, segment embedding and position embedding of each word to obtain the fused embedding representation X, and encoding it with a multi-layer Transformer network to obtain a high-level semantic feature representation of the sequence;
the multi-layer Transformer network is formed by stacking a plurality of Transformer encoder blocks; each Transformer encoder block comprises a multi-head self-attention sublayer and a feed-forward sublayer, and each sublayer is followed by a residual connection and layer normalization; X is first mapped into three vectors, namely a query vector Q, a key vector K and a value vector V, and the calculation formulas are as follows:
$Q = XW_Q + b_Q$
$K = XW_K + b_K$
$V = XW_V + b_V$
wherein $W_Q$, $W_K$, $W_V$, $b_Q$, $b_K$, $b_V$ are trainable parameters;
step B14: sending the Q, K and V vectors into the multi-head self-attention mechanism, which splits each of them along the word vector dimension d into h sub-vectors of dimension d/h, applies self-attention to each group of sub-vectors separately, and finally concatenates the h outputs to obtain a d-dimensional output vector C; to prevent overfitting, keep the representation stable and accelerate network convergence, a residual connection and layer normalization are added to the multi-head self-attention sublayer to obtain the vector T, and the calculation formulas are as follows:
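$\mathrm{Attention}(Q_i, K_i, V_i) = \mathrm{softmax}\left(\frac{Q_i K_i^{T}}{\sqrt{d/h}}\right) V_i$
$\mathrm{head}_i = \mathrm{Attention}(Q_i, K_i, V_i)$
$C = \mathrm{Concat}(\mathrm{head}_1, \mathrm{head}_2, \ldots, \mathrm{head}_h) W_C + b_C$
$T = \mathrm{LayerNorm}(X + C)$
wherein $\mathrm{head}_i$ denotes the self-attention output of the i-th sub-vector, $Q_i$, $K_i$, $V_i$ are the i-th sub-vectors of Q, K and V, $W_C$ and $b_C$ are trainable parameters, Concat denotes the concatenation operation, and LayerNorm is the layer normalization transformation;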
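To make the head splitting and recombination of step B14 concrete, the sketch below implements it in plain Python. It assumes the standard scaled dot-product attention of the Transformer for each head, and it simplifies deliberately: the learned projections (the Q, K, V mappings and the output projection with W_C, b_C) and the learned gain and bias of layer normalization are omitted. The function names are hypothetical.

```python
import math
from typing import List

Matrix = List[List[float]]


def softmax(row: List[float]) -> List[float]:
    """Numerically stable softmax over one row of scores."""
    peak = max(row)
    exps = [math.exp(value - peak) for value in row]
    total = sum(exps)
    return [e / total for e in exps]


def attention(q: Matrix, k: Matrix, v: Matrix) -> Matrix:
    """Scaled dot-product attention for a single head."""
    scale = math.sqrt(len(q[0]))
    output = []
    for q_row in q:
        scores = [sum(a * b for a, b in zip(q_row, k_row)) / scale for k_row in k]
        weights = softmax(scores)
        output.append(
            [sum(w * v_row[j] for w, v_row in zip(weights, v)) for j in range(len(v[0]))]
        )
    return output


def split_heads(x: Matrix, h: int) -> List[Matrix]:
    """Split the feature dimension d into h blocks of size d/h."""
    d = len(x[0])
    if d % h != 0:
        raise ValueError("d must be divisible by h")
    size = d // h
    return [[row[i * size:(i + 1) * size] for row in x] for i in range(h)]


def layer_norm(row: List[float], eps: float = 1e-5) -> List[float]:
    """Layer normalization without learned gain and bias."""
    mean = sum(row) / len(row)
    var = sum((value - mean) ** 2 for value in row) / len(row)
    return [(value - mean) / math.sqrt(var + eps) for value in row]


def multi_head_block(x: Matrix, q: Matrix, k: Matrix, v: Matrix, h: int) -> Matrix:
    """Compute T = LayerNorm(X + Concat(head_1, ..., head_h))."""
    heads = [
        attention(q_i, k_i, v_i)
        for q_i, k_i, v_i in zip(split_heads(q, h), split_heads(k, h), split_heads(v, h))
    ]
    concat = [sum((head[t] for head in heads), []) for t in range(len(x))]
    return [
        layer_norm([a + c for a, c in zip(x_row, c_row)])
        for x_row, c_row in zip(x, concat)
    ]


X = [[1.0, 0.0, 2.0, 1.0], [0.0, 1.0, 1.0, 3.0], [2.0, 1.0, 0.0, 1.0]]
T = multi_head_block(X, X, X, X, h=2)  # Q = K = V = X because projections are omitted
print(len(T), len(T[0]))
```

The output keeps the input shape l × d, which is what allows the residual connection X + C in the formula for T.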
step B15: sending the vector T into a fully connected feed-forward sublayer, which applies two linear transformations to T to obtain the aggregated feature FFN of the sequence; a residual connection between T and FFN followed by layer normalization then gives the final high-level feature H of the sequence, and the calculation formulas are as follows:
$\mathrm{FFN} = (W_F T + b_F) W_N + b_N$
$H = \mathrm{LayerNorm}(T + \mathrm{FFN})$
wherein $W_F$, $W_N$, $b_F$, $b_N$ are trainable parameters.
Further, the step B2 specifically includes the following steps:
step B21: in the auxiliary task, random sliding window reply prediction, the model sets the length and position of the sliding window randomly, samples from the dialogue context a large amount of local dialogue context data falling within the sliding window, and inserts the special token [EOT] after each utterance of the local dialogue context data, as shown in the following formula:
Figure BDA0003494680180000061
wherein x′ is the input of the subtask; unlike the main task, x′ retains only the information inside the window, and all other information is replaced by [PAD]; i is the starting position of the sliding window, w is the size of the current window, m is the number of utterances in the current context, and k is a hyper-parameter denoting the minimum window size;
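The window sampling of step B21 can be sketched as follows. The exact sampling formula is not reproduced in the text, so this sketch makes its own assumptions: the window size w is drawn uniformly from [k, m], the start i uniformly from [0, m − w], and every token outside the window (including the slot where [EOT] would sit) becomes [PAD]. How the window reply and negative samples are drawn is left out. The name `sample_window` is hypothetical.

```python
import random
from typing import List, Sequence, Tuple

PAD, EOT = "[PAD]", "[EOT]"


def sample_window(
    utterances: Sequence[Sequence[str]],
    k: int,
    rng: random.Random,
) -> Tuple[int, int, List[str]]:
    """Sample a random window over the context and mask everything outside it.

    Returns the start index i, the window size w and the masked token list.
    """
    m = len(utterances)
    if not 1 <= k <= m:
        raise ValueError("the minimum window size k must satisfy 1 <= k <= m")
    w = rng.randint(k, m)      # assumed: uniform window size in [k, m]
    i = rng.randint(0, m - w)  # assumed: uniform start so the window fits
    tokens: List[str] = []
    for t, utterance in enumerate(utterances):
        if i <= t < i + w:
            tokens.extend(utterance)
            tokens.append(EOT)  # [EOT] follows each utterance inside the window
        else:
            tokens.extend([PAD] * (len(utterance) + 1))
    return i, w, tokens


utterances = [["a", "b"], ["c"], ["d", "e", "f"], ["g"]]
i, w, tokens = sample_window(utterances, k=2, rng=random.Random(1))
print(i, w, tokens)
```

Keeping the masked sequence the same length as the unmasked one means position embeddings inside the window match those the main task sees for the same utterances.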
step B22: encoding the dialogue window data with the pre-training language model BERT, as follows:
$E = \mathrm{BERT}(x')$
step B23: the vector E obtained in step B22 contains all semantic representations of the dialogue segment encoded by the pre-training language model BERT, and the semantic representation that best represents the current dialogue segment is selected from E to optimize the auxiliary task; in order not to disturb the [CLS] representation, which captures global information in the pre-training language model, the model selects from the output of the pre-training language model only the [EOT] representation closest to the window reply, $E_{[EOT]}$, as the final characterization vector of the random sliding window reply prediction task; the auxiliary task predicts whether the window data is reasonable, so that the [EOT] tokens in BERT learn information from different segments and different moments of the dialogue, which enriches the ability of [EOT] to understand local information;
step B24: after the final characterization vector $E_{[EOT]}$ is obtained, inputting it into the classification layer to calculate the score, with the following formula:
$g(w_c, w_r) = \sigma(W_w^{T} E_{[EOT]} + b_w)$
wherein $w_c$ and $w_r$ denote the context and the reply within the sliding window, $W_w$ and $b_w$ are the trainable weight and bias of the prediction layer, and $\sigma(\cdot)$ denotes the sigmoid activation function;
step B25: optimizing the random sliding window reply prediction task by gradient descent on an objective function; the objective function uses the cross-entropy loss to evaluate the difference between the predicted label and the true label of the dialogue window, with the following formula:
$\mathrm{Loss}_{window} = -\sum_{(w_c, w_r, y) \in D'} \left[ y \log g(w_c, w_r) + (1 - y) \log\left(1 - g(w_c, w_r)\right) \right]$
wherein D′ denotes the window data set.
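Steps B24 and B25 together describe a sigmoid classifier on the selected [EOT] vector, trained with cross entropy. The sketch below shows both in plain Python; it assumes the loss is summed over the window data set (the text does not say whether it is summed or averaged), and the vectors, weights and function names are made up for illustration.

```python
import math
from typing import Sequence


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def window_score(e_eot: Sequence[float], weights: Sequence[float], bias: float) -> float:
    """g(w_c, w_r) = sigmoid(W_w^T E_[EOT] + b_w) for one window sample."""
    return sigmoid(sum(w * e for w, e in zip(weights, e_eot)) + bias)


def cross_entropy(scores: Sequence[float], labels: Sequence[int], eps: float = 1e-12) -> float:
    """Binary cross entropy summed over all samples (summation is an assumption)."""
    total = 0.0
    for g, y in zip(scores, labels):
        g = min(max(g, eps), 1.0 - eps)  # clamp to keep log() finite
        total -= y * math.log(g) + (1 - y) * math.log(1.0 - g)
    return total


# Hypothetical [EOT] vectors for one positive and one negative window sample.
vectors = [[0.5, 1.0], [-1.0, -0.5]]
labels = [1, 0]
weights, bias = [1.0, 1.0], 0.0
scores = [window_score(e, weights, bias) for e in vectors]
loss_window = cross_entropy(scores, labels)
print(scores, loss_window)
```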
Further, the step B3 specifically includes the following steps:
step B31: the local information perception module inserts the special token [EOT] after each utterance in the dialogue context, as shown in the following formula:
x={[CLS],u1,[EOT],u2,[EOT],…,[EOT],um,[SEP],r,[SEP]}
through the combined effect of the deep attention mechanism of the pre-training language model and the position embeddings, the special token [EOT] at each position learns the interaction information with the text surrounding that position; meanwhile, during optimization of the random sliding window reply prediction task, the last [EOT] token in the window is used to build the classification task, so it gradually learns the ability to identify the window reply; thus, the [EOT] representation gradually learns a correct representation of the utterance and focuses more on the text of the local region;
step B32: in the feature fusion stage, the local information perception module selects from the output of the pre-training language model the n local semantic representations closest to the reply as multi-granularity local information, and aggregates them into a whole by concatenation, with the following formula:
Figure BDA0003494680180000072
wherein l denotes the [EOT] entry closest to the reply, and n is a hyper-parameter denoting the number of [EOT] representations to be taken;
step B33: the local information perception module fuses the aggregated local information with the global information to obtain the final characterization vector of the main task, and the aggregation process is as follows:
Figure BDA0003494680180000073
step B34: inputting the aggregated characterization vector into the classification layer to calculate the rationality score between the current multi-turn dialogue context and the reply, with the following formula:
$g(c, r) = \sigma(W^{T} E_{ensemble} + b)$
wherein W is a trainable parameter, $\sigma(\cdot)$ denotes the sigmoid activation function, and b is the bias term of the classification layer;
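Steps B32 to B34 can be sketched as selecting the last n [EOT] vectors, joining them with the [CLS] vector, and scoring the result. The text states that the local vectors are concatenated; how they are then fused with the global [CLS] vector is given only as a formula image, so joining them by concatenation as well is an assumption of this sketch. The hidden states and the name `fuse_and_score` are hypothetical.

```python
import math
from typing import List, Sequence, Tuple


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fuse_and_score(
    hidden: Sequence[Sequence[float]],
    tokens: Sequence[str],
    n: int,
    weights: Sequence[float],
    bias: float,
) -> Tuple[float, List[float]]:
    """Concatenate E_[CLS] with the n [EOT] vectors nearest the reply and score it."""
    eot_positions = [p for p, token in enumerate(tokens) if token == "[EOT]"]
    if len(eot_positions) < n:
        raise ValueError("the context contains fewer than n [EOT] tokens")
    local: List[float] = []
    for position in eot_positions[-n:]:  # the last n [EOT] tokens precede the reply
        local.extend(hidden[position])
    ensemble = list(hidden[0]) + local   # assumed fusion: [CLS] vector + local vectors
    if len(weights) != len(ensemble):
        raise ValueError("weight vector does not match the fused representation")
    score = sigmoid(sum(w * e for w, e in zip(weights, ensemble)) + bias)
    return score, ensemble


tokens = ["[CLS]", "a", "[EOT]", "b", "[EOT]", "c", "[SEP]", "r", "[SEP]"]
hidden = [[float(p), float(p) + 0.5] for p in range(len(tokens))]  # toy model output
score, ensemble = fuse_and_score(hidden, tokens, n=2, weights=[0.0] * 6, bias=0.0)
print(score, ensemble)
```

With concatenation the classifier input has dimension (n + 1)·d, so the size of W grows with the hyper-parameter n.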
step B35: the PLIP model updates its parameters by gradient descent, using cross entropy as the loss function of the multi-turn dialogue reply selection task, with the following formula:
$\mathrm{Loss}_{main} = -\sum_{(c, r, y) \in D} \left[ y \log g(c, r) + (1 - y) \log\left(1 - g(c, r)\right) \right]$
combined with the optimization objective of the auxiliary task, the final loss function of the model is:
$\mathrm{Loss} = \mathrm{Loss}_{main} + \alpha \, \mathrm{Loss}_{window}$
wherein α is a hyper-parameter controlling the influence of the auxiliary task, random sliding window reply prediction, on the model.
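The effect of α on the combined objective can be seen with a few numbers. The loss values below are hypothetical and chosen only for illustration; `total_loss` is a hypothetical helper, and rejecting a negative α is an added safeguard rather than something the text requires.

```python
def total_loss(loss_main: float, loss_window: float, alpha: float) -> float:
    """Loss = Loss_main + alpha * Loss_window."""
    if alpha < 0:
        raise ValueError("alpha is expected to be non-negative")
    return loss_main + alpha * loss_window


# Hypothetical loss values for one training step.
loss_main, loss_window = 0.4, 0.2
for alpha in (0.0, 0.5, 1.0):
    print(alpha, total_loss(loss_main, loss_window, alpha))
```

Setting α to 0 reduces training to the main reply selection task alone, while larger values give the window prediction task more weight in the shared parameters.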
The invention also provides a local information perception dialogue system adopting the above method, which comprises:
a data collection module, used for collecting multi-turn dialogue samples in a specific field, labeling the positive and negative labels of the answers corresponding to each question in the multi-turn dialogue data, and constructing a multi-turn dialogue reply selection training set D with positive and negative labels;
a pre-training language model coding module, which mainly consists of an embedding layer and a multi-layer multi-head attention mechanism; each training sample of the training set D is fed in triplet form into the pre-training language model BERT, and the multi-layer attention mechanism of the pre-training language model is used to learn context-aware semantic representations; meanwhile, the model fully exploits the semantic understanding capability of the pre-training language model through multi-task learning;
an auxiliary task module, used for exporting the parameters of the pre-training language model BERT and using a random sliding window reply prediction task to further enhance the pre-training language model's understanding of local dialogue information; the random sliding window reply prediction task samples window data of different positions and sizes in the multi-turn dialogue context, encodes the dialogue window with the exported pre-training language model, and predicts the window reply using the newly added special token [EOT], so that the pre-training language model fully learns the local language characteristics of different dialogue stages and dialogue lengths;
a local information perception module, used in the multi-turn dialogue reply selection task to prompt the pre-training language model BERT to generate multi-granularity local semantic information, to fuse the global information with the local semantic information for classification score calculation, and to evaluate whether the current reply corresponds to the given multi-turn dialogue context; finally, the gradient of each parameter in the deep learning network model is calculated by back propagation according to the target loss function, and the parameters are updated by stochastic gradient descent; and
a network training module, used for terminating the training of the deep learning network model when the change in the loss value between iterations is smaller than a set threshold and the loss no longer decreases, or when the maximum number of iterations is reached.
Compared with the prior art, the invention has the following beneficial effects: the method and system adopt a multi-task learning strategy; while the main task, namely multi-turn dialogue reply selection, is optimized to learn the correlation between context and reply, an auxiliary task strengthens the pre-training language model's learning of the local regions of the multi-turn dialogue, so that the semantic understanding capability of the pre-training language model is fully exploited, global information and local semantic information are fused, and the most appropriate reply for the multi-turn dialogue context is obtained. The invention can therefore effectively improve the accuracy of multi-turn dialogue reply selection and has strong practicability and broad application prospects.
Drawings
FIG. 1 is a flow chart of a method implementation of an embodiment of the present invention;
FIG. 2 is a diagram of a local information-aware deep learning network model architecture based on a pre-trained language model according to an embodiment of the present invention;
FIG. 3 is a diagram illustrating random sliding window reply prediction in accordance with an embodiment of the present invention;
FIG. 4 is a diagram of the random sliding window reply prediction structure according to an embodiment of the present invention;
FIG. 5 is a block diagram of a local information awareness module according to an embodiment of the present invention.
Detailed Description
The invention is further explained below with reference to the drawings and the embodiments.
It should be noted that the following detailed description is exemplary and is intended to provide further explanation of the disclosure. Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this application belongs.
It is noted that the terminology used herein is for the purpose of describing particular embodiments only and is not intended to limit the example embodiments according to the present application. As used herein, the singular forms "a", "an" and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise. It should further be understood that the terms "comprises" and/or "comprising", when used in this specification, specify the presence of the stated features, steps, operations, devices, components, and/or combinations thereof.
As shown in fig. 1, the present embodiment provides a local information-aware dialogue method based on a pre-trained language model, which includes the following steps:
Step A: Collect multi-turn dialogue texts of a specific scenario, label the category to which each multi-turn dialogue reply belongs, and construct a training set D with positive and negative category labels.
Step B: Using the training set D, train the local information perception deep learning network model PLIP, which is based on a pre-trained language model, so that it selects the reply corresponding to a given multi-turn dialogue context.
The step B specifically comprises the following steps:
Step B1: Each sample of the training set D is input into the deep learning network model in the form of a triplet (c, r, y), where $c = \{u_1, u_2, \ldots, u_m\}$ denotes a dialogue context containing m utterances, and the t-th utterance in the context is
$$u_t = \{w_1^t, w_2^t, \ldots, w_{l_t}^t\}$$
where $l_t$ is the number of words in the t-th utterance; r is a candidate reply,
$$r = \{w_1^r, w_2^r, \ldots, w_{l_r}^r\}$$
where $l_r$ is the number of words in the reply; and $y \in \{0, 1\}$ is the sample label, where y = 1 indicates that the candidate reply is a reasonable reply for the current context and y = 0 indicates that it is not.
After encoding the triplet, the deep learning network model PLIP outputs an evaluation score that reflects the degree of correlation between context and reply. The model uses the multi-layer attention mechanism of the pre-trained language model to learn context-aware semantic representations. It also adopts a multi-task learning strategy: while optimizing the main task, namely multi-turn dialogue reply selection, an auxiliary task strengthens the pre-trained language model's learning of the local context of the multi-turn dialogue. This helps the representation vectors capture global information, learn the degree of correlation between context and reply, and fully exploit the semantic understanding capability of the pre-trained language model. The architecture of the deep learning network model is shown in FIG. 2.
The step B1 specifically includes the following steps:
Step B11: Concatenate the utterances of the dialogue context and the reply to obtain the input x of the deep learning network model:
$$x = \{\text{[CLS]}, u_1, \text{[EOT]}, u_2, \text{[EOT]}, \ldots, \text{[EOT]}, u_m, \text{[SEP]}, r, \text{[SEP]}\}$$
where x is the long text obtained by concatenation, [SEP] is a separator, [CLS] is the token the deep learning network model uses to learn global features, and [EOT] is a special token the model uses to learn local information.
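The concatenation in step B11 can be sketched as follows. This is a minimal illustration that works on already tokenized utterances; the function name `build_input` and the toy dialogue are assumptions, not part of the original disclosure.

```python
from typing import List

CLS, SEP, EOT = "[CLS]", "[SEP]", "[EOT]"


def build_input(context: List[List[str]], reply: List[str]) -> List[str]:
    """Concatenate m utterances and one candidate reply into a single sequence."""
    if not context:
        raise ValueError("context must contain at least one utterance")
    tokens = [CLS]
    for t, utterance in enumerate(context):
        tokens.extend(utterance)
        # [EOT] separates consecutive utterances; following the formula,
        # the last utterance u_m is closed by [SEP] instead.
        tokens.append(EOT if t < len(context) - 1 else SEP)
    tokens.extend(reply)
    tokens.append(SEP)
    return tokens


# Toy dialogue with m = 2 utterances and one candidate reply.
example = build_input([["hi"], ["how", "are", "you"]], ["fine", "thanks"])
```

Here `example` follows the layout [CLS], u_1, [EOT], u_2, [SEP], r, [SEP] for m = 2.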
Step B12: Map x to a sequence of numbers through the vocabulary of the pre-trained language model, where each number is the id of a word in the vocabulary. Input the id sequence into the embedding layer of the pre-trained language model, which maps it to a word embedding representation, a position embedding representation and a paragraph embedding representation according to three initialized embedding matrices:
$$X = \mathrm{Embedding}_{word}(x) + \mathrm{Embedding}_{pos}(x_{pos}) + \mathrm{Embedding}_{type}(x_{type})$$
where $\mathrm{Embedding}_{word}$ is the word embedding mapping, which maps the input sequence to word vectors according to the vocabulary; $\mathrm{Embedding}_{pos}$ is the position embedding mapping, which maps each word to the corresponding entry of the position embedding matrix according to its position; and $\mathrm{Embedding}_{type}$ is the paragraph embedding mapping, which maps the context and the reply to different vector spaces. The three resulting vectors are added to obtain $X \in \mathbb{R}^{l \times d}$, where l is the number of words in x and d is the word vector dimension; [CLS], [SEP] and [EOT] each count as one word.
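As a rough illustration of step B12, the sketch below sums three embedding lookups per token. The table contents, the two-dimensional vectors and the function name `embed` are made up for the example; a real BERT embedding layer additionally applies layer normalization and dropout, which are omitted here.

```python
from typing import List

Vector = List[float]


def embed(token_ids: List[int], type_ids: List[int],
          word_table: List[Vector], pos_table: List[Vector],
          type_table: List[Vector]) -> List[Vector]:
    """Return X, one fused vector per token: word + position + paragraph embedding."""
    if len(token_ids) != len(type_ids):
        raise ValueError("token_ids and type_ids must have the same length")
    fused = []
    for position, (token_id, type_id) in enumerate(zip(token_ids, type_ids)):
        word_vec = word_table[token_id]   # looked up by vocabulary id
        pos_vec = pos_table[position]     # looked up by position in the sequence
        type_vec = type_table[type_id]    # 0 = context, 1 = reply (assumed coding)
        fused.append([w + p + s for w, p, s in zip(word_vec, pos_vec, type_vec)])
    return fused


# Hypothetical toy tables with d = 2.
word_table = [[1.0, 0.0], [0.0, 1.0], [2.0, 2.0]]
pos_table = [[0.1, 0.1], [0.2, 0.2], [0.3, 0.3]]
type_table = [[0.0, 0.0], [10.0, 10.0]]
X = embed([2, 0, 1], [0, 0, 1], word_table, pos_table, type_table)
```

The result has one row per token (l = 3) and d = 2 columns, matching the shape of X described above.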
Step B13: The word embedding, paragraph embedding and position embedding of each word are added to obtain the fused embedding representation, which is then encoded by a multi-layer Transformer network to obtain the high-level semantic feature representation of the sequence.
The multi-layer Transformer network is formed by stacking multiple Transformer encoder blocks. Each encoder block comprises a multi-head self-attention sublayer and a feed-forward sublayer, and each sublayer is followed by a residual connection and a layer normalization. X is first mapped to three vectors, namely the query vector Q, the key vector K and the value vector V, calculated as follows:
$$Q = X W_Q + b_Q$$
$$K = X W_K + b_K$$
$$V = X W_V + b_V$$
where $W_Q$, $W_K$, $W_V$, $b_Q$, $b_K$ and $b_V$ are trainable parameters.
Step B14: The Q, K and V vectors are fed into the multi-head self-attention mechanism. Each is split along the word vector dimension d into h sub-vectors of dimension d/h, each group of sub-vectors is fed into its own self-attention head, and the outputs of the h heads are concatenated to obtain a d-dimensional output vector C. To prevent overfitting, keep the vectors well regularized and accelerate network convergence, a residual connection and layer normalization are applied to the multi-head self-attention sublayer to obtain the vector T. The calculation formulas are as follows:
$$\mathrm{Attention}(Q_i, K_i, V_i) = \mathrm{softmax}\!\left(\frac{Q_i K_i^{\top}}{\sqrt{d/h}}\right) V_i$$
$$\mathrm{head}_i = \mathrm{Attention}(Q_i, K_i, V_i)$$
$$C = \mathrm{Concat}(\mathrm{head}_1, \mathrm{head}_2, \ldots, \mathrm{head}_h) W_C + b_C$$
$$T = \mathrm{LayerNorm}(X + C)$$
where $Q_i$, $K_i$ and $V_i$ are the i-th sub-vectors of Q, K and V, and $\mathrm{head}_i$ denotes the self-attention output of the i-th sub-vector,
Figure BDA0003494680180000113
$W_C$ and $b_C$ are trainable parameters, Concat denotes the concatenation operation, and LayerNorm is the layer normalization transformation.
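The head splitting in step B14 can be sketched in pure Python as follows, assuming each head uses standard scaled dot-product attention (softmax of the query-key products divided by the square root of the head dimension d/h, applied to the values). The output projection with W_C and b_C, the residual connection and the layer normalization are omitted, and all function names are illustrative.

```python
import math
from typing import List

Matrix = List[List[float]]


def softmax(row: List[float]) -> List[float]:
    """Numerically stable softmax over one row of scores."""
    peak = max(row)
    exps = [math.exp(v - peak) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def attention(q: Matrix, k: Matrix, v: Matrix) -> Matrix:
    """Scaled dot-product attention for one head (assumed form)."""
    scale = math.sqrt(len(q[0]))  # square root of the head dimension d/h
    out = []
    for q_row in q:
        scores = [sum(a * b for a, b in zip(q_row, k_row)) / scale for k_row in k]
        weights = softmax(scores)
        out.append([sum(w * v_row[j] for w, v_row in zip(weights, v))
                    for j in range(len(v[0]))])
    return out


def multi_head(q: Matrix, k: Matrix, v: Matrix, h: int) -> Matrix:
    """Split the d columns into h heads, attend per head, concatenate back to d."""
    d = len(q[0])
    if d % h != 0:
        raise ValueError("d must be divisible by h")
    size = d // h
    heads = []
    for i in range(h):
        cols = slice(i * size, (i + 1) * size)
        heads.append(attention([r[cols] for r in q],
                               [r[cols] for r in k],
                               [r[cols] for r in v]))
    # Concatenate the h head outputs position by position.
    return [[x for head in heads for x in head[pos]] for pos in range(len(q))]


# With all-zero keys every position attends uniformly, so each output
# row is the column-wise mean of the value rows.
C = multi_head([[1.0, 0.0], [0.0, 1.0]],
               [[0.0, 0.0], [0.0, 0.0]],
               [[1.0, 2.0], [3.0, 4.0]], h=2)
```

Splitting along the feature dimension keeps the concatenated output at d columns, which is what allows the residual addition X + C in the formula for T.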
Step B15: The vector T is fed into a fully connected feed-forward sublayer, which applies two linear transformations to T to obtain the combined feature FFN of the sequence. A residual connection between T and FFN followed by layer normalization yields the final high-level feature H of the sequence. The calculation formulas are as follows:
$$\mathrm{FFN} = (W_F T + b_F) W_N + b_N$$
$$H = \mathrm{LayerNorm}(T + \mathrm{FFN})$$
where $W_F$, $W_N$, $b_F$ and $b_N$ are trainable parameters.
Step B2: In the auxiliary task, the deep learning network model PLIP uses a random sliding window reply prediction task to further strengthen the pre-trained language model's understanding of the local context of the multi-turn dialogue.
The random sliding window reply prediction task samples dialogue context data at different starting positions in the multi-turn dialogue context to obtain dialogue segments, encodes the segments with the pre-trained language model, and predicts the window reply, so that the pre-trained language model fully learns the semantic information of local contexts. The process and structure of random sliding window reply prediction are shown in FIG. 3 and FIG. 4.
The step B2 specifically includes the following steps:
Step B21: In the auxiliary task, random sliding window reply prediction, the model sets the length and position of the sliding window at random, samples a large amount of local dialogue context data falling within the sliding window from the dialogue context, and inserts the special token [EOT] after each utterance of that local context data, as shown in the following formula:
Figure BDA0003494680180000114
where x′ is the input of the auxiliary task. Unlike the main task input, x′ retains only the information inside the window, and all other information is replaced by [PAD]; i is the starting position of the sliding window, w is the size of the current window, m is the number of utterances in the current context, and κ is a hyper-parameter giving the minimum window size.
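The window sampling in step B21 might look like the sketch below. The exact sampling ranges are not recoverable from the text, so this assumes the window size w is drawn uniformly with κ ≤ w ≤ m and the 0-based start i with 0 ≤ i ≤ m − w. The function names are hypothetical, and the [CLS]/[SEP] tokens and the window reply are left out for brevity.

```python
import random
from typing import List, Tuple

PAD, EOT = "[PAD]", "[EOT]"


def sample_window(m: int, kappa: int, rng: random.Random) -> Tuple[int, int]:
    """Draw a random (start, size) pair; the ranges are assumptions of this sketch."""
    if not 1 <= kappa <= m:
        raise ValueError("need 1 <= kappa <= m")
    w = rng.randint(kappa, m)    # window size, inclusive bounds
    i = rng.randint(0, m - w)    # 0-based index of the first utterance in the window
    return i, w


def mask_outside_window(context: List[List[str]], i: int, w: int) -> List[str]:
    """Keep utterances i .. i+w-1 (each followed by [EOT]); pad everything else."""
    tokens = []
    for t, utterance in enumerate(context):
        inside = i <= t < i + w
        for word in utterance:
            tokens.append(word if inside else PAD)
        # The [EOT] slot of an utterance outside the window is padded as well,
        # so the sequence length does not depend on the window.
        tokens.append(EOT if inside else PAD)
    return tokens


context = [["hi"], ["hello", "there"], ["how", "are", "you"], ["fine"]]
rng = random.Random(0)
i, w = sample_window(len(context), kappa=2, rng=rng)
masked = mask_outside_window(context, i, w)
```

Padding rather than deleting the tokens outside the window keeps the positions of the retained tokens unchanged, which seems to be the reason the text replaces them with [PAD].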
Step B22: The dialogue window data is encoded using the pre-trained language model BERT:
$$E = \mathrm{BERT}(x')$$
Step B23: The vector E obtained in step B22 contains all semantic representations of the dialogue segment encoded by BERT. From E, the representation that best characterizes the current dialogue segment is selected to optimize the auxiliary task. In order not to disturb the [CLS] representation, which carries global information in the pre-trained language model, the model selects only the representation of the [EOT] token nearest to the window reply in the output of the pre-trained language model, $E_{[EOT]}$, as the final characterization vector of the random sliding window reply prediction task. By judging the reasonableness of the window data, the auxiliary task lets the [EOT] tokens in BERT learn information from different segments and different moments of the conversation, enriching the ability of the [EOT] representation to capture local information.
Step B24: The final characterization vector $E_{[EOT]}$ is input into the classification layer to compute the score:
$$g(w_c, w_r) = \sigma(W_w^{\top} E_{[EOT]} + b_w)$$
where $w_c$ and $w_r$ denote the context and the reply within the sliding window, $W_w$ and $b_w$ are trainable parameters of the prediction layer, and $\sigma(\cdot)$ denotes the sigmoid activation function.
Step B25: The random sliding window reply prediction task is optimized by gradient descent on an objective function. The objective is a cross-entropy loss that measures the difference between the predicted label and the true dialogue window label:
$$Loss_{window} = -\sum_{(w_c, w_r, y) \in D'} \left[ y \log g(w_c, w_r) + (1 - y) \log\left(1 - g(w_c, w_r)\right) \right]$$
where D′ denotes the window data set.
Step B3: In the multi-turn dialogue reply selection task, the deep learning network model PLIP uses the local information perception module shown in FIG. 5 to prompt the pre-trained language model to generate local semantic information, fuses the global information with the local semantic information, computes the rationality score between the multi-turn dialogue context and the reply, and evaluates whether the current reply corresponds to the given context. Finally, the gradient of each parameter in the deep learning network model is computed by back propagation according to the target loss function, and the parameters are updated by stochastic gradient descent.
The step B3 specifically includes the following steps:
Step B31: The local information perception module embeds the special token [EOT] after each utterance in the dialogue context, as shown in the following formula:
$$x = \{\text{[CLS]}, u_1, \text{[EOT]}, u_2, \text{[EOT]}, \ldots, \text{[EOT]}, u_m, \text{[SEP]}, r, \text{[SEP]}\}$$
Under the combined effect of the deep attention mechanism of the pre-trained language model and position embedding, the [EOT] token at each position learns the interaction with the surrounding text at that position. Meanwhile, during optimization of the random sliding window reply prediction task, the last [EOT] token in the window is used to build a classification task and gradually learns to recognize the window reply. The representation of the [EOT] token thus gradually learns a correct sentence representation and focuses more on the text of its local region.
Step B32: In the feature fusion stage, the local information perception module selects, from the output of the pre-trained language model, the n local semantic representations closest to the reply as multi-granularity local information, and aggregates them into a whole by concatenation. The specific formula is as follows:
Figure BDA0003494680180000131
where l denotes the entry closest to the reply, and n is a hyper-parameter giving the number of [EOT] representations to extract.
Step B33: The local information perception module fuses the aggregated local information with the global information to obtain the final characterization vector of the main task. The aggregation process is as follows:
Figure BDA0003494680180000132
Step B34: The aggregated characterization vector is input into the classification layer to compute the rationality score between the current multi-turn dialogue context and the reply:
$$g(c, r) = \sigma(W^{\top} E_{ensemble} + b)$$
where W is a trainable parameter, $\sigma(\cdot)$ denotes the sigmoid activation function, and b is the bias term of the classification layer.
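Steps B32 to B34 can be illustrated with the sketch below. It makes several assumptions that the text does not confirm: the global vector is the encoder output at position 0 ([CLS]), the n [EOT] vectors nearest the reply are the last n [EOT] positions in the sequence, and fusion is a plain concatenation of the global vector with those local vectors. The function names and toy values are hypothetical.

```python
import math
from typing import List


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def local_global_score(hidden: List[List[float]], tokens: List[str], n: int,
                       weights: List[float], bias: float) -> float:
    """Score a (context, reply) pair from [CLS] plus the n nearest [EOT] vectors."""
    if n < 1:
        raise ValueError("n must be at least 1")
    eot_positions = [p for p, tok in enumerate(tokens) if tok == "[EOT]"]
    if len(eot_positions) < n:
        raise ValueError("fewer than n [EOT] tokens in the sequence")
    nearest = eot_positions[-n:]       # the reply sits at the end of the sequence
    features = list(hidden[0])         # global information: the [CLS] vector
    for p in nearest:
        features.extend(hidden[p])     # local information: the [EOT] vectors
    if len(weights) != len(features):
        raise ValueError("weight vector does not match the fused feature size")
    return sigmoid(sum(w * f for w, f in zip(weights, features)) + bias)


# Toy encoder output with one-dimensional hidden vectors.
tokens = ["[CLS]", "a", "[EOT]", "b", "[EOT]", "c", "[SEP]", "r", "[SEP]"]
hidden = [[0.0], [9.0], [1.0], [9.0], [2.0], [9.0], [9.0], [9.0], [9.0]]
score_n1 = local_global_score(hidden, tokens, 1, [1.0, 1.0], -2.0)
score_n2 = local_global_score(hidden, tokens, 2, [1.0, 1.0, 1.0], -3.0)
```

Because the fused vector grows with n, the classification weight W must have (n + 1) times the hidden size under this concatenation assumption.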
Step B35: The PLIP model updates its learnable parameters by gradient descent, using cross entropy as the loss function of the multi-turn dialogue reply selection task:
$$Loss_{main} = -\sum_{(c, r, y) \in D} \left[ y \log g(c, r) + (1 - y) \log\left(1 - g(c, r)\right) \right]$$
Combined with the optimization objective of the auxiliary task, the final loss function of the model is:
$$Loss = Loss_{main} + \alpha \, Loss_{window}$$
where α is a hyper-parameter that controls the influence of the auxiliary task, random sliding window reply prediction, on the model.
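The combined objective can be sketched as below, assuming both losses are binary cross entropies summed (not averaged) over their samples; the clipping constant `eps` and the function names are additions for the example.

```python
import math
from typing import Sequence


def binary_cross_entropy(scores: Sequence[float], labels: Sequence[int],
                         eps: float = 1e-12) -> float:
    """Summed binary cross entropy between sigmoid scores and 0/1 labels."""
    if len(scores) != len(labels):
        raise ValueError("scores and labels must have the same length")
    total = 0.0
    for g, y in zip(scores, labels):
        g = min(max(g, eps), 1.0 - eps)   # avoid log(0)
        total -= y * math.log(g) + (1 - y) * math.log(1.0 - g)
    return total


def total_loss(main_scores: Sequence[float], main_labels: Sequence[int],
               window_scores: Sequence[float], window_labels: Sequence[int],
               alpha: float) -> float:
    """Loss = Loss_main + alpha * Loss_window."""
    loss_main = binary_cross_entropy(main_scores, main_labels)
    loss_window = binary_cross_entropy(window_scores, window_labels)
    return loss_main + alpha * loss_window


# One main-task sample and one window sample, both scored 0.5.
loss = total_loss([0.5], [1], [0.5], [0], alpha=0.5)
```

Setting `alpha` to 0 recovers the plain reply selection objective, so α directly scales how strongly the window task shapes the shared encoder.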
Step B4: Training of the deep learning network model PLIP is terminated when the change in its loss value between iterations falls below a set threshold or the maximum number of iterations is reached.
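The termination rule of step B4 amounts to a simple check after each iteration. The sketch below is one possible reading, in which the "iterative change" is the absolute difference between the last two recorded loss values; the function name is hypothetical.

```python
from typing import Sequence


def should_stop(loss_history: Sequence[float], threshold: float,
                max_iterations: int) -> bool:
    """Return True when training should terminate under the step B4 rule."""
    if len(loss_history) >= max_iterations:
        return True                    # maximum number of iterations reached
    if len(loss_history) < 2:
        return False                   # no change can be measured yet
    change = abs(loss_history[-1] - loss_history[-2])
    return change < threshold          # loss has effectively stopped changing
```

For example, with a threshold of 0.01, a history ending in 0.5 and 0.4995 stops training, while one ending in 1.0 and 0.5 does not.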
Step C: Input the multi-turn dialogue context and the set of candidate replies into the trained local information perception deep learning network model PLIP to obtain the most appropriate reply for the multi-turn dialogue context.
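At inference time, step C reduces to scoring every candidate reply against the context and returning the highest-scoring one. In the sketch below, `overlap_score` is a made-up stand-in for the trained PLIP scorer g(c, r), used only so the example runs on its own.

```python
from typing import Callable, List, Sequence, Tuple


def select_reply(context: Sequence[str], candidates: List[str],
                 score_fn: Callable[[Sequence[str], str], float]) -> Tuple[str, float]:
    """Return the candidate with the highest score and that score."""
    if not candidates:
        raise ValueError("candidates must not be empty")
    scores = [score_fn(context, reply) for reply in candidates]
    best = max(range(len(candidates)), key=scores.__getitem__)
    return candidates[best], scores[best]


def overlap_score(context: Sequence[str], reply: str) -> float:
    """Hypothetical scorer: fraction of reply words that occur in the context."""
    context_words = {word for utterance in context for word in utterance.split()}
    reply_words = reply.split()
    if not reply_words:
        return 0.0
    return sum(word in context_words for word in reply_words) / len(reply_words)


best_reply, best_score = select_reply(
    ["do you like tea", "yes i like green tea"],
    ["i like tea too", "the train is late"],
    overlap_score,
)
```

In a real system `score_fn` would wrap the trained model, building the input of step B11 for each candidate and returning the classification score.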
The embodiment also provides a local information perception dialogue system employing the above method, comprising a data collection module, a pre-trained language model coding module, an auxiliary task module, a local information perception module and a network training module.
The data collection module collects multi-turn dialogue samples in a specific field, labels the answer to each question in the multi-turn dialogue data as positive or negative, and constructs a multi-turn dialogue reply selection training set D with positive and negative labels.
The pre-trained language model coding module comprises a pre-trained language model consisting mainly of an embedding layer and a multi-layer multi-head attention mechanism. Each training sample of the training set D is fed, in triplet form, into the pre-trained language model BERT, which uses its multi-layer attention mechanism to learn context-aware semantic representations. Meanwhile, the model fully exploits the semantic understanding capability of the pre-trained language model through multi-task learning.
In the auxiliary task module, the model exports the parameters of the pre-trained language model BERT and uses the random sliding window reply prediction task to further strengthen the pre-trained language model's understanding of local dialogue information. The task samples window data of different positions and sizes from the multi-turn dialogue context, encodes the dialogue window with the exported pre-trained language model, and predicts the window reply using the newly added special token [EOT], so that the pre-trained language model fully learns the local language characteristics of different dialogue stages and dialogue lengths.
In the multi-turn dialogue reply selection task, the model uses the local information perception module to prompt the pre-trained language model BERT to generate multi-granularity local semantic information, fuses the global information with the local semantic information to compute a classification score, and evaluates whether the current reply corresponds to the given multi-turn dialogue context. Finally, the gradient of each parameter in the deep learning network model is computed by back propagation according to the target loss function, and the parameters are updated by stochastic gradient descent.
The network training module trains the network model and terminates training when the change in the loss value between iterations is smaller than a set threshold and the loss no longer decreases, or when the maximum number of iterations is reached.
The foregoing describes preferred embodiments of the present invention. Other and further embodiments may be devised without departing from its basic scope, which is determined by the claims that follow. Any simple modification, equivalent change or adaptation of the above embodiments made according to the technical essence of the present invention falls within the protection scope of the technical solution of the present invention.

Claims (6)

1. A local information perception dialogue method based on a pre-training language model is characterized by comprising the following steps:
step A: collecting multi-turn dialog texts of a specific scene, labeling the category to which each multi-turn dialog reply belongs, and constructing a training set D with positive and negative category labels;
step B: training a local information perception deep learning network model PLIP based on a pre-trained language model by using the training set D, so as to select a reply corresponding to a given multi-turn dialogue context;
step C: inputting the multi-turn dialogue context and the reply set into the trained local information perception deep learning network model PLIP to obtain the most appropriate reply corresponding to the multi-turn dialogue context.
2. The local information perception dialogue method based on the pre-trained language model according to claim 1, wherein the step B specifically comprises the following steps:
step B1: inputting each sample of the training set D into the deep learning network model in the form of a triplet (c, r, y), wherein $c = \{u_1, u_2, \ldots, u_m\}$ denotes a dialogue context containing m utterances, and the t-th utterance in the context is
$$u_t = \{w_1^t, w_2^t, \ldots, w_{l_t}^t\}$$
wherein $l_t$ is the number of words in the t-th utterance; r is a candidate reply,
$$r = \{w_1^r, w_2^r, \ldots, w_{l_r}^r\}$$
wherein $l_r$ is the number of words in the reply; and $y \in \{0, 1\}$ is the sample label, y = 1 indicating that the candidate reply is a reasonable reply for the current context and y = 0 indicating that it is not;
after encoding the triplet, the deep learning network model PLIP outputs an evaluation score reflecting the degree of correlation between context and reply; the deep learning network model learns context-aware semantic representations by using the multi-layer attention mechanism of the pre-trained language model, and adopts a multi-task learning strategy in which, while the main task, namely multi-turn dialogue reply selection, is optimized, an auxiliary task strengthens the pre-trained language model's learning of the local context of the multi-turn dialogue, helping the representation vectors capture global information, learn the degree of correlation between context and reply, and fully exploit the semantic understanding capability of the pre-trained language model;
step B2: in the auxiliary task, the deep learning network model PLIP uses a random sliding window reply prediction task to further strengthen the pre-trained language model's understanding of the local context of the multi-turn dialogue;
the random sliding window reply prediction task samples dialogue context data at different starting positions in the multi-turn dialogue context to obtain dialogue segments, encodes the dialogue segments with the pre-trained language model, and predicts the window reply, so that the pre-trained language model fully learns the semantic information of local contexts;
step B3: in the multi-turn dialogue reply selection task, the deep learning network model PLIP uses a local information perception module to prompt the pre-trained language model to generate local semantic information, fuses the global information with the local semantic information, computes the rationality score between the multi-turn dialogue context and the reply, and evaluates whether the current reply corresponds to the given multi-turn dialogue context; finally, the gradient of each parameter in the deep learning network model is computed by back propagation according to the target loss function, and the parameters are updated by stochastic gradient descent;
step B4: terminating training of the deep learning network model PLIP when the change in its loss value between iterations falls below a set threshold or the maximum number of iterations is reached.
3. The local information perception dialogue method based on the pre-trained language model of claim 2, wherein the step B1 specifically comprises the following steps:
step B11: concatenating the utterances of the dialogue context and the reply to obtain the input x of the deep learning network model:
$$x = \{\text{[CLS]}, u_1, \text{[EOT]}, u_2, \text{[EOT]}, \ldots, \text{[EOT]}, u_m, \text{[SEP]}, r, \text{[SEP]}\}$$
wherein x is the long text obtained by concatenation, [SEP] is a separator, [CLS] is the token used by the deep learning network model to learn global features, and [EOT] is a special token used by the deep learning network model to learn local information;
step B12: mapping x to a sequence of numbers through the vocabulary of the pre-trained language model, wherein each number is the id of a word in the vocabulary, inputting the id sequence into the embedding layer of the pre-trained language model, and mapping it to a word embedding representation, a position embedding representation and a paragraph embedding representation according to three initialized embedding matrices:
$$X = \mathrm{Embedding}_{word}(x) + \mathrm{Embedding}_{pos}(x_{pos}) + \mathrm{Embedding}_{type}(x_{type})$$
wherein $\mathrm{Embedding}_{word}$ is the word embedding mapping, which maps the input sequence to word vectors according to the vocabulary; $\mathrm{Embedding}_{pos}$ is the position embedding mapping, which maps each word to the corresponding entry of the position embedding matrix according to its position; $\mathrm{Embedding}_{type}$ is the paragraph embedding mapping, which maps the context and the reply to different vector spaces; the three resulting vectors are added to obtain $X \in \mathbb{R}^{l \times d}$, wherein l is the number of words in x and d is the word vector dimension, and [CLS], [SEP] and [EOT] each count as one word;
step B13: adding the word embedding, paragraph embedding and position embedding of each word to obtain the fused embedding representation, and encoding it with a multi-layer Transformer network to obtain the high-level semantic feature representation of the sequence;
the multi-layer Transformer network is formed by stacking multiple Transformer encoder blocks; each encoder block comprises a multi-head self-attention sublayer and a feed-forward sublayer, and each sublayer is followed by a residual connection and a layer normalization; X is first mapped to three vectors, namely the query vector Q, the key vector K and the value vector V, calculated as follows:
$$Q = X W_Q + b_Q$$
$$K = X W_K + b_K$$
$$V = X W_V + b_V$$
wherein $W_Q$, $W_K$, $W_V$, $b_Q$, $b_K$ and $b_V$ are trainable parameters;
step B14: feeding the Q, K and V vectors into the multi-head self-attention mechanism, splitting each along the word vector dimension d into h sub-vectors of dimension d/h, feeding each group of sub-vectors into its own self-attention head, and concatenating the outputs of the h heads to obtain a d-dimensional output vector C; to prevent overfitting, keep the vectors well regularized and accelerate network convergence, a residual connection and layer normalization are applied to the multi-head self-attention sublayer to obtain the vector T, calculated as follows:
$$\mathrm{Attention}(Q_i, K_i, V_i) = \mathrm{softmax}\!\left(\frac{Q_i K_i^{\top}}{\sqrt{d/h}}\right) V_i$$
$$\mathrm{head}_i = \mathrm{Attention}(Q_i, K_i, V_i)$$
$$C = \mathrm{Concat}(\mathrm{head}_1, \mathrm{head}_2, \ldots, \mathrm{head}_h) W_C + b_C$$
$$T = \mathrm{LayerNorm}(X + C)$$
wherein $Q_i$, $K_i$ and $V_i$ are the i-th sub-vectors of Q, K and V, and $\mathrm{head}_i$ denotes the self-attention output of the i-th sub-vector,
Figure FDA0003494680170000033
$W_C$ and $b_C$ are trainable parameters, Concat denotes the concatenation operation, and LayerNorm denotes the layer normalization transformation;
step B15: feeding the vector T into a fully connected feed-forward sublayer, which applies two linear transformations to T to obtain the combined feature FFN of the sequence, and applying a residual connection between T and FFN followed by layer normalization to obtain the final high-level feature H of the sequence, calculated as follows:
$$\mathrm{FFN} = (W_F T + b_F) W_N + b_N$$
$$H = \mathrm{LayerNorm}(T + \mathrm{FFN})$$
wherein $W_F$, $W_N$, $b_F$ and $b_N$ are trainable parameters.
4. The local information perception dialogue method based on the pre-trained language model of claim 3, wherein the step B2 specifically comprises the following steps:
step B21: in the auxiliary task, random sliding window reply prediction, the model sets the length and position of the sliding window at random, samples a large amount of local dialogue context data falling within the sliding window from the dialogue context, and inserts the special token [EOT] after each utterance of that local context data, as shown in the following formula:
Figure FDA0003494680170000034
wherein x′ is the input of the auxiliary task; unlike the main task input, x′ retains only the information inside the window, and all other information is replaced by [PAD]; i is the starting position of the sliding window, w is the size of the current window, m is the number of utterances in the current context, and κ is a hyper-parameter giving the minimum window size;
step B22: encoding the dialogue window data using the pre-trained language model BERT:
$$E = \mathrm{BERT}(x')$$
step B23: the vector E obtained in step B22 contains all semantic representations of the dialogue segment encoded by the pre-trained language model BERT, and the representation that best characterizes the current dialogue segment is selected from E to optimize the auxiliary task; in order not to disturb the [CLS] representation, which carries global information in the pre-trained language model, the model selects only the representation of the [EOT] token nearest to the window reply in the output of the pre-trained language model, $E_{[EOT]}$, as the final characterization vector of the random sliding window reply prediction task; by judging the reasonableness of the window data, the auxiliary task lets the [EOT] tokens in BERT learn information from different segments and different moments of the conversation, enriching the ability of the [EOT] representation to capture local information;
step B24: inputting the final characterization vector $E_{[EOT]}$ into the classification layer to compute the score:
$$g(w_c, w_r) = \sigma(W_w^{\top} E_{[EOT]} + b_w)$$
wherein $w_c$ and $w_r$ denote the context and the reply within the sliding window, $W_w$ and $b_w$ are trainable parameters of the prediction layer, and $\sigma(\cdot)$ denotes the sigmoid activation function;
step B25: the random sliding window reply prediction task is optimized by gradient descent on an objective function, the objective function being a cross-entropy loss that measures the difference between the predicted label and the true dialogue window label:
$$Loss_{window} = -\sum_{(w_c, w_r, y) \in D'} \left[ y \log g(w_c, w_r) + (1 - y) \log\left(1 - g(w_c, w_r)\right) \right]$$
wherein D′ denotes the window data set.
5. The local information perception dialogue method according to claim 4, wherein the step B3 specifically comprises the following steps:
step B31: the local information perception module embeds the special token [EOT] after each utterance in the dialogue context, as shown in the following formula:
$$x = \{\text{[CLS]}, u_1, \text{[EOT]}, u_2, \text{[EOT]}, \ldots, \text{[EOT]}, u_m, \text{[SEP]}, r, \text{[SEP]}\}$$
under the combined effect of the deep attention mechanism of the pre-trained language model and position embedding, the [EOT] token at each position learns the interaction with the surrounding text at that position; meanwhile, during optimization of the random sliding window reply prediction task, the last [EOT] token in the window is used to build a classification task and gradually learns to recognize the window reply; the representation of the [EOT] token thus gradually learns a correct sentence representation and focuses more on the text of its local region;
step B32: in the feature fusion stage, the local information perception module selects, from the output of the pre-trained language model, the n local semantic representations closest to the reply as multi-granularity local information, and aggregates them into a whole by concatenation, the specific formula being as follows:
Figure FDA0003494680170000051
wherein l denotes the entry closest to the reply, and n is a hyper-parameter giving the number of [EOT] representations to extract;
step B33: the local information perception module integrally fuses local information and global information to obtain a final characterization vector of a main task, and the aggregation process is as follows:
Figure FDA0003494680170000052
step B34: inputting the aggregated characterization vector into the classification layer to compute the rationality score between the current multi-turn dialogue context and the reply:
$$g(c, r) = \sigma(W^{\top} E_{ensemble} + b)$$
wherein W is a trainable parameter, $\sigma(\cdot)$ denotes the sigmoid activation function, and b is the bias term of the classification layer;
step B35: the PLIP model updates its learnable parameters by gradient descent, using cross entropy as the loss function of the multi-turn dialogue reply selection task:
$$Loss_{main} = -\sum_{(c, r, y) \in D} \left[ y \log g(c, r) + (1 - y) \log\left(1 - g(c, r)\right) \right]$$
combined with the optimization objective of the auxiliary task, the final loss function of the model is:
$$Loss = Loss_{main} + \alpha \, Loss_{window}$$
wherein α is a hyper-parameter that controls the influence of the auxiliary task, random sliding window reply prediction, on the model.
6. A local information-aware dialog system employing the method of any one of claims 1 to 5, comprising:
a data collection module, configured to collect multi-turn dialogue samples in a specific domain, annotate each question in the multi-turn dialogue data with positive and negative answer labels, and construct a multi-turn dialogue reply selection training set D with positive and negative labels;
a pre-training language model encoding module, consisting mainly of an embedding layer and a multi-layer multi-head attention mechanism; each training sample of the training set D, in triplet form, is fed into the pre-training language model BERT, and the multi-layer attention mechanism of the pre-training language model is used to learn semantic representations that incorporate the context; meanwhile, the model fully exploits the semantic understanding ability of the pre-training language model through multi-task learning;
an auxiliary task module, configured to export the parameters of the pre-training language model BERT and to use the random sliding window reply prediction task to further enhance the pre-training language model's comprehension of local dialogue information; the random sliding window reply prediction task samples window data of different positions and sizes from the multi-turn dialogue context, encodes the dialogue window with the exported pre-training language model, and predicts the reply of the window using the newly added special tag [EOT], so that the pre-training language model fully learns the local language features of different dialogue stages and dialogue lengths;
a local information perception module, configured to prompt the pre-training language model BERT to generate multi-granularity local semantic information, to calculate the rationality score between the multi-turn dialogue context and the reply, and, in the multi-turn dialogue reply selection task, to evaluate whether the current reply corresponds to the given multi-turn dialogue context; finally, the gradient of each parameter in the deep learning network model is calculated by back propagation according to the target loss function, and the parameters are updated by stochastic gradient descent; and
a network training module, configured to terminate the training of the deep learning network model when the change in the loss value between iterations is smaller than a set threshold or the maximum number of iterations is reached.
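The stopping rule of the network training module can be sketched as follows. This is only an illustration under stated assumptions: `train_until_converged` and `make_toy_step` are hypothetical names, the "iteration change" is taken to be the absolute difference between two consecutive loss values, and the toy step function merely halves a stored loss instead of training a real model.

```python
from typing import Callable, List


def train_until_converged(
    step: Callable[[], float], threshold: float, max_iters: int
) -> List[float]:
    """Call step() until the loss change falls below threshold
    or max_iters iterations have been run; return the loss history."""
    if max_iters < 1:
        raise ValueError("max_iters must be at least 1")
    losses: List[float] = []
    for _ in range(max_iters):
        losses.append(step())
        # Assumption: "change" means |previous loss - current loss|.
        if len(losses) >= 2 and abs(losses[-2] - losses[-1]) < threshold:
            break
    return losses


def make_toy_step(start: float = 1.0, decay: float = 0.5) -> Callable[[], float]:
    """Hypothetical stand-in for one training iteration: returns a loss
    that shrinks by a constant factor on every call."""
    state = {"loss": start}

    def step() -> float:
        current = state["loss"]
        state["loss"] *= decay
        return current

    return step


history = train_until_converged(make_toy_step(), threshold=0.1, max_iters=100)
```

With the toy step, the loss sequence is 1.0, 0.5, 0.25, 0.125, 0.0625, and training stops at the fifth iteration because the last change (0.0625) is below the threshold of 0.1. Lowering `max_iters` to 3 would stop the loop earlier through the second condition.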
CN202210109478.2A 2022-01-28 2022-01-28 Local information perception dialogue method and system based on pre-training language model Pending CN114443827A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210109478.2A CN114443827A (en) 2022-01-28 2022-01-28 Local information perception dialogue method and system based on pre-training language model

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210109478.2A CN114443827A (en) 2022-01-28 2022-01-28 Local information perception dialogue method and system based on pre-training language model

Publications (1)

Publication Number Publication Date
CN114443827A true CN114443827A (en) 2022-05-06

Family

ID=81370746

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210109478.2A Pending CN114443827A (en) 2022-01-28 2022-01-28 Local information perception dialogue method and system based on pre-training language model

Country Status (1)

Country Link
CN (1) CN114443827A (en)

Cited By (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115048944A (en) * 2022-08-16 2022-09-13 之江实验室 Open domain dialogue reply method and system based on theme enhancement
CN115081437A (en) * 2022-07-20 2022-09-20 中国电子科技集团公司第三十研究所 Machine-generated text detection method and system based on linguistic feature contrast learning
CN115129824A (en) * 2022-08-15 2022-09-30 山东交通学院 Search type multi-turn dialogue method and system
CN115129838A (en) * 2022-06-08 2022-09-30 阿里巴巴(中国)有限公司 Model training method, dialogue data processing method, device, equipment and storage medium
CN115310429A (en) * 2022-08-05 2022-11-08 厦门靠谱云股份有限公司 Data compression and high-performance calculation method in multi-turn listening dialogue model
CN115329062A (en) * 2022-10-17 2022-11-11 中邮消费金融有限公司 Dialogue model training method under low-data scene and computer equipment
CN115617971A (en) * 2022-11-14 2023-01-17 湖南君安科技有限公司 Dialog text generation method based on ALBERT-Coref model
CN116932703A (en) * 2023-09-19 2023-10-24 苏州元脑智能科技有限公司 User controllable content generation method, device, equipment and medium
CN116957047A (en) * 2023-09-19 2023-10-27 苏州元脑智能科技有限公司 Sampling network updating method, device, equipment and medium
CN117875434A (en) * 2024-03-13 2024-04-12 中国科学技术大学 Financial large model length extrapolation method for expanding input context length
CN118227796A (en) * 2024-05-23 2024-06-21 国家计算机网络与信息安全管理中心 Automatic classification and threshold optimization method and system for long text specific content

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111274375A (en) * 2020-01-20 2020-06-12 福州大学 Multi-turn dialogue method and system based on bidirectional GRU network
WO2021077974A1 (en) * 2019-10-24 2021-04-29 西北工业大学 Personalized dialogue content generating method
CN112818105A (en) * 2021-02-05 2021-05-18 江苏实达迪美数据处理有限公司 Multi-turn dialogue method and system fusing context information
CN113806508A (en) * 2021-09-17 2021-12-17 平安普惠企业管理有限公司 Multi-turn dialogue method and device based on artificial intelligence and storage medium

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
ZELIN CHEN et al.: "Improving BERT with local context comprehension for multi-turn response selection in retrieval-based dialogue systems", COMPUTER SPEECH & LANGUAGE, vol. 82, 31 July 2023 (2023-07-31), pages 1 - 15 *
LIAO Bin et al.: "A multi-turn dialogue model with local information enhancement and dialogue structure awareness" (一种局部信息增强与对话结构感知的多轮对话模型), Journal of Chinese Computer Systems (小型微型计算机系统), vol. 44, no. 11, 30 November 2023 (2023-11-30), pages 2408 - 2415 *

Cited By (16)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115129838A (en) * 2022-06-08 2022-09-30 阿里巴巴(中国)有限公司 Model training method, dialogue data processing method, device, equipment and storage medium
CN115081437A (en) * 2022-07-20 2022-09-20 中国电子科技集团公司第三十研究所 Machine-generated text detection method and system based on linguistic feature contrast learning
CN115081437B (en) * 2022-07-20 2022-12-09 中国电子科技集团公司第三十研究所 Machine-generated text detection method and system based on linguistic feature contrast learning
CN115310429A (en) * 2022-08-05 2022-11-08 厦门靠谱云股份有限公司 Data compression and high-performance calculation method in multi-turn listening dialogue model
CN115129824A (en) * 2022-08-15 2022-09-30 山东交通学院 Search type multi-turn dialogue method and system
CN115048944A (en) * 2022-08-16 2022-09-13 之江实验室 Open domain dialogue reply method and system based on theme enhancement
CN115329062A (en) * 2022-10-17 2022-11-11 中邮消费金融有限公司 Dialogue model training method under low-data scene and computer equipment
CN115617971A (en) * 2022-11-14 2023-01-17 湖南君安科技有限公司 Dialog text generation method based on ALBERT-Coref model
CN116932703A (en) * 2023-09-19 2023-10-24 苏州元脑智能科技有限公司 User controllable content generation method, device, equipment and medium
CN116957047A (en) * 2023-09-19 2023-10-27 苏州元脑智能科技有限公司 Sampling network updating method, device, equipment and medium
CN116957047B (en) * 2023-09-19 2024-01-23 苏州元脑智能科技有限公司 Sampling network updating method, device, equipment and medium
CN116932703B (en) * 2023-09-19 2024-01-23 苏州元脑智能科技有限公司 User controllable content generation method, device, equipment and medium
CN117875434A (en) * 2024-03-13 2024-04-12 中国科学技术大学 Financial large model length extrapolation method for expanding input context length
CN117875434B (en) * 2024-03-13 2024-06-04 中国科学技术大学 Financial large model length extrapolation method for expanding input context length
CN118227796A (en) * 2024-05-23 2024-06-21 国家计算机网络与信息安全管理中心 Automatic classification and threshold optimization method and system for long text specific content
CN118227796B (en) * 2024-05-23 2024-07-19 国家计算机网络与信息安全管理中心 Automatic classification and threshold optimization method and system for long text specific content

Similar Documents

Publication Publication Date Title
CN114443827A (en) Local information perception dialogue method and system based on pre-training language model
CN111783462A (en) Chinese named entity recognition model and method based on dual neural network fusion
CN111460176B (en) Multi-document machine reading and understanding method based on hash learning
CN111274375A (en) Multi-turn dialogue method and system based on bidirectional GRU network
CN114490991A (en) Dialog structure perception dialog method and system based on fine-grained local information enhancement
CN113673535B (en) Image description generation method of multi-modal feature fusion network
CN112364148B (en) Deep learning method-based generative chat robot
CN116484024A (en) Multi-level knowledge base construction method based on knowledge graph
CN111145914B (en) Method and device for determining text entity of lung cancer clinical disease seed bank
CN114168754A (en) Relation extraction method based on syntactic dependency and fusion information
CN111597816A (en) Self-attention named entity recognition method, device, equipment and storage medium
CN113887836A (en) Narrative event prediction method fusing event environment information
CN115422388B (en) Visual dialogue method and system
CN111813907A (en) Question and sentence intention identification method in natural language question-answering technology
CN114548090B (en) Fast relation extraction method based on convolutional neural network and improved cascade labeling
CN115422945A (en) Rumor detection method and system integrating emotion mining
CN113626537B (en) Knowledge graph construction-oriented entity relation extraction method and system
CN114564568A (en) Knowledge enhancement and context awareness based dialog state tracking method and system
CN114860908A (en) Task-based dialogue state tracking method fusing slot association and semantic association
CN115169363A (en) Knowledge-fused incremental coding dialogue emotion recognition method
CN114648005A (en) Multi-fragment machine reading understanding method and device for multitask joint learning
CN111158640B (en) One-to-many demand analysis and identification method based on deep learning
CN118312612A (en) Chinese multi-label classification method integrating named entity recognition
CN118312833A (en) Hierarchical multi-label classification method and system for travel resources
CN114528407A (en) Emotional feature extraction method of BI-LSTM-CNN based on orthogonal projection

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination