CN114490950A - Training method and storage medium of encoder model, and similarity prediction method and system - Google Patents
- Publication number
- CN114490950A (application number CN202210360834.8A)
- Authority
- CN
- China
- Prior art keywords
- neural network
- text
- encoder model
- text sequence
- similarity
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
- G—PHYSICS; G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/3343—Query execution using phonetics
- G06F16/3344—Query execution using natural language analysis
- G06F18/22—Matching criteria, e.g. proximity measures
- G06F40/289—Phrasal analysis, e.g. finite state techniques or chunking
- G06F40/30—Semantic analysis
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/045—Combinations of networks
- G06N3/047—Probabilistic or stochastic networks
- G06N3/08—Learning methods
Abstract
The invention provides a training method and storage medium for an encoder model, together with a similarity prediction method and system, the training method comprising the following steps: inputting two text sequences into an embedding layer to obtain text sequence vectors; inputting the two text sequence vectors into a twin neural network encoder model, which determines their hidden states based on the same (shared) neural network parameters; constructing a self-supervised loss function from the neural network parameters; inputting the hidden states into a pooling layer, determining the similarity of the two text sequences from the pooled text sequence vectors, and constructing a supervised loss function from that similarity; determining an overall loss function from the self-supervised and supervised loss functions and using it to update the neural network parameters; and continuing to input new text sequences until the value of the loss function reaches its minimum. The method greatly improves inference throughput when computing text-sequence similarity, and the trained neural network encoder model enables accurate computation of the similarity of two text sequences.
Description
Technical Field
The invention relates to the field of text similarity, and in particular to a training method and storage medium for an encoder model, together with a similarity prediction method and system.
Background
Text similarity measures how alike two texts are; its application scenarios include text classification, clustering, topic detection, topic tracking, machine translation, and the like. In particular, monitoring call lines in voice-communication scenarios also requires determining the similarity between texts, but the conversation content captured in such scenarios is noisy, mixed with accents, and incomplete. In the prior art, whether conversation contents are similar must be checked manually, which consumes substantial manpower and time.
Disclosure of Invention
The invention aims to overcome at least one defect of the prior art by providing a training method and storage medium for an encoder model, together with a similarity prediction method and system, to solve the prior-art problems that determining text similarity relies on manual spot checks, yielding small detection coverage and high subjectivity.
The technical scheme adopted by the invention comprises the following steps:
In a first aspect, the present invention provides a method for training a deep neural network encoder model, comprising performing a training operation on two different text sequences, the training operation being: inputting the two text sequences into an embedding layer for vectorization to obtain two text sequence vectors; inputting the two text sequence vectors into a twin neural network encoder model so that the encoder model determines the hidden states of the two text sequence vectors based on the same neural network parameters, while constructing a self-supervised loss function of the encoder model from those neural network parameters; inputting the hidden states of the two text sequence vectors into a pooling layer so that the pooling layer pools the two text sequence vectors according to their hidden states, and determining the similarity of the two text sequences from the two pooled text sequence vectors; constructing a supervised loss function of the encoder model from the similarity of the two text sequences; determining a loss function of the encoder model from the self-supervised loss function and the supervised loss function, so that the encoder model updates its neural network parameters according to the loss function; and continuing to perform the training operation on two new, different text sequences until the value of the loss function reaches its minimum, yielding the trained neural network encoder model.
In a second aspect, the invention provides a method for predicting the similarity of text sequences, comprising: inputting two different text sequences into an embedding layer for vectorization to obtain two text sequence vectors; inputting the two text sequence vectors into a twin neural network encoder model obtained by the above training method, so that the encoder model outputs the hidden states of the two text sequence vectors; inputting the hidden states into a pooling layer so that the pooling layer pools the two text sequence vectors according to their hidden states; and determining the similarity of the two text sequences from the two pooled text sequence vectors.
In a third aspect, the present invention provides a system for predicting the similarity of text sequences, comprising a word input module, a word embedding module, a twin neural network encoder model trained by the above training method, a hidden-state pooling module, and a vector similarity calculation module. The word input module serializes two pieces of externally input text data into two different text sequences and outputs them to the word embedding module; the word embedding module vectorizes the two text sequences into two text sequence vectors and outputs them to the neural network encoder model; the encoder model determines the hidden states of the two text sequence vectors based on the neural network parameters and outputs them to the hidden-state pooling module; the pooling module pools the two text sequence vectors according to their hidden states and outputs the pooled vectors to the vector similarity calculation module; and the vector similarity calculation module determines the similarity of the two text sequences from the two pooled text sequence vectors.
In a fourth aspect, the present invention provides a computer-readable storage medium on which a computer program is stored which, when executed by a processor, implements the above method for training a deep neural network encoder model and/or the above method for predicting the similarity of text sequences.
Compared with the prior art, the invention has the beneficial effects that:
The training method of the encoder model provided herein yields a trained twin neural network encoder model whose branches share the same neural network parameters, which greatly increases inference throughput when computing semantic similarity between text sequences; the trained encoder model enables accurate computation of the similarity of two text sequences. Moreover, during training the encoder model is trained jointly in a combined self-supervised and supervised manner, so the finally updated neural network parameters help improve the accuracy of the model's semantic-level similarity computation.
Drawings
FIG. 1 is a schematic flow chart of the method steps S110-S180 in example 1.
Fig. 2 is a schematic diagram of a training process of the neural network encoder model according to embodiment 1.
Fig. 3 is a schematic diagram of a hidden state calculation process of the neural network encoder model according to embodiment 1.
FIG. 4 is a flowchart illustrating steps S210-S240 of the method of embodiment 2.
Fig. 5 is a schematic diagram of a prediction process of the prediction method of embodiment 2.
Fig. 6 is a schematic diagram of a prediction process of the prediction system of embodiment 3.
Detailed Description
The drawings are only for purposes of illustration and are not to be construed as limiting the invention. For a better understanding of the following embodiments, certain features of the drawings may be omitted, enlarged or reduced, and do not represent the size of an actual product; it will be understood by those skilled in the art that certain well-known structures in the drawings and descriptions thereof may be omitted.
Example 1
This embodiment provides a training method for a deep neural network encoder model, used to train a twin (Siamese) neural network encoder model. In a broad sense the twin neural network may consist of two sub-networks or of a single network; the key point is that the twin networks share the same neural network parameters.
As shown in fig. 1 and 2, the method includes the following steps:
s110, inputting two different text sequences into the embedding layer for vectorization to obtain two text sequence vectors;
In this step, a text sequence is text data that has been preprocessed to satisfy an input format compatible with the embedding layer. In a specific embodiment, the preprocessing comprises:
cleaning the raw text data: reading a preset list of special symbols, stop words, and a user dictionary; removing the special symbols from the text data; segmenting the text into words using the user dictionary; and removing any stop words present. The text data is then converted into a number of sub-text sequences, which are sorted and concatenated by length and cut according to the preset training-batch size, yielding a number of text sequences as training data.
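A minimal Python sketch of the preprocessing described above (cleaning, stop-word removal, word segmentation, length-sorting, and batch cutting); the stop-word list, symbol pattern, and batch size are illustrative assumptions, and whitespace splitting stands in for dictionary-based word segmentation:

```python
import re

STOPWORDS = {"the", "a", "of"}        # placeholder stop-word list
SPECIAL = re.compile(r"[^\w\s]")      # placeholder special-symbol pattern

def preprocess(raw_texts, batch_size=2):
    sequences = []
    for text in raw_texts:
        text = SPECIAL.sub("", text.lower())                       # remove special symbols
        tokens = [t for t in text.split() if t not in STOPWORDS]   # drop stop words
        sequences.append(tokens)
    sequences.sort(key=len)            # sort sub-text sequences by length
    # cut into training batches of the preset size
    return [sequences[i:i + batch_size] for i in range(0, len(sequences), batch_size)]

batches = preprocess(["Hello, world!", "A longer example sentence here.", "Short one."])
```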
The training method provided by this embodiment trains a neural network encoder model for computing text-sequence similarity, so the label of each group of text sequences is the true similarity between its two different text sequences. Each group of text sequences selected as input is converted into integer data before entering the embedding layer. In a preferred embodiment, a Tokenizer may be employed to convert the text data into integer data.
The embedding layer is used for converting an input text sequence into a vector with a fixed size, specifically mapping the text sequence into a vector space, thereby obtaining text sequence vectors of two text sequences.
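The tokenization-plus-embedding step can be sketched as follows; the tiny vocabulary, embedding size, and random embedding matrix are assumptions, standing in for a Tokenizer and a learned embedding layer:

```python
import numpy as np

vocab = {"how": 0, "are": 1, "you": 2, "doing": 3}   # assumed toy vocabulary
d_model = 8                                          # assumed fixed embedding size
rng = np.random.default_rng(0)
E = rng.normal(size=(len(vocab), d_model))           # embedding matrix, one row per token

def embed(tokens):
    ids = np.array([vocab[t] for t in tokens])       # text sequence -> integer data
    return E[ids]                                    # lookup -> (seq_len, d_model) vectors

v1 = embed(["how", "are", "you"])
v2 = embed(["how", "are", "you", "doing"])
```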
S120, inputting the two text sequence vectors into a twin neural network encoder model so that the neural network encoder model determines the hidden states of the two text sequence vectors;
in this step, after receiving the two text sequence vectors, the neural network encoder model determines the hidden states of the two text sequence vectors based on the same neural network parameters. The neural network parameters refer to parameters of a neural network encoder model backbone network. The hidden state is a high-dimensional vector obtained by a series of matrix operations and nonlinear transformation in a neural network.
When the neural network encoder model is initialized, GPU memory is allocated for each internal module, the pre-trained parameters are loaded, and the neural network parameters are read. In a specific embodiment, the encoder model may be implemented with a BERT (Bidirectional Encoder Representations from Transformers) pre-trained language model; on initialization, the pre-trained BERT parameters are loaded and the neural network parameters are then read.
As shown in fig. 3, in a specific implementation process, the neural network encoder model is composed of N neural network encoder sub-modules, and is used for iteratively calculating the hidden state of the text sequence vector.
A single encoder sub-module of the neural network encoder model receives the two text sequence vectors x1 and x2. It first determines the hidden state of each text sequence vector and applies layer normalization to the result, to alleviate gradient explosion during training. The layer-normalized hidden state is then fed through the sub-module's residual block, which avoids gradient vanishing caused by the large number of network layers in the encoder model. Finally, the hidden state output by the residual block is processed by the sub-module's fully connected layer, yielding the hidden state u1 corresponding to x1 and the hidden state u2 corresponding to x2.
The N encoder sub-modules are connected in series. Each sub-module computes its hidden state of the text sequence vector based on its own internal neural network parameters and outputs it as the input of the next sub-module, until the last sub-module outputs the hidden state of the text sequence vector as the hidden state output by the final model.
In particular, each encoder sub-module in the neural network encoder model may determine the hidden state of a text sequence vector according to the equation
$u = f(\mathrm{Attention}(W, x))$
where $u$ is the hidden state of the text sequence vector, $f$ is a nonlinear activation function, $\mathrm{Attention}(\cdot)$ is the attention-mechanism transformation, $W$ denotes the neural network parameters, and $x$ is the input text sequence vector.
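A schematic numpy sketch of one encoder sub-module and the serial stack of N of them, following the attention, layer-normalization, residual, and fully connected structure described above; the weights, dimensions, and the choice of tanh as the nonlinear activation f are assumptions:

```python
import numpy as np

def layer_norm(h, eps=1e-6):
    # normalise each position's hidden vector to zero mean / unit variance
    return (h - h.mean(-1, keepdims=True)) / (h.std(-1, keepdims=True) + eps)

def self_attention(x):
    scores = x @ x.T / np.sqrt(x.shape[-1])          # scaled dot-product scores
    w = np.exp(scores - scores.max(-1, keepdims=True))
    w /= w.sum(-1, keepdims=True)                    # softmax over positions
    return w @ x

def encoder_submodule(x, W_ff):
    h = layer_norm(self_attention(x))                # attention + layer normalization
    h = h + x                                        # residual connection
    return np.tanh(h @ W_ff)                         # fully connected layer, nonlinear f

def encode(x, weights):
    for W_ff in weights:                             # N sub-modules in series
        x = encoder_submodule(x, W_ff)               # output feeds the next sub-module
    return x                                         # hidden state of the final model

rng = np.random.default_rng(0)
d = 8
weights = [rng.normal(scale=0.1, size=(d, d)) for _ in range(3)]  # assumed N = 3
u = encode(rng.normal(size=(5, d)), weights)
```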
S130, constructing a self-supervised loss function of the neural network encoder model according to the neural network parameters;
the variables of the self-supervised loss function are the neural network parameters of the encoder model; the parameters are updated by gradient descent so that the loss function reaches its minimum.
In a specific embodiment, the self-supervised loss function is:
$L_{self} = -\sum_{\hat{x} \in D_{MLM}} \log p(\hat{x} = x \mid \theta, \theta_{MLM}) - \sum_{\hat{y} \in D_{NSP}} \log p(\hat{y} = y \mid \theta, \theta_{NSP})$
where $p$ denotes the probability density function, $\theta$ denotes the neural network parameters, $\theta_{MLM}$ the parameters of the masked-language-model output layer, and $\theta_{NSP}$ the parameters of the next-sentence-prediction output layer. The masked language model (MLM) randomly masks some positions in the input text sequence and then predicts the words at the masked positions. The next-sentence-prediction model (NSP) predicts whether two sentences are consecutive. $D_{MLM}$ is the training data set of the masked language model and $D_{NSP}$ that of the next-sentence-prediction model; $\hat{x}$ and $x$ are, respectively, the word the MLM predicts for a masked position and the true word at that position; $\hat{y}$ denotes the connection relation the NSP model predicts between the preceding and following text sequences, and $y$ the true connection relation.
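The random masking that the MLM term of this loss relies on can be sketched as follows; the 15% masking rate and the [MASK] token follow common BERT practice and are assumptions here:

```python
import random

def mask_sequence(tokens, rate=0.15, rng=random.Random(0)):
    """Randomly replace positions with [MASK]; record the true words to predict."""
    masked, targets = [], {}
    for i, tok in enumerate(tokens):
        if rng.random() < rate:
            masked.append("[MASK]")
            targets[i] = tok           # position -> true word the MLM must recover
        else:
            masked.append(tok)
    return masked, targets

masked, targets = mask_sequence(["the", "cat", "sat", "on", "the", "mat"] * 5)
```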
S140, inputting the hidden states of the two text sequence vectors output by the neural network encoder model into a pooling layer, so that the pooling layer performs pooling treatment on the two text sequence vectors according to the hidden states of the two text sequence vectors;
in this step, after receiving the hidden states of the two text sequence vectors, the pooling layer maps the hidden states to a semantic vector space with a fixed size, so as to obtain semantic vectors of the text sequence vectors in a uniform size, that is, the text sequence vectors after pooling. The fixed size is preset.
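A minimal sketch of such pooling, using mean pooling as one common choice (the text does not fix the pooling operator, so this is an assumption): hidden states of different sequence lengths map to semantic vectors of one fixed, uniform size.

```python
import numpy as np

def mean_pool(hidden):
    # hidden: (seq_len, d) -> fixed-size semantic vector of shape (d,)
    return hidden.mean(axis=0)

s1 = mean_pool(np.ones((3, 8)))   # 3-token sequence
s2 = mean_pool(np.ones((7, 8)))   # 7-token sequence, same output size
```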
S150, determining the similarity of the two text sequences according to the two text sequence vectors after the pooling treatment;
In this step, the similarity of the two text sequences can be determined by any method commonly used in the prior art for computing the similarity between two vectors. In a specific embodiment, the cosine-similarity formula may be used:
$sim(a, b) = \dfrac{u_1 \cdot u_2}{\|u_1\| \, \|u_2\|}$
where $sim(a, b)$ is the similarity of the two text sequences, $a$ and $b$ denote the two text sequences, $u_1 \cdot u_2$ is the vector (dot) product of the two pooled text sequence vectors, and $\|u_1\| \, \|u_2\|$ is the product of their moduli.
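The cosine-similarity computation can be written directly as a small numpy function:

```python
import numpy as np

def cosine_similarity(u1, u2):
    # dot product of the pooled vectors over the product of their moduli
    return float(u1 @ u2 / (np.linalg.norm(u1) * np.linalg.norm(u2)))

sim_same = cosine_similarity(np.array([1.0, 2.0]), np.array([2.0, 4.0]))  # parallel vectors
sim_orth = cosine_similarity(np.array([1.0, 0.0]), np.array([0.0, 1.0]))  # orthogonal vectors
```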
S160, constructing a supervised loss function of the neural network encoder model according to the similarity of the two text sequences;
The supervised loss function is constructed from the similarity determined by the neural network encoder model and the true similarity of the two text sequences. Since the similarity is computed from the pooled text sequence vectors, which are in turn obtained from the hidden states output by the encoder model, and the hidden states depend on the neural network parameters, the neural network parameters necessarily influence the similarity computation for the two text sequences.
In a specific embodiment, the supervised loss function is:
$L_{sup} = \frac{1}{B} \sum_{i=1}^{B} \big( sim(a_i, b_i) - y_i \big)^2$
where $y_i$ is the true similarity of the text sequences $a_i$ and $b_i$, and $B$ is the number of text sequence pairs taken each time the training operation is performed.
S170, determining a loss function of the neural network encoder model according to the self-supervised loss function and the supervised loss function, so that the encoder model updates its neural network parameters according to the loss function;
in this step, the loss function of the neural network encoder model is constructed by combining the self-supervised and supervised loss functions; that is, the encoder is trained jointly in both a self-supervised and a supervised manner, which helps obtain the optimal solution for the neural network parameters. The two loss functions may be combined by adding them or by any other suitable operation.
In a specific embodiment, the loss function is:
$L = \alpha L_{self} + (1 - \alpha) L_{sup}$
where $L_{self}$ is the self-supervised loss function, $L_{sup}$ is the supervised loss function, and $\alpha$ is a hyperparameter for adjusting the weights; that is, adjusting the value of $\alpha$ adjusts the weights of the self-supervised and supervised loss functions in the overall loss function, with $\alpha$ satisfying $\alpha < 1$.
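A one-line sketch of combining the two losses; the weighted-sum form and the name alpha for the weighting hyperparameter (kept below 1) are assumptions reconstructed from the description:

```python
def total_loss(loss_self, loss_sup, alpha=0.3):
    # alpha trades off the self-supervised term against the supervised term
    assert 0.0 < alpha < 1.0            # alpha satisfies "less than 1"
    return alpha * loss_self + (1.0 - alpha) * loss_sup

loss = total_loss(2.0, 1.0, alpha=0.25)
```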
And S180, judging whether the numerical value of the loss function reaches the minimum value, if not, updating the neural network parameters, and re-executing the step S110 on the new two different text sequences, if so, obtaining the trained neural network encoder model.
Because each execution of the above steps inputs only one pair of different text sequences into the neural network encoder model, step S110 must be executed again: new text sequences are continually input to train the encoder model, and its neural network parameters are continually updated by gradient descent during training, until the value of the loss function reaches its minimum. Training is then complete, yielding the trained neural network encoder model.
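The loop of steps S110 through S180 can be illustrated schematically with a toy quadratic loss standing in for the real encoder loss; only the control flow (update parameters by gradient descent until the loss stops decreasing) mirrors the method, and every name here is an assumption:

```python
def train(theta=5.0, lr=0.1, tol=1e-8, max_steps=1000):
    def loss(t):                        # stand-in for L = alpha*L_self + (1-alpha)*L_sup
        return (t - 2.0) ** 2
    def grad(t):
        return 2.0 * (t - 2.0)
    prev = loss(theta)
    for _ in range(max_steps):
        theta -= lr * grad(theta)       # gradient-descent parameter update
        cur = loss(theta)
        if prev - cur < tol:            # loss value has reached its minimum
            break
        prev = cur
    return theta

theta_star = train()                    # converges to the minimiser of the toy loss
```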
The training method for a deep neural network encoder model provided by this embodiment trains a twin neural network encoder model. The trained encoder model greatly improves inference throughput when computing semantic similarity between text sequences, and enables accurate computation of the similarity of two text sequences. Moreover, during training the loss function is constructed by combining self-supervision and supervision to train the encoder jointly, and the finally updated neural network parameters help improve the accuracy of the model's semantic-level similarity computation. Because the encoder model captures contextual semantic information well, when applied to multi-turn conversation scenarios such as communication lines it can distinguish different conversation scenes more intelligently and automatically, discover abnormal communication behavior in time, and raise the degree of intelligence of voice-service management.
Example 2
Based on the same concept as embodiment 1, this embodiment provides a method for predicting the similarity of text sequences, which predicts the similarity of two different text sequences using a neural network encoder model obtained by the training method of embodiment 1.
As shown in fig. 4 and 5, the method includes:
s210, inputting two different text sequences into the embedded layer for vectorization to obtain two text sequence vectors;
Before this step is performed, the two pieces of text data whose similarity is to be predicted may be determined and preprocessed (e.g., serialized) so that they become two text sequences compatible with the embedding layer, the neural network encoder model, and the pooling layer.
S220, inputting the two text sequence vectors into the trained neural network encoder model so that the neural network encoder model outputs the hidden states of the two text sequence vectors;
After the trained neural network encoder model receives the two text sequence vectors, each encoder sub-module of the model determines the hidden state of a text sequence vector according to the formula $u = f(\mathrm{Attention}(W, x))$, where $u$ is the hidden state of the text sequence vector, $f$ is a nonlinear activation function, $\mathrm{Attention}(\cdot)$ is the attention-mechanism transformation, $W$ denotes the neural network parameters, and $x$ is the input text sequence vector.
In a specific implementation, the neural network encoder model comprises a plurality of encoder sub-modules connected in series, the output of each sub-module serving as the input of the next so as to iteratively compute the hidden state of the text sequence vector; the hidden state output by the last sub-module is the hidden state output by the final model.
S230, inputting the hidden states of the two text sequence vectors into a pooling layer so that the pooling layer performs pooling treatment on the two text sequence vectors according to the hidden states of the two text sequence vectors;
and after receiving the hidden states of the two text sequence vectors, the pooling layer maps the hidden states of the two text sequences to a semantic vector space with a fixed size to obtain the semantic vectors with a uniform size.
S240, determining the similarity of the two text sequences according to the two pooled text sequence vectors, for example using the cosine-similarity formula:
$sim(a, b) = \dfrac{u_1 \cdot u_2}{\|u_1\| \, \|u_2\|}$
where $sim(a, b)$ is the similarity of the two text sequences, $a$ and $b$ denote the two text sequences, $u_1 \cdot u_2$ is the vector product of the two pooled text sequence vectors, and $\|u_1\| \, \|u_2\|$ is the product of their moduli.
The twin neural network encoder model obtained by the training method of embodiment 1 achieves high accuracy in semantic-level similarity computation based on the determined neural network parameters. When the input text sequences are conversation content monitored on a communication line, the encoder model can distinguish different conversation scenes more intelligently and automatically, discover abnormal communication behavior in time, and raise the degree of intelligence of voice-service management.
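The full prediction flow S210 through S240 can be sketched end to end; the toy vocabulary, random embedding matrix, and the tanh stand-in for the trained encoder are illustrative assumptions:

```python
import numpy as np

vocab = {"good": 0, "morning": 1, "evening": 2}      # assumed toy vocabulary
rng = np.random.default_rng(0)
E = rng.normal(size=(len(vocab), 8))                 # assumed embedding matrix

def predict_similarity(tokens_a, tokens_b):
    def encode(tokens):
        x = E[[vocab[t] for t in tokens]]            # S210: embedding layer
        h = np.tanh(x)                               # S220: stand-in encoder hidden state
        return h.mean(axis=0)                        # S230: pooling to a fixed size
    u1, u2 = encode(tokens_a), encode(tokens_b)
    # S240: cosine similarity of the two pooled vectors
    return float(u1 @ u2 / (np.linalg.norm(u1) * np.linalg.norm(u2)))

sim = predict_similarity(["good", "morning"], ["good", "evening"])
```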
The method for predicting similarity of text sequences provided in this embodiment is based on the same concept as that of embodiment 1, and therefore, the same steps and terms, definitions, explanations, specific/preferred embodiments, and beneficial effects thereof as those in embodiment 1 can be referred to the description in embodiment 1, and are not repeated in this embodiment.
Example 3
Based on the same concept as that in embodiments 1 and 2, this embodiment provides a text sequence similarity prediction system, which mainly predicts the similarity between two different text sequences by using a neural network encoder model obtained by training through the neural network encoder model training method provided in embodiment 1.
As shown in fig. 5, the system includes: the word input module 310, the word embedding module 320, the neural network encoder model trained by the training method provided in embodiment 1, the hidden state pooling module 330, and the vector similarity calculation module 340.
The word input module 310 is configured to receive two pieces of externally input text data, serialize them to obtain two different text sequences, and output the two text sequences to the word embedding module 320.
The word embedding module 320 is configured to vectorize the two text sequences, specifically to map each text sequence into a vector space, obtaining the two text sequence vectors, and to output them to the neural network encoder model. The neural network encoder model is used to determine the hidden states of the two text sequence vectors based on the neural network parameters and output them to the hidden state pooling module 330.
After the trained neural network encoder model receives the two text sequence vectors, each encoder model submodule determines the hidden state of a text sequence vector according to the formula h = f(Attention(x; W)), where h is the hidden state of the text sequence vector, f is a non-linear activation function, Attention is the attention-mechanism transformation, W denotes the neural network parameters, and x is the input text sequence vector.
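A toy sketch of one encoder submodule computing h = f(Attention(x; W)): scaled dot-product self-attention (with the inputs standing in for the query, key and value projections) followed by a tanh non-linearity as f, and a single scalar `w` standing in for the learned parameters W. All names are illustrative assumptions, not from the patent:

```python
import math

def softmax(scores):
    m = max(scores)                       # subtract max for stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attention(x):
    # Scaled dot-product self-attention over the token vectors in x,
    # using the inputs themselves as queries, keys and values for brevity.
    d = len(x[0])
    out = []
    for q in x:
        weights = softmax([sum(a * b for a, b in zip(q, k)) / math.sqrt(d)
                           for k in x])
        out.append([sum(w * v[i] for w, v in zip(weights, x))
                    for i in range(d)])
    return out

def encoder_submodule(x, w=1.0):
    # h = f(Attention(x; W)): attention transform, then tanh as the
    # non-linear activation f; scalar w stands in for parameters W.
    return [[math.tanh(w * v) for v in row] for row in attention(x)]
```

A real encoder layer would learn separate query/key/value projection matrices as part of W; the sketch only shows the shape of the computation.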
In a specific implementation, the neural network encoder model comprises a plurality of encoder submodules connected in series: the output of each submodule serves as the input of the next, iteratively computing the hidden state of the text sequence vector, and the hidden state output by the last encoder submodule is the final hidden state output by the model.
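The series chaining described above amounts to a simple fold over the submodules; the lambda "layers" below are toy stand-ins for real encoder submodules, and `encode` is a hypothetical name:

```python
def encode(x, submodules):
    # Serially chained encoder submodules: each submodule consumes the
    # previous one's output, and the last submodule's output is the
    # final hidden state returned by the model.
    h = x
    for submodule in submodules:
        h = submodule(h)
    return h

# Toy stand-ins for real encoder submodules:
layers = [lambda h: [v + 1 for v in h],   # "submodule 1"
          lambda h: [v * 2 for v in h]]   # "submodule 2"
result = encode([0, 1], layers)
```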
The hidden state pooling module 330 is configured to pool the two text sequence vectors according to their hidden states, specifically to map the hidden states of the two text sequences into a semantic vector space of fixed size, obtaining semantic vectors of uniform size, and to output these to the vector similarity calculation module 340 as the pooled text sequence vectors.
The vector similarity calculation module 340 is configured to determine the similarity between two text sequences according to the two text sequence vectors after the pooling process.
The vector similarity calculation module 340 is specifically configured to determine the similarity of the two text sequences using the formula s = (u · v) / (|u| · |v|), where s is the similarity of the two text sequences, u and v respectively represent the two text sequences, u · v is the vector product of the two pooled text sequence vectors, and |u| · |v| is the product of the moduli of the two pooled text sequence vectors.
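Under stated assumptions — a toy character-code embedding, an identity stub in place of the trained encoder, and mean pooling — the module pipeline 310 → 320 → encoder → 330 → 340 can be sketched end to end. Every function name here is illustrative, and the real system would use the learned encoder parameters rather than an identity stub:

```python
import math

def word_input(text):                       # module 310: serialize text
    return text.split()

def word_embedding(tokens, dim=8):          # module 320: token -> vector (toy)
    return [[math.sin(sum(ord(c) for c in t) + i) for i in range(dim)]
            for t in tokens]

def encoder(seq):                           # trained encoder model (identity stub)
    return seq

def hidden_state_pooling(hidden):           # module 330: mean-pool to fixed size
    n, dim = len(hidden), len(hidden[0])
    return [sum(h[i] for h in hidden) / n for i in range(dim)]

def vector_similarity(u, v):                # module 340: cosine similarity
    dot = sum(a * b for a, b in zip(u, v))
    mod = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / mod

def predict_similarity(text_a, text_b):
    # Both texts pass through the same (twin) encoder branch.
    pooled = [hidden_state_pooling(encoder(word_embedding(word_input(t))))
              for t in (text_a, text_b)]
    return vector_similarity(pooled[0], pooled[1])
```

Because both branches share the same modules and parameters, identical inputs necessarily score 1, which is the defining property of the twin (Siamese) arrangement.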
The text sequence similarity prediction system provided in this embodiment is based on the same concept as embodiments 1 and 2; the same steps, terms, definitions, explanations, specific/preferred implementations, and beneficial effects are as described in embodiments 1 and 2 and are not repeated here.
It should be understood that the above embodiments of the present invention are merely examples for clearly illustrating its technical solutions and are not intended to limit its specific implementations. Any modification, equivalent replacement, or improvement made within the spirit and principles of the claims of the present invention shall fall within the protection scope of those claims.
Claims (10)
1. A training method of a deep neural network encoder model is characterized by comprising the following steps:
performing training operation on two different text sequences;
the training operation is as follows:
inputting the two text sequences into an embedded layer for vectorization to obtain two text sequence vectors;
inputting the two text sequence vectors into a twin neural network encoder model so that the neural network encoder model determines the hidden states of the two text sequence vectors based on the same neural network parameters;
simultaneously constructing a self-supervised loss function of the neural network encoder model according to the neural network parameters;
inputting the hidden states of the two text sequence vectors into a pooling layer, so that the pooling layer pools the two text sequence vectors according to the hidden states of the two text sequence vectors, and determines the similarity of the two text sequences according to the two text sequence vectors after pooling;
constructing a supervised loss function of the neural network encoder model according to the similarity of the two text sequences;
determining a loss function of the neural network encoder model according to the self-supervised loss function and the supervised loss function, so that the neural network encoder model updates the neural network parameters according to the loss function;
and continuing to perform the training operation on two new different text sequences until the value of the loss function reaches its minimum, obtaining the trained neural network encoder model.
2. The training method of the deep neural network encoder model according to claim 1, wherein determining a loss function of the neural network encoder model according to the self-supervised loss function and the supervised loss function specifically comprises: taking the sum of the self-supervised loss function and the supervised loss function as the loss function of the neural network encoder model.
3. The training method of the deep neural network encoder model according to claim 1, wherein determining the similarity of the two text sequences according to the two pooled text sequence vectors specifically comprises: determining the similarity of the two text sequences using the formula s = (u · v) / (|u| · |v|);
wherein s is the similarity of the two text sequences, u and v respectively represent the two text sequences, u · v is the vector product of the two pooled text sequence vectors, and |u| · |v| is the product of the moduli of the two pooled text sequence vectors.
4. The training method of the deep neural network encoder model according to claim 3, wherein the supervised loss function is:;
5. The training method of the deep neural network encoder model according to claim 4, wherein the self-supervised loss function is:
L_self(θ, θ1, θ2) = −Σ_{D1} log p(m̂ = m | θ, θ1) − Σ_{D2} log p(n̂ = n | θ, θ2)
wherein p represents the probability density function, θ denotes the neural network parameters, θ1 and θ2 respectively represent the output-layer parameters corresponding to the masked language model and the next-sentence prediction model, D1 and D2 are respectively the training data sets of the masked language model and the next-sentence prediction model, m̂ and m are respectively the predicted words and the real words of the masked language model, n̂ represents the predicted connection relation between the two preceding and following text sequences output by the next-sentence prediction model, and n represents the true connection relation between the two preceding and following text sequences.
7. The training method of the deep neural network encoder model according to claim 1, wherein the neural network encoder model determining the hidden states of the two text sequence vectors based on the same neural network parameters specifically comprises:
the neural network encoder model determines the hidden states of the two text sequence vectors using the formula h = f(Attention(x; W)), wherein h is the hidden state of the text sequence vector, f is a non-linear activation function, Attention is the attention-mechanism transformation, W denotes the neural network parameters, and x is the input text sequence vector;
8. A method for predicting similarity of text sequences, characterized by comprising:
inputting two different text sequences into the embedding layer for vectorization to obtain two text sequence vectors;
inputting the two text sequence vectors into a twin neural network encoder model obtained by the training method of the deep neural network encoder model according to any one of claims 1 to 7, so that the neural network encoder model outputs the hidden states of the two text sequence vectors;
inputting the hidden states of the two text sequence vectors into a pooling layer, so that the pooling layer performs pooling processing on the two text sequence vectors according to the hidden states of the two text sequence vectors;
and determining the similarity of the two text sequences according to the two text sequence vectors after the pooling treatment.
9. A system for predicting similarity of text sequences, characterized by comprising: a word input module, a word embedding module, a twin neural network encoder model obtained by the training method of the deep neural network encoder model according to any one of claims 1 to 7, a hidden state pooling module, and a vector similarity calculation module;
the word input module is configured to serialize two different pieces of externally input text data to obtain two different text sequences and output them to the word embedding module;
the word embedding module is used for vectorizing the two text sequences to obtain two text sequence vectors and outputting the two text sequence vectors to the neural network encoder model;
the neural network encoder model is used for determining the hidden states of the two text sequence vectors based on the neural network parameters and outputting the hidden states to a hidden state pooling module;
the hidden state pooling module is used for pooling the two text sequence vectors according to the hidden states of the two text sequence vectors and outputting the pooled text sequence vectors to the vector similarity calculation module;
and the vector similarity calculation module is used for determining the similarity of the two text sequences according to the two text sequence vectors after the pooling processing.
10. A computer-readable storage medium, on which a computer program is stored, which, when being executed by a processor, implements the method for training a deep neural network encoder model according to any one of claims 1 to 7 and/or the method for predicting similarity of text sequences according to claim 8.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202210360834.8A CN114490950B (en) | 2022-04-07 | 2022-04-07 | Method and storage medium for training encoder model, and method and system for predicting similarity |
Publications (2)
Publication Number | Publication Date |
---|---|
CN114490950A true CN114490950A (en) | 2022-05-13 |
CN114490950B CN114490950B (en) | 2022-07-12 |
Family
ID=81487384
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202210360834.8A Active CN114490950B (en) | 2022-04-07 | 2022-04-07 | Method and storage medium for training encoder model, and method and system for predicting similarity |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN114490950B (en) |
Citations (15)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
EP3054403A2 (en) * | 2015-02-06 | 2016-08-10 | Google, Inc. | Recurrent neural networks for data item generation |
CN108388888A (en) * | 2018-03-23 | 2018-08-10 | 腾讯科技(深圳)有限公司 | A kind of vehicle identification method, device and storage medium |
CN109614471A (en) * | 2018-12-07 | 2019-04-12 | 北京大学 | A kind of open-ended question automatic generation method based on production confrontation network |
CN110009013A (en) * | 2019-03-21 | 2019-07-12 | 腾讯科技(深圳)有限公司 | Encoder training and characterization information extracting method and device |
CN110347839A (en) * | 2019-07-18 | 2019-10-18 | 湖南数定智能科技有限公司 | A kind of file classification method based on production multi-task learning model |
US20200026954A1 (en) * | 2019-09-27 | 2020-01-23 | Intel Corporation | Video tracking with deep siamese networks and bayesian optimization |
CN111144565A (en) * | 2019-12-27 | 2020-05-12 | 中国人民解放军军事科学院国防科技创新研究院 | Self-supervision field self-adaptive deep learning method based on consistency training |
CN112149689A (en) * | 2020-09-28 | 2020-12-29 | 上海交通大学 | Unsupervised domain adaptation method and system based on target domain self-supervised learning |
CN112396479A (en) * | 2021-01-20 | 2021-02-23 | 成都晓多科技有限公司 | Clothing matching recommendation method and system based on knowledge graph |
CN113159945A (en) * | 2021-03-12 | 2021-07-23 | 华东师范大学 | Stock fluctuation prediction method based on multitask self-supervision learning |
US20210326660A1 (en) * | 2020-04-21 | 2021-10-21 | Google Llc | Supervised Contrastive Learning with Multiple Positive Examples |
CN113553906A (en) * | 2021-06-16 | 2021-10-26 | 之江实验室 | Method for discriminating unsupervised cross-domain pedestrian re-identification based on class center domain alignment |
CN113705772A (en) * | 2021-07-21 | 2021-11-26 | 浪潮(北京)电子信息产业有限公司 | Model training method, device and equipment and readable storage medium |
CN113936647A (en) * | 2021-12-17 | 2022-01-14 | 中国科学院自动化研究所 | Training method of voice recognition model, voice recognition method and system |
CN114003698A (en) * | 2021-12-27 | 2022-02-01 | 成都晓多科技有限公司 | Text retrieval method, system, equipment and storage medium |
Non-Patent Citations (2)
Title |
---|
XU Liuchang: "Research on Semantic Address Matching and a Semantic Space Fusion Model under a Pre-trained Deep Learning Architecture", China Doctoral Dissertations Full-text Database, Basic Sciences * |
ZHAO Longlong: "Rolling Bearing Fault Diagnosis Based on a Combined Deep Learning Model", China Master's Theses Full-text Database, Engineering Science and Technology II * |
Cited By (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN114743545A (en) * | 2022-06-14 | 2022-07-12 | 联通(广东)产业互联网有限公司 | Dialect type prediction model training method and device and storage medium |
CN114743545B (en) * | 2022-06-14 | 2022-09-02 | 联通(广东)产业互联网有限公司 | Dialect type prediction model training method and device and storage medium |
WO2024067779A1 (en) * | 2022-09-30 | 2024-04-04 | 华为技术有限公司 | Data processing method and related apparatus |
CN115357690A (en) * | 2022-10-19 | 2022-11-18 | 有米科技股份有限公司 | Text repetition removing method and device based on text mode self-supervision |
CN115660871A (en) * | 2022-11-08 | 2023-01-31 | 上海栈略数据技术有限公司 | Medical clinical process unsupervised modeling method, computer device, and storage medium |
CN115660871B (en) * | 2022-11-08 | 2023-06-06 | 上海栈略数据技术有限公司 | Unsupervised modeling method for medical clinical process, computer equipment and storage medium |
Also Published As
Publication number | Publication date |
---|---|
CN114490950B (en) | 2022-07-12 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN114490950B (en) | Method and storage medium for training encoder model, and method and system for predicting similarity | |
CN112116030B (en) | Image classification method based on vector standardization and knowledge distillation | |
Gu et al. | Stack-captioning: Coarse-to-fine learning for image captioning | |
CN108427771B (en) | Abstract text generation method and device and computer equipment | |
WO2022142041A1 (en) | Training method and apparatus for intent recognition model, computer device, and storage medium | |
CN111708882B (en) | Transformer-based Chinese text information missing completion method | |
CN110648659B (en) | Voice recognition and keyword detection device and method based on multitask model | |
CN112528637B (en) | Text processing model training method, device, computer equipment and storage medium | |
CN109902301B (en) | Deep neural network-based relationship reasoning method, device and equipment | |
CN111930914B (en) | Problem generation method and device, electronic equipment and computer readable storage medium | |
US20220343139A1 (en) | Methods and systems for training a neural network model for mixed domain and multi-domain tasks | |
WO2020155619A1 (en) | Method and apparatus for chatting with machine with sentiment, computer device and storage medium | |
CN111813954B (en) | Method and device for determining relationship between two entities in text statement and electronic equipment | |
JP6738769B2 (en) | Sentence pair classification device, sentence pair classification learning device, method, and program | |
CN112101042A (en) | Text emotion recognition method and device, terminal device and storage medium | |
CN110942774A (en) | Man-machine interaction system, and dialogue method, medium and equipment thereof | |
CN113254615A (en) | Text processing method, device, equipment and medium | |
CN115064154A (en) | Method and device for generating mixed language voice recognition model | |
CN115204143A (en) | Method and system for calculating text similarity based on prompt | |
CN115757695A (en) | Log language model training method and system | |
CN115687609A (en) | Zero sample relation extraction method based on Prompt multi-template fusion | |
CN114662601A (en) | Intention classification model training method and device based on positive and negative samples | |
CN114925681A (en) | Knowledge map question-answer entity linking method, device, equipment and medium | |
CN113177113B (en) | Task type dialogue model pre-training method, device, equipment and storage medium | |
CN115495579A (en) | Method and device for classifying text of 5G communication assistant, electronic equipment and storage medium |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||