CN114490950A - Training method and storage medium of encoder model, and similarity prediction method and system - Google Patents

Training method and storage medium of encoder model, and similarity prediction method and system

Info

Publication number
CN114490950A
Authority
CN
China
Prior art keywords
neural network
text
encoder model
text sequence
similarity
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202210360834.8A
Other languages
Chinese (zh)
Other versions
CN114490950B (en)
Inventor
肖清
赵文博
李剑锋
许程冲
周丽萍
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
China Unicom Guangdong Industrial Internet Co Ltd
Original Assignee
China Unicom Guangdong Industrial Internet Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by China Unicom Guangdong Industrial Internet Co Ltd filed Critical China Unicom Guangdong Industrial Internet Co Ltd
Priority to CN202210360834.8A priority Critical patent/CN114490950B/en
Publication of CN114490950A publication Critical patent/CN114490950A/en
Application granted granted Critical
Publication of CN114490950B publication Critical patent/CN114490950B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33Querying
    • G06F16/3331Query processing
    • G06F16/334Query execution
    • G06F16/3343Query execution using phonetics
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33Querying
    • G06F16/3331Query processing
    • G06F16/334Query execution
    • G06F16/3344Query execution using natural language analysis
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/22Matching criteria, e.g. proximity measures
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00Handling natural language data
    • G06F40/20Natural language analysis
    • G06F40/279Recognition of textual entities
    • G06F40/289Phrasal analysis, e.g. finite state techniques or chunking
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00Handling natural language data
    • G06F40/30Semantic analysis
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/047Probabilistic or stochastic networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Artificial Intelligence (AREA)
  • Computational Linguistics (AREA)
  • General Health & Medical Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Evolutionary Computation (AREA)
  • Software Systems (AREA)
  • Mathematical Physics (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Databases & Information Systems (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Probability & Statistics with Applications (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Evolutionary Biology (AREA)
  • Acoustics & Sound (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
  • Machine Translation (AREA)

Abstract

The invention provides a training method and a storage medium for an encoder model, and a similarity prediction method and system. The method comprises the following steps: inputting two text sequences into an embedding layer to obtain text sequence vectors; inputting the two text sequence vectors into a twin neural network encoder model so that hidden states are determined based on the same neural network parameters; constructing a self-supervised loss function according to the neural network parameters; inputting the hidden states into a pooling layer for pooling, determining the similarity of the two text sequences according to the pooled text sequence vectors, and constructing a supervised loss function from the similarity; determining a loss function from the self-supervised and supervised loss functions in order to update the neural network parameters; and continuing to input new text sequences until the value of the loss function reaches its minimum. The method greatly increases the inference bandwidth of the model when computing the similarity of text sequences, and the trained neural network encoder model enables accurate computation of the similarity between two text sequences.

Description

Training method and storage medium of encoder model, and similarity prediction method and system
Technical Field
The invention relates to the field of text similarity, in particular to a training method and a storage medium of an encoder model, and a similarity prediction method and a similarity prediction system.
Background
Text similarity refers to the degree of similarity between two texts; its application scenarios include text classification, clustering, text topic detection, topic tracking, machine translation, and the like. More specifically, monitoring call lines in voice communication scenarios also requires determining the similarity between texts, but the conversation content acquired in such scenarios is noisy, mixed with accents, and incomplete, and in the prior art whether the conversation content is similar must be checked manually, which consumes a great deal of manpower and time.
Disclosure of Invention
The invention aims to overcome at least one defect of the prior art, and provides a training method and a storage medium of an encoder model, and a similarity prediction method and a similarity prediction system, which are used for solving the problems that in the prior art, manual sampling inspection is relied on when text similarity is determined, the detection coverage is small, and the subjectivity is high.
The technical scheme adopted by the invention comprises the following steps:
In a first aspect, the present invention provides a method for training a deep neural network encoder model, comprising: performing a training operation on two different text sequences; the training operation comprises: inputting the two text sequences into an embedding layer for vectorization to obtain two text sequence vectors; inputting the two text sequence vectors into a twin neural network encoder model so that the neural network encoder model determines the hidden states of the two text sequence vectors based on the same neural network parameters; simultaneously constructing a self-supervised loss function of the neural network encoder model according to the neural network parameters; inputting the hidden states of the two text sequence vectors into a pooling layer so that the pooling layer pools the two text sequence vectors according to their hidden states, and determining the similarity of the two text sequences according to the two pooled text sequence vectors; constructing a supervised loss function of the neural network encoder model according to the similarity of the two text sequences; determining a loss function of the neural network encoder model according to the self-supervised loss function and the supervised loss function, so that the neural network encoder model updates the neural network parameters according to the loss function; and continuing to perform the training operation on two new different text sequences until the value of the loss function reaches its minimum, thereby obtaining the trained neural network encoder model.
In a second aspect, the present invention provides a method for predicting the similarity of text sequences, comprising: inputting two different text sequences into an embedding layer for vectorization to obtain two text sequence vectors; inputting the two text sequence vectors into a twin neural network encoder model obtained by the above training method for a deep neural network encoder model, so that the neural network encoder model outputs the hidden states of the two text sequence vectors; inputting the hidden states of the two text sequence vectors into a pooling layer so that the pooling layer pools the two text sequence vectors according to their hidden states; and determining the similarity of the two text sequences according to the two pooled text sequence vectors.
In a third aspect, the present invention provides a system for predicting similarity of text sequences, including: the system comprises a word input module, a word embedding module, a twin neural network encoder model obtained by training the deep neural network encoder model by the training method, a hidden state pooling module and a vector similarity calculation module; the word input module is used for serializing two different text data input from the outside to obtain two different text sequences and outputting the two different text sequences to the word embedding module; the word embedding module is used for vectorizing the two text sequences to obtain two text sequence vectors and outputting the two text sequence vectors to the neural network encoder model; the neural network encoder model is used for determining the hidden states of the two text sequence vectors based on the neural network parameters and outputting the hidden states to a hidden state pooling module; the hidden state pooling module is used for pooling the two text sequence vectors according to the hidden states of the two text sequence vectors and outputting the pooled text sequence vectors to the vector similarity calculation module; and the vector similarity calculation module is used for determining the similarity of the two text sequences according to the two text sequence vectors after the pooling processing.
In a fourth aspect, the present invention provides a computer-readable storage medium, on which a computer program is stored, which, when being executed by a processor, implements the above-mentioned method for training a deep neural network encoder model, and/or the above-mentioned method for predicting similarity of text sequences.
Compared with the prior art, the invention has the beneficial effects that:
the training method of the encoder model provided by the embodiment is used for training to obtain a trained twin neural network encoder model, and the twin neural network encoder model shares the same neural network parameter, so that the inference bandwidth of the model in calculating the semantic similarity between text sequences is greatly increased, and the trained neural network encoder model can be used for realizing the accurate calculation of the similarity between two text sequences. Meanwhile, in the training process, the neural network encoder model is trained in a combined manner of self-supervision and supervision, so that the finally updated neural network parameters are beneficial to improving the accuracy of semantic similarity calculation of the neural network encoder model at the semantic level.
Drawings
FIG. 1 is a schematic flow chart of the method steps S110-S180 in example 1.
Fig. 2 is a schematic diagram of a training process of the neural network encoder model according to embodiment 1.
Fig. 3 is a schematic diagram of a hidden state calculation process of the neural network encoder model according to embodiment 1.
FIG. 4 is a flowchart illustrating steps S210-S240 of the method of embodiment 2.
Fig. 5 is a schematic diagram of a prediction process of the prediction method of embodiment 2.
Fig. 6 is a schematic diagram of a prediction process of the prediction system of embodiment 3.
Detailed Description
The drawings are only for purposes of illustration and are not to be construed as limiting the invention. For a better understanding of the following embodiments, certain features of the drawings may be omitted, enlarged or reduced, and do not represent the size of an actual product; it will be understood by those skilled in the art that certain well-known structures in the drawings and descriptions thereof may be omitted.
Example 1
The embodiment provides a training method of a deep neural network encoder model, which is used for training a twin neural network encoder model, wherein the twin neural network can be composed of two sub-networks or one network in a broad sense, and the key point is that the twin neural networks share the same neural network parameter.
As shown in fig. 1 and 2, the method includes the following steps:
s110, inputting two different text sequences into the embedding layer for vectorization to obtain two text sequence vectors;
In this step, the text sequence refers to text data that has been preprocessed so as to satisfy an input format compatible with the embedding layer. In a specific embodiment, the preprocessing comprises:
performing data cleaning on the original text data; reading preset special symbols, stop words and a user dictionary, removing the special symbols from the text data, performing word segmentation on the text with the help of the user dictionary, and removing the stop words present in the text data; then converting the text data into several sub-text sequences, sorting and splicing the sub-text sequences by length, and cutting them according to the preset training batch size to obtain several text sequences as training data.
The training method provided by the embodiment is used for training a neural network encoder model for calculating the similarity of the text sequences, so that the label is the real similarity between two different text sequences in each group of text sequences. The sets of text sequences that have been selected as input are converted into integer data before entering the embedding layer. In a preferred embodiment, Tokenizer may be employed to convert the text data into integer data.
The embedding layer is used for converting an input text sequence into a vector with a fixed size, specifically mapping the text sequence into a vector space, thereby obtaining text sequence vectors of two text sequences.
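As an illustration of the preprocessing and embedding steps described above, the following Python sketch shows one possible realization; the stop-word handling, the tokenizer interface, and the embedding dimensions are assumptions for illustration and are not taken from the patent.

```python
# Minimal sketch of the preprocessing + embedding step, assuming externally
# supplied stop-word and special-symbol lists and a simple integer vocabulary.
import torch
import torch.nn as nn

def preprocess(text, stop_words, special_symbols):
    # Data cleaning: strip special symbols, split into tokens, drop stop words.
    for sym in special_symbols:
        text = text.replace(sym, " ")
    return [tok for tok in text.split() if tok not in stop_words]

class TextEmbedding(nn.Module):
    """Maps an integer-encoded text sequence to fixed-size vectors."""
    def __init__(self, vocab_size, embed_dim):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim)

    def forward(self, token_ids):          # token_ids: (batch, seq_len) integers
        return self.embed(token_ids)       # (batch, seq_len, embed_dim)
```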
S120, inputting the two text sequence vectors into a twin neural network encoder model so that the neural network encoder model determines the hidden states of the two text sequence vectors;
in this step, after receiving the two text sequence vectors, the neural network encoder model determines the hidden states of the two text sequence vectors based on the same neural network parameters. The neural network parameters refer to parameters of a neural network encoder model backbone network. The hidden state is a high-dimensional vector obtained by a series of matrix operations and nonlinear transformation in a neural network.
When the neural network encoder model is initialized, video memory space is allocated for each internal module, the pre-trained parameters are loaded, and the neural network parameters are read. In a specific embodiment, the neural network encoder model may be implemented with a BERT (Bidirectional Encoder Representations from Transformers) pre-trained language model; when the model is initialized, the pre-trained BERT parameters are loaded and the neural network parameters are then read.
As shown in fig. 3, in a specific implementation process, the neural network encoder model is composed of N neural network encoder sub-modules, and is used for iteratively calculating the hidden state of the text sequence vector.
After a single encoder sub-module in the neural network encoder model receives the two text sequence vectors $x_1$ and $x_2$, it first determines the hidden state of each text sequence vector and applies layer normalization to the resulting hidden states to mitigate gradient explosion during training. The layer-normalized hidden states are then fed into a residual module inside the sub-module, which helps avoid gradient vanishing caused by the large number of layers in the neural network encoder model. Finally, the output of the residual module is fed into the fully connected layer of the sub-module, yielding the hidden state $u_1$ corresponding to the text sequence vector $x_1$ and the hidden state $u_2$ corresponding to the text sequence vector $x_2$.
The N encoder sub-modules are connected in series. Each sub-module computes its hidden state of the text sequence vector based on its own internal neural network parameters and passes the result to the next sub-module as input, until the last sub-module outputs the hidden state of the text sequence vector, which serves as the hidden state output by the model as a whole.
In particular, each encoder sub-module in the neural network encoder model may determine the hidden state of a text sequence vector according to the formula
$u = \sigma\big(\mathrm{Att}(W, x)\big)$
where $u$ is the hidden state of the text sequence vector, $\sigma$ is a nonlinear activation function, $\mathrm{Att}(\cdot)$ is the attention-mechanism transformation, $W$ denotes the neural network parameters, and $x$ is the input text sequence vector.
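The following sketch illustrates one possible form of such an encoder sub-module (self-attention, layer normalization, a residual connection, and a fully connected layer). The head count, hidden sizes, and the use of GELU are assumptions for illustration only.

```python
# Sketch of a single encoder sub-module as described above; dimensions are assumptions.
import torch
import torch.nn as nn

class EncoderSubModule(nn.Module):
    def __init__(self, dim, num_heads=8, ffn_dim=2048):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.norm1 = nn.LayerNorm(dim)   # layer normalization to ease gradient explosion
        self.norm2 = nn.LayerNorm(dim)
        self.ffn = nn.Sequential(nn.Linear(dim, ffn_dim), nn.GELU(), nn.Linear(ffn_dim, dim))

    def forward(self, x):                 # x: (batch, seq_len, dim)
        a, _ = self.attn(x, x, x)         # attention-mechanism transformation Att(W, x)
        h = self.norm1(x + a)             # residual connection + layer norm
        u = self.norm2(h + self.ffn(h))   # fully connected layer with a second residual
        return u                          # hidden state u of the text sequence vector
```

Stacking N such sub-modules (for example with nn.Sequential) gives the iterative hidden-state computation described above.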
S130, constructing a self-supervised loss function of the neural network encoder model according to the neural network parameters;
The variables of the self-supervised loss function are the neural network parameters of the neural network encoder model; the function is used to update those parameters by gradient descent so that the loss function reaches its minimum value.
In a specific embodiment, the self-supervised loss function is:
$L_{\text{self}}(\theta, \theta_1, \theta_2) = -\sum_{m \in D_1} \log p(\hat{m} = m \mid \theta, \theta_1) - \sum_{n \in D_2} \log p(\hat{n} = n \mid \theta, \theta_2)$
where $p$ denotes the probability density function, $\theta$ denotes the neural network parameters, $\theta_1$ denotes the parameters of the masked language model output layer, and $\theta_2$ denotes the parameters of the next-sentence-prediction output layer. The masked language model (MLM) randomly masks some positions of the input text sequence and then predicts the tokens at the masked positions; the next-sentence-prediction model (NSP) predicts whether two sentences are consecutive. $D_1$ and $D_2$ are, respectively, the training data sets of the masked language model and the next-sentence-prediction model; $\hat{m}$ and $m$ are, respectively, the word predicted by the masked language model at a masked position and the true word at that position; $\hat{n}$ and $n$ denote, respectively, the connection relationship between the two text sequences predicted by the next-sentence-prediction model and the true connection relationship between them.
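A minimal sketch of this self-supervised loss, assuming BERT-style MLM and NSP heads passed in as callables and the common convention of marking unmasked positions with -100, is given below; the head interfaces are assumptions.

```python
# Sketch of the self-supervised loss: MLM cross-entropy plus NSP cross-entropy.
import torch
import torch.nn as nn

def self_supervised_loss(hidden, mlm_head, nsp_head, mlm_labels, nsp_labels):
    # hidden: (batch, seq_len, dim); mlm_labels uses -100 at unmasked positions.
    mlm_logits = mlm_head(hidden)                    # (batch, seq_len, vocab_size)
    nsp_logits = nsp_head(hidden[:, 0])              # (batch, 2), from the first-token state
    mlm_loss = nn.CrossEntropyLoss(ignore_index=-100)(
        mlm_logits.flatten(0, 1), mlm_labels.flatten())
    nsp_loss = nn.CrossEntropyLoss()(nsp_logits, nsp_labels)
    return mlm_loss + nsp_loss
```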
S140, inputting the hidden states of the two text sequence vectors output by the neural network encoder model into a pooling layer, so that the pooling layer performs pooling treatment on the two text sequence vectors according to the hidden states of the two text sequence vectors;
in this step, after receiving the hidden states of the two text sequence vectors, the pooling layer maps the hidden states to a semantic vector space with a fixed size, so as to obtain semantic vectors of the text sequence vectors in a uniform size, that is, the text sequence vectors after pooling. The fixed size is preset.
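A minimal sketch of such a pooling layer follows; mask-aware mean pooling over the sequence dimension is used here as an assumption, since the patent does not fix the pooling operator.

```python
# Sketch of the pooling layer: map hidden states to a fixed-size semantic vector.
import torch

def pool(hidden, attention_mask):
    # hidden: (batch, seq_len, dim); attention_mask: (batch, seq_len), 1 for real tokens
    mask = attention_mask.unsqueeze(-1).float()
    return (hidden * mask).sum(dim=1) / mask.sum(dim=1).clamp(min=1e-9)   # (batch, dim)
```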
S150, determining the similarity of the two text sequences according to the two text sequence vectors after the pooling treatment;
In this step, the similarity of the two text sequences can be determined by any method commonly used in the prior art for calculating the similarity between two vectors. In a specific embodiment, the similarity may be determined by the formula
$\cos(A, B) = \dfrac{A \cdot B}{\lVert A \rVert \, \lVert B \rVert}$
where $\cos(A, B)$ is the similarity of the two text sequences, $A$ and $B$ respectively denote the two text sequences, $A \cdot B$ is the vector product of the two pooled text sequence vectors, and $\lVert A \rVert \, \lVert B \rVert$ is the product of the moduli of the two pooled text sequence vectors.
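A short sketch of this similarity computation, using the cosine formula above:

```python
# Cosine similarity between the two pooled text-sequence vectors.
import torch
import torch.nn.functional as F

def text_similarity(vec_a, vec_b):
    # vec_a, vec_b: (batch, dim) pooled text sequence vectors
    return F.cosine_similarity(vec_a, vec_b, dim=-1)   # (batch,) similarity scores
```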
S160, constructing a supervised loss function of the neural network encoder model according to the similarity of the two text sequences;
The supervised loss function is constructed from the similarity of the two text sequences determined by the neural network encoder model and their true similarity. Because the similarity of the two text sequences is computed from the pooled text sequence vectors, the pooled vectors are derived from the hidden states output by the neural network encoder model, and the hidden states depend on the neural network parameters, the neural network parameters necessarily influence the similarity computation of the two text sequences.
In a specific embodiment, the supervised loss function is:
$L_{\text{sup}} = \dfrac{1}{N} \sum_{i=1}^{N} \big( \cos(A_i, B_i) - y_i \big)^2$
where $y_i$ is the true similarity of the text sequences $A_i$ and $B_i$, $\cos(A_i, B_i)$ is the predicted similarity of that pair, and $N$ is the number of text sequences captured each time the training operation is performed.
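A minimal sketch of this supervised loss, under the mean-squared-error reading of the formula above:

```python
# Mean-squared error between predicted and labelled similarities over a batch.
import torch

def supervised_loss(pred_sim, true_sim):
    # pred_sim, true_sim: (N,) tensors of predicted and true similarities
    return ((pred_sim - true_sim) ** 2).mean()
```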
S170, determining a loss function of the neural network encoder model according to the self-supervision loss function and the supervision loss function, so that the neural network encoder model updates neural network parameters according to the loss function;
In this step, the loss function of the neural network encoder model is constructed by combining the self-supervised loss function and the supervised loss function; that is, the neural network encoder is trained jointly in both a self-supervised and a supervised manner, which helps to obtain an optimal solution for the neural network parameters. The two loss functions may be combined by addition or by any other suitable operation.
In a specific embodiment, the loss function is
$L = \lambda L_{\text{self}} + (1 - \lambda) L_{\text{sup}}$
where $L_{\text{self}}$ is the self-supervised loss function, $L_{\text{sup}}$ is the supervised loss function, and $\lambda$ is a hyperparameter used to adjust the weights; adjusting $\lambda$ adjusts the relative weight of the supervised and self-supervised loss functions in the overall loss function, with $\lambda < 1$.
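A one-line sketch of this combination, assuming the convex weighting reconstructed above:

```python
# Combined loss: lambda weights the self-supervised term, (1 - lambda) the supervised term.
def total_loss(self_sup_loss, sup_loss, lam=0.5):
    assert lam < 1.0                      # the hyperparameter is required to be below 1
    return lam * self_sup_loss + (1.0 - lam) * sup_loss
```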
S180, determining whether the value of the loss function has reached its minimum; if not, updating the neural network parameters and re-executing step S110 on two new different text sequences; if so, obtaining the trained neural network encoder model.
Because each execution of the above steps feeds only one pair of different text sequences into the neural network encoder model, step S110 must be executed repeatedly: new text sequences are continuously fed into the model, and its neural network parameters are continuously updated by gradient descent during training, until the value of the loss function reaches its minimum, at which point training is complete and the trained neural network encoder model is obtained.
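Putting the pieces together, the following sketch outlines the training loop of steps S110–S180; it reuses the helper functions sketched earlier, and the data loader format, optimizer, learning rate, and number of epochs are assumptions.

```python
# Sketch of the overall training loop (S110-S180) for the twin encoder.
import torch

def train(encoder, embedding, pooler, mlm_head, nsp_head, loader, epochs=10, lam=0.5):
    params = list(encoder.parameters()) + list(embedding.parameters())
    optimizer = torch.optim.Adam(params, lr=2e-5)
    for _ in range(epochs):
        for ids_a, ids_b, mask_a, mask_b, mlm_labels, nsp_labels, sim_label in loader:
            x1, x2 = embedding(ids_a), embedding(ids_b)              # S110: vectorize
            u1, u2 = encoder(x1), encoder(x2)                        # S120: shared (twin) parameters
            l_self = self_supervised_loss(u1, mlm_head, nsp_head,    # S130 (applied to the first
                                          mlm_labels, nsp_labels)    #  sequence only, a simplification)
            v1, v2 = pool(u1, mask_a), pool(u2, mask_b)              # S140: pooling
            sim = text_similarity(v1, v2)                            # S150: cosine similarity
            l_sup = supervised_loss(sim, sim_label)                  # S160: supervised loss
            loss = total_loss(l_self, l_sup, lam)                    # S170: combined loss
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()                                         # S180: update parameters
```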
The training method of the deep neural network encoder model provided by the embodiment is used for training a twin neural network encoder model, the neural network encoder model obtained through training greatly improves the inference bandwidth during semantic similarity calculation between text sequences, and accurate calculation of similarity between two text sequences can be achieved based on the neural network encoder model. Meanwhile, in the training process, a loss function of the neural network encoder model is constructed in a mode of combining self-supervision and supervision to jointly train the neural network encoder model, and finally, the updated neural network parameters are beneficial to improving the accuracy of semantic similarity calculation of the neural network encoder model on the semantic level. Because the neural network encoder model captures context semantic information well, when the neural network encoder model is applied to multi-turn conversation scenes such as communication lines, different conversation scenes can be distinguished more intelligently and automatically, abnormal communication behaviors can be found in time, and the intelligent degree of voice service management is improved.
Example 2
Based on the same concept as embodiment 1, this embodiment provides a method for predicting the similarity of text sequences, which predicts the similarity of two different text sequences mainly by using a neural network encoder model obtained with the training method for the neural network encoder model provided in embodiment 1.
As shown in fig. 4 and 5, the method includes:
s210, inputting two different text sequences into the embedded layer for vectorization to obtain two text sequence vectors;
Before this step is performed, the two pieces of text data whose similarity is to be predicted may be determined and preprocessed, for example by serialization, so that they become two text sequences compatible with the embedding layer, the neural network encoder model, and the pooling layer.
S220, inputting the two text sequence vectors into the trained neural network encoder model so that the neural network encoder model outputs the hidden states of the two text sequence vectors;
After the trained neural network encoder model receives the two text sequence vectors, each encoder sub-module of the neural network encoder model determines the hidden state of a text sequence vector according to the formula
$u = \sigma\big(\mathrm{Att}(W, x)\big)$
where $u$ is the hidden state of the text sequence vector, $\sigma$ is a nonlinear activation function, $\mathrm{Att}(\cdot)$ is the attention-mechanism transformation, $W$ denotes the neural network parameters, and $x$ is the input text sequence vector.
In a specific implementation, the neural network encoder model comprises a plurality of encoder sub-modules connected in series, the output of each sub-module serving as the input of the next, so that the hidden state of the text sequence vector is computed iteratively; the hidden state output by the last encoder sub-module is taken as the hidden state of the text sequence vector output by the model as a whole.
S230, inputting the hidden states of the two text sequence vectors into a pooling layer so that the pooling layer performs pooling treatment on the two text sequence vectors according to the hidden states of the two text sequence vectors;
and after receiving the hidden states of the two text sequence vectors, the pooling layer maps the hidden states of the two text sequences to a semantic vector space with a fixed size to obtain the semantic vectors with a uniform size.
And S240, determining the similarity of the two text sequences according to the two text sequence vectors after the pooling treatment.
In this step, the formula
$\cos(A, B) = \dfrac{A \cdot B}{\lVert A \rVert \, \lVert B \rVert}$
is used to determine the similarity of the two text sequences, where $\cos(A, B)$ is the similarity of the two text sequences, $A$ and $B$ respectively denote the two text sequences, $A \cdot B$ is the vector product of the two pooled text sequence vectors, and $\lVert A \rVert \, \lVert B \rVert$ is the product of the moduli of the two pooled text sequence vectors.
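The prediction flow of steps S210–S240 can be sketched end to end as follows, reusing the components sketched in embodiment 1; the function signature is illustrative only.

```python
# Sketch of the prediction flow (S210-S240) with a trained twin encoder.
import torch

@torch.no_grad()
def predict_similarity(encoder, embedding, ids_a, ids_b, mask_a, mask_b):
    x1, x2 = embedding(ids_a), embedding(ids_b)      # S210: vectorize the two text sequences
    u1, u2 = encoder(x1), encoder(x2)                # S220: hidden states, shared parameters
    v1, v2 = pool(u1, mask_a), pool(u2, mask_b)      # S230: pooling to fixed-size vectors
    return text_similarity(v1, v2)                   # S240: cosine similarity
```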
The twin neural network encoder model obtained by the training method provided in embodiment 1 can realize high accuracy of semantic similarity calculation at a semantic level based on the determined neural network parameters, and when the input text sequence is conversation content supervised in a communication line, the neural network encoder model can automatically distinguish different conversation scenes more intelligently, discover abnormal communication behaviors in time, and improve the intelligent degree of voice service management.
The method for predicting similarity of text sequences provided in this embodiment is based on the same concept as that of embodiment 1, and therefore, the same steps and terms, definitions, explanations, specific/preferred embodiments, and beneficial effects thereof as those in embodiment 1 can be referred to the description in embodiment 1, and are not repeated in this embodiment.
Example 3
Based on the same concept as that in embodiments 1 and 2, this embodiment provides a text sequence similarity prediction system, which mainly predicts the similarity between two different text sequences by using a neural network encoder model obtained by training through the neural network encoder model training method provided in embodiment 1.
As shown in fig. 6, the system includes: the word input module 310, the word embedding module 320, the neural network encoder model trained by the training method provided in embodiment 1, the hidden state pooling module 330, and the vector similarity calculation module 340.
The word input module 310 is configured to receive two types of text data input from the outside, serialize the two types of text data to obtain two different text sequences, and output the two different text sequences to the word embedding module 320.
The word embedding module 320 is configured to vector the two text sequences, specifically, map the text sequences into a vector space, so as to obtain text sequence vectors of the two text sequences, and output the text sequence vectors to the neural network encoder model. The neural network encoder model is used to determine the hidden states of the two text sequence vectors based on the neural network parameters and output them to the hidden state pooling module 330.
After the trained neural network encoder model receives the two text sequence vectors, each encoder sub-module of the neural network encoder model determines the hidden state of a text sequence vector according to the formula
$u = \sigma\big(\mathrm{Att}(W, x)\big)$
where $u$ is the hidden state of the text sequence vector, $\sigma$ is a nonlinear activation function, $\mathrm{Att}(\cdot)$ is the attention-mechanism transformation, $W$ denotes the neural network parameters, and $x$ is the input text sequence vector.
In a specific implementation, the neural network encoder model comprises a plurality of encoder sub-modules connected in series, the output of each sub-module serving as the input of the next, so that the hidden state of the text sequence vector is computed iteratively; the hidden state output by the last encoder sub-module is taken as the hidden state of the text sequence vector output by the model as a whole.
The hidden state pooling module 330 is configured to pool the two text sequence vectors according to the hidden states of the two text sequence vectors, specifically, map the hidden states of the two text sequences to a semantic vector space with a fixed size to obtain semantic vectors with a uniform size, and output the semantic vectors to the vector similarity calculation module 340 as the text sequence vectors after pooling.
The vector similarity calculation module 340 is configured to determine the similarity between two text sequences according to the two text sequence vectors after the pooling process.
The vector similarity calculation module 340 is specifically configured to determine the similarity of the two text sequences using the formula
$\cos(A, B) = \dfrac{A \cdot B}{\lVert A \rVert \, \lVert B \rVert}$
where $\cos(A, B)$ is the similarity of the two text sequences, $A$ and $B$ respectively denote the two text sequences, $A \cdot B$ is the vector product of the two pooled text sequence vectors, and $\lVert A \rVert \, \lVert B \rVert$ is the product of the moduli of the two pooled text sequence vectors.
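The module wiring of this system can be sketched as follows; the class and attribute names are illustrative and not taken from the patent, and the tokenizer is assumed to return token ids and an attention mask.

```python
# Sketch of how modules 310-340 could be chained; reuses pool() and text_similarity().
class SimilarityPredictionSystem:
    def __init__(self, tokenizer, embedding, encoder):
        self.tokenizer = tokenizer      # word input module 310 (serialization)
        self.embedding = embedding      # word embedding module 320
        self.encoder = encoder          # trained twin neural network encoder model

    def predict(self, text_a, text_b):
        ids_a, mask_a = self.tokenizer(text_a)                    # serialize into text sequences
        ids_b, mask_b = self.tokenizer(text_b)
        v1 = pool(self.encoder(self.embedding(ids_a)), mask_a)    # hidden state pooling module 330
        v2 = pool(self.encoder(self.embedding(ids_b)), mask_b)
        return text_similarity(v1, v2)                            # vector similarity module 340
```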
The similarity prediction system for text sequences provided in this embodiment is based on the same concept as that of embodiments 1 and 2, and therefore, the same steps and terms, definitions, explanations, specific/preferred embodiments, and the beneficial effects thereof as those of embodiments 1 and 2 can be referred to the descriptions in embodiments 1 and 2, and are not repeated in this embodiment.
It should be understood that the above-mentioned embodiments of the present invention are only examples for clearly illustrating the technical solutions of the present invention, and are not intended to limit the specific embodiments of the present invention. Any modification, equivalent replacement, and improvement made within the spirit and principle of the present invention claims should be included in the protection scope of the present invention claims.

Claims (10)

1. A training method of a deep neural network encoder model is characterized by comprising the following steps:
performing training operation on two different text sequences;
the training operation is as follows:
inputting the two text sequences into an embedded layer for vectorization to obtain two text sequence vectors;
inputting the two text sequence vectors into a twin neural network encoder model so that the neural network encoder model determines the hidden states of the two text sequence vectors based on the same neural network parameters;
simultaneously constructing a self-supervised loss function of the neural network encoder model according to the neural network parameters;
inputting the hidden states of the two text sequence vectors into a pooling layer, so that the pooling layer pools the two text sequence vectors according to the hidden states of the two text sequence vectors, and determines the similarity of the two text sequences according to the two text sequence vectors after pooling;
constructing a supervised loss function of the neural network encoder model according to the similarity of the two text sequences;
determining a loss function of the neural network encoder model according to the self-supervised loss function and the supervised loss function, so that the neural network encoder model updates the neural network parameters according to the loss function;
and continuing to execute the training operation on the new two different text sequences until the numerical value of the loss function is the minimum value, so as to obtain the trained neural network encoder model.
2. The method for training a deep neural network encoder model according to claim 1, wherein determining the loss function of the neural network encoder model according to the self-supervised loss function and the supervised loss function specifically comprises: taking the sum of the self-supervised loss function and the supervised loss function as the loss function of the neural network encoder model.
3. The method for training a deep neural network encoder model according to claim 1, wherein determining the similarity of the two text sequences according to the two pooled text sequence vectors specifically comprises: determining the similarity of the two text sequences using the formula
$\cos(A, B) = \dfrac{A \cdot B}{\lVert A \rVert \, \lVert B \rVert}$
wherein $\cos(A, B)$ is the similarity of the two text sequences, $A$ and $B$ respectively denote the two text sequences, $A \cdot B$ is the vector product of the two pooled text sequence vectors, and $\lVert A \rVert \, \lVert B \rVert$ is the product of the moduli of the two pooled text sequence vectors.
4. The method for training a deep neural network encoder model according to claim 3, wherein the supervised loss function is:
$L_{\text{sup}} = \dfrac{1}{N} \sum_{i=1}^{N} \big( \cos(A_i, B_i) - y_i \big)^2$
wherein $y_i$ is the true similarity of the text sequences $A_i$ and $B_i$, $\cos(A_i, B_i)$ is the predicted similarity of that pair, and $N$ is the number of text sequences captured each time the training operation is performed.
5. The method for training a deep neural network encoder model according to claim 4, wherein the self-supervised loss function is:
$L_{\text{self}}(\theta, \theta_1, \theta_2) = -\sum_{m \in D_1} \log p(\hat{m} = m \mid \theta, \theta_1) - \sum_{n \in D_2} \log p(\hat{n} = n \mid \theta, \theta_2)$
wherein $p$ denotes the probability density function, $\theta$ denotes the neural network parameters, $\theta_1$ and $\theta_2$ respectively denote the parameters of the output layers of the masked language model and the next-sentence-prediction model, $D_1$ and $D_2$ are respectively the training data sets of the masked language model and the next-sentence-prediction model, $\hat{m}$ and $m$ are respectively the word predicted by the masked language model at a masked position and the true word at that position, and $\hat{n}$ and $n$ respectively denote the connection relationship between the two text sequences predicted by the next-sentence-prediction model and the true connection relationship between them.
6. The method for training a deep neural network encoder model according to claim 5, wherein the loss function is:
$L = \lambda L_{\text{self}} + (1 - \lambda) L_{\text{sup}}$
wherein $L_{\text{self}}$ is the self-supervised loss function, $L_{\text{sup}}$ is the supervised loss function, and $\lambda$ is a hyperparameter used to adjust the weights of the supervised and self-supervised loss functions, with $\lambda < 1$.
7. The method for training the deep neural network encoder model according to claim 1, wherein the neural network encoder model determines hidden states of two text sequence vectors based on the same neural network parameters, and specifically comprises:
the neural network encoder model determines the hidden states of the two text sequence vectors using the formula
$u = \sigma\big(\mathrm{Att}(W, x)\big)$
wherein $u$ is the hidden state of the text sequence vector, $\sigma$ is a nonlinear activation function, $\mathrm{Att}(\cdot)$ is the attention-mechanism transformation, $W$ denotes the neural network parameters, and $x$ is the input text sequence vector.
8. A method for predicting similarity of text sequences is characterized in that,
inputting two different text sequences into the embedding layer for vectorization to obtain two text sequence vectors;
inputting two text sequence vectors into a twin neural network encoder model obtained by training the deep neural network encoder model according to any one of claims 1 to 7, so that the neural network encoder model outputs the hidden states of the two text sequence vectors;
inputting the hidden states of the two text sequence vectors into a pooling layer, so that the pooling layer performs pooling processing on the two text sequence vectors according to the hidden states of the two text sequence vectors;
and determining the similarity of the two text sequences according to the two text sequence vectors after the pooling treatment.
9. A system for predicting similarity of text sequences, comprising: the system comprises a word input module, a word embedding module, a twin neural network encoder model obtained by training the deep neural network encoder model according to any one of claims 1-7, a hidden state pooling module and a vector similarity calculation module;
the word input module is used for serializing two different text data input from the outside to obtain two different text sequences and outputting the two different text sequences to the word embedding module;
the word embedding module is used for vectorizing the two text sequences to obtain two text sequence vectors and outputting the two text sequence vectors to the neural network encoder model;
the neural network encoder model is used for determining the hidden states of the two text sequence vectors based on the neural network parameters and outputting the hidden states to a hidden state pooling module;
the hidden state pooling module is used for pooling the two text sequence vectors according to the hidden states of the two text sequence vectors and outputting the pooled text sequence vectors to the vector similarity calculation module;
and the vector similarity calculation module is used for determining the similarity of the two text sequences according to the two text sequence vectors after the pooling processing.
10. A computer-readable storage medium, on which a computer program is stored, which, when being executed by a processor, implements the method for training a deep neural network encoder model according to any one of claims 1 to 7 and/or the method for predicting similarity of text sequences according to claim 8.
CN202210360834.8A 2022-04-07 2022-04-07 Method and storage medium for training encoder model, and method and system for predicting similarity Active CN114490950B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210360834.8A CN114490950B (en) 2022-04-07 2022-04-07 Method and storage medium for training encoder model, and method and system for predicting similarity

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210360834.8A CN114490950B (en) 2022-04-07 2022-04-07 Method and storage medium for training encoder model, and method and system for predicting similarity

Publications (2)

Publication Number Publication Date
CN114490950A true CN114490950A (en) 2022-05-13
CN114490950B CN114490950B (en) 2022-07-12

Family

ID=81487384

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210360834.8A Active CN114490950B (en) 2022-04-07 2022-04-07 Method and storage medium for training encoder model, and method and system for predicting similarity

Country Status (1)

Country Link
CN (1) CN114490950B (en)

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114743545A (en) * 2022-06-14 2022-07-12 联通(广东)产业互联网有限公司 Dialect type prediction model training method and device and storage medium
CN115357690A (en) * 2022-10-19 2022-11-18 有米科技股份有限公司 Text repetition removing method and device based on text mode self-supervision
CN115660871A (en) * 2022-11-08 2023-01-31 上海栈略数据技术有限公司 Medical clinical process unsupervised modeling method, computer device, and storage medium
WO2024067779A1 (en) * 2022-09-30 2024-04-04 华为技术有限公司 Data processing method and related apparatus

Citations (15)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP3054403A2 (en) * 2015-02-06 2016-08-10 Google, Inc. Recurrent neural networks for data item generation
CN108388888A (en) * 2018-03-23 2018-08-10 腾讯科技(深圳)有限公司 A kind of vehicle identification method, device and storage medium
CN109614471A (en) * 2018-12-07 2019-04-12 北京大学 A kind of open-ended question automatic generation method based on production confrontation network
CN110009013A (en) * 2019-03-21 2019-07-12 腾讯科技(深圳)有限公司 Encoder training and characterization information extracting method and device
CN110347839A (en) * 2019-07-18 2019-10-18 湖南数定智能科技有限公司 A kind of file classification method based on production multi-task learning model
US20200026954A1 (en) * 2019-09-27 2020-01-23 Intel Corporation Video tracking with deep siamese networks and bayesian optimization
CN111144565A (en) * 2019-12-27 2020-05-12 中国人民解放军军事科学院国防科技创新研究院 Self-supervision field self-adaptive deep learning method based on consistency training
CN112149689A (en) * 2020-09-28 2020-12-29 上海交通大学 Unsupervised domain adaptation method and system based on target domain self-supervised learning
CN112396479A (en) * 2021-01-20 2021-02-23 成都晓多科技有限公司 Clothing matching recommendation method and system based on knowledge graph
CN113159945A (en) * 2021-03-12 2021-07-23 华东师范大学 Stock fluctuation prediction method based on multitask self-supervision learning
US20210326660A1 (en) * 2020-04-21 2021-10-21 Google Llc Supervised Contrastive Learning with Multiple Positive Examples
CN113553906A (en) * 2021-06-16 2021-10-26 之江实验室 Method for discriminating unsupervised cross-domain pedestrian re-identification based on class center domain alignment
CN113705772A (en) * 2021-07-21 2021-11-26 浪潮(北京)电子信息产业有限公司 Model training method, device and equipment and readable storage medium
CN113936647A (en) * 2021-12-17 2022-01-14 中国科学院自动化研究所 Training method of voice recognition model, voice recognition method and system
CN114003698A (en) * 2021-12-27 2022-02-01 成都晓多科技有限公司 Text retrieval method, system, equipment and storage medium

Patent Citations (15)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP3054403A2 (en) * 2015-02-06 2016-08-10 Google, Inc. Recurrent neural networks for data item generation
CN108388888A (en) * 2018-03-23 2018-08-10 腾讯科技(深圳)有限公司 A kind of vehicle identification method, device and storage medium
CN109614471A (en) * 2018-12-07 2019-04-12 北京大学 A kind of open-ended question automatic generation method based on production confrontation network
CN110009013A (en) * 2019-03-21 2019-07-12 腾讯科技(深圳)有限公司 Encoder training and characterization information extracting method and device
CN110347839A (en) * 2019-07-18 2019-10-18 湖南数定智能科技有限公司 A kind of file classification method based on production multi-task learning model
US20200026954A1 (en) * 2019-09-27 2020-01-23 Intel Corporation Video tracking with deep siamese networks and bayesian optimization
CN111144565A (en) * 2019-12-27 2020-05-12 中国人民解放军军事科学院国防科技创新研究院 Self-supervision field self-adaptive deep learning method based on consistency training
US20210326660A1 (en) * 2020-04-21 2021-10-21 Google Llc Supervised Contrastive Learning with Multiple Positive Examples
CN112149689A (en) * 2020-09-28 2020-12-29 上海交通大学 Unsupervised domain adaptation method and system based on target domain self-supervised learning
CN112396479A (en) * 2021-01-20 2021-02-23 成都晓多科技有限公司 Clothing matching recommendation method and system based on knowledge graph
CN113159945A (en) * 2021-03-12 2021-07-23 华东师范大学 Stock fluctuation prediction method based on multitask self-supervision learning
CN113553906A (en) * 2021-06-16 2021-10-26 之江实验室 Method for discriminating unsupervised cross-domain pedestrian re-identification based on class center domain alignment
CN113705772A (en) * 2021-07-21 2021-11-26 浪潮(北京)电子信息产业有限公司 Model training method, device and equipment and readable storage medium
CN113936647A (en) * 2021-12-17 2022-01-14 中国科学院自动化研究所 Training method of voice recognition model, voice recognition method and system
CN114003698A (en) * 2021-12-27 2022-02-01 成都晓多科技有限公司 Text retrieval method, system, equipment and storage medium

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
徐流畅: "Research on semantic address matching and semantic space fusion models under pre-trained deep learning architectures", China Doctoral Dissertations Full-text Database (Doctoral), Basic Sciences *
赵龙龙: "Rolling bearing fault diagnosis based on a combined deep learning model", China Master's Theses Full-text Database (Master's), Engineering Science and Technology II *

Cited By (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114743545A (en) * 2022-06-14 2022-07-12 联通(广东)产业互联网有限公司 Dialect type prediction model training method and device and storage medium
CN114743545B (en) * 2022-06-14 2022-09-02 联通(广东)产业互联网有限公司 Dialect type prediction model training method and device and storage medium
WO2024067779A1 (en) * 2022-09-30 2024-04-04 华为技术有限公司 Data processing method and related apparatus
CN115357690A (en) * 2022-10-19 2022-11-18 有米科技股份有限公司 Text repetition removing method and device based on text mode self-supervision
CN115660871A (en) * 2022-11-08 2023-01-31 上海栈略数据技术有限公司 Medical clinical process unsupervised modeling method, computer device, and storage medium
CN115660871B (en) * 2022-11-08 2023-06-06 上海栈略数据技术有限公司 Unsupervised modeling method for medical clinical process, computer equipment and storage medium

Also Published As

Publication number Publication date
CN114490950B (en) 2022-07-12

Similar Documents

Publication Publication Date Title
CN114490950B (en) Method and storage medium for training encoder model, and method and system for predicting similarity
CN112116030B (en) Image classification method based on vector standardization and knowledge distillation
Gu et al. Stack-captioning: Coarse-to-fine learning for image captioning
CN108427771B (en) Abstract text generation method and device and computer equipment
WO2022142041A1 (en) Training method and apparatus for intent recognition model, computer device, and storage medium
CN111708882B (en) Transformer-based Chinese text information missing completion method
CN110648659B (en) Voice recognition and keyword detection device and method based on multitask model
CN112528637B (en) Text processing model training method, device, computer equipment and storage medium
CN109902301B (en) Deep neural network-based relationship reasoning method, device and equipment
CN111930914B (en) Problem generation method and device, electronic equipment and computer readable storage medium
US20220343139A1 (en) Methods and systems for training a neural network model for mixed domain and multi-domain tasks
WO2020155619A1 (en) Method and apparatus for chatting with machine with sentiment, computer device and storage medium
CN111813954B (en) Method and device for determining relationship between two entities in text statement and electronic equipment
JP6738769B2 (en) Sentence pair classification device, sentence pair classification learning device, method, and program
CN112101042A (en) Text emotion recognition method and device, terminal device and storage medium
CN110942774A (en) Man-machine interaction system, and dialogue method, medium and equipment thereof
CN113254615A (en) Text processing method, device, equipment and medium
CN115064154A (en) Method and device for generating mixed language voice recognition model
CN115204143A (en) Method and system for calculating text similarity based on prompt
CN115757695A (en) Log language model training method and system
CN115687609A (en) Zero sample relation extraction method based on Prompt multi-template fusion
CN114662601A (en) Intention classification model training method and device based on positive and negative samples
CN114925681A (en) Knowledge map question-answer entity linking method, device, equipment and medium
CN113177113B (en) Task type dialogue model pre-training method, device, equipment and storage medium
CN115495579A (en) Method and device for classifying text of 5G communication assistant, electronic equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant