CN114490950A

CN114490950A - Training method and storage medium of encoder model, and similarity prediction method and system

Info

Publication number: CN114490950A
Application number: CN202210360834.8A
Authority: CN
Inventors: 肖清; 赵文博; 李剑锋; 许程冲; 周丽萍
Original assignee: China Unicom Guangdong Industrial Internet Co Ltd
Current assignee: China Unicom Guangdong Industrial Internet Co Ltd
Priority date: 2022-04-07
Filing date: 2022-04-07
Publication date: 2022-05-13
Anticipated expiration: 2042-04-07
Also published as: CN114490950B

Abstract

The invention provides a training method and a storage medium of an encoder model, and a similarity prediction method and a similarity prediction system, comprising the following steps: inputting the two text sequences into the embedding layer to obtain a text sequence vector; inputting the two text sequence vectors into a twin neural network encoder model so as to determine a hidden state based on the same neural network parameters; constructing an automatic supervision loss function according to the neural network parameters; inputting the hidden state into a pooling layer to perform pooling according to the hidden state, determining similarity of two text sequences according to the text sequence vectors after the pooling, and constructing a supervision loss function by using the similarity; determining a loss function according to the self-supervision and supervised loss functions to update neural network parameters; the new text sequence continues to be entered until the value of the loss function is at a minimum. The method greatly improves the reasoning bandwidth of the model when the similarity of the text sequences is calculated, and can realize the accurate calculation of the similarity of the two text sequences based on the trained neural network encoder model.

Description

Training method and storage medium of encoder model, and similarity prediction method and system

Technical Field

The invention relates to the field of text similarity, in particular to a training method and a storage medium of an encoder model, and a similarity prediction method and a similarity prediction system.

Background

The text similarity refers to the similarity of two texts, and the application scenes comprise text classification, clustering, text topic detection, topic tracking, machine translation and the like. More specifically, monitoring the call line in the voice communication scene also requires determining the similarity between texts, but the conversation content acquired in the voice communication scene is noisy, mixed with accent and insufficient in information integrity, and in the prior art, whether the conversation content is similar or not needs to be checked manually, which consumes a lot of manpower and time.

Disclosure of Invention

The invention aims to overcome at least one defect of the prior art, and provides a training method and a storage medium of an encoder model, and a similarity prediction method and a similarity prediction system, which are used for solving the problems that in the prior art, manual sampling inspection is relied on when text similarity is determined, the detection coverage is small, and the subjectivity is high.

The technical scheme adopted by the invention comprises the following steps:

in a first aspect, the present invention provides a method for training a deep neural network encoder model, including: performing training operations on two different text sequences; the training operation is as follows: inputting the two text sequences into an embedded layer for vectorization to obtain two text sequence vectors; inputting the two text sequence vectors into a twin neural network encoder model so that the neural network encoder model determines the hidden states of the two text sequence vectors based on the same neural network parameters; simultaneously constructing an auto-supervision loss function of the neural network encoder model according to the neural network parameters; inputting the hidden states of the two text sequence vectors into a pooling layer, so that the pooling layer performs pooling processing on the two text sequence vectors according to the hidden states of the two text sequence vectors, and determining similarity of the two text sequences according to the two text sequence vectors after the pooling processing; constructing a supervised loss function of the neural network encoder model according to the similarity of the two text sequences; determining a loss function of the neural network encoder model according to the unsupervised loss function and the supervised loss function, so that the neural network encoder model updates neural network parameters according to the loss function; and continuing to execute the training operation on the new two different text sequences until the numerical value of the loss function is the minimum value, so as to obtain the trained neural network encoder model.

The invention provides a method for predicting similarity of text sequences, which comprises the steps of inputting two different text sequences into an embedded layer for vectorization to obtain two text sequence vectors; inputting the two text sequence vectors into a twin neural network encoder model obtained by training the deep neural network encoder model by the training method, so that the neural network encoder model outputs the hidden states of the two text sequence vectors; inputting the hidden states of the two text sequence vectors into a pooling layer, so that the pooling layer performs pooling processing on the two text sequence vectors according to the hidden states of the two text sequence vectors; and determining the similarity of the two text sequences according to the two text sequence vectors after the pooling treatment.

In a third aspect, the present invention provides a system for predicting similarity of text sequences, including: the system comprises a word input module, a word embedding module, a twin neural network encoder model obtained by training the deep neural network encoder model by the training method, a hidden state pooling module and a vector similarity calculation module; the word input module is used for serializing two different text data input from the outside to obtain two different text sequences and outputting the two different text sequences to the word embedding module; the word embedding module is used for vectorizing the two text sequences to obtain two text sequence vectors and outputting the two text sequence vectors to the neural network encoder model; the neural network encoder model is used for determining the hidden states of the two text sequence vectors based on the neural network parameters and outputting the hidden states to a hidden state pooling module; the hidden state pooling module is used for pooling the two text sequence vectors according to the hidden states of the two text sequence vectors and outputting the pooled text sequence vectors to the vector similarity calculation module; and the vector similarity calculation module is used for determining the similarity of the two text sequences according to the two text sequence vectors after the pooling processing.

In a fourth aspect, the present invention provides a computer-readable storage medium, on which a computer program is stored, which, when being executed by a processor, implements the above-mentioned method for training a deep neural network encoder model, and/or the above-mentioned method for predicting similarity of text sequences.

Compared with the prior art, the invention has the beneficial effects that:

the training method of the encoder model provided by the embodiment is used for training to obtain a trained twin neural network encoder model, and the twin neural network encoder model shares the same neural network parameter, so that the inference bandwidth of the model in calculating the semantic similarity between text sequences is greatly increased, and the trained neural network encoder model can be used for realizing the accurate calculation of the similarity between two text sequences. Meanwhile, in the training process, the neural network encoder model is trained in a combined manner of self-supervision and supervision, so that the finally updated neural network parameters are beneficial to improving the accuracy of semantic similarity calculation of the neural network encoder model at the semantic level.

Drawings

FIG. 1 is a schematic flow chart of the method steps S110-S180 in example 1.

Fig. 2 is a schematic diagram of a training process of the neural network encoder model according to embodiment 1.

Fig. 3 is a schematic diagram of a hidden state calculation process of the neural network encoder model according to embodiment 1.

FIG. 4 is a flowchart illustrating steps S210-S240 of the method of embodiment 2.

Fig. 5 is a schematic diagram of a prediction process of the prediction method of embodiment 2.

Fig. 6 is a schematic diagram of a prediction process of the prediction system of embodiment 3.

Detailed Description

The drawings are only for purposes of illustration and are not to be construed as limiting the invention. For a better understanding of the following embodiments, certain features of the drawings may be omitted, enlarged or reduced, and do not represent the size of an actual product; it will be understood by those skilled in the art that certain well-known structures in the drawings and descriptions thereof may be omitted.

Example 1

The embodiment provides a training method of a deep neural network encoder model, which is used for training a twin neural network encoder model, wherein the twin neural network can be composed of two sub-networks or one network in a broad sense, and the key point is that the twin neural networks share the same neural network parameter.

As shown in fig. 1 and 2, the method includes the following steps:

s110, inputting two different text sequences into the embedding layer for vectorization to obtain two text sequence vectors;

in this step, the text sequence refers to text data that has been preprocessed so as to satisfy an input format compatible with the embedding layer. In a specific embodiment, the pre-treatment comprises:

carrying out data cleaning on the original text data; reading preset special symbols, stop words and a user dictionary word list, removing the special symbols in the text data, combining the read user dictionary to perform word segmentation on the text sequence, and removing the stop words existing in the text data. And converting the text data into a plurality of sub-text sequences, sequencing and splicing the plurality of sub-text sequences according to the length, and cutting according to the preset data size of the training batch to obtain a plurality of text sequences as training data.

The training method provided by the embodiment is used for training a neural network encoder model for calculating the similarity of the text sequences, so that the label is the real similarity between two different text sequences in each group of text sequences. The sets of text sequences that have been selected as input are converted into integer data before entering the embedding layer. In a preferred embodiment, Tokenizer may be employed to convert the text data into integer data.

The embedding layer is used for converting an input text sequence into a vector with a fixed size, specifically mapping the text sequence into a vector space, thereby obtaining text sequence vectors of two text sequences.

S120, inputting the two text sequence vectors into a twin neural network encoder model so that the neural network encoder model determines the hidden states of the two text sequence vectors;

in this step, after receiving the two text sequence vectors, the neural network encoder model determines the hidden states of the two text sequence vectors based on the same neural network parameters. The neural network parameters refer to parameters of a neural network encoder model backbone network. The hidden state is a high-dimensional vector obtained by a series of matrix operations and nonlinear transformation in a neural network.

And when the neural network encoder model is initialized, video memory space is distributed according to each internal module, the pre-trained parameters are loaded, and the neural network parameters are read. In a specific embodiment, the neural network coding model may be implemented by a (Bidirectional Encoder) pre-training language model, and when the neural network coding model is initialized, pre-trained BERT parameters are loaded and then the neural network parameters are read.

As shown in fig. 3, in a specific implementation process, the neural network encoder model is composed of N neural network encoder sub-modules, and is used for iteratively calculating the hidden state of the text sequence vector.

A single encoder model submodule in a neural network encoder model receives two text sequence vectors x₁And x₂And then, firstly determining the hidden state of each text sequence vector, and carrying out layer standardization processing on the obtained hidden state to relieve the problem of gradient explosion in the model training process. And inputting the hidden state subjected to layer standardization into a residual error module in the submodule for calculation so as to avoid gradient dispersion caused by excessive network layer number of the neural network encoder model. Inputting the hidden state output by the residual module into the full-link layer in the submodule for processing to obtain a corresponding text sequence vector x output by the encoder submodule₁Hidden state u of₁And corresponding text sequence vector x₂Hidden state u of₂。

N encoder sub-modules are connected in series, each encoder model sub-module calculates the hidden state of the encoder model sub-module relative to the text sequence vector based on the respective internal neural network parameters, the hidden state of the text sequence vector finally output by the sub-module is output to the next encoder model sub-module and serves as the input of the next encoder model sub-module until the last encoder model sub-module outputs the hidden state of the text sequence vector and serves as the hidden state of the text sequence vector output by the final model.

In particular, each encoder model sub-module in the neural network encoder model may be according to the equation:

determining a hidden state of the text sequence vector, wherein,

is a hidden state of the text sequence vector,

in order to be a non-linear activation function,

in order to take care of the force-mechanism transformation,

in order to be a parameter of the neural network,

is the input text sequence vector.

S130, constructing an automatic supervision loss function of a neural network encoder model according to the neural network parameters;

the variable of the self-supervision loss function is a neural network parameter of the neural network encoder model, and is used for updating the neural network parameter in a gradient descending mode so as to enable the loss function to reach the minimum value.

In a specific embodiment, the auto-supervised loss function is:

wherein the content of the first and second substances,

the function of the probability density is represented by,

in order to be a parameter of the neural network,

to obscure the corresponding parameters of the language model output layer,

the corresponding parameters of the layer are output for the next sentence of the prediction model. The Mask Language Model (MLM) refers to a Model that randomly masks some positions in an input text sequence and then predicts the positions Masked by the text sequence. Next sentence prediction model (NS)P) refers to a model for predicting whether two sentences are consecutive two sentences.

In order to obscure the training data set of the language model,

the training data set for the next sentence of the predictive model,

and

words predicted for the masked location and words that are true for the location are separately for the masked language model,

showing the connection relation between the next sentence prediction model output and the two text sequences before and after the next sentence prediction model output,

representing the true connection relationship with the two text sequences before and after.

S140, inputting the hidden states of the two text sequence vectors output by the neural network encoder model into a pooling layer, so that the pooling layer performs pooling treatment on the two text sequence vectors according to the hidden states of the two text sequence vectors;

in this step, after receiving the hidden states of the two text sequence vectors, the pooling layer maps the hidden states to a semantic vector space with a fixed size, so as to obtain semantic vectors of the text sequence vectors in a uniform size, that is, the text sequence vectors after pooling. The fixed size is preset.

S150, determining the similarity of the two text sequences according to the two text sequence vectors after the pooling treatment;

in this step, two text sequences can be determined by the method commonly used in the prior art for calculating the similarity between two vectorsColumn similarity. In particular embodiments, the formula may be utilized

The similarity of two text sequences is determined.

Wherein the content of the first and second substances,

is the degree of similarity of two text sequences,

and

respectively, to represent two text sequences that are,

is the vector product of the two pooled text sequence vectors,

is the product of the moduli of the two pooled text sequence vectors.

S160, constructing a supervised loss function of the neural network encoder model according to the similarity of the two text sequences;

the supervised loss function is constructed by the similarity and the real similarity of two text sequences determined by a neural network encoder model, the similarity of the two text sequences is calculated based on pooled text sequence vectors, the pooled text sequence vectors are obtained based on a hidden state output by the neural network encoder model, and the hidden state is obtained based on neural network parameters, so that the neural network parameters certainly influence the similarity calculation of the two text sequences.

In a specific embodiment, the supervised loss function is:

。

wherein the content of the first and second substances,

is composed of

And

the degree of similarity of the real text of (c),

is the number of text sequences that are captured each time a training operation is performed.

S170, determining a loss function of the neural network encoder model according to the self-supervision loss function and the supervision loss function, so that the neural network encoder model updates neural network parameters according to the loss function;

in the step, the loss function of the neural network encoder model is constructed by combining the self-supervision loss function and the supervised loss function, namely, the neural network encoder is trained jointly by combining the self-supervision mode and the supervised mode, which is beneficial to obtaining the optimal solution of the neural network parameters. The combination of the auto-supervised and supervised loss functions may be by adding the two or by performing any suitable operation on the two.

In a specific embodiment, the loss function is

. Wherein the content of the first and second substances,

is an auto-supervision loss function;

in order to have a supervised loss function,

for adjusting the hyperparameters of the weights, i.e. by adjusting

The values of (a) may adjust the weights that the supervised and the unsupervised loss functions account for in the overall loss function,

less than 1 is satisfied.

And S180, judging whether the numerical value of the loss function reaches the minimum value, if not, updating the neural network parameters, and re-executing the step S110 on the new two different text sequences, if so, obtaining the trained neural network encoder model.

Because only one group of two different text sequences are input into the neural network encoder model in the above steps, step S110 needs to be executed again, new text sequences are continuously input into the neural network encoder model to train the neural network encoder model, the neural network parameters of the neural network encoder model are continuously updated in a gradient descending manner in the training process until the numerical value of the loss function is the minimum value, and the training of the neural network encoder model is completed to obtain the trained neural network encoder model.

The training method of the deep neural network encoder model provided by the embodiment is used for training a twin neural network encoder model, the neural network encoder model obtained through training greatly improves the inference bandwidth during semantic similarity calculation between text sequences, and accurate calculation of similarity between two text sequences can be achieved based on the neural network encoder model. Meanwhile, in the training process, a loss function of the neural network encoder model is constructed in a mode of combining self-supervision and supervision to jointly train the neural network encoder model, and finally, the updated neural network parameters are beneficial to improving the accuracy of semantic similarity calculation of the neural network encoder model on the semantic level. Because the neural network encoder model captures context semantic information well, when the neural network encoder model is applied to multi-turn conversation scenes such as communication lines, different conversation scenes can be distinguished more intelligently and automatically, abnormal communication behaviors can be found in time, and the intelligent degree of voice service management is improved.

Example 2

Based on the same concept as that of embodiment 1, this embodiment provides a method for predicting similarity of text sequences, which mainly predicts the similarity of two different text sequences by using a neural network encoder model obtained by training the neural network encoder model provided in the embodiment.

As shown in fig. 3 and 4, the method includes:

s210, inputting two different text sequences into the embedded layer for vectorization to obtain two text sequence vectors;

before this step is performed, two types of text data requiring prediction similarity may be determined, and may be preprocessed by serialization or the like, so that the two types of text data become two types of text sequences and are compatible with the embedding layer, the neural network encoder model, and the pooling layer.

S220, inputting the two text sequence vectors into the trained neural network encoder model so that the neural network encoder model outputs the hidden states of the two text sequence vectors;

after the trained neural network encoder model receives the two text sequence vectors, each encoder model submodule of the neural network encoder model is according to the formula

Determining a hidden state of the text sequence vector, wherein,

is a hidden state of the text sequence vector,

in order to be a non-linear activation function,

in order to take care of the force-mechanism transformation,

in order to be a parameter of the neural network,

is the input text sequence vector.

In a specific implementation process, the neural network encoder model comprises a plurality of neural network encoder model sub-modules, the output of one sub-module is used as the input of the next sub-module in a front-back series connection mode and used for iteratively calculating the hidden state of the text sequence vector, and the last encoder model sub-module outputs the hidden state of the text sequence vector as the hidden state of the text sequence vector output by the final model.

S230, inputting the hidden states of the two text sequence vectors into a pooling layer so that the pooling layer performs pooling treatment on the two text sequence vectors according to the hidden states of the two text sequence vectors;

and after receiving the hidden states of the two text sequence vectors, the pooling layer maps the hidden states of the two text sequences to a semantic vector space with a fixed size to obtain the semantic vectors with a uniform size.

And S240, determining the similarity of the two text sequences according to the two text sequence vectors after the pooling treatment.

In this step, the formula is used

The similarity of two text sequences is determined.

Wherein the content of the first and second substances,

is the degree of similarity of two text sequences,

and

are respectively provided withWhich represents two sequences of text that are to be presented,

is the vector product of the two pooled text sequence vectors,

is the product of the moduli of the two pooled text sequence vectors.

The twin neural network encoder model obtained by the training method provided in embodiment 1 can realize high accuracy of semantic similarity calculation at a semantic level based on the determined neural network parameters, and when the input text sequence is conversation content supervised in a communication line, the neural network encoder model can automatically distinguish different conversation scenes more intelligently, discover abnormal communication behaviors in time, and improve the intelligent degree of voice service management.

The method for predicting similarity of text sequences provided in this embodiment is based on the same concept as that of embodiment 1, and therefore, the same steps and terms, definitions, explanations, specific/preferred embodiments, and beneficial effects thereof as those in embodiment 1 can be referred to the description in embodiment 1, and are not repeated in this embodiment.

Example 3

Based on the same concept as that in embodiments 1 and 2, this embodiment provides a text sequence similarity prediction system, which mainly predicts the similarity between two different text sequences by using a neural network encoder model obtained by training through the neural network encoder model training method provided in embodiment 1.

As shown in fig. 5, the system includes: the word input module 310, the word embedding module 320, the neural network encoder model trained by the training method provided in embodiment 1, the hidden state pooling module 330, and the vector similarity calculation module 340.

The word input module 310 is configured to receive two types of text data input from the outside, serialize the two types of text data to obtain two different text sequences, and output the two different text sequences to the word embedding module 320.

The word embedding module 320 is configured to vector the two text sequences, specifically, map the text sequences into a vector space, so as to obtain text sequence vectors of the two text sequences, and output the text sequence vectors to the neural network encoder model. The neural network encoder model is used to determine the hidden states of the two text sequence vectors based on the neural network parameters and output them to the hidden state pooling module 330.

Determining a hidden state of the text sequence vector, wherein,

is a hidden state of the text sequence vector,

in order to be a non-linear activation function,

in order to take care of the force-mechanism transformation,

in order to be a parameter of the neural network,

is the input text sequence vector.

The hidden state pooling module 330 is configured to pool the two text sequence vectors according to the hidden states of the two text sequence vectors, specifically, map the hidden states of the two text sequences to a semantic vector space with a fixed size to obtain semantic vectors with a uniform size, and output the semantic vectors to the vector similarity calculation module 340 as the text sequence vectors after pooling.

The vector similarity calculation module 340 is configured to determine the similarity between two text sequences according to the two text sequence vectors after the pooling process.

The vector similarity calculation module 340 is particularly useful for utilizing equations

The similarity of two text sequences is determined. Wherein the content of the first and second substances,

the degree of similarity of the two text sequences,

and

respectively, to represent two text sequences that are,

is the vector product of the two pooled text sequence vectors,

is the product of the moduli of the two pooled text sequence vectors.

The similarity prediction system for text sequences provided in this embodiment is based on the same concept as that of embodiments 1 and 2, and therefore, the same steps and terms, definitions, explanations, specific/preferred embodiments, and the beneficial effects thereof as those of embodiments 1 and 2 can be referred to the descriptions in embodiments 1 and 2, and are not repeated in this embodiment.

It should be understood that the above-mentioned embodiments of the present invention are only examples for clearly illustrating the technical solutions of the present invention, and are not intended to limit the specific embodiments of the present invention. Any modification, equivalent replacement, and improvement made within the spirit and principle of the present invention claims should be included in the protection scope of the present invention claims.

Claims

1. A training method of a deep neural network encoder model is characterized by comprising the following steps:

performing training operation on two different text sequences;

the training operation is as follows:

inputting the two text sequences into an embedded layer for vectorization to obtain two text sequence vectors;

inputting the two text sequence vectors into a twin neural network encoder model so that the neural network encoder model determines the hidden states of the two text sequence vectors based on the same neural network parameters;

simultaneously constructing an auto-supervision loss function of the neural network encoder model according to the neural network parameters;

inputting the hidden states of the two text sequence vectors into a pooling layer, so that the pooling layer pools the two text sequence vectors according to the hidden states of the two text sequence vectors, and determines the similarity of the two text sequences according to the two text sequence vectors after pooling;

constructing a supervised loss function of the neural network encoder model according to the similarity of the two text sequences;

determining a loss function of the neural network encoder model according to the unsupervised loss function and the supervised loss function, so that the neural network encoder model updates neural network parameters according to the loss function;

and continuing to execute the training operation on the new two different text sequences until the numerical value of the loss function is the minimum value, so as to obtain the trained neural network encoder model.

2. The method of claim 1, wherein the deep neural network encoder model is a deep neural network encoder model,

determining a loss function of the neural network encoder model according to the auto-supervised loss function and the supervised loss function, specifically comprising: taking the sum of the auto-supervised loss function and the supervised loss function as a loss function of the neural network encoder model.

3. The method of claim 1, wherein the deep neural network encoder model is a deep neural network encoder model,

determining the similarity of the two text sequences according to the two text sequence vectors after the pooling process, specifically comprising: utilizing type

Determining the similarity of the two text sequences;

wherein the content of the first and second substances,

for the similarity of two of said text sequences,

and

respectively, to represent two text sequences that are,

the vector product of the two text sequence vectors after the pooling treatment is obtained;

is the product of the moduli of the two pooled text sequence vectors.

4. The method of claim 3, wherein the deep neural network encoder model is trainedCharacterized in that said supervised loss function is:

；

wherein the content of the first and second substances,

is composed of

And

the degree of similarity of the real text of (c),

5. The method of claim 4, wherein the auto-supervised loss function is:

；

wherein the content of the first and second substances,

the function of the probability density is represented by,

for the purpose of the neural network parameters,

and

respectively represent the masked language modelsThe next sentence prediction model corresponds to the parameters of the output layer,

and

training data sets for the masking language model and the next sentence prediction model respectively,

and

respectively for the predicted words and the real words of the masked language model,

6. The method of claim 5, wherein the loss function is:

；

wherein the content of the first and second substances,

is the auto-supervision loss function;

for the purpose of the supervised function of loss,

for the hyper-parameters used to adjust the weights of the supervised and the unsupervised loss functions,

less than 1 is satisfied.

7. The method for training the deep neural network encoder model according to claim 1, wherein the neural network encoder model determines hidden states of two text sequence vectors based on the same neural network parameters, and specifically comprises:

the neural network encoder model utilizes

Determining the hidden states of the two text sequence vectors;

wherein the content of the first and second substances,

is a hidden state of the text sequence vector,

in order to be a non-linear activation function,

in order to take care of the force-mechanism transformation,

for the purpose of the neural network parameters,

is the input text sequence vector.

8. A method for predicting similarity of text sequences is characterized in that,

inputting two different text sequences into the embedding layer for vectorization to obtain two text sequence vectors;

inputting two text sequence vectors into a twin neural network encoder model obtained by training the deep neural network encoder model according to any one of claims 1 to 7, so that the neural network encoder model outputs the hidden states of the two text sequence vectors;

inputting the hidden states of the two text sequence vectors into a pooling layer, so that the pooling layer performs pooling processing on the two text sequence vectors according to the hidden states of the two text sequence vectors;

and determining the similarity of the two text sequences according to the two text sequence vectors after the pooling treatment.

9. A system for predicting similarity of text sequences, comprising: the system comprises a word input module, a word embedding module, a twin neural network encoder model obtained by training the deep neural network encoder model according to any one of claims 1-7, a hidden state pooling module and a vector similarity calculation module;

the word input module is used for serializing two different text data input from the outside to obtain two different text sequences and outputting the two different text sequences to the word embedding module;

the word embedding module is used for vectorizing the two text sequences to obtain two text sequence vectors and outputting the two text sequence vectors to the neural network encoder model;

the neural network encoder model is used for determining the hidden states of the two text sequence vectors based on the neural network parameters and outputting the hidden states to a hidden state pooling module;

the hidden state pooling module is used for pooling the two text sequence vectors according to the hidden states of the two text sequence vectors and outputting the pooled text sequence vectors to the vector similarity calculation module;

and the vector similarity calculation module is used for determining the similarity of the two text sequences according to the two text sequence vectors after the pooling processing.

10. A computer-readable storage medium, on which a computer program is stored, which, when being executed by a processor, implements the method for training a deep neural network encoder model according to any one of claims 1 to 7 and/or the method for predicting similarity of text sequences according to claim 8.