CN115221977A - Text similarity calculation model training method, calculation method and related device - Google Patents

Text similarity calculation model training method, calculation method and related device

Info

Publication number
CN115221977A
CN115221977A (application CN202211000798.0A)
Authority
CN
China
Prior art keywords
text
training
semantic feature
feature vector
network
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202211000798.0A
Other languages
Chinese (zh)
Inventor
赵韦人
李睿濠
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Guangdong University of Technology
Original Assignee
Guangdong University of Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Guangdong University of Technology filed Critical Guangdong University of Technology
Priority to CN202211000798.0A priority Critical patent/CN115221977A/en
Publication of CN115221977A publication Critical patent/CN115221977A/en
Pending legal-status Critical Current

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/30 Semantic analysis
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods

Abstract

The application discloses a text similarity calculation model training method, a calculation method and a related device. A twin network comprising a first branch network, a second branch network and a prediction layer is constructed. The first training text and the second training text are preprocessed, subjected to semantic feature extraction and weighted in sequence through the first branch network and the second branch network respectively, to obtain a first high-level semantic feature vector and a second high-level semantic feature vector. The Euclidean distance between the two high-level semantic feature vectors is calculated through the prediction layer, and a similarity prediction value between the training text pair is obtained based on the Euclidean distance. The parameters of the twin network are updated according to the similarity prediction value and the actual similarity value between the training text pair, to obtain a text similarity calculation model. This solves the technical problem in the prior art that the semantic content of the two texts is not extracted and compared, and the differing degrees to which different words contribute to semantic expression are ignored, so that the numerical representation of the text semantics is affected and the accuracy of the similarity calculation result is low.

Description

Text similarity calculation model training method, calculation method and related device
Technical Field
The application relates to the technical field of natural language processing, in particular to a text similarity calculation model training method, a calculation method and a related device.
Background
At present, when text similarity is calculated in the prior art, the semantic content of the texts to be compared is generally not extracted and compared, and the differing degrees to which different words contribute to semantic expression are ignored. This affects the numerical representation of the text semantics, which in turn affects the text similarity calculation result, so that the accuracy of the result is low.
Disclosure of Invention
The application provides a text similarity calculation model training method, a calculation method and a related device, which are used to solve the technical problem in the prior art that the semantic content of the texts to be compared is not extracted and compared, and the differing degrees to which different words contribute to semantic expression are ignored, so that the numerical representation of the text semantics is affected and the accuracy of the similarity calculation result is low.
In view of this, a first aspect of the present application provides a text similarity calculation model training method, including:
constructing a twin network comprising a first branch network, a second branch network and a prediction layer;
sequentially preprocessing and extracting semantic features of a first training text in the training text pair through the first branch network to obtain a first semantic feature vector, and configuring weights for word vectors in the first semantic feature vector based on an attention mechanism to obtain a first high-level semantic feature vector;
sequentially preprocessing and extracting semantic features of a second training text in the training text pair through the second branch network to obtain a second semantic feature vector, and configuring weights for word vectors in the second semantic feature vector based on an attention mechanism to obtain a second high-level semantic feature vector;
calculating Euclidean distance between the first high-level semantic feature vector and the second high-level semantic feature vector through the prediction layer, and acquiring a similarity prediction value between the training text pairs based on the Euclidean distance;
calculating a loss value according to the similarity predicted value between the training text pairs and the actual similarity value between the training text pairs, and updating the network parameters of the twin network according to the loss value until the twin network converges to obtain a trained text similarity calculation model.
Optionally, before the first training text in the training text pair is sequentially preprocessed and subjected to semantic feature extraction through the first branch network to obtain a first semantic feature vector, and weights are configured for word vectors in the first semantic feature vector based on an attention mechanism to obtain a first high-level semantic feature vector, the method further includes:
respectively performing vectorization processing on the first training text and the second training text in the training text pair, and adding a first coding mark at the sentence beginning and a second coding mark at the sentence end of each vectorized training text, so as to respectively obtain a first initial vector corresponding to the first training text and a second initial vector corresponding to the second training text.
Optionally, the first branch network and the second branch network share the same feature extraction network, and the feature extraction network is formed by serially connecting an encoder model, a bidirectional long-short term memory network, and an attention mechanism network;
the encoder model is used for encoding the input initial vector to obtain a text vector;
the bidirectional long and short term memory network is used for extracting semantic features of the text vectors to obtain semantic feature vectors;
the attention mechanism network is used for calculating the semantic feature vectors to obtain total attention distribution, calculating the weight of each word vector in the semantic feature vectors according to the total attention distribution, and weighting the corresponding word vectors according to the weight of each word vector in the semantic feature vectors to obtain high-level semantic feature vectors.
Optionally, the calculating, by the prediction layer, a euclidean distance between the first high-level semantic feature vector and the second high-level semantic feature vector, and obtaining a similarity prediction value between the training text pair based on the euclidean distance includes:
and calculating Euclidean distance between the first high-level semantic feature vector and the second high-level semantic feature vector through the prediction layer, and calculating the Euclidean distance through an activation function to obtain a similarity prediction value between the training text pairs.
A second aspect of the present application provides a text similarity calculation method, including:
acquiring a text pair to be compared;
and inputting the text pairs to be compared into a text similarity calculation model for similarity calculation to obtain similarity values between the text pairs to be compared, wherein the text similarity calculation model is obtained by training through any one of the text similarity calculation model training methods in the first aspect.
The third aspect of the present application provides a text similarity calculation model training apparatus, including:
a construction unit for constructing a twin network comprising a first branch network, a second branch network and a prediction layer;
the first feature extraction unit is used for sequentially preprocessing and extracting semantic features from a first training text in the training text pair through the first branch network to obtain a first semantic feature vector, and configuring weights for word vectors in the first semantic feature vector based on an attention mechanism to obtain a first high-level semantic feature vector;
the second feature extraction unit is used for sequentially preprocessing and extracting semantic features from a second training text in the training text pair through the second branch network to obtain a second semantic feature vector, and configuring weights for word vectors in the second semantic feature vector based on an attention mechanism to obtain a second high-level semantic feature vector;
the prediction unit is used for calculating the Euclidean distance between the first high-level semantic feature vector and the second high-level semantic feature vector through the prediction layer and acquiring a similarity prediction value between the training text pairs based on the Euclidean distance;
and the parameter updating unit is used for calculating a loss value according to the similarity prediction value between the training text pairs and the actual similarity value between the training text pairs, and updating the network parameters of the twin network according to the loss value until the twin network converges to obtain a trained text similarity calculation model.
Optionally, the method further includes:
and the vectorization processing unit is used for respectively carrying out vectorization processing on the first training text and the second training text in the training text pair, adding a first coding mark to the beginning of the sentence of the vectorized first training text and the vectorized second training text, and adding a second coding mark to the end of the sentence to respectively obtain a first initial vector corresponding to the first training text and a second initial vector corresponding to the second training text.
The present application in a fourth aspect provides a text similarity calculation apparatus comprising:
the acquisition unit is used for acquiring the text pairs to be compared;
and the calculation unit is used for inputting the text pairs to be compared into a text similarity calculation model for similarity calculation to obtain similarity values between the text pairs to be compared, wherein the text similarity calculation model is obtained by training through any one of the text similarity calculation model training methods in the first aspect.
A fifth aspect of the present application provides an electronic device comprising a processor and a memory;
the memory is used for storing program codes and transmitting the program codes to the processor;
the processor is configured to execute the text similarity calculation model training method according to any one of the first aspect or the text similarity calculation method according to the second aspect according to instructions in the program code.
A sixth aspect of the present application provides a computer-readable storage medium for storing program code that, when executed by a processor, implements the text similarity calculation model training method according to any one of the first aspect or the text similarity calculation method according to the second aspect.
According to the technical scheme, the method has the following advantages:
the application provides a text similarity calculation model training method, which comprises the following steps: constructing a twin network, wherein the twin network comprises a first branch network, a second branch network and a prediction layer; sequentially preprocessing and extracting semantic features of a first training text in a training text pair through a first branch network to obtain a first semantic feature vector, and configuring weights for word vectors in the first semantic feature vector based on an attention mechanism to obtain a first high-level semantic feature vector; sequentially preprocessing and extracting semantic features of a second training text in the training text pair through a second branch network to obtain a second semantic feature vector, and configuring weights for word vectors in the second semantic feature vector based on an attention mechanism to obtain a second high-level semantic feature vector; calculating Euclidean distance between the first high-level semantic feature vector and the second high-level semantic feature vector through a prediction layer, and acquiring a similarity prediction value between the training text pairs based on the Euclidean distance; and calculating a loss value according to the similarity predicted value between the training text pairs and the actual similarity value between the training text pairs, and updating the network parameters of the twin network through the loss value until the twin network converges to obtain a trained text similarity calculation model.
According to the method, the twin network is used to preprocess the training text pair and extract semantic features in turn, obtaining a first semantic feature vector and a second semantic feature vector. Weights are configured for the word vectors in the two semantic feature vectors based on an attention mechanism to obtain the high-level semantic feature vectors, and finally the Euclidean distance between the two high-level semantic feature vectors is calculated to obtain the similarity value between the two training texts.
Drawings
In order to more clearly illustrate the embodiments of the present application or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art will be briefly described below, it is obvious that the drawings in the description below are only some embodiments of the present application, and for those skilled in the art, other drawings can be obtained according to the drawings without inventive labor.
Fig. 1 is a schematic flowchart of a text similarity calculation model training method according to an embodiment of the present disclosure;
FIG. 2 is a general schematic diagram of a twin network structure provided in an embodiment of the present application;
fig. 3 is a schematic flowchart of a text similarity calculation method according to an embodiment of the present application;
fig. 4 is a schematic structural diagram of a text similarity calculation model training apparatus according to an embodiment of the present application;
fig. 5 is a schematic structural diagram of a text similarity calculation apparatus according to an embodiment of the present application.
Detailed Description
The application provides a text similarity calculation model training method, a calculation method and a related device, which are used to solve the technical problem in the prior art that the semantic content of the texts to be compared is not extracted and compared, and the differing degrees to which different words contribute to semantic expression are ignored, so that the numerical representation of the text semantics is affected and the accuracy of the similarity calculation result is low.
In order to make the technical solutions of the present application better understood, the technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application, and it is obvious that the described embodiments are only a part of the embodiments of the present application, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present application.
For easy understanding, please refer to fig. 1, an embodiment of the present application provides a text similarity calculation model training method, including:
step 101, constructing a twin network, wherein the twin network comprises a first branch network, a second branch network and a prediction layer.
The twin network constructed in the embodiment of the present application includes a first branch network, a second branch network, and a prediction layer, where the first branch network and the second branch network share the same feature extraction network, and output ends of the first branch network and the second branch network are both connected to an input end of the prediction layer, and a specific structure of the twin network may refer to fig. 2.
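For illustration only, the following is a minimal structural sketch of such a twin network, assuming PyTorch (the patent does not prescribe a framework). The class name SiameseTextNet, the pooling of each text into a single feature vector, and the exp(-distance) mapping are illustrative assumptions, not details taken from the embodiment.

```python
import torch
import torch.nn as nn

class SiameseTextNet(nn.Module):
    """Twin network: two branches sharing one feature extraction network, plus a prediction layer."""

    def __init__(self, feature_extractor: nn.Module):
        super().__init__()
        # The first and the second branch use the SAME module, so they share the same weights.
        self.feature_extractor = feature_extractor

    def forward(self, first_inputs, second_inputs):
        # Each branch maps its text to a (batch, feature_dim) high-level semantic feature vector.
        h1 = self.feature_extractor(first_inputs)
        h2 = self.feature_extractor(second_inputs)
        # Prediction layer: Euclidean distance, then an activation mapping it to a similarity value.
        distance = torch.sqrt(torch.sum((h1 - h2) ** 2, dim=-1) + 1e-12)
        return torch.exp(-distance)  # close feature vectors give a similarity prediction near 1
```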
Step 102, sequentially carrying out preprocessing and semantic feature extraction on a first training text in a training text pair through a first branch network to obtain a first semantic feature vector, and configuring weights for word vectors in the first semantic feature vector based on an attention mechanism to obtain a first high-level semantic feature vector.
Before the training text pair is input into the twin network, vectorization processing may be performed on the first training text and the second training text respectively. A first coding mark, such as [CLS], may be added at the sentence beginning of the vectorized first training text and the vectorized second training text, and a second coding mark, such as [SEP], may be added at the sentence end, serving as the coding marks for the start and the end of the training text, so as to respectively obtain a first initial vector corresponding to the first training text and a second initial vector corresponding to the second training text.
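As an illustration of this preprocessing step, the sketch below uses the Hugging Face transformers BERT tokenizer, whose special tokens [CLS] and [SEP] play the role of the first and second coding marks. The checkpoint name bert-base-chinese, the maximum length and the helper name vectorize are assumptions of the example, not details fixed by the embodiment.

```python
from transformers import BertTokenizer

tokenizer = BertTokenizer.from_pretrained("bert-base-chinese")  # assumed checkpoint

def vectorize(text: str, max_len: int = 64):
    # add_special_tokens=True inserts [CLS] at the sentence beginning and [SEP]
    # at the sentence end, matching the first and second coding marks above.
    return tokenizer(
        text,
        add_special_tokens=True,
        padding="max_length",
        truncation=True,
        max_length=max_len,
        return_tensors="pt",
    )

first_initial = vectorize("第一条训练文本")   # first initial vector
second_initial = vectorize("第二条训练文本")  # second initial vector
```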
The feature extraction network in the first branch network in the embodiment of the application is composed of an encoder model, a bidirectional long-short term memory network and an attention mechanism network which are connected in series;
the encoder model is used for encoding the input initial vector to obtain a text vector;
the bidirectional long-short term memory network is used for extracting semantic features of the text vectors to obtain semantic feature vectors;
and the attention mechanism network is used for calculating the semantic feature vectors to obtain total attention distribution, calculating the weight of each word vector in the semantic feature vectors according to the total attention distribution, and weighting the corresponding word vectors according to the weight of each word vector in the semantic feature vectors to obtain high-level semantic feature vectors.
After the first branch network receives the first initial vector corresponding to the input first training text, the first initial vector is encoded through the encoder model to obtain a first text vector. Specifically, the encoder model in this embodiment may be a BERT model, and the first initial vector is encoded by the BERT model to obtain a first text vector $E_n = [e_1, e_2, \ldots, e_n]$, where $e_1, e_2, \ldots, e_n$ are the word vectors in the first text vector and $n$ is the number of word vectors in the first text vector.
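A possible instantiation of this encoder step is sketched below with the Hugging Face transformers BertModel. The checkpoint name is an assumption, torch.no_grad is used only to illustrate the output shape (during training the encoder may well be fine-tuned), and first_initial refers to the hypothetical tokenizer output from the preprocessing sketch above.

```python
import torch
from transformers import BertModel

bert = BertModel.from_pretrained("bert-base-chinese")  # assumed checkpoint

with torch.no_grad():  # gradients omitted here purely for the shape illustration
    outputs = bert(**first_initial)
    E_n = outputs.last_hidden_state  # first text vector E_n = [e_1, ..., e_n], shape (1, n, 768)
```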
The first text vector is then subjected to semantic feature extraction through a bidirectional long-short term memory network to obtain a first semantic feature vector. The bidirectional long-short term memory network in this embodiment may be a three-layer bidirectional recurrent long-short term memory network, used as a text feature extractor to extract semantic features from the first text vector and obtain the first semantic feature vector. The basic parameters of the three-layer bidirectional recurrent long-short term memory network may be set as follows: the first-layer dimension is 256, the second-layer dimension is 128, the third-layer dimension is 64, and dropout = 0.5, where dropout means that during training of the deep learning network, neural network units are temporarily dropped from the network with a certain probability. The calculation formulas of the three-layer bidirectional recurrent long-short term memory network are as follows:
$f_t = \sigma(W_f \cdot [h_{t-1}, e_t] + b_f)$

$i_t = \sigma(W_i \cdot [h_{t-1}, e_t] + b_i)$

$\tilde{C}_t = \tanh(W_C \cdot [h_{t-1}, e_t] + b_C)$

$C_t = f_t * C_{t-1} + i_t * \tilde{C}_t$

$o_t = \sigma(W_o \cdot [h_{t-1}, e_t] + b_o)$

$h_t = o_t * \tanh(C_t)$

In the formulas, $e_t$ is a word vector of the text vector input into the three-layer bidirectional recurrent long-short term memory network, $h_t$ is the hidden state at the $t$-th time step, $W_f$, $W_i$, $W_o$ are shared weight matrices, $b_f$, $b_i$, $b_o$ are bias vectors, $\sigma(\cdot)$ is the sigmoid function, $C_t$ is the long-term memory cell, $f_t$, $i_t$ and $o_t$ are the forget gate, the input gate and the output gate respectively, $\tilde{C}_t$ is the transient (candidate) cell state, and $\tanh(\cdot)$ is the activation function.
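For illustration, a minimal PyTorch sketch of the three-layer bidirectional LSTM feature extractor with the layer dimensions stated above (256 / 128 / 64, dropout = 0.5) is given below; treating those dimensions as per-direction hidden sizes and 768 as the encoder output dimension are assumptions of the sketch, not details fixed by the embodiment.

```python
import torch.nn as nn

class StackedBiLSTM(nn.Module):
    """Three-layer bidirectional LSTM used as the text feature extractor (sketch)."""

    def __init__(self, input_dim: int = 768, dropout: float = 0.5):
        super().__init__()
        self.lstm1 = nn.LSTM(input_dim, 256, bidirectional=True, batch_first=True)
        self.lstm2 = nn.LSTM(2 * 256, 128, bidirectional=True, batch_first=True)
        self.lstm3 = nn.LSTM(2 * 128, 64, bidirectional=True, batch_first=True)
        self.dropout = nn.Dropout(dropout)  # units are temporarily dropped with probability 0.5

    def forward(self, text_vectors):          # text_vectors: (batch, seq_len, input_dim)
        h, _ = self.lstm1(text_vectors)
        h, _ = self.lstm2(self.dropout(h))
        h, _ = self.lstm3(self.dropout(h))
        return h                              # semantic feature vectors: (batch, seq_len, 2 * 64)
```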
The first semantic feature vector is calculated through the attention mechanism network to obtain a first total attention distribution; the weight of each word vector in the first semantic feature vector is calculated according to the first total attention distribution; and the corresponding word vectors are weighted according to these weights to obtain the first high-level semantic feature vector. In this embodiment, the attention mechanism network assigns a different weight to each word vector in the semantic feature vector output by the previous stage, according to the degree to which that word vector contributes to the semantic representation. This stage comprises two steps: first, the correlation between the semantic feature information of each word in the first training text and the text representation is calculated to obtain the first total attention distribution of the first text vector; then, the weight of each word vector in the first semantic feature vector corresponding to the first text vector is calculated according to the first total attention distribution. The calculation formulas of this stage are as follows:
$H_x = [h_1, h_2, \ldots, h_n]$

$x_t = \tanh(W_x H_x + b_x)$

$\alpha_t = \dfrac{\exp(x_t)}{\sum_{j=1}^{n} \exp(x_j)}$

$\hat{h}_t = \alpha_t h_t$

In the formulas, $H_x$ is the first semantic feature vector corresponding to the first text vector obtained in the previous stage, $h_t$ is a word vector in the first semantic feature vector, $W_x$ and $b_x$ are network parameters of the first attention mechanism network, $x_t$ is the first total attention distribution of the first text vector, $\alpha_t$ is the weight of the word vector in the first semantic feature vector, and $\hat{h}_t$ is the weighted word vector in the first semantic feature vector. The first high-level semantic feature vector $\hat{H}_x = [\hat{h}_1, \hat{h}_2, \ldots, \hat{h}_n]$ corresponding to the first text vector is finally obtained.
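The weighting step can be sketched in PyTorch as follows; projecting each word vector to a single scalar attention score before the softmax is one common reading of the formulas above and is an assumption of this sketch, not necessarily the exact construction of the embodiment.

```python
import torch
import torch.nn as nn

class WordAttention(nn.Module):
    """Assigns a weight to each word vector and returns the weighted (high-level) feature vectors."""

    def __init__(self, dim: int):
        super().__init__()
        self.score = nn.Linear(dim, 1)  # plays the role of W_x and b_x

    def forward(self, H):                    # H: (batch, seq_len, dim) semantic feature vectors
        x = torch.tanh(self.score(H))        # x_t = tanh(W_x h_t + b_x), one score per word
        alpha = torch.softmax(x, dim=1)      # attention weights over the time steps
        return alpha * H                     # weighted word vectors, i.e. high-level semantic features
```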
Step 103, sequentially preprocessing and extracting semantic features of a second training text in the training text pair through a second branch network to obtain a second semantic feature vector, and configuring weights for word vectors in the second semantic feature vector based on an attention mechanism to obtain a second high-level semantic feature vector.
Since the first branch network and the second branch network share the same feature extraction network, after the second branch network receives the second initial vector corresponding to the input second training text, the second initial vector is encoded through the encoder model to obtain a second text vector $E_m = [e_1, e_2, \ldots, e_m]$, where $e_1, e_2, \ldots, e_m$ are the word vectors in the second text vector and $m$ is the number of word vectors in the second text vector.
Semantic feature extraction is then carried out on the second text vector through the bidirectional long-short term memory network to obtain a second semantic feature vector.
The second semantic feature vector is calculated through the attention mechanism network to obtain a second total attention distribution; the weight of each word vector in the second semantic feature vector is calculated according to the second total attention distribution; and the corresponding word vectors are weighted according to these weights to obtain the second high-level semantic feature vector. The attention mechanism network assigns a different weight to each word vector in the second semantic feature vector output by the previous stage, according to the degree to which that word vector contributes to the semantic representation. This stage likewise comprises two steps: first, the correlation between the semantic feature information of each word in the second training text and the text representation is calculated to obtain the second total attention distribution of the second text vector; then, the weight of each word vector in the second semantic feature vector corresponding to the second text vector is calculated according to the second total attention distribution. The calculation formulas of this stage are as follows:
$H_y = [h_1, h_2, \ldots, h_m]$

$y_t = \tanh(W_y H_y + b_y)$

$\beta_t = \dfrac{\exp(y_t)}{\sum_{j=1}^{m} \exp(y_j)}$

$\hat{h}_t = \beta_t h_t$

In the formulas, $H_y$ is the second semantic feature vector corresponding to the second text vector obtained in the previous stage, $h_t$ is a word vector in the second semantic feature vector, $W_y$ and $b_y$ are network parameters of the second attention mechanism network, $y_t$ is the second total attention distribution of the second text vector, $\beta_t$ is the weight of the word vector in the second semantic feature vector, and $\hat{h}_t$ is the weighted word vector in the second semantic feature vector. The second high-level semantic feature vector $\hat{H}_y = [\hat{h}_1, \hat{h}_2, \ldots, \hat{h}_m]$ corresponding to the second text vector is finally obtained, wherein n = m.
It should be noted that a twin network structure is preferably adopted as the overall network structure in the embodiment of the present application: the first branch network and the second branch network share the same feature extraction network, and the feature extraction network performs feature extraction on the first training sample and the second training sample in turn, which reduces the number of model parameters and increases information interaction.
Step 104, calculating Euclidean distance between the first high-level semantic feature vector and the second high-level semantic feature vector through a prediction layer, and acquiring a similarity prediction value between the training text pairs based on the Euclidean distance.
The Euclidean distance between the first high-level semantic feature vector and the second high-level semantic feature vector is calculated through the prediction layer, namely:

$D(\hat{H}_x, \hat{H}_y) = \sqrt{\sum_{i=1}^{n} \left(\hat{h}_{x,i} - \hat{h}_{y,i}\right)^{2}}$
and calculating the Euclidean distance through an activation function (such as a softmax function) to obtain a similarity prediction value between the training text pairs.
Step 105, calculating a loss value according to the similarity prediction value between the training text pairs and the actual similarity value between the training text pairs, and updating the network parameters of the twin network through the loss value until the twin network converges to obtain a trained text similarity calculation model.
A loss value is calculated according to the similarity prediction value between the training text pair and the actual similarity value between the training text pair; an existing loss function may be used, and the specific loss function is not limited here. The network parameters of the twin network are updated by back-propagating the loss value until the twin network converges, that is, until the number of training iterations reaches the maximum number of iterations or the training error falls below a preset error threshold, at which point the trained text similarity calculation model is obtained.
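A minimal training-loop sketch is given below under the following assumptions, none of which are fixed by the embodiment: PyTorch, a mean-squared-error loss between the similarity prediction value and the actual similarity value (the embodiment leaves the loss function open), an Adam optimizer, and a data loader that yields (first_inputs, second_inputs, actual_similarity) triples.

```python
import torch

def train_model(model, loader, epochs: int = 10, lr: float = 1e-4, error_threshold: float = 1e-3):
    optimizer = torch.optim.Adam(model.parameters(), lr=lr)
    loss_fn = torch.nn.MSELoss()
    for _ in range(epochs):                                   # maximum number of training iterations
        for first_inputs, second_inputs, actual_sim in loader:
            predicted_sim = model(first_inputs, second_inputs)
            loss = loss_fn(predicted_sim, actual_sim)         # predicted vs. actual similarity
            optimizer.zero_grad()
            loss.backward()                                   # propagate the loss value back
            optimizer.step()                                  # update the twin-network parameters
        if loss.item() < error_threshold:                     # training error (last batch) below threshold
            break
    return model
```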
In the embodiment of the application, the training text pair is preprocessed and subjected to semantic feature extraction in turn through the twin network to obtain the first semantic feature vector and the second semantic feature vector; weights are configured for the word vectors in the two semantic feature vectors based on the attention mechanism to obtain the high-level semantic feature vectors; and finally the Euclidean distance between the two high-level semantic feature vectors is calculated to obtain the similarity value between the two training texts.
The above is an embodiment of a text similarity calculation model training method provided by the present application, and the following is an embodiment of a text similarity calculation method provided by the present application.
Referring to fig. 3, a text similarity calculation method provided in an embodiment of the present application includes:
and 301, acquiring a text pair to be compared.
The two texts that need to be compared are taken as the text pair to be compared.
Step 302, inputting the text pairs to be compared into a text similarity calculation model for similarity calculation, and obtaining similarity values between the text pairs to be compared.
The text pair to be compared is input into the text similarity calculation model for similarity calculation. The text similarity calculation model preprocesses the two texts to be compared respectively to obtain two text vectors; it extracts semantic features from the two text vectors respectively to obtain the corresponding semantic feature vectors; it configures weights for the word vectors in the two semantic feature vectors based on the attention mechanism to obtain the corresponding high-level semantic feature vectors; and it calculates the Euclidean distance between the two high-level semantic feature vectors and passes the distance through an activation function to obtain the similarity value between the text pair to be compared. The degree of similarity of the two texts can be determined according to the similarity value calculated by the model. The text similarity calculation model is obtained by training through the text similarity calculation model training method in the foregoing method embodiment.
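Purely for illustration, a possible inference-time usage is sketched below; it reuses the hypothetical vectorize helper and trained model object from the earlier sketches, and the example sentences are arbitrary.

```python
import torch

model.eval()
with torch.no_grad():
    first = vectorize("如何申请退款")    # text pair to be compared
    second = vectorize("怎样才能退款")
    similarity = model(first, second)
    print(f"similarity value: {similarity.item():.4f}")
```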
In the embodiment of the application, the two texts to be compared are preprocessed respectively to obtain two text vectors; semantic features are extracted from the two text vectors respectively to obtain semantic feature vectors; weights are configured for the word vectors in the two semantic feature vectors based on the attention mechanism network to obtain high-level semantic feature vectors; and finally the similarity value between the two texts is obtained by calculating the distance between the high-level semantic feature vectors corresponding to the two text vectors. In this way the semantic content of the texts to be compared is extracted and compared, and the differing degrees to which different words contribute to semantic expression are taken into account, which improves the accuracy of the text similarity calculation result and of the text similarity comparison result. This solves the technical problem in the prior art that the semantic content of the texts to be compared is not extracted and compared, and the differing degrees to which different words contribute to semantic expression are ignored, so that the numerical representation of the text semantics is affected and the accuracy of the similarity calculation result is low.
The above is an embodiment of the text similarity calculation method provided by the present application, and the following is an embodiment of a text similarity calculation model training apparatus provided by the present application.
Referring to fig. 4, an apparatus for training a text similarity calculation model according to an embodiment of the present application includes:
the device comprises a construction unit, a prediction layer and a first branch network, wherein the construction unit is used for constructing a twin network which comprises a first branch network, a second branch network and the prediction layer;
the first feature extraction unit is used for sequentially preprocessing and extracting semantic features from a first training text in a training text pair through a first branch network to obtain a first semantic feature vector, and configuring weights for word vectors in the first semantic feature vector based on an attention mechanism to obtain a first high-level semantic feature vector;
the second feature extraction unit is used for sequentially preprocessing and extracting semantic features of a second training text in the training text pair through a second branch network to obtain a second semantic feature vector, and configuring weights for word vectors in the second semantic feature vector based on an attention mechanism to obtain a second high-level semantic feature vector;
the prediction unit is used for calculating the Euclidean distance between the first high-level semantic feature vector and the second high-level semantic feature vector through the prediction layer and acquiring a similarity prediction value between the training text pairs based on the Euclidean distance;
and the parameter updating unit is used for calculating a loss value according to the similarity predicted value between the training text pairs and the actual similarity value between the training text pairs, and updating the network parameters of the twin network through the loss value until the twin network converges to obtain a trained text similarity calculation model.
As a further refinement, the apparatus further includes:
and the vectorization processing unit is used for respectively carrying out vectorization processing on the first training text and the second training text in the training text pair, adding a first coding mark to the beginning of the first training text after the vectorization processing and the second training text after the vectorization processing, and adding a second coding mark to the end of the sentence to respectively obtain a first initial vector corresponding to the first training text and a second initial vector corresponding to the second training text.
As a further improvement, the prediction unit is specifically configured to:
and calculating Euclidean distance between the first high-level semantic feature vector and the second high-level semantic feature vector through a prediction layer, and calculating the Euclidean distance through an activation function to obtain a similarity prediction value between the training text pairs.
In the embodiment of the application, the training text pair is preprocessed and subjected to semantic feature extraction in turn through the twin network to obtain the first semantic feature vector and the second semantic feature vector; weights are configured for the word vectors in the two semantic feature vectors based on the attention mechanism to obtain the high-level semantic feature vectors; and finally the Euclidean distance between the two high-level semantic feature vectors is calculated to obtain the similarity value between the two training texts.
The above is an embodiment of a text similarity calculation model training apparatus provided in the present application, and the following is an embodiment of a text similarity calculation apparatus provided in the present application.
Referring to fig. 5, a text similarity calculation apparatus according to an embodiment of the present application includes:
the acquisition unit is used for acquiring the text pairs to be compared;
and the calculation unit is used for inputting the text pairs to be compared into the text similarity calculation model for similarity calculation to obtain similarity values between the text pairs to be compared, wherein the text similarity calculation model is obtained by training through the text similarity calculation model training method in the method embodiment.
In the embodiment of the application, the two texts to be compared are preprocessed respectively to obtain two text vectors; semantic features are extracted from the two text vectors respectively to obtain semantic feature vectors; weights are configured for the word vectors in the two semantic feature vectors based on the attention mechanism network to obtain high-level semantic feature vectors; and finally the similarity value between the two texts is obtained by calculating the distance between the high-level semantic feature vectors corresponding to the two text vectors. In this way the semantic content of the texts to be compared is extracted and compared, and the differing degrees to which different words contribute to semantic expression are taken into account, which improves the accuracy of the text similarity calculation result and of the text similarity comparison result. This solves the technical problem in the prior art that the semantic content of the texts to be compared is not extracted and compared, and the differing degrees to which different words contribute to semantic expression are ignored, so that the numerical representation of the text semantics is affected and the accuracy of the similarity calculation result is low.
The embodiment of the application also provides electronic equipment, which comprises a processor and a memory;
the memory is used for storing the program codes and transmitting the program codes to the processor;
the processor is used for executing the text similarity calculation model training method or the text similarity calculation method in the foregoing method embodiments according to instructions in the program code.
The embodiment of the present application further provides a computer-readable storage medium, which is used for storing program codes, and when the program codes are executed by a processor, the method for training a text similarity calculation model or the method for calculating text similarity in the foregoing method embodiments is implemented.
It is clear to those skilled in the art that, for convenience and brevity of description, the specific working processes of the above-described apparatuses and units may refer to the corresponding processes in the foregoing method embodiments, and are not described herein again.
The terms "first," "second," "third," "fourth," and the like in the description of the application and the above-described figures, if any, are used for distinguishing between similar elements and not necessarily for describing a particular sequential or chronological order. It is to be understood that the data so used may be interchanged under appropriate circumstances such that the embodiments of the application described herein may be implemented, for example, in sequences other than those illustrated or described herein. Furthermore, the terms "comprises," "comprising," and "having," and any variations thereof, are intended to cover a non-exclusive inclusion, such that a process, method, system, article, or apparatus that comprises a list of steps or elements is not necessarily limited to those steps or elements expressly listed, but may include other steps or elements not expressly listed or inherent to such process, method, article, or apparatus.
It should be understood that in the present application, "at least one" means one or more, "a plurality" means two or more. "and/or" for describing an association relationship of associated objects, indicating that there may be three relationships, e.g., "a and/or B" may indicate: only A, only B and both A and B are present, wherein A and B may be singular or plural. The character "/" generally indicates that the former and latter associated objects are in an "or" relationship. "at least one of the following" or similar expressions refer to any combination of these items, including any combination of the singular or plural items. For example, at least one (one) of a, b, or c, may represent: a, b, c, "a and b", "a and c", "b and c", or "a and b and c", wherein a, b, c may be single or plural.
In the several embodiments provided in the present application, it should be understood that the disclosed apparatus and method may be implemented in other ways. For example, the above-described apparatus embodiments are merely illustrative, and for example, the division of the units is only one logical division, and other divisions may be realized in practice, for example, a plurality of units or components may be combined or integrated into another system, or some features may be omitted, or not executed. In addition, the shown or discussed mutual coupling or direct coupling or communication connection may be an indirect coupling or communication connection through some interfaces, devices or units, and may be in an electrical, mechanical or other form.
The units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one position, or may be distributed on multiple network units. Some or all of the units can be selected according to actual needs to achieve the purpose of the solution of the embodiment.
In addition, functional units in the embodiments of the present application may be integrated into one processing unit, or each unit may exist alone physically, or two or more units are integrated into one unit. The integrated unit can be realized in a form of hardware, and can also be realized in a form of a software functional unit.
The integrated unit, if implemented in the form of a software functional unit and sold or used as a stand-alone product, may be stored in a computer readable storage medium. Based on such understanding, the technical solution of the present application may be substantially implemented or contributed to by the prior art, or all or part of the technical solution may be embodied in a software product, which is stored in a storage medium and includes instructions for executing all or part of the steps of the method described in the embodiments of the present application through a computer device (which may be a personal computer, a server, or a network device). And the aforementioned storage medium includes: a U disk, a removable hard disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk, or an optical disk.
The above embodiments are only used for illustrating the technical solutions of the present application, and not for limiting the same; although the present application has been described in detail with reference to the foregoing embodiments, it should be understood by those of ordinary skill in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some technical features may be equivalently replaced; and such modifications or substitutions do not depart from the spirit and scope of the corresponding technical solutions of the embodiments of the present application.

Claims (10)

1. A training method for a text similarity calculation model is characterized by comprising the following steps:
constructing a twin network comprising a first branch network, a second branch network and a prediction layer;
sequentially preprocessing and extracting semantic features of a first training text in the training text pair through the first branch network to obtain a first semantic feature vector, and configuring weights for word vectors in the first semantic feature vector based on an attention mechanism to obtain a first high-level semantic feature vector;
sequentially preprocessing and extracting semantic features of a second training text in the training text pair through the second branch network to obtain a second semantic feature vector, and configuring weights for word vectors in the second semantic feature vector based on an attention mechanism to obtain a second high-level semantic feature vector;
calculating Euclidean distance between the first high-level semantic feature vector and the second high-level semantic feature vector through the prediction layer, and acquiring a similarity prediction value between the training text pairs based on the Euclidean distance;
calculating a loss value according to the similarity predicted value between the training text pairs and the actual similarity value between the training text pairs, and updating the network parameters of the twin network according to the loss value until the twin network converges to obtain a trained text similarity calculation model.
2. The training method of the text similarity calculation model according to claim 1, wherein before the first training text in the training text pair is sequentially preprocessed and subjected to semantic feature extraction through the first branch network to obtain a first semantic feature vector, and weights are configured for word vectors in the first semantic feature vector based on an attention mechanism to obtain a first high-level semantic feature vector, the method further comprises:
respectively performing vectorization processing on the first training text and the second training text in the training text pair, and adding a first coding mark at the sentence beginning and a second coding mark at the sentence end of each vectorized training text, so as to respectively obtain a first initial vector corresponding to the first training text and a second initial vector corresponding to the second training text.
3. The training method of the text similarity calculation model according to claim 2, wherein the first branch network and the second branch network share the same feature extraction network, and the feature extraction network is composed of an encoder model, a bidirectional long-short term memory network, and an attention mechanism network connected in series;
the encoder model is used for encoding the input initial vector to obtain a text vector;
the bidirectional long and short term memory network is used for extracting semantic features of the text vectors to obtain semantic feature vectors;
the attention mechanism network is used for calculating the semantic feature vectors to obtain total attention distribution, calculating the weight of each word vector in the semantic feature vectors according to the total attention distribution, and weighting the corresponding word vectors according to the weight of each word vector in the semantic feature vectors to obtain high-level semantic feature vectors.
4. The training method of the text similarity calculation model according to claim 1, wherein the calculating, by the prediction layer, the euclidean distance between the first high-level semantic feature vector and the second high-level semantic feature vector, and obtaining the similarity prediction value between the training text pair based on the euclidean distance comprises:
and calculating Euclidean distance between the first high-level semantic feature vector and the second high-level semantic feature vector through the prediction layer, and calculating the Euclidean distance through an activation function to obtain a similarity prediction value between the training text pairs.
5. A text similarity calculation method, comprising:
acquiring a text pair to be compared;
inputting the text pairs to be compared into a text similarity calculation model for similarity calculation to obtain similarity values between the text pairs to be compared, wherein the text similarity calculation model is obtained by training through the text similarity calculation model training method according to any one of claims 1 to 4.
6. A text similarity calculation model training device, comprising:
a construction unit for constructing a twin network comprising a first branch network, a second branch network and a prediction layer;
the first feature extraction unit is used for sequentially preprocessing and extracting semantic features from a first training text in the training text pair through the first branch network to obtain a first semantic feature vector, and configuring weights for word vectors in the first semantic feature vector based on an attention mechanism to obtain a first high-level semantic feature vector;
the second feature extraction unit is used for sequentially preprocessing and extracting semantic features of a second training text in the training text pair through the second branch network to obtain a second semantic feature vector, and configuring weights for word vectors in the second semantic feature vector based on an attention mechanism to obtain a second high-level semantic feature vector;
the prediction unit is used for calculating Euclidean distance between the first high-level semantic feature vector and the second high-level semantic feature vector through the prediction layer and acquiring a similarity prediction value between the training text pairs based on the Euclidean distance;
and the parameter updating unit is used for calculating a loss value according to the similarity prediction value between the training text pairs and the actual similarity value between the training text pairs, and updating the network parameters of the twin network according to the loss value until the twin network converges to obtain a trained text similarity calculation model.
7. The training apparatus for a text similarity calculation model according to claim 6, further comprising:
and the vectorization processing unit is used for respectively carrying out vectorization processing on the first training text and the second training text in the training text pair, adding a first coding mark to the beginning of the sentence of the vectorized first training text and the vectorized second training text, and adding a second coding mark to the end of the sentence to respectively obtain a first initial vector corresponding to the first training text and a second initial vector corresponding to the second training text.
8. A text similarity calculation apparatus, characterized by comprising:
the acquisition unit is used for acquiring the text pairs to be compared;
a calculating unit, configured to input the text pairs to be compared into a text similarity calculation model for similarity calculation, so as to obtain similarity values between the text pairs to be compared, where the text similarity calculation model is obtained by training according to the text similarity calculation model training method of any one of claims 1 to 4.
9. An electronic device, wherein the device comprises a processor and a memory;
the memory is used for storing program codes and transmitting the program codes to the processor;
the processor is configured to execute the text similarity calculation model training method according to any one of claims 1 to 4 or the text similarity calculation method according to claim 5 according to instructions in the program code.
10. A computer-readable storage medium for storing a program code, wherein the program code when executed by a processor implements the text similarity calculation model training method according to any one of claims 1 to 4 or the text similarity calculation method according to claim 5.
CN202211000798.0A 2022-08-19 2022-08-19 Text similarity calculation model training method, calculation method and related device Pending CN115221977A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211000798.0A CN115221977A (en) 2022-08-19 2022-08-19 Text similarity calculation model training method, calculation method and related device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202211000798.0A CN115221977A (en) 2022-08-19 2022-08-19 Text similarity calculation model training method, calculation method and related device

Publications (1)

Publication Number Publication Date
CN115221977A true CN115221977A (en) 2022-10-21

Family

ID=83614855

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202211000798.0A Pending CN115221977A (en) 2022-08-19 2022-08-19 Text similarity calculation model training method, calculation method and related device

Country Status (1)

Country Link
CN (1) CN115221977A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117573815A (en) * 2024-01-17 2024-02-20 之江实验室 Retrieval enhancement generation method based on vector similarity matching optimization
CN117573815B (en) * 2024-01-17 2024-04-30 之江实验室 Retrieval enhancement generation method based on vector similarity matching optimization


Similar Documents

Publication Publication Date Title
CN110263323B (en) Keyword extraction method and system based on barrier type long-time memory neural network
CN109947912B (en) Model method based on intra-paragraph reasoning and joint question answer matching
CN109960800B (en) Weak supervision text classification method and device based on active learning
CN108920460B (en) Training method of multi-task deep learning model for multi-type entity recognition
CN107506414B (en) Code recommendation method based on long-term and short-term memory network
CN110969020B (en) CNN and attention mechanism-based Chinese named entity identification method, system and medium
CN111460807B (en) Sequence labeling method, device, computer equipment and storage medium
CN110032632A (en) Intelligent customer service answering method, device and storage medium based on text similarity
CN108509413A (en) Digest extraction method, device, computer equipment and storage medium
CN113220876B (en) Multi-label classification method and system for English text
CN110019758B (en) Core element extraction method and device and electronic equipment
CN110188175A (en) A kind of question and answer based on BiLSTM-CRF model are to abstracting method, system and storage medium
CN112580328A (en) Event information extraction method and device, storage medium and electronic equipment
CN111930952A (en) Method, system, equipment and storage medium for long text cascade classification
CN112989796A (en) Text named entity information identification method based on syntactic guidance
CN114358007A (en) Multi-label identification method and device, electronic equipment and storage medium
CN111597326A (en) Method and device for generating commodity description text
CN111695591A (en) AI-based interview corpus classification method, device, computer equipment and medium
CN112463989A (en) Knowledge graph-based information acquisition method and system
CN116050352A (en) Text encoding method and device, computer equipment and storage medium
CN114638228A (en) Chinese named entity recognition method based on word set self-attention
CN117094325B (en) Named entity identification method in rice pest field
CN116402064B (en) Comment generation method, comment generation system, storage medium and electronic equipment
CN115269768A (en) Element text processing method and device, electronic equipment and storage medium
CN116720519A (en) Seedling medicine named entity identification method

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination