CN113377909B - Paraphrasing analysis model training method and device, terminal equipment and storage medium - Google Patents

Paraphrasing analysis model training method and device, terminal equipment and storage medium

Info

Publication number
CN113377909B
CN113377909B
Authority
CN
China
Prior art keywords
vector
text
training
target
word
Prior art date
Legal status
Active
Application number
CN202110642143.2A
Other languages
Chinese (zh)
Other versions
CN113377909A
Inventor
赵盟盟
王媛
吴文哲
王磊
苏亮州
Current Assignee
Ping An Technology Shenzhen Co Ltd
Original Assignee
Ping An Technology Shenzhen Co Ltd
Priority date
Filing date
Publication date
Application filed by Ping An Technology Shenzhen Co Ltd filed Critical Ping An Technology Shenzhen Co Ltd
Priority to CN202110642143.2A
Publication of CN113377909A
Priority to PCT/CN2022/071358 (published as WO2022257453A1)
Application granted
Publication of CN113377909B
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/279 Recognition of textual entities
    • G06F40/289 Phrasal analysis, e.g. finite state techniques or chunking
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33 Querying
    • G06F16/3331 Query processing
    • G06F16/334 Query execution
    • G06F16/3344 Query execution using natural language analysis
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G06N3/084 Backpropagation, e.g. using gradient descent


Abstract

The application is applicable to the technical field of artificial intelligence, and provides a paraphrasing analysis model training method, a device, terminal equipment and a storage medium, wherein the method comprises the following steps: acquiring a training sample containing two text segments; inputting the training sample into a first network structure of a pre-training model to obtain a target embedded vector of the training sample; inputting the target embedded vector into a high-pass filter layer for information noise filtering to obtain a filter vector; respectively inputting the target embedded vector and the filter vector into a second network structure for vector processing to obtain a target loss function value; and carrying out back propagation training on the pre-training model based on the target loss function value to obtain a target training model. With this method, key information can be enhanced during the vector processing of the text to be recognized, and the prediction accuracy of the target training model can be improved.

Description

Paraphrasing analysis model training method and device, terminal equipment and storage medium
Technical Field
The application belongs to the technical field of artificial intelligence, and particularly relates to a paraphrasing analysis model training method, a paraphrasing analysis model training device, terminal equipment and a storage medium.
Background
The text paraphrase analysis task is one of the common benchmarks for evaluating model performance in the field of natural language processing. Specifically, two text segments are input into a model, and the model predicts whether the two segments express the same meaning, that is, the paraphrase similarity between the two text segments.
Currently, in the training of text paraphrase analysis models, vector processing (such as text paraphrase extraction) is generally performed directly on a large amount of manually labeled text data for model training. However, this vector processing does not handle the disturbance information (noise) commonly present in text data, so the prediction accuracy of the resulting model is low.
Disclosure of Invention
The embodiment of the application provides a paraphrasing analysis model training method, a device, terminal equipment and a storage medium, which can solve the problem of low prediction accuracy of a text paraphrasing analysis model trained in the prior art.
In a first aspect, an embodiment of the present application provides a paraphrasing analysis model training method, including:
obtaining a training sample, wherein the training sample at least comprises two sections of texts;
inputting the training sample into a first network structure of a pre-training model to obtain a target embedded vector of the training sample; the pre-training model further comprises a high-pass filtering layer and a second network structure;
Inputting the target embedded vector into the high-pass filter layer to perform information noise filtering processing to obtain a filter vector;
respectively inputting the target embedded vector and the filtering vector into the second network structure to perform vector processing, so as to obtain a target loss function value of the training sample;
and carrying out back propagation training on the pre-training model based on the target loss function value to obtain a target training model, wherein the target training model is used for processing two arbitrarily input texts to be identified and outputting the paraphrasing similarity of the two texts to be identified.
In an embodiment, the inputting the training samples into the first network structure of the pre-training model to obtain the target embedded vectors of the training samples includes:
identifying a starting symbol and a segmentation symbol in the training sample;
determining text content between the start symbol and the segmentation symbol as a first text, and determining text content following the segmentation symbol as a second text;
inputting the first text and the second text into the first network structure to obtain a first embedded vector of the first text and a second embedded vector of the second text;
And calculating the average value of the first embedded vector and the second embedded vector, and taking the average value as a target embedded vector of the training sample.
In an embodiment, the first text includes at least one first word segment, the second text includes at least one second word segment, the first embedded vector is composed of word segment embedded vectors corresponding to the first word segment, and the second embedded vector is composed of word segment embedded vectors corresponding to the second word segment;
the inputting the first text and the second text into the first network structure, to obtain a first embedded vector of the first text and a second embedded vector of the second text, includes:
determining a first word vector of a first word segment for any first word segment of the first text; and determining a second word vector of the second word segment for any second word segment of the second text;
determining a first word position vector of the first word segment in the first text relative to the start symbol; and determining a second word position vector of the second word in the second text relative to the segmentation symbol;
Vector addition processing is carried out according to the first word position vector, the first word vector and preset embedded information of the first text, so that a first word segmentation embedded vector of the first word is obtained; vector addition processing is carried out according to the second word position vector, the second word vector and preset embedded information of the second text, so that a second word embedding vector of the second word is obtained;
generating a first word segmentation embedding vector based on the first word segmentation; and generating the second embedded vector based on a second word-segmentation embedded vector of the second word segment.
In an embodiment, the inputting the target embedded vector and the filtering vector into the second network structure to perform vector processing, to obtain a target loss function value of the training sample, includes:
inputting the target embedded vector into the second network structure to perform vector processing, so as to obtain a first prediction result of the second network structure for predicting the paraphrasing similarity of the two text sections; inputting the filtering vector into the second network structure for vector processing to obtain a second prediction result of the second network structure for predicting the paraphrasing similarity of the two text sections;
Respectively calculating the first prediction result and the second prediction result by adopting a preset cross entropy loss function to obtain an original loss function and a filtering loss function;
and calculating the target loss function value according to the original loss function and the filtering loss function.
In an embodiment, the calculating the objective loss function value from the original loss function and the filter loss function comprises:
calculating a corrected original loss function based on a preset first weight value corresponding to the original loss function; and calculating a corrected filter loss function based on a preset second weight value corresponding to the filter loss function;
and taking the sum of the corrected original loss function and the corrected filter loss function as the target loss function value.
In an embodiment, the performing the back propagation training on the pre-training model based on the objective loss function value to obtain an objective training model includes:
and based on the objective loss function value, sequentially carrying out iterative updating on the model parameters in the second network structure and the high-pass filter layer to obtain the objective training model, wherein the objective training model comprises the first network structure, the updated second network structure and the updated high-pass filter layer.
In an embodiment, after the target training model is generated, if the number of texts to be recognized input to the target training model exceeds two, the method further includes:
inputting the text to be identified into a first network structure and a high-pass filter layer in the target training model in sequence aiming at any text to be identified to obtain a filter vector of the text to be identified;
and respectively calculating cosine similarity of the filtering vectors of any two sections of texts to be recognized based on the filtering vectors of the multi-section texts to be recognized, wherein the cosine similarity is used for representing paraphrasing similarity of any two sections of texts to be recognized.
In a second aspect, embodiments of the present application provide a paraphrasing analytical model training device, including:
the acquisition module is used for acquiring a training sample, wherein the training sample at least comprises two sections of texts;
the first input module is used for inputting the training sample into a first network structure of a pre-training model to obtain a target embedded vector of the training sample; the pre-training model further comprises a high-pass filtering layer and a second network structure;
the second input module is used for inputting the target embedded vector into the high-pass filtering layer to perform information noise filtering processing to obtain a filtering vector;
The third input module is used for respectively inputting the target embedded vector and the filtering vector into the second network structure for vector processing to obtain a target loss function value of the training sample;
and the training module is used for carrying out back propagation training on the pre-training model based on the target loss function value to obtain a target training model, wherein the target training model is used for processing two arbitrarily input texts to be recognized and outputting the paraphrase similarity of the two texts to be recognized.
In a third aspect, an embodiment of the present application provides a terminal device, including a memory, a processor, and a computer program stored in the memory and executable on the processor, where the processor implements the method according to any one of the first aspects when the processor executes the computer program.
In a fourth aspect, embodiments of the present application provide a computer readable storage medium storing a computer program which, when executed by a processor, implements a method as in any one of the first aspects above.
In a fifth aspect, embodiments of the present application provide a computer program product for, when run on a terminal device, causing the terminal device to perform the method of any one of the first aspects.
Compared with the prior art, the embodiment of the application has the beneficial effects that: by adopting the existing first network structure capable of carrying out vector processing on the text, after the training sample is subjected to vector processing, a target embedded vector which can contain paraphrase information between two sections of text can be preliminarily obtained, and the time for redesigning the first network structure for carrying out vector processing on the text in the training model is reduced. And then, performing high-pass filtering processing on the target embedded vector to reduce the interference of information noise in the target embedded vector to the model. And then, performing model processing based on the target embedded vector and a more accurate filtering vector to obtain a target loss function value. Therefore, the pre-training model can not only maximally retain the characteristic information between the original two text sections in the vector processing process, but also realize the enhancement of key information in the two text sections based on the filtering vector. And finally, carrying out fine adjustment on various learning parameters and weight parameters in the pre-training model according to the target loss function value to obtain a target training model so as to improve the prediction accuracy of the target training model.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present application, the following description will briefly introduce the drawings that are needed in the embodiments or the description of the prior art, it is obvious that the drawings in the following description are only some embodiments of the present application, and that other drawings may be obtained according to these drawings without inventive effort for a person skilled in the art.
FIG. 1 is a flow chart of an implementation of a paraphrasing analytical model training method according to an embodiment of the present application;
FIG. 2 is a schematic diagram of an implementation of S102 of a paraphrase analysis model training method according to an embodiment of the present application;
FIG. 3 is a schematic diagram of an implementation of S1023 of a paraphrase analysis model training method according to an embodiment of the present application;
FIG. 4 is a schematic diagram of an implementation of S104 of a paraphrasing analytical model training method according to an embodiment of the present application;
FIG. 5 is a schematic diagram of an implementation of S1043 of a paraphrasing analytical model training method according to an embodiment of the present application;
FIG. 6 is a flowchart of an implementation of a paraphrasing analytical model training method provided in another embodiment of the present application;
FIG. 7 is a schematic diagram of a paraphrasing analysis model training device according to an embodiment of the present disclosure;
fig. 8 is a schematic structural diagram of a terminal device provided in an embodiment of the present application.
Detailed Description
In the following description, for purposes of explanation and not limitation, specific details are set forth, such as particular system configurations, techniques, etc. in order to provide a thorough understanding of the embodiments of the present application. It will be apparent, however, to one skilled in the art that the present application may be practiced in other embodiments that depart from these specific details. In other instances, detailed descriptions of well-known systems, devices, circuits, and methods are omitted so as not to obscure the description of the present application with unnecessary detail.
It should be understood that the terms "comprises" and/or "comprising," when used in this specification and the appended claims, specify the presence of stated features, integers, steps, operations, elements, and/or components, but do not preclude the presence or addition of one or more other features, integers, steps, operations, elements, components, and/or groups thereof.
In addition, in the description of the present application and the appended claims, the terms "first," "second," "third," and the like are used merely to distinguish between descriptions and are not to be construed as indicating or implying relative importance.
The paraphrase analysis model training method provided by the embodiments of the application can be applied to terminal devices such as tablet computers, notebook computers, ultra-mobile personal computers (UMPCs) and netbooks; the specific type of terminal device is not limited.
Referring to fig. 1, fig. 1 shows a flowchart of an implementation of a paraphrasing analysis model training method according to an embodiment of the present application, where the method includes the following steps:
s101, acquiring a training sample, wherein the training sample at least comprises two sections of texts.
In one embodiment, the training samples may be obtained from existing practical applications in different industries and/or research fields. For example, corresponding sample data sets are pre-collected for the practical applications of different industries and/or research fields, and each sample data set contains corresponding training samples. In this embodiment, the sample data set is specifically a data set for paraphrase similarity analysis training, including but not limited to the MNLI, SNLI, and SICK data sets.
In one embodiment, when the training samples are samples for paraphrase similarity analysis, each training sample needs to include two pieces of text, and include an actual result (similar result or dissimilar result) between the two pieces of text. Based on the above, after the training model receives the training sample, it can be subjected to model processing, and a prediction result is output. And then, calculating a training loss value based on the predicted result and the actual result so as to update the training model.
In one embodiment, the language of the two text segments includes, but is not limited to, Chinese, English, and the like. In addition, when the languages of the two text segments are not identical, one text segment can be translated using existing machine translation technology so that the two text segments are in the same language.
S102, inputting the training sample into a first network structure of a pre-training model to obtain a target embedded vector of the training sample; the pre-training model further includes a high pass filter layer and a second network structure.
In an embodiment, the pre-training model may be an existing model or a newly designed network model, which is not limited here. In this embodiment, the pre-training model is specifically a BERT model built on the Transformer architecture, which is commonly used for text paraphrase analysis tasks, so that the training time of the pre-training model can be reduced.
In one embodiment, the target embedded vector is a feature vector representing a paraphrasing relationship between two pieces of text. That is, after the training samples are input to the first network structure and vector processed, the resulting target embedded vector may be used to represent paraphrasing relationships between two pieces of text. Then, the second network structure and the high-pass filtering layer can process the target embedded vector and output a prediction result of the paraphrasing similarity between the two text sections. Finally, the second network structure may further calculate a corresponding loss function value based on the prediction result and an actual result of the training sample, and iteratively update the pre-training model.
In an embodiment, the first network structure is the structure that processes the training sample to obtain the target embedded vector; it may be the network structure in an existing BERT model that processes text into text feature vectors, which is not described in detail here. The second network structure may specifically include a Dropout layer, a Linear layer, a Softmax layer, and a loss function calculation layer. The Dropout layer mitigates model over-fitting during training. The Linear layer helps the model converge gradually during training. The Softmax layer applies its activation function to the input feature vector and outputs a prediction result (paraphrase similarity) between the two text segments in the training sample. The loss function calculation layer then calculates a loss function value based on the prediction result and the actual result of the training sample.
The actual result of a training sample typically takes the value 1 or 0 and participates in the calculation as a parameter. The loss function in the loss function calculation layer may specifically be a cross-entropy loss function, and the softmax function is specifically:
f(x_i) = \frac{e^{x_i}}{\sum_{j} e^{x_j}}
where x_i is the initial probability (score) with which the pre-training model predicts that the training sample belongs to the i-th class of prediction result, and f(x_i) is the final probability, obtained by applying the activation function to that initial score, that the training sample belongs to the i-th class. In binary classification, i takes only two classes; in multi-class prediction, the number of classes of i is not limited. The final probability output by f(x_i) can also be regarded as the paraphrase similarity between the two text segments output by the pre-training model. It should be added that, after the initial score x_i of the i-th class is obtained, the activation function f(x_i) can be further adjusted to improve the expressive power that the second network structure contributes to the whole pre-training model.
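As a purely illustrative aid (not part of the original disclosure), a minimal sketch of the second network structure described above, assuming a PyTorch implementation with illustrative layer sizes and dropout rate, could look as follows:

```python
# Minimal, assumed sketch of the second network structure:
# Dropout -> Linear -> Softmax, followed by a cross-entropy loss.
# hidden_size, num_classes and the dropout probability are illustrative values.
import torch
import torch.nn as nn

class SecondNetworkStructure(nn.Module):
    def __init__(self, hidden_size=768, num_classes=2, dropout=0.1):
        super().__init__()
        self.dropout = nn.Dropout(dropout)      # mitigates over-fitting
        self.linear = nn.Linear(hidden_size, num_classes)
        self.softmax = nn.Softmax(dim=-1)       # f(x_i) above

    def forward(self, vector):
        logits = self.linear(self.dropout(vector))
        return self.softmax(logits)             # predicted paraphrase similarity

def cross_entropy_loss(probs, label):
    # label: 1 (similar) or 0 (dissimilar), the actual result of the training sample
    return nn.functional.nll_loss(torch.log(probs), label)
```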
In an embodiment, the high-pass filtering layer may be used to perform an information noise filtering process on the input target embedded vector, so as to remove the low-frequency component in the target embedded vector and retain the high-frequency component therein.
S103, inputting the target embedded vector into the high-pass filter layer to perform information noise filtering processing, and obtaining a filter vector.
In an embodiment, the above-mentioned S102 has already explained the high-pass filtering layer, which will not be explained. It should be added that the high frequency component generally includes more accurate vector information in the target embedded vector, so that the obtained filtered vector can better represent the paraphrase information included between two text segments.
In an embodiment, the high-pass filter layer includes a high-pass filter, which passes high-frequency signals normally while blocking and attenuating low-frequency signals below a set threshold; the degree of blocking and attenuation varies with the frequency and the filtering procedure. In a specific application, when the target embedded vector is input to the high-pass filter layer, the layer converts the target embedded vector into a frequency-domain signal, which is formed by the superposition of components at a plurality of different frequencies. The conversion of the vector into a signal may be implemented using an existing vector-to-signal conversion technique, which is not described in detail here. The high-frequency signal obtained after processing by the high-pass filter is then converted back into a vector using the same technique for subsequent model processing.
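For illustration only, one possible (assumed) realization of such a high-pass filter layer converts the embedding to the frequency domain with an FFT, suppresses the low-frequency components below a cutoff, and converts the result back; the cutoff index is an arbitrary illustrative choice:

```python
# Assumed sketch of information-noise filtering with a high-pass filter:
# vector -> frequency-domain signal -> drop low frequencies -> vector.
import torch

def high_pass_filter(embedding: torch.Tensor, cutoff: int = 4) -> torch.Tensor:
    spectrum = torch.fft.rfft(embedding, dim=-1)   # convert the vector to a frequency signal
    spectrum[..., :cutoff] = 0                     # block/attenuate low-frequency components
    return torch.fft.irfft(spectrum, n=embedding.shape[-1], dim=-1)  # back to a vector
```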
S104, respectively inputting the target embedded vector and the filter vector into the second network structure for vector processing to obtain a target loss function value of the training sample.
In an embodiment, the second network structure is explained above, and will not be explained. It should be added that the target embedded vector and the filtering vector can be simultaneously input into the second network structure to perform model processing, so as to obtain a prediction result between two text segments output by the pre-training model. And then, the pre-training model calculates based on the prediction result and the actual result of the training sample to obtain the objective loss function value. Or the terminal device may input the target embedded vector and the filtering vector into the second network structure, respectively, and may correspondingly obtain two prediction results output by the pre-training model at this time. And then, the second network structure can calculate each predicted result and the actual result respectively, and two loss function values are correspondingly obtained. Finally, the second network structure may sum the two loss function values to obtain a target loss function value, which is not limited.
It should be noted that, after the information noise filtering process is performed on the target embedded vector, although the paraphrase information between the two text segments can be accurately maintained, in practical situations, the information noise filtering process will inevitably lose part of the feature information between the two text segments. Based on this, in this embodiment, the filtered filtering vector and the target embedded vector are input into the second network structure for processing, so that when feature information of the text is reserved to the maximum extent, enhancement of key information in two pieces of text can be achieved based on the filtering vector.
S105, based on the target loss function value, carrying out back propagation training on the pre-training model to obtain a target training model, wherein the target training model is used for processing two arbitrarily input texts to be recognized and outputting paraphrasing similarity of the two texts to be recognized.
In an embodiment, after the objective loss function value is obtained, the pre-training model may perform back propagation training based on the value, so as to update learning parameters and weight parameters of each network layer in the pre-training model, to obtain the objective training model. In addition, the target training model obtained based on the method is specifically a two-class model, which can be used for carrying out model processing on two input texts to be identified and outputting the paraphrasing similarity of the two texts to be identified.
It should be added that, when performing paraphrase similarity prediction, the processing procedure of the target training model on the two sections of texts to be identified may specifically refer to the steps of S102 and S103 to obtain the target embedded vectors and the filtering vectors of the two sections of texts to be identified. Then, when the target training model inputs the target embedded vector and the filter vector into the second network structure, the current target training model does not need to perform back propagation training, that is, does not need to perform loss function value calculation. Therefore, the target training model only needs to input the target embedded vector and the filtering vector into the Dropout layer, the Linear layer and the Softmax layer in the second network structure for processing, and a prediction result can be obtained.
In this embodiment, by adopting the existing first network structure capable of performing vector processing on the text, after performing vector processing on the training sample, a target embedded vector which can include paraphrase information between two pieces of text can be obtained preliminarily, so that the time for redesigning the first network structure for performing vector processing on the text in the training model is reduced. And then, performing high-pass filtering processing on the target embedded vector to reduce the interference of information noise in the target embedded vector to the model. And then, performing model processing based on the target embedded vector and a more accurate filtering vector to obtain a target loss function value. Therefore, the pre-training model can not only maximally retain the characteristic information between the original two text sections in the vector processing process, but also realize the enhancement of key information in the two text sections based on the filtering vector. And finally, carrying out fine adjustment on various learning parameters and weight parameters in the pre-training model according to the target loss function value to obtain a target training model so as to improve the prediction accuracy of the target training model.
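To tie steps S101 to S105 together, the following hedged sketch shows one possible training step; the component names, the optimizer and the loss weights w1 and w2 are assumptions introduced here for illustration and anticipate the weighted loss discussed later:

```python
# Illustrative training step over a batch of training samples (S102-S105).
# first_network, high_pass_filter and second_network stand for the hypothetical
# components sketched in this description; w1/w2 are assumed loss weights.
import torch

def training_step(sample, label, first_network, high_pass_filter,
                  second_network, optimizer, w1=0.4, w2=0.6):
    target_embedding = first_network(sample)          # S102: target embedded vector
    filtered = high_pass_filter(target_embedding)     # S103: filter vector

    pred_original = second_network(target_embedding)  # S104: first prediction result
    pred_filtered = second_network(filtered)          #        second prediction result

    ce = torch.nn.functional.nll_loss
    loss = (w1 * ce(torch.log(pred_original), label)     # original loss function
            + w2 * ce(torch.log(pred_filtered), label))  # filtering loss function

    optimizer.zero_grad()
    loss.backward()                                   # S105: back propagation training
    optimizer.step()
    return loss.item()
```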
Referring to fig. 2, in one embodiment, the training samples are input into the first network structure of the pre-training model at S102 to obtain the target embedded vector of the training samples, which specifically includes the following substeps S1021-1024, which are described in detail below:
S1021, identifying a starting symbol and a segmentation symbol in the training sample.
S1022, determining text content between the start symbol and the segmentation symbol as a first text, and determining text content following the segmentation symbol as a second text.
In an embodiment, the start symbol and the split symbol may be set by the user according to the actual situation, and include but are not limited to letters, numbers, etc., and the present embodiment is not limited to the representation of the start symbol and the split symbol.
In this embodiment, since step S102 above has already described that the training samples are vector-processed using the existing BERT model, the start symbol CLS and the segmentation symbol SEP used by the BERT model to divide the training sample can be used for identification. The terminal device may then determine the text between the CLS symbol and the SEP symbol as the first text, and the text following the SEP symbol as the second text.
S1023, inputting the first text and the second text into the first network structure to obtain a first embedded vector of the first text and a second embedded vector of the second text.
In an embodiment, the first embedded vector may be a vector obtained by processing each word segment in the first text. For example, for any first word in the first text, a first word vector of the first word may be obtained; and determining a first word position vector of the first segmentation word in the first text based on the start symbol. And then, carrying out comprehensive processing based on the first word vector, the first word position vector and a preset embedding vector of the first text to obtain a first word segmentation embedding vector of the first word. And then, adding the three vectors to obtain a first word segmentation embedded vector capable of representing the first word segmentation. And finally, executing the steps on each first word to obtain a first word segmentation embedding vector of each first word in the first text. Based on this, the first embedding vector may be considered to be composed of first word segment embedding vectors of the respective first word segments in the first text.
It will be appreciated that the process of obtaining the second embedded vector of the second text is similar to the process of obtaining the first embedded vector described above, and reference is made to the above description.
S1024, calculating the mean value of the first embedded vector and the second embedded vector, and taking the mean value as a target embedded vector of the training sample.
In one embodiment, the first embedded vector and the second embedded vector can each be represented in numerical form. Accordingly, after the first embedded vector and the second embedded vector are obtained, their element-wise mean may be used as the target embedded vector of the training sample, although this is not limiting. Since the target embedded vector is computed from the first embedded vector and the second embedded vector, it can be considered to contain the vector information of both.
Referring to fig. 3, in an embodiment, the first text includes at least one first word segment, the second text includes at least one second word segment, the first embedded vector is composed of word segment embedded vectors corresponding to the first word segment, and the second embedded vector is composed of word segment embedded vectors corresponding to the second word segment; inputting the first text and the second text into the first network structure in S1023 to obtain a first embedded vector of the first text and a second embedded vector of the second text, which specifically includes the following substeps S10231-10234, which are described in detail below:
S10231, determining a first word vector of any first word of the first text; and determining a second word vector of the second word segment for any second word segment of the second text.
S10232, determining a first word position vector of the first word relative to the initial symbol in the first text; and determining a second word position vector of the second word in the second text relative to the segmentation symbol.
In an embodiment, each of the first text and the second text includes at least one word, and the first text includes a plurality of first words for explanation. The terminal device may perform text segmentation on the first text based on a preset word vector library to obtain a plurality of first segmentation words. The word vector library comprises a plurality of word segments, and each word segment corresponds to a unique word vector. Based on the above, the terminal device may first use the entire first text as a word segment, and compare the word vectors in the word vector library. If the corresponding word segmentation does not exist, the first character or the last character is reduced, and the rest text is used as a word segmentation to be compared in a word vector library until each character in the first text is matched with the corresponding word segmentation and the word vector. At this time, the corresponding word is the first word.
In one embodiment, after text word segmentation is performed on the first text, a word order of the first word segment in a plurality of first word segments included in the first text may be determined based on the word segmentation result. Then, the word order is used as a word position vector of the first word segmentation.
S10233, vector addition processing is carried out according to the first word position vector, the first word vector and preset embedded information of the first text, and a first word segmentation embedded vector of the first word is obtained; and vector addition processing is carried out according to the second word position vector, the second word vector and preset embedded information of the second text, so as to obtain a second word segmentation embedded vector of the second word.
S10234, generating a first embedding vector based on a first word segmentation embedding vector of the first word segmentation; and generating the second embedded vector based on a second word-segmentation embedded vector of the second word segment.
In an embodiment, the preset embedding vector of the first text is used to distinguish the text to which the first word segment belongs, and may be set by the user according to the actual situation. It should be noted that, for the plurality of first word segments in the first text, the preset embedding vector corresponding to each first word segment is the same.
It should be noted that, based on the descriptions of S10231-S10233, the first word vector, the first word position vector, and the preset embedding vector can all be expressed in numerical form. Thus, for each first word segment, the sum of these three vectors can be used as the first word-segment embedding vector representing that word segment. Finally, once the first word-segment embedding vectors corresponding to all of the first word segments in the first text have been determined, the first embedded vector of the first text is obtained.
In an embodiment, the process of obtaining the second embedded vector of the second text is similar to the process of obtaining the first embedded vector, and reference is made to the above description.
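As a hedged sketch of the embedding composition in S10231 to S10234, assuming illustrative vocabulary size and dimensions, the three components can be summed per word segment, and the target embedded vector can then be taken as a mean (one possible reading of S1024):

```python
# Assumed sketch: word-segment embedding = word vector + word position vector
# + preset (segment) embedding; vocabulary size and dimensions are illustrative.
import torch
import torch.nn as nn

class TextEmbedder(nn.Module):
    def __init__(self, vocab_size=30000, max_len=512, dim=768):
        super().__init__()
        self.word_emb = nn.Embedding(vocab_size, dim)  # word vector
        self.pos_emb = nn.Embedding(max_len, dim)      # word position vector
        self.seg_emb = nn.Embedding(2, dim)            # preset embedding of the text

    def forward(self, token_ids, segment_id):
        positions = torch.arange(token_ids.shape[-1], device=token_ids.device)
        segments = torch.full_like(token_ids, segment_id)
        # vector addition of the three components for every word segment
        return self.word_emb(token_ids) + self.pos_emb(positions) + self.seg_emb(segments)

def target_embedded_vector(first_emb: torch.Tensor, second_emb: torch.Tensor) -> torch.Tensor:
    # One possible reading of S1024: average the two text-level embeddings
    return (first_emb.mean(dim=0) + second_emb.mean(dim=0)) / 2
```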
Referring to fig. 4, in an embodiment, in S104, the step of inputting the target embedded vector and the filtering vector into the second network structure to perform vector processing, and obtaining the target loss function value of the training sample specifically includes the following substeps S1041-1043, which are described in detail below:
s1041, inputting the target embedded vector into the second network structure for vector processing, and obtaining a first prediction result of the second network structure for predicting the paraphrasing similarity of the two text sections; and inputting the filtering vector into the second network structure to perform vector processing, so as to obtain a second prediction result of the second network structure for predicting the paraphrasing similarity of the two text sections.
In an embodiment, the second network structure is already explained in the step S102, which will not be explained. It should be noted that, for the target embedded vector and the filtering vector, the second network structure processes the two vectors respectively to obtain the corresponding first prediction result and the second prediction result respectively.
It will be appreciated that the filtered vector is based on processing the target embedded vector. Thus, the second predicted result may also be considered to be closer to the actual result of the training sample than the first predicted result.
S1042, calculating the first prediction result and the second prediction result by adopting a preset cross entropy loss function to obtain an original loss function and a filtering loss function.
S1043, calculating the objective loss function value according to the original loss function and the filtering loss function.
In one embodiment, cross entropy is used to evaluate the difference between the probability distribution predicted by the current pre-training model (the prediction result) and the true distribution (the actual result). Reducing the cross-entropy loss improves the prediction accuracy of the pre-training model. It should be added that when Sigmoid or Softmax is used as the activation function in the second network structure, computing the loss with cross entropy, rather than with other loss functions such as the squared loss, avoids the slow iterative updates of the pre-training model that the squared loss causes.
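For reference, with y_i denoting the one-hot actual result and f(x_i) the softmax output defined earlier, the cross-entropy loss has the standard form:

\mathcal{L}_{CE} = -\sum_{i} y_i \log f(x_i)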
In an embodiment, the calculating the objective loss function value according to the original loss function and the filtering loss function may specifically be: and calculating the sum of the original loss function and the filtering loss function to obtain a target loss function. However, in another embodiment, referring to fig. 5, the above-mentioned calculation of the objective loss function can also be performed by the following sub-steps S10431-S10432, which are described in detail below:
s10431, calculating a corrected original loss function based on a preset first weight value corresponding to the original loss function; and calculating the corrected filter loss function based on a preset second weight value corresponding to the filter loss function.
S10432, taking the sum of the modified original loss function and the modified filter loss function as the objective loss function value.
In one embodiment, as described under S105 above, the filtering vector can be used to enhance the key information in the two text segments. Therefore, for the two loss functions, the preset second weight value may be set larger than the preset first weight value. In this way, the target loss function calculated with the preset first weight value and the preset second weight value can better complete the training of the pre-training model, so as to obtain a target training model with high prediction accuracy.
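Written out with symbols introduced here purely for illustration (\mathcal{L}_{orig} for the original loss, \mathcal{L}_{filt} for the filtering loss, and \alpha_1, \alpha_2 for the preset first and second weight values), the target loss function value is:

\mathcal{L}_{target} = \alpha_1 \,\mathcal{L}_{orig} + \alpha_2 \,\mathcal{L}_{filt}, \quad \text{with } \alpha_2 > \alpha_1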
In one embodiment, after the objective loss function value is obtained, all the weight parameters and learning parameters of the pre-training model are typically updated. However, in the present embodiment, since the first network structure is a network structure that performs vector processing on text in the existing Bert model, the first network structure can be considered to be a mature network structure. Based on the method, when the model parameters of the pre-training model are iteratively updated, only the model parameters in the second network structure and the high-pass filter layer can be iteratively updated, so that the trained target training model is ensured to have a certain prediction accuracy, and the training time of the target training model can be reduced.
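A minimal sketch of this partial update, assuming the three components are PyTorch modules and that the high-pass filter layer carries learnable parameters, might look like this (the learning rate is an assumed value):

```python
# Assumed sketch: freeze the mature first network structure and update only the
# high-pass filter layer and the second network structure during back propagation.
import torch

def build_optimizer(first_network, high_pass_layer, second_network, lr=2e-5):
    for p in first_network.parameters():
        p.requires_grad = False                 # first network structure stays fixed
    trainable = list(high_pass_layer.parameters()) + list(second_network.parameters())
    return torch.optim.Adam(trainable, lr=lr)   # only these parameters are iteratively updated
```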
Referring to fig. 6, in an embodiment, after the target training model is generated, if the number of texts to be recognized input to the target training model exceeds two, the method further includes the following steps S11-S12, which are described in detail below:
s11, inputting the text to be identified into a first network structure and a high-pass filtering layer in the target training model in sequence aiming at any text to be identified, and obtaining a filtering vector of the text to be identified.
S12, based on the filtering vectors of the multi-section text to be recognized, respectively calculating cosine similarity of the filtering vectors of any two sections of the text to be recognized, wherein the cosine similarity is used for representing paraphrasing similarity of any two sections of the text to be recognized.
In an embodiment, cosine similarity uses the cosine of the angle between two vectors in the vector space as a measure of the difference between two individuals (two texts to be recognized); the closer the value is to 1, the more similar the two vectors are considered to be, that is, the more similar the two texts to be recognized.
In an embodiment, the target training model obtained by the above method is suited to identifying the paraphrase similarity of two text segments. However, if the target training model needs to handle more than two texts to be recognized, the following processing is performed on each text segment.
Specifically, for any text to be recognized, after the text is input into the first network structure of the target training model, the target embedded vector of that text can be obtained. The target embedded vector is then input into the high-pass filter layer to obtain the filter vector of the text to be recognized. The filter vector retains the more accurate vector information contained in the target embedded vector. The target training model can therefore calculate the cosine similarity between the filter vectors of any two texts to be recognized as their pairwise paraphrase similarity. In this way, the target training model can not only analyze the paraphrase similarity between two texts to be recognized, but also predict the paraphrase similarity among multiple texts.
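As an illustrative sketch (names are assumptions), the pairwise paraphrase similarity of several texts to be recognized can be computed from their filter vectors as follows:

```python
# Assumed sketch: pairwise cosine similarity between the filter vectors of
# multiple texts to be recognized; entry (i, j) is the paraphrase similarity
# between text i and text j.
import torch
import torch.nn.functional as F

def pairwise_paraphrase_similarity(filter_vectors: torch.Tensor) -> torch.Tensor:
    # filter_vectors: (num_texts, dim)
    normalized = F.normalize(filter_vectors, dim=-1)
    return normalized @ normalized.T
```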
Referring to fig. 7, fig. 7 is a block diagram of a paraphrasing analysis model training device according to an embodiment of the present application. The paraphrasing analysis model training device in this embodiment includes modules for executing the steps in the embodiments corresponding to fig. 1 to 6. Please refer to fig. 1 to 6 and the related descriptions in the embodiments corresponding to fig. 1 to 6. For convenience of explanation, only the portions related to the present embodiment are shown. Referring to fig. 7, the paraphrasing analytical model training apparatus 700 includes: an acquisition module 710, a first input module 720, a second input module 730, a third input module 740, and a training module 750, wherein:
an obtaining module 710 is configured to obtain a training sample, where the training sample includes at least two pieces of text.
A first input module 720, configured to input the training sample into a first network structure of a pre-training model, to obtain a target embedded vector of the training sample; the pre-training model further includes a high pass filter layer and a second network structure.
And a second input module 730, configured to input the target embedded vector to the high-pass filtering layer for performing information noise filtering processing, so as to obtain a filtered vector.
And a third input module 740, configured to input the target embedded vector and the filtering vector into the second network structure respectively for vector processing, so as to obtain a target loss function value of the training sample.
And the training module 750 is configured to perform back propagation training on the pre-training model based on the objective loss function value to obtain a target training model, where the target training model is used to process two arbitrarily input texts to be identified, and output paraphrase similarity of the two texts to be identified.
In an embodiment, the first input module 720 is further configured to:
identifying a starting symbol and a segmentation symbol in the training sample; determining text content between the start symbol and the segmentation symbol as a first text, and determining text content following the segmentation symbol as a second text; inputting the first text and the second text into the first network structure to obtain a first embedded vector of the first text and a second embedded vector of the second text; and calculating the average value of the first embedded vector and the second embedded vector, and taking the average value as a target embedded vector of the training sample.
In an embodiment, the first text includes at least one first word segment, the second text includes at least one second word segment, the first embedded vector is composed of word segment embedded vectors corresponding to the first word segment, and the second embedded vector is composed of word segment embedded vectors corresponding to the second word segment; the first input module 720 is further configured to:
Determining a first word vector of a first word segment for any first word segment of the first text; and determining a second word vector of the second word segment for any second word segment of the second text; determining a first word position vector of the first word segment in the first text relative to the start symbol; and determining a second word position vector of the second word in the second text relative to the segmentation symbol; vector addition processing is carried out according to the first word position vector, the first word vector and preset embedded information of the first text, so that a first word segmentation embedded vector of the first word is obtained; vector addition processing is carried out according to the second word position vector, the second word vector and preset embedded information of the second text, so that a second word embedding vector of the second word is obtained; generating a first word segmentation embedding vector based on the first word segmentation; and generating the second embedded vector based on a second word-segmentation embedded vector of the second word segment.
In an embodiment, the third input module 740 is further configured to:
inputting the target embedded vector into the second network structure to perform vector processing, so as to obtain a first prediction result of the second network structure for predicting the paraphrasing similarity of the two text sections; inputting the filtering vector into the second network structure for vector processing to obtain a second prediction result of the second network structure for predicting the paraphrasing similarity of the two text sections; respectively calculating the first prediction result and the second prediction result by adopting a preset cross entropy loss function to obtain an original loss function and a filtering loss function; and calculating the target loss function value according to the original loss function and the filtering loss function.
In an embodiment, the third input module 740 is further configured to:
calculating a corrected original loss function based on a preset first weight value corresponding to the original loss function; and calculating a corrected filter loss function based on a preset second weight value corresponding to the filter loss function; and taking the sum of the corrected original loss function and the corrected filter loss function as the target loss function value.
In one embodiment, training module 750 is further to:
and based on the objective loss function value, sequentially carrying out iterative updating on the model parameters in the second network structure and the high-pass filter layer to obtain the objective training model, wherein the objective training model comprises the first network structure, the updated second network structure and the updated high-pass filter layer.
In one embodiment, paraphrasing analytical model training apparatus 700 further comprises:
and the fourth input module is used for inputting the text to be identified into the first network structure and the high-pass filter layer in the target training model in sequence aiming at any text to be identified, so as to obtain the filter vector of the text to be identified.
The computing module is used for respectively computing cosine similarity of the filtering vectors of any two sections of texts to be recognized based on the filtering vectors of the multi-section texts to be recognized, wherein the cosine similarity is used for representing paraphrasing similarity of any two sections of texts to be recognized.
It should be understood that, in the block diagram of the paraphrasing analysis model training apparatus shown in fig. 7, each module is configured to perform each step in the embodiments corresponding to fig. 1 to 6, and each step in the embodiments corresponding to fig. 1 to 6 has been explained in detail in the foregoing embodiments, and specific reference is made to fig. 1 to 6 and related descriptions in the embodiments corresponding to fig. 1 to 6, which are not repeated herein.
Fig. 8 is a block diagram of a terminal device according to another embodiment of the present application. As shown in fig. 8, the terminal device 800 of this embodiment includes: a processor 810, a memory 820, and a computer program 830 stored in the memory 820 and executable on the processor 810, such as a program for a paraphrasing analytical model training method. The processor 810, when executing the computer program 830, implements the steps of the various embodiments of the paraphrase analysis model training method described above, such as S101 through S105 shown in fig. 1. Alternatively, the processor 810 may execute the computer program 830 to implement the functions of the modules in the embodiment corresponding to fig. 7, for example, the functions of the units 710 to 750 shown in fig. 7, and refer to the related description in the embodiment corresponding to fig. 7.
By way of example, the computer program 830 may be partitioned into one or more modules, one or more modules stored in the memory 820 and executed by the processor 810 to complete the present application. One or more of the modules may be a series of computer program instruction segments capable of performing particular functions for describing the execution of the computer program 830 in the terminal device 800. For example, the computer program 830 may be divided into an acquisition module, a first input module, a second input module, a third input module, and a training module, each module functioning specifically as above.
Terminal device 800 can include, but is not limited to, a processor 810, a memory 820. It will be appreciated by those skilled in the art that fig. 8 is merely an example of a terminal device 800 and is not intended to limit the terminal device 800, and may include more or fewer components than shown, or may combine certain components, or different components, e.g., the terminal device may further include an input-output device, a network access device, a bus, etc.
The processor 810 may be a central processing unit, or may be other general purpose processors, digital signal processors, application specific integrated circuits, off-the-shelf programmable gate arrays or other programmable logic devices, discrete hardware components, or the like. A general purpose processor may be a microprocessor or any conventional processor or the like.
The memory 820 may be an internal storage unit of the terminal device 800, such as a hard disk or a memory of the terminal device 800. The memory 820 may also be an external storage device of the terminal device 800, such as a plug-in hard disk, a smart memory card, a flash memory card, etc. provided on the terminal device 800. Further, the memory 820 may also include both internal storage units and external storage devices of the terminal device 800.
The above embodiments are only intended to illustrate the technical solutions of the present application, not to limit them. Although the present application has been described in detail with reference to the foregoing embodiments, those of ordinary skill in the art will understand that the technical solutions described in the foregoing embodiments may still be modified, or some of their technical features may be replaced by equivalents; such modifications and replacements do not cause the corresponding technical solutions to depart from the spirit and scope of the technical solutions of the embodiments of the present application, and shall all fall within the protection scope of the present application.

Claims (10)

1. A paraphrasing analysis model training method, comprising:
obtaining a training sample, wherein the training sample at least comprises two sections of texts;
inputting the training sample into a first network structure of a pre-training model to obtain a target embedded vector of the training sample, wherein the pre-training model further comprises a high-pass filtering layer and a second network structure;
inputting the target embedded vector into the high-pass filtering layer for information noise filtering, so as to obtain a filtering vector;
respectively inputting the target embedded vector and the filtering vector into the second network structure to perform vector processing, so as to obtain a target loss function value of the training sample;
and performing back-propagation training on the pre-training model based on the target loss function value to obtain a target training model, wherein the target training model is used for processing any two input texts to be recognized and outputting the paraphrase similarity of the two texts to be recognized.
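For ease of understanding only, one training step of the above method may be sketched as follows, assuming a PyTorch implementation. The claim fixes only the order of the processing stages, not the internals of the first network structure, the high-pass filtering layer or the second network structure, so train_step and target_loss_fn below are hypothetical placeholders.

```python
import torch
import torch.nn as nn

def train_step(first_net: nn.Module,        # first network structure (embedding encoder)
               filter_layer: nn.Module,     # high-pass filtering layer
               second_net: nn.Module,       # second network structure
               target_loss_fn,              # returns the target loss function value
               optimizer: torch.optim.Optimizer,
               sample, labels: torch.Tensor) -> float:
    target_emb = first_net(sample)                # target embedded vector of the training sample
    filter_vec = filter_layer(target_emb)         # filtering vector after information noise filtering
    pred_orig = second_net(target_emb)            # first prediction result
    pred_filt = second_net(filter_vec)            # second prediction result
    loss = target_loss_fn(pred_orig, pred_filt, labels)   # target loss function value
    optimizer.zero_grad()
    loss.backward()                               # back-propagation training
    optimizer.step()
    return loss.item()
```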
2. The paraphrasing analysis model training method of claim 1, wherein the inputting the training sample into the first network structure of the pre-training model to obtain the target embedded vector of the training sample comprises:
identifying a start symbol and a segmentation symbol in the training sample;
determining text content between the start symbol and the segmentation symbol as a first text, and determining text content following the segmentation symbol as a second text;
inputting the first text and the second text into the first network structure to obtain a first embedded vector of the first text and a second embedded vector of the second text;
and calculating the average value of the first embedded vector and the second embedded vector, and taking the average value as the target embedded vector of the training sample.
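As a non-limiting sketch, assume BERT-style markers in which the start symbol is "[CLS]" and the segmentation symbol is "[SEP]"; the splitting and averaging of the two embedded vectors might then look as follows, where the encode helper is a hypothetical stand-in for the first network structure.

```python
def target_embedded_vector(sample_tokens, encode):
    """sample_tokens: token list of one training sample, e.g. ["[CLS]", ..., "[SEP]", ...].
    encode: hypothetical helper that runs a token list through the first network
    structure and returns one embedded vector (e.g. a torch.Tensor)."""
    start = sample_tokens.index("[CLS]")          # start symbol
    sep = sample_tokens.index("[SEP]")            # segmentation symbol
    first_text = sample_tokens[start + 1:sep]     # content between the two symbols
    second_text = sample_tokens[sep + 1:]         # content after the segmentation symbol
    first_emb = encode(first_text)                # first embedded vector
    second_emb = encode(second_text)              # second embedded vector
    return (first_emb + second_emb) / 2           # average value = target embedded vector
```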
3. The paraphrasing analysis model training method of claim 2, wherein the first text comprises at least one first word segment, the second text comprises at least one second word segment, the first embedded vector is composed of word-segmentation embedding vectors corresponding to the first word segments, and the second embedded vector is composed of word-segmentation embedding vectors corresponding to the second word segments;
wherein the inputting the first text and the second text into the first network structure to obtain the first embedded vector of the first text and the second embedded vector of the second text comprises:
for any first word segment of the first text, determining a first word vector of the first word segment; and for any second word segment of the second text, determining a second word vector of the second word segment;
determining a first word position vector of the first word segment in the first text relative to the start symbol; and determining a second word position vector of the second word segment in the second text relative to the segmentation symbol;
performing vector addition on the first word position vector, the first word vector and preset embedded information of the first text to obtain a first word-segmentation embedding vector of the first word segment; and performing vector addition on the second word position vector, the second word vector and preset embedded information of the second text to obtain a second word-segmentation embedding vector of the second word segment;
and generating the first embedded vector based on the first word-segmentation embedding vector of the first word segment, and generating the second embedded vector based on the second word-segmentation embedding vector of the second word segment.
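Purely for illustration, the vector addition of claim 3 can be sketched with three embedding tables as below; treating the preset embedded information as a learned segment embedding is an assumption of this sketch, not a requirement of the claim.

```python
import torch
import torch.nn as nn

class SegmentEmbedder(nn.Module):
    def __init__(self, vocab_size: int, max_len: int, dim: int):
        super().__init__()
        self.word = nn.Embedding(vocab_size, dim)      # word vector of each word segment
        self.position = nn.Embedding(max_len, dim)     # position relative to the start/segmentation symbol
        self.segment = nn.Embedding(2, dim)            # preset embedded information (first text vs second text)

    def forward(self, token_ids: torch.Tensor, segment_id: int) -> torch.Tensor:
        """token_ids: (seq_len,) ids of the word segments of one text;
        segment_id: 0 for the first text, 1 for the second text."""
        positions = torch.arange(token_ids.size(0), device=token_ids.device)
        seg = torch.full_like(token_ids, segment_id)
        # vector addition of the three components gives each word-segmentation embedding vector;
        # the stacked rows form the embedded vector of the whole text
        return self.word(token_ids) + self.position(positions) + self.segment(seg)
```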
4. The paraphrasing analysis model training method according to any one of claims 1 to 3, wherein the inputting the target embedded vector and the filtering vector into the second network structure respectively for vector processing to obtain the target loss function value of the training sample comprises:
inputting the target embedded vector into the second network structure for vector processing to obtain a first prediction result in which the second network structure predicts the paraphrase similarity of the two sections of text; and inputting the filtering vector into the second network structure for vector processing to obtain a second prediction result in which the second network structure predicts the paraphrase similarity of the two sections of text;
applying a preset cross-entropy loss function to the first prediction result and the second prediction result respectively, so as to obtain an original loss function and a filtering loss function;
and calculating the target loss function value according to the original loss function and the filtering loss function.
5. The paraphrasing analysis model training method of claim 4, wherein the calculating the target loss function value according to the original loss function and the filtering loss function comprises:
calculating a corrected original loss function based on a preset first weight value corresponding to the original loss function; and calculating a corrected filtering loss function based on a preset second weight value corresponding to the filtering loss function;
and taking the sum of the corrected original loss function and the corrected filtering loss function as the target loss function value.
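As an illustrative sketch of claims 4 and 5 taken together, the two cross-entropy losses and their weighted combination might be computed as follows; the default weights of 0.5 merely stand in for the preset first and second weight values, and the function could serve as the target_loss_fn placeholder used in the earlier training-step sketch.

```python
import torch
import torch.nn as nn

def target_loss_fn(pred_orig: torch.Tensor, pred_filt: torch.Tensor,
                   labels: torch.Tensor,
                   w_orig: float = 0.5, w_filt: float = 0.5) -> torch.Tensor:
    ce = nn.CrossEntropyLoss()
    original_loss = ce(pred_orig, labels)     # original loss function (from the target embedded vector)
    filtering_loss = ce(pred_filt, labels)    # filtering loss function (from the filtering vector)
    # weighting each loss gives the corrected losses; their sum is the target loss function value
    return w_orig * original_loss + w_filt * filtering_loss
```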
6. The paraphrasing analysis model training method according to any one of claims 1 to 3, wherein the performing back-propagation training on the pre-training model based on the target loss function value to obtain the target training model comprises:
iteratively updating, based on the target loss function value, the model parameters in the second network structure and the high-pass filtering layer in sequence, so as to obtain the target training model, wherein the target training model comprises the first network structure, the updated second network structure and the updated high-pass filtering layer.
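A minimal sketch of this selective update, again assuming PyTorch, is given below: only the parameters of the high-pass filtering layer and the second network structure are handed to the optimizer, so the first network structure is left unchanged during back-propagation. The function name build_optimizer and the learning rate are illustrative only.

```python
import itertools
import torch
import torch.nn as nn

def build_optimizer(first_net: nn.Module, filter_layer: nn.Module,
                    second_net: nn.Module, lr: float = 1e-4) -> torch.optim.Optimizer:
    for p in first_net.parameters():
        p.requires_grad = False                    # first network structure stays fixed
    trainable = itertools.chain(second_net.parameters(), filter_layer.parameters())
    return torch.optim.Adam(trainable, lr=lr)      # each optimizer.step() iteratively updates only these parameters
```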
7. The paraphrasing analysis model training method according to any one of claims 1 to 3, wherein, after the target training model is generated, if the number of texts to be recognized input into the target training model exceeds two, the method further comprises:
for any text to be recognized, inputting the text to be recognized into the first network structure and the high-pass filtering layer of the target training model in sequence to obtain a filtering vector of the text to be recognized;
and computing, based on the filtering vectors of the multiple texts to be recognized, the cosine similarity between the filtering vectors of any two texts to be recognized, wherein the cosine similarity represents the paraphrase similarity of the two texts to be recognized.
8. A paraphrasing analysis model training device, comprising:
the acquisition module is used for acquiring a training sample, wherein the training sample at least comprises two sections of texts;
the first input module is used for inputting the training sample into a first network structure of a pre-training model to obtain a target embedded vector of the training sample; the pre-training model further comprises a high-pass filtering layer and a second network structure;
the second input module is used for inputting the target embedded vector into the high-pass filtering layer to perform information noise filtering processing to obtain a filtering vector;
the third input module is used for respectively inputting the target embedded vector and the filtering vector into the second network structure for vector processing to obtain a target loss function value of the training sample;
and the training module is used for performing back-propagation training on the pre-training model based on the target loss function value to obtain a target training model, wherein the target training model is used for processing any two input texts to be recognized and outputting the paraphrase similarity of the two texts to be recognized.
9. A terminal device comprising a memory, a processor and a computer program stored in the memory and executable on the processor, characterized in that the processor implements the method according to any of claims 1 to 7 when executing the computer program.
10. A computer readable storage medium storing a computer program, characterized in that the computer program when executed by a processor implements the method according to any one of claims 1 to 7.
CN202110642143.2A 2021-06-09 2021-06-09 Paraphrasing analysis model training method and device, terminal equipment and storage medium Active CN113377909B (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN202110642143.2A CN113377909B (en) 2021-06-09 2021-06-09 Paraphrasing analysis model training method and device, terminal equipment and storage medium
PCT/CN2022/071358 WO2022257453A1 (en) 2021-06-09 2022-01-11 Training method and apparatus for semantic analysis model, terminal device, and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110642143.2A CN113377909B (en) 2021-06-09 2021-06-09 Paraphrasing analysis model training method and device, terminal equipment and storage medium

Publications (2)

Publication Number Publication Date
CN113377909A CN113377909A (en) 2021-09-10
CN113377909B true CN113377909B (en) 2023-07-11

Family

ID=77573163

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110642143.2A Active CN113377909B (en) 2021-06-09 2021-06-09 Paraphrasing analysis model training method and device, terminal equipment and storage medium

Country Status (2)

Country Link
CN (1) CN113377909B (en)
WO (1) WO2022257453A1 (en)

Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113377909B (en) * 2021-06-09 2023-07-11 平安科技(深圳)有限公司 Paraphrasing analysis model training method and device, terminal equipment and storage medium
CN114065768B (en) * 2021-12-08 2022-12-09 马上消费金融股份有限公司 Feature fusion model training and text processing method and device
CN117689354B (en) * 2024-02-04 2024-04-19 芯知科技(江苏)有限公司 Intelligent processing method and platform for recruitment information based on cloud service

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109948646A (en) * 2019-01-24 2019-06-28 西安交通大学 A kind of time series data method for measuring similarity and gauging system
CN110895553A (en) * 2018-08-23 2020-03-20 国信优易数据有限公司 Semantic matching model training method, semantic matching method and answer obtaining method
CN112214335A (en) * 2020-10-13 2021-01-12 重庆工业大数据创新中心有限公司 Web service discovery method based on knowledge graph and similarity network
CN112597324A (en) * 2020-12-15 2021-04-02 武汉工程大学 Image hash index construction method, system and equipment based on correlation filtering

Family Cites Families (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109840321B (en) * 2017-11-29 2022-02-01 腾讯科技(深圳)有限公司 Text recommendation method and device and electronic equipment
US11520993B2 (en) * 2019-07-24 2022-12-06 Nec Corporation Word-overlap-based clustering cross-modal retrieval
CN111859960B (en) * 2020-07-27 2023-08-01 中国平安人寿保险股份有限公司 Semantic matching method, device, computer equipment and medium based on knowledge distillation
CN112328786A (en) * 2020-11-03 2021-02-05 平安科技(深圳)有限公司 Text classification method and device based on BERT, computer equipment and storage medium
CN113377909B (en) * 2021-06-09 2023-07-11 平安科技(深圳)有限公司 Paraphrasing analysis model training method and device, terminal equipment and storage medium

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110895553A (en) * 2018-08-23 2020-03-20 国信优易数据有限公司 Semantic matching model training method, semantic matching method and answer obtaining method
CN109948646A (en) * 2019-01-24 2019-06-28 西安交通大学 A kind of time series data method for measuring similarity and gauging system
CN112214335A (en) * 2020-10-13 2021-01-12 重庆工业大数据创新中心有限公司 Web service discovery method based on knowledge graph and similarity network
CN112597324A (en) * 2020-12-15 2021-04-02 武汉工程大学 Image hash index construction method, system and equipment based on correlation filtering

Also Published As

Publication number Publication date
CN113377909A (en) 2021-09-10
WO2022257453A1 (en) 2022-12-15

Similar Documents

Publication Publication Date Title
CN113377909B (en) Paraphrasing analysis model training method and device, terminal equipment and storage medium
CN111860573B (en) Model training method, image category detection method and device and electronic equipment
US20210012153A1 (en) Image processing method and apparatus, electronic device, and storage medium
CN109242106B (en) Sample processing method, device, equipment and storage medium
Duy et al. Computing valid p-value for optimal changepoint by selective inference using dynamic programming
CN109726391B (en) Method, device and terminal for emotion classification of text
CN113434683B (en) Text classification method, device, medium and electronic equipment
CN109726291B (en) Loss function optimization method and device of classification model and sample classification method
CN113011531B (en) Classification model training method, device, terminal equipment and storage medium
CN111223128A (en) Target tracking method, device, equipment and storage medium
CN111680642A (en) Terrain classification method and device
Salazar On Statistical Pattern Recognition in Independent Component Analysis Mixture Modelling
CN112418320A (en) Enterprise association relation identification method and device and storage medium
EP4227855A1 (en) Graph explainable artificial intelligence correlation
CN110929731A (en) Medical image processing method and device based on pathfinder intelligent search algorithm
CN116109907A (en) Target detection method, target detection device, electronic equipment and storage medium
CN113961765B (en) Searching method, searching device, searching equipment and searching medium based on neural network model
CN115905845A (en) Data center anomaly detection method, system, equipment and storage medium
CN115115920A (en) Data training method and device
CN111767710B (en) Indonesia emotion classification method, device, equipment and medium
Moreno et al. Kernel multimodal continuous attention
Tötterström Frequency Domain Image Classification with Convolutional Neural Networks
US20230359824A1 (en) Feature crossing for machine learning
Zhan et al. Applying batch normalization to hybrid NN-HMM model for speech recognition
CN117408302A (en) Model compression method, device, equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
REG Reference to a national code

Ref country code: HK

Ref legal event code: DE

Ref document number: 40055801

Country of ref document: HK

GR01 Patent grant