CN113919424A - Training of text processing model, text processing method, device, equipment and medium - Google Patents
- Publication number
- CN113919424A (application CN202111175460.4A)
- Authority
- CN
- China
- Prior art keywords
- text
- output
- processing model
- model
- text processing
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/24—Classification techniques
- G06F18/241—Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
- G06F18/2415—Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches based on parametric or probabilistic models, e.g. based on likelihood ratio or false acceptance rate versus a false rejection rate
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/30—Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
- G06F16/35—Clustering; Classification
- G06F16/353—Clustering; Classification into predefined classes
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/21—Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
- G06F18/214—Generating training patterns; Bootstrap methods, e.g. bagging or boosting
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Data Mining & Analysis (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Life Sciences & Earth Sciences (AREA)
- Bioinformatics & Computational Biology (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Evolutionary Biology (AREA)
- Evolutionary Computation (AREA)
- Bioinformatics & Cheminformatics (AREA)
- Artificial Intelligence (AREA)
- Probability & Statistics with Applications (AREA)
- Databases & Information Systems (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
- Machine Translation (AREA)
Abstract
The disclosure provides a text processing model training method, a text processing method, a device, equipment, and a medium, and relates to the technical field of data processing, in particular to the fields of artificial intelligence, intelligent search, and deep learning. The specific implementation scheme is as follows: obtaining a training sample, wherein the training sample comprises a short text and a long text extracted from the same document; inputting the short text into a first text processing model to obtain a first output, and inputting the long text into a second text processing model to obtain a second output; and calculating a consistency loss according to the first output and the second output, and adjusting the parameters of the first text processing model and the parameters of the second text processing model, which share parameters. The embodiments of the disclosure can improve the accuracy of the text processing model and improve the processing efficiency of the text processing model.
Description
Technical Field
The present disclosure relates to the field of data processing technologies, in particular to the fields of artificial intelligence, intelligent search, and deep learning technologies, and more particularly to a method, an apparatus, a device, and a medium for training a text processing model and for processing text.
Background
Today, the internet has become the most important source of information in daily life, and processing this information allows target content to be found quickly.
In the field of search, given a query text, a model is usually used to match and score the query text against the titles and content text of candidate web pages, and the web pages with the highest scores are ranked first.
Disclosure of Invention
The disclosure provides a method, a device, equipment and a medium for training a text processing model and processing a text.
According to an aspect of the present disclosure, there is provided a training method of a text processing model, including:
obtaining a training sample, wherein the training sample comprises a short text and a long text extracted from the same document;
inputting the short text into a first text processing model to obtain a first output, and inputting the long text into a second text processing model to obtain a second output;
and calculating a consistency loss according to the first output and the second output, and adjusting the parameters of the first text processing model and the parameters of the second text processing model, which share parameters.
According to an aspect of the present disclosure, there is also provided a text processing method, including:
inputting a text to be processed into a text processing model to obtain a text processing result;
the text processing model is obtained by training according to a training method of the text processing model according to any one embodiment of the present disclosure.
According to an aspect of the present disclosure, there is provided a training apparatus for a text processing model, including:
the training sample acquisition module is used for acquiring a training sample, wherein the training sample comprises a short text and a long text extracted from the same document;
the model output acquisition module is used for inputting the short text into a first text processing model to obtain first output and inputting the long text into a second text processing model to obtain second output;
and the first parameter adjusting module is used for calculating a consistency loss according to the first output and the second output, and adjusting the parameters of the first text processing model and the parameters of the second text processing model, which share parameters.
According to an aspect of the present disclosure, there is also provided a text processing apparatus including:
the text processing result determining module is used for inputting the text to be processed into the text processing model to obtain a text processing result; the text processing model is obtained by training according to a training method of the text processing model according to any one embodiment of the present disclosure.
According to another aspect of the present disclosure, there is provided an electronic device including:
at least one processor; and
a memory communicatively coupled to the at least one processor; wherein,
the memory stores instructions executable by the at least one processor to enable the at least one processor to perform a method of training a text processing model according to any of the embodiments of the disclosure, or to perform a method of text processing according to any of the embodiments of the disclosure.
According to another aspect of the present disclosure, there is provided a non-transitory computer readable storage medium storing computer instructions for causing a computer to perform a training method of a text processing model according to any one of the embodiments of the present disclosure or perform a text processing method according to any one of the embodiments of the present disclosure.
According to another aspect of the present disclosure, a computer program product is provided, comprising a computer program, which when executed by a processor, implements a method of training a text processing model according to any of the embodiments of the present disclosure, or performs a method of text processing according to any of the embodiments of the present disclosure.
The embodiment of the disclosure can improve the accuracy of the text processing model and improve the processing efficiency of the text processing model.
It should be understood that the statements in this section do not necessarily identify key or critical features of the embodiments of the present disclosure, nor do they limit the scope of the present disclosure. Other features of the present disclosure will become apparent from the following description.
Drawings
The drawings are included to provide a better understanding of the present solution and are not to be construed as limiting the present disclosure. Wherein:
FIG. 1 is a schematic diagram of a method for training a text processing model according to an embodiment of the present disclosure;
FIG. 2 is a schematic diagram of a training method of a text processing model according to an embodiment of the present disclosure;
FIG. 3 is a schematic diagram of a training method of a text processing model according to an embodiment of the present disclosure;
FIG. 4 is a scene diagram of a training method of a text processing model according to an embodiment of the present disclosure;
FIG. 5 is a schematic diagram of a training method of a text processing model according to an embodiment of the present disclosure;
FIG. 6 is a scene diagram of a training method of a text processing model according to an embodiment of the present disclosure;
FIG. 7 is a schematic diagram of a text processing method provided in accordance with an embodiment of the present disclosure;
FIG. 8 is a schematic diagram of a training apparatus for a text processing model provided in accordance with an embodiment of the present disclosure;
FIG. 9 is a schematic diagram of a text processing apparatus provided in accordance with an embodiment of the present disclosure;
fig. 10 is a schematic diagram of an electronic device for implementing a text processing model training method or a text processing method according to an embodiment of the present disclosure.
Detailed Description
Exemplary embodiments of the present disclosure are described below with reference to the accompanying drawings, in which various details of the embodiments of the disclosure are included to assist understanding, and which are to be considered as merely exemplary. Accordingly, those of ordinary skill in the art will recognize that various changes and modifications of the embodiments described herein can be made without departing from the scope and spirit of the present disclosure. Also, descriptions of well-known functions and constructions are omitted in the following description for clarity and conciseness.
Fig. 1 is a flowchart of a training method of a text processing model according to an embodiment of the present disclosure, which may be applied to a case of training a model for text processing. The method of this embodiment may be executed by a training apparatus of a text processing model, where the apparatus may be implemented in a software and/or hardware manner, and is specifically configured in an electronic device with a certain data operation capability, where the electronic device may be a client device or a server device, and the client device may be, for example, a mobile phone, a tablet computer, a vehicle-mounted terminal, a desktop computer, and the like.
S101, obtaining a training sample, wherein the training sample comprises a short text and a long text extracted from the same document.
The training samples are used to train the text processing model. The training samples include short text and long text. The text length differs between short and long text. Illustratively, the text length of the short text is 128; the length of the long text is 512, and the unit can be word number, character number or byte number. The text length may refer to a maximum text length, for example, a short text refers to a text having a text length equal to or less than a first text length, and a long text refers to a text having a text length equal to or less than a second text length, wherein the first text length is less than the second text length.
The short text and the long text are texts extracted from the same document. Because both texts come from the same document, the attribute information of the short text and the long text is consistent, and the processing results obtained by processing the short text and the long text separately should also be consistent. Wherein the attribute information may include at least one of: title, content, language, etc. Documents may be retrieved from web pages, and short and long texts of different lengths extracted from them. For example, a first number of sentences may be extracted from the document as the short text and a second number of sentences as the long text, where the first number is less than the second number. The short text and the long text may share at least one of the same words, phrases, sentences, and the like, or may be completely different.
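As one possible way to construct such a sample, consider the minimal sketch below; the sentence splitting, the counts of two and eight sentences, and the 128/512-character truncation are assumptions for illustration, not taken from the disclosure.

```python
import re

def build_sample(document: str, short_sentences: int = 2, long_sentences: int = 8,
                 short_max_len: int = 128, long_max_len: int = 512):
    """Split the document into sentences and take prefixes of different sizes."""
    sentences = re.split(r"(?<=[.!?])\s+", document.strip())
    short_text = " ".join(sentences[:short_sentences])[:short_max_len]   # short text
    long_text = " ".join(sentences[:long_sentences])[:long_max_len]      # long text
    return {"short_text": short_text, "long_text": long_text}
```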
S102, inputting the short text into a first text processing model to obtain a first output, and inputting the long text into a second text processing model to obtain a second output.
The first text processing model is used for processing the short text to obtain a first processing result and determine a first output. The second text processing model is used for processing the long text to obtain a second processing result and determine a second output. The first text processing model and the second text processing model have the same structure and the same parameters. The text processing model may include a feature extraction network, a classification network, and the like; for example, the feature extraction network may be a convolutional neural network and the classification network a support vector machine. The text processing model is used for processing the text, and the obtained processing result may be a text vector and/or a probability. The processing result is the result directly output by the text processing model; it may be output directly, or it may be processed further to obtain the output.
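A minimal sketch of how such a model could be organized is given below. The GRU encoder and linear softmax classifier merely stand in for whatever networks are actually used (the description names a convolutional neural network and a support vector machine as examples), and all sizes are assumptions.

```python
import torch
import torch.nn as nn

class TextProcessingModel(nn.Module):
    def __init__(self, vocab_size: int = 30000, hidden: int = 256, num_classes: int = 3):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, hidden)
        self.encoder = nn.GRU(hidden, hidden, batch_first=True)   # feature extraction network
        self.classifier = nn.Linear(hidden, num_classes)          # classification network

    def forward(self, token_ids):
        # token_ids: (batch, seq_len) integer tensor
        states, _ = self.encoder(self.embed(token_ids))
        text_vector = states.mean(dim=1)                            # text vector
        class_probs = self.classifier(text_vector).softmax(dim=-1)  # classification probability
        return text_vector, class_probs
```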
Illustratively, a text vector is used to represent features of the entered text. The probability is used to classify the text and determine the type of the text, wherein the classification type may include a content type, a format type, a language type, or the like. For example, the content types may include: news type, game type, movie type, etc., and the format type may be a title type, a body type, etc. The language type may be an english type, a chinese type, or a russian type, etc. In addition, the classification type may also be other types, such as, for example, a type of interest to the user and a type of disinterest to the user. This is not particularly limited.
S103, according to the first output and the second output, consistency loss is calculated, and parameters of the first text processing model and parameters of the second text processing model shared by the parameters are adjusted.
Because the short text and the long text are texts extracted from the same document, the first output determined by the first text processing model from the short text and the second output determined by the parameter-shared second text processing model from the long text should be consistent. In practice, however, the first output corresponding to the short text differs from the second output corresponding to the long text, because the features extractable from the short text are less rich than those extractable from the long text. The consistency loss is used to describe the degree of inconsistency between the first output and the second output. The first text processing model and the second text processing model are trained according to the consistency loss so as to reduce it, that is, to reduce the difference between the first output and the second output, so that the first output corresponding to the short text continuously approaches the second output corresponding to the long text. In this way the prediction effect of the first text processing model approaches that of the second text processing model, which improves the prediction accuracy of the first text processing model for short texts.
The loss of consistency may be determined from a difference between the first output and the second output. Illustratively, the first output is a probability and the second output is a probability, and a difference or ratio between the two probabilities may be calculated and determined as a loss of consistency. As another example, the first output and the second output are vectors, and the distance between the two vectors can be calculated and determined as a loss of consistency.
The first text processing model and the second text processing model are trained simultaneously and share parameters. Parameter sharing means that the parameters of the first text processing model and the second text processing model are adjusted to the same values; for example, the parameters of the second text processing model may be shared to the first text processing model. After each round of training, the first text processing model and the second text processing model are identical. The parameter adjustment may minimize the consistency loss and update the parameters by gradient backpropagation. Since the two models are trained simultaneously and are identical after training, either text processing model can be applied, for example to process text input by a user online and obtain a text processing result.
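One hedged way to realize this parameter sharing is simply to reuse a single module instance for both branches, as in the sketch below, which reuses the TextProcessingModel sketch above; the optimizer choice, learning rate, and vector-only consistency term are assumptions for brevity.

```python
import torch
import torch.nn.functional as F

model = TextProcessingModel()                  # one instance, so both branches share parameters
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)

def training_step(short_ids: torch.Tensor, long_ids: torch.Tensor) -> float:
    r1, p1 = model(short_ids)                  # first output, from the short text
    r2, p2 = model(long_ids)                   # second output, from the long text
    loss = F.mse_loss(r1, r2)                  # consistency loss (vector form only here)
    optimizer.zero_grad()
    loss.backward()                            # gradient backpropagation
    optimizer.step()                           # parameter update minimizing the loss
    return loss.item()
```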
In the prior art, in a search application scenario, a user inputs a query text, the query text is matched and scored against the titles and content text of candidate web pages, and the web pages with high scores are ranked first. However, web page content is usually long, and if the whole text is used, online model prediction comes under very heavy performance pressure. If the performance problem is addressed simply by setting a relatively small maximum input length (e.g., a maximum length of 128 characters), part of the model's effectiveness is lost compared with using longer text content (e.g., a maximum length of 512 characters).
According to the technical scheme of the disclosure, a short text is input into a first text processing model to obtain a first output, and a long text extracted from the same document is input into a second text processing model to obtain a second output. A consistency loss between the first text processing model and the second text processing model is calculated according to the first output and the second output, and the parameters of the first text processing model and the parameters of the parameter-shared second text processing model are adjusted, so that the prediction effect of the first text processing model on short text approaches the prediction effect of the second text processing model on long text. This improves the prediction effect of the first text processing model on short text and makes the text processing model learn the features in which short and long texts differ most, so that it learns more text details and its ability to learn text features improves. At the same time, the length of the input text can be reduced, which shortens the time the first text processing model needs to process the input text and improves its text processing efficiency.
Fig. 2 is a flowchart of another training method of a text processing model according to an embodiment of the present disclosure, which is further optimized and expanded based on the above technical solution, and can be combined with the above optional embodiments. And calculating consistency loss according to the first output and the second output, wherein the consistency loss is calculated by: determining a difference between the first output and the second output from the first output and the second output; calculating the loss of consistency from the difference.
S201, obtaining a training sample, wherein the training sample comprises a short text and a long text extracted from the same document.
S202, inputting the short text into a first text processing model to obtain a first output, and inputting the long text into a second text processing model to obtain a second output.
S203, determining the difference between the first output and the second output according to the first output and the second output.
The difference between the first output and the second output may refer to data quantized for the difference between the first output and the second output. The difference between the first output and the second output is used to calculate a loss of consistency. Further, the first output includes a plurality of items of information, the second output includes a plurality of items of information, a difference between the item of information in the first output and the item of information in the second output may be calculated for each item of information, and accordingly, the difference between the first output and the second output may include the difference between the items of information.
S204, calculating the consistency loss according to the difference, and adjusting the parameters of the first text processing model and the parameters of the second text processing model which are shared by the parameters.
From the differences, a consistency loss is calculated, which may be determined from at least one of the differences. In the case where the difference is a difference of a plurality of items of information, the differences may be fused to obtain a loss of consistency. For example, the fusion may include weighted sum, accumulation, multiplication, or the like.
Optionally, the determining a difference between the first output and the second output according to the first output and the second output includes: calculating a difference between the first output text vector and the second output text vector; and/or calculating a difference between the classification probability of the first output and the classification probability of the second output.
The first output and the second output comprise a text vector and/or a classification probability. Text vectors are used to represent features of text. The text vector may include 1 x N elements. The classification probability is used to represent the classification type of the text. The classification probability can be represented by 1 × M elements, and the ith element represents the probability that the text is the ith classification. The difference between the text vectors may be determined by the distance or mean square error between the text vectors, or the like. The difference between the classification probabilities can be determined by the KL divergence (Kullback-Leibler divergence), where the KL divergence represents the difference of the two probability distributions.
In the case where the first output and the second output include only text vectors, a difference between the text vector of the first output and the text vector of the second output is calculated, and a loss of consistency is determined. In the case where the first output and the second output include only the classification probability, a difference between the classification probability of the first output and the classification probability of the second output is calculated, and it is determined as a loss of consistency. In the case where the first output and the second output include a text vector and a classification probability, a difference between the text vector of the first output and the text vector of the second output and a difference between the classification probability of the first output and the classification probability of the second output are calculated, and the accumulated sum is determined as a loss of consistency.
In a specific example, the text vector of the first output is r1, the text vector of the second output is r2, the classification probability of the first output is p1, and the classification probability of the second output is p2. The difference between the text vectors is loss1 = MSE(r1, r2), and the difference between the classification probabilities is loss2 = KL_div(p1, p2). In the case where the output includes only text vectors, the consistency loss is LA2B = MSE(r1, r2); in the case where the output includes only classification probabilities, the consistency loss is LA2B = KL_div(p1, p2); and in the case where the output includes both text vectors and classification probabilities, the consistency loss is LA2B = MSE(r1, r2) + KL_div(p1, p2). Here MSE is the mean square error function and KL_div is the KL divergence function.
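A sketch of these formulas in code, using the r1/r2/p1/p2 naming from the example; the KL direction and the batch-mean reduction are assumptions, since the disclosure does not fix them.

```python
import torch.nn.functional as F

def consistency_loss(r1, r2, p1=None, p2=None):
    """loss1 = MSE(r1, r2); loss2 = KL_div(p1, p2); LA2B is their sum when both are available."""
    loss = F.mse_loss(r1, r2)                                        # difference between text vectors
    if p1 is not None and p2 is not None:
        # F.kl_div expects log-probabilities as its first argument
        loss = loss + F.kl_div(p1.log(), p2, reduction="batchmean")  # difference between probabilities
    return loss
```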
By calculating the difference between text vectors and/or the difference between classification probabilities as the difference between the first output and the second output, the difference between the first output and the second output can be determined according to the specific content included in the output, the difference between the first output and the second output can be accurately described, the detection accuracy of the difference is improved, the coverage of the difference is increased, and the representativeness of consistency loss is improved, so that the text processing model learns the characteristic information of vectors or probabilities with larger difference between short texts and long texts, and the prediction accuracy of the text processing model is improved.
Optionally, the method for training a text processing model further includes: calculating short text loss according to the difference between the first output and target output corresponding to the short text; calculating a long text loss according to a difference between the second output and a target output corresponding to the long text; and adjusting parameters of the first text processing model and parameters of the second text processing model which are shared by parameters according to the short text loss and the long text loss.
The target output corresponding to the short text may refer to the true value output obtained by inputting the short text into the first text processing model; the target output corresponding to the long text may refer to a true value output obtained by inputting the long text into the second text processing model. Short text loss refers to the difference between the first output and the true value output; long text loss refers to the difference between the second output and the true output. The short text loss is used for adjusting parameters of the first text processing model; the long text loss is used to adjust parameters of the second text processing model.
The training samples may also include target outputs for the short text as well as target outputs for the long text. In the case where the first output and the second output include text vectors, the target output corresponding to the short text is a short text standard vector and the target output corresponding to the long text is a long text standard vector. In the case where the first output and the second output include only classification probabilities, the target output corresponding to the short text is a short text standard probability and the target output corresponding to the long text is a long text standard probability. In the case where the first output and the second output include both text vectors and classification probabilities, the target output corresponding to the short text is the short text standard vector and the short text standard probability, and the target output corresponding to the long text is the long text standard vector and the long text standard probability. Accordingly, the short text loss includes a difference between the text vector of the first output and the short text standard vector, and/or a difference between the classification probability of the first output and the short text standard probability. The long text loss includes a difference between the text vector of the second output and the long text standard vector, and/or a difference between the classification probability of the second output and the long text standard probability.
In a specific example, the text vector of the first output is r1, the text vector of the second output is r2, the classification probability of the first output is p1, the classification probability of the second output is p2, the short text standard vector is r3, the short text standard probability is p3, the long text standard vector is r4, and the long text standard probability is p4. In the case where the output includes a text vector and a classification probability, the short text loss is L1 = MSE(r1, r3) + KL_div(p1, p3), and the long text loss is L2 = MSE(r2, r4) + KL_div(p2, p4).
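These per-branch losses can be sketched in the same style as the consistency loss above; the KL direction and reduction remain assumptions.

```python
import torch.nn.functional as F

def branch_losses(r1, p1, r2, p2, r3, p3, r4, p4):
    """L1 pulls the short-text branch toward its target output, L2 the long-text branch."""
    l1 = F.mse_loss(r1, r3) + F.kl_div(p1.log(), p3, reduction="batchmean")  # short text loss
    l2 = F.mse_loss(r2, r4) + F.kl_div(p2.log(), p4, reduction="batchmean")  # long text loss
    return l1, l2
```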
The target output corresponding to the short text is used as a true value of the first output, the short text loss is calculated according to the difference between the target output and the first output, the target output corresponding to the long text is used as a true value of the second output, the long text loss is calculated according to the difference between the target output and the second output, and the parameters of the two text processing models are adjusted, so that the output of the text processing models approaches to the true value output, and the prediction accuracy of the text processing models is improved.
According to the technical scheme, the difference between the first output and the second output is determined, the consistency loss is calculated, the difference between the output for the short text and the output for the long text can be accurately calculated, the parameters of the two text processing models shared by the parameters are adjusted based on the difference, the prediction effect of the first text processing model for processing the short text is close to the prediction effect of the second text processing model for processing the long text, the prediction effect of the first text processing model for the short text can be improved, the text processing model learns the feature with the larger difference between the short text and the long text, the capability of the text processing model for learning the text feature is improved, and the prediction accuracy of the text processing model is improved.
Fig. 3 is a flowchart of another training method of a text processing model according to an embodiment of the present disclosure, which is further optimized and expanded based on the above technical solution, and can be combined with the above optional embodiments. The first target model comprises the first text processing model and the second text processing model; and inputting the short text into a first text processing model to obtain a first output, and inputting the long text into a second text processing model to obtain a second output, wherein the method comprises the following steps: inputting the short text into a first text processing model to obtain a processing result output by the first text processing model, and determining first output; and inputting the long text into a second text processing model to obtain a processing result output by the second text processing model, and determining second output.
S301, obtaining a training sample, wherein the training sample comprises a short text and a long text extracted from the same document, and the first target model comprises the first text processing model and the second text processing model.
The first target model is a single-tower model. A single-tower model is used to express one type of information, for example items, or, as another example, users. In a single-tower model only one type of information is vectorized and there is no interaction with other types of information, so the single-tower model vectorizes a certain type of information independently. The first target model includes the first text processing model and the second text processing model. The first text processing model and the second text processing model vectorize the same type of information, and their processing procedures are independent of each other, so there is no interactive behavior. The trained first text processing model or the trained second text processing model may be applied separately.
S302, inputting the short text into a first text processing model in the first target model, obtaining a processing result output by the first text processing model, and determining first output.
For the single-tower model, the processing result output by the first text processing model is the first output.
S303, inputting the long text into a second text processing model in the first target model to obtain a processing result output by the second text processing model, and determining second output.
For the single-tower model, the processing result output by the second text processing model is the second output.
S304, according to the first output and the second output, consistency loss is calculated, and parameters of the first text processing model and parameters of the second text processing model shared by parameters are adjusted.
In a specific example, as shown in fig. 4, the training process of the first target model is applied in this scenario. The first target model includes only a first text processing model 402 and a second text processing model 406. The short text 401 is input into the first text processing model 402 to obtain the processing result of the first text processing model 402, which is determined as a first output 403, and the short text loss 404 is calculated according to the target output corresponding to the short text 401 and the first output 403. The long text 405 is input into the second text processing model 406 to obtain the processing result of the second text processing model 406, which is determined as a second output 407, and the long text loss 408 is calculated according to the target output corresponding to the long text 405 and the second output 407. From the difference between the first output 403 and the second output 407, a consistency loss 409 is calculated. The total loss of the first target model is the sum of the short text loss 404, the long text loss 408, and the consistency loss 409. The first target model is trained according to the total loss; that is, the parameter-shared first text processing model 402 and second text processing model 406 are trained by minimizing the total loss, so that the prediction effect on long text is approximated using the short text.
As in the previous example, where the output includes a text vector and a classification probability, the total loss L is calculated based on the following formula:
L = LA2B + L1 + L2 = MSE(r1, r2) + KL_div(p1, p2) + MSE(r1, r3) + KL_div(p1, p3) + MSE(r2, r4) + KL_div(p2, p4).
in the case where the output includes only text vectors, the total loss L is calculated based on the following formula:
L = LA2B + L1 + L2 = MSE(r1, r2) + MSE(r1, r3) + MSE(r2, r4).
in the case where the output includes only classification probabilities, the total loss L is calculated based on the following formula:
L = LA2B + L1 + L2 = KL_div(p1, p2) + KL_div(p1, p3) + KL_div(p2, p4).
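For the case where both vectors and probabilities are output, the total loss can be assembled from the helpers sketched earlier; the vector-only and probability-only variants simply drop the corresponding terms. This is a sketch under those assumptions.

```python
def total_loss(r1, p1, r2, p2, r3, p3, r4, p4):
    # L = LA2B + L1 + L2, reusing the consistency_loss and branch_losses sketches above
    l_a2b = consistency_loss(r1, r2, p1, p2)
    l1, l2 = branch_losses(r1, p1, r2, p2, r3, p3, r4, p4)
    return l_a2b + l1 + l2
```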
According to the technical scheme, by training the first text processing model and the second text processing model in the first target model, the text processing models of the single-tower model are trained, which improves the prediction effect of the single-tower model on short texts in single-tower application scenarios and improves the prediction accuracy of the single-tower model.
Fig. 5 is a flowchart of another training method of a text processing model according to an embodiment of the present disclosure, which is further optimized and expanded based on the above technical solution, and can be combined with the above optional embodiments. The second target model comprises the first text processing model, the second text processing model and a question processing model, and the training sample comprises a question; and inputting the short text into a first text processing model to obtain a first output, and inputting the long text into a second text processing model to obtain a second output, wherein the method comprises the following steps: inputting the short text into a first text processing model in the second target model to obtain a processing result output by the first text processing model; inputting the long text into a second text processing model in the second target model to obtain a processing result output by the second text processing model; inputting the problem into a problem processing model in the second target model to obtain a processing result output by the problem processing model; fusing a processing result output by the first text processing model and a processing result output by the problem processing model to obtain a first output; and fusing the processing result output by the second text processing model and the processing result output by the problem processing model to obtain a second output.
S501, obtaining a training sample, wherein the training sample comprises a short text and a long text extracted from the same document, the second target model comprises the first text processing model, the second text processing model and a problem processing model, and the training sample comprises a problem.
The second target model is a double-tower model. A double-tower model comprises two towers, one expressing one type of information and the other expressing another type of information; for example, one tower expresses items and the other expresses users. In fact, the double-tower model includes two independent sub-networks (towers) for the user and the item, whose parameters are not shared. The user sub-network extracts user features, which include, for example, at least one of: user identification information, user behavior information, user attribute information, the mobile phone system held by the user, and the like. The item sub-network extracts item features, which include, for example, at least one of: item identification information, item category, item attribute information, item source, and the like. The user features and the item features are processed interactively to obtain the association between them; for example, items the user has clicked are closer, and items the user has not clicked or is not interested in are farther. The double-tower model can vectorize two types of information, and an interaction process between the different types of information exists.
The second target model includes a question processing model, a first text processing model, and a second text processing model. The problem processing model and the first text processing model form a double-tower model, and the problem processing model and the second text processing model form a double-tower model. The first text processing model and the second text processing model vectorize the same type of information, and their processing procedures are independent of each other, so there is no interactive behavior. The first text processing model and the second text processing model each interact with the problem processing model. The trained first text processing model or the trained second text processing model may therefore need to be applied in conjunction with the problem processing model. Illustratively, the target output is obtained based on processing the output of the first text processing model and the output of the question processing model.
S502, inputting the short text into a first text processing model in the second target model to obtain a processing result output by the first text processing model.
S503, inputting the long text into a second text processing model in the second target model to obtain a processing result output by the second text processing model.
S504, inputting the question into a question processing model in the second target model, and obtaining a processing result output by the question processing model.
The problem processing model is used for processing the question to obtain a processing result. The processing result output by the problem processing model is a question vector corresponding to the question and is used for describing the features of the question. The problem processing model may include a feature extraction network. The question may be the text to be searched that is input by the user. The problem processing model may be a pre-trained model, a model that has been trained but not yet fully trained, or an untrained model.
And S505, fusing the processing result output by the first text processing model and the processing result output by the problem processing model to obtain a first output.
The first output is used to determine the degree of match between the short text and the question. The first output may refer to features of the matching content between the short text and the question. The first output can be obtained by fusing the processing result output by the first text processing model with the processing result output by the problem processing model. The fusion method may be concatenation, element-wise multiplication, dot product (inner product), outer product, or the like.
The first output includes a text vector and/or a classification probability. The text vector may be obtained by fusing the processing result output by the first text processing model with the processing result output by the problem processing model. Obtaining the classification probability may require further processing of the text vector, for example inputting it into a classification network; the classification network may, for example, use a sigmoid function to obtain the classification probability. Correspondingly, the second target model may further include a classification network for processing the fusion result to obtain the classification probability.
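A hedged sketch of this fusion-plus-classification step is shown below, choosing concatenation among the fusion options listed and a single-logit sigmoid classifier; both choices and the layer sizes are assumptions, not prescribed by the disclosure.

```python
import torch
import torch.nn as nn

class FusionHead(nn.Module):
    """Fuses a text-branch vector with the question vector and outputs a classification probability."""
    def __init__(self, hidden: int = 256):
        super().__init__()
        self.fc = nn.Linear(2 * hidden, 1)   # classification network over the fusion result

    def forward(self, text_vector, question_vector):
        fused = torch.cat([text_vector, question_vector], dim=-1)  # fusion by concatenation
        prob = torch.sigmoid(self.fc(fused))                       # sigmoid classification probability
        return fused, prob
```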
S506, the processing result output by the second text processing model and the processing result output by the problem processing model are fused to obtain a second output.
The second output is used to determine the degree of match between the long text and the question. The second output may be characteristic of the matching content between the long text and the question. The second output can be obtained by fusing the processing result output by the second text processing model and the processing result output by the problem processing model. The fusion method can be referred to the above.
The second output includes a text vector and/or a classification probability. The text vector can be directly obtained by fusing the processing result output by the second text processing model and the processing result output by the problem processing model. The classification probability also needs to be obtained by further processing the text vector, for example, inputting the text vector into a classification network to obtain the classification probability.
And S507, calculating consistency loss according to the first output and the second output, and adjusting parameters of the first text processing model and parameters of the second text processing model shared by parameters.
In a specific example, as shown in fig. 6, the training process of the second target model is applied in this scenario. The second target model includes a question processing model 608, a first text processing model 602, and a second text processing model 605. The short text 601 is input into the first text processing model 602 to obtain a processing result 603 of the first text processing model 602. The long text 604 is input into the second text processing model 605 to obtain a processing result 606 of the second text processing model 605. The question 607 is input into the question processing model 608 to obtain a processing result 609 of the question processing model 608. The processing result 603 of the first text processing model 602 and the processing result 609 of the question processing model 608 are fused to obtain a fusion vector, the fusion vector is input into a classification network to obtain a classification probability, and the fusion vector and/or the classification probability are determined as a first output 610. The processing result 606 of the second text processing model 605 and the processing result 609 of the question processing model 608 are fused to obtain a fusion vector, the fusion vector is input into the classification network to obtain a classification probability, and the fusion vector and/or the classification probability are determined as a second output 611. According to the short text 601 and the question 607, the target output corresponding to the short text 601 is determined, and according to the target output corresponding to the short text 601 and the first output 610, the short text loss 612 is calculated. According to the long text 604 and the question 607, the target output corresponding to the long text 604 is determined, and according to the target output corresponding to the long text 604 and the second output 611, the long text loss 614 is calculated. From the difference between the first output 610 and the second output 611, a consistency loss 613 is calculated. The total loss of the second target model is the sum of the short text loss 612, the long text loss 614, and the consistency loss 613. The second target model, specifically the parameter-shared first text processing model 602 and second text processing model 605, is trained by minimizing the total loss. The formula for calculating the total loss can be found in the previous example.
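The forward pass of this scenario could look like the sketch below, which reuses the TextProcessingModel, FusionHead, and consistency_loss sketches from earlier; question_model is an assumed question encoder with the same interface, only the consistency term on the fusion vectors is computed, and the short/long text losses against target outputs would be added as in the earlier sketches.

```python
def two_tower_step(short_ids, long_ids, question_ids, text_model, question_model, head):
    r_short, _ = text_model(short_ids)          # shared text tower, short-text branch
    r_long, _ = text_model(long_ids)            # shared text tower, long-text branch
    q_vec, _ = question_model(question_ids)     # question tower
    fused1, p1 = head(r_short, q_vec)           # first output (610 in the figure)
    fused2, p2 = head(r_long, q_vec)            # second output (611 in the figure)
    # p1/p2 would feed the probability-based loss terms; here only the vector term is shown
    return consistency_loss(fused1, fused2)
```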
According to the technical scheme, the processing result output by the problem processing model in the second target model is fused with the processing result output by the first text processing model and the processing result output by the second text processing model respectively to obtain the first output and the second output, so that the first text processing model and the second text processing model are trained, the first text processing model and the second text processing model in the double-tower model are trained, the application scenes of text processing can be increased, the prediction effect of the double-tower model on short texts in the application scenes of the double-tower model is improved, and the prediction accuracy of the double-tower model is improved.
Fig. 7 is a flowchart of a text processing method disclosed in an embodiment of the present disclosure, which may be applied to a case where an input text is processed according to a trained text processing model to obtain a text processing result. The method of the embodiment may be executed by a text processing apparatus, which may be implemented in a software and/or hardware manner and is specifically configured in an electronic device with certain data operation capability, where the electronic device may be a client device or a server device, and the client device may be, for example, a mobile phone, a tablet computer, a vehicle-mounted terminal, a desktop computer, and the like.
S701, inputting a text to be processed into a text processing model to obtain a text processing result, wherein the text processing model is obtained by training according to a training method of the text processing model according to any one embodiment of the disclosure.
The text to be processed is input of the text processing model and can be text input by a user. The text processing results may include text vectors and/or classification probabilities. The text processing result can be an intermediate result, can be processed continuously, or can be a final result and is directly output. The text processing model is obtained by training the training method of the text processing model according to any embodiment of the disclosure. The text processing model may be the first text processing model described above, or may be the second text processing model. The maximum length of the text to be processed is the same as the maximum length of the short text in the training method of the text processing model.
In a specific example, the text processing result includes a text vector, and the text vector is processed continuously to obtain the abstract content of the text to be processed. For another example, the text processing result includes a classification probability, which indicates a classification result of the text to be processed, for example, the type of the text to be processed is a news type, or the type of the text to be processed is a type of interest of the user. For another example, the text processing result includes the classification probability, and the corresponding recommendation information may be queried according to the classification result of the text to be processed, and provided to the user, and the like. And outputting a text processing result or continuously processing the text processing result according to the task content to obtain and output a final result.
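An assumed usage sketch of inference with the TextProcessingModel sketched earlier; the random token ids stand in for the output of a real tokenizer, and in practice trained weights would be loaded.

```python
import torch

text_model = TextProcessingModel()                # in practice, load the trained weights here
token_ids = torch.randint(0, 30000, (1, 128))     # placeholder for the tokenized text to be processed
with torch.no_grad():
    text_vector, class_probs = text_model(token_ids)
# text_vector can feed downstream processing (e.g. summarization);
# class_probs gives the classification result directly.
```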
Optionally, the text processing method further includes: inputting the problem to be processed corresponding to the text to be processed into a problem processing model to obtain a problem processing result; and determining the matching degree of the text to be processed and the problem to be processed according to the problem processing result and the text processing result so as to determine the search result of the problem to be processed.
The text processing model may also be combined with the problem processing model into a double-tower model. The problem to be processed is the text that the user needs to search for. The matching degree is used to determine how well the text to be processed matches the problem to be processed, so that texts matching the problem can be screened out. The search result of the problem to be processed may refer to the text to be processed that matches the problem, obtained in a search application scenario by querying with the problem to be processed over multiple candidate texts to be processed. It should be noted that the search application scenario may be one in which the user directly provides a question to be searched and wants an answer to it, or one in which, during browsing or communication, the user inputs content that does not itself need to be searched but for which recommended content may be needed. Further, the search application scenario is typically the recall stage of a search.
The problem processing result may include a text vector of the problem to be processed, and the text processing result may include a text vector of the text to be processed. The matching degree may be a similarity value between the text vector of the problem to be processed and the text vector of the text to be processed, for example the distance between the two vectors; alternatively, the text vector of the problem to be processed and the text vector of the text to be processed may be fused into a fusion vector, the fusion vector classified, and the resulting classification probability determined as the matching degree. The matching degree between the problem to be processed and the other texts to be processed is calculated in the same way. According to the matching degree between the problem to be processed and each text to be processed, at least one text to be processed can be screened out and determined as the search result of the problem to be processed.
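One possible way to turn the matching degree into a ranked search result is sketched below; cosine similarity is one choice of similarity value and is an assumption, not mandated by the disclosure.

```python
import torch
import torch.nn.functional as F

def rank_candidates(question_vector: torch.Tensor, candidate_vectors: torch.Tensor):
    """Scores each candidate text vector against the question vector, best first."""
    scores = F.cosine_similarity(question_vector.unsqueeze(0), candidate_vectors, dim=-1)  # matching degree
    order = torch.argsort(scores, descending=True)
    return order, scores
```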
In one specific example, the problem to be processed is: what is the news of XX? The text to be processed is the text in the news webpage that includes the story of XX. For another example, the problem to be processed is: want to go to the XX restaurant; the text to be processed is the text in the webpage of the address and comment information of the XX restaurant.
A problem processing model is additionally configured to form a double-tower model with the text processing model. The problem to be processed is processed to obtain a problem processing result, and the matching degree between the problem to be processed and the text to be processed is calculated in combination with the text processing result to determine the search result of the problem to be processed. This can be applied to interactive scenarios between users and objects, in which the information about the objects that the user needs is queried; it increases the application scenarios of object search, enriches the application scenarios, meets diversified user requirements, and improves user experience.
According to the above technical scheme, the text to be processed is obtained and input into the text processing model to obtain the text processing result, so that texts with shorter length and less content can be processed, which improves the prediction accuracy for short texts, reduces the amount of data to be processed, and improves text processing efficiency.
Fig. 8 is a structural diagram of a training apparatus for a text processing model in an embodiment of the present disclosure. The embodiment of the present disclosure is applicable to the case of training a model for performing text processing. The apparatus is implemented in software and/or hardware and is specifically configured in an electronic device with certain data computing capability.
Fig. 8 shows a training apparatus 800 for a text processing model, comprising: a training sample acquisition module 801, a model output acquisition module 802 and a first parameter adjustment module 803; wherein,
a training sample obtaining module 801, configured to obtain a training sample, where the training sample includes a short text and a long text extracted from a same document;
a model output obtaining module 802, configured to input the short text into a first text processing model to obtain a first output, and input the long text into a second text processing model to obtain a second output;
a first parameter adjusting module 803, configured to calculate a consistency loss according to the first output and the second output, and to adjust the parameters of the first text processing model and the parameters of the second text processing model, which share parameters.
According to the technical scheme of the present disclosure, a short text is input into a first text processing model to obtain a first output, and a long text extracted from the same document is input into a second text processing model to obtain a second output. A consistency loss between the first text processing model and the second text processing model is calculated according to the first output and the second output, and the parameters of the first text processing model and the parameters of the second text processing model, which share parameters, are adjusted. In this way, the prediction effect of the first text processing model on short texts approaches the prediction effect of the second text processing model on long texts, so the prediction effect of the first text processing model on short texts can be improved. The text processing model learns the features that differ most between the short text and the long text, and thereby learns more text details, so its ability to learn text features can be improved. Meanwhile, the length of the input text can be reduced, the processing time of the first text processing model on the input text is shortened, and the text processing efficiency of the first text processing model is improved.
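A minimal sketch of this training step is given below, under the assumption that a single PyTorch encoder stands in for the parameter-shared first and second text processing models and that the short and long texts have already been tokenized into batches of token-ID tensors; all names are illustrative.

```python
import torch
import torch.nn.functional as F

# One encoder applied to both inputs realizes the parameter sharing between
# the first and the second text processing model in this sketch.
encoder = torch.nn.Sequential(
    torch.nn.EmbeddingBag(num_embeddings=30000, embedding_dim=128),
    torch.nn.Linear(128, 128),
)
optimizer = torch.optim.Adam(encoder.parameters(), lr=1e-4)

def training_step(short_ids: torch.Tensor, long_ids: torch.Tensor) -> float:
    # short_ids / long_ids: LongTensors of shape (batch, seq_len).
    first_output = encoder(short_ids)    # first text processing model on the short text
    second_output = encoder(long_ids)    # second text processing model on the long text
    # Consistency loss: pull the short-text output toward the long-text output.
    consistency_loss = F.mse_loss(first_output, second_output)
    optimizer.zero_grad()
    consistency_loss.backward()
    optimizer.step()                     # adjusts the shared parameters
    return consistency_loss.item()
```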
Further, the first parameter adjusting module 803 includes: a long and short text output difference determining unit, configured to determine a difference between the first output and the second output according to the first output and the second output; and the consistency loss calculating unit is used for calculating the consistency loss according to the difference.
Further, the long and short text output difference determining unit includes: a vector difference calculation subunit for calculating a difference between the first output text vector and the second output text vector; and/or a probability difference calculation subunit for calculating a difference between the classification probability of the first output and the classification probability of the second output.
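As a hedged illustration of these two subunits, the vector difference could be a mean squared error and the probability difference a KL divergence; either term can be used alone, or both can be summed, as sketched below.

```python
import torch
import torch.nn.functional as F

def output_difference(first_vec: torch.Tensor, second_vec: torch.Tensor,
                      first_logits: torch.Tensor = None,
                      second_logits: torch.Tensor = None) -> torch.Tensor:
    # Difference between the text vector of the first output and that of the second output.
    diff = F.mse_loss(first_vec, second_vec)
    if first_logits is not None and second_logits is not None:
        # Difference between the classification probabilities of the two outputs.
        diff = diff + F.kl_div(F.log_softmax(first_logits, dim=-1),
                               F.softmax(second_logits, dim=-1),
                               reduction="batchmean")
    return diff
```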
Further, the first target model comprises the first text processing model and the second text processing model; the model output obtaining module 802 includes: a first output obtaining unit, configured to input the short text into a first text processing model in the first target model, obtain a processing result output by the first text processing model, and determine a first output; and the second output acquisition unit is used for inputting the long text into a second text processing model in the first target model, obtaining a processing result output by the second text processing model and determining second output.
Further, a second target model comprises the first text processing model, the second text processing model, and a question processing model, and the training sample comprises a question; the model output obtaining module 802 includes: a first processing result obtaining unit, configured to input the short text into the first text processing model in the second target model to obtain a processing result output by the first text processing model; a second processing result obtaining unit, configured to input the long text into the second text processing model in the second target model to obtain a processing result output by the second text processing model; a question processing result obtaining unit, configured to input the question into the question processing model in the second target model to obtain a processing result output by the question processing model; a first output obtaining unit, configured to fuse the processing result output by the first text processing model and the processing result output by the question processing model to obtain the first output; and a second output obtaining unit, configured to fuse the processing result output by the second text processing model and the processing result output by the question processing model to obtain the second output.
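In a sketch of this second target model, the fusion step might simply concatenate each text processing result with the question processing result before the consistency loss is computed; concatenation is only one possible fusion method and is assumed here for illustration.

```python
import torch

def fuse_with_question(short_result: torch.Tensor, long_result: torch.Tensor,
                       question_result: torch.Tensor):
    # First output: fusion of the short-text result and the question result.
    first_output = torch.cat([short_result, question_result], dim=-1)
    # Second output: fusion of the long-text result and the question result.
    second_output = torch.cat([long_result, question_result], dim=-1)
    return first_output, second_output
```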
Further, the training apparatus for the text processing model further comprises: a short text loss calculation module, configured to calculate a short text loss according to the difference between the first output and the target output corresponding to the short text; a long text loss calculation module, configured to calculate a long text loss according to the difference between the second output and the target output corresponding to the long text; and a second parameter adjusting module, configured to adjust the parameters of the first text processing model and the parameters of the second text processing model, which share parameters, according to the short text loss and the long text loss.
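For illustration only, and assuming classification-style target outputs, the supervised losses could be combined with the consistency loss into a single objective before the shared parameters are updated:

```python
import torch
import torch.nn.functional as F

def total_loss(first_output: torch.Tensor, second_output: torch.Tensor,
               short_target: torch.Tensor, long_target: torch.Tensor,
               consistency_loss: torch.Tensor) -> torch.Tensor:
    # Short text loss: difference between the first output and the short text's target output.
    short_loss = F.cross_entropy(first_output, short_target)
    # Long text loss: difference between the second output and the long text's target output.
    long_loss = F.cross_entropy(second_output, long_target)
    # The parameter-shared models are then adjusted against all three losses together.
    return short_loss + long_loss + consistency_loss
```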
The training device of the text processing model can execute the training method of the text processing model provided by any embodiment of the disclosure, and has the corresponding functional modules and beneficial effects of executing the training method of the text processing model.
Fig. 9 is a structural diagram of a text processing apparatus in an embodiment of the present disclosure. The embodiment of the present disclosure is suitable for the case where an input text is processed by a trained text processing model to obtain a text processing result. The apparatus is implemented in software and/or hardware and is specifically configured in an electronic device with certain data computing capability.
A text processing apparatus 900 shown in fig. 9 includes: a text processing result determining module 901; wherein,
a text processing result determining module 901, configured to input a text to be processed into a text processing model to obtain a text processing result; the text processing model is obtained by training according to a training method of the text processing model according to any one embodiment of the present disclosure.
According to this technical scheme, the text to be processed is obtained and input into the text processing model to obtain the text processing result, so that texts with shorter length and less content can be processed, which improves the prediction accuracy for short texts, reduces the amount of data to be processed, and improves text processing efficiency.
Further, the text processing apparatus further includes: a question processing result determining module, configured to input the question to be processed corresponding to the text to be processed into a question processing model to obtain a question processing result; and a search result determining module, configured to determine the matching degree between the text to be processed and the question to be processed according to the question processing result and the text processing result, so as to determine the search result of the question to be processed.
The text processing device can execute the text processing method provided by any embodiment of the disclosure, and has the corresponding functional modules and beneficial effects of executing the text processing method.
In the technical scheme of the present disclosure, the collection, storage, use, processing, transmission, provision, disclosure, and other handling of the personal information of the users involved all comply with the provisions of relevant laws and regulations and do not violate public order and good customs.
The present disclosure also provides an electronic device, a readable storage medium, and a computer program product according to embodiments of the present disclosure.
FIG. 10 illustrates a schematic block diagram of an example electronic device 1000 that can be used to implement embodiments of the present disclosure. Electronic devices are intended to represent various forms of digital computers, such as laptops, desktops, workstations, personal digital assistants, servers, blade servers, mainframes, and other appropriate computers. The electronic device may also represent various forms of mobile devices, such as personal digital processing devices, cellular phones, smart phones, wearable devices, and other similar computing devices. The components shown herein, their connections and relationships, and their functions, are meant to be examples only, and are not meant to limit implementations of the disclosure described and/or claimed herein.
As shown in fig. 10, the device 1000 includes a computing unit 1001 that can perform various appropriate actions and processes according to a computer program stored in a Read Only Memory (ROM) 1002 or a computer program loaded from a storage unit 1008 into a Random Access Memory (RAM) 1003. In the RAM 1003, various programs and data necessary for the operation of the device 1000 can also be stored. The computing unit 1001, the ROM 1002, and the RAM 1003 are connected to each other by a bus 1004. An input/output (I/O) interface 1005 is also connected to the bus 1004.
A number of components in device 1000 are connected to I/O interface 1005, including: an input unit 1006 such as a keyboard, a mouse, and the like; an output unit 1007 such as various types of displays, speakers, and the like; a storage unit 1008 such as a magnetic disk, an optical disk, or the like; and a communication unit 1009 such as a network card, a modem, a wireless communication transceiver, or the like. The communication unit 1009 allows the device 1000 to exchange information/data with other devices through a computer network such as the internet and/or various telecommunication networks.
Various implementations of the systems and techniques described herein may be implemented in digital electronic circuitry, integrated circuitry, Field Programmable Gate Arrays (FPGAs), Application Specific Integrated Circuits (ASICs), Application Specific Standard Products (ASSPs), systems on chip (SOCs), complex programmable logic devices (CPLDs), computer hardware, firmware, software, and/or combinations thereof. These various embodiments may include being implemented in one or more computer programs that are executable and/or interpretable on a programmable system including at least one programmable processor, which may be special or general purpose, receiving data and instructions from, and transmitting data and instructions to, a storage system, at least one input device, and at least one output device.
Program code for implementing the methods of the present disclosure may be written in any combination of one or more programming languages. These program codes may be provided to a processor or controller of a general purpose computer, special purpose computer, or other programmable data processing apparatus, such that the program codes, when executed by the processor or controller, cause the functions/operations specified in the flowchart and/or block diagram to be performed. The program code may execute entirely on the machine, partly on the machine, as a stand-alone software package partly on the machine and partly on a remote machine, or entirely on the remote machine or server.
In the context of this disclosure, a machine-readable medium may be a tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. The machine-readable medium may be a machine-readable signal medium or a machine-readable storage medium. A machine-readable medium may include, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any suitable combination of the foregoing. More specific examples of a machine-readable storage medium would include an electrical connection based on one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.
To provide for interaction with a user, the systems and techniques described here can be implemented on a computer having: a display device (e.g., a CRT (cathode ray tube) or LCD (liquid crystal display) monitor) for displaying information to a user; and a keyboard and a pointing device (e.g., a mouse or a trackball) by which a user can provide input to the computer. Other kinds of devices may also be used to provide for interaction with a user; for example, feedback provided to the user can be any form of sensory feedback (e.g., visual feedback, auditory feedback, or tactile feedback); and input from the user may be received in any form, including acoustic, speech, or tactile input.
The systems and techniques described here can be implemented in a computing system that includes a back-end component (e.g., as a data server), or that includes a middleware component (e.g., an application server), or that includes a front-end component (e.g., a user computer having a graphical user interface or a web browser through which a user can interact with an implementation of the systems and techniques described here), or any combination of such back-end, middleware, or front-end components. The components of the system can be interconnected by any form or medium of digital data communication (e.g., a communication network). Examples of communication networks include: local Area Networks (LANs), Wide Area Networks (WANs), and the Internet.
The computer system may include clients and servers. A client and server are generally remote from each other and typically interact through a communication network. The relationship of client and server arises by virtue of computer programs running on the respective computers and having a client-server relationship to each other. The server may be a cloud server, a server of a distributed system, or a server combined with a blockchain.
It should be understood that various forms of the flows shown above may be used, with steps reordered, added, or deleted. For example, the steps described in the present disclosure may be executed in parallel, sequentially, or in different orders, as long as the desired results of the technical solutions disclosed in the present disclosure can be achieved, and the present disclosure is not limited herein.
The above detailed description should not be construed as limiting the scope of the disclosure. It should be understood by those skilled in the art that various modifications, combinations, sub-combinations and substitutions may be made in accordance with design requirements and other factors. Any modification, equivalent replacement, and improvement made within the spirit and principle of the present disclosure should be included in the scope of protection of the present disclosure.
Claims (19)
1. A training method of a text processing model comprises the following steps:
obtaining a training sample, wherein the training sample comprises a short text and a long text extracted from the same document;
inputting the short text into a first text processing model to obtain a first output, and inputting the long text into a second text processing model to obtain a second output;
and calculating a consistency loss according to the first output and the second output, and adjusting the parameters of the first text processing model and the parameters of the second text processing model, which share parameters.
2. The method of claim 1, wherein said calculating a consistency loss from said first output and said second output comprises:
determining a difference between the first output and the second output from the first output and the second output;
calculating the loss of consistency from the difference.
3. The method of claim 2, wherein the determining, from the first output and the second output, a difference between the first output and the second output comprises:
calculating a difference between the first output text vector and the second output text vector; and/or
Calculating a difference between the classification probability of the first output and the classification probability of the second output.
4. The method of claim 1, wherein a first target model comprises the first text processing model and the second text processing model;
the inputting the short text into a first text processing model to obtain a first output, and the inputting the long text into a second text processing model to obtain a second output includes:
inputting the short text into a first text processing model in the first target model to obtain a processing result output by the first text processing model and determine first output;
and inputting the long text into a second text processing model in the first target model to obtain a processing result output by the second text processing model, and determining second output.
5. The method of claim 1, wherein a second target model comprises the first text processing model, the second text processing model, and a question processing model, the training sample comprising a question;
the inputting the short text into a first text processing model to obtain a first output, and the inputting the long text into a second text processing model to obtain a second output includes:
inputting the short text into a first text processing model in the second target model to obtain a processing result output by the first text processing model;
inputting the long text into a second text processing model in the second target model to obtain a processing result output by the second text processing model;
inputting the question into a question processing model in the second target model to obtain a processing result output by the question processing model;
fusing a processing result output by the first text processing model and a processing result output by the question processing model to obtain a first output;
and fusing the processing result output by the second text processing model and the processing result output by the question processing model to obtain a second output.
6. The method of claim 1, further comprising:
calculating short text loss according to the difference between the first output and target output corresponding to the short text;
calculating a long text loss according to a difference between the second output and a target output corresponding to the long text;
and adjusting the parameters of the first text processing model and the parameters of the second text processing model, which share parameters, according to the short text loss and the long text loss.
7. A text processing method, comprising:
inputting a text to be processed into a text processing model to obtain a text processing result;
wherein the text processing model is trained according to the training method of the text processing model according to any one of claims 1 to 6.
8. The method of claim 7, further comprising:
inputting the question to be processed corresponding to the text to be processed into a question processing model to obtain a question processing result;
and determining the matching degree of the text to be processed and the question to be processed according to the question processing result and the text processing result, so as to determine the search result of the question to be processed.
9. A training apparatus for a text processing model, comprising:
the training sample acquisition module is used for acquiring a training sample, wherein the training sample comprises a short text and a long text extracted from the same document;
the model output acquisition module is used for inputting the short text into a first text processing model to obtain first output and inputting the long text into a second text processing model to obtain second output;
and the first parameter adjusting module is used for calculating a consistency loss according to the first output and the second output, and adjusting the parameters of the first text processing model and the parameters of the second text processing model, which share parameters.
10. The apparatus of claim 9, wherein the first parameter adjustment module comprises:
a long and short text output difference determining unit, configured to determine a difference between the first output and the second output according to the first output and the second output;
and the consistency loss calculating unit is used for calculating the consistency loss according to the difference.
11. The apparatus of claim 10, wherein the long and short text output difference determining unit comprises:
a vector difference calculation subunit for calculating a difference between the first output text vector and the second output text vector; and/or
A probability difference calculation subunit for calculating a difference between the classification probability of the first output and the classification probability of the second output.
12. The apparatus of claim 9, wherein a first target model comprises the first text processing model and the second text processing model;
the model output acquisition module comprises:
a first output obtaining unit, configured to input the short text into a first text processing model in the first target model, obtain a processing result output by the first text processing model, and determine a first output;
and the second output acquisition unit is used for inputting the long text into a second text processing model in the first target model, obtaining a processing result output by the second text processing model and determining second output.
13. The apparatus of claim 9, wherein a second target model comprises the first text processing model, the second text processing model, and a question processing model, the training sample comprising a question;
the model output acquisition module comprises:
a first processing result obtaining unit, configured to input the short text into a first text processing model in the second target model, so as to obtain a processing result output by the first text processing model;
the second processing result acquisition unit is used for inputting the long text into a second text processing model in the second target model to obtain a processing result output by the second text processing model;
the question processing result acquisition unit is used for inputting the question into a question processing model in the second target model to obtain a processing result output by the question processing model;
the first output acquisition unit is used for fusing the processing result output by the first text processing model and the processing result output by the question processing model to obtain a first output;
and the second output acquisition unit is used for fusing the processing result output by the second text processing model and the processing result output by the question processing model to obtain a second output.
14. The apparatus of claim 9, further comprising:
the short text loss calculation module is used for calculating short text loss according to the difference between the first output and target output corresponding to the short text;
the long text loss calculation module is used for calculating the long text loss according to the difference between the second output and the target output corresponding to the long text;
and the second parameter adjusting module is used for adjusting the parameters of the first text processing model and the parameters of the second text processing model, which share parameters, according to the short text loss and the long text loss.
15. A text processing apparatus comprising:
the text processing result determining module is used for inputting the text to be processed into the text processing model to obtain a text processing result; wherein the text processing model is trained according to the training method of the text processing model according to any one of claims 1 to 6.
16. The apparatus of claim 15, further comprising:
the question processing result determining module is used for inputting the question to be processed corresponding to the text to be processed into a question processing model to obtain a question processing result;
and the search result determining module is used for determining the matching degree of the text to be processed and the question to be processed according to the question processing result and the text processing result, so as to determine the search result of the question to be processed.
17. An electronic device, comprising:
at least one processor; and
a memory communicatively coupled to the at least one processor; wherein,
the memory stores instructions executable by the at least one processor to enable the at least one processor to perform a method of training a text processing model according to any of claims 1-6 or a method of text processing according to any of claims 7-8.
18. A non-transitory computer readable storage medium storing computer instructions for causing a computer to perform a training method of a text processing model according to any one of claims 1-6 or a text processing method according to any one of claims 7-8.
19. A computer program product comprising a computer program which, when executed by a processor, implements a method of training a text processing model according to any one of claims 1-6, or a method of text processing according to any one of claims 7-8.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202111175460.4A CN113919424A (en) | 2021-10-09 | 2021-10-09 | Training of text processing model, text processing method, device, equipment and medium |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202111175460.4A CN113919424A (en) | 2021-10-09 | 2021-10-09 | Training of text processing model, text processing method, device, equipment and medium |
Publications (1)
Publication Number | Publication Date |
---|---|
CN113919424A true CN113919424A (en) | 2022-01-11 |
Family
ID=79238594
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202111175460.4A Pending CN113919424A (en) | 2021-10-09 | 2021-10-09 | Training of text processing model, text processing method, device, equipment and medium |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN113919424A (en) |
Cited By (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN116186211A (en) * | 2022-12-19 | 2023-05-30 | 北京航空航天大学 | Text aggressiveness detection and conversion method |
CN116186211B (en) * | 2022-12-19 | 2023-07-25 | 北京航空航天大学 | Text aggressiveness detection and conversion method |
CN116721334A (en) * | 2023-08-11 | 2023-09-08 | 腾讯科技(深圳)有限公司 | Training method, device, equipment and storage medium of image generation model |
CN116721334B (en) * | 2023-08-11 | 2023-11-21 | 腾讯科技(深圳)有限公司 | Training method, device, equipment and storage medium of image generation model |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | |
SE01 | Entry into force of request for substantive examination | |