CN115841144A - Training method and device for text retrieval model

Training method and device for text retrieval model

Info

Publication number
CN115841144A
Authority
CN
China
Prior art keywords
text
training
sample
samples
loss function
Prior art date
Legal status
Pending
Application number
CN202211699919.5A
Other languages
Chinese (zh)
Inventor
暴宇健
董辉
Current Assignee
Beijing Longzhi Digital Technology Service Co Ltd
Original Assignee
Beijing Longzhi Digital Technology Service Co Ltd
Priority date
Filing date
Publication date
Application filed by Beijing Longzhi Digital Technology Service Co Ltd
Priority to CN202211699919.5A
Publication of CN115841144A


Landscapes

  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The disclosure relates to the technical field of artificial intelligence, and provides a training method and device for a text retrieval model, a computer device, and a computer-readable storage medium. The method uses the sample query text in each group of training text samples, together with the real article title corresponding to that query text, to classify the training text samples and screen out different sample types. Different loss function weight values can then be set for different sample types, which effectively improves the distribution of training weights (i.e., loss function weight values) across training text samples of different types. As a result, the training process of the text retrieval model is more thorough, its training efficiency and effect are improved, its performance is enhanced, and its text retrieval effect in actual service scenarios is improved (for example, the accuracy of its text retrieval results).

Description

Training method and device for text retrieval model
Technical Field
The disclosure relates to the technical field of artificial intelligence, in particular to a training method and a training device for a text retrieval model.
Background
In search services, the training of a commonly used neural-network-based text retrieval model relies on negative sampling, which yields large numbers of easily distinguished negative samples that contribute little to training. Because such negative samples are abundant but carry insufficient information in their characterization features, the model can overfit and generalize poorly. Consequently, when such a text retrieval model is used for text retrieval, the retrieved results are often not what the user actually wants, leading to a poor user experience.
Disclosure of Invention
In view of this, embodiments of the present disclosure provide a training method, an apparatus, a computer device, and a computer-readable storage medium for a text retrieval model, to solve the problem in the prior art that negative sampling yields a large number of easily distinguished negative samples with little training value: their characterization features carry insufficient information, so the model overfits and generalizes weakly, and in text retrieval scenarios the retrieved results are not what the user actually wants, resulting in poor user experience.
In a first aspect of the embodiments of the present disclosure, a method for training a text retrieval model is provided, where the method includes:
acquiring a training text sample set, where the training text sample set comprises a plurality of groups of training text samples, and each group of training text samples comprises a sample query text and a real article title corresponding to the sample query text; the real article title is an article title in a preset text database;
for each round of training of each group of training text samples, inputting the sample query text of the training text sample into a preset text retrieval model to obtain a text sentence feature corresponding to the sample query text; querying the preset text database according to the text sentence feature corresponding to the sample query text to obtain a predicted article title corresponding to the sample query text, the predicted article title being an article title in the preset text database; and determining a loss function value of the training text sample in the current round according to the predicted article title and the real article title;
determining the sample type of each group of training text samples according to the loss function value of each group of training text samples in the training text sample set in each training round of N training rounds;
and training the text retrieval model by using the training text sample set, the sample type of each training text sample in the training text sample set and a loss function weight value corresponding to a preset sample type to obtain the trained text retrieval model.
In a second aspect of the embodiments of the present disclosure, there is provided a training apparatus for a text retrieval model, the apparatus including:
the set acquisition unit is used for acquiring a training text sample set; the training text sample set comprises a plurality of groups of training text samples, and each group of training text samples comprises a sample query text and a real article title corresponding to the sample query text; the real article title is an article title in a preset text database;
the numerical value determining unit is used for, in each round of training of each group of training text samples, inputting the sample query text of the training text sample into a preset text retrieval model to obtain a text sentence feature corresponding to the sample query text; querying the preset text database according to the text sentence feature corresponding to the sample query text to obtain a predicted article title corresponding to the sample query text, the predicted article title being an article title in the preset text database; and determining a loss function value of the training text sample in the current round according to the predicted article title and the real article title;
the type determining unit is used for determining the sample type of each group of training text samples according to the loss function value of each group of training text samples in each training round of N training rounds in the training text sample set;
and the model training unit is used for training the text retrieval model by utilizing the training text sample set, the sample type of each training text sample in the training text sample set and a loss function weight value corresponding to a preset sample type to obtain the trained text retrieval model.
In a third aspect of the embodiments of the present disclosure, a computer device is provided, which includes a memory, a processor, and a computer program stored in the memory and executable on the processor, and the processor implements the steps of the above method when executing the computer program.
In a fourth aspect of the embodiments of the present disclosure, a computer-readable storage medium is provided, which stores a computer program, which when executed by a processor, implements the steps of the above-mentioned method.
Compared with the prior art, the embodiments of the present disclosure have the following beneficial effects. A training text sample set is first acquired; it comprises a plurality of groups of training text samples, each group comprising a sample query text and a real article title corresponding to that query text, the real article title being an article title in a preset text database. Then, for each round of training of each group of training text samples, the sample query text is input into a preset text retrieval model to obtain a text sentence feature corresponding to the query text; the preset text database is queried according to that feature to obtain a predicted article title (an article title in the preset text database); and a loss function value of the training text sample in the current round is determined from the predicted and real article titles. The sample type of each group of training text samples is then determined according to its loss function values across each of N training rounds. Finally, the text retrieval model is trained using the training text sample set, the sample type of each group of training text samples, and a preset loss function weight value corresponding to each sample type, yielding the trained model. In this embodiment, the loss function value of each group of training text samples in each round is first determined from the sample query text and its real article title, and the groups are then classified by these per-round loss function values to determine each group's sample type. The training text samples are thus globally and fully analyzed and mined, and different sample types (such as simple samples, common samples, and difficult samples) are distinguished by classifying the loss function values observed during the first training stage (i.e., the N training rounds). Because training text samples of different types influence the training effect of the text retrieval model to different degrees (for example, its generalization capability), different loss function weight values can be set per type: the weight of sample types that contribute little to training can be reduced, and the weight of sample types that strongly influence training can be increased, fully exploiting the potential of the influential samples while limiting the influence of samples that contribute nothing or even harm training. This embodiment therefore effectively improves the training weight (i.e., loss function weight value) distribution across training text samples of different types, makes the training process of the text retrieval model more thorough, improves training efficiency and effect, enhances model performance, and improves the text retrieval effect in actual service scenarios (for example, the accuracy of the text retrieval results).
Drawings
To more clearly illustrate the technical solutions in the embodiments of the present disclosure, the drawings needed for the embodiments or the prior art descriptions will be briefly described below, and it is obvious that the drawings in the following description are only some embodiments of the present disclosure, and other drawings can be obtained by those skilled in the art without inventive efforts.
FIG. 1 is a scenario diagram of an application scenario of an embodiment of the present disclosure;
FIG. 2 is a flowchart of a training method of a text retrieval model provided by an embodiment of the present disclosure;
FIG. 3 is a block diagram of a training apparatus for a text retrieval model provided by an embodiment of the present disclosure;
fig. 4 is a schematic diagram of a computer device provided by an embodiment of the present disclosure.
Detailed Description
In the following description, for purposes of explanation and not limitation, specific details are set forth, such as particular system structures, techniques, etc. in order to provide a thorough understanding of the disclosed embodiments. However, it will be apparent to one skilled in the art that the present disclosure may be practiced in other embodiments that depart from these specific details. In other instances, detailed descriptions of well-known systems, devices, circuits, and methods are omitted so as not to obscure the description of the present disclosure with unnecessary detail.
A method and an apparatus for training a text retrieval model according to an embodiment of the present disclosure will be described in detail below with reference to the accompanying drawings.
In the prior art, in conventional search services, the training of a commonly used neural-network-based text retrieval model obtains, through negative sampling, a large number of easily distinguished negative samples that contribute little to training. Because such negative samples are abundant but carry insufficient information in their characterization features, the model can overfit and generalize weakly, so that when the text retrieval model is used for text retrieval, the retrieved results are often not what the user actually wants, leading to a poor user experience.
To solve the above problems, the present disclosure provides a training method for a text retrieval model. In this method, the loss function value of each group of training text samples in each training round is first determined using the sample query text of each group and the real article title corresponding to that query text; each group is then classified according to its per-round loss function values to determine its sample type. The training text samples are thus fully and globally analyzed and mined, and different sample types (such as simple samples, common samples, and difficult samples) are screened out by classifying the loss function values observed in the first training stage (i.e., the N training rounds). Because training text samples of different types influence the training effect of the text retrieval model to different degrees (for example, its generalization capability), different loss function weight values can be set per type: in training the text retrieval model with the training text sample set, the per-sample types, and the preset per-type weight values, the weights of sample types that contribute little to training can be reduced and the weights of sample types that strongly influence training can be increased, fully exploiting the potential of the influential samples and limiting the impact of samples that contribute nothing or even harm training. This effectively improves the training weight (i.e., loss function weight value) distribution across training text samples of different types, makes the training process more thorough, improves training efficiency and effect, enhances model performance, and improves the text retrieval effect in actual service scenarios (for example, the accuracy of the text retrieval results).
For example, the embodiment of the present invention may be applied to an application scenario as shown in fig. 1. In this scenario, a terminal device 1 and a server 2 may be included.
The terminal device 1 may be hardware or software. When the terminal device 1 is hardware, it may be various electronic devices having a display screen and supporting communication with the server 2, including but not limited to a smart phone, a tablet computer, a laptop portable computer, a desktop computer, and the like; when the terminal device 1 is software, it may be installed in the electronic device as described above. The terminal device 1 may be implemented as a plurality of pieces of software or software modules, or may be implemented as a single piece of software or software module, which is not limited in this disclosure. Further, various applications, such as a data processing application, an instant messaging tool, social platform software, a search-type application, a shopping-type application, and the like, may be installed on the terminal device 1.
The server 2 may be a server providing various services, for example, a backend server receiving a request sent by a terminal device establishing a communication connection with the server, and the backend server may receive and analyze the request sent by the terminal device and generate a processing result. The server 2 may be one server, may also be a server cluster composed of a plurality of servers, or may also be a cloud computing service center, which is not limited in this disclosure.
The server 2 may be hardware or software. When the server 2 is hardware, it may be various electronic devices that provide various services to the terminal device 1. When the server 2 is software, it may be multiple software or software modules providing various services for the terminal device 1, or may be a single software or software module providing various services for the terminal device 1, which is not limited in the embodiment of the present disclosure.
The terminal device 1 and the server 2 may be connected in communication via a network. The network may be a wired network using coaxial cable, twisted pair, or optical fiber, or a wireless network that interconnects communication devices without wiring, for example Bluetooth, Near Field Communication (NFC), or infrared, which is not limited by this disclosure.
Specifically, a user can input a training text sample set through the terminal device 1, and the terminal device 1 sends the acquired set to the server 2, which stores the text retrieval model to be trained. For each round of training of each group of training text samples, the server 2 first inputs the sample query text into a preset text retrieval model to obtain a text sentence feature corresponding to the query text, queries the preset text database according to that feature to obtain a predicted article title (an article title in the preset text database), and determines the loss function value of the training text sample in the current round from the predicted and real article titles. The server 2 then determines the sample type of each group of training text samples according to its loss function values across each of the N training rounds, and finally trains the text retrieval model using the training text sample set, the per-sample types, and the preset per-type loss function weight values to obtain the trained model. In this way, the groups of training text samples are classified by their per-round loss function values after the samples have been fully and globally analyzed and mined, and different sample types (such as simple samples, common samples, and difficult samples) are distinguished during the first training stage (i.e., the N training rounds). Because different sample types influence the training effect of the model to different degrees, different loss function weight values can be set per type, reducing the weights of uninfluential sample types and increasing those of strongly influential ones, fully exploiting the potential of the influential samples while limiting the impact of samples that contribute nothing or even harm training. The training weight (i.e., loss function weight value) distribution across sample types is thereby effectively improved, the training process becomes more thorough, training efficiency and effect improve, model performance is enhanced, and the text retrieval effect in actual service scenarios is improved (for example, the accuracy of the text retrieval results).
It should be noted that specific types, numbers, and combinations of the terminal device 1, the server 2, and the network may be adjusted according to actual requirements of an application scenario, and the embodiment of the present disclosure does not limit this.
It should be noted that the above application scenarios are only illustrated for the convenience of understanding the present disclosure, and the embodiments of the present disclosure are not limited in any way in this respect. Rather, embodiments of the present disclosure may be applied to any scenario where applicable.
Fig. 2 is a flowchart of a training method of a text retrieval model according to an embodiment of the present disclosure. The training method of the text retrieval model of fig. 2 may be executed by the terminal device or the server of fig. 1. As shown in fig. 2, the training method of the text retrieval model includes:
s201: and acquiring a training text sample set.
The training text sample set comprises a plurality of training text samples, and each set of training text samples comprises a sample query text and a real article title corresponding to the sample query text.
In this embodiment, the training text sample set may include several groups of training text samples. Each training text sample comprises a sample query text and a real article title corresponding to the sample query text. The sample query text can be understood as the query sentence text to be queried; in one implementation, it may be a historical query text entered by a user. The real article title corresponding to the sample query text is the article title that genuinely matches that query.
It should be noted that the real article title is an article title in the preset text database. The real article title corresponding to a sample query text is the article title in the preset text database whose text content is most similar to that of the query. The preset text database stores, in advance, a plurality of article titles and the text sentence feature corresponding to each title, where a title's text sentence feature can be understood as a feature reflecting the meaning of the title's text content. For example, for the sample query text "world cup start time", the corresponding real article title might be "World Cup opening match schedule revealed".
S202: For each round of training of each group of training text samples, inputting the sample query text of the training text sample into a preset text retrieval model to obtain a text sentence feature corresponding to the sample query text; querying the preset text database according to the text sentence feature to obtain a predicted article title corresponding to the sample query text; and determining the loss function value of the training text sample in the current round according to the predicted article title and the real article title.
After the training samples are obtained, N rounds of training can be performed on a preset text retrieval model using each group of training text samples in the training text sample set, to obtain the loss function value of each group in each round. Since a group of training text samples is used across N rounds of training, N loss function values are obtained per group; if the set contains Z groups, Z × N loss function values are obtained. N is a positive integer greater than 1 and may be preset according to actual requirements; alternatively, N may be set to the number of training rounds at which the text retrieval model begins to slightly overfit (i.e., the loss function value on the training text sample set keeps decreasing while the loss function value on the validation set starts to increase). For example, if the model begins to slightly overfit at round 100, N may be set to 100. It should be noted that, in one implementation, the text retrieval model may be a recurrent neural network or a self-attention network, such as a Transformer, BERT, RoBERTa, RNN, LSTM, GRU, ELMo, CNN, fastText, or ALBERT model.
Specifically, for each round of training of each group of training text samples, the sample query text is input into the preset text retrieval model to obtain the text sentence feature corresponding to the query. That is, the text retrieval model extracts a latent representation vector from the sample query text; this text sentence feature can be understood as a feature vector reflecting the text content of the sample query text.
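As an illustrative sketch of this feature-extraction step (not the patent's mandated implementation), the snippet below encodes a query into a single sentence-feature vector with a BERT-style encoder from the Hugging Face transformers library; the checkpoint name "bert-base-chinese" and the mean-pooling readout are assumptions made here for concreteness.

```python
import torch
from transformers import AutoModel, AutoTokenizer

# Assumed checkpoint; the embodiment only requires some recurrent or
# self-attention encoder (BERT, RoBERTa, LSTM, ...).
tokenizer = AutoTokenizer.from_pretrained("bert-base-chinese")
encoder = AutoModel.from_pretrained("bert-base-chinese")

def sentence_feature(text: str) -> torch.Tensor:
    """Encode a sample query text into one text-sentence feature vector."""
    inputs = tokenizer(text, return_tensors="pt", truncation=True)
    with torch.no_grad():
        hidden = encoder(**inputs).last_hidden_state  # (1, seq_len, dim)
    # Mean-pool the token vectors into a single sentence vector (dim,).
    return hidden.mean(dim=1).squeeze(0)

feature = sentence_feature("world cup start time")
print(feature.shape)  # torch.Size([768]) for a BERT-base encoder
```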
Then, the preset text database is queried according to the text sentence feature corresponding to the sample query text to obtain the predicted article title; that is, the article title whose feature is closest to the query's text sentence feature is found in the database and called the predicted article title. It is understood that the predicted article title is an article title in the preset text database.
Specifically, in one implementation, for each article title in the preset text database, a matching value between the article title and the sample query text may be determined from the text sentence features of the title and of the query; for example, the inner product of the two feature vectors may be computed and used as the matching value. The higher the matching value, the more similar the text content of the title and the query; the lower the matching value, the less similar they are. The article title in the preset text database with the highest matching value to the sample query text is taken as the predicted article title corresponding to the query.
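A minimal sketch of this inner-product matching follows; it assumes the preset text database is held as a mapping from article titles to their precomputed text-sentence features (the titles and vectors below are illustrative only).

```python
import torch

def predict_article_title(query_feature: torch.Tensor,
                          text_database: dict) -> str:
    """Return the database article title whose feature has the highest
    inner product (matching value) with the query feature."""
    best_title, best_score = None, float("-inf")
    for title, title_feature in text_database.items():
        score = torch.dot(query_feature, title_feature).item()
        if score > best_score:
            best_title, best_score = title, score
    return best_title

# Toy database with random features, for illustration only.
database = {
    "World Cup opening match schedule revealed": torch.randn(768),
    "Stock markets close higher": torch.randn(768),
}
print(predict_article_title(torch.randn(768), database))
```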
Then, the loss function value of the training text sample in the current round is determined according to the predicted article title and the real article title. For example, in one implementation, the loss function of the text retrieval model may be a binary cross-entropy loss, a multi-class cross-entropy loss, a logistic regression loss, or a triplet loss; this loss function can be applied to the predicted and real article titles, and the resulting value taken as the loss function value of the group of training text samples in the current training round.
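The first training stage can then be summarized as recording a Z × N matrix of per-sample, per-round loss values; the sketch below assumes a train_step callable that stands in for one optimization step on one sample (forward pass, loss between predicted and real title, parameter update) and returns that sample's loss value.

```python
def record_losses(train_step, samples, n_rounds: int):
    """samples is a list of (sample_query_text, real_article_title) pairs;
    returns loss_matrix[i][r] = loss of sample i in training round r."""
    loss_matrix = [[0.0] * n_rounds for _ in samples]
    for r in range(n_rounds):
        for i, (query_text, real_title) in enumerate(samples):
            # One optimisation step on this sample; record its loss.
            loss_matrix[i][r] = train_step(query_text, real_title)
    return loss_matrix
```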
S203: Determining the sample type of each group of training text samples according to the loss function value of each group of training text samples in each of the N training rounds.
Since training text samples of different sample types influence the training effect of the text retrieval model to different degrees (for example, its generalization capability), the training text samples in the set need to be classified so that the text retrieval model can be trained differently for different sample types. This embodiment uses the loss function value as the classification indicator, so the classification of training text samples can be completed adaptively and efficiently, avoiding complex data analysis and manual business knowledge; the approach is efficient, widely applicable, and adapts well to other scenarios.
Specifically, in this embodiment, for each training text sample in the training text sample set, the sample type of the training text sample may be determined according to the loss function value of the training text sample in each training round. That is to say, in this embodiment, after performing global and sufficient analysis and mining on the set of training text samples, the loss function values of the set of training text samples in each training round can be obtained, and the sample types of the set of training text samples are determined by classifying according to the loss function values of the set of training text samples in each training round.
It should be noted that the sample types of the training text samples can be classified into a first sample type (e.g., a normal sample), a second sample type (e.g., a simple sample), and a third sample type (e.g., a difficult sample).
S204: Training the text retrieval model using the training text sample set, the sample type of each training text sample in the set, and the preset loss function weight value corresponding to each sample type, to obtain the trained text retrieval model.
In this embodiment, different loss function weight values can be set in advance for training text samples of different sample types, so that the potential of the training text samples is fully exploited, the text retrieval model is trained more thoroughly, model performance improves, and the text retrieval effect in actual service scenarios improves. Moreover, large numbers of useless simple samples can be identified effectively, so the training weights of simple samples that do not affect the model training effect can be reduced. That is, during training, the loss function weight value of sample types that do not influence the training effect can be decreased, and that of sample types that strongly influence it can be increased, fully exploiting the potential of the influential samples, limiting the impact of samples that contribute nothing or little, effectively improving the training weight distribution between simple and difficult samples, and further improving the efficiency and effect of model training.
In this embodiment, because the third sample type (for example, difficult samples) strongly influences the model training effect, the loss function weight value of training text samples of the third type may be increased; that is, their weight value is greater than that of first-type samples, for example 2 to 5 times the first-type weight. Because the second sample type (for example, simple samples) has little influence on training, offering no gain for training the text retrieval model while promoting overfitting and reducing online generalization capability, the loss function weight value of second-type samples can be reduced; that is, their weight value is smaller than that of first-type samples, for example the first-type weight may be 2 to 5 times the second-type weight.
In this implementation, after the sample type of each group of training text samples is determined, the text retrieval model can be trained using the training text sample set, the sample type of each group, and the loss function weight value preset for each sample type, to obtain the trained text retrieval model. In other words, the text retrieval model is trained on the training text samples in the set, and when model parameters are adjusted according to the loss function value, that value is computed from each sample's type, the preset per-type loss function weight value, and the preset loss function. Training continues until the loss function value of the text retrieval model satisfies a preset condition or the number of training iterations reaches a preset count, at which point the trained text retrieval model is obtained.
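A minimal sketch of this second training stage follows. The concrete weight values are assumptions consistent with the 2-to-5-times ranges stated above, and train_step again stands in for one weighted optimization step.

```python
# Assumed per-type loss function weight values, for illustration only.
LOSS_WEIGHTS = {"common": 1.0, "simple": 0.25, "difficult": 3.0}

def weighted_training(train_step, samples, sample_types,
                      max_epochs: int, target_loss: float):
    """Train until the mean weighted loss meets a preset condition or the
    number of epochs reaches a preset count."""
    for epoch in range(max_epochs):
        total = 0.0
        for (query_text, real_title), s_type in zip(samples, sample_types):
            # train_step scales this sample's loss by its type's weight
            # before back-propagating and returns the weighted loss value.
            total += train_step(query_text, real_title, LOSS_WEIGHTS[s_type])
        if total / len(samples) < target_loss:
            break
```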
Next, an example illustrates how the loss function value in S204 is calculated from the sample type of each training text sample, the preset per-type loss function weight value, and the preset loss function. Suppose the training text sample set includes 4 training text samples: a first training text sample (including sample query text s1), a second (including sample query text s2), a third (including sample query text s3), and a fourth (including sample query text s4). Suppose the first and second training text samples are difficult samples (i.e., the third sample type) and the third and fourth are common samples (i.e., the first sample type); the real article titles corresponding to s1, s2, s3, and s4 are l1, l2, l3, and l4, respectively; and the loss function weight of the third sample type is w times that of the first. With the neural network (i.e., the text retrieval model) denoted f() and the loss function Loss(), when the gradient is computed, the reweighted loss function value of the text retrieval model is:

Loss_average = w × (Loss(l1, f(s1)) + Loss(l2, f(s2))) + Loss(l3, f(s3)) + Loss(l4, f(s4))

Model training may then proceed according to the normal procedure.
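As a sketch, the reweighted loss above can be computed as follows, with f and loss_fn as stand-ins for the text retrieval model and the chosen loss function; note that the weight w multiplies only the difficult samples' terms.

```python
def reweighted_loss(f, loss_fn, w, samples, labels, types):
    """samples = [s1, ..., s4], labels = [l1, ..., l4]; types marks each
    sample as "difficult" (third type) or "common" (first type)."""
    total = 0.0
    for s, l, t in zip(samples, labels, types):
        term = loss_fn(l, f(s))
        total += w * term if t == "difficult" else term
    return total

# e.g. reweighted_loss(f, loss_fn, w=3.0,
#                      samples=[s1, s2, s3, s4], labels=[l1, l2, l3, l4],
#                      types=["difficult", "difficult", "common", "common"])
```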
Thus, the embodiments of the present disclosure first acquire a training text sample set comprising a plurality of groups of training text samples, each group comprising a sample query text and its corresponding real article title (an article title in the preset text database). For each round of training of each group, the sample query text is input into a preset text retrieval model to obtain its text sentence feature; the preset text database is queried with that feature to obtain the predicted article title (also an article title in the database); and the loss function value of the training text sample in the current round is determined from the predicted and real titles. The sample type of each group is then determined from its loss function values across each of the N training rounds, and finally the text retrieval model is trained using the sample set, the per-group sample types, and the preset per-type loss function weight values to obtain the trained model. In this embodiment, the per-round loss function value of each group is first determined from its sample query text and real article title, and the groups are then classified by these values to determine their sample types; the training text samples are thus fully and globally analyzed and mined, and different sample types (such as simple samples, common samples, and difficult samples) are distinguished by classifying the loss function values of the first training stage (i.e., the N training rounds). Because different sample types influence the model's training effect (for example, its generalization capability) to different degrees, different loss function weight values can be set per type: the weights of sample types that do not influence the training effect can be reduced and those of strongly influential types increased, fully exploiting the potential of the influential samples and limiting the impact of samples that contribute nothing or even harm training. This embodiment therefore effectively improves the training weight (i.e., loss function weight value) distribution across sample types, makes the training process more thorough, improves training efficiency and effect, enhances model performance, and improves the text retrieval effect in actual service scenarios (for example, the accuracy of the text retrieval results).
In some embodiments, the step of S203 "determining a sample type of each set of training text samples according to the loss function value of each set of training text samples in each of N training rounds respectively" may include the following steps:
s203a: and aiming at each group of training text samples in the training text sample set, determining the whole training loss reduction degree of the training text samples according to the loss function values of the training text samples in each round of training.
In one implementation of this embodiment, the N rounds of training may be performed on the text retrieval model using the training text sample set; that is, in each round, all the training text samples in the set are used to train the text retrieval model.
The average loss function value over the first M rounds of training and the average over the last X rounds can be determined from the loss function values of the training text sample in each round, where M and X are positive integers both smaller than N. That is, the average loss function value of the training text sample over the first M rounds is computed from its loss in each of those rounds, and likewise for the last X rounds. For example, the average loss over the first 10% of rounds may be computed from all of the sample's loss values in those rounds, and the average over the last 10% of rounds from all of its loss values in those rounds.
The whole-process loss reduction degree of the training text sample can then be determined from these two averages. In one implementation, it can be calculated with the formula c = (a - b) / a, where c is the whole-process loss reduction degree of the training text sample, a is its average loss function value over the first M rounds of training, and b is its average loss function value over the last X rounds.
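The whole-process loss reduction degree is then a one-line computation per sample; the sketch below applies c = (a − b) / a to one sample's per-round loss history.

```python
def loss_reduction_degree(round_losses, m: int, x: int) -> float:
    """round_losses holds one sample's loss in each of the N rounds."""
    a = sum(round_losses[:m]) / m   # mean loss over the first M rounds
    b = sum(round_losses[-x:]) / x  # mean loss over the last X rounds
    return (a - b) / a

# A sample whose loss falls from ~0.95 to ~0.25 has degree ~0.74:
print(loss_reduction_degree([1.0, 0.9, 0.5, 0.3, 0.2], m=2, x=2))
```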
S203b: Determining the sample type of each group of training text samples according to the whole-process loss reduction degree of each group of training text samples in the training text sample set.
After the whole-process loss reduction degree of each group of training text samples is determined, the degrees can be ranked from high to low to obtain a ranking result. The sample type of the training text samples in the top n positions of the ranking (for example, the top 30%) may be determined as a first sample type (for example, common samples); the sample type of those in the bottom n positions (for example, the bottom 30%) as a second sample type (for example, simple samples); and the sample type of those in neither the top n nor the bottom n positions (for example, between the top 30% and the bottom 30%) as a third sample type (for example, difficult samples).
For example, assume the training text sample set includes 100 groups of training text samples and their whole-process loss reduction degrees are ranked from high to low. The sample types of the training text samples ranked 1st through 30th may be determined as common samples, those ranked 31st through 70th as difficult samples, and those ranked 71st through 100th as simple samples.
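A sketch of this ranking step follows, assuming the 30% / 40% / 30% split used in the example above (the embodiment leaves n configurable).

```python
def classify_samples(reduction_degrees, n_fraction: float = 0.3):
    """Rank samples by whole-process loss reduction degree (high to low);
    label the top n as common, the bottom n as simple, the rest difficult."""
    order = sorted(range(len(reduction_degrees)),
                   key=lambda i: reduction_degrees[i], reverse=True)
    n = max(1, int(len(order) * n_fraction))  # guard tiny sets
    types = ["difficult"] * len(order)
    for i in order[:n]:
        types[i] = "common"
    for i in order[-n:]:
        types[i] = "simple"
    return types

print(classify_samples([0.9, 0.1, 0.5, 0.7, 0.2]))
# -> ['common', 'simple', 'difficult', 'difficult', 'difficult']
```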
All the above optional technical solutions may be combined arbitrarily to form the optional embodiments of the present disclosure, and are not described herein again.
The following are embodiments of the disclosed apparatus that may be used to perform embodiments of the disclosed methods. For details not disclosed in the embodiments of the apparatus of the present disclosure, refer to the embodiments of the method of the present disclosure.
Fig. 3 is a schematic diagram of a training apparatus for a text retrieval model according to an embodiment of the present disclosure. As shown in fig. 3, the training device of the text retrieval model includes:
a set obtaining unit 301, configured to obtain a training text sample set; the training text sample set comprises a plurality of training text samples, and each set of training text samples comprises a sample query text and a real article title corresponding to the sample query text; the real article title is an article title in the preset text database;
a numerical value determining unit 302, configured to, for each round of training of each group of training text samples, input the sample query text of the training text sample into a preset text retrieval model to obtain a text sentence feature corresponding to the sample query text; query the preset text database according to the text sentence feature to obtain a predicted article title corresponding to the sample query text, the predicted article title being an article title in the preset text database; and determine a loss function value of the training text sample in the current round according to the predicted article title and the real article title;
a type determining unit 303, configured to determine a sample type of each group of training text samples according to a loss function value of each group of training text samples in each of N training rounds in the training text sample set;
and the model training unit 304 is configured to train the text retrieval model by using the training text sample set, the sample type of each training text sample in the training text sample set, and a loss function weight value corresponding to a preset sample type, so as to obtain a trained text retrieval model.
Optionally, the preset text database stores a plurality of article titles and text sentence characteristics respectively corresponding to each article title; the value determining unit 302 is configured to:
for each article title in the preset text database, determining a matching value between the article title and the sample query text according to the text sentence characteristic corresponding to the article title and the text sentence characteristic corresponding to the sample query text;
and taking the article title in the preset text database with the highest matching value to the sample query text as the predicted article title corresponding to the sample query text.
Optionally, the text retrieval model is a recurrent neural network or a self-attention network.
Optionally, the type determining unit 303 is configured to:
determining, for each group of training text samples in the training text sample set, the whole-process loss reduction degree of the training text sample according to its loss function values in each round of training;
and determining the sample type of each group of training text samples according to the whole-process loss reduction degree of each group of training text samples in the training text sample set.
Optionally, the type determining unit 303 is configured to:
determining the average loss function value over the first M rounds of training and over the last X rounds of training according to the loss function values of the training text sample in each round, where M and X are both smaller than N;
and determining the whole-process loss reduction degree of the training text sample according to the average loss function value over the first M rounds and over the last X rounds.
Optionally, the type determining unit 303 is configured to:
ranking the whole-process loss reduction degrees of the groups of training text samples in the training text sample set from high to low to obtain a ranking result;
determining the sample type of the training text samples in the top n positions of the ranking result as a first sample type;
determining the sample type of the training text samples in the bottom n positions of the ranking result as a second sample type;
and determining the sample type of the training text samples in neither the top n nor the bottom n positions of the ranking result as a third sample type.
Optionally, the loss function weight value corresponding to the training text sample with the third sample type is greater than the loss function weight value corresponding to the training text sample with the first sample type; the loss function weight value corresponding to the training text sample with the second sample type is smaller than the loss function weight value corresponding to the training text sample with the first sample type.
Optionally, the loss function of the text retrieval model is a binary cross-entropy loss function, a multi-class cross-entropy loss function, a logistic regression loss function, or a triplet loss function.
Compared with the prior art, the embodiments of the present disclosure have the following beneficial effects. The embodiments provide a training apparatus for a text retrieval model, the apparatus including: a set acquisition unit for acquiring a training text sample set, the set comprising a plurality of groups of training text samples, each group comprising a sample query text and a real article title corresponding to the sample query text, the real article title being an article title in a preset text database; a numerical value determining unit for, in each round of training of each group of training text samples, inputting the sample query text into a preset text retrieval model to obtain a text sentence feature corresponding to the query, querying the preset text database according to that feature to obtain a predicted article title (an article title in the preset text database), and determining the loss function value of the training text sample in the current round from the predicted and real article titles; a type determining unit for determining the sample type of each group of training text samples according to its loss function values across each of N training rounds; and a model training unit for training the text retrieval model using the training text sample set, the per-sample types, and the preset per-type loss function weight values to obtain the trained text retrieval model.
It can be seen that, in this embodiment, the sample query text in each set of training text samples and the real article title corresponding to that sample query text are first used to determine the loss function value of each set of training text samples in each round of training; each set of training text samples is then classified according to those loss function values, thereby determining its sample type. In this way the training text samples are analyzed and mined globally: by classifying the loss function values observed during the first training stage (i.e., the N rounds of training), different sample types (for example, simple samples, common samples, and difficult samples) are distinguished. Because training text samples of different sample types influence the training effect of the text retrieval model (for example, its generalization capability) to different degrees, different loss function weight values can be set for different sample types. When the text retrieval model is subsequently trained using the training text sample set, the sample type of each training text sample, and the loss function weight values corresponding to the preset sample types, the loss function weight values of training text samples whose sample types contribute little to, or even harm, the training effect can be reduced, while the loss function weight values of training text samples whose sample types strongly benefit the training effect can be increased, so that the potential of the latter is fully exploited and the adverse influence of the former is suppressed. This embodiment therefore effectively improves the distribution of training weights (i.e., loss function weight values) across training text samples of different sample types, making the training process of the text retrieval model more thorough, improving the training efficiency and effect of the text retrieval model, improving the performance of the text retrieval model, and thus improving the text retrieval effect of the text retrieval model in actual service scenarios (for example, improving the accuracy of its text retrieval results).
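Putting the pieces together, the second training stage could then weight each sample's loss by its type, roughly as follows. The optimizer setup and the compute_loss callable are placeholders for whichever loss function from the list above is chosen; all names here are illustrative assumptions.

```python
def train_second_stage(model, optimizer, samples, sample_types, compute_loss):
    # samples: {sample_id: (sample_query_text, real_article_title)}
    # sample_types and TYPE_WEIGHTS come from the first-stage analysis above.
    for sid, (query, true_title) in samples.items():
        optimizer.zero_grad()
        loss = compute_loss(model, query, true_title)  # differentiable scalar
        (TYPE_WEIGHTS[sample_types[sid]] * loss).backward()
        optimizer.step()
```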
It should be understood that the sequence numbers of the steps in the foregoing embodiments do not imply an execution order; the execution order of each process should be determined by its function and internal logic, and does not constitute any limitation on the implementation of the embodiments of the present disclosure.
Fig. 4 is a schematic diagram of a computer device 4 provided by an embodiment of the present disclosure. As shown in Fig. 4, the computer device 4 of this embodiment includes a processor 401, a memory 402, and a computer program 403 stored in the memory 402 and executable on the processor 401. The steps in the method embodiments described above are implemented when the processor 401 executes the computer program 403. Alternatively, the processor 401 implements the functions of the modules/units in the apparatus embodiments described above when executing the computer program 403.
Illustratively, the computer program 403 may be partitioned into one or more modules/units, which are stored in the memory 402 and executed by the processor 401 to carry out the present disclosure. The one or more modules/units may be a series of computer program instruction segments capable of performing particular functions, the instruction segments describing the execution of the computer program 403 in the computer device 4.
The computer device 4 may be a desktop computer, a notebook, a palmtop computer, a cloud server, or another computing device. The computer device 4 may include, but is not limited to, the processor 401 and the memory 402. Those skilled in the art will appreciate that Fig. 4 is merely an example of the computer device 4 and does not limit it; the computer device 4 may include more or fewer components than shown, some components may be combined, or different components may be used. For example, the computer device may also include input/output devices, network access devices, buses, and the like.
The processor 401 may be a central processing unit (CPU), another general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, or the like. A general-purpose processor may be a microprocessor, or the processor may be any conventional processor.
The memory 402 may be an internal storage module of the computer device 4, for example, a hard disk or memory of the computer device 4. The memory 402 may also be an external storage device of the computer device 4, such as a plug-in hard disk, a smart media card (SMC), a secure digital (SD) card, or a flash card provided on the computer device 4. Further, the memory 402 may include both internal and external storage modules of the computer device 4. The memory 402 is used for storing the computer program and other programs and data required by the computer device, and may also be used to temporarily store data that has been output or is to be output.
It will be apparent to those skilled in the art that, for convenience and brevity of description, only the division of the functional units and modules described above is illustrated as an example; in practical applications, the functions described above may be allocated to different functional units and modules as required, that is, the internal structure of the apparatus may be divided into different functional units or modules to perform all or part of the functions described above. The functional units and modules in the embodiments may be integrated into one processing module, each unit may exist alone physically, or two or more units may be integrated into one module; the integrated module may be implemented in the form of hardware or in the form of a software functional module. In addition, the specific names of the functional units and modules are only used for distinguishing them from one another and do not limit the protection scope of the present disclosure. For the specific working processes of the units and modules in the system described above, reference may be made to the corresponding processes in the foregoing method embodiments, which are not repeated here.
In the above embodiments, the description of each embodiment has its own emphasis; for parts not described or detailed in one embodiment, reference may be made to the related descriptions of other embodiments.
Those of ordinary skill in the art will appreciate that the various illustrative modules and algorithm steps described in connection with the embodiments disclosed herein may be implemented as electronic hardware, or combinations of computer software and electronic hardware. Whether such functionality is implemented as hardware or software depends upon the particular application and design constraints imposed on the implementation. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present disclosure.
In the embodiments provided in the present disclosure, it should be understood that the disclosed apparatus/computer device and method may be implemented in other ways. For example, the apparatus/computer device embodiments described above are merely illustrative: the division into modules or units is only a logical functional division, and there may be other divisions in actual implementations; multiple units or components may be combined or integrated into another system, or some features may be omitted or not implemented. In addition, the mutual coupling, direct coupling, or communication connections shown or discussed may be indirect coupling or communication connections through some interfaces, devices, or units, and may be electrical, mechanical, or in other forms.
Units described as separate parts may or may not be physically separate, and parts shown as units may or may not be physical units; they may be located in one place or distributed over a plurality of network nodes. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution of this embodiment.
In addition, the functional units in the embodiments of the present disclosure may be integrated into one processing module, each unit may exist alone physically, or two or more units may be integrated into one module. The integrated module may be implemented in the form of hardware or in the form of a software functional module.
If the integrated module is implemented in the form of a software functional module and sold or used as an independent product, it may be stored in a computer-readable storage medium. Based on this understanding, the present disclosure may implement all or part of the flow of the methods in the above embodiments by instructing related hardware through a computer program, which may be stored in a computer-readable storage medium; when the computer program is executed by a processor, the steps of the above method embodiments can be implemented. The computer program may comprise computer program code, which may be in the form of source code, object code, an executable file, some intermediate form, or the like. The computer-readable medium may include any entity or device capable of carrying the computer program code, a recording medium, a USB flash drive, a removable hard disk, a magnetic disk, an optical disk, a computer memory, a read-only memory (ROM), a random access memory (RAM), an electrical carrier signal, a telecommunications signal, a software distribution medium, and the like. It should be noted that the content contained in the computer-readable medium may be increased or decreased as required by legislation and patent practice within a jurisdiction; for example, in some jurisdictions, computer-readable media may not include electrical carrier signals or telecommunications signals.
The above embodiments are only intended to illustrate the technical solutions of the present disclosure, not to limit them. Although the present disclosure has been described in detail with reference to the foregoing embodiments, those of ordinary skill in the art should understand that the technical solutions described in the foregoing embodiments may still be modified, or some of their technical features may be equivalently replaced; such modifications and substitutions do not cause the corresponding technical solutions to depart in essence from the spirit and scope of the embodiments of the present disclosure, and are intended to be included within the protection scope of the present disclosure.

Claims (11)

1. A method for training a text retrieval model, the method comprising:
acquiring a training text sample set, wherein the training text sample set comprises a plurality of sets of training text samples, each set of training text samples comprises a sample query text and a real article title corresponding to the sample query text, and the real article title is an article title in a preset text database;
for each round of training of each set of training text samples, inputting the sample query text of the training text sample into a preset text retrieval model to obtain a text sentence feature corresponding to the sample query text; querying the preset text database according to the text sentence feature corresponding to the sample query text to obtain a predicted article title corresponding to the sample query text, wherein the predicted article title is an article title in the preset text database; and determining a loss function value of the training text sample in the current round according to the predicted article title and the real article title;
determining the sample type of each set of training text samples according to the loss function values of each set of training text samples in the training text sample set in each of N rounds of training;
and training the text retrieval model by using the training text sample set, the sample type of each training text sample in the training text sample set, and a loss function weight value corresponding to each preset sample type, to obtain a trained text retrieval model.
2. The method of claim 1, wherein the preset text database stores a plurality of article titles and a text sentence feature corresponding to each article title, and wherein querying the preset text database according to the text sentence feature corresponding to the sample query text to obtain the predicted article title corresponding to the sample query text comprises:
for each article title in the preset text database, determining a matching value between the article title and the sample query text according to the text sentence feature corresponding to the article title and the text sentence feature corresponding to the sample query text;
and taking the article title in the preset text database having the highest matching value with the sample query text as the predicted article title corresponding to the sample query text.
3. The method of claim 1, wherein the text retrieval model is a recurrent neural network or a self-attention network.
4. The method of claim 1, wherein determining the sample type of each set of training text samples according to the loss function values of each set of training text samples in each of N rounds of training comprises:
for each set of training text samples in the training text sample set, determining an overall training loss reduction degree of the training text sample according to the loss function values of the training text sample in each round of training;
and determining the sample type of each set of training text samples according to the overall training loss reduction degree of each set of training text samples in the training text sample set.
5. The method of claim 4, wherein determining the overall training loss reduction degree of the training text sample according to the loss function values of the training text sample in each round of training comprises:
determining the average of the loss function values over the first M rounds of training and the loss function value of the last X rounds of training according to the loss function values of the training text sample in each round of training, wherein M and X are both smaller than N;
and determining the overall training loss reduction degree of the training text sample according to the average loss function value of the first M rounds of training and the loss function value of the last X rounds of training.
6. The method of claim 4, wherein determining the sample type of each set of training text samples according to the overall training loss reduction degree of each set of training text samples in the training text sample set comprises:
sorting the overall training loss reduction degrees of all the sets of training text samples in the training text sample set from high to low to obtain a sorting result;
determining the sample type of the training text samples ranked in the top n positions of the sorting result as a first sample type;
determining the sample type of the training text samples ranked in the bottom n positions of the sorting result as a second sample type;
and determining the sample type of the training text samples ranked in neither the top n nor the bottom n positions of the sorting result as a third sample type.
7. The method of claim 6, wherein the loss function weight value corresponding to a training text sample of the third sample type is greater than the loss function weight value corresponding to a training text sample of the first sample type, and the loss function weight value corresponding to a training text sample of the second sample type is smaller than the loss function weight value corresponding to a training text sample of the first sample type.
8. The method of claim 1, wherein the loss function of the text retrieval model is a binary cross-entropy loss function, a multi-class cross-entropy loss function, a logistic regression loss function, or a triplet loss function.
9. An apparatus for training a text retrieval model, the apparatus comprising:
a set acquisition unit configured to acquire a training text sample set, wherein the training text sample set comprises a plurality of sets of training text samples, each set of training text samples comprises a sample query text and a real article title corresponding to the sample query text, and the real article title is an article title in a preset text database;
a numerical value determining unit configured, for each round of training of each set of training text samples, to input the sample query text of the training text sample into a preset text retrieval model to obtain a text sentence feature corresponding to the sample query text, to query the preset text database according to the text sentence feature corresponding to the sample query text to obtain a predicted article title corresponding to the sample query text, wherein the predicted article title is an article title in the preset text database, and to determine a loss function value of the training text sample in the current round according to the predicted article title and the real article title;
a type determining unit configured to determine the sample type of each set of training text samples according to the loss function values of each set of training text samples in the training text sample set in each of N rounds of training;
and a model training unit configured to train the text retrieval model by using the training text sample set, the sample type of each training text sample in the training text sample set, and a loss function weight value corresponding to each preset sample type, to obtain a trained text retrieval model.
10. A computer device comprising a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein the processor implements the steps of the method according to any one of claims 1 to 8 when executing the computer program.
11. A computer-readable storage medium storing a computer program which, when executed by a processor, implements the steps of the method according to any one of claims 1 to 8.
CN202211699919.5A 2022-12-28 2022-12-28 Training method and device for text retrieval model Pending CN115841144A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211699919.5A CN115841144A (en) 2022-12-28 2022-12-28 Training method and device for text retrieval model

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202211699919.5A CN115841144A (en) 2022-12-28 2022-12-28 Training method and device for text retrieval model

Publications (1)

Publication Number Publication Date
CN115841144A true CN115841144A (en) 2023-03-24

Family

ID=85579364

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202211699919.5A Pending CN115841144A (en) 2022-12-28 2022-12-28 Training method and device for text retrieval model

Country Status (1)

Country Link
CN (1) CN115841144A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116595385A (en) * 2023-07-18 2023-08-15 Shenzhen Xumi Yuntu Space Technology Co., Ltd. Composition generation model training method and device
CN116595385B (en) * 2023-07-18 2023-10-03 Shenzhen Xumi Yuntu Space Technology Co., Ltd. Composition generation model training method and device


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination