CN114861671A: Model training method and device, computer equipment and storage medium

Info

Publication number: CN114861671A
Application number: CN202210375330.3A
Authority: CN (China)
Prior art keywords: sample, similarity, loss, feature data
Inventors: 张旭, 文博, 刘云峰
Applicant/Assignee: Shenzhen Zhuiyi Technology Co Ltd
Original language: Chinese (zh)
Legal status: Pending (the legal status is an assumption and is not a legal conclusion)
Classifications

    • G06F 18/214: Pattern recognition; analysing; design or setup of recognition systems or techniques; generating training patterns; bootstrap methods, e.g. bagging or boosting
    • G06F 40/30: Handling natural language data; semantic analysis

Abstract

The present application relates to a model training method, apparatus, computer device, storage medium and computer program product. The method comprises the following steps: inputting text sample data into a first model and determining a first loss from the resulting first sample feature data; inputting the text sample data into a second model and determining a second loss from the resulting second sample feature data together with the first sample feature data; inputting the text sample data into a third model to obtain third sample feature data and, based on a preset condition, determining a third loss either from the similarity between the first sample feature data and the third sample feature data, or from both the similarity among the third sample feature data and the similarity between the first sample feature data and the third sample feature data; and determining a loss function from the first, second and third losses, the loss function being used to train the first model. The scheme makes the first model converge faster and recognize text more accurately.

Description

Model training method and device, computer equipment and storage medium
Technical Field
The present application relates to the field of natural language processing technologies, and in particular, to a model training method, apparatus, computer device, storage medium, and computer program product.
Background
With the development of deep learning, deep neural networks are increasingly used in natural language processing. To improve performance, most models are complex, have large parameter counts and consume substantial memory, which makes them difficult to deploy directly on devices with limited resources, such as a graphics processing unit (GPU) or a smartphone.
Knowledge distillation is a transfer learning method: the performance of one model is transferred to another. In a teacher-student network, the teacher network is usually the more complex network, with better performance and generalization ability, and it can serve as the objective to guide a simpler student network to learn, so that the lighter student model, which requires less parameter computation, can approach the teacher network's performance. This learning paradigm can turn a large network into a small one while retaining performance close to that of the large network; it can also transfer the knowledge learned by multiple networks into a single network, so that the performance of the single network approaches the combined result of the multiple networks.
Complex models can be compressed by knowledge distillation to meet operating requirements, but the text recognition accuracy of student models obtained by knowledge distillation is often low.
Disclosure of Invention
In view of the above, it is necessary to provide a model training method, apparatus, computer device, computer-readable storage medium and computer program product capable of improving a student model's text recognition accuracy.
In a first aspect, the present application provides a model training method. The method comprises the following steps:
acquiring text sample data;
inputting the text sample data into a first model to obtain first sample characteristic data, and determining a first loss according to the first sample characteristic data;
inputting the text sample data into a second model to obtain second sample characteristic data, and determining a second loss according to the first sample characteristic data and the second sample characteristic data;
inputting the text sample data into a third model to obtain third sample feature data, and, based on a preset condition, determining a third loss either from the similarity between the first sample feature data and the third sample feature data, or from both the similarity among the third sample feature data and the similarity between the first sample feature data and the third sample feature data;
determining a loss function according to the first loss, the second loss, and the third loss, the loss function being used to train the first model.
In one embodiment, the text sample data comprises a first sample, a second sample and a third sample, the second sample being semantically similar to the first sample and the third sample being semantically opposite to the first sample; inputting the text sample data into a first model to obtain first sample feature data includes:
inputting the text sample data into the first model, and obtaining first target characteristic data according to the first sample;
obtaining second target characteristic data according to the second sample;
obtaining third target characteristic data according to the third sample;
inputting the text sample data into a second model to obtain second sample feature data includes:
inputting the text sample data into the second model, and obtaining fourth target characteristic data according to the first sample;
obtaining fifth target characteristic data according to the second sample;
obtaining sixth target characteristic data according to the third sample;
inputting the text sample data into a third model to obtain third sample feature data includes:
inputting the text sample data into the third model, and obtaining seventh target characteristic data according to the first sample;
obtaining eighth target characteristic data according to the second sample;
and obtaining ninth target characteristic data according to the third sample.
In one embodiment, the determining a first loss from the first sample feature data includes:
determining a first similarity according to the first target characteristic data and the second target characteristic data;
determining a second similarity according to the first target characteristic data and the third target characteristic data;
and determining the first loss according to the first similarity and the second similarity.
In one embodiment, the method further comprises:
determining a first entropy value according to the first similarity and the second similarity;
determining a first weight corresponding to the second loss and a second weight corresponding to the third loss according to the first entropy;
said determining a loss function from said first loss, said second loss, and said third loss comprises:
determining the loss function based on the first loss, the second loss, the first weight, the third loss, and the second weight.
In one embodiment, the determining, based on a preset condition, whether the third loss is obtained from the similarity between the first sample feature data and the third sample feature data, or from both the similarity among the third sample feature data and the similarity between the first sample feature data and the third sample feature data, includes:
determining an average entropy value according to the first similarity and the second similarity;
if the average entropy value is greater than a preset threshold, determining the third loss from the similarity between the first sample feature data and the third sample feature data;
and if the average entropy value is less than or equal to the preset threshold, determining the third loss from both the similarity among the third sample feature data and the similarity between the first sample feature data and the third sample feature data.
In one embodiment, determining a second loss from the first sample feature data and the second sample feature data comprises:
determining a third similarity according to the first target characteristic data and the fourth target characteristic data;
determining a fourth similarity according to the first target characteristic data, the fifth target characteristic data and the sixth target characteristic data;
and determining the second loss according to the third similarity and the fourth similarity.
In one embodiment, the obtaining a third loss according to the similarity between the first sample feature data and the third sample feature data includes:
determining a fifth similarity according to the first target characteristic data and the seventh target characteristic data;
determining a sixth similarity according to the first target characteristic data, the eighth target characteristic data and the ninth target characteristic data;
and determining the third loss according to the fifth similarity and the sixth similarity.
In one embodiment, the obtaining the third loss from both the similarity among the third sample feature data and the similarity between the first sample feature data and the third sample feature data includes:
determining a seventh similarity according to the third sample characteristic data obtained in the current round and the historical third sample characteristic data;
and determining the third loss according to the seventh similarity, the fifth similarity and the sixth similarity.
In a second aspect, the present application further provides a model training apparatus. The device comprises:
the sample data acquisition module is used for acquiring text sample data;
the first loss determining module is used for inputting the text sample data into a first model to obtain first sample characteristic data and determining first loss according to the first sample characteristic data;
a second loss determining module, configured to input the text sample data into a second model to obtain second sample feature data, and determine a second loss according to the first sample feature data and the second sample feature data;
a third loss determining module, configured to input the text sample data into a third model to obtain third sample feature data, and determine, based on a preset condition, the third loss either from the similarity between the first sample feature data and the third sample feature data, or from both the similarity among the third sample feature data and the similarity between the first sample feature data and the third sample feature data;
a loss function determination module configured to determine a loss function according to the first loss, the second loss, and the third loss, where the loss function is used to train the first model.
In one embodiment, the text sample data comprises a first sample, a second sample and a third sample, the second sample being semantically similar to the first sample and the third sample being semantically opposite to the first sample; the first loss determination module is further configured to:
inputting the text sample data into the first model, and obtaining first target characteristic data according to the first sample;
obtaining second target characteristic data according to the second sample;
obtaining third target characteristic data according to the third sample;
the second loss determination module is further configured to:
inputting the text sample data into the second model, and obtaining fourth target characteristic data according to the first sample;
obtaining fifth target characteristic data according to the second sample;
obtaining sixth target characteristic data according to the third sample;
the third loss determination module is further configured to:
inputting the text sample data into the third model, and obtaining seventh target characteristic data according to the first sample;
obtaining eighth target characteristic data according to the second sample;
and obtaining ninth target characteristic data according to the third sample.
In one embodiment, the first loss determining module is further configured to:
determining a first similarity according to the first target characteristic data and the second target characteristic data;
determining a second similarity according to the first target characteristic data and the third target characteristic data;
and determining the first loss according to the first similarity and the second similarity.
In one embodiment, the model training apparatus further includes a weight determination module configured to:
determining a first entropy value according to the first similarity and the second similarity;
determining a first weight corresponding to the second loss and a second weight corresponding to the third loss according to the first entropy;
the loss function determination module is further configured to:
determining the loss function based on the first loss, the second loss, the first weight, the third loss, and the second weight.
In one embodiment, the third loss determining module is further configured to:
determining an average entropy value according to the first similarity and the second similarity;
if the average entropy value is greater than a preset threshold, determining the third loss from the similarity between the first sample feature data and the third sample feature data;
and if the average entropy value is less than or equal to the preset threshold, determining the third loss from both the similarity among the third sample feature data and the similarity between the first sample feature data and the third sample feature data.
In one embodiment, the second loss determination module is further configured to:
determining a third similarity according to the first target characteristic data and the fourth target characteristic data;
determining a fourth similarity according to the first target characteristic data, the fifth target characteristic data and the sixth target characteristic data;
and determining the second loss according to the third similarity and the fourth similarity.
In one embodiment, the third loss determining module is further configured to:
determining a fifth similarity according to the first target characteristic data and the seventh target characteristic data;
determining a sixth similarity according to the first target characteristic data, the eighth target characteristic data and the ninth target characteristic data;
and determining the third loss according to the fifth similarity and the sixth similarity.
In one embodiment, the third loss determining module is further configured to:
determining a seventh similarity according to the third sample characteristic data obtained in the current round and historical third sample characteristic data;
and determining the third loss according to the seventh similarity, the fifth similarity and the sixth similarity.
In a third aspect, the present application also provides a computer device. The computer device comprises a memory storing a computer program and a processor implementing the following steps when executing the computer program:
acquiring text sample data;
inputting the text sample data into a first model to obtain first sample characteristic data, and determining a first loss according to the first sample characteristic data;
inputting the text sample data into a second model to obtain second sample characteristic data, and determining a second loss according to the first sample characteristic data and the second sample characteristic data;
inputting the text sample data into a third model to obtain third sample feature data, and, based on a preset condition, determining a third loss either from the similarity between the first sample feature data and the third sample feature data, or from both the similarity among the third sample feature data and the similarity between the first sample feature data and the third sample feature data;
determining a loss function according to the first loss, the second loss, and the third loss, the loss function being used to train the first model.
In a fourth aspect, the present application further provides a computer-readable storage medium. The computer-readable storage medium having stored thereon a computer program which, when executed by a processor, performs the steps of:
acquiring text sample data;
inputting the text sample data into a first model to obtain first sample characteristic data, and determining a first loss according to the first sample characteristic data;
inputting the text sample data into a second model to obtain second sample characteristic data, and determining a second loss according to the first sample characteristic data and the second sample characteristic data;
inputting the text sample data into a third model to obtain third sample feature data, and, based on a preset condition, determining a third loss either from the similarity between the first sample feature data and the third sample feature data, or from both the similarity among the third sample feature data and the similarity between the first sample feature data and the third sample feature data;
determining a loss function according to the first loss, the second loss, and the third loss, the loss function being used to train the first model.
In a fifth aspect, the present application further provides a computer program product. The computer program product comprising a computer program which when executed by a processor performs the steps of:
acquiring text sample data;
inputting the text sample data into a first model to obtain first sample characteristic data, and determining a first loss according to the first sample characteristic data;
inputting the text sample data into a second model to obtain second sample characteristic data, and determining a second loss according to the first sample characteristic data and the second sample characteristic data;
inputting the text sample data into a third model to obtain third sample feature data, and, based on a preset condition, determining a third loss either from the similarity between the first sample feature data and the third sample feature data, or from both the similarity among the third sample feature data and the similarity between the first sample feature data and the third sample feature data;
determining a loss function according to the first loss, the second loss, and the third loss, the loss function being used to train the first model.
According to the above model training method, apparatus, computer device, storage medium and computer program product, text sample data is acquired; the text sample data is input into a first model to obtain first sample feature data, and a first loss is determined from the first sample feature data; the text sample data is input into a second model to obtain second sample feature data, and a second loss is determined from the first sample feature data and the second sample feature data; the text sample data is input into a third model to obtain third sample feature data, and, based on a preset condition, a third loss is determined either from the similarity between the first sample feature data and the third sample feature data, or from both the similarity among the third sample feature data and the similarity between the first sample feature data and the third sample feature data; and a loss function is determined from the first, second and third losses and used to train the first model. By combining the first loss obtained from the first model, the second loss obtained from the second model and the third loss obtained from the third model to determine the loss function, selecting between the two ways of determining the third loss according to the preset condition, and training the first model with the loss function, i.e. by having two models jointly guide the training of the first model, the first model converges faster, training efficiency improves, training time and hardware resources are saved, and text recognition becomes more accurate.
Drawings
FIG. 1 is a diagram of an exemplary environment in which a model training method may be implemented;
FIG. 2 is a schematic flow chart diagram of a model training method in one embodiment;
FIG. 3 is a schematic flow chart illustrating step 204 in one embodiment;
FIG. 4 is a schematic flow chart illustrating step 208 in one embodiment;
FIG. 5 is a flow chart illustrating step 206 in one embodiment;
FIG. 6 is a schematic flow chart of step 208 in another embodiment;
FIG. 7A is a schematic flow chart diagram illustrating a model training method in accordance with another embodiment;
FIG. 7B is a schematic flow chart diagram illustrating a model training method in accordance with another embodiment;
FIG. 8 is a block diagram showing the structure of a model training apparatus according to an embodiment;
FIG. 9 is a diagram illustrating an internal structure of a computer device according to an embodiment.
Detailed Description
In order to make the objects, technical solutions and advantages of the present application clearer, the present application is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described here are intended only to illustrate the present application, not to limit it.
The model training method provided in the embodiments of the present application can be applied in the application environment shown in FIG. 1, where the terminal 102 communicates with the server 104 via a network. A data storage system may store the data that the server 104 needs to process; it may be integrated on the server 104, or located on the cloud or another network server. The server 104 acquires text sample data sent by the terminal 102, inputs the text sample data into the first model to obtain first sample feature data, and determines a first loss from the first sample feature data; inputs the text sample data into a second model to obtain second sample feature data, and determines a second loss from the first sample feature data and the second sample feature data; inputs the text sample data into a third model to obtain third sample feature data, and, based on a preset condition, determines a third loss either from the similarity between the first sample feature data and the third sample feature data, or from both the similarity among the third sample feature data and the similarity between the first sample feature data and the third sample feature data; and determines a loss function from the first, second and third losses, the loss function being used to train the first model.
The terminal 102 may be, but is not limited to, a personal computer, notebook computer, smartphone, tablet computer, internet-of-things device or portable wearable device; internet-of-things devices may be smart speakers, smart televisions, smart air conditioners, smart in-vehicle devices, and the like; portable wearable devices may be smart watches, smart bracelets, head-mounted devices, and the like. The server 104 may be implemented as a stand-alone server or as a server cluster composed of multiple servers.
It should be noted that any step in the model training method disclosed in the embodiment of the present application may be implemented by the terminal 102 and the server 104 in an interactive manner, or implemented by the server 104 alone, or implemented by the terminal 102 alone, which is not limited herein.
In one embodiment, as shown in fig. 2, a model training method is provided, which is described by taking the method as an example applied to the server in fig. 1, and includes the following steps:
step 202, acquiring text sample data.
The server can acquire text sample data through the terminal, or acquire it directly. The text sample data may take the form of any sentence, paragraph or article, and may include multiple types of samples at the same time, for example at least one of an A sample, a B sample and a C sample, where the A sample is any sentence, paragraph or article, the B sample is a corresponding sentence, paragraph or article semantically similar to the A sample, and the C sample is a corresponding sentence, paragraph or article semantically opposite to the A sample. If the text sample data includes all three samples A, B and C, they may be input into the model as sentence tuples, for example sentence 1 from sample A, sentence 2 from sample B with a semantic similar to sentence 1, and sentence 3 from sample C with a semantic opposite to sentence 1 are input together into the model for processing.
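A minimal sketch of one such training triplet, using hypothetical field names and example sentences; the patent does not prescribe a concrete data format:

    # One training triplet (field names and sentences are illustrative only)
    triplet = {
        "sample_a": "How do I reset my password?",                # any sentence
        "sample_b": "What are the steps to change my password?",  # semantically similar to A
        "sample_c": "My password is fine and needs no changes.",  # semantically opposite to A
    }
    # At training time the three sentences are fed together, in sentence-tuple
    # form, to each of the first, second and third models.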
Step 204, inputting the text sample data into the first model to obtain first sample characteristic data, and determining a first loss according to the first sample characteristic data.
The text sample data is input into the first model for processing to obtain first sample feature data. If the text sample data comprises multiple types of samples, the first sample feature data comprises multiple types of target feature data. The first model is the model to be trained; it may be, for example, a 3-layer, 4-layer or 6-layer BERT (Bidirectional Encoder Representations from Transformers) model, or a BiLSTM (Bi-directional Long Short-Term Memory) model. A first loss is determined from the first sample feature data. In one possible implementation, the similarity between the first sample feature data is calculated, and the first loss is then determined from that similarity, for example by taking the mean squared error (MSE) or the cross entropy corresponding to the similarity as the first loss. The cross entropy between a distribution p and a distribution q can be calculated by the following formula (1):

H(p, q) = −Σ_{i=1}^{n} p(x_i) log q(x_i)    (1)

where n denotes the number of samples and x_i (i = 1, 2, …, n) denotes the i-th sample.

The mean squared error is calculated as the following formula (2):

MSE = (1/m) Σ_{i=1}^{m} (y_i − ȳ)²    (2)

where m denotes the number of samples, y_i (i = 1, 2, …, m) denotes the i-th sample, and ȳ denotes the sample mean.
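As a cross-check, a direct transcription of formulas (1) and (2) in Python (a sketch; the patent itself specifies no implementation):

    import torch

    def cross_entropy(p: torch.Tensor, q: torch.Tensor) -> torch.Tensor:
        # H(p, q) = -sum_i p(x_i) * log q(x_i), formula (1)
        return -(p * q.log()).sum()

    def mean_squared_error(y: torch.Tensor) -> torch.Tensor:
        # MSE = (1/m) * sum_i (y_i - y_mean)^2, formula (2)
        return ((y - y.mean()) ** 2).mean()

    p = torch.tensor([0.7, 0.2, 0.1])
    q = torch.tensor([0.6, 0.3, 0.1])
    print(cross_entropy(p, q), mean_squared_error(torch.tensor([0.9, 0.4, 0.2])))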
Step 206, inputting the text sample data into the second model to obtain second sample characteristic data, and determining a second loss according to the first sample characteristic data and the second sample characteristic data.
The text sample data is input into the second model for processing to obtain second sample feature data. If the text sample data includes multiple types of samples, the second sample feature data includes multiple types of target feature data. The second model is a trained model, although its parameters are updated in a subsequent process; its type may be the same as or different from that of the first model, chosen according to the actual situation. The second loss may be determined from the similarity between the first sample feature data and the second sample feature data.
Step 208, inputting the text sample data into a third model to obtain third sample feature data, and determining, based on a preset condition, a third loss either from the similarity between the first sample feature data and the third sample feature data, or from both the similarity among the third sample feature data and the similarity between the first sample feature data and the third sample feature data.
The text sample data is input into the third model for processing to obtain third sample feature data. If the text sample data includes multiple types of samples, the third sample feature data includes multiple types of target feature data. The third model is a trained model that serves as the target model for the first model; for example, the third model may be a BERT large model or a BERT base model. In this embodiment, the preset condition decides whether the third loss is obtained from the similarity between the first sample feature data and the third sample feature data, or from both the similarity among the third sample feature data and the similarity between the first sample feature data and the third sample feature data.
In one possible implementation, when the preset condition is met, the third loss is obtained from the similarity between the first sample feature data and the third sample feature data; when it is not met, the third loss is obtained from both the similarity among the third sample feature data and the similarity between the first sample feature data and the third sample feature data. Alternatively, the two cases may be swapped: when the preset condition is met, the third loss is obtained from both similarities, and when it is not met, from the similarity between the first sample feature data and the third sample feature data alone.
In another possible implementation, the preset condition involves a preset threshold: if a specified quantity (for example, the average entropy described below) is greater than the preset threshold, the third loss is obtained from the similarity between the first sample feature data and the third sample feature data; if it is less than or equal to the preset threshold, the third loss is obtained from both the similarity among the third sample feature data and the similarity between the first sample feature data and the third sample feature data. These two cases may likewise be swapped.
In an alternative embodiment, when the third loss is obtained from the similarity between the first sample feature data and the third sample feature data, the mean squared error or cross entropy corresponding to that similarity may be used as the third loss. Similarly, when the third loss is obtained from both the similarity among the third sample feature data and the similarity between the first sample feature data and the third sample feature data, the two similarities may be spliced to obtain a spliced similarity, and the mean squared error or cross entropy corresponding to the spliced similarity used as the third loss; or the third loss may be the relative entropy between the similarity among the third sample feature data and the similarity between the first sample feature data and the third sample feature data.
Step 210, determining a loss function according to the first loss, the second loss and the third loss, wherein the loss function is used for training the first model.
In this embodiment, the loss function is determined from the first loss, the second loss and the third loss: the three losses may be weighted and summed to obtain the loss function, and the first model is trained with the loss function until a convergence condition is satisfied, at which point training terminates.
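A high-level sketch of one training iteration under this scheme, with tiny linear stand-ins for the three models and placeholder losses (the patent's actual losses are the similarity-based ones detailed below); only the first model is updated:

    import torch
    import torch.nn as nn

    student = nn.Linear(16, 8)            # first model (to be trained)
    small_teacher = nn.Linear(16, 8)      # second model (trained)
    large_teacher = nn.Linear(16, 8)      # third model (trained, target model)
    opt = torch.optim.Adam(student.parameters(), lr=1e-3)

    x = torch.randn(4, 16)                # stand-in for encoded text sample data
    f1 = student(x)                       # first sample feature data
    with torch.no_grad():                 # teachers guide but are not updated
        f2 = small_teacher(x)             # second sample feature data
        f3 = large_teacher(x)             # third sample feature data

    loss1 = f1.pow(2).mean()              # placeholder for the first loss
    loss2 = (f1 - f2).pow(2).mean()       # placeholder for the second loss
    loss3 = (f1 - f3).pow(2).mean()       # placeholder for the third loss
    loss = loss1 + loss2 + loss3          # weighted sum of step 210 (unit weights)
    opt.zero_grad()
    loss.backward()
    opt.step()                            # only the student's parameters change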
In this model training method, text sample data is acquired; the text sample data is input into a first model to obtain first sample feature data, and a first loss is determined from the first sample feature data; the text sample data is input into a second model to obtain second sample feature data, and a second loss is determined from the first sample feature data and the second sample feature data; the text sample data is input into a third model to obtain third sample feature data, and, based on a preset condition, a third loss is determined either from the similarity between the first sample feature data and the third sample feature data, or from both the similarity among the third sample feature data and the similarity between the first sample feature data and the third sample feature data; and a loss function is determined from the first, second and third losses and used to train the first model. The embodiment determines the loss function by combining the first loss obtained from the first model, the second loss obtained from the second model and the third loss obtained from the third model, selects between the two ways of determining the third loss according to the preset condition, and trains the first model with the loss function; that is, two models jointly guide the training of the first model, so the first model converges faster, training efficiency improves, training time and hardware resources are saved, and text recognition is more accurate.
In one embodiment, the text sample data comprises a first sample, a second sample and a third sample, the second sample being semantically similar to the first sample and the third sample being semantically opposite to the first sample. Step 204, inputting the text sample data into the first model to obtain first sample feature data, includes: inputting the text sample data into the first model, and obtaining first target feature data from the first sample, second target feature data from the second sample, and third target feature data from the third sample.
Step 206, inputting the text sample data into the second model to obtain second sample feature data, includes: inputting the text sample data into the second model, and obtaining fourth target feature data from the first sample, fifth target feature data from the second sample, and sixth target feature data from the third sample.
Step 208, inputting the text sample data into the third model to obtain third sample feature data, includes: inputting the text sample data into the third model, and obtaining seventh target feature data from the first sample, eighth target feature data from the second sample, and ninth target feature data from the third sample.
In this embodiment, the text sample data is input into the different models to obtain the corresponding target feature data: the first sample feature data comprises the first, second and third target feature data; the second sample feature data comprises the fourth, fifth and sixth target feature data; and the third sample feature data comprises the seventh, eighth and ninth target feature data. Using multiple types of text sample data, i.e. training the first model with rich text sample data, can further improve the first model's text recognition accuracy.
In one embodiment, as shown in fig. 3, determining the first loss from the first sample feature data in step 204 includes:
step 302, determining a first similarity according to the first target characteristic data and the second target characteristic data.
Similarity characterizes how alike two things are: the more similar they are, the greater the similarity, and conversely the smaller. Similarity may be characterized by the Jaccard similarity coefficient, cosine similarity, a distance measure, or the Pearson correlation coefficient, among others. The Jaccard similarity coefficient is mainly used to compute similarity between samples with symbolic or Boolean measurements; cosine similarity evaluates the similarity of two vectors by computing the cosine of the angle between them; distance-based characterizations may use, for example, the Euclidean or Manhattan distance between vectors; and the Pearson correlation coefficient measures how closely two variables are linearly related. The similarity characterization method is not limited to these and is not specifically restricted here; a method is chosen according to actual requirements. In this embodiment, the cosine similarity between the first target feature data and the second target feature data may be computed as the first similarity.
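For instance, computing the cosine similarity of two feature vectors in PyTorch (a sketch; the 768-dimensional size merely mirrors a typical BERT sentence vector):

    import torch
    import torch.nn.functional as F

    f1 = torch.randn(768)   # first target feature data (illustrative)
    f2 = torch.randn(768)   # second target feature data (illustrative)
    sim1 = F.cosine_similarity(f1, f2, dim=0)   # first similarity, in [-1, 1]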
Step 304, determining a second similarity according to the first target characteristic data and the third target characteristic data.
In this embodiment, the cosine similarity between the first target characteristic data and the third target characteristic data is calculated to obtain the second similarity.
Step 306, determining a first loss according to the first similarity and the second similarity.
The first loss is determined from the first similarity and the second similarity. In one possible implementation, the first similarity and the second similarity may be spliced to obtain a first spliced similarity, and the first spliced similarity is normalized, that is, mapped to values between 0 and 1, for example with a softmax function, to obtain a normalization result corresponding to the first spliced similarity; the mean squared error or the cross entropy of the normalization result is then calculated and used as the first loss. The softmax function is shown in the following formula (3):

P(Z_{i,j}) = exp(Z_{i,j}) / Σ_{k∈C} exp(Z_{i,k})    (3)

where P(Z_{i,j}) denotes the normalization result, Z_{i,j} denotes the similarity between the i-th sample and the j-th sample, and C is the total number of samples.
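A sketch of this first-loss computation, assuming a contrastive reading in which the normalized spliced similarity is scored with the positive pair as the target class (the patent states only that the cross entropy or MSE of the normalization result is used):

    import torch
    import torch.nn.functional as F

    def first_loss(sim_pos: torch.Tensor, sim_neg: torch.Tensor) -> torch.Tensor:
        # splice the first (anchor/positive) and second (anchor/negative)
        # similarities, then normalize with softmax as in formula (3)
        logits = torch.stack([sim_pos, sim_neg])
        probs = F.softmax(logits, dim=0)
        # cross entropy with the positive pair as the target class (assumption)
        return -probs[0].log()

    loss1 = first_loss(torch.tensor(0.8), torch.tensor(0.1))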
In one embodiment, the model training method further comprises:
a first entropy value is determined based on the first similarity and the second similarity.
In this embodiment, the cross entropy corresponding to the first similarity and the second similarity is obtained from the two similarities, and the first entropy value is determined from the maximum value of that cross entropy and its actual value.
In one possible implementation, as described above, the first similarity and the second similarity are spliced to obtain a first spliced result, the first spliced result is normalized to obtain a normalization result, and the cross entropy S(Z_i) of the normalization result is calculated as shown in the following formula (4):

S(Z_i) = −Σ_{j∈C} P(Z_{i,j}) log_b P(Z_{i,j})    (4)

By the maximum entropy principle, the cross entropy reaches its maximum value, log_b C, exactly when all samples have the same similarity, i.e. when P(Z_{i,j}) = 1/C for every j.
The ratio between the actual entropy and this maximum value of the cross entropy is taken as the first entropy value.
And determining a first weight corresponding to the second loss and a second weight corresponding to the third loss according to the first entropy value.
In this embodiment, the weights corresponding to the second loss and the third loss are determined by using the first entropy, specifically, the first entropy may be used as the first weight corresponding to the second loss, and the difference between 1 and the first entropy may be used as the second weight corresponding to the third loss.
Step 210 of determining a loss function from the first loss, the second loss, and the third loss comprises:
a penalty function is determined based on the first penalty, the second penalty, the first weight, the third penalty, and the second weight.
In this embodiment, the loss function may be obtained by performing weighted summation on the first loss, the second loss, and the third loss, where specifically, the weight of the second loss is the first weight, and the weight of the third loss is the second weight. In one possible implementation, the weight of the first penalty may be 1, i.e., the penalty function is the sum of the first penalty, the product of the second penalty and the first weight, and the product of the third penalty and the second weight.
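A sketch of the entropy-derived weights and the resulting loss function, assuming the first entropy value is the actual entropy normalized by its maximum log_b C so that it lies in [0, 1] (the orientation of the ratio is an assumption of this sketch):

    import torch

    def entropy_weights(probs: torch.Tensor):
        actual = -(probs * probs.log()).sum()                 # formula (4), natural log
        maximum = torch.log(torch.tensor(float(len(probs))))  # entropy of the uniform case
        e1 = actual / maximum                                 # first entropy value in [0, 1]
        return e1, 1.0 - e1                                   # first weight, second weight

    loss1, loss2, loss3 = torch.tensor(0.3), torch.tensor(0.5), torch.tensor(0.2)
    w1, w2 = entropy_weights(torch.tensor([0.6, 0.4]))
    loss = loss1 + w1 * loss2 + w2 * loss3   # weight of the first loss taken as 1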
In one embodiment, as shown in fig. 4, the determining in step 208, based on the preset condition, whether the third loss is obtained from the similarity between the first sample feature data and the third sample feature data or from both the similarity among the third sample feature data and the similarity between the first sample feature data and the third sample feature data includes:
and 402, determining an average entropy value according to the first similarity and the second similarity.
For a round of model training, i.e. one iteration, as in the foregoing embodiment, the first similarity and the second similarity may be spliced to obtain a first splicing result, the first splicing result is normalized to obtain a normalization result, and the cross entropy of the normalization result is calculated as the entropy of the current iteration. And determining an average entropy value according to the iteration times and the entropy value obtained by each iteration. The average entropy value may be an arithmetic average of the entropy values of each iteration of model training, or may be a weighted average of the entropy values of each iteration.
Step 404, if the average entropy value is greater than a preset threshold, determining the third loss from the similarity between the first sample feature data and the third sample feature data.
Step 406, if the average entropy value is less than or equal to the preset threshold, determining the third loss from both the similarity among the third sample feature data and the similarity between the first sample feature data and the third sample feature data.
In this embodiment, the preset condition is the relationship between the average entropy and the preset threshold: if the average entropy is greater than the preset threshold, the third loss is determined from the similarity between the first sample feature data and the third sample feature data; if the average entropy is less than or equal to the preset threshold, the third loss is determined from both the similarity among the third sample feature data and the similarity between the first sample feature data and the third sample feature data. Dynamically selecting the way the third loss is determined, through the relationship between the average entropy and the preset threshold, yields a loss function better matched to the actual training process and further improves model training precision.
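A sketch of this branch, assuming the average entropy is a running arithmetic mean over iterations and that the threshold value is a hyperparameter (both the averaging scheme and the value are illustrative):

    entropy_history: list[float] = []

    def third_loss_mode(entropy_this_iter: float, threshold: float = 0.5) -> str:
        entropy_history.append(entropy_this_iter)
        avg = sum(entropy_history) / len(entropy_history)  # arithmetic mean variant
        if avg > threshold:
            # similarity between first and third sample feature data only
            return "student_vs_teacher"
        # also uses similarity among third sample feature data (current vs. history)
        return "teacher_history_included"

    mode = third_loss_mode(0.62)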
In one embodiment, as shown in fig. 5, determining a second loss from the first sample characteristic data and the second sample characteristic data in step 206 includes:
step 502, determining a third similarity according to the first target characteristic data and the fourth target characteristic data.
In this embodiment, the similarity between the first target feature data and the fourth target feature data may be calculated as the third similarity. The first target characteristic data is obtained through the first model according to the first sample for the current round of training, and the fourth target characteristic data is obtained through the second model according to the first sample for the current round of training.
Step 504, determining a fourth similarity according to the first target characteristic data, the fifth target characteristic data and the sixth target characteristic data.
In an alternative embodiment, the fourth similarity may be calculated as the similarity between the first target feature data and the fifth and sixth target feature data taken as a whole. The first target feature data is obtained through the first model from the first sample in the current round of training, the fifth target feature data through the second model from the second sample, and the sixth target feature data through the second model from the third sample.
In another alternative embodiment, the fourth similarity is determined from the first target feature data together with the fifth target feature data, the sixth target feature data, and historical second sample feature data. The historical second sample feature data is obtained through the second model in a preset number of previous training rounds, and comprises the fourth, fifth and sixth target feature data obtained in those rounds. The preset number may cover all previous rounds or only the rounds closest to the current one. The historical second sample feature data may be stored as a queue in a memory bank corresponding to the second model. The similarity between the first target feature data and the whole of the fifth target feature data, the sixth target feature data and the historical second sample feature data is calculated as the fourth similarity.
And step 506, determining a second loss according to the third similarity and the fourth similarity.
In this embodiment, the second loss is determined according to the third similarity and the fourth similarity. The third similarity and the fourth similarity may be spliced to obtain a second splicing similarity, the second splicing similarity is normalized to obtain a normalization result corresponding to the second splicing similarity, a mean square error or a cross entropy of the normalization result corresponding to the second splicing similarity is calculated, and the mean square error or the cross entropy of the normalization result corresponding to the second splicing similarity is used as the second loss.
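A sketch of the second loss under an InfoNCE-style reading: the student's anchor feature is scored against the small teacher's anchor feature (third similarity) versus the teacher's positive/negative features plus the history queue (fourth similarity). Shapes and the choice of target class are assumptions:

    import torch
    import torch.nn.functional as F

    def second_loss(f_student: torch.Tensor,        # (d,) student feature, first sample
                    f_teacher: torch.Tensor,        # (d,) teacher feature, first sample
                    f_teacher_rest: torch.Tensor):  # (k, d) teacher features for the
                                                    # second/third samples and the queue
        sim3 = F.cosine_similarity(f_student, f_teacher, dim=0).unsqueeze(0)
        sim4 = F.cosine_similarity(f_student.unsqueeze(0), f_teacher_rest, dim=1)
        logits = torch.cat([sim3, sim4])            # second spliced similarity
        probs = F.softmax(logits, dim=0)            # normalization, formula (3)
        return -probs[0].log()                      # cross entropy, teacher anchor as target

    loss2 = second_loss(torch.randn(768), torch.randn(768), torch.randn(32, 768))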
In one embodiment, as shown in fig. 6, the obtaining a third loss in the step 208 according to the similarity between the first sample feature data and the third sample feature data includes:
step 602, determining a fifth similarity according to the first target feature data and the seventh target feature data.
In this embodiment, the similarity between the first target feature data and the seventh target feature data may be calculated as the fifth similarity. The first target characteristic data is obtained through the first model according to the first sample for the current round of training, and the seventh target characteristic data is obtained through the third model according to the first sample for the current round of training.
Step 604, determining a sixth similarity according to the first target characteristic data, the eighth target characteristic data and the ninth target characteristic data.
In an alternative embodiment, the sixth similarity may be calculated by calculating the similarity between the first target feature data and the eighth target feature data and the ninth target feature data, that is, the similarity between the first target feature data and the entirety of the eighth target feature data and the ninth target feature data. The first target characteristic data is obtained through a first model and according to a first sample for the current round of training, the eighth target characteristic data is obtained through a third model and according to a second sample for the current round of training, and the ninth target characteristic data is obtained through a third model and according to a third sample for the current round of training.
In another alternative embodiment, the sixth similarity is determined from the first target feature data together with the eighth target feature data, the ninth target feature data, and historical third sample feature data. The historical third sample feature data is obtained through the third model in a preset number of previous training rounds, and comprises the seventh, eighth and ninth target feature data obtained in those rounds. The preset number may cover all previous rounds or only the rounds closest to the current one. The historical third sample feature data may be stored as a queue in a memory bank corresponding to the third model. The similarity between the first target feature data and the whole of the eighth target feature data, the ninth target feature data and the historical third sample feature data is calculated as the sixth similarity.
And step 606, determining a third loss according to the fifth similarity and the sixth similarity.
In this embodiment, the third loss is determined based on the fifth similarity and the sixth similarity. The fifth similarity and the sixth similarity may be spliced to obtain a third splicing similarity, the third splicing similarity is normalized to obtain a normalization result corresponding to the third splicing similarity, a mean square error or a cross entropy of the normalization result corresponding to the third splicing similarity is calculated, and the mean square error or the cross entropy of the normalization result corresponding to the third splicing similarity is used as a third loss.
In one embodiment, obtaining a third loss in step 208 from both the similarity among the third sample feature data and the similarity between the first sample feature data and the third sample feature data comprises:
and determining a seventh similarity according to the third sample characteristic data obtained in the current round and the historical third sample characteristic data.
According to the description of the foregoing step 604, the historical third sample feature data is obtained by the third model in the past training of the preset times, and the historical third sample feature data includes seventh target feature data, eighth target feature data, and ninth target feature data obtained in the past training of the preset times. The third sample feature data obtained in the current round include seventh target feature data, eighth target feature data and ninth target feature data obtained in the current round through the third model. In this embodiment, a similarity between the third sample feature data obtained in this round and the historical third sample feature data may be calculated, that is, a similarity between the seventh target feature data, the eighth target feature data, and the ninth target feature data obtained in this round and the seventh target feature data, the eighth target feature data, and the ninth target feature data obtained in the previous training of the preset number of times is calculated as a seventh similarity.
And determining a third loss according to the seventh similarity, the fifth similarity and the sixth similarity.
In this embodiment, the fifth similarity and the sixth similarity are spliced to obtain a fifth splicing similarity, the relative entropy between the seventh similarity and the fifth splicing similarity is calculated, and this relative entropy is taken as the third loss.
In an optional embodiment, the fifth similarity, the sixth similarity, and the seventh similarity may be spliced to obtain a fourth splicing similarity; the fourth splicing similarity is normalized, the mean square error or the cross entropy of the normalization result is calculated, and this value is taken as the third loss.
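A corresponding sketch of this optional variant, again in PyTorch under the same assumptions (the tensor names, the position of the positive entry, and the temperature are illustrative):

```python
import torch
import torch.nn.functional as F

def third_loss_fourth_splicing(fifth_sim: torch.Tensor,
                               sixth_sim: torch.Tensor,
                               seventh_sim: torch.Tensor,
                               temperature: float = 0.05) -> torch.Tensor:
    # Splice the fifth, sixth, and seventh similarities into a fourth splicing similarity.
    spliced = torch.cat([fifth_sim, sixth_sim, seventh_sim], dim=1)
    # Normalize, then take the cross entropy of the normalization result
    # (F.cross_entropy applies the softmax normalization internally).
    targets = torch.zeros(spliced.size(0), dtype=torch.long)  # positive assumed at index 0
    return F.cross_entropy(spliced / temperature, targets)
```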
In one embodiment, as shown in fig. 7A, the model training method includes:
Step 702, obtaining text sample data, where the text sample data includes a first sample, a second sample, and a third sample, the second sample is semantically similar to the first sample, and the third sample is semantically opposite to the first sample.
In this embodiment, the text sample data may take the form of any sentence, paragraph, or article. The text sample data include a first sample, a second sample, and a third sample, which are input together into the corresponding model for processing in each round of training. As described with reference to fig. 7B, the text sample data include a sample x, a positive sample x⁺, and a negative sample x⁻, and the sample x, the positive sample x⁺, and the negative sample x⁻ are input into the model together for processing.
Step 704, inputting the text sample data into the first model, and obtaining first target characteristic data according to the first sample, second target characteristic data according to the second sample, and third target characteristic data according to the third sample; inputting the text sample data into the second model, and obtaining fourth target characteristic data according to the first sample, fifth target characteristic data according to the second sample, and sixth target characteristic data according to the third sample; inputting the text sample data into the third model, and obtaining seventh target characteristic data according to the first sample, eighth target characteristic data according to the second sample, and ninth target characteristic data according to the third sample.
In this embodiment, different samples of the text sample data are input into different models to obtain different target feature data. The first model may be the same as or different from the second model; for example, the first model and the second model may each be a 3-layer, 4-layer, or 6-layer BERT model, or a BiLSTM, and the third model may be a BERT-large or BERT-base model. In fig. 7B, the first model is the student model, the second model is the small teacher model, and the third model is the large teacher model. The sample x is input into the student model to generate the feature f, the positive sample x⁺ generates the feature f⁺, and the negative sample x⁻ generates the feature f⁻; similarly, the text sample data input into the small teacher model and the large teacher model generate corresponding feature data. In addition, the data generated by the small teacher model are stored in memory bank 1 as a queue, the data generated by the large teacher model are stored in memory bank 2 as a queue, and the amount of data stored in memory bank 1 and memory bank 2 is set according to requirements, which is not further limited herein.
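The patent does not give an implementation of the memory banks; the following is a sketch of one conventional realization as a fixed-size FIFO queue of teacher features, assuming PyTorch (the queue size and feature dimension are placeholders):

```python
import torch

class MemoryBank:
    """Fixed-size FIFO queue of teacher feature vectors (e.g., bank 1 or bank 2)."""

    def __init__(self, size: int = 4096, dim: int = 768):
        self.bank = torch.zeros(size, dim)
        self.ptr = 0

    @torch.no_grad()
    def enqueue(self, feats: torch.Tensor):
        # Overwrite the oldest entries with this round's teacher features.
        n = feats.size(0)
        idx = (self.ptr + torch.arange(n)) % self.bank.size(0)
        self.bank[idx] = feats.detach()
        self.ptr = int((self.ptr + n) % self.bank.size(0))

    def features(self) -> torch.Tensor:
        # Historical feature data accumulated over previous training rounds.
        return self.bank
```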
Step 706, calculating cosine similarity between the first target characteristic data and the second target characteristic data to obtain a first similarity; calculating cosine similarity between the first target characteristic data and the third target characteristic data to obtain second similarity; and splicing the first similarity and the second similarity to obtain a first splicing similarity, and taking the cross entropy of the first splicing similarity as a first loss.
In this embodiment, the cosine similarity between the first target characteristic data and the second target characteristic data is calculated to obtain a first similarity, and the cosine similarity between the first target characteristic data and the third target characteristic data is calculated to obtain a second similarity. The first similarity and the second similarity are spliced to obtain a first splicing similarity, the first splicing similarity is normalized, the cross entropy of the normalization result is calculated, and this cross entropy is taken as the first loss. In a possible implementation, the similarity between the sample and the positive sample and the similarity between the sample and the negative sample can be obtained in a supervised training manner and spliced to obtain the first splicing similarity, from which the supervised loss l_s is calculated; that is, the first loss is l_s.
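A sketch of this computation of the first loss, assuming PyTorch; f, f_pos, and f_neg stand for the student features of x, x⁺, and x⁻, and the temperature is an assumption:

```python
import torch
import torch.nn.functional as F

def first_loss(f: torch.Tensor, f_pos: torch.Tensor, f_neg: torch.Tensor,
               temperature: float = 0.05) -> torch.Tensor:
    # First similarity: cosine similarity between sample and positive features.
    sim_pos = F.cosine_similarity(f, f_pos, dim=1).unsqueeze(1)
    # Second similarity: cosine similarity between sample and negative features.
    sim_neg = F.cosine_similarity(f, f_neg, dim=1).unsqueeze(1)
    # First splicing similarity; F.cross_entropy normalizes it via softmax.
    spliced = torch.cat([sim_pos, sim_neg], dim=1) / temperature
    targets = torch.zeros(f.size(0), dtype=torch.long)  # positive sits at index 0
    return F.cross_entropy(spliced, targets)            # supervised loss l_s
```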
Step 708, calculating the cosine similarity between the first target characteristic data and the fourth target characteristic data to obtain a third similarity; calculating the cosine similarity between the first target characteristic data and the entirety of the fifth target characteristic data, the sixth target characteristic data, and the historical second characteristic data to obtain a fourth similarity; and splicing the third similarity and the fourth similarity to obtain a second splicing similarity, and taking the cross entropy of the second splicing similarity as a second loss.
In this embodiment, the third similarity is obtained by calculating the cosine similarity between the first target characteristic data and the fourth target characteristic data obtained in the current round of training. The fourth similarity is obtained by calculating the cosine similarity between the first target characteristic data obtained in the current round and the entirety of the fifth target characteristic data and the sixth target characteristic data obtained in the current round together with the historical second characteristic data, where the historical second characteristic data are the fourth, fifth, and sixth target characteristic data obtained in a preset number of previous training rounds. The preset number may cover all previous rounds or only the rounds closest to the current one, for example the 10 most recent rounds. The third similarity and the fourth similarity are spliced to obtain a second splicing similarity, the second splicing similarity is normalized, the cross entropy of the normalization result is calculated, and this cross entropy is taken as the second loss.
As can be seen from fig. 7B, a similarity is calculated from the features generated by the student model, the features generated by the small teacher model, and the feature data obtained in previous training rounds stored in memory bank 1, and the unsupervised loss l_{u,s} is calculated from this similarity; that is, the second loss is l_{u,s}.
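A sketch of step 708 under the same assumptions (PyTorch; g, g_pos, and g_neg are the small-teacher features for x, x⁺, and x⁻, and bank1 holds the historical second characteristic data; all names are illustrative):

```python
import torch
import torch.nn.functional as F

def second_loss(f: torch.Tensor, g: torch.Tensor, g_pos: torch.Tensor,
                g_neg: torch.Tensor, bank1: torch.Tensor,
                temperature: float = 0.05) -> torch.Tensor:
    f = F.normalize(f, dim=1)
    # Third similarity: student feature vs. small-teacher feature of the same sample.
    sim3 = (f * F.normalize(g, dim=1)).sum(dim=1, keepdim=True)
    # Fourth similarity: student feature vs. the whole of the teacher positive,
    # teacher negative, and the historical features stored in memory bank 1.
    others = F.normalize(torch.cat([g_pos, g_neg, bank1], dim=0), dim=1)
    sim4 = f @ others.t()
    # Second splicing similarity; cross entropy of its softmax normalization.
    spliced = torch.cat([sim3, sim4], dim=1) / temperature
    targets = torch.zeros(f.size(0), dtype=torch.long)
    return F.cross_entropy(spliced, targets)             # unsupervised loss l_{u,s}
```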
Step 710, determining an average entropy value according to the first similarity and the second similarity; if the average entropy is greater than the predetermined threshold, then a third loss is obtained in the manner described in step 712; if the average entropy is less than or equal to the predetermined threshold, then a third loss is obtained in the manner described in step 714.
In this embodiment, the average entropy value is determined from the number of iterations and the entropy value obtained in each iteration. As described in step 706, a cross entropy is obtained for each round of training, so the average entropy value can be determined from the number of training rounds and the per-round cross entropy. The average entropy value may be the arithmetic mean of the cross entropies of all iterations of model training, or a weighted mean of those cross entropies. A preset threshold is set according to requirements: if the average entropy value is greater than the preset threshold, the third loss is obtained in the manner described in step 712; if the average entropy value is less than or equal to the preset threshold, the third loss is obtained in the manner described in step 714.
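A plain-Python sketch of this gating logic; the history list and the threshold are illustrative:

```python
def average_entropy(entropy_history):
    # Arithmetic mean over the rounds so far (a weighted mean is also permitted).
    return sum(entropy_history) / max(len(entropy_history), 1)

def select_third_loss(entropy_history, threshold, loss_u_l, loss_kd):
    # Above the threshold: the feature-similarity loss of step 712;
    # otherwise: the distillation loss of step 714.
    return loss_u_l if average_entropy(entropy_history) > threshold else loss_kd
```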
Step 712, calculating the cosine similarity between the first target characteristic data and the seventh target characteristic data to obtain a fifth similarity; calculating the cosine similarity between the first target characteristic data and the entirety of the eighth target characteristic data, the ninth target characteristic data, and the historical third sample characteristic data to obtain a sixth similarity; and splicing the fifth similarity and the sixth similarity to obtain a third splicing similarity, and taking the cross entropy of the third splicing similarity as a third loss.
The fifth similarity is obtained by calculating the cosine similarity between the first target characteristic data and the seventh target characteristic data obtained in the current round of training. The sixth similarity is obtained by calculating the cosine similarity between the first target characteristic data obtained in the current round and the entirety of the eighth target characteristic data and the ninth target characteristic data obtained in the current round together with the historical third sample characteristic data, where the historical third sample characteristic data are the seventh, eighth, and ninth target characteristic data obtained in a preset number of previous training rounds. That is, the similarity between the first target characteristic data and the entirety of the eighth target characteristic data, the ninth target characteristic data, and the historical third sample characteristic data is calculated as the sixth similarity.
As shown in fig. 7B, a similarity is calculated from the feature data generated by the student model, the features generated by the large teacher model, and the feature data obtained in previous training rounds stored in memory bank 2, and the unsupervised loss l_{u,l} is calculated from this similarity; that is, the third loss is l_{u,l}.
Step 714, calculating the cosine similarity between the third sample characteristic data obtained in the current round and the historical third sample characteristic data as a seventh similarity; and splicing the fifth similarity, the sixth similarity, and the seventh similarity to obtain a fourth splicing similarity, and taking the cross entropy of the fourth splicing similarity as a third loss.
The third sample characteristic data obtained in the current round comprises seventh target characteristic data, eighth target characteristic data and ninth target characteristic data obtained through a third model in the current round of training; the historical third sample feature data comprises seventh target feature data, eighth target feature data and ninth target feature data which are obtained in the previous training for a preset number of times. In this embodiment, the seventh similarity is obtained by calculating similarities between the seventh target feature data, the eighth target feature data, and the ninth target feature data obtained in the current round and the seventh target feature data, the eighth target feature data, and the ninth target feature data obtained in the previous training for the preset number of times. And splicing the fifth similarity, the sixth similarity and the seventh similarity to obtain a fourth splicing similarity, normalizing the fourth splicing similarity to obtain a normalized result corresponding to the fourth splicing similarity, calculating the cross entropy of the normalized result corresponding to the fourth splicing similarity, and taking the cross entropy of the normalized result corresponding to the fourth splicing similarity as a third loss.
Optionally, as shown in fig. 7B, a similarity is calculated from the features generated by the large teacher model in the current round and the feature data obtained in previous training rounds stored in memory bank 2, and the relative entropy between this similarity and the spliced fifth and sixth similarities is taken as the distillation loss l_{kd}; that is, the third loss is the distillation loss l_{kd}.
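A sketch of this distillation loss, assuming PyTorch; the softmax normalization and temperature are assumptions, and the seventh similarity must have the same width as the spliced fifth and sixth similarities for the relative entropy to be defined:

```python
import torch
import torch.nn.functional as F

def distillation_loss(seventh_sim: torch.Tensor, fifth_sim: torch.Tensor,
                      sixth_sim: torch.Tensor, temperature: float = 0.05) -> torch.Tensor:
    # Target distribution from the seventh similarity (large teacher vs. bank 2 history).
    teacher = F.softmax(seventh_sim / temperature, dim=1)
    # Student-side distribution from the spliced fifth and sixth similarities.
    student = F.log_softmax(torch.cat([fifth_sim, sixth_sim], dim=1) / temperature, dim=1)
    # Relative entropy (KL divergence) between the two distributions: l_kd.
    return F.kl_div(student, teacher, reduction="batchmean")
```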
Step 716, taking the ratio of the maximum value of the cross entropy of the first splicing similarity to the actual entropy as a first entropy, taking the first entropy as a first weight corresponding to the second loss, and taking the difference between 1 and the first entropy as a second weight corresponding to the third loss.
The first splicing similarity is obtained as described in step 706. The ratio of the maximum value of the cross entropy of the first splicing similarity to the actual entropy is taken as the first entropy value, the first entropy value is taken as the first weight corresponding to the second loss, and the difference between 1 and the first entropy value is taken as the second weight corresponding to the third loss.
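One possible reading of this step in PyTorch. Note that a literal ratio of the maximum entropy to the actual entropy is at least 1, which would make (1 − α) non-positive; the sketch therefore uses the inverse ratio actual/maximum, which keeps both weights in [0, 1]. This inversion is an assumption, not stated in the patent.

```python
import math
import torch
import torch.nn.functional as F

def entropy_weights(first_splicing_sim: torch.Tensor):
    probs = F.softmax(first_splicing_sim, dim=1)
    # Actual entropy of the normalized first splicing similarity, averaged over the batch.
    actual = -(probs * probs.clamp_min(1e-12).log()).sum(dim=1).mean().item()
    # Maximum possible entropy over C entries is log(C).
    maximum = math.log(first_splicing_sim.size(1))
    alpha = actual / maximum          # first weight (assumption: actual / maximum)
    return alpha, 1.0 - alpha         # (first weight, second weight)
```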
Step 718, using the sum of the first loss, the product of the second loss and the first weight, and the product of the third loss and the second weight as a loss function, wherein the loss function is used for training the first model.
In one specific example, as shown in fig. 7B, the loss function, i.e., the total loss l, can be obtained by the following equation (5):

l = l_s + α × l_{u,s} + (1 − α) × choice(l_{u,l}, l_{kd})    (5)

where l_s is the first loss, l_{u,s} is the second loss, α is the first weight, and (1 − α) is the second weight; choice(l_{u,l}, l_{kd}) denotes the third loss selected according to the relationship between the average entropy value and the preset threshold: if the average entropy value is greater than the preset threshold, l_{u,l} is selected as the third loss; if the average entropy value is less than or equal to the preset threshold, l_{kd} is selected as the third loss.
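A sketch of equation (5), assuming the loss terms are PyTorch scalars and reusing the selection logic sketched above:

```python
def total_loss(l_s, l_u_s, l_u_l, l_kd, alpha, avg_entropy, threshold):
    # choice(l_{u,l}, l_{kd}) driven by the average entropy value.
    chosen = l_u_l if avg_entropy > threshold else l_kd
    return l_s + alpha * l_u_s + (1.0 - alpha) * chosen
```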
In the model training method in this embodiment, the text sample data including the first sample, the second sample, and the third sample is used as the training sample data, and the second model and the third model are used to perform multi-level guidance and training on the first model, so that the convergence rate of the first model is faster, and the accuracy of text recognition is higher.
In the model training method of the embodiments of the present application, the first model is guided and trained by the second model and the third model during the training process; that is, the parameters of the first model are continuously updated by gradient descent as training proceeds, until a convergence condition is met. Optionally, to achieve higher training precision for the first model, the second model may be momentum-updated according to the first model during training while the parameters of the third model remain unchanged. In one possible implementation, the second model parameters for the next round of training are determined from the first model parameters and the second model parameters of the current round. Specifically, the second model may be updated as shown in the following equation (6):
T_{i+1} = K · S + (1 − K) · T_i    (6)

where T_i denotes the second model parameters used in the current round of training, T_{i+1} denotes the second model parameters for the next round of training, S denotes the first model parameters obtained in the current round, and K is a constant.
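A sketch of equation (6) as an in-place momentum update over module parameters, assuming PyTorch; the constant K is illustrative:

```python
import torch

@torch.no_grad()
def momentum_update(student: torch.nn.Module, teacher: torch.nn.Module, k: float = 0.01):
    for s_p, t_p in zip(student.parameters(), teacher.parameters()):
        # T_{i+1} = K * S + (1 - K) * T_i
        t_p.mul_(1.0 - k).add_(s_p, alpha=k)
```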
It should be understood that, although the steps in the flowcharts of the embodiments described above are displayed sequentially as indicated by the arrows, they are not necessarily performed in that order. Unless explicitly stated otherwise, the execution of these steps is not strictly ordered, and they may be performed in other orders. Moreover, at least some of the steps in these flowcharts may include multiple sub-steps or stages, which are not necessarily performed at the same moment but may be performed at different moments, and whose execution order is not necessarily sequential; they may be performed in turn or alternately with other steps or with at least part of the sub-steps or stages of other steps.
Based on the same inventive concept, the embodiment of the present application further provides a model training apparatus for implementing the above-mentioned model training method. The implementation scheme for solving the problem provided by the device is similar to the implementation scheme recorded in the method, so the specific limitations in one or more embodiments of the model training device provided below can be referred to the limitations of the model training method in the above, and are not described herein again.
In one embodiment, as shown in fig. 8, there is provided a model training apparatus including: a sample data obtaining module 802, a first loss determining module 804, a second loss determining module 806, a third loss determining module 808, and a loss function determining module 810, wherein:
a sample data obtaining module 802, configured to obtain text sample data;
a first loss determining module 804, configured to input the text sample data into a first model to obtain first sample feature data, and determine a first loss according to the first sample feature data;
a second loss determining module 806, configured to input the text sample data into a second model to obtain second sample feature data, and determine a second loss according to the first sample feature data and the second sample feature data;
a third loss determining module 808, configured to input the text sample data into a third model to obtain third sample feature data, and determine, based on a preset condition, that a third loss is obtained according to a similarity between the first sample feature data and the third sample feature data, or that a third loss is obtained according to a similarity between the third sample feature data and a similarity between the first sample feature data and the third sample feature data;
a loss function determining module 810, configured to determine a loss function according to the first loss, the second loss, and the third loss, where the loss function is used to train the first model.
In one embodiment, the text sample data comprises a first sample, a second sample, and a third sample, the second sample being semantically similar to the first sample and the third sample being semantically opposite to the first sample; the first loss determination module 804 is further configured to:
inputting the text sample data into the first model, and obtaining first target characteristic data according to the first sample;
obtaining second target characteristic data according to the second sample;
obtaining third target characteristic data according to the third sample;
the second loss determination module 806 is further configured to:
inputting the text sample data into the second model, and obtaining fourth target characteristic data according to the first sample;
obtaining fifth target characteristic data according to the second sample;
obtaining sixth target characteristic data according to the third sample;
the third loss determination module 808 is further configured to:
inputting the text sample data into the third model, and obtaining seventh target characteristic data according to the first sample;
obtaining eighth target characteristic data according to the second sample;
and obtaining ninth target characteristic data according to the third sample.
In one embodiment, the first loss determination module 804 is further configured to:
determining a first similarity according to the first target characteristic data and the second target characteristic data;
determining a second similarity according to the first target characteristic data and the third target characteristic data;
and determining the first loss according to the first similarity and the second similarity.
In one embodiment, the model training apparatus further comprises a weight determination module for:
determining a first entropy value according to the first similarity and the second similarity;
determining a first weight corresponding to the second loss and a second weight corresponding to the third loss according to the first entropy;
the loss function determination module 810 is further configured to:
determining the loss function based on the first loss, the second loss, the first weight, the third loss, and the second weight.
In one embodiment, the third loss determination module 808 is further configured to:
determining an average entropy value according to the first similarity and the second similarity;
if the average entropy value is greater than a preset threshold value, determining that the third loss is obtained from the similarity between the first sample characteristic data and the third sample characteristic data;
and if the average entropy value is less than or equal to the preset threshold value, determining that the third loss is obtained from the similarity between the third sample characteristic data and the similarity between the first sample characteristic data and the third sample characteristic data.
In one embodiment, the second loss determination module 806 is further configured to:
determining a third similarity according to the first target characteristic data and the fourth target characteristic data;
determining a fourth similarity according to the first target characteristic data, the fifth target characteristic data and the sixth target characteristic data;
and determining the second loss according to the third similarity and the fourth similarity.
In one embodiment, the third loss determination module 808 is further configured to:
determining a fifth similarity according to the first target characteristic data and the seventh target characteristic data;
determining a sixth similarity according to the first target characteristic data, the eighth target characteristic data and the ninth target characteristic data;
and determining the third loss according to the fifth similarity and the sixth similarity.
In one embodiment, the third loss determination module 808 is further configured to:
determining a seventh similarity according to the third sample characteristic data obtained in the current round and the historical third sample characteristic data;
and determining the third loss according to the seventh similarity, the fifth similarity and the sixth similarity.
The modules in the model training device can be wholly or partially realized by software, hardware and a combination thereof. The modules can be embedded in a hardware form or independent from a processor in the computer device, and can also be stored in a memory in the computer device in a software form, so that the processor can call and execute operations corresponding to the modules.
In one embodiment, a computer device is provided, which may be a server, and its internal structure diagram may be as shown in fig. 9. The computer device includes a processor, a memory, and a network interface connected by a system bus. Wherein the processor of the computer device is configured to provide computing and control capabilities. The memory of the computer device includes a non-volatile storage medium and an internal memory. The non-volatile storage medium stores an operating system, a computer program, and a database. The internal memory provides an environment for the operation of an operating system and computer programs in the non-volatile storage medium. The database of the computer device is used to store sample characteristic data. The network interface of the computer device is used for communicating with an external terminal through a network connection. The computer program is executed by a processor to implement a model training method.
It will be appreciated by those skilled in the art that the configuration shown in fig. 9 is a block diagram of only a portion of the configuration associated with the present application, and is not intended to limit the computing device to which the present application may be applied, and that a particular computing device may include more or fewer components than shown, or may combine certain components, or have a different arrangement of components.
In one embodiment, a computer device is provided, which includes a memory and a processor, wherein the memory stores a computer program, and the processor implements the steps of the above-mentioned model training methods when executing the computer program.
In one embodiment, a computer-readable storage medium is provided, on which a computer program is stored, which, when being executed by a processor, carries out the steps of the above-mentioned model training methods.
In one embodiment, a computer program product is provided, comprising a computer program which, when executed by a processor, performs the steps of the above-described model training methods.
It will be understood by those skilled in the art that all or part of the processes of the methods in the embodiments described above can be implemented by a computer program instructing the relevant hardware; the computer program can be stored in a non-volatile computer-readable storage medium and, when executed, can include the processes of the embodiments of the methods described above. Any reference to memory, database, or other medium used in the embodiments provided herein may include at least one of non-volatile and volatile memory. The non-volatile memory may include read-only memory (ROM), magnetic tape, floppy disk, flash memory, optical memory, high-density embedded non-volatile memory, resistive random access memory (ReRAM), magnetic random access memory (MRAM), ferroelectric random access memory (FRAM), phase change memory (PCM), graphene memory, and the like. The volatile memory may include random access memory (RAM), external cache memory, and the like. By way of illustration and not limitation, RAM can take many forms, such as static random access memory (SRAM) or dynamic random access memory (DRAM). The databases referred to in the various embodiments provided herein may include at least one of relational and non-relational databases. Non-relational databases may include, but are not limited to, blockchain-based distributed databases and the like. The processors referred to in the embodiments provided herein may be general-purpose processors, central processing units, graphics processors, digital signal processors, programmable logic devices, data processing logic devices based on quantum computing, and the like, without limitation.
The technical features of the above embodiments can be arbitrarily combined, and for the sake of brevity, all possible combinations of the technical features in the above embodiments are not described, but should be considered as the scope of the present specification as long as there is no contradiction between the combinations of the technical features.
The above-mentioned embodiments only express several embodiments of the present application, and the description thereof is more specific and detailed, but not construed as limiting the scope of the present application. It should be noted that, for a person skilled in the art, several variations and modifications can be made without departing from the concept of the present application, which falls within the scope of protection of the present application. Therefore, the protection scope of the present application shall be subject to the appended claims.

Claims (12)

1. A method of model training, the method comprising:
acquiring text sample data;
inputting the text sample data into a first model to obtain first sample characteristic data, and determining a first loss according to the first sample characteristic data;
inputting the text sample data into a second model to obtain second sample characteristic data, and determining a second loss according to the first sample characteristic data and the second sample characteristic data;
inputting the text sample data into a third model to obtain third sample characteristic data, and determining that a third loss is obtained according to the similarity between the first sample characteristic data and the third sample characteristic data or according to the similarity between the third sample characteristic data and the similarity between the first sample characteristic data and the third sample characteristic data based on a preset condition;
determining a loss function according to the first loss, the second loss, and the third loss, the loss function being used to train the first model.
2. The method of claim 1, wherein the text sample data comprises a first sample, a second sample, and a third sample, the second sample being semantically similar to the first sample and the third sample being semantically opposite to the first sample; inputting the text sample data into a first model to obtain first sample characteristic data comprises:
inputting the text sample data into the first model, and obtaining first target characteristic data according to the first sample;
obtaining second target characteristic data according to the second sample;
obtaining third target characteristic data according to the third sample;
inputting the text sample data into a second model to obtain second sample characteristic data, wherein the second sample characteristic data comprises:
inputting the text sample data into the second model, and obtaining fourth target characteristic data according to the first sample;
obtaining fifth target characteristic data according to the second sample;
obtaining sixth target characteristic data according to the third sample;
inputting the text sample data into a third model to obtain third sample characteristic data, wherein the third sample characteristic data comprises:
inputting the text sample data into the third model, and obtaining seventh target characteristic data according to the first sample;
obtaining eighth target characteristic data according to the second sample;
and obtaining ninth target characteristic data according to the third sample.
3. The method of claim 2, wherein determining a first loss from the first sample characteristic data comprises:
determining a first similarity according to the first target characteristic data and the second target characteristic data;
determining a second similarity according to the first target characteristic data and the third target characteristic data;
and determining the first loss according to the first similarity and the second similarity.
4. The method of claim 3, further comprising:
determining a first entropy value according to the first similarity and the second similarity;
determining a first weight corresponding to the second loss and a second weight corresponding to the third loss according to the first entropy;
said determining a loss function from said first loss, said second loss, and said third loss comprises:
determining the loss function based on the first loss, the second loss, the first weight, the third loss, and the second weight.
5. The method according to claim 3, wherein the determining that a third loss is obtained by the similarity between the first sample feature data and the third sample feature data or by the similarity between the third sample feature data and the similarity between the first sample feature data and the third sample feature data based on a preset condition comprises:
determining an average entropy value according to the first similarity and the second similarity;
if the average entropy value is greater than a preset threshold value, determining that the third loss is obtained from the similarity between the first sample characteristic data and the third sample characteristic data;
and if the average entropy value is less than or equal to the preset threshold value, determining that the third loss is obtained from the similarity between the third sample characteristic data and the similarity between the first sample characteristic data and the third sample characteristic data.
6. The method of claim 2, wherein determining a second loss from the first sample feature data and the second sample feature data comprises:
determining a third similarity according to the first target characteristic data and the fourth target characteristic data;
determining a fourth similarity according to the first target characteristic data, the fifth target characteristic data and the sixth target characteristic data;
and determining the second loss according to the third similarity and the fourth similarity.
7. The method of claim 2, wherein the deriving a third loss from the similarity of the first sample feature data and the third sample feature data comprises:
determining a fifth similarity according to the first target characteristic data and the seventh target characteristic data;
determining a sixth similarity according to the first target characteristic data, the eighth target characteristic data and the ninth target characteristic data;
and determining the third loss according to the fifth similarity and the sixth similarity.
8. The method of claim 7, wherein the deriving a third loss from the similarity between the third sample feature data and the similarity between the first sample feature data and the third sample feature data comprises:
determining a seventh similarity according to the third sample characteristic data obtained in the current round and historical third sample characteristic data;
and determining the third loss according to the seventh similarity, the fifth similarity and the sixth similarity.
9. A model training apparatus, the apparatus comprising:
the sample data acquisition module is used for acquiring text sample data;
the first loss determining module is used for inputting the text sample data into a first model to obtain first sample characteristic data and determining first loss according to the first sample characteristic data;
a second loss determining module, configured to input the text sample data into a second model to obtain second sample feature data, and determine a second loss according to the first sample feature data and the second sample feature data;
a third loss determining module, configured to input the text sample data into a third model to obtain third sample feature data, and determine, based on a preset condition, that a third loss is obtained according to a similarity between the first sample feature data and the third sample feature data, or that a third loss is obtained according to a similarity between the third sample feature data and a similarity between the first sample feature data and the third sample feature data;
a loss function determination module configured to determine a loss function according to the first loss, the second loss, and the third loss, where the loss function is used to train the first model.
10. A computer device comprising a memory and a processor, the memory storing a computer program, characterized in that the processor, when executing the computer program, implements the steps of the method of any of claims 1 to 8.
11. A computer-readable storage medium, on which a computer program is stored, which, when being executed by a processor, carries out the steps of the method of any one of claims 1 to 8.
12. A computer program product comprising a computer program, characterized in that the computer program realizes the steps of the method of any one of claims 1 to 8 when executed by a processor.
CN202210375330.3A 2022-04-11 2022-04-11 Model training method and device, computer equipment and storage medium Pending CN114861671A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210375330.3A CN114861671A (en) 2022-04-11 2022-04-11 Model training method and device, computer equipment and storage medium

Publications (1)

Publication Number Publication Date
CN114861671A true CN114861671A (en) 2022-08-05

Family

ID=82629006

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210375330.3A Pending CN114861671A (en) 2022-04-11 2022-04-11 Model training method and device, computer equipment and storage medium

Country Status (1)

Country Link
CN (1) CN114861671A (en)

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113191478A (en) * 2020-01-14 2021-07-30 阿里巴巴集团控股有限公司 Training method, device and system of neural network model
CN111582500A (en) * 2020-05-07 2020-08-25 支付宝(杭州)信息技术有限公司 Method and system for improving model training effect
CN111950302A (en) * 2020-08-20 2020-11-17 上海携旅信息技术有限公司 Knowledge distillation-based machine translation model training method, device, equipment and medium
CN113111968A (en) * 2021-04-30 2021-07-13 北京大米科技有限公司 Image recognition model training method and device, electronic equipment and readable storage medium
CN113505797A (en) * 2021-09-09 2021-10-15 深圳思谋信息科技有限公司 Model training method and device, computer equipment and storage medium

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115577781A (en) * 2022-09-28 2023-01-06 北京百度网讯科技有限公司 Quantum relative entropy determination method, device, equipment and storage medium
CN117591888A (en) * 2024-01-17 2024-02-23 北京交通大学 Cluster autonomous learning fault diagnosis method for key parts of train
CN117591888B (en) * 2024-01-17 2024-04-12 北京交通大学 Cluster autonomous learning fault diagnosis method for key parts of train


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination