CN112801178A - Model training method, device, equipment and computer readable medium - Google Patents


Info

Publication number
CN112801178A
CN112801178A (application number CN202110105965.7A)
Authority
CN
China
Prior art keywords
model
training sample
training
parameters
recognition accuracy
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202110105965.7A
Other languages
Chinese (zh)
Other versions
CN112801178B (en)
Inventor
翟步中
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shanghai Minglue Artificial Intelligence Group Co Ltd
Original Assignee
Shanghai Minglue Artificial Intelligence Group Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shanghai Minglue Artificial Intelligence Group Co Ltd filed Critical Shanghai Minglue Artificial Intelligence Group Co Ltd
Priority to CN202110105965.7A priority Critical patent/CN112801178B/en
Publication of CN112801178A publication Critical patent/CN112801178A/en
Application granted granted Critical
Publication of CN112801178B publication Critical patent/CN112801178B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 18/00 Pattern recognition
    • G06F 18/20 Analysing
    • G06F 18/24 Classification techniques
    • G06F 18/241 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06F 18/2415 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches based on parametric or probabilistic models, e.g. based on likelihood ratio or false acceptance rate versus a false rejection rate
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 18/00 Pattern recognition
    • G06F 18/20 Analysing
    • G06F 18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F 18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Artificial Intelligence (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Biology (AREA)
  • Evolutionary Computation (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Probability & Statistics with Applications (AREA)
  • Image Analysis (AREA)

Abstract

The application relates to a model training method, device, equipment and computer readable medium. The method comprises the following steps: acquiring a first training sample and a second training sample, wherein the sampling probability of the first training sample during data acquisition is greater than that of the second training sample; training a first model with the first training sample to adjust the initialization parameters of the first model and obtain a second model; and training the second model with the first training sample and the second training sample, so that the second model serves as a pre-training model whose parameters are adjusted through training samples of different sampling probabilities, to obtain a third model. The adjustment range of the parameters of the second model is smaller than the adjustment range of the initialization parameters of the first model, the recognition accuracy of the third model on objects of a target category is greater than that of the second model on objects of the target category, and the second training sample belongs to the target category. This solves the technical problem of low recognition accuracy for uncommon articles.

Description

Model training method, device, equipment and computer readable medium
Technical Field
The present application relates to the field of intelligent recognition technologies, and in particular, to a model training method, apparatus, device, and computer readable medium.
Background
When deep learning is used to train an object-recognition model, a large number of pictures of objects are required as training samples. Many articles are very common in real life, so data acquisition is convenient and manual labeling can be done quickly. For uncommon articles, however, the amount of available data is small, so the recognition accuracy of the trained recognition model on uncommon articles is low.
At present, in the related art, a recognition model may be trained with balanced training samples, that is, the number of samples of common articles is chosen to match the number of samples of uncommon articles; alternatively, a large number of additional samples of uncommon articles are collected, or unbalanced samples are used directly for training. These approaches either yield low recognition accuracy for uncommon articles or require a large amount of labor and time.
No effective solution has yet been proposed for the problem of low recognition accuracy for uncommon articles.
Disclosure of Invention
The application provides a model training method, device, equipment and computer readable medium, which are used to solve the technical problem of low recognition accuracy for uncommon articles.
According to an aspect of an embodiment of the present application, there is provided a model training method, including:
acquiring a first training sample and a second training sample, wherein the sampling probability of the first training sample in data acquisition is greater than that of the second training sample;
training the first model by using the first training sample to adjust the initialization parameters of the first model to obtain a second model;
and training the second model with the first training sample and the second training sample, so that the second model serves as a pre-training model whose parameters are adjusted through training samples of different sampling probabilities, to obtain a third model, wherein the adjustment range of the parameters of the second model is smaller than the adjustment range of the initialization parameters of the first model, the recognition accuracy of the third model on objects of a target category is greater than that of the second model on objects of the target category, and the second training sample belongs to the target category. A minimal sketch of these training steps is given below.
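For readability only, the following is a minimal PyTorch sketch of the two training stages listed above (splitting the collected data into the two sample sets is sketched later, in the detailed description). The toy data, network shape, learning rates and all names (train_stage, head_samples, tail_samples and so on) are illustrative assumptions and not part of the application.

import torch
from torch import nn

def train_stage(model, samples, lr, epochs=1):
    """Generic supervised training loop; 'samples' is a list of (image, label) pairs."""
    opt = torch.optim.SGD(model.parameters(), lr=lr)
    loss_fn = nn.CrossEntropyLoss()
    for _ in range(epochs):
        for x, y in samples:
            opt.zero_grad()
            loss_fn(model(x.unsqueeze(0)), y.unsqueeze(0)).backward()
            opt.step()
    return model

# Hypothetical toy data: flattened random "images"; categories 0-2 are common, 3-13 uncommon.
head_samples = [(torch.randn(3 * 32 * 32), torch.tensor(c)) for c in range(3) for _ in range(20)]
tail_samples = [(torch.randn(3 * 32 * 32), torch.tensor(c)) for c in range(3, 14) for _ in range(2)]

# First model: a randomly initialized network (its parameters are the initialization parameters).
first_model = nn.Linear(3 * 32 * 32, 14)

# Stage 1: train on the first training sample (high sampling probability) to obtain the second model.
second_model = train_stage(first_model, head_samples, lr=1e-2)

# Stage 2: use the second model as the pre-training model and adjust its parameters with samples of
# both sampling probabilities; the smaller learning rate keeps the adjustment range of its parameters
# smaller than the adjustment range of the initialization parameters in stage 1.
third_model = train_stage(second_model, head_samples + tail_samples, lr=1e-3)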
Optionally, training the first model with the first training sample to adjust the initialization parameters of the first model and obtain the second model includes:
initializing the parameters in each network layer of the first model by means of the first training sample to obtain the initialization parameters, and adjusting the initialization parameters according to the difference between the first model's recognition result on the first training sample and the pre-labeled classification result of the first training sample;
taking the first model as the second model when the recognition accuracy of the first model on test data reaches a first threshold, wherein the test data belongs to at least one of the data categories included in the first training sample;
and, when the recognition accuracy of the first model on the test data does not reach the first threshold, continuing to train the first model with the first training sample so as to keep adjusting the values of the parameters in each network layer of the first model until the recognition accuracy of the first model on the test data reaches the first threshold.
Optionally, training the second model with the first training sample and the second training sample so that the second model serves as a pre-training model, and adjusting the parameters of the second model through training samples of different sampling probabilities to obtain the third model, includes:
extracting the same number of training samples from the first training sample and the second training sample to form a third training sample, wherein the third training sample comprises both training samples extracted from the first training sample and training samples extracted from the second training sample;
dividing the third training sample into a support set and a query set;
inputting the support set into the second model for training, and adjusting the parameters of the second model according to the difference between the second model's recognition result on the support set and the pre-labeled classification result of the support set;
taking the second model as the third model when the recognition accuracy of the second model on the query set reaches a second threshold;
and, when the recognition accuracy of the second model on the query set does not reach the second threshold, continuing to train the second model with the query set so as to keep adjusting the values of the parameters in each network layer of the second model until the recognition accuracy of the second model on the query set reaches the second threshold.
Optionally, after the third model is obtained, the method further includes:
randomly mixing the first training sample and the second training sample to obtain a fourth training sample;
inputting the fourth training sample into the third model for training, and adjusting the parameters of the third model according to the difference between the third model's recognition result on the fourth training sample and the pre-labeled classification result of the fourth training sample;
taking the third model as a fourth model when the recognition accuracy of the third model on test data reaches a third threshold, wherein the recognition accuracy of the fourth model on objects of the target category is greater than that of the third model on objects of the target category, and the categories of the objects indicated by the test data include the target category;
and, when the recognition accuracy of the third model on the test data does not reach the third threshold, continuing to train the third model with the fourth training sample so as to keep adjusting the values of the parameters in each network layer of the third model until the recognition accuracy of the third model on the test data reaches the third threshold, wherein the adjustment range of the parameters of the third model is smaller than that of the parameters of the second model.
Optionally, adjusting the values of the parameters in the network layers of the third model until the recognition accuracy of the third model on the test data reaches the third threshold comprises:
determining a loss value with a target loss function, wherein the loss value represents the difference between the third model's recognition result on the test data and the actual class labels of the test data;
and adjusting the parameters of the third model with the loss value until the output precision of the third model reaches the third threshold.
Optionally, adjusting the values of the parameters in the network layers of the third model further includes:
determining the gradient of the target loss function;
and taking the product of a parameter of the third model and the gradient as the new value of that parameter (a minimal sketch of this update rule follows this list).
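The update rule in this optional step differs from ordinary gradient descent: the new parameter is the product of the old parameter and the gradient, rather than the old parameter minus a scaled gradient. Below is a minimal sketch of that rule exactly as stated, with hypothetical names, assuming a PyTorch model and a scalar loss.

import torch

def product_update(model, loss):
    """Replace every parameter with the product of the parameter and its gradient,
    as described in this optional step (not the usual gradient-descent subtraction)."""
    loss.backward()
    with torch.no_grad():
        for p in model.parameters():
            if p.grad is not None:
                p.mul_(p.grad)      # new parameter = old parameter * gradient
                p.grad.zero_()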
Optionally, obtaining the first training sample and the second training sample comprises:
determining a data acquisition range;
collecting target data within a data collection range;
classifying the target data according to a preset category, and determining the proportion of each category of data in the target data to obtain the sampling probability of each category of data;
and taking the data with the sampling probability greater than a preset threshold value as a first training sample, and taking the data with the sampling probability less than or equal to the preset threshold value as a second training sample.
According to another aspect of the embodiments of the present application, there is provided a model training apparatus, including:
the training sample acquisition module is used for acquiring a first training sample and a second training sample, wherein the sampling probability of the first training sample during data acquisition is greater than that of the second training sample;
the first training module is used for training the first model by using the first training sample so as to adjust the initialization parameters of the first model and obtain a second model;
the second training module is used for training the second model with the first training sample and the second training sample so that the second model serves as a pre-training model, and for adjusting the parameters of the second model through training samples of different sampling probabilities to obtain a third model, wherein the adjustment range of the parameters of the second model is smaller than the adjustment range of the initialization parameters of the first model, the recognition accuracy of the third model on objects of the target category is greater than that of the second model on objects of the target category, and the category of the second training sample includes the target category.
According to another aspect of the embodiments of the present application, there is provided an electronic device, including a memory, a processor, a communication interface, and a communication bus, where the memory stores a computer program executable on the processor, and the memory and the processor communicate with each other through the communication bus and the communication interface, and the processor implements the steps of the method when executing the computer program.
According to another aspect of embodiments of the present application, there is also provided a computer readable medium having non-volatile program code executable by a processor, the program code causing the processor to perform the above-mentioned method.
Compared with the related art, the technical scheme provided by the embodiment of the application has the following advantages:
the technical scheme includes that a first training sample and a second training sample are obtained, and the sampling probability of the first training sample is greater than that of the second training sample during data acquisition; training the first model by using the first training sample to adjust the initialization parameters of the first model to obtain a second model; the method comprises the steps of training a second model by utilizing a first training sample and a second training sample to enable the second model to serve as a pre-training model, adjusting parameters of the second model through the training samples with different sampling probabilities to obtain a third model, wherein the adjustment range of the parameters of the second model is smaller than the adjustment range of initialization parameters of the first model, the recognition accuracy of the third model on objects of a target category is larger than that of the second model on the objects of the target category, and the second training sample belongs to the target category. According to the method and the device, the second model obtained by training the first training sample is used as the pre-training model, the parameters of the model are further adjusted by utilizing the training samples with different sampling probabilities, so that the identification accuracy of the unusual articles can be improved on the basis of meeting the accurate identification of most articles by the final model, and the technical problem of low identification accuracy of the unusual articles is solved.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the present application and together with the description, serve to explain the principles of the application.
In order to more clearly illustrate the technical solutions in the embodiments of the present application or in the related art, the drawings needed in the description of the embodiments or of the related art are briefly described below; it is obvious that those skilled in the art can obtain other drawings from these drawings without creative effort.
FIG. 1 is a diagram illustrating an alternative hardware environment for a model training method according to an embodiment of the present disclosure;
FIG. 2 is a flow chart of an alternative model training method provided in accordance with an embodiment of the present application;
FIG. 3 is a schematic view of a long tail distribution;
FIG. 4 is a schematic diagram of an alternative model training scheme provided in accordance with an embodiment of the present application;
FIG. 5 is a schematic diagram of an alternative model training scheme provided in accordance with an embodiment of the present application;
FIG. 6 is a schematic diagram of an alternative model training scheme provided in accordance with an embodiment of the present application;
FIG. 7 is a block diagram of an alternative model training apparatus provided in accordance with an embodiment of the present application;
fig. 8 is a schematic structural diagram of an alternative electronic device according to an embodiment of the present application.
Detailed Description
In order to make the objects, technical solutions and advantages of the embodiments of the present application clearer, the technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application, and it is obvious that the described embodiments are some embodiments of the present application, but not all embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present application.
In the following description, suffixes such as "module", "component", or "unit" used to denote elements are used only for convenience of description and have no specific meaning in themselves. Thus, "module" and "component" may be used interchangeably.
In the related art, when training a recognition model, balanced training samples may be used, that is, the number of samples of common articles is chosen to match the number of samples of uncommon articles; alternatively, a large number of additional samples of uncommon articles are collected, or unbalanced samples are used directly for training. These approaches either yield low recognition accuracy for uncommon articles or require a large amount of labor and time.
To address the problems mentioned in the background, according to an aspect of embodiments of the present application, an embodiment of a model training method is provided.
Alternatively, in the embodiment of the present application, the model training method may be applied to a hardware environment formed by the terminal 101 and the server 103 as shown in fig. 1. As shown in fig. 1, the server 103 is connected to the terminal 101 through a network and may be used to provide services for the terminal or for a client installed on the terminal. A database 105 may be provided on the server or separately from the server to provide data storage services for the server 103. The network includes, but is not limited to, a wide area network, a metropolitan area network, or a local area network, and the terminal 101 includes, but is not limited to, a PC, a mobile phone, a tablet computer, and the like.
A model training method in this embodiment of the present application may be executed by the server 103, or may be executed by both the server 103 and the terminal 101, as shown in fig. 2, the method may include the following steps:
step S202, a first training sample and a second training sample are obtained, and the sampling probability of the first training sample during data acquisition is greater than that of the second training sample.
The model training method provided by the embodiment of the application can train a neural network model for identifying articles, and in particular can accurately identify uncommon articles. Most articles in real life follow a long-tail distribution, as shown in fig. 3. For example, song and software downloads, web page clicks, and sales at online stores on the internet all exhibit the features of a long-tail distribution. The long-tail distribution is also related to the "leaderboard" culture of statistically ranking popular items. An online music database has a huge capacity and a very convenient download mechanism. Sorting the tracks by download count yields an approximately decreasing curve. At the head of the curve, popular tracks are downloaded in large numbers. Then, as popularity decreases, the curve falls off sharply; yet it does not drop to zero at the tail, but instead approaches the horizontal axis very slowly, its tail extending almost parallel to the axis (which indicates that even very unpopular tracks still maintain a certain download rate). The correspondence between this ordering (i.e., the ranking) and the download count is a long-tail distribution. In this embodiment, articles in category 1, category 2, and category 3 may be referred to as common articles, and the image data acquired for articles in these categories constitutes the first training sample; articles in categories 4 through 14 may be referred to as uncommon articles, and the corresponding acquired image data constitutes the second training sample.
In the embodiment of the application, the first training sample is data collected for articles that are relatively common in real life, and the second training sample is data collected for articles that are uncommon. Taking animals as an example, the first training sample is image data collected for poultry and common wildlife, and the second training sample is image data collected for rare animals. Taking vehicles as an example, the first training sample is image data collected for bicycles, motorcycles, automobiles, airplanes, ships and trains, and the second training sample is image data collected for space shuttles, aircraft carriers, rockets and the like. The first training sample is easy to obtain and the second training sample is difficult to obtain, so the data size of the first training sample is far larger than that of the second training sample.
Step S204, training the first model by using the first training sample to adjust the initialization parameters of the first model to obtain a second model.
In the embodiment of the application, the first model can be trained by using the image data of a large number of common articles, so that an initial model with high identification accuracy on the common articles is obtained by using big data. The parameters of the first model are randomly initialized.
Step S206, training a second model by using a first training sample and a second training sample to take the second model as a pre-training model, adjusting parameters of the second model by using training samples with different sampling probabilities to obtain a third model, wherein the adjustment range of the parameters of the second model is smaller than that of the initial parameters of the first model, the recognition accuracy of the third model to the object of the target category is greater than that of the second model to the object of the target category, and the second training sample belongs to the target category.
In the embodiment of the application, the above second model can be used as the pre-training model and further trained with the first training sample and the second training sample, so that the second model learns to recognize uncommon articles on the basis of its already good parameters, yielding the third model. During training, the parameters of the second model are fine-tuned according to the real-time training results, so that the resulting third model not only keeps high recognition accuracy for common articles, but also improves recognition accuracy for uncommon articles.
Optionally, step S204 of training the first model with the first training sample to adjust the initialization parameters of the first model and obtain the second model includes the following steps:
step 11, initializing the parameters in each network layer of the first model by means of the first training sample to obtain the initialization parameters, and adjusting the initialization parameters according to the difference between the first model's recognition result on the first training sample and the pre-labeled classification result of the first training sample;
step 12, taking the first model as the second model when the recognition accuracy of the first model on test data reaches a first threshold, wherein the test data belongs to at least one of the data categories included in the first training sample; and, when the recognition accuracy of the first model on the test data does not reach the first threshold, continuing to train the first model with the first training sample so as to keep adjusting the values of the parameters in each network layer of the first model until the recognition accuracy of the first model on the test data reaches the first threshold.
In the embodiment of the application, the parameters in each network layer of the first model are randomly initialized before training with the first training sample. The difference between the first model's recognition result on the first training sample and the pre-labeled classification result of the first training sample may be determined by a loss function. The target category mentioned above is a category of uncommon articles.
In this embodiment of the application, the first training sample may include image data of articles in multiple categories, so the test data used to verify the first model belongs to at least one of the categories included in the first training sample. To reflect the recognition accuracy of the model more faithfully, a certain amount of image data may be collected for each category as test data. The first threshold may be set as needed, for example according to experimental results. When the recognition accuracy of the first model on the test data reaches the first threshold, the current first model can recognize objects accurately, i.e. it has obtained good parameters, and it is then used as the second model for subsequent further training.
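As an illustration only, the threshold check described here can be performed with a small helper such as the following; the function and variable names are hypothetical, and test_data is assumed to be a list of (image, label) pairs as in the earlier sketch.

import torch

def recognition_accuracy(model, test_data):
    """Fraction of test items whose predicted category matches the labeled category."""
    model.eval()
    correct = 0
    with torch.no_grad():
        for x, y in test_data:
            predicted = model(x.unsqueeze(0)).argmax(dim=1).item()
            correct += int(predicted == int(y))
    return correct / max(len(test_data), 1)

# Training continues until the first threshold (set as needed, e.g. experimentally) is reached:
# while recognition_accuracy(first_model, test_data) < first_threshold:
#     ...continue training on the first training sample...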
In this embodiment of the application, training the first model with the first training sample so as to keep adjusting the values of the parameters in each network layer until the recognition accuracy on the test data reaches the first threshold may include: determining a pre-training loss value with a target loss function, wherein the pre-training loss value represents the difference between the first model's recognition result on the test data and the actual class labels of the test data; and adjusting the parameters of the first model with the pre-training loss value until the output precision of the first model reaches the first threshold. The target loss function may be a softmax function, a ReLU function, or the like. Alternatively, when adjusting the parameters, the gradient can be calculated while training the first model with the first training sample, and the product of the gradient and the original parameter is taken as the new parameter.
In this embodiment of the present application, the initialization parameters may also be adjusted in this way according to the difference between the first model's recognition result on the first training sample and the pre-labeled classification result of the first training sample.
In the embodiment of the present application, a schematic diagram of step S204 is shown in fig. 4: the initialization parameter of the network structure of the first model is θ0, the first training sample is the easy sample, the gradient g is calculated during training, and θ0 is updated with the gradient g to obtain θ1.
Optionally, in step S206, training the second model with the first training sample and the second training sample so that the second model serves as a pre-training model, and adjusting the parameters of the second model through training samples of different sampling probabilities to obtain the third model, may include the following steps:
step 21, extracting the same number of training samples from the first training sample and the second training sample to form a third training sample, wherein the third training sample comprises both training samples extracted from the first training sample and training samples extracted from the second training sample;
step 22, dividing the third training sample into a support set and a query set;
step 23, inputting the support set into the second model for training, and adjusting the parameters of the second model according to the difference between the second model's recognition result on the support set and the pre-labeled classification result of the support set;
step 24, taking the second model as the third model when the recognition accuracy of the second model on the query set reaches a second threshold; and, when the recognition accuracy of the second model on the query set does not reach the second threshold, continuing to train the second model with the query set so as to keep adjusting the values of the parameters in each network layer of the second model until the recognition accuracy of the second model on the query set reaches the second threshold.
In the embodiment of the application, the second model can be further trained in a balanced-training-sample mode. Since the second model already has good model parameters, training samples of uncommon articles are added in this further training so that the parameters of the second model are fine-tuned and the trained third model can learn and adapt to recognizing uncommon articles.
In the embodiment of the application, the same number of training samples can be extracted from the first training sample and the second training sample to form a third training sample, and the third training sample is divided into a support set and a query set. The support set is used for training, and the query set is used to verify the effect of the support-set training on the second model; that is, the query set serves as test data for measuring the recognition accuracy of the second model. If that accuracy reaches the second threshold, the current second model is taken as the third model, which indicates that the third model can accurately recognize uncommon articles; if it does not reach the second threshold, the query set is further used to train the second model until the recognition accuracy of the second model on the query set reaches the second threshold. The parameters of the second model are fine-tuned during this training, and the way of adjusting the parameters may include: determining a further-training loss value with the target loss function, the loss value representing the difference between the second model's recognition result on the query set (or support set) and the actual class labels of the query set (or support set); and adjusting the parameters of the second model with this loss value until the output precision of the second model reaches the second threshold. Alternatively, when adjusting the parameters, the gradient can be calculated while training the second model with the support set and the query set, and the product of the gradient and the original parameter is taken as the new parameter.
The second threshold may be set as needed, and may be set according to an experimental result.
In the embodiment of the present application, a schematic diagram of step S206 is shown in fig. 5: the parameter of the network structure of the second model is θ1; the second model is trained with the support set, the gradient g is calculated during training, and θ1 is updated with the gradient g to obtain θ2. On the basis of the support-set training, the second model can be trained with the query set, the gradient g is calculated during training, and θ2 is updated with the gradient g to obtain θ3.
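For illustration, a minimal sketch of this balanced stage under the same hypothetical names as the earlier sketches (head_samples, tail_samples, second_model, recognition_accuracy); the episode size, learning rate and 50/50 support/query split are assumptions, not requirements of the application.

import random
import torch
from torch import nn

def balanced_episode(head_samples, tail_samples, n_per_source):
    """Draw the same number of samples from the common and uncommon pools to form the
    third training sample, then split it into a support set and a query set."""
    third = random.sample(head_samples, n_per_source) + random.sample(tail_samples, n_per_source)
    random.shuffle(third)
    half = len(third) // 2
    return third[:half], third[half:]          # support set, query set

def fine_tune(model, samples, lr=1e-3):
    """Small-step fine-tuning, so the adjustment range of the parameters stays small."""
    opt = torch.optim.SGD(model.parameters(), lr=lr)
    loss_fn = nn.CrossEntropyLoss()
    for x, y in samples:
        opt.zero_grad()
        loss_fn(model(x.unsqueeze(0)), y.unsqueeze(0)).backward()
        opt.step()
    return model

# support, query = balanced_episode(head_samples, tail_samples, n_per_source=10)
# third_model = fine_tune(second_model, support)              # theta1 -> theta2
# if recognition_accuracy(third_model, query) < second_threshold:
#     third_model = fine_tune(third_model, query)             # theta2 -> theta3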
In order to further adapt the model to the identification of different articles, after obtaining the third model, the method may further comprise the steps of:
step 31, randomly mixing the first training sample and the second training sample to obtain a fourth training sample;
step 32, inputting the fourth training sample into the third model for training, and adjusting the parameters of the third model according to the difference between the third model's recognition result on the fourth training sample and the pre-labeled classification result of the fourth training sample;
step 33, taking the third model as a fourth model when the recognition accuracy of the third model on test data reaches a third threshold, wherein the recognition accuracy of the fourth model on objects of the target category is greater than that of the third model on objects of the target category, and the categories of the objects indicated by the test data include the target category; and, when the recognition accuracy of the third model on the test data does not reach the third threshold, continuing to train the third model with the fourth training sample so as to keep adjusting the values of the parameters in each network layer of the third model until the recognition accuracy of the third model on the test data reaches the third threshold, wherein the adjustment range of the parameters of the third model is smaller than that of the parameters of the second model.
In the embodiment of the application, the third model may be further trained in an unbalanced-training-sample mode. Since the third model can already recognize uncommon articles reasonably well, all training samples are used in this further training as an unbalanced training method: a large number of common articles act as interference, and the model is trained to recognize uncommon articles within that interference, so that the parameters of the third model are further fine-tuned and the recognition accuracy of the trained fourth model is further improved.
In the embodiment of the application, all the first training samples and the second training samples may be randomly mixed to form a fourth training sample, which is used to train the third model. When the recognition accuracy of the third model on the test data reaches the third threshold, the current third model is taken as the fourth model, whose recognition accuracy on uncommon articles is greater than that of the third model; the test data comprise image data collected for uncommon articles, and the target category comprises the categories of the uncommon articles.
Otherwise, the third model continues to be trained with the fourth training sample until its recognition accuracy on the test data reaches the third threshold. The parameters of the third model are further fine-tuned during this training, and the way of adjusting the parameters may include: determining a loss value with the target loss function, the loss value representing the difference between the third model's recognition result on the test data and the actual class labels of the test data; and adjusting the parameters of the third model with the loss value until the output precision of the third model reaches the third threshold. Alternatively, when adjusting the parameters, the gradient can be calculated while training the third model with the fourth training sample, and the product of the gradient and the original parameter is taken as the new parameter.
The third threshold may be set as needed, and may be set according to an experimental result.
In the embodiment of the present application, a schematic diagram of training the third model with all training samples is shown in fig. 6: the parameter of the network structure of the third model is θ3; the gradient g is calculated during training, and θ3 is updated with the gradient g to obtain θ4.
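A minimal sketch of this unbalanced stage, again under hypothetical names (head_samples, tail_samples, third_model, recognition_accuracy) and with an assumed, even smaller learning rate so that the adjustment range of the third model's parameters is smaller than that of the second model.

import random
import torch
from torch import nn

def mixed_fine_tune(model, head_samples, tail_samples, lr=1e-4):
    """Randomly mix all common and uncommon samples into the fourth training sample
    and fine-tune the third model on it with a very small step (theta3 -> theta4)."""
    fourth_sample = head_samples + tail_samples
    random.shuffle(fourth_sample)
    opt = torch.optim.SGD(model.parameters(), lr=lr)
    loss_fn = nn.CrossEntropyLoss()
    for x, y in fourth_sample:
        opt.zero_grad()
        loss_fn(model(x.unsqueeze(0)), y.unsqueeze(0)).backward()
        opt.step()
    return model

# fourth_model = mixed_fine_tune(third_model, head_samples, tail_samples)
# repeated until recognition_accuracy(fourth_model, test_data) reaches the third threshold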
Optionally, obtaining the first training sample and the second training sample may include:
determining a data acquisition range; collecting target data within a data collection range; classifying the target data according to a preset category, and determining the proportion of each category of data in the target data to obtain the sampling probability of each category of data; and taking the data with the sampling probability greater than a preset threshold value as a first training sample, and taking the data with the sampling probability less than or equal to the preset threshold value as a second training sample.
In the embodiment of the present application, the data acquisition range may be the field where the model is finally applied, such as buildings, animals, plants, vehicles, home appliances, and the like. The sampling probability of each category of data is calculated from the amount of collected data, and the first training sample and the second training sample are divided accordingly.
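A minimal standard-library sketch of this split; the (data, category) pairing of the collected items and the preset probability threshold are illustrative assumptions.

from collections import Counter

def split_by_sampling_probability(samples, threshold=0.05):
    """Compute each category's proportion of the collected target data (its sampling
    probability) and split the data: categories whose probability exceeds the preset
    threshold form the first training sample, the rest form the second training sample."""
    counts = Counter(category for _, category in samples)
    total = len(samples)
    probability = {c: n / total for c, n in counts.items()}
    first = [s for s in samples if probability[s[1]] > threshold]
    second = [s for s in samples if probability[s[1]] <= threshold]
    return first, second, probability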
According to still another aspect of an embodiment of the present application, as shown in fig. 7, there is provided a model training apparatus including:
a training sample obtaining module 701, configured to obtain a first training sample and a second training sample, where a sampling probability of the first training sample during data acquisition is greater than a sampling probability of the second training sample;
a first training module 703, configured to train the first model by using the first training sample to adjust an initialization parameter of the first model, so as to obtain a second model;
the second training module 705 is configured to train a second model by using the first training sample and the second training sample, to use the second model as a pre-training model, and adjust parameters of the second model by using training samples with different sampling probabilities to obtain a third model, where an adjustment range of the parameters of the second model is smaller than an adjustment range of initialization parameters of the first model, a recognition accuracy of the third model on an object of a target category is greater than a recognition accuracy of the second model on an object of the target category, and a category of the second training sample includes the target category.
It should be noted that the training sample obtaining module 701 in this embodiment may be configured to execute step S202 in this embodiment, the first training module 703 in this embodiment may be configured to execute step S204 in this embodiment, and the second training module 705 in this embodiment may be configured to execute step S206 in this embodiment.
It should be noted here that the modules described above are the same as the examples and application scenarios implemented by the corresponding steps, but are not limited to the disclosure of the above embodiments. It should be noted that the modules described above as a part of the apparatus may operate in a hardware environment as shown in fig. 1, and may be implemented by software or hardware.
Optionally, the first training module is specifically configured to:
initializing parameters in each network layer in the first model through the first training sample to obtain initialization parameters, and adjusting the initialization parameters according to the difference between the recognition result of the first training sample and the pre-labeled classification result of the first training sample by the first model;
under the condition that the recognition accuracy of the first model on the test data reaches a first threshold value, taking the first model as a second model, wherein the test data belongs to at least one of data categories included in the first training sample;
and under the condition that the recognition accuracy of the first model to the test data does not reach a first threshold value, continuing to train the first model by using the first training sample so as to continuously adjust the numerical values of the parameters in each network layer in the first model until the recognition accuracy of the first model to the test data reaches the first threshold value.
Optionally, the second training module is specifically configured to:
extracting the same number of training samples from the first training sample and the second training sample to form a third training sample, wherein each training sample in the third training sample comprises a training sample extracted from the first training sample and a training sample extracted from the second training sample;
dividing the third training sample into a support set and a query set;
inputting the support set into a second model for training, and adjusting parameters of the second model according to the difference between the recognition result of the support set and the pre-labeled classification result of the support set by the second model;
taking the second model as a third model when the recognition accuracy of the second model on the inquiry set reaches a second threshold value;
and under the condition that the recognition accuracy of the second model to the inquiry set does not reach a second threshold value, continuing to train the second model by using the inquiry set so as to continuously adjust the numerical values of the parameters in each network layer in the second model until the recognition accuracy of the second model to the inquiry set reaches the second threshold value.
Optionally, the model training apparatus further includes a third training module, configured to:
randomly mixing the first training sample and the second training sample to obtain a fourth training sample;
inputting the fourth training sample into a third model for training, and adjusting the parameter of the third model according to the difference between the recognition result of the fourth training sample and the pre-labeled classification result of the fourth training sample by the third model;
under the condition that the identification accuracy of the third model on the test data reaches a third threshold value, taking the third model as a fourth model, wherein the identification accuracy of the fourth model on the object in the target category is higher than the identification accuracy of the third model on the object in the target category, and the category of the object indicated by the test data comprises the target category;
and, when the recognition accuracy of the third model on the test data does not reach the third threshold, continuing to train the third model with the fourth training sample so as to keep adjusting the values of the parameters in each network layer of the third model until the recognition accuracy of the third model on the test data reaches the third threshold, wherein the adjustment range of the parameters of the third model is smaller than that of the parameters of the second model.
Optionally, the third training module is further configured to:
determining a loss value by using a target loss function, wherein the loss value is used for representing the difference of the accuracy between the identification result of the third model on the test data and the actual class label of the test data;
and adjusting the parameters of the third model by using the loss value until the output precision of the third model reaches a third threshold value.
Optionally, the third training module is further configured to:
determining a gradient of the target loss function;
and taking the product of the parameter of the third model and the gradient as a new parameter of the third model.
Optionally, the training sample obtaining module is specifically configured to:
determining a data acquisition range;
collecting target data within a data collection range;
classifying the target data according to a preset category, and determining the proportion of each category of data in the target data to obtain the sampling probability of each category of data;
and taking the data with the sampling probability greater than a preset threshold value as a first training sample, and taking the data with the sampling probability less than or equal to the preset threshold value as a second training sample.
According to another aspect of the embodiments of the present application, there is provided an electronic device, as shown in fig. 8, including a memory 801, a processor 803, a communication interface 805, and a communication bus 807, where the memory 801 stores a computer program that can be executed on the processor 803, the memory 801 and the processor 803 communicate with each other through the communication interface 805 and the communication bus 807, and the steps of the method are implemented when the processor 803 executes the computer program.
The memory and the processor in the electronic equipment are communicated with the communication interface through a communication bus. The communication bus may be a Peripheral Component Interconnect (PCI) bus, an Extended Industry Standard Architecture (EISA) bus, or the like. The communication bus may be divided into an address bus, a data bus, a control bus, etc.
The Memory may include a Random Access Memory (RAM) or a non-volatile Memory (non-volatile Memory), such as at least one disk Memory. Optionally, the memory may also be at least one memory device located remotely from the processor.
The processor may be a general-purpose processor, including a central processing unit (CPU), a network processor (NP), and the like; it may also be a digital signal processor (DSP), an application specific integrated circuit (ASIC), a field programmable gate array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, or a discrete hardware component.
There is also provided, in accordance with yet another aspect of an embodiment of the present application, a computer-readable medium having non-volatile program code executable by a processor.
Optionally, in an embodiment of the present application, a computer readable medium is configured to store program code for the processor to perform the following steps:
acquiring a first training sample and a second training sample, wherein the sampling probability of the first training sample in data acquisition is greater than that of the second training sample;
training the first model by using the first training sample to adjust the initialization parameters of the first model to obtain a second model;
training the second model with the first training sample and the second training sample, so that the second model serves as a pre-training model whose parameters are adjusted through training samples of different sampling probabilities, to obtain a third model, wherein the adjustment range of the parameters of the second model is smaller than the adjustment range of the initialization parameters of the first model, the recognition accuracy of the third model on objects of a target category is greater than that of the second model on objects of the target category, and the category of the second training sample comprises the target category.
Optionally, the specific examples in this embodiment may refer to the examples described in the above embodiments, and this embodiment is not described herein again.
When the embodiments of the present application are specifically implemented, reference may be made to the above embodiments, and corresponding technical effects are achieved.
It is to be understood that the embodiments described herein may be implemented in hardware, software, firmware, middleware, microcode, or any combination thereof. For a hardware implementation, the Processing units may be implemented within one or more Application Specific Integrated Circuits (ASICs), Digital Signal Processors (DSPs), Digital Signal Processing Devices (DSPDs), Programmable Logic Devices (PLDs), Field Programmable Gate Arrays (FPGAs), general purpose processors, controllers, micro-controllers, microprocessors, other electronic units configured to perform the functions described herein, or a combination thereof.
For a software implementation, the techniques described herein may be implemented by means of units performing the functions described herein. The software codes may be stored in a memory and executed by a processor. The memory may be implemented within the processor or external to the processor.
Those of ordinary skill in the art will appreciate that the various illustrative elements and algorithm steps described in connection with the embodiments disclosed herein may be implemented as electronic hardware or combinations of computer software and electronic hardware. Whether such functionality is implemented as hardware or software depends upon the particular application and design constraints imposed on the implementation. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present application.
It is clear to those skilled in the art that, for convenience and brevity of description, the specific working processes of the above-described systems, apparatuses and units may refer to the corresponding processes in the foregoing method embodiments, and are not described herein again.
In the embodiments provided in the present application, it should be understood that the disclosed apparatus and method may be implemented in other ways. For example, the above-described apparatus embodiments are merely illustrative, and for example, the division of the modules is merely a logical division, and in actual implementation, there may be other divisions, for example, multiple modules or components may be combined or integrated into another system, or some features may be omitted, or not implemented. In addition, the shown or discussed mutual coupling or direct coupling or communication connection may be an indirect coupling or communication connection through some interfaces, devices or units, and may be in an electrical, mechanical or other form.
The units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the units can be selected according to actual needs to achieve the purpose of the solution of the embodiment.
In addition, functional units in the embodiments of the present application may be integrated into one processing unit, or each unit may exist alone physically, or two or more units are integrated into one unit.
The functions, if implemented in the form of software functional units and sold or used as a stand-alone product, may be stored in a computer readable storage medium. Based on such understanding, the technical solutions of the embodiments of the present application may be essentially implemented or make a contribution to the prior art, or may be implemented in the form of a software product stored in a storage medium and including several instructions for causing a computer device (which may be a personal computer, a server, or a network device) to execute all or part of the steps of the methods described in the embodiments of the present application. And the aforementioned storage medium includes: various media capable of storing program codes, such as a U disk, a removable hard disk, a ROM, a RAM, a magnetic disk, or an optical disk. It is noted that, in this document, relational terms such as "first" and "second," and the like, may be used solely to distinguish one entity or action from another entity or action without necessarily requiring or implying any actual such relationship or order between such entities or actions. Also, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising an … …" does not exclude the presence of other identical elements in a process, method, article, or apparatus that comprises the element.
The above description is merely exemplary of the present application and is presented to enable those skilled in the art to understand and practice the present application. Various modifications to these embodiments will be readily apparent to those skilled in the art, and the generic principles defined herein may be applied to other embodiments without departing from the spirit or scope of the application. Thus, the present application is not intended to be limited to the embodiments shown herein but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.

Claims (10)

1. A method of model training, comprising:
acquiring a first training sample and a second training sample, wherein the sampling probability of the first training sample in data acquisition is greater than that of the second training sample;
training a first model by using the first training sample to adjust the initialization parameters of the first model to obtain a second model;
training the second model by using the first training sample and the second training sample to take the second model as a pre-training model, and adjusting parameters of the second model by using training samples with different sampling probabilities to obtain a third model, wherein the adjustment range of the parameters of the second model is smaller than the adjustment range of initialization parameters of the first model, the recognition accuracy of the third model on objects of a target category is greater than the recognition accuracy of the second model on the objects of the target category, and the category of the second training sample comprises the target category.
2. The method of claim 1, wherein training the first model by using the first training sample to adjust the initialization parameters of the first model to obtain the second model comprises:
initializing parameters in each network layer in the first model through the first training sample to obtain the initialized parameters, and adjusting the initialized parameters according to the difference between the recognition result of the first model on the first training sample and the pre-labeled classification result of the first training sample;
taking the first model as the second model when the recognition accuracy of the first model on test data reaches a first threshold, wherein the test data belongs to at least one of the data categories included in the first training sample;
and under the condition that the recognition accuracy of the first model on the test data does not reach the first threshold, continuing to train the first model by using the first training sample so as to continue adjusting the numerical values of the parameters in each network layer in the first model until the recognition accuracy of the first model on the test data reaches the first threshold.
3. The method of claim 1, wherein training the second model by using the first training sample and the second training sample, with the second model serving as a pre-training model, and adjusting the parameters of the second model by using training samples with different sampling probabilities to obtain the third model comprises:
extracting the same number of training samples from the first training sample and the second training sample to form a third training sample, wherein each training sample in the third training sample comprises a training sample extracted from the first training sample and a training sample extracted from the second training sample;
dividing the third training sample into a support set and a query set;
inputting the support set into the second model for training, and adjusting the parameters of the second model according to the difference between the recognition result of the second model on the support set and the pre-labeled classification result of the support set;
if the recognition accuracy of the second model on the query set reaches a second threshold, regarding the second model as the third model;
and under the condition that the recognition accuracy of the second model on the query set does not reach the second threshold, continuing to train the second model by using the query set so as to continue adjusting the numerical values of the parameters in each network layer in the second model until the recognition accuracy of the second model on the query set reaches the second threshold.
4. The method of claim 3, wherein after obtaining the third model, the method further comprises:
randomly mixing the first training sample and the second training sample to obtain a fourth training sample;
inputting the fourth training sample into the third model for training, and adjusting the parameters of the third model according to the difference between the recognition result of the third model on the fourth training sample and the pre-labeled classification result of the fourth training sample;
if the recognition accuracy of the third model on the test data reaches a third threshold value, regarding the third model as a fourth model, wherein the recognition accuracy of the fourth model on the object in the target category is greater than the recognition accuracy of the third model on the object in the target category, and the category of the object indicated by the test data comprises the target category;
and under the condition that the recognition accuracy of the third model on the test data does not reach the third threshold, continuing to train the third model by using the fourth training sample so as to continue adjusting the numerical values of the parameters in each network layer in the third model until the recognition accuracy of the third model on the test data reaches the third threshold, wherein the adjustment amplitude of the parameters of the third model is smaller than that of the parameters of the second model.
5. The method of claim 4, wherein adjusting the numerical values of the parameters in each network layer in the third model until the recognition accuracy of the third model on the test data reaches the third threshold comprises:
determining a loss value by using a target loss function, wherein the loss value is used to represent the difference between the recognition result of the third model on the test data and the actual class labels of the test data;
and adjusting the parameters of the third model by using the loss value until the recognition accuracy of the third model on the test data reaches the third threshold.
6. The method of claim 5, wherein adjusting the values of the parameters in the network layers within the third model further comprises:
determining a gradient of the target loss function;
taking the product of the parameter of the third model and the gradient as a new parameter of the third model.
7. The method of any one of claims 1 to 6, wherein acquiring the first training sample and the second training sample comprises:
determining a data acquisition range;
collecting target data within the data collection range;
classifying the target data according to a preset category, and determining the proportion of each category of data in the target data to obtain the sampling probability of each category of data;
and taking the data with the sampling probability greater than a preset threshold value as the first training sample, and taking the data with the sampling probability less than or equal to the preset threshold value as the second training sample.
8. A model training apparatus, comprising:
a training sample acquisition module, configured to acquire a first training sample and a second training sample, wherein the sampling probability of the first training sample during data acquisition is greater than that of the second training sample;
a first training module, configured to train a first model by using the first training sample to adjust the initialization parameters of the first model and obtain a second model;
a second training module, configured to train the second model by using the first training sample and the second training sample, with the second model serving as a pre-training model, and to adjust the parameters of the second model by using training samples with different sampling probabilities to obtain a third model, wherein the adjustment range of the parameters of the second model is smaller than the adjustment range of the initialization parameters of the first model, the recognition accuracy of the third model on objects of a target category is greater than that of the second model on the objects of the target category, and the category of the second training sample comprises the target category.
9. An electronic device, comprising a memory, a processor, a communication interface and a communication bus, wherein the memory stores a computer program operable on the processor, the memory and the processor communicate via the communication bus and the communication interface, and the processor implements the steps of the method according to any one of claims 1 to 7 when executing the computer program.
10. A computer-readable medium having non-volatile program code executable by a processor, wherein the program code causes the processor to perform the method of any of claims 1 to 7.
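The sketches below are illustrative additions, not part of the claims or of the original description; they show one possible Python/PyTorch reading of the claimed training procedure. This first sketch follows claim 7: the sampling probability of a category is approximated by its proportion in the collected data, and a preset threshold separates the first (frequent) training sample from the second (rare, target-category) training sample. The function name split_by_sampling_probability, the 0.05 threshold, and the example category names are assumptions introduced only for illustration.

from collections import Counter

def split_by_sampling_probability(samples, labels, prob_threshold=0.05):
    # Sampling probability of a category = its proportion in the collected data (claim 7).
    total = len(labels)
    sampling_prob = {c: n / total for c, n in Counter(labels).items()}
    first_sample, second_sample = [], []
    for x, y in zip(samples, labels):
        # Categories above the preset threshold form the first training sample;
        # the remaining (rare, target) categories form the second training sample.
        if sampling_prob[y] > prob_threshold:
            first_sample.append((x, y))
        else:
            second_sample.append((x, y))
    return first_sample, second_sample

# Example: two common categories and one rare (target) category.
labels = ["cola"] * 90 + ["chips"] * 80 + ["truffle"] * 3
samples = list(range(len(labels)))
first, second = split_by_sampling_probability(samples, labels)
# len(first) == 170, len(second) == 3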
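The next sketch covers claims 1 to 3: a first stage adjusts the initialization parameters of the first model on the first (frequent) training sample to obtain the second model, and a second stage fine-tunes the second model on a support set drawn in equal numbers from the first and second training samples while checking recognition accuracy on a query set. The smaller learning rate in the second stage stands in for the requirement that the adjustment range of the parameters of the second model be smaller than that of the initialization parameters. The network architecture, thresholds, learning rates, and the synthetic data are assumptions, not values taken from the application.

import torch
from torch import nn, optim
from torch.utils.data import DataLoader, TensorDataset

def accuracy(model, loader):
    # Recognition accuracy of the model on the given data.
    model.eval()
    correct = total = 0
    with torch.no_grad():
        for x, y in loader:
            correct += (model(x).argmax(dim=1) == y).sum().item()
            total += y.numel()
    return correct / total if total else 0.0

def train_until(model, train_loader, eval_loader, threshold, lr, max_epochs=50):
    # Keep adjusting the parameters in each network layer until the recognition
    # accuracy on the evaluation data reaches the threshold (claims 2 and 3).
    criterion = nn.CrossEntropyLoss()
    optimizer = optim.SGD(model.parameters(), lr=lr)
    for _ in range(max_epochs):
        model.train()
        for x, y in train_loader:
            optimizer.zero_grad()
            # Loss = difference between the recognition result and the
            # pre-labeled classification result of the training samples.
            criterion(model(x), y).backward()
            optimizer.step()
        if accuracy(model, eval_loader) >= threshold:
            break
    return model

def make_loader(n, dim=20, classes=5):
    # Synthetic stand-in data; real image features would be used in practice.
    x, y = torch.randn(n, dim), torch.randint(0, classes, (n,))
    return DataLoader(TensorDataset(x, y), batch_size=16, shuffle=True)

first_loader, test_loader = make_loader(400), make_loader(100)     # frequent categories
support_loader, query_loader = make_loader(100), make_loader(100)  # equal mix of frequent and rare

first_model = nn.Sequential(nn.Linear(20, 64), nn.ReLU(), nn.Linear(64, 5))

# Stage 1 (claims 1-2): adjust the initialization parameters on the first
# training sample until the first threshold is reached, giving the second model.
second_model = train_until(first_model, first_loader, test_loader, threshold=0.30, lr=1e-2)

# Stage 2 (claim 3): train on the support set and check accuracy on the query
# set; the smaller learning rate keeps the parameter adjustment range smaller
# than in stage 1, giving the third model.
third_model = train_until(second_model, support_loader, query_loader, threshold=0.30, lr=1e-3)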
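A final sketch, continuing the one above (it reuses the accuracy helper, the torch imports, and third_model), illustrates claims 4 and 5: the first and second training samples are randomly mixed into a fourth training sample, and the third model is fine-tuned with a target loss function and an even smaller learning rate until its recognition accuracy on the test data reaches the third threshold, yielding the fourth model. The representation of samples as (tensor, label) pairs and all hyper-parameter values are assumptions.

import random

def mix_training_samples(first_pairs, second_pairs, seed=0):
    # Randomly mix the first and second training samples into the fourth
    # training sample (claim 4); each element is assumed to be a (tensor, label) pair.
    mixed = list(first_pairs) + list(second_pairs)
    random.Random(seed).shuffle(mixed)
    xs = torch.stack([x for x, _ in mixed])
    ys = torch.tensor([y for _, y in mixed])
    return DataLoader(TensorDataset(xs, ys), batch_size=16, shuffle=True)

def fine_tune(third_model, fourth_loader, test_loader, third_threshold, lr=1e-4, max_epochs=50):
    # Adjust the parameters of the third model with the target loss function until
    # its recognition accuracy on the test data reaches the third threshold (claims 4-5).
    # The small learning rate keeps the adjustment amplitude below that of the second model.
    target_loss = nn.CrossEntropyLoss()
    optimizer = optim.SGD(third_model.parameters(), lr=lr)
    for _ in range(max_epochs):
        third_model.train()
        for x, y in fourth_loader:
            optimizer.zero_grad()
            # Loss value = difference between the model's recognition result
            # and the actual class labels (claim 5).
            target_loss(third_model(x), y).backward()
            optimizer.step()
        if accuracy(third_model, test_loader) >= third_threshold:
            break
    return third_model  # serves as the fourth model once the threshold is met

# Usage, assuming first_pairs and second_pairs are lists of (tensor, label) pairs:
#   fourth_loader = mix_training_samples(first_pairs, second_pairs)
#   fourth_model = fine_tune(third_model, fourth_loader, test_loader, third_threshold=0.30)

Read literally, claim 6 replaces each parameter with the product of that parameter and the gradient of the target loss function (new parameter = parameter x gradient); the sketch above instead applies the conventional gradient-descent step (parameter - learning rate x gradient) through optimizer.step(), so the multiplication of claim 6 would have to be substituted there if a literal rendering of that claim is wanted.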
CN202110105965.7A 2021-01-26 2021-01-26 Model training method, device, equipment and computer readable medium Active CN112801178B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110105965.7A CN112801178B (en) 2021-01-26 2021-01-26 Model training method, device, equipment and computer readable medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110105965.7A CN112801178B (en) 2021-01-26 2021-01-26 Model training method, device, equipment and computer readable medium

Publications (2)

Publication Number Publication Date
CN112801178A true CN112801178A (en) 2021-05-14
CN112801178B CN112801178B (en) 2024-04-09

Family

ID=75811938

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110105965.7A Active CN112801178B (en) 2021-01-26 2021-01-26 Model training method, device, equipment and computer readable medium

Country Status (1)

Country Link
CN (1) CN112801178B (en)

Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2019169704A1 (en) * 2018-03-08 2019-09-12 平安科技(深圳)有限公司 Data classification method, apparatus, device and computer readable storage medium
CN111461329A (en) * 2020-04-08 2020-07-28 中国银行股份有限公司 Model training method, device, equipment and readable storage medium
CN111291841A (en) * 2020-05-13 2020-06-16 腾讯科技(深圳)有限公司 Image recognition model training method and device, computer equipment and storage medium

Also Published As

Publication number Publication date
CN112801178B (en) 2024-04-09

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant