CN116935102A - Lightweight model training method, device, equipment and medium - Google Patents
Lightweight model training method, device, equipment and medium
- Publication number
- CN116935102A CN116935102A CN202310793747.6A CN202310793747A CN116935102A CN 116935102 A CN116935102 A CN 116935102A CN 202310793747 A CN202310793747 A CN 202310793747A CN 116935102 A CN116935102 A CN 116935102A
- Authority
- CN
- China
- Prior art keywords
- target detection
- model
- image classification
- loss
- lightweight
- Prior art date
- Legal status
- Granted
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/70—Arrangements for image or video recognition or understanding using pattern recognition or machine learning
- G06V10/764—Arrangements for image or video recognition or understanding using pattern recognition or machine learning using classification, e.g. of video objects
- G06V10/765—Arrangements for image or video recognition or understanding using pattern recognition or machine learning using classification, e.g. of video objects using rules for classification or partitioning the feature space
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/20—Image preprocessing
- G06V10/22—Image preprocessing by selection of a specific region containing or referencing a pattern; Locating or processing of specific regions to guide the detection or recognition
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/70—Arrangements for image or video recognition or understanding using pattern recognition or machine learning
- G06V10/82—Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V2201/00—Indexing scheme relating to image or video recognition or understanding
- G06V2201/07—Target detection
-
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y02—TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
- Y02T—CLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO TRANSPORTATION
- Y02T10/00—Road transport of goods or passengers
- Y02T10/10—Internal combustion engine [ICE] based vehicles
- Y02T10/40—Engine management systems
Abstract
The application relates to a lightweight model training method, device, equipment and medium. The training method comprises the following steps: constructing an image classification lightweight model based on an image classification large model, and inputting training data into both the image classification large model and the image classification lightweight model to obtain the final loss of the image classification task; constructing a target detection lightweight model based on a target detection large model, and inputting training data into both the target detection large model and the target detection lightweight model to obtain the final loss of the target detection task; and carrying out a weighted summation of the final loss of the image classification task and the final loss of the target detection task to obtain a joint task loss, and updating the parameters of the image classification lightweight model and the target detection lightweight model by gradient descent based on the joint task loss. The application can improve the effect of the lightweight models.
Description
Technical Field
The application relates to the technical field of machine learning, in particular to a lightweight model training method.
Background
At present, to train a lightweight model, i.e. a model with few parameters and a fast inference speed, the usual practice is to design a lightweight model structure and then optimize it with a loss function (loss_func) against the real label (group_trunk, i.e. the ground truth). The effect of a lightweight model trained in this way is often much worse than that of a large model.
Disclosure of Invention
The application aims to solve the technical problem of providing a lightweight model training method which can improve the effect of a lightweight model.
The technical scheme adopted for solving the technical problems is as follows: a lightweight model training method is provided, comprising the following steps:
constructing an image classification lightweight model based on an image classification large model, wherein the parameter quantity of a main network part of the image classification lightweight model is set to be a preset percentage of the parameter quantity of the main network part of the image classification large model; respectively inputting training data into an image classification large model and an image classification lightweight model to obtain final loss of an image classification task;
constructing a target detection lightweight model based on a target detection large model, wherein the parameter quantity of a main network part of the target detection lightweight model is set to be a preset percentage of the parameter quantity of the main network part of the target detection large model; respectively inputting training data into a target detection large model and a target detection lightweight model to obtain the final loss of a target detection task;
and carrying out weighted summation on the final loss of the image classification task and the final loss of the target detection task to obtain joint task loss, and updating parameters in the image classification lightweight model and the target detection lightweight model by adopting a gradient descent method based on the joint task loss to realize joint training of the image classification lightweight model and the target detection lightweight model.
The image classification large model and the target detection large model both employ a softmax_T function, which is obtained from the original softmax function by dividing the input to its exponential part by a constant T.
The training data are respectively input into an image classification large model and an image classification lightweight model to obtain final loss of an image classification task, and the method specifically comprises the following steps:
inputting training data into the image classification large model to obtain a first probability of each category;
inputting the training data into the image classification lightweight model to obtain a second probability of each category;
and calculating the final loss of the image classification task based on the first probability of each category, the second probability of each category and the real classification label.
The calculating the final loss of the image classification task based on the first probability of each category, the second probability of each category and the real classification label specifically comprises the following steps:
performing cross entropy loss function calculation on the first probability of each category and the second probability of each category to obtain a first result;
performing cross entropy loss function calculation on the second probability of each category and the real classification label to obtain a second result;
and calculating the average value of the first result and the second result, and taking the obtained average value as the final loss of the image classification task.
The training data are respectively input into a target detection large model and a target detection lightweight model to obtain the final loss of a target detection task, and the method specifically comprises the following steps:
inputting training data into the target detection large model to obtain a first probability of each object category and a first rectangular frame position of each object;
inputting the training data into the target detection lightweight model to obtain a second probability of each object category and a second rectangular frame position of each object;
the final loss of the target detection task is calculated based on the first probability of each object class, the first rectangular box position of each object, the second probability of each object class, the second rectangular box position of each object, and the real target detection label.
The calculating the final loss of the target detection task based on the first probability of each object category, the first rectangular frame position of each object, the second probability of each object category, the second rectangular frame position of each object and the real target detection label specifically comprises the following steps:
performing cross entropy loss function calculation on the first probability of each object class and the second probability of each object class to obtain a first result;
performing cross entropy loss function calculation on the second probability of each object class and the classification sub-label in the real target detection label to obtain a second result;
summing the first result and the second result to obtain a classification loss value;
carrying out a mean square error loss function calculation on the first rectangular frame position of each object and the second rectangular frame position of each object to obtain a third result;
carrying out mean square error loss function calculation on the second rectangular frame position of each object and the position sub-label in the real target detection label to obtain a fourth result;
summing the third result and the fourth result to obtain a position loss value;
and obtaining an average value of the classification loss value and the position loss value, and taking the obtained average value as the final loss of the target detection task.
The technical scheme adopted for solving the technical problems is as follows: provided is a lightweight model training device, comprising:
the image classification training module is used for constructing an image classification lightweight model based on the image classification large model, wherein the parameter quantity of a main network part of the image classification lightweight model is set to be a preset percentage of the parameter quantity of the main network part of the image classification large model; respectively inputting training data into an image classification large model and an image classification lightweight model to obtain final loss of an image classification task;
the target detection training module is used for constructing a target detection lightweight model based on a target detection large model, wherein the parameter quantity of a main network part of the target detection lightweight model is set to be a preset percentage of the parameter quantity of the main network part of the target detection large model; respectively inputting training data into a target detection large model and a target detection lightweight model to obtain the final loss of a target detection task;
and the optimization training module is used for carrying out weighted summation on the final loss of the image classification task and the final loss of the target detection task to obtain joint task loss, and updating parameters in the image classification lightweight model and the target detection lightweight model by adopting a gradient descent method based on the joint task loss so as to realize joint training of the image classification lightweight model and the target detection lightweight model.
The technical scheme adopted for solving the technical problems is as follows: there is provided an electronic device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, the processor implementing the steps of the lightweight model training method described above when executing the computer program.
The technical scheme adopted for solving the technical problems is as follows: there is provided a computer readable storage medium having stored thereon a computer program which, when executed by a processor, implements the steps of the lightweight model training method described above.
Advantageous effects
Due to the adoption of the above technical scheme, compared with the prior art, the application has the following advantages and positive effects: the lightweight models are trained using both the real labels and the output probabilities of the large models, which improves the effect of the lightweight models; in addition, the lightweight models for the target detection and image classification tasks can be trained and optimized together, which improves training efficiency.
Drawings
Fig. 1 is a flowchart of a first embodiment of the lightweight model training method of the present application.
Detailed Description
The application will be further illustrated with reference to specific examples. It is to be understood that these examples are illustrative of the present application and are not intended to limit the scope of the present application. Furthermore, it should be understood that various changes and modifications can be made by one skilled in the art after reading the teachings of the present application, and such equivalents are intended to fall within the scope of the application as defined in the appended claims.
A first embodiment of the present application relates to a lightweight model training method, as shown in fig. 1, comprising the steps of:
step 1, constructing an image classification lightweight model based on an image classification large model, wherein the parameter quantity of a main network part of the image classification lightweight model is set to be a preset percentage of the parameter quantity of the main network part of the image classification large model; and respectively inputting training data into the image classification large model and the image classification lightweight model to obtain the final loss of the image classification task.
The image classification large model and the image classification lightweight model in this step are each composed of three parts, namely a backbone, a class_head and a softmax function, following the general design of an image classification network.
The parameters of the image classification large model are fixed, and the model is only responsible for forward inference. A training image is passed through the backbone and class_head of the image classification large model to obtain a feature vector. In this embodiment, the softmax function of the image classification large model is slightly modified: the input to the exponential part of the original formula is divided by a constant T (T is 100 in this embodiment), with everything else unchanged; the modified function is named the softmax_T function. The feature vector is passed through the softmax_T function to obtain a first probability prob_class_large for each category.
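The softmax_T function described here can be sketched in a few lines (a minimal NumPy sketch; the function name `softmax_t` and its signature are illustrative, not from the patent):

```python
import numpy as np

def softmax_t(logits, t=100.0):
    """Temperature-scaled softmax: divide the input of the exponential part by T.

    T = 100 in this embodiment; a large T softens the output distribution so
    the large model's inter-class similarity information is exposed to the
    lightweight model during training.
    """
    z = np.asarray(logits, dtype=np.float64) / t
    z = z - z.max()          # subtract the max for numerical stability
    e = np.exp(z)
    return e / e.sum()
```

With t=1.0 this reduces to the ordinary softmax; as T grows, the output approaches the uniform distribution.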
The parameter quantity of the backbone of the image classification lightweight model is set to 1% of that of the image classification large model, with everything else unchanged; passing the training image through the image classification lightweight model yields a second probability prob_class_lite for each category.
And carrying out cross entropy loss function calculation on the second probability prob_class_lite of each class and the first probability prob_class_large of each class to obtain a first result, and recording the first result as a loss_large value.
And performing cross entropy loss function calculation on the second probability prob_class_lite and the real classification label group_trunk_class of each class to obtain a second result, and marking the second result as a loss_trunk value.
And averaging the first result loss_large and the second result loss_trunk to obtain the final loss loss_final_class of the image classification task.
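The three calculations above can be sketched as follows. The cross entropy between two probability vectors is written out explicitly; `ground_truth_class` stands for the one-hot real classification label that the patent denotes group_trunk_class, and all other names are illustrative:

```python
import numpy as np

def cross_entropy(target_probs, pred_probs, eps=1e-12):
    """H(target, pred) = -sum(target * log(pred)) over probability vectors."""
    target = np.asarray(target_probs, dtype=np.float64)
    pred = np.clip(np.asarray(pred_probs, dtype=np.float64), eps, 1.0)
    return float(-(target * np.log(pred)).sum())

def classification_final_loss(prob_class_large, prob_class_lite, ground_truth_class):
    """Final loss of the image classification task: the average of the
    distillation loss (against the large model's probabilities) and the
    supervised loss (against the real label)."""
    loss_large = cross_entropy(prob_class_large, prob_class_lite)    # first result
    loss_truth = cross_entropy(ground_truth_class, prob_class_lite)  # second result
    return (loss_large + loss_truth) / 2.0                           # final loss
```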
Step 2, constructing a target detection lightweight model based on a target detection large model, wherein the parameter quantity of a main network part of the target detection lightweight model is set to be a preset percentage of the parameter quantity of the main network part of the target detection large model; and respectively inputting training data into the target detection large model and the target detection lightweight model to obtain the final loss of the target detection task.
Following the general design of a target detection network, the target detection large model and the target detection lightweight model in this step are mainly composed of five parts, namely a backbone, a class_head, a box_head, a softmax function and a sigmoid function.
Like the image classification large model, the target detection large model has fixed parameters and is only responsible for forward inference. The training image passes sequentially through the backbone, class_head and softmax_T function of the target detection large model (the softmax_T function is designed the same way as in the image classification large model) to obtain a first probability prob_class_large for each object class, and passes sequentially through the backbone, box_head and sigmoid function of the target detection large model to obtain a first rectangular frame position value_box_large for each object.
The parameter quantity of the backbone in the target detection lightweight model is set to 1% of that of the target detection large model, with everything else unchanged; passing the training image through the target detection lightweight model yields a second probability prob_class_lite for each object class and a second rectangular frame position value_box_lite for each object.
And performing cross entropy loss function calculation on the second probability prob_class_lite of each object class and the first probability prob_class_large of each object class to obtain a first result, and marking the first result as a loss_large_class value.
And performing cross entropy loss function calculation on the second probability prob_class_lite of each object class and the classified sub-label group_trunk_class in the real target detection label to obtain a second result, and marking the second result as a loss_trunk_class value.
And summing the first result loss_large_class and the second result loss_trunk_class to obtain a classification loss value loss_class_final.
And carrying out mean square error loss function calculation on the first rectangular frame position value_box_large of each object and the second rectangular frame position value_box_lite of each object to obtain a third result, and recording the third result as a loss_large_box value.
And carrying out mean square error loss function calculation on the second rectangular frame position value_box_lite of each object and the position sub-label group_trunk_box in the real target detection label to obtain a fourth result, and marking the fourth result as a loss_trunk_box value.
And summing the third result loss_large_box and the fourth result loss_trunk_box to obtain a position loss value loss_box_final.
The classification loss value loss_class_final and the position loss value loss_box_final are averaged to obtain the final loss loss_final_detection of the target detection task.
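The loss calculation of step 2 can be sketched in the same style (a self-contained sketch: `cross_entropy` is repeated from the classification sketch, mean squared error is used for the box positions as stated above, and all names are illustrative):

```python
import numpy as np

def cross_entropy(target_probs, pred_probs, eps=1e-12):
    """H(target, pred) = -sum(target * log(pred)) over probability vectors."""
    target = np.asarray(target_probs, dtype=np.float64)
    pred = np.clip(np.asarray(pred_probs, dtype=np.float64), eps, 1.0)
    return float(-(target * np.log(pred)).sum())

def mse(a, b):
    """Mean squared error between two position vectors."""
    a = np.asarray(a, dtype=np.float64)
    b = np.asarray(b, dtype=np.float64)
    return float(((a - b) ** 2).mean())

def detection_final_loss(prob_class_large, value_box_large,
                         prob_class_lite, value_box_lite,
                         ground_truth_class, ground_truth_box):
    """Final loss of the target detection task."""
    loss_large_class = cross_entropy(prob_class_large, prob_class_lite)    # first result
    loss_truth_class = cross_entropy(ground_truth_class, prob_class_lite)  # second result
    loss_class_final = loss_large_class + loss_truth_class                 # classification loss

    loss_large_box = mse(value_box_large, value_box_lite)                  # third result
    loss_truth_box = mse(ground_truth_box, value_box_lite)                 # fourth result
    loss_box_final = loss_large_box + loss_truth_box                       # position loss

    return (loss_class_final + loss_box_final) / 2.0                       # final loss
```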
And 3, carrying out weighted summation on the final loss loss_final_class of the image classification task and the final loss_final_detection of the target detection task to obtain joint task loss, and updating parameters in the image classification lightweight model and the target detection lightweight model by adopting a gradient descent method based on the joint task loss to realize joint training of the image classification lightweight model and the target detection lightweight model.
The formula for the weighted summation in this embodiment is: joint task loss = 0.6 × loss_final_class + 0.4 × loss_final_detection.
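The joint loss and the gradient-descent update of step 3 can be sketched as follows (a plain-Python sketch using the embodiment's 0.6/0.4 weights; the learning rate and all names are illustrative):

```python
def joint_task_loss(loss_final_class, loss_final_detection,
                    w_class=0.6, w_detect=0.4):
    """Weighted summation of the two final task losses (0.6/0.4 here)."""
    return w_class * loss_final_class + w_detect * loss_final_detection

def gradient_descent_step(params, grads, lr=0.01):
    """One gradient-descent update of the lightweight models' parameters.

    params: current parameter values of both lightweight models
    grads : gradients of the joint task loss w.r.t. those parameters
    """
    return [p - lr * g for p, g in zip(params, grads)]
```

In practice the gradients would come from automatic differentiation of the joint task loss; only the two lightweight models' parameters are updated, while the large models stay frozen.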
It is easy to find that the application utilizes the output probability of the real label and the large model to train the lightweight model at the same time, and improves the effect of the lightweight model.
A second embodiment of the present application relates to a lightweight model training device, comprising:
the image classification training module is used for constructing an image classification lightweight model based on the image classification large model, wherein the parameter quantity of a main network part of the image classification lightweight model is set to be a preset percentage of the parameter quantity of the main network part of the image classification large model; respectively inputting training data into an image classification large model and an image classification lightweight model to obtain final loss of an image classification task;
the target detection training module is used for constructing a target detection lightweight model based on a target detection large model, wherein the parameter quantity of a main network part of the target detection lightweight model is set to be a preset percentage of the parameter quantity of the main network part of the target detection large model; respectively inputting training data into a target detection large model and a target detection lightweight model to obtain the final loss of a target detection task;
and the optimization training module is used for carrying out weighted summation on the final loss of the image classification task and the final loss of the target detection task to obtain joint task loss, and updating parameters in the image classification lightweight model and the target detection lightweight model by adopting a gradient descent method based on the joint task loss so as to realize joint training of the image classification lightweight model and the target detection lightweight model.
The image classification training module inputs training data into an image classification large model and an image classification lightweight model respectively to obtain final loss of an image classification task, and specifically comprises the following steps:
inputting training data into the image classification large model to obtain a first probability of each category;
inputting the training data into the image classification lightweight model to obtain a second probability of each category;
and calculating the final loss of the image classification task based on the first probability of each category, the second probability of each category and the real classification label.
The calculating the final loss of the image classification task based on the first probability of each category, the second probability of each category and the real classification label specifically comprises the following steps:
performing cross entropy loss function calculation on the first probability of each category and the second probability of each category to obtain a first result;
performing cross entropy loss function calculation on the second probability of each category and the real classification label to obtain a second result;
and calculating the average value of the first result and the second result, and taking the obtained average value as the final loss of the image classification task.
When training data are respectively input into a target detection large model and a target detection lightweight model by the target detection training module to obtain the final loss of a target detection task, the target detection training module specifically comprises the following steps:
inputting training data into the target detection large model to obtain a first probability of each object category and a first rectangular frame position of each object;
inputting the training data into the target detection lightweight model to obtain a second probability of each object category and a second rectangular frame position of each object;
the final loss of the target detection task is calculated based on the first probability of each object class, the first rectangular box position of each object, the second probability of each object class, the second rectangular box position of each object, and the real target detection label.
The calculating the final loss of the target detection task based on the first probability of each object category, the first rectangular frame position of each object, the second probability of each object category, the second rectangular frame position of each object and the real target detection label specifically comprises the following steps:
performing cross entropy loss function calculation on the first probability of each object class and the second probability of each object class to obtain a first result;
performing cross entropy loss function calculation on the second probability of each object class and the classification sub-label in the real target detection label to obtain a second result;
summing the first result and the second result to obtain a classification loss value;
carrying out a mean square error loss function calculation on the first rectangular frame position of each object and the second rectangular frame position of each object to obtain a third result;
carrying out mean square error loss function calculation on the second rectangular frame position of each object and the position sub-label in the real target detection label to obtain a fourth result;
summing the third result and the fourth result to obtain a position loss value;
and obtaining an average value of the classification loss value and the position loss value, and taking the obtained average value as the final loss of the target detection task.
A third embodiment of the application is directed to an electronic device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, the processor implementing the steps of the lightweight model training method of the first embodiment when executing the computer program.
A fourth embodiment of the present application is directed to a computer readable storage medium having stored thereon a computer program which, when executed by a processor, implements the steps of the lightweight model training method of the first embodiment.
It will be appreciated by those skilled in the art that embodiments of the present application may be provided as a method, system, or computer program product. Accordingly, the present application may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present application may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein. The scheme in the embodiments of the application can be realized in various computer languages, such as the object-oriented programming language Java, the scripting language JavaScript, and the like.
The present application is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems) and computer program products according to embodiments of the application. It will be understood that each flow and/or block of the flowchart illustrations and/or block diagrams, and combinations of flows and/or blocks in the flowchart illustrations and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
While preferred embodiments of the present application have been described, additional variations and modifications in those embodiments may occur to those skilled in the art once they learn of the basic inventive concepts. It is therefore intended that the following claims be interpreted as including the preferred embodiments and all such alterations and modifications as fall within the scope of the application.
It will be apparent to those skilled in the art that various modifications and variations can be made to the present application without departing from the spirit or scope of the application. Thus, it is intended that the present application also include such modifications and alterations insofar as they come within the scope of the appended claims or the equivalents thereof.
Claims (9)
1. A lightweight model training method, characterized by comprising the following steps:
constructing an image classification lightweight model based on an image classification large model, wherein the parameter quantity of the backbone network of the image classification lightweight model is set to a preset percentage of the parameter quantity of the backbone network of the image classification large model; inputting training data into the image classification large model and the image classification lightweight model respectively to obtain a final loss of an image classification task; constructing a target detection lightweight model based on a target detection large model, wherein the parameter quantity of the backbone network of the target detection lightweight model is set to a preset percentage of the parameter quantity of the backbone network of the target detection large model; and inputting the training data into the target detection large model and the target detection lightweight model respectively to obtain a final loss of a target detection task;
and performing weighted summation of the final loss of the image classification task and the final loss of the target detection task to obtain a joint task loss, and updating parameters of the image classification lightweight model and the target detection lightweight model by a gradient descent method based on the joint task loss, thereby realizing joint training of the image classification lightweight model and the target detection lightweight model.
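A minimal sketch of the joint-training step of claim 1. The loss weights `w_cls`/`w_det` and the learning rate are illustrative assumptions; the claim only specifies a weighted summation of the two task losses followed by gradient descent on the student models' parameters.

```python
def joint_loss(cls_loss, det_loss, w_cls=0.5, w_det=0.5):
    """Weighted sum of the two task losses -> joint task loss (claim 1).

    The weights are assumed equal here; the patent leaves them unspecified.
    """
    return w_cls * cls_loss + w_det * det_loss


def sgd_update(params, grads, lr=0.01):
    """One plain gradient-descent step on the lightweight models' parameters."""
    return [p - lr * g for p, g in zip(params, grads)]
```

In a real distillation pipeline both student models would be updated from the same joint scalar loss, so gradients flow into each backbone from the shared objective.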
2. The lightweight model training method of claim 1, wherein the image classification large model and the target detection large model each employ a softmax_t function, the softmax_t function being the original softmax function with the input to its exponential term divided by a constant T.
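A hedged sketch of the softmax_t of claim 2: the standard softmax with every logit divided by the temperature constant T before exponentiation. The max-subtraction is an implementation-side numerical-stability assumption, not part of the claim.

```python
import math


def softmax_t(logits, T=1.0):
    """Temperature softmax: logits are divided by T before exponentiation.

    T == 1 recovers the ordinary softmax; T > 1 softens the distribution.
    """
    scaled = [x / T for x in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]
```

A larger T moves the output probabilities toward uniform, which is why distillation teachers typically run with T > 1: the softened distribution exposes the teacher's inter-class similarity structure to the student.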
3. The lightweight model training method according to claim 1, wherein the inputting training data into the image classification large model and the image classification lightweight model respectively to obtain the final loss of the image classification task specifically comprises:
inputting training data into the image classification large model to obtain a first probability of each category;
inputting the training data into the image classification lightweight model to obtain a second probability of each category;
and calculating the final loss of the image classification task based on the first probability of each category, the second probability of each category, and the ground-truth classification label.
4. The lightweight model training method according to claim 3, wherein the calculating the final loss of the image classification task based on the first probability of each category, the second probability of each category, and the ground-truth classification label specifically comprises:
performing a cross-entropy loss function calculation on the first probability of each category and the second probability of each category to obtain a first result;
and performing a cross-entropy loss function calculation on the second probability of each category and the ground-truth classification label to obtain a second result; calculating the average of the first result and the second result, and taking the obtained average as the final loss of the image classification task.
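An illustrative implementation of claim 4's final classification loss: the average of (a) the cross-entropy between the large model's and the lightweight model's probabilities and (b) the cross-entropy between the lightweight model's probabilities and the one-hot ground-truth label. The `eps` guard and the argument ordering (target first) are implementation assumptions.

```python
import math


def cross_entropy(p_target, p_pred, eps=1e-12):
    """H(p_target, p_pred) = -sum(t * log(q)); eps guards against log(0)."""
    return -sum(t * math.log(q + eps) for t, q in zip(p_target, p_pred))


def classification_final_loss(teacher_probs, student_probs, label_onehot):
    first = cross_entropy(teacher_probs, student_probs)   # first result (distillation term)
    second = cross_entropy(label_onehot, student_probs)   # second result (hard-label term)
    return (first + second) / 2.0                         # average -> final loss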
5. The lightweight model training method according to claim 1, wherein the inputting training data into the target detection large model and the target detection lightweight model respectively to obtain the final loss of the target detection task specifically comprises: inputting training data into the target detection large model to obtain a first probability of each object category and a first bounding-box position of each object;
inputting the training data into the target detection lightweight model to obtain a second probability of each object category and a second bounding-box position of each object;
and calculating the final loss of the target detection task based on the first probability of each object category, the first bounding-box position of each object, the second probability of each object category, the second bounding-box position of each object, and the ground-truth target detection label.
6. The lightweight model training method of claim 5, wherein the calculating the final loss of the target detection task based on the first probability of each object category, the first bounding-box position of each object, the second probability of each object category, the second bounding-box position of each object, and the ground-truth target detection label specifically comprises: performing a cross-entropy loss function calculation on the first probability of each object category and the second probability of each object category to obtain a first result;
performing a cross-entropy loss function calculation on the second probability of each object category and the classification sub-label in the ground-truth target detection label to obtain a second result;
summing the first result and the second result to obtain a classification loss value;
performing a mean-square-error loss function calculation on the first bounding-box position of each object and the second bounding-box position of each object to obtain a third result;
performing a mean-square-error loss function calculation on the second bounding-box position of each object and the position sub-label in the ground-truth target detection label to obtain a fourth result;
summing the third result and the fourth result to obtain a position loss value;
and calculating the average of the classification loss value and the position loss value, and taking the obtained average as the final loss of the target detection task.
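An illustrative composition of claim 6's final detection loss. The box format (four coordinates) and the `eps` guard are assumptions; the claim fixes only the structure: classification loss = CE(teacher, student) + CE(student, label), position loss = MSE(teacher, student) + MSE(student, label), final loss = average of the two.

```python
import math


def cross_entropy(p_target, p_pred, eps=1e-12):
    return -sum(t * math.log(q + eps) for t, q in zip(p_target, p_pred))


def mse(a, b):
    """Mean-square error between two coordinate vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


def detection_final_loss(teacher_probs, student_probs,
                         teacher_box, student_box,
                         label_probs, label_box):
    cls_loss = (cross_entropy(teacher_probs, student_probs)   # first result
                + cross_entropy(label_probs, student_probs))  # second result
    pos_loss = (mse(teacher_box, student_box)                 # third result
                + mse(label_box, student_box))                # fourth result
    return (cls_loss + pos_loss) / 2.0                        # average -> final loss
```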
7. A lightweight model training device, comprising:
the image classification training module, configured to construct an image classification lightweight model based on an image classification large model, wherein the parameter quantity of the backbone network of the image classification lightweight model is set to a preset percentage of the parameter quantity of the backbone network of the image classification large model, and to input training data into the image classification large model and the image classification lightweight model respectively to obtain a final loss of an image classification task;
the target detection training module, configured to construct a target detection lightweight model based on a target detection large model, wherein the parameter quantity of the backbone network of the target detection lightweight model is set to a preset percentage of the parameter quantity of the backbone network of the target detection large model, and to input the training data into the target detection large model and the target detection lightweight model respectively to obtain a final loss of a target detection task;
and the optimization training module, configured to perform weighted summation of the final loss of the image classification task and the final loss of the target detection task to obtain a joint task loss, and to update parameters of the image classification lightweight model and the target detection lightweight model by a gradient descent method based on the joint task loss, thereby realizing joint training of the image classification lightweight model and the target detection lightweight model.
8. An electronic device, comprising a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein the processor, when executing the computer program, implements the steps of the lightweight model training method according to any one of claims 1-6.
9. A computer-readable storage medium on which a computer program is stored, wherein the computer program, when executed by a processor, implements the steps of the lightweight model training method according to any one of claims 1-6.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202310793747.6A CN116935102B (en) | 2023-06-30 | 2023-06-30 | Lightweight model training method, device, equipment and medium |
Publications (2)
Publication Number | Publication Date |
---|---|
CN116935102A true CN116935102A (en) | 2023-10-24 |
CN116935102B CN116935102B (en) | 2024-02-20 |
Family
ID=88383498
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202310793747.6A Active CN116935102B (en) | 2023-06-30 | 2023-06-30 | Lightweight model training method, device, equipment and medium |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN116935102B (en) |
Patent Citations (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN111160379A (en) * | 2018-11-07 | 2020-05-15 | 北京嘀嘀无限科技发展有限公司 | Training method and device of image detection model and target detection method and device |
WO2020155518A1 (en) * | 2019-02-03 | 2020-08-06 | 平安科技(深圳)有限公司 | Object detection method and device, computer device and storage medium |
WO2021087985A1 (en) * | 2019-11-08 | 2021-05-14 | 深圳市欢太科技有限公司 | Model training method and apparatus, storage medium, and electronic device |
CN114424253A (en) * | 2019-11-08 | 2022-04-29 | 深圳市欢太科技有限公司 | Model training method and device, storage medium and electronic equipment |
CN114022658A (en) * | 2021-09-17 | 2022-02-08 | 浙江智慧视频安防创新中心有限公司 | Target detection method, device, storage medium and terminal |
CN115758245A (en) * | 2022-11-18 | 2023-03-07 | 上海蜜度信息技术有限公司 | Multi-mode data classification method, device, equipment and storage medium |
Also Published As
Publication number | Publication date |
---|---|
CN116935102B (en) | 2024-02-20 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN109816032B (en) | Unbiased-mapping zero-shot classification method and device based on generative adversarial network | |
CN113947196A (en) | Network model training method and device and computer readable storage medium | |
CN111079532B (en) | Video content description method based on text self-encoder | |
CN113657560B (en) | Weak supervision image semantic segmentation method and system based on node classification | |
CN108229287B (en) | Image recognition method and device, electronic equipment and computer storage medium | |
CN113435509B (en) | Small sample scene classification and identification method and system based on meta-learning | |
CN110362814B (en) | Named entity identification method and device based on improved loss function | |
CN108053454B (en) | Graph-structured data generation method based on deep convolutional generative adversarial network | |
US20190332894A1 (en) | Method for Processing Automobile Image Data, Apparatus, and Readable Storage Medium | |
CN114359938B (en) | Form identification method and device | |
JP7110929B2 (en) | Knowledge Complementary Program, Knowledge Complementary Method, and Knowledge Complementary Device | |
CN111967271A (en) | Analysis result generation method, device, equipment and readable storage medium | |
CN116402352A (en) | Enterprise risk prediction method and device, electronic equipment and medium | |
CN107729885B (en) | Face enhancement method based on multiple residual error learning | |
CN111950579A (en) | Training method and training device for classification model | |
CN112698977B (en) | Method, device, equipment and medium for positioning server fault | |
CN117788946A (en) | Image processing method, device, electronic equipment and storage medium | |
CN116935102B (en) | Lightweight model training method, device, equipment and medium | |
CN117574262A (en) | Underwater sound signal classification method, system and medium for small sample problem | |
CN110414845B (en) | Risk assessment method and device for target transaction | |
CN110889316B (en) | Target object identification method and device and storage medium | |
CN112861601A (en) | Method for generating adversarial examples and related device | |
Li et al. | Automated deep learning system for power line inspection image analysis and processing: Architecture and design issues | |
CN115758245A (en) | Multi-mode data classification method, device, equipment and storage medium | |
CN114547391A (en) | Message auditing method and device |
Legal Events
Date | Code | Title | Description
---|---|---|---
| PB01 | Publication | |
| SE01 | Entry into force of request for substantive examination | |
| CB02 | Change of applicant information | Address after: Room 301AB, No. 10, Lane 198, Zhangheng Road, Free Trade Pilot Zone, Pudong New Area, Shanghai, 200120. Applicant after: Shanghai Mido Technology Co.,Ltd. Address before: Room 301AB, No. 10, Lane 198, Zhangheng Road, Free Trade Pilot Zone, Pudong New Area, Shanghai, 200120. Applicant before: SHANGHAI MDATA INFORMATION TECHNOLOGY Co.,Ltd. |
| GR01 | Patent grant | |