CN116935102A - Lightweight model training method, device, equipment and medium - Google Patents
Lightweight model training method, device, equipment and medium
- Publication number
- CN116935102A CN116935102A CN202310793747.6A CN202310793747A CN116935102A CN 116935102 A CN116935102 A CN 116935102A CN 202310793747 A CN202310793747 A CN 202310793747A CN 116935102 A CN116935102 A CN 116935102A
- Authority
- CN
- China
- Prior art keywords
- target detection
- model
- image classification
- loss
- lightweight
- Prior art date
- Legal status
- Granted
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/70—Arrangements for image or video recognition or understanding using pattern recognition or machine learning
- G06V10/764—Arrangements for image or video recognition or understanding using pattern recognition or machine learning using classification, e.g. of video objects
- G06V10/765—Arrangements for image or video recognition or understanding using pattern recognition or machine learning using classification, e.g. of video objects using rules for classification or partitioning the feature space
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/20—Image preprocessing
- G06V10/22—Image preprocessing by selection of a specific region containing or referencing a pattern; Locating or processing of specific regions to guide the detection or recognition
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/70—Arrangements for image or video recognition or understanding using pattern recognition or machine learning
- G06V10/82—Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V2201/00—Indexing scheme relating to image or video recognition or understanding
- G06V2201/07—Target detection
-
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y02—TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
- Y02T—CLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO TRANSPORTATION
- Y02T10/00—Road transport of goods or passengers
- Y02T10/10—Internal combustion engine [ICE] based vehicles
- Y02T10/40—Engine management systems
Abstract
The application relates to a lightweight model training method, device, equipment and medium. The training method comprises the following steps: constructing an image classification lightweight model based on an image classification large model, and inputting training data into both the image classification large model and the image classification lightweight model to obtain the final loss of the image classification task; constructing a target detection lightweight model based on a target detection large model, and inputting training data into both the target detection large model and the target detection lightweight model to obtain the final loss of the target detection task; and carrying out a weighted summation of the final loss of the image classification task and the final loss of the target detection task to obtain a joint task loss, and updating the parameters of the image classification lightweight model and the target detection lightweight model by gradient descent based on the joint task loss. The application can improve the effect of the lightweight models.
Description
Technical Field
The application relates to the technical field of machine learning, in particular to a lightweight model training method.
Background
At present, to train a lightweight model, i.e. a model with few parameters and a fast inference speed, the usual practice is to design a lightweight model structure and then optimize it with a loss function (loss_func) against the real label (group_trunk, i.e. the ground truth). The effect of a lightweight model trained in this way is often much worse than that of a large model.
Disclosure of Invention
The application aims to solve the technical problem of providing a lightweight model training method which can improve the effect of a lightweight model.
The technical scheme adopted for solving the technical problems is as follows: a lightweight model training method is provided, comprising the following steps:
constructing an image classification lightweight model based on an image classification large model, wherein the parameter quantity of a main network part of the image classification lightweight model is set to be a preset percentage of the parameter quantity of the main network part of the image classification large model; respectively inputting training data into an image classification large model and an image classification lightweight model to obtain final loss of an image classification task;
constructing a target detection lightweight model based on a target detection large model, wherein the parameter quantity of a main network part of the target detection lightweight model is set to be a preset percentage of the parameter quantity of the main network part of the target detection large model; respectively inputting training data into a target detection large model and a target detection lightweight model to obtain the final loss of a target detection task;
and carrying out weighted summation on the final loss of the image classification task and the final loss of the target detection task to obtain joint task loss, and updating parameters in the image classification lightweight model and the target detection lightweight model by adopting a gradient descent method based on the joint task loss to realize joint training of the image classification lightweight model and the target detection lightweight model.
The image classification large model and the target detection large model both employ a softmax_T function, which is obtained from the original softmax function by dividing the input to its exponential part by a constant T.
The training data are respectively input into an image classification large model and an image classification lightweight model to obtain final loss of an image classification task, and the method specifically comprises the following steps:
inputting training data into the image classification large model to obtain a first probability of each category;
inputting the training data into the image classification lightweight model to obtain a second probability of each category;
and calculating the final loss of the image classification task based on the first probability of each category, the second probability of each category and the real classification label.
The calculating the final loss of the image classification task based on the first probability of each category, the second probability of each category and the real classification label specifically comprises the following steps:
performing cross entropy loss function calculation on the first probability of each category and the second probability of each category to obtain a first result;
performing cross entropy loss function calculation on the second probability of each category and the real classification label to obtain a second result;
and calculating the average value of the first result and the second result, and taking the obtained average value as the final loss of the image classification task.
The training data are respectively input into a target detection large model and a target detection lightweight model to obtain the final loss of a target detection task, and the method specifically comprises the following steps:
inputting training data into the target detection large model to obtain a first probability of each object category and a first rectangular frame position of each object;
inputting the training data into the target detection lightweight model to obtain a second probability of each object category and a second rectangular frame position of each object;
the final loss of the target detection task is calculated based on the first probability of each object class, the first rectangular box position of each object, the second probability of each object class, the second rectangular box position of each object, and the real target detection label.
The calculating the final loss of the target detection task based on the first probability of each object category, the first rectangular frame position of each object, the second probability of each object category, the second rectangular frame position of each object and the real target detection label specifically comprises the following steps:
performing cross entropy loss function calculation on the first probability of each object class and the second probability of each object class to obtain a first result;
performing cross entropy loss function calculation on the second probability of each object class and the classification sub-label in the real target detection label to obtain a second result;
summing the first result and the second result to obtain a classification loss value;
carrying out a mean square error loss function calculation on the first rectangular frame position of each object and the second rectangular frame position of each object to obtain a third result;
carrying out mean square error loss function calculation on the second rectangular frame position of each object and the position sub-label in the real target detection label to obtain a fourth result;
summing the third result and the fourth result to obtain a position loss value;
and obtaining an average value of the classification loss value and the position loss value, and taking the obtained average value as the final loss of the target detection task.
The technical scheme adopted for solving the technical problems is as follows: provided is a lightweight model training device, comprising:
the image classification training module is used for constructing an image classification lightweight model based on the image classification large model, wherein the parameter quantity of a main network part of the image classification lightweight model is set to be a preset percentage of the parameter quantity of the main network part of the image classification large model; respectively inputting training data into an image classification large model and an image classification lightweight model to obtain final loss of an image classification task;
the target detection training module is used for constructing a target detection lightweight model based on a target detection large model, wherein the parameter quantity of a main network part of the target detection lightweight model is set to be a preset percentage of the parameter quantity of the main network part of the target detection large model; respectively inputting training data into a target detection large model and a target detection lightweight model to obtain the final loss of a target detection task;
and the optimization training module is used for carrying out weighted summation on the final loss of the image classification task and the final loss of the target detection task to obtain joint task loss, and updating parameters in the image classification lightweight model and the target detection lightweight model by adopting a gradient descent method based on the joint task loss so as to realize joint training of the image classification lightweight model and the target detection lightweight model.
The technical scheme adopted for solving the technical problems is as follows: there is provided an electronic device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, the processor implementing the steps of the lightweight model training method described above when executing the computer program.
The technical scheme adopted for solving the technical problems is as follows: there is provided a computer readable storage medium having stored thereon a computer program which, when executed by a processor, implements the steps of the lightweight model training method described above.
Advantageous effects
Due to the adoption of the above technical scheme, compared with the prior art, the application has the following advantages and positive effects: the lightweight models are trained using both the real labels and the output probabilities of the large models, which improves the effect of the lightweight models; in addition, the lightweight models for the target detection and image classification tasks can be trained and optimized together, which improves training efficiency.
Drawings
Fig. 1 is a flowchart of a first embodiment of the lightweight model training method of the present application.
Detailed Description
The application will be further illustrated with reference to specific examples. It is to be understood that these examples are illustrative of the present application and are not intended to limit the scope of the present application. Furthermore, it should be understood that various changes and modifications can be made by one skilled in the art after reading the teachings of the present application, and such equivalents are intended to fall within the scope of the application as defined in the appended claims.
A first embodiment of the present application relates to a lightweight model training method, as shown in fig. 1, comprising the steps of:
step 1, constructing an image classification lightweight model based on an image classification large model, wherein the parameter quantity of a main network part of the image classification lightweight model is set to be a preset percentage of the parameter quantity of the main network part of the image classification large model; and respectively inputting training data into the image classification large model and the image classification lightweight model to obtain the final loss of the image classification task.
The image classification large model and the image classification lightweight model in this step are each composed of three parts, namely a backbone, a class_head and a softmax function, following the general design of an image classification network.
The parameters of the image classification large model are fixed, and the model is only responsible for forward inference. A training image is passed through the backbone and class_head of the image classification large model to obtain a feature vector. In this embodiment, the softmax function of the image classification large model is slightly modified: the input to the exponential part of the original formula is divided by a constant T (T is 100 in this embodiment), with everything else unchanged; the modified function is named the softmax_T function. The feature vector is passed through the softmax_T function to obtain a first probability prob_class_large for each category.
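The softmax_T function described here can be sketched in a few lines (a minimal NumPy sketch; the function name `softmax_t` and its signature are illustrative, not from the patent):

```python
import numpy as np

def softmax_t(logits, t=100.0):
    """Temperature-scaled softmax: divide the input of the exponential part by T.

    T = 100 in this embodiment; a large T softens the output distribution so
    the large model's inter-class similarity information is exposed to the
    lightweight model during training.
    """
    z = np.asarray(logits, dtype=np.float64) / t
    z = z - z.max()          # subtract the max for numerical stability
    e = np.exp(z)
    return e / e.sum()
```

With t=1.0 this reduces to the ordinary softmax; as T grows, the output approaches the uniform distribution.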
The parameter quantity of the backbone of the image classification lightweight model is set to 1% of that of the image classification large model, with everything else unchanged; passing the training image through the image classification lightweight model yields a second probability prob_class_lite for each category.
And carrying out cross entropy loss function calculation on the second probability prob_class_lite of each class and the first probability prob_class_large of each class to obtain a first result, and recording the first result as a loss_large value.
And performing cross entropy loss function calculation on the second probability prob_class_lite and the real classification label group_trunk_class of each class to obtain a second result, and marking the second result as a loss_trunk value.
And averaging the first result loss_large and the second result loss_trunk to obtain the final loss loss_final_class of the image classification task.
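The three calculations above can be sketched as follows. The cross entropy between two probability vectors is written out explicitly; `ground_truth_class` stands for the one-hot real classification label that the patent denotes group_trunk_class, and all other names are illustrative:

```python
import numpy as np

def cross_entropy(target_probs, pred_probs, eps=1e-12):
    """H(target, pred) = -sum(target * log(pred)) over probability vectors."""
    target = np.asarray(target_probs, dtype=np.float64)
    pred = np.clip(np.asarray(pred_probs, dtype=np.float64), eps, 1.0)
    return float(-(target * np.log(pred)).sum())

def classification_final_loss(prob_class_large, prob_class_lite, ground_truth_class):
    """Final loss of the image classification task: the average of the
    distillation loss (against the large model's probabilities) and the
    supervised loss (against the real label)."""
    loss_large = cross_entropy(prob_class_large, prob_class_lite)    # first result
    loss_truth = cross_entropy(ground_truth_class, prob_class_lite)  # second result
    return (loss_large + loss_truth) / 2.0                           # final loss
```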
Step 2, constructing a target detection lightweight model based on a target detection large model, wherein the parameter quantity of a main network part of the target detection lightweight model is set to be a preset percentage of the parameter quantity of the main network part of the target detection large model; and respectively inputting training data into the target detection large model and the target detection lightweight model to obtain the final loss of the target detection task.
Following the general design of a target detection network, the target detection large model and the target detection lightweight model in this step are mainly composed of five parts, namely a backbone, a class_head, a box_head, a softmax function and a sigmoid function.
Like the image classification large model, the target detection large model has fixed parameters and is only responsible for forward inference. The training image passes sequentially through the backbone, class_head and softmax_T function of the target detection large model (the softmax_T function is designed the same way as in the image classification large model) to obtain a first probability prob_class_large for each object class, and passes sequentially through the backbone, box_head and sigmoid function of the target detection large model to obtain a first rectangular frame position value_box_large for each object.
The parameter quantity of the backbone in the target detection lightweight model is set to 1% of that of the target detection large model, with everything else unchanged; passing the training image through the target detection lightweight model yields a second probability prob_class_lite for each object class and a second rectangular frame position value_box_lite for each object.
And performing cross entropy loss function calculation on the second probability prob_class_lite of each object class and the first probability prob_class_large of each object class to obtain a first result, and marking the first result as a loss_large_class value.
And performing cross entropy loss function calculation on the second probability prob_class_lite of each object class and the classified sub-label group_trunk_class in the real target detection label to obtain a second result, and marking the second result as a loss_trunk_class value.
And summing the first result loss_large_class and the second result loss_trunk_class to obtain a classification loss value loss_class_final.
And carrying out mean square error loss function calculation on the first rectangular frame position value_box_large of each object and the second rectangular frame position value_box_lite of each object to obtain a third result, and recording the third result as a loss_large_box value.
And carrying out mean square error loss function calculation on the second rectangular frame position value_box_lite of each object and the position sub-label group_trunk_box in the real target detection label to obtain a fourth result, and marking the fourth result as a loss_trunk_box value.
And summing the third result loss_large_box and the fourth result loss_trunk_box to obtain a position loss value loss_box_final.
The classification loss value loss_class_final and the position loss value loss_box_final are averaged to obtain the final loss loss_final_detection of the target detection task.
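The loss calculation of step 2 can be sketched in the same style (a self-contained sketch: `cross_entropy` is repeated from the classification sketch, mean squared error is used for the box positions as stated above, and all names are illustrative):

```python
import numpy as np

def cross_entropy(target_probs, pred_probs, eps=1e-12):
    """H(target, pred) = -sum(target * log(pred)) over probability vectors."""
    target = np.asarray(target_probs, dtype=np.float64)
    pred = np.clip(np.asarray(pred_probs, dtype=np.float64), eps, 1.0)
    return float(-(target * np.log(pred)).sum())

def mse(a, b):
    """Mean squared error between two position vectors."""
    a = np.asarray(a, dtype=np.float64)
    b = np.asarray(b, dtype=np.float64)
    return float(((a - b) ** 2).mean())

def detection_final_loss(prob_class_large, value_box_large,
                         prob_class_lite, value_box_lite,
                         ground_truth_class, ground_truth_box):
    """Final loss of the target detection task."""
    loss_large_class = cross_entropy(prob_class_large, prob_class_lite)    # first result
    loss_truth_class = cross_entropy(ground_truth_class, prob_class_lite)  # second result
    loss_class_final = loss_large_class + loss_truth_class                 # classification loss

    loss_large_box = mse(value_box_large, value_box_lite)                  # third result
    loss_truth_box = mse(ground_truth_box, value_box_lite)                 # fourth result
    loss_box_final = loss_large_box + loss_truth_box                       # position loss

    return (loss_class_final + loss_box_final) / 2.0                       # final loss
```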
And 3, carrying out weighted summation on the final loss loss_final_class of the image classification task and the final loss_final_detection of the target detection task to obtain joint task loss, and updating parameters in the image classification lightweight model and the target detection lightweight model by adopting a gradient descent method based on the joint task loss to realize joint training of the image classification lightweight model and the target detection lightweight model.
The formula for the weighted summation in this embodiment is: joint task loss = 0.6 × loss_final_class + 0.4 × loss_final_detection.
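The joint loss and the gradient-descent update of step 3 can be sketched as follows (a plain-Python sketch using the embodiment's 0.6/0.4 weights; the learning rate and all names are illustrative):

```python
def joint_task_loss(loss_final_class, loss_final_detection,
                    w_class=0.6, w_detect=0.4):
    """Weighted summation of the two final task losses (0.6/0.4 here)."""
    return w_class * loss_final_class + w_detect * loss_final_detection

def gradient_descent_step(params, grads, lr=0.01):
    """One gradient-descent update of the lightweight models' parameters.

    params: current parameter values of both lightweight models
    grads : gradients of the joint task loss w.r.t. those parameters
    """
    return [p - lr * g for p, g in zip(params, grads)]
```

In practice the gradients would come from automatic differentiation of the joint task loss; only the two lightweight models' parameters are updated, while the large models stay frozen.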
It is easy to find that the application utilizes the output probability of the real label and the large model to train the lightweight model at the same time, and improves the effect of the lightweight model.
A second embodiment of the present application relates to a lightweight model training device, comprising:
the image classification training module is used for constructing an image classification lightweight model based on the image classification large model, wherein the parameter quantity of a main network part of the image classification lightweight model is set to be a preset percentage of the parameter quantity of the main network part of the image classification large model; respectively inputting training data into an image classification large model and an image classification lightweight model to obtain final loss of an image classification task;
the target detection training module is used for constructing a target detection lightweight model based on a target detection large model, wherein the parameter quantity of a main network part of the target detection lightweight model is set to be a preset percentage of the parameter quantity of the main network part of the target detection large model; respectively inputting training data into a target detection large model and a target detection lightweight model to obtain the final loss of a target detection task;
and the optimization training module is used for carrying out weighted summation on the final loss of the image classification task and the final loss of the target detection task to obtain joint task loss, and updating parameters in the image classification lightweight model and the target detection lightweight model by adopting a gradient descent method based on the joint task loss so as to realize joint training of the image classification lightweight model and the target detection lightweight model.
The image classification training module inputs training data into an image classification large model and an image classification lightweight model respectively to obtain final loss of an image classification task, and specifically comprises the following steps:
inputting training data into the image classification large model to obtain a first probability of each category;
inputting the training data into the image classification lightweight model to obtain a second probability of each category;
and calculating the final loss of the image classification task based on the first probability of each category, the second probability of each category and the real classification label.
The calculating the final loss of the image classification task based on the first probability of each category, the second probability of each category and the real classification label specifically comprises the following steps:
performing cross entropy loss function calculation on the first probability of each category and the second probability of each category to obtain a first result;
performing cross entropy loss function calculation on the second probability of each category and the real classification label to obtain a second result;
and calculating the average value of the first result and the second result, and taking the obtained average value as the final loss of the image classification task.
When training data are respectively input into a target detection large model and a target detection lightweight model by the target detection training module to obtain the final loss of a target detection task, the target detection training module specifically comprises the following steps:
inputting training data into the target detection large model to obtain a first probability of each object category and a first rectangular frame position of each object;
inputting the training data into the target detection lightweight model to obtain a second probability of each object category and a second rectangular frame position of each object;
the final loss of the target detection task is calculated based on the first probability of each object class, the first rectangular box position of each object, the second probability of each object class, the second rectangular box position of each object, and the real target detection label.
The calculating the final loss of the target detection task based on the first probability of each object category, the first rectangular frame position of each object, the second probability of each object category, the second rectangular frame position of each object and the real target detection label specifically comprises the following steps:
performing cross entropy loss function calculation on the first probability of each object class and the second probability of each object class to obtain a first result;
performing cross entropy loss function calculation on the second probability of each object class and the classification sub-label in the real target detection label to obtain a second result;
summing the first result and the second result to obtain a classification loss value;
carrying out a mean square error loss function calculation on the first rectangular frame position of each object and the second rectangular frame position of each object to obtain a third result;
carrying out mean square error loss function calculation on the second rectangular frame position of each object and the position sub-label in the real target detection label to obtain a fourth result;
summing the third result and the fourth result to obtain a position loss value;
and obtaining an average value of the classification loss value and the position loss value, and taking the obtained average value as the final loss of the target detection task.
A third embodiment of the application is directed to an electronic device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, the processor implementing the steps of the lightweight model training method of the first embodiment when executing the computer program.
A fourth embodiment of the present application is directed to a computer readable storage medium having stored thereon a computer program which, when executed by a processor, implements the steps of the lightweight model training method of the first embodiment.
It will be appreciated by those skilled in the art that embodiments of the present application may be provided as a method, system, or computer program product. Accordingly, the present application may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present application may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein. The scheme in the embodiments of the application can be realized in various computer languages, such as the object-oriented programming language Java, the scripting language JavaScript, and the like.
The present application is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems) and computer program products according to embodiments of the application. It will be understood that each flow and/or block of the flowchart illustrations and/or block diagrams, and combinations of flows and/or blocks in the flowchart illustrations and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
While preferred embodiments of the present application have been described, additional variations and modifications in those embodiments may occur to those skilled in the art once they learn of the basic inventive concepts. It is therefore intended that the following claims be interpreted as including the preferred embodiments and all such alterations and modifications as fall within the scope of the application.
It will be apparent to those skilled in the art that various modifications and variations can be made to the present application without departing from the spirit or scope of the application. Thus, it is intended that the present application also include such modifications and alterations insofar as they come within the scope of the appended claims or the equivalents thereof.
Claims (9)
1. A lightweight model training method, characterized by comprising the following steps:
constructing an image classification lightweight model based on an image classification large model, wherein the parameter quantity of the backbone network of the image classification lightweight model is set to a preset percentage of the parameter quantity of the backbone network of the image classification large model; inputting training data into the image classification large model and the image classification lightweight model respectively to obtain a final loss of an image classification task; constructing a target detection lightweight model based on a target detection large model, wherein the parameter quantity of the backbone network of the target detection lightweight model is set to a preset percentage of the parameter quantity of the backbone network of the target detection large model; and inputting the training data into the target detection large model and the target detection lightweight model respectively to obtain a final loss of a target detection task;
and performing weighted summation of the final loss of the image classification task and the final loss of the target detection task to obtain a joint task loss, and updating parameters of the image classification lightweight model and the target detection lightweight model by a gradient descent method based on the joint task loss, thereby realizing joint training of the image classification lightweight model and the target detection lightweight model.
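A minimal sketch of the joint-training step of claim 1. The loss weights `w_cls`/`w_det` and the learning rate are illustrative assumptions; the claim only specifies a weighted summation of the two task losses followed by gradient descent on the student models' parameters.

```python
def joint_loss(cls_loss, det_loss, w_cls=0.5, w_det=0.5):
    """Weighted sum of the two task losses -> joint task loss (claim 1).

    The weights are assumed equal here; the patent leaves them unspecified.
    """
    return w_cls * cls_loss + w_det * det_loss


def sgd_update(params, grads, lr=0.01):
    """One plain gradient-descent step on the lightweight models' parameters."""
    return [p - lr * g for p, g in zip(params, grads)]
```

In a real distillation pipeline both student models would be updated from the same joint scalar loss, so gradients flow into each backbone from the shared objective.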
2. The lightweight model training method of claim 1, wherein the image classification large model and the target detection large model each employ a softmax_t function, the softmax_t function being the original softmax function with the input to its exponential term divided by a constant T.
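A hedged sketch of the softmax_t of claim 2: the standard softmax with every logit divided by the temperature constant T before exponentiation. The max-subtraction is an implementation-side numerical-stability assumption, not part of the claim.

```python
import math


def softmax_t(logits, T=1.0):
    """Temperature softmax: logits are divided by T before exponentiation.

    T == 1 recovers the ordinary softmax; T > 1 softens the distribution.
    """
    scaled = [x / T for x in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]
```

A larger T moves the output probabilities toward uniform, which is why distillation teachers typically run with T > 1: the softened distribution exposes the teacher's inter-class similarity structure to the student.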
3. The lightweight model training method according to claim 1, wherein the inputting training data into the image classification large model and the image classification lightweight model respectively to obtain the final loss of the image classification task specifically comprises:
inputting training data into the image classification large model to obtain a first probability of each category;
inputting the training data into the image classification lightweight model to obtain a second probability of each category;
and calculating the final loss of the image classification task based on the first probability of each category, the second probability of each category, and the ground-truth classification label.
4. The lightweight model training method according to claim 3, wherein the calculating the final loss of the image classification task based on the first probability of each category, the second probability of each category, and the ground-truth classification label specifically comprises:
performing a cross-entropy loss function calculation on the first probability of each category and the second probability of each category to obtain a first result;
and performing a cross-entropy loss function calculation on the second probability of each category and the ground-truth classification label to obtain a second result; calculating the average of the first result and the second result, and taking the obtained average as the final loss of the image classification task.
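An illustrative implementation of claim 4's final classification loss: the average of (a) the cross-entropy between the large model's and the lightweight model's probabilities and (b) the cross-entropy between the lightweight model's probabilities and the one-hot ground-truth label. The `eps` guard and the argument ordering (target first) are implementation assumptions.

```python
import math


def cross_entropy(p_target, p_pred, eps=1e-12):
    """H(p_target, p_pred) = -sum(t * log(q)); eps guards against log(0)."""
    return -sum(t * math.log(q + eps) for t, q in zip(p_target, p_pred))


def classification_final_loss(teacher_probs, student_probs, label_onehot):
    first = cross_entropy(teacher_probs, student_probs)   # first result (distillation term)
    second = cross_entropy(label_onehot, student_probs)   # second result (hard-label term)
    return (first + second) / 2.0                         # average -> final loss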
5. The lightweight model training method according to claim 1, wherein the inputting training data into the target detection large model and the target detection lightweight model respectively to obtain the final loss of the target detection task specifically comprises: inputting training data into the target detection large model to obtain a first probability of each object category and a first bounding-box position of each object;
inputting the training data into the target detection lightweight model to obtain a second probability of each object category and a second bounding-box position of each object;
and calculating the final loss of the target detection task based on the first probability of each object category, the first bounding-box position of each object, the second probability of each object category, the second bounding-box position of each object, and the ground-truth target detection label.
6. The lightweight model training method of claim 5, wherein the calculating the final loss of the target detection task based on the first probability of each object category, the first bounding-box position of each object, the second probability of each object category, the second bounding-box position of each object, and the ground-truth target detection label specifically comprises: performing a cross-entropy loss function calculation on the first probability of each object category and the second probability of each object category to obtain a first result;
performing a cross-entropy loss function calculation on the second probability of each object category and the classification sub-label in the ground-truth target detection label to obtain a second result;
summing the first result and the second result to obtain a classification loss value;
performing a mean-square-error loss function calculation on the first bounding-box position of each object and the second bounding-box position of each object to obtain a third result;
performing a mean-square-error loss function calculation on the second bounding-box position of each object and the position sub-label in the ground-truth target detection label to obtain a fourth result;
summing the third result and the fourth result to obtain a position loss value;
and calculating the average of the classification loss value and the position loss value, and taking the obtained average as the final loss of the target detection task.
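An illustrative composition of claim 6's final detection loss. The box format (four coordinates) and the `eps` guard are assumptions; the claim fixes only the structure: classification loss = CE(teacher, student) + CE(student, label), position loss = MSE(teacher, student) + MSE(student, label), final loss = average of the two.

```python
import math


def cross_entropy(p_target, p_pred, eps=1e-12):
    return -sum(t * math.log(q + eps) for t, q in zip(p_target, p_pred))


def mse(a, b):
    """Mean-square error between two coordinate vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


def detection_final_loss(teacher_probs, student_probs,
                         teacher_box, student_box,
                         label_probs, label_box):
    cls_loss = (cross_entropy(teacher_probs, student_probs)   # first result
                + cross_entropy(label_probs, student_probs))  # second result
    pos_loss = (mse(teacher_box, student_box)                 # third result
                + mse(label_box, student_box))                # fourth result
    return (cls_loss + pos_loss) / 2.0                        # average -> final loss
```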
7. A lightweight model training device, comprising:
the image classification training module, configured to construct an image classification lightweight model based on an image classification large model, wherein the parameter quantity of the backbone network of the image classification lightweight model is set to a preset percentage of the parameter quantity of the backbone network of the image classification large model, and to input training data into the image classification large model and the image classification lightweight model respectively to obtain a final loss of an image classification task;
the target detection training module, configured to construct a target detection lightweight model based on a target detection large model, wherein the parameter quantity of the backbone network of the target detection lightweight model is set to a preset percentage of the parameter quantity of the backbone network of the target detection large model, and to input the training data into the target detection large model and the target detection lightweight model respectively to obtain a final loss of a target detection task;
and the optimization training module, configured to perform weighted summation of the final loss of the image classification task and the final loss of the target detection task to obtain a joint task loss, and to update parameters of the image classification lightweight model and the target detection lightweight model by a gradient descent method based on the joint task loss, thereby realizing joint training of the image classification lightweight model and the target detection lightweight model.
8. An electronic device, comprising a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein the processor, when executing the computer program, implements the steps of the lightweight model training method according to any one of claims 1-6.
9. A computer-readable storage medium on which a computer program is stored, wherein the computer program, when executed by a processor, implements the steps of the lightweight model training method according to any one of claims 1-6.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202310793747.6A CN116935102B (en) | 2023-06-30 | 2023-06-30 | Lightweight model training method, device, equipment and medium |
Publications (2)
Publication Number | Publication Date |
---|---|
CN116935102A true CN116935102A (en) | 2023-10-24 |
CN116935102B CN116935102B (en) | 2024-02-20 |
Family
ID=88383498
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202310793747.6A Active CN116935102B (en) | 2023-06-30 | 2023-06-30 | Lightweight model training method, device, equipment and medium |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN116935102B (en) |
Patent Citations (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN111160379A (en) * | 2018-11-07 | 2020-05-15 | 北京嘀嘀无限科技发展有限公司 | Training method and device of image detection model and target detection method and device |
WO2020155518A1 (en) * | 2019-02-03 | 2020-08-06 | 平安科技(深圳)有限公司 | Object detection method and device, computer device and storage medium |
WO2021087985A1 (en) * | 2019-11-08 | 2021-05-14 | 深圳市欢太科技有限公司 | Model training method and apparatus, storage medium, and electronic device |
CN114424253A (en) * | 2019-11-08 | 2022-04-29 | 深圳市欢太科技有限公司 | Model training method and device, storage medium and electronic equipment |
CN114022658A (en) * | 2021-09-17 | 2022-02-08 | 浙江智慧视频安防创新中心有限公司 | Target detection method, device, storage medium and terminal |
CN115758245A (en) * | 2022-11-18 | 2023-03-07 | 上海蜜度信息技术有限公司 | Multi-mode data classification method, device, equipment and storage medium |
Also Published As
Publication number | Publication date |
---|---|
CN116935102B (en) | 2024-02-20 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN109816032B (en) | Unbiased-mapping zero-shot classification method and device based on generative adversarial network | |
CN113947196A (en) | Network model training method and device and computer readable storage medium | |
CN111079532B (en) | Video content description method based on text self-encoder | |
CN113657560B (en) | Weak supervision image semantic segmentation method and system based on node classification | |
CN108229287B (en) | Image recognition method and device, electronic equipment and computer storage medium | |
CN113435509B (en) | Small sample scene classification and identification method and system based on meta-learning | |
CN110362814B (en) | Named entity identification method and device based on improved loss function | |
CN108053454B (en) | Graph-structured data generation method based on deep convolutional generative adversarial network | |
US20190332894A1 (en) | Method for Processing Automobile Image Data, Apparatus, and Readable Storage Medium | |
CN114359938B (en) | Form identification method and device | |
JP7110929B2 (en) | Knowledge Complementary Program, Knowledge Complementary Method, and Knowledge Complementary Device | |
CN111967271A (en) | Analysis result generation method, device, equipment and readable storage medium | |
CN116402352A (en) | Enterprise risk prediction method and device, electronic equipment and medium | |
CN107729885B (en) | Face enhancement method based on multiple residual error learning | |
CN111950579A (en) | Training method and training device for classification model | |
CN112698977B (en) | Method, device, equipment and medium for positioning server fault | |
CN117788946A (en) | Image processing method, device, electronic equipment and storage medium | |
CN116935102B (en) | Lightweight model training method, device, equipment and medium | |
CN117574262A (en) | Underwater sound signal classification method, system and medium for small sample problem | |
CN110414845B (en) | Risk assessment method and device for target transaction | |
CN110889316B (en) | Target object identification method and device and storage medium | |
CN112861601A (en) | Method for generating adversarial examples and related device | |
Li et al. | Automated deep learning system for power line inspection image analysis and processing: Architecture and design issues | |
CN115758245A (en) | Multi-mode data classification method, device, equipment and storage medium | |
CN114547391A (en) | Message auditing method and device |
Legal Events
Date | Code | Title | Description
---|---|---|---
| PB01 | Publication | |
| SE01 | Entry into force of request for substantive examination | |
| CB02 | Change of applicant information | Address after: Room 301AB, No. 10, Lane 198, Zhangheng Road, Free Trade Pilot Zone, Pudong New Area, Shanghai, 200120. Applicant after: Shanghai Mido Technology Co.,Ltd. Address before: Room 301AB, No. 10, Lane 198, Zhangheng Road, Free Trade Pilot Zone, Pudong New Area, Shanghai, 200120. Applicant before: SHANGHAI MDATA INFORMATION TECHNOLOGY Co.,Ltd. |
| GR01 | Patent grant | |