CN116935102B - Lightweight model training method, device, equipment and medium - Google Patents
- Publication number
- CN116935102B CN116935102B CN202310793747.6A CN202310793747A CN116935102B CN 116935102 B CN116935102 B CN 116935102B CN 202310793747 A CN202310793747 A CN 202310793747A CN 116935102 B CN116935102 B CN 116935102B
- Authority
- CN
- China
- Prior art keywords
- target detection
- model
- image classification
- loss
- lightweight
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/70—Arrangements for image or video recognition or understanding using pattern recognition or machine learning
- G06V10/764—Arrangements for image or video recognition or understanding using pattern recognition or machine learning using classification, e.g. of video objects
- G06V10/765—Arrangements for image or video recognition or understanding using pattern recognition or machine learning using classification, e.g. of video objects using rules for classification or partitioning the feature space
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/20—Image preprocessing
- G06V10/22—Image preprocessing by selection of a specific region containing or referencing a pattern; Locating or processing of specific regions to guide the detection or recognition
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/70—Arrangements for image or video recognition or understanding using pattern recognition or machine learning
- G06V10/82—Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V2201/00—Indexing scheme relating to image or video recognition or understanding
- G06V2201/07—Target detection
-
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y02—TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
- Y02T—CLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO TRANSPORTATION
- Y02T10/00—Road transport of goods or passengers
- Y02T10/10—Internal combustion engine [ICE] based vehicles
- Y02T10/40—Engine management systems
Abstract
The invention relates to a lightweight model training method, device, equipment and medium. The training method comprises the following steps: constructing an image classification lightweight model based on an image classification large model, and inputting training data into the image classification large model and the image classification lightweight model respectively to obtain the final loss of the image classification task; constructing a target detection lightweight model based on a target detection large model, and inputting training data into the target detection large model and the target detection lightweight model respectively to obtain the final loss of the target detection task; and carrying out weighted summation of the final loss of the image classification task and the final loss of the target detection task to obtain the joint task loss, then updating the parameters of the image classification lightweight model and the target detection lightweight model by gradient descent based on the joint task loss. The invention can improve the effect of the lightweight model.
Description
Technical Field
The invention relates to the technical field of machine learning, in particular to a lightweight model training method.
Background
At present, to train a lightweight model, that is, a model with few parameters and a high inference speed, one usually designs a lightweight model structure and then optimizes it against the real labels (ground_truth) using a loss function (loss_func). As a result, the trained lightweight model often performs much worse than a large model.
Disclosure of Invention
The invention aims to solve the technical problem of providing a lightweight model training method which can improve the effect of a lightweight model.
The technical scheme adopted for solving the technical problems is as follows: the light model training method comprises the following steps:
constructing an image classification lightweight model based on an image classification large model, wherein the parameter quantity of a main network part of the image classification lightweight model is set to be a preset percentage of the parameter quantity of the main network part of the image classification large model; respectively inputting training data into an image classification large model and an image classification lightweight model to obtain final loss of an image classification task;
constructing a target detection lightweight model based on a target detection large model, wherein the parameter quantity of a main network part of the target detection lightweight model is set to be a preset percentage of the parameter quantity of the main network part of the target detection large model; respectively inputting training data into a target detection large model and a target detection lightweight model to obtain the final loss of a target detection task;
and carrying out weighted summation on the final loss of the image classification task and the final loss of the target detection task to obtain joint task loss, and updating parameters in the image classification lightweight model and the target detection lightweight model by adopting a gradient descent method based on the joint task loss to realize joint training of the image classification lightweight model and the target detection lightweight model.
The image classification large model and the target detection large model both employ a softmax_T function, which is obtained from the original softmax function by dividing the input of the exponential term by a constant T.
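The softmax_T function described above can be sketched as follows; the NumPy implementation and the max-subtraction stabilization step are illustrative choices of mine, not part of the patent.

```python
import numpy as np

def softmax_t(logits, t=100.0):
    """Temperature-scaled softmax: the input of the exponential term in the
    original softmax is divided by a constant T (T = 100 in the embodiment).
    With T > 1 the distribution is softened, exposing the large model's
    relative confidence across classes."""
    z = np.asarray(logits, dtype=np.float64) / t
    z = z - z.max()          # subtract the max for numerical stability
    e = np.exp(z)
    return e / e.sum()
```

With T = 100 the output for logits [1, 2, 3] is nearly uniform, whereas the standard softmax (T = 1) concentrates most of the mass on the largest logit.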
The training data are respectively input into an image classification large model and an image classification lightweight model to obtain final loss of an image classification task, and the method specifically comprises the following steps:
inputting training data into the image classification large model to obtain a first probability of each category;
inputting the training data into the image classification lightweight model to obtain a second probability of each category;
and calculating the final loss of the image classification task based on the first probability of each category, the second probability of each category and the real classification label.
The calculating the final loss of the image classification task based on the first probability of each category, the second probability of each category and the real classification label specifically comprises the following steps:
performing cross entropy loss function calculation on the first probability of each category and the second probability of each category to obtain a first result;
performing cross entropy loss function calculation on the second probability of each category and the real classification label to obtain a second result;
and calculating the average value of the first result and the second result, and taking the obtained average value as the final loss of the image classification task.
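The three calculation steps above can be sketched as a minimal NumPy implementation. The helper names and the assumption that the large model's probabilities serve as the target distribution in the first cross-entropy are mine, not stated in the patent.

```python
import numpy as np

def cross_entropy(target_probs, pred_probs, eps=1e-12):
    """Cross-entropy H(target, pred) = -sum(target * log(pred))."""
    pred = np.clip(np.asarray(pred_probs, dtype=np.float64), eps, 1.0)
    return float(-np.sum(np.asarray(target_probs, dtype=np.float64) * np.log(pred)))

def classification_final_loss(prob_class_large, prob_class_lite, ground_truth_class):
    """Average of (a) cross-entropy between the large model's and the
    lightweight model's class probabilities and (b) cross-entropy between
    the lightweight model's probabilities and the one-hot real label."""
    loss_large = cross_entropy(prob_class_large, prob_class_lite)   # first result
    loss_truth = cross_entropy(ground_truth_class, prob_class_lite) # second result
    return (loss_large + loss_truth) / 2.0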
The training data are respectively input into a target detection large model and a target detection lightweight model to obtain the final loss of a target detection task, and the method specifically comprises the following steps:
inputting training data into the target detection large model to obtain a first probability of each object category and a first rectangular frame position of each object;
inputting the training data into the target detection lightweight model to obtain a second probability of each object category and a second rectangular frame position of each object;
the final loss of the target detection task is calculated based on the first probability of each object class, the first rectangular box position of each object, the second probability of each object class, the second rectangular box position of each object, and the real target detection label.
The calculating the final loss of the target detection task based on the first probability of each object category, the first rectangular frame position of each object, the second probability of each object category, the second rectangular frame position of each object and the real target detection label specifically comprises the following steps:
performing cross entropy loss function calculation on the first probability of each object class and the second probability of each object class to obtain a first result;
performing cross entropy loss function calculation on the second probability of each object class and the classification sub-label in the real target detection label to obtain a second result;
summing the first result and the second result to obtain a classification loss value;
carrying out a mean square error loss function calculation on the first rectangular frame position of each object and the second rectangular frame position of each object to obtain a third result;
carrying out mean square error loss function calculation on the second rectangular frame position of each object and the position sub-label in the real target detection label to obtain a fourth result;
summing the third result and the fourth result to obtain a position loss value;
and obtaining an average value of the classification loss value and the position loss value, and taking the obtained average value as the final loss of the target detection task.
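The detection-loss steps above can be sketched in the same style; again the helper names and the choice of target vs. prediction in each term are illustrative assumptions.

```python
import numpy as np

def cross_entropy(target, pred, eps=1e-12):
    pred = np.clip(np.asarray(pred, dtype=np.float64), eps, 1.0)
    return float(-np.sum(np.asarray(target, dtype=np.float64) * np.log(pred)))

def mse(a, b):
    a, b = np.asarray(a, dtype=np.float64), np.asarray(b, dtype=np.float64)
    return float(np.mean((a - b) ** 2))

def detection_final_loss(prob_large, prob_lite, gt_class,
                         box_large, box_lite, gt_box):
    """Classification loss = sum of (teacher vs. student CE) and (student vs.
    label CE); position loss = sum of the two MSE terms; the final loss is
    the average of the two sums."""
    loss_class_final = cross_entropy(prob_large, prob_lite) + cross_entropy(gt_class, prob_lite)
    loss_box_final = mse(box_large, box_lite) + mse(box_lite, gt_box)
    return (loss_class_final + loss_box_final) / 2.0
```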
The technical scheme adopted for solving the technical problems is as follows: provided is a lightweight model training device, comprising:
the image classification training module is used for constructing an image classification lightweight model based on the image classification large model, wherein the parameter quantity of a main network part of the image classification lightweight model is set to be a preset percentage of the parameter quantity of the main network part of the image classification large model; respectively inputting training data into an image classification large model and an image classification lightweight model to obtain final loss of an image classification task;
the target detection training module is used for constructing a target detection lightweight model based on a target detection large model, wherein the parameter quantity of a main network part of the target detection lightweight model is set to be a preset percentage of the parameter quantity of the main network part of the target detection large model; respectively inputting training data into a target detection large model and a target detection lightweight model to obtain the final loss of a target detection task;
and the optimization training module is used for carrying out weighted summation on the final loss of the image classification task and the final loss of the target detection task to obtain joint task loss, and updating parameters in the image classification lightweight model and the target detection lightweight model by adopting a gradient descent method based on the joint task loss so as to realize joint training of the image classification lightweight model and the target detection lightweight model.
The technical scheme adopted for solving the technical problems is as follows: there is provided an electronic device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, the processor implementing the steps of the lightweight model training method described above when executing the computer program.
The technical scheme adopted for solving the technical problems is as follows: there is provided a computer readable storage medium having stored thereon a computer program which, when executed by a processor, implements the steps of the lightweight model training method described above.
Advantageous effects
Due to the adoption of the above technical scheme, the invention has the following advantages and positive effects compared with the prior art: the lightweight model is trained using both the real labels and the output probabilities of the large model, which improves the effect of the lightweight model; in addition, the lightweight models for the target detection and image classification tasks can be trained and optimized jointly, which improves training efficiency.
Drawings
Fig. 1 is a flowchart of a first embodiment of the lightweight model training method of the present invention.
Detailed Description
The invention will be further illustrated with reference to specific examples. It is to be understood that these examples are illustrative of the present invention and are not intended to limit the scope of the present invention. Further, it is understood that various changes and modifications may be made by those skilled in the art after reading the teachings of the present invention, and such equivalents are intended to fall within the scope of the claims appended hereto.
A first embodiment of the present invention relates to a lightweight model training method, as shown in fig. 1, comprising the steps of:
step 1, constructing an image classification lightweight model based on an image classification large model, wherein the parameter quantity of a main network part of the image classification lightweight model is set to be a preset percentage of the parameter quantity of the main network part of the image classification large model; and respectively inputting training data into the image classification large model and the image classification lightweight model to obtain the final loss of the image classification task.
Following the general design of image classification networks, the image classification large model and the image classification lightweight model in this step are each composed of three parts: a backbone, a class_head and a softmax function.
The parameters of the image classification large model are fixed, and the model is only responsible for forward inference. A training image is passed through the backbone and class_head of the image classification large model to obtain a feature vector. In this embodiment the softmax function of the image classification large model is slightly modified: the input of the exponential term in the original formula is divided by a constant T (T = 100 in this embodiment), with the rest unchanged; this modified function is named the softmax_T function. The feature vector is passed through the softmax_T function to obtain the first probability prob_class_large of each category.
The parameter quantity of the backbone of the image classification lightweight model is set to 1% of the parameter quantity of the image classification large model, with the rest unchanged; passing the training image through the image classification lightweight model yields the second probability prob_class_lite of each category.
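The embodiment fixes the lightweight backbone at 1% of the large model's parameter count but does not say how that reduction is achieved. One common route, an assumption here rather than part of the patent, is a channel width multiplier: since convolutional-layer parameter counts scale roughly with the product of input and output channel counts, a width multiplier of sqrt(0.01) = 0.1 gives approximately 1% of the parameters.

```python
def width_multiplier_for_param_ratio(ratio):
    """Return the channel-width multiplier that shrinks a convolutional
    backbone to roughly `ratio` of its original parameter count.

    Assumes parameter count scales with the square of the channel width,
    which holds approximately for plain convolutional backbones."""
    return ratio ** 0.5

# For the 1% backbone of this embodiment:
# width_multiplier_for_param_ratio(0.01) -> 0.1 (keep 10% of the channels)
```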
And carrying out cross entropy loss function calculation on the second probability prob_class_lite of each class and the first probability prob_class_large of each class to obtain a first result, and recording the first result as a loss_large value.
And performing cross entropy loss function calculation on the second probability prob_class_lite of each class and the real classification label ground_truth_class to obtain a second result, recorded as the loss_truth value.
Averaging the first result loss_large and the second result loss_truth gives the final loss loss_final_class of the image classification task.
Step 2, constructing a target detection lightweight model based on a target detection large model, wherein the parameter quantity of a main network part of the target detection lightweight model is set to be a preset percentage of the parameter quantity of the main network part of the target detection large model; and respectively inputting training data into the target detection large model and the target detection lightweight model to obtain the final loss of the target detection task.
Following the general design of target detection networks, the target detection large model and the target detection lightweight model in this step are mainly composed of five parts: a backbone, a class_head, a box_head, a softmax function and a sigmoid function.
As with the image classification large model, the parameters of the target detection large model are fixed and the model is only responsible for forward inference. The training image passes in turn through the backbone, class_head and softmax_T function of the target detection large model (the softmax_T function has the same design as in the image classification large model) to obtain the first probability prob_class_large of each object class, and through the backbone, box_head and sigmoid function of the target detection large model to obtain the first rectangular frame position value_box_large of each object.
The parameter quantity of the backbone in the target detection lightweight model is set to 1% of the parameter quantity of the target detection large model, with the rest unchanged; after the training image passes through the target detection lightweight model, the second probability prob_class_lite of each object class and the second rectangular frame position value_box_lite of each object are obtained.
And performing cross entropy loss function calculation on the second probability prob_class_lite of each object class and the first probability prob_class_large of each object class to obtain a first result, and marking the first result as a loss_large_class value.
And performing cross entropy loss function calculation on the second probability prob_class_lite of each object class and the classification sub-label ground_truth_class in the real target detection label to obtain a second result, recorded as the loss_truth_class value.
Summing the first result loss_large_class and the second result loss_truth_class gives the classification loss value loss_class_final.
And carrying out mean square error loss function calculation on the first rectangular frame position value_box_large of each object and the second rectangular frame position value_box_lite of each object to obtain a third result, and recording the third result as a loss_large_box value.
And carrying out mean square error loss function calculation on the second rectangular frame position value_box_lite of each object and the position sub-label ground_truth_box in the real target detection label to obtain a fourth result, recorded as the loss_truth_box value.
Summing the third result loss_large_box and the fourth result loss_truth_box gives the position loss value loss_box_final.
The classification loss value loss_class_final and the position loss value loss_box_final are averaged to obtain the final loss loss_final_detection of the target detection task.
And 3, carrying out weighted summation on the final loss loss_final_class of the image classification task and the final loss_final_detection of the target detection task to obtain joint task loss, and updating parameters in the image classification lightweight model and the target detection lightweight model by adopting a gradient descent method based on the joint task loss to realize joint training of the image classification lightweight model and the target detection lightweight model.
The formula for the weighted summation in this embodiment is: 0.6 × loss_final_class + 0.4 × loss_final_detection.
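The weighted summation and joint update step can be sketched as follows; the 0.6/0.4 weights come from this embodiment, while the function name and the commented optimizer sketch (with hypothetical model objects) are assumptions of mine.

```python
def joint_task_loss(loss_final_class, loss_final_detection,
                    w_class=0.6, w_det=0.4):
    """Joint task loss: weighted sum of the classification task's final loss
    and the detection task's final loss (weights 0.6 / 0.4 per the embodiment)."""
    return w_class * loss_final_class + w_det * loss_final_detection

# One gradient-descent update over both lightweight models (PyTorch sketch,
# assuming hypothetical `lite_cls_model` and `lite_det_model` objects):
#   params = list(lite_cls_model.parameters()) + list(lite_det_model.parameters())
#   optimizer = torch.optim.SGD(params, lr=1e-3)
#   loss = joint_task_loss(loss_final_class, loss_final_detection)
#   optimizer.zero_grad(); loss.backward(); optimizer.step()
```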
It is easy to see that the invention trains the lightweight model using the real labels and the output probabilities of the large models simultaneously, thereby improving the effect of the lightweight model.
A second embodiment of the present invention relates to a lightweight model training device, comprising:
the image classification training module is used for constructing an image classification lightweight model based on the image classification large model, wherein the parameter quantity of a main network part of the image classification lightweight model is set to be a preset percentage of the parameter quantity of the main network part of the image classification large model; respectively inputting training data into an image classification large model and an image classification lightweight model to obtain final loss of an image classification task;
the target detection training module is used for constructing a target detection lightweight model based on a target detection large model, wherein the parameter quantity of a main network part of the target detection lightweight model is set to be a preset percentage of the parameter quantity of the main network part of the target detection large model; respectively inputting training data into a target detection large model and a target detection lightweight model to obtain the final loss of a target detection task;
and the optimization training module is used for carrying out weighted summation on the final loss of the image classification task and the final loss of the target detection task to obtain joint task loss, and updating parameters in the image classification lightweight model and the target detection lightweight model by adopting a gradient descent method based on the joint task loss so as to realize joint training of the image classification lightweight model and the target detection lightweight model.
The image classification training module inputs training data into an image classification large model and an image classification lightweight model respectively to obtain final loss of an image classification task, and specifically comprises the following steps:
inputting training data into the image classification large model to obtain a first probability of each category;
inputting the training data into the image classification lightweight model to obtain a second probability of each category;
and calculating the final loss of the image classification task based on the first probability of each category, the second probability of each category and the real classification label.
The calculating the final loss of the image classification task based on the first probability of each category, the second probability of each category and the real classification label specifically comprises the following steps:
performing cross entropy loss function calculation on the first probability of each category and the second probability of each category to obtain a first result;
performing cross entropy loss function calculation on the second probability of each category and the real classification label to obtain a second result;
and calculating the average value of the first result and the second result, and taking the obtained average value as the final loss of the image classification task.
When training data are respectively input into a target detection large model and a target detection lightweight model by the target detection training module to obtain the final loss of a target detection task, the target detection training module specifically comprises the following steps:
inputting training data into the target detection large model to obtain a first probability of each object category and a first rectangular frame position of each object;
inputting the training data into the target detection lightweight model to obtain a second probability of each object category and a second rectangular frame position of each object;
the final loss of the target detection task is calculated based on the first probability of each object class, the first rectangular box position of each object, the second probability of each object class, the second rectangular box position of each object, and the real target detection label.
The calculating the final loss of the target detection task based on the first probability of each object category, the first rectangular frame position of each object, the second probability of each object category, the second rectangular frame position of each object and the real target detection label specifically comprises the following steps:
performing cross entropy loss function calculation on the first probability of each object class and the second probability of each object class to obtain a first result;
performing cross entropy loss function calculation on the second probability of each object class and the classification sub-label in the real target detection label to obtain a second result;
summing the first result and the second result to obtain a classification loss value;
carrying out a mean square error loss function calculation on the first rectangular frame position of each object and the second rectangular frame position of each object to obtain a third result;
carrying out mean square error loss function calculation on the second rectangular frame position of each object and the position sub-label in the real target detection label to obtain a fourth result;
summing the third result and the fourth result to obtain a position loss value;
and obtaining an average value of the classification loss value and the position loss value, and taking the obtained average value as the final loss of the target detection task.
A third embodiment of the invention is directed to an electronic device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, the processor implementing the steps of the lightweight model training method of the first embodiment when executing the computer program.
A fourth embodiment of the present invention is directed to a computer readable storage medium having stored thereon a computer program which, when executed by a processor, implements the steps of the lightweight model training method of the first embodiment.
It will be appreciated by those skilled in the art that embodiments of the present invention may be provided as a method, system, or computer program product. Accordingly, the present invention may take the form of an entirely hardware embodiment, an entirely software embodiment, or an embodiment combining software and hardware aspects. Furthermore, the present invention may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein. The scheme in the embodiments of the invention can be realized in various computer languages, such as the object-oriented programming language Java, the scripting language JavaScript, and the like.
The present invention is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems) and computer program products according to embodiments of the invention. It will be understood that each flow and/or block of the flowchart illustrations and/or block diagrams, and combinations of flows and/or blocks in the flowchart illustrations and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
While preferred embodiments of the present invention have been described, additional variations and modifications in those embodiments may occur to those skilled in the art once they learn of the basic inventive concepts. It is therefore intended that the following claims be interpreted as including the preferred embodiments and all such alterations and modifications as fall within the scope of the invention.
It will be apparent to those skilled in the art that various modifications and variations can be made to the present invention without departing from the spirit or scope of the invention. Thus, it is intended that the present invention also include such modifications and alterations insofar as they come within the scope of the appended claims or the equivalents thereof.
Claims (5)
1. A lightweight model training method, characterized by comprising the following steps:
constructing an image classification lightweight model based on an image classification large model, wherein the parameter count of the backbone network portion of the image classification lightweight model is set to a preset percentage of the parameter count of the backbone network portion of the image classification large model; inputting training data into the image classification large model and the image classification lightweight model respectively to obtain a final loss of the image classification task; wherein inputting the training data into the image classification large model and the image classification lightweight model respectively to obtain the final loss of the image classification task specifically comprises:
inputting training data into the image classification large model to obtain a first probability of each category;
inputting the training data into the image classification lightweight model to obtain a second probability of each category;
calculating a cross-entropy loss between the first probability of each category and the second probability of each category to obtain a first result;
calculating a cross-entropy loss between the second probability of each category and the true classification label to obtain a second result; averaging the first result and the second result, and taking the obtained mean as the final loss of the image classification task;
constructing a target detection lightweight model based on a target detection large model, wherein the parameter count of the backbone network portion of the target detection lightweight model is set to a preset percentage of the parameter count of the backbone network portion of the target detection large model; inputting training data into the target detection large model and the target detection lightweight model respectively to obtain a final loss of the target detection task;
wherein inputting the training data into the target detection large model and the target detection lightweight model respectively to obtain the final loss of the target detection task specifically comprises:
inputting training data into the target detection large model to obtain a first probability of each object category and a first rectangular frame position of each object;
inputting the training data into the target detection lightweight model to obtain a second probability of each object category and a second rectangular frame position of each object;
calculating a cross-entropy loss between the first probability of each object category and the second probability of each object category to obtain a first calculation result;
calculating a cross-entropy loss between the second probability of each object category and the classification sub-label in the true target detection label to obtain a second calculation result;
summing the first calculation result and the second calculation result to obtain a classification loss value;
calculating a mean-squared-error loss between the first rectangular frame position of each object and the second rectangular frame position of each object to obtain a third calculation result;
calculating a mean-squared-error loss between the second rectangular frame position of each object and the position sub-label in the true target detection label to obtain a fourth calculation result;
summing the third calculation result and the fourth calculation result to obtain a position loss value;
averaging the classification loss value and the position loss value, and taking the obtained mean as the final loss of the target detection task;
and performing a weighted summation of the final loss of the image classification task and the final loss of the target detection task to obtain a joint task loss, and updating the parameters of the image classification lightweight model and the target detection lightweight model by gradient descent based on the joint task loss, thereby realizing joint training of the image classification lightweight model and the target detection lightweight model.
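The loss construction of claim 1 can be illustrated with a small self-contained sketch. This is not the patented implementation: all function names, the one-hot label encoding, the 4-value (x, y, w, h) rectangular-frame format, and the 0.5/0.5 loss weights are hypothetical choices made for illustration, and a real system would obtain the gradients by backpropagating the joint loss through the lightweight models with an autodiff framework.

```python
import math

def cross_entropy(target_probs, pred_probs, eps=1e-12):
    """H(target, pred) = -sum_i target_i * log(pred_i)."""
    return -sum(t * math.log(p + eps) for t, p in zip(target_probs, pred_probs))

def mse(a, b):
    """Mean squared error between two equal-length coordinate vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)

def classification_task_loss(teacher_probs, student_probs, label_onehot):
    """Final image-classification loss per the claim: the mean of
    (1) CE between the large model's and the lightweight model's class
        probabilities (the 'first result'), and
    (2) CE between the lightweight model's probabilities and the true
        classification label (the 'second result')."""
    first = cross_entropy(teacher_probs, student_probs)
    second = cross_entropy(label_onehot, student_probs)
    return (first + second) / 2.0

def detection_task_loss(teacher_cls, student_cls, cls_sublabel,
                        teacher_box, student_box, pos_sublabel):
    """Final target-detection loss per the claim: the mean of a
    classification loss value (sum of two cross-entropy terms) and a
    position loss value (sum of two MSE terms over box coordinates)."""
    cls_loss = (cross_entropy(teacher_cls, student_cls)
                + cross_entropy(cls_sublabel, student_cls))
    pos_loss = mse(teacher_box, student_box) + mse(pos_sublabel, student_box)
    return (cls_loss + pos_loss) / 2.0

def joint_task_loss(cls_task_loss, det_task_loss, w_cls=0.5, w_det=0.5):
    """Weighted summation of the two task losses; the claim does not fix
    the weight values, so 0.5/0.5 here is an assumption."""
    return w_cls * cls_task_loss + w_det * det_task_loss

def gradient_descent_step(params, grads, lr=0.01):
    """One plain gradient-descent update on shared lightweight-model
    parameters, given gradients of the joint task loss."""
    return [p - lr * g for p, g in zip(params, grads)]
```

For a single sample, one training step would compute `classification_task_loss` and `detection_task_loss` from the two teacher/student forward passes, combine them with `joint_task_loss`, and apply `gradient_descent_step` to both lightweight models' parameters.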
2. The lightweight model training method of claim 1, wherein the image classification large model and the target detection large model each employ a softmax_T function, the softmax_T function being obtained from the original softmax function by dividing the input of the exponent part by a constant temperature T.
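The softmax_T function of claim 2 can be sketched as follows: each logit (the input to the exponential) is divided by a temperature constant T before normalizing. The subtraction of the maximum is a standard numerical-stability detail added here, not part of the claim.

```python
import math

def softmax_t(logits, T=2.0):
    """Temperature-scaled softmax: divide each logit by the constant T
    before exponentiation, which softens the distribution for T > 1."""
    scaled = [z / T for z in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]
```

With T = 1 this reduces to the ordinary softmax; a larger T yields softer teacher probabilities, the usual knowledge-distillation setting.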
3. A lightweight model training apparatus, characterized by applying the lightweight model training method as claimed in any one of claims 1-2, comprising:
the image classification training module, configured to construct an image classification lightweight model based on an image classification large model, wherein the parameter count of the backbone network portion of the image classification lightweight model is set to a preset percentage of the parameter count of the backbone network portion of the image classification large model, and to input training data into the image classification large model and the image classification lightweight model respectively to obtain a final loss of the image classification task;
the target detection training module, configured to construct a target detection lightweight model based on a target detection large model, wherein the parameter count of the backbone network portion of the target detection lightweight model is set to a preset percentage of the parameter count of the backbone network portion of the target detection large model, and to input training data into the target detection large model and the target detection lightweight model respectively to obtain a final loss of the target detection task;
and the optimization training module, configured to perform a weighted summation of the final loss of the image classification task and the final loss of the target detection task to obtain a joint task loss, and to update the parameters of the image classification lightweight model and the target detection lightweight model by gradient descent based on the joint task loss, thereby realizing joint training of the image classification lightweight model and the target detection lightweight model.
4. An electronic device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, the processor implementing the steps of the lightweight model training method of any of claims 1-2 when the computer program is executed.
5. A computer readable storage medium, on which a computer program is stored, characterized in that the computer program, when executed by a processor, implements the steps of the lightweight model training method as claimed in any one of claims 1-2.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202310793747.6A CN116935102B (en) | 2023-06-30 | 2023-06-30 | Lightweight model training method, device, equipment and medium |
Publications (2)
Publication Number | Publication Date |
---|---|
CN116935102A CN116935102A (en) | 2023-10-24 |
CN116935102B true CN116935102B (en) | 2024-02-20 |
Family
ID=88383498
Patent Citations (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN111160379A (en) * | 2018-11-07 | 2020-05-15 | 北京嘀嘀无限科技发展有限公司 | Training method and device of image detection model and target detection method and device |
WO2020155518A1 (en) * | 2019-02-03 | 2020-08-06 | 平安科技(深圳)有限公司 | Object detection method and device, computer device and storage medium |
WO2021087985A1 (en) * | 2019-11-08 | 2021-05-14 | 深圳市欢太科技有限公司 | Model training method and apparatus, storage medium, and electronic device |
CN114424253A (en) * | 2019-11-08 | 2022-04-29 | 深圳市欢太科技有限公司 | Model training method and device, storage medium and electronic equipment |
CN114022658A (en) * | 2021-09-17 | 2022-02-08 | 浙江智慧视频安防创新中心有限公司 | Target detection method, device, storage medium and terminal |
CN115758245A (en) * | 2022-11-18 | 2023-03-07 | 上海蜜度信息技术有限公司 | Multi-mode data classification method, device, equipment and storage medium |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
| PB01 | Publication | |
| SE01 | Entry into force of request for substantive examination | |
| CB02 | Change of applicant information | Address after: Room 301AB, No. 10, Lane 198, Zhangheng Road, Free Trade Pilot Zone, Pudong New Area, Shanghai, 200120; Applicant after: Shanghai Mido Technology Co.,Ltd. Address before: Room 301AB, No. 10, Lane 198, Zhangheng Road, Free Trade Pilot Zone, Pudong New Area, Shanghai, 200120; Applicant before: SHANGHAI MDATA INFORMATION TECHNOLOGY Co.,Ltd. |
| GR01 | Patent grant | |