CN116977791A - Multi-task model training method, task prediction method, device, computer equipment and medium - Google Patents
- Publication number
- CN116977791A (application CN202310961694.4A)
- Authority
- CN
- China
- Prior art keywords
- task
- head
- training
- training set
- model
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/70—Arrangements for image or video recognition or understanding using pattern recognition or machine learning
- G06V10/77—Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
- G06V10/774—Generating sets of training patterns; Bootstrap methods, e.g. bagging or boosting
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/0464—Convolutional networks [CNN, ConvNet]
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/70—Arrangements for image or video recognition or understanding using pattern recognition or machine learning
- G06V10/764—Arrangements for image or video recognition or understanding using pattern recognition or machine learning using classification, e.g. of video objects
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/70—Arrangements for image or video recognition or understanding using pattern recognition or machine learning
- G06V10/82—Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V2201/00—Indexing scheme relating to image or video recognition or understanding
- G06V2201/07—Target detection
Abstract
An embodiment of the invention provides a multi-task model training method, a task prediction method, an apparatus, computer equipment and a medium, belonging to the technical field of data processing. The multi-task model comprises a shared network, a target detection head and at least one task head, and the training method comprises the following steps: acquiring a plurality of image samples; merging the image samples whose label results belong to the same task head to obtain a training set for each task head; inputting each training set into the shared network to obtain a feature map of the image sample; inputting the feature map of the image sample to the target detection head to obtain a feature map of the target to be detected; and, for each training set, inputting the feature map of the target to be detected into the task head with the same name as the training set, and iteratively training the multi-task model. Because the image samples in each training set are input to the corresponding task head and data iteration is performed on that head, the trained multi-task model can produce accurate target detection results and classification prediction results.
Description
Technical Field
The invention relates to the technical field of data processing, and in particular to a multi-task model training method, a task prediction method and apparatus, computer equipment, and a medium.
Background
With the rapid development of deep learning, a single multi-task model is often required to complete all the tasks of detecting and classifying targets: the model first detects the positions of targets in an acquired global image, and then classifies the attribute information of each target based on its position. In an autonomous-driving scenario, for example, if a vehicle is the target, the multi-task model must identify the position of the vehicle and classify attribute information such as its type, occlusion state, and heading direction. If the driver in a vehicle is the target, the model must identify the driver's position and recognize attribute information such as clothing and sitting posture. The model typically used to implement such detection and classification is a multi-task model comprising a plurality of task heads: one task head of an existing multi-task model identifies the position of a target to produce a target detection result, while the remaining task heads classify the target's attribute information to produce classification prediction results.
When such a multi-task model is trained, the image samples in the training set must be input to all task heads, which in turn requires a massive number of image samples carrying label results that annotate the outputs of every task head. However, when the set of task heads changes, for example because a task head is added, the existing image samples lack the label results required by the new head. When an image sample does not include the label result a task head requires, that head cannot undergo data iteration based on the deviation between its predicted result and the true result given by the label, so the performance of the multi-task model degrades. When such a model is then used to detect and classify targets, accurate results cannot be obtained.
Disclosure of Invention
A first object of the invention is to provide a multi-task model training method that solves the prior-art problem that accurate results cannot be obtained when a multi-task model is used to detect and classify targets. A second object is to provide a task prediction method; a third, a multi-task model training apparatus; a fourth, a task prediction apparatus; a fifth, a computer device; and a sixth, a machine-readable storage medium.
In order to achieve the above objects, the technical solution adopted by the application is as follows:
in a first aspect, the present application provides a method for training a multitasking model, where the multitasking model includes a shared network, a target detection head and at least one task head, the method for training a multitasking model includes:
acquiring a plurality of image samples, wherein the image samples comprise label results of at least one task head;
merging the image samples whose label results belong to the same task head to obtain a training set for each task head, wherein the name of each training set is the same as the name of the corresponding task head;
inputting each training set into a shared network to obtain a feature map of an image sample;
inputting the feature map of the image sample to a target detection head to obtain a feature map of a target to be detected;
and, for each training set, inputting the feature map of the target to be detected to the task head with the same name as the training set, and iteratively training the multi-task model.
With reference to the first aspect, in a first possible implementation manner, the step of constructing the multitasking model includes:
acquiring the name of each training set;
for each training set, constructing a task head with the same name as the training set;
constructing the multi-task model based on the shared network, the target detection head, and all task heads.
With reference to the first aspect, in a second possible implementation manner, for each training set, inputting the feature map of the target to be detected to the task head with the same name as the training set and iteratively training the multi-task model includes:
for each training set, inputting the feature map of the target to be detected to the task head with the same name as the training set, to obtain the loss of each task head;
performing data iteration on each task head in turn, based on the loss of each task head;
and, once all task heads have completed data iteration, executing again the step of inputting, for each training set, the feature map of the target to be detected to the same-named task head to obtain the loss of each task head, until the loss of each task head is smaller than a preset loss threshold.
With reference to the second possible implementation manner of the first aspect, in a third possible implementation manner, the loss of each task head includes a regression loss of the target detection head, a classification loss of the target detection head, and a classification loss of the task head.
With reference to the first aspect, in a fourth possible implementation manner, the shared network includes a backbone network and a neck network, and inputting each training set to the shared network to obtain the feature map of the image sample includes:
inputting the training set to the backbone network to obtain an initial image-sample feature map;
inputting the initial image-sample feature map to the neck network to obtain a scale-updated image-sample feature map.
With reference to the first aspect, in a fifth possible implementation manner, each task head includes a feature aggregation layer and a task network, and for each training set, inputting the feature map of the target to be detected to the same-named task head and iteratively training the multi-task model includes:
for each training set, inputting the feature map of the target to be detected to the feature aggregation layer of the task head with the same name as the training set, to obtain a feature map of the target to be detected at a preset scale;
inputting the preset-scale feature map of the target to be detected to the task network, and iteratively training the multi-task model.
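The feature-aggregation step above can be sketched as resizing a variable-size target feature map to the preset scale expected by the task network. The patent does not name a specific aggregation operator, so the adaptive average pooling below is an assumption chosen for illustration, written over plain nested lists to stay framework-free:

```python
# Hypothetical sketch of a feature-aggregation layer: pool an arbitrary
# H x W feature map (list of lists of floats) down to a preset
# out_h x out_w scale by adaptive average pooling.
def adaptive_avg_pool2d(feature_map, out_h, out_w):
    in_h, in_w = len(feature_map), len(feature_map[0])
    pooled = []
    for i in range(out_h):
        # row window for output cell i (at least one input row)
        r0 = i * in_h // out_h
        r1 = max((i + 1) * in_h // out_h, r0 + 1)
        row = []
        for j in range(out_w):
            # column window for output cell j (at least one input column)
            c0 = j * in_w // out_w
            c1 = max((j + 1) * in_w // out_w, c0 + 1)
            cells = [feature_map[r][c] for r in range(r0, r1) for c in range(c0, c1)]
            row.append(sum(cells) / len(cells))
        pooled.append(row)
    return pooled
```

In a real model this role is typically played by an operator such as ROI-Align or adaptive pooling in a deep-learning framework; the point here is only that every task head receives its target features at one fixed scale.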
In a second aspect, the present application provides a task prediction method, the task prediction method including:
acquiring a target image;
inputting the target image into a multi-task model to obtain a target detection result and a prediction result of at least one task corresponding to the target detection result, wherein the multi-task model is trained by the training method of the first aspect.
In a third aspect, the present application provides a training apparatus for a multi-task model, where the multi-task model includes a shared network, a target detection head and at least one task head, the training apparatus including:
the image sample acquisition module is used for acquiring a plurality of image samples, wherein the image samples comprise label results of at least one task head;
the training set obtaining module, configured to merge the image samples whose label results belong to the same task head to obtain a training set for each task head, wherein the name of each training set is the same as the name of the corresponding task head;
the sample feature map obtaining module is used for inputting each training set into the shared network to obtain a feature map of the image sample;
the target feature map obtaining module is used for inputting the feature map of the image sample to the target detection head to obtain the feature map of the target to be detected;
the model iteration training module, configured to, for each training set, input the feature map of the target to be detected to the task head with the same name as the training set, and iteratively train the multi-task model.
In a fourth aspect, the present application provides a task prediction apparatus, comprising:
the image acquisition module is used for acquiring a target image;
The prediction result obtaining module is configured to input the target image into a multitasking model, and obtain a target detection result and a prediction result of at least one task corresponding to the target detection result, where the multitasking model is obtained according to the training method of the multitasking model as in the first aspect.
In a fifth aspect, the present application provides a computer device comprising a memory and a processor, the memory storing a computer program which, when executed by the processor, implements a training method of a multitasking model as in the first aspect, or implements a task prediction method as in the second aspect.
In a sixth aspect, the present application provides a machine-readable storage medium having stored thereon a computer program which, when executed by a processor, implements a training method for a multitasking model as in the first aspect, or implements a task prediction method as in the second aspect.
The beneficial effects of the application are as follows:
(1) For each training set, the feature map of the target to be detected is input to the task head with the same name as the training set. Because the image samples in a training set are not input to all task heads, but are routed to the corresponding task head by matching the training-set name against the task-head name, every feature map that reaches a task head carries the label result needed to iterate that head. The head can therefore undergo data iteration according to the deviation between the predicted result and the true result, and the trained multi-task model can produce accurate target detection results and classification prediction results.
(2) In the prior art, when target detection results for a plurality of targets are required, the task heads of a model output target detection results and classification prediction results independently, and each classification prediction result must then be matched to a target detection result. In the present application, the target detection head first determines the targets to be detected in the image, and the feature map of each target is then input to the task heads, so a task head directly outputs the classification prediction result of a specific target, and no matching between classification and detection results is needed.
Drawings
The accompanying drawings are included to provide a further understanding of embodiments of the application and are incorporated in and constitute a part of this specification, illustrate embodiments of the application and together with the description serve to explain, without limitation, the embodiments of the application. In the drawings:
FIG. 1 shows a flowchart of a training method of a multitasking model provided by an embodiment of the present application;
FIG. 2 is a schematic diagram of a multi-task model according to an embodiment of the present application;
FIG. 3 is a schematic diagram of a task head according to an embodiment of the present application;
FIG. 4 shows a flow chart of a task prediction method provided by an embodiment of the present application;
FIG. 5 shows a schematic structural diagram of a training device for a multi-task model according to an embodiment of the present application;
fig. 6 shows a schematic structural diagram of a task prediction apparatus according to an embodiment of the present application.
Detailed Description
The following describes the embodiments of the present application in detail with reference to the drawings. It should be understood that the detailed description and specific examples, while indicating and illustrating the application, are not intended to limit the application.
The components of the embodiments of the present application generally described and illustrated in the figures herein may be arranged and designed in a wide variety of different configurations. Thus, the following detailed description of the embodiments of the application, as presented in the figures, is not intended to limit the scope of the application, as claimed, but is merely representative of selected embodiments of the application. All other embodiments, which can be made by a person skilled in the art without making any inventive effort, are intended to be within the scope of the present application.
The terms "comprises," "comprising," "including," or any other variation thereof, as used in various embodiments of the present application, indicate the presence of a stated feature, number, step, operation, element, component, or combination of the foregoing, and do not exclude the presence or addition of one or more other features, numbers, steps, operations, elements, components, or combinations thereof.
Furthermore, the terms "first," "second," "third," and the like are used merely to distinguish between descriptions and should not be construed as indicating or implying relative importance.
Unless otherwise defined, all terms (including technical and scientific terms) used herein have the same meaning as commonly understood by one of ordinary skill in the art to which the various embodiments of the application belong. Terms such as those defined in commonly used dictionaries will be interpreted as having a meaning consistent with their meaning in the context of the relevant art, and will not be interpreted in an idealized or overly formal sense unless expressly so defined herein.
Example 1
Referring to fig. 1, fig. 1 is a flowchart illustrating a training method of a multi-task model according to an embodiment of the present application. The multi-task model comprises a shared network, a target detection head and at least one task head, and the training method of the multi-task model in fig. 1 comprises the following steps:
s110, acquiring a plurality of image samples, wherein the image samples comprise label results of at least one task head.
For ease of understanding, the target to be detected by the multi-task model in the embodiment of the present application is a vehicle. The target detection result output by the target detection head is the position of the vehicle in the image, and the prediction result output by each task head is a classification result for the vehicle, such as the vehicle's color or model, which is not elaborated here. The image samples are used to train the multi-task model so that the trained model outputs the desired prediction results. During iterative training, the multi-task model determines whether it has output a correct prediction for an image sample according to that sample's label result, so a plurality of image samples is acquired first. It should be understood that the image samples may be constructed directly or collected from different data sources, which is not detailed here.
Because constructing a large number of image samples directly is inefficient, image samples usually need to be collected from different data sources so that the multi-task model can be trained on a huge number of samples. When image samples are collected from different data sources, each image sample includes the label result of at least one task head. Taking a multi-task model with a first task head and a second task head as an example, an image sample may include the label result of the first head only, of the second head only, or of both heads; it is not required that every image sample include the label results of all task heads.
S120, merging the image samples whose label results belong to the same task head, to obtain a training set for each task head, wherein the name of each training set is the same as the name of the corresponding task head.
When image samples are acquired, they may come from different data sources. The image samples belonging to the same task are merged, that is, the image samples whose label results belong to the same task head are merged, to obtain a training set for each task head. The number of training sets obtained equals the number of task heads, and the name of each training set is determined. Because the name of each training set is the same as the name of the corresponding task head, the task head corresponding to a training set is identified by matching their names. The names of the training sets are set according to actual requirements and may be any characters, which is not limited here.
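Step S120 can be sketched as a simple grouping of samples by label name. The dictionary layout of a sample below is an assumption made for illustration; the patent only requires that a sample carry the label result of at least one task head:

```python
# Hypothetical sketch of step S120: merge image samples whose label
# results belong to the same task head into per-head training sets,
# keyed by the task head's name.
def build_training_sets(image_samples):
    training_sets = {}
    for sample in image_samples:
        # one sample may carry label results for several task heads,
        # so it is placed into every matching training set
        for task_name in sample["labels"]:
            training_sets.setdefault(task_name, []).append(sample)
    return training_sets
```

A sample labeled for two heads lands in both training sets, which is why no single sample ever needs the label results of all task heads.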
S130, inputting each training set into the shared network to obtain a feature map of the image sample.
Typically, an image sample is an image such as a photograph, which the multi-task model cannot interpret directly. Each training set is input to the shared network, which extracts features from the image samples to obtain the feature map of each image sample. The feature map may encode color features, texture features, shape features, spatial-relationship features, and the like of the image sample, which are not elaborated here.
In an embodiment of the present application, the shared network includes a backbone network and a neck network, and inputting each training set to the shared network to obtain the feature map of the image sample includes:
inputting the training set to the backbone network to obtain an initial image-sample feature map;
inputting the initial image-sample feature map to the neck network to obtain a scale-updated image-sample feature map.
Referring to fig. 2, fig. 2 is a schematic structural diagram of a multi-task model according to an embodiment of the present application.
The multi-task model in the embodiment of the present application is a network structure consisting of a backbone network, a neck network, a target detection head, and task heads. The backbone network 210 extracts features of the image samples: the training set is input to the backbone network 210 to obtain the initial image-sample feature map. The backbone network 210 and the neck network 220 form the shared network, with the neck network 220 arranged between the backbone network 210 and the task heads 240 to further improve the feature diversity and robustness of the image-sample feature map. The initial image-sample feature map is input to the neck network 220 to obtain the scale-updated image-sample feature map. The target detection head 230 maps the position of each target to be detected onto the image-sample feature map to obtain the feature map of the target to be detected. The task heads 240 output prediction results based on the feature map of the target to be detected; for ease of understanding, only two task heads 240 are shown in the figure.
The structure of the backbone network 210 is set according to actual requirements and may be, for example, a ResNet (residual network) or Darknet structure, which is not limited here. The structure of the neck network 220 is likewise set according to actual requirements and may be, for example, an FPN (feature pyramid network) structure, which is not limited here.
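The backbone-plus-neck composition can be sketched as two chained stages. The class and the callables below are placeholders for illustration, not real ResNet or FPN implementations:

```python
# Hypothetical sketch of the shared network in Fig. 2: a backbone stage
# followed by a neck stage. Any two callables can stand in for them here.
class SharedNetwork:
    def __init__(self, backbone, neck):
        self.backbone = backbone  # e.g. a ResNet/Darknet body: image -> initial feature map
        self.neck = neck          # e.g. an FPN: initial map -> scale-updated map

    def __call__(self, image):
        initial_feature_map = self.backbone(image)  # feature extraction
        return self.neck(initial_feature_map)       # scale update / refinement
```

The point of the composition is that every training set passes through the same two stages before reaching the detection head, so the backbone and neck weights are shared across all tasks.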
S140, inputting the feature map of the image sample to the target detection head to obtain the feature map of the target to be detected.
When the target to be detected is a vehicle, one image sample may contain several vehicles; the position of each vehicle in the image sample must be identified, and the prediction result of at least one task head must then be output for each vehicle. The feature map of the image sample is input to the target detection head, which determines the position of each target to be detected and produces the feature map of each target.
The target detection head may generate a plurality of candidate boxes of different sizes and aspect ratios that together cover all positions and scales of the image-sample feature map. Each candidate box is taken in turn as a reference box, and target boxes whose intersection-over-union (IoU) with the reference box exceeds a threshold are selected; the position of the target to be detected is then determined from the offset of the target box. Alternatively, the target detection head may locate a number of key points and form a detection box from them, or detect the central region and boundary information of the target between key points to determine its position; other approaches are also possible and are not described here. In the embodiment of the present application, the target detection head outputs the position of the target to be detected, maps that position onto the image-sample feature map, and outputs the target detection box of the target, thereby obtaining the feature map of the target to be detected.
It should be understood that the target detection head itself must also undergo data iteration on a training set in order to identify the position of the target to be detected in an image sample. It can be iterated with a training set whose label results are target positions, which is not elaborated here.
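The candidate-box matching described above relies on the standard IoU test. The following sketch shows that test on axis-aligned boxes; the box tuple layout and the helper names are chosen for illustration:

```python
# IoU (intersection-over-union) between two boxes given as (x1, y1, x2, y2).
def iou(box_a, box_b):
    ix1, iy1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ix2, iy2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    return inter / (area_a + area_b - inter) if inter else 0.0

# Keep the target boxes whose IoU with the reference (anchor) box
# exceeds the threshold, as in the candidate-box scheme described above.
def match_boxes(reference, candidates, threshold=0.5):
    return [c for c in candidates if iou(reference, c) > threshold]
```
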
S150, inputting the feature map of the target to be detected into a task head with the same name as the training set aiming at each training set, and iteratively training the multi-task model.
In the embodiment of the present application, for each training set, the feature map of the target to be detected in that training set is input to the task head with the same name as the training set. Because the image samples in a training set are not input to all task heads but are routed to the corresponding task head by matching names, every feature map that reaches a task head carries the label result needed to iterate that head. The head can therefore undergo data iteration according to the deviation between the predicted result and the true result, and the trained multi-task model can produce accurate target detection results and classification prediction results.
In an embodiment of the present application, the steps of constructing the multi-task model include:
acquiring the name of each training set;
for each training set, constructing a task head with the same name as the training set;
constructing the multi-task model based on the shared network, the target detection head, and all task heads.
When the multi-task model is used to detect and classify targets, the number of its task heads changes with the requirements. For example, to classify the model and the color of a vehicle, a multi-task model with two task heads is constructed: one head outputs the prediction of the vehicle model and the other the prediction of the vehicle color. To classify the gender, posture, and occlusion state of a pedestrian, a multi-task model with three task heads is constructed: the first head outputs the prediction of the pedestrian's gender, the second the prediction of the pedestrian's posture, and the third the prediction of the pedestrian's occlusion state.
In the prior art, a complete multi-task model generally has to be constructed first, after which image samples including the label results of all task heads are acquired. In the embodiment of the present application, because an image sample only needs to include the label result of at least one task head, the multi-task model can be built after the image samples are acquired. Specifically, after the training set of each task head is obtained, the name of each training set is acquired. Because the feature maps of the image samples in each training set are input to exactly one task head, a task head with the same name is constructed for each training set, and the multi-task model is then built from the shared network, the target detection head, and all task heads. Since the name of each training set matches the name of its task head, the number of task heads is determined by the number of training sets, and the model adapts to the available training sets. When a new prediction category is needed, acquiring a training set for that category is enough to add a task head adaptively, which improves the efficiency of constructing the multi-task model.
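The adaptive construction step can be sketched as deriving the heads directly from the training-set names. Everything below, including the factory callable and the dict-based model layout, is an invented placeholder used only to show the name-driven assembly:

```python
# Hypothetical sketch of building the multi-task model from the
# training sets: one task head per training-set name, then assembly
# with the shared network and the target detection head.
def build_multitask_model(shared_network, detection_head, training_sets,
                          make_task_head):
    # one head per training set, under the same name, so adding a new
    # prediction category only requires adding a new training set
    task_heads = {name: make_task_head(name) for name in training_sets}
    return {"shared": shared_network,
            "detector": detection_head,
            "task_heads": task_heads}
```
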
In the embodiment of the application, for each training set, a feature map of a target to be detected is input to a task head with the same name as that of the training set, and a multi-task model is trained iteratively, which comprises the following steps:
inputting a feature map of a target to be detected into task heads with the same name as the training set aiming at each training set to obtain the loss of each task head;
based on the loss of each task head, sequentially carrying out data iteration on each task head;
and under the condition that all task heads finish data iteration, executing the step of inputting a feature map of a target to be detected into the task heads with the same names as the training sets aiming at each training set to obtain the loss of each task head until the loss of each task head is smaller than a preset loss threshold value.
For each training set, the feature map of the target to be detected is input to the task head with the same name as the training set. According to the label result corresponding to the task head included in each image sample in the training set, the degree of deviation between the prediction result output by the task head and the real result corresponding to the label result is determined, thereby obtaining the loss of each task head. Based on the loss of each task head, data iteration is performed on each task head in turn to minimize the loss of each task head.
The task heads of the multi-task model typically require multiple iterations so that the loss of each task head becomes less than the preset loss threshold. When all task heads have completed a data iteration, the step of inputting the feature map of the target to be detected to the task head with the same name as the training set, for each training set, to obtain the loss of each task head is executed again, until the loss of each task head is smaller than the preset loss threshold.
The task head can be iterated a large number of times so that its loss becomes smaller than the preset loss threshold. Alternatively, the loss of the task head can be checked after the task head performs a certain number of data iterations, and if the loss is still greater than or equal to the preset loss threshold, the task head undergoes a further round of data iterations. In the embodiment of the present application, after the current task head completes one data iteration, the next data iteration is not executed immediately; instead, the next task head performs its data iteration. After all task heads have completed one iteration, the step of inputting the feature map of the target to be detected to the task head with the same name as the training set, for each training set, to obtain the loss of each task head is executed, so that each task head undergoes the next round of iterative training. By iterating the plurality of task heads serially, each task head can be guaranteed to output an accurate prediction result.
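The serial (round-robin) iteration described above can be sketched as follows. The function and parameter names are assumptions for illustration, and the toy "training step" merely halves a synthetic loss value; the structure shown — one iteration per head per round, repeated until every head's loss is below the threshold — is the scheme the paragraph describes.

```python
# Illustrative sketch (assumed names): serial iteration over task heads.
# Each head performs one data iteration per round; rounds repeat until
# every head's loss falls below the preset loss threshold.

def train_serially(heads, one_iteration, loss_of, threshold, max_rounds=1000):
    """heads: list of task-head names; one_iteration(head) updates a head;
    loss_of(head) returns that head's current loss."""
    for _ in range(max_rounds):
        for head in heads:          # one data iteration per head, in sequence
            one_iteration(head)
        if all(loss_of(head) < threshold for head in heads):
            return True             # all heads converged
    return False

# Toy demonstration: each "iteration" halves a synthetic loss value.
losses = {"gender": 1.0, "posture": 2.0, "occlusion": 4.0}
converged = train_serially(
    list(losses),
    one_iteration=lambda h: losses.__setitem__(h, losses[h] * 0.5),
    loss_of=lambda h: losses[h],
    threshold=0.01,
)
```

Note that no head races ahead: a head only gets its next iteration after every other head has had one, which is the guarantee the serial scheme provides.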
In the embodiment of the application, the loss of each task head comprises regression loss of the target detection head, classification loss of the target detection head and classification loss of the task head.
The loss function is used to determine the degree of deviation between the prediction result output by the multi-task model and the real result, and the loss function usually needs to be minimized to iterate the multi-task model. Regression loss is used for continuous variables, and classification loss is used for discrete variables. Specifically, the regression loss of the target detection head is used to adjust the accuracy of the target detection frame output by the target detection head. The classification loss of the target detection head is used to determine whether the target to be detected in the target detection frame is a target for which the task head is expected to output a prediction result. The classification loss of the task head is used to determine the prediction result corresponding to the target to be detected, and thus whether the correct prediction result of the target to be detected is output.
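As a hedged sketch, the per-task-head loss described here can be expressed as a weighted sum of the three components. The equal default weights are an assumption — the disclosure does not specify how the terms are combined — and the function name is illustrative.

```python
# Illustrative sketch: the loss attributed to each task head combines the
# detection head's regression loss, the detection head's classification
# loss, and the task head's own classification loss. The weighting scheme
# (equal weights by default) is an assumption, not from the disclosure.

def task_head_loss(det_regression, det_classification, head_classification,
                   weights=(1.0, 1.0, 1.0)):
    w_reg, w_det_cls, w_head_cls = weights
    return (w_reg * det_regression
            + w_det_cls * det_classification
            + w_head_cls * head_classification)

total = task_head_loss(det_regression=0.5,
                       det_classification=0.2,
                       head_classification=0.3)
```

In practice the regression term would be something like a smooth-L1 loss over box coordinates and the classification terms cross-entropy losses, but those choices are implementation details the patent leaves open.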
In the embodiment of the application, each task head comprises a feature aggregation layer and a task network, and aiming at each training set, a feature map of a target to be detected is input into the task head with the same name as the training set, and a multi-task model is trained iteratively, wherein the method comprises the following steps:
inputting the feature images of the targets to be detected into a feature aggregation layer of a task head with the same name as the training set aiming at each training set to obtain feature images of the targets to be detected with preset scales;
Inputting a feature map of a target to be detected with a preset scale into a task network, and iteratively training a multi-task model.
Referring to fig. 3, fig. 3 is a schematic structural diagram of a task head according to an embodiment of the present application.
For each training set, the feature maps of the targets to be detected are input to the feature aggregation layer 241 of the task head 240 with the same name as the training set, and the feature aggregation layer 241 normalizes the feature maps of targets to be detected at different scales to a uniform size, so as to obtain feature maps of the targets to be detected at a preset scale. The preset scale is set according to actual requirements and is not limited herein. The feature map of the target to be detected at the preset scale is input to the task network 242, and the task network 242 maps the features to classifications by using a convolutional network or a fully connected network to obtain a prediction result. The multi-task model is trained iteratively according to the degree of deviation between the prediction result and the real result. It should be understood that the task head 240 may further include other network structures, such as a feature extraction network used to further extract features from the feature map of the target to be detected; these other network structures are set according to actual requirements and are not limited herein.
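A minimal pure-Python sketch of the two stages of a task head follows. A nearest-neighbour resize stands in for the feature aggregation layer and a linear scorer stands in for the task network; a real implementation would use RoI pooling and convolutional or fully connected layers, and all names here are assumptions.

```python
# Illustrative sketch (assumed names): feature aggregation normalizes
# per-target feature maps of varying size to a preset scale; a trivial
# "task network" then maps the aggregated features to a class.

def aggregate(feature_map, preset=(2, 2)):
    """Nearest-neighbour resize of a 2-D feature map to the preset scale."""
    h, w = len(feature_map), len(feature_map[0])
    ph, pw = preset
    return [[feature_map[i * h // ph][j * w // pw] for j in range(pw)]
            for i in range(ph)]

def task_network(agg, class_weights):
    """Flatten, score each class by a dot product, return the best class."""
    flat = [v for row in agg for v in row]
    scores = {c: sum(f * w for f, w in zip(flat, ws))
              for c, ws in class_weights.items()}
    return max(scores, key=scores.get)

fmap = [[1, 1, 0, 0],
        [1, 1, 0, 0],
        [0, 0, 0, 0],
        [0, 0, 0, 0]]
agg = aggregate(fmap)                     # resized to the 2x2 preset scale
weights = {"red": [1, 0, 0, 0], "blue": [0, 1, 1, 1]}
prediction = task_network(agg, weights)
```

The point of the preset scale is that the task network sees a fixed-size input regardless of how large the detected target was in the original image.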
The present application provides a training method for a multi-task model, where the multi-task model includes a shared network, a target detection head, and at least one task head. The training method includes: acquiring a plurality of image samples; merging the image samples whose label results belong to the same task head to obtain a training set for each task head; inputting each training set to the shared network to obtain feature maps of the image samples; inputting the feature maps of the image samples to the target detection head to obtain feature maps of the targets to be detected; and, for each training set, inputting the feature maps of the targets to be detected to the task head with the same name as the training set, and iteratively training the multi-task model. For each training set, the feature maps of the targets to be detected in the training set are input to the task head with the same name as the training set. Because the image samples in a training set are not input to all task heads, the image samples in the training set are input to the corresponding task head according to the matching names of the training set and the task head, so the feature maps of the image samples input to a task head always carry the label results needed to iterate that task head. According to the degree of deviation between the prediction result and the real result, data iteration can be performed on the task head, and the trained multi-task model can then be used to obtain accurate target detection results and classification prediction results.
In the prior art, when the target detection results of a plurality of targets to be detected need to be output, the plurality of task heads of the model independently output the target detection results and the classification prediction results, and the classification prediction result of each target to be detected can only be obtained by matching the classification prediction results with the target detection results. In the present application, the target detection head determines the targets to be detected in the image, and the feature maps of the targets to be detected are then input to the task heads, so that each task head can directly output the classification prediction result of each target to be detected, without matching the classification prediction results with the target detection results.
Example 2
In the prior art, a plurality of task heads of a model independently output the target detection result and the classification prediction result, and the classification prediction result of each target to be detected can only be obtained by matching the classification prediction results with the target detection results. Take a prior-art model that includes two task heads as an example: when the image includes a plurality of targets to be detected, the first task head outputs the target detection results of the plurality of targets to be detected, so as to determine the position of each target to be detected in the image, and the second task head outputs the classification prediction results of the plurality of targets to be detected. Because the classification prediction results and the target detection results are independent of each other, each classification prediction result needs to be matched with a target detection result to obtain the classification prediction result of each target to be detected.
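The prior-art matching step described above can be sketched with an IoU-based pairing. The function names and the greedy best-IoU strategy are assumptions for illustration — the patent does not specify how prior-art systems match — but they show the extra work the two-head design incurs.

```python
# Illustrative sketch: pairing independent classification results with
# detection boxes by best intersection-over-union (IoU). The patent's
# design avoids this step entirely.

def iou(a, b):
    """Intersection-over-union of two (x1, y1, x2, y2) boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area = lambda r: (r[2] - r[0]) * (r[3] - r[1])
    union = area(a) + area(b) - inter
    return inter / union if union else 0.0

def match(detections, classifications):
    """Pair each detection box with the classification whose box overlaps
    it most. classifications is a list of (box, label) tuples."""
    return {det: max(classifications, key=lambda c: iou(det, c[0]))[1]
            for det in detections}

dets = [(0, 0, 10, 10), (20, 20, 30, 30)]
clss = [((1, 1, 9, 9), "car"), ((21, 21, 29, 29), "truck")]
paired = match(dets, clss)
```

Each extra target multiplies the pairwise IoU computations, which is exactly the overhead the single-pass design of this application removes.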
Referring to fig. 4, fig. 4 is a flowchart illustrating a task prediction method according to an embodiment of the present application.
The task prediction method in fig. 4 includes:
s310, acquiring a target image.
Target images that require target detection and target classification are acquired to input the target images into a multitasking model. The target image may be acquired by an image acquisition device, or may be directly acquired from a different data source, which is not described herein. The image form of the target image is also set according to the actual requirement, and may be a photograph or the like, which is not limited herein.
S320, inputting the target image into the multitasking model to obtain a target detection result and a prediction result of at least one task corresponding to the target detection result.
The multitasking model is obtained according to the training method of the multitasking model as in example 1. And inputting the target image into a sharing network of the multitasking model to obtain a feature map of the target image. The feature map of the target image is input to the target detection head to obtain a target detection result of the target image, wherein the target detection result may be a position of a target to be detected, and details are not described herein.
The target detection result is mapped onto the feature map of the target image to obtain the feature map of the target to be detected. The feature map of the target to be detected is input to each task head to obtain the prediction result of at least one task corresponding to the target detection result. The target detection head determines the target to be detected in the image, and the feature map of the target to be detected is then input to the task heads, so that each task head outputs the classification prediction result of the corresponding target to be detected. Compared with prior-art models in which a plurality of task heads output the target detection results and classification prediction results independently, there is no need to determine the target detection result corresponding to each classification prediction result.
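The inference flow of this embodiment can be sketched end to end with stand-in callables. All names are illustrative assumptions and the toy "networks" are lambdas; the structure shows why no matching step is needed — every task head's prediction is produced per detection box from the start.

```python
# Illustrative sketch (assumed names): target image -> shared network ->
# detection head -> per-box feature region -> every task head predicts
# directly for that box, so predictions arrive already paired with boxes.

def predict(image, shared_net, det_head, task_heads):
    feat = shared_net(image)                 # feature map of the whole image
    boxes = det_head(feat)                   # target detection results
    results = []
    for box in boxes:
        region = (feat, box)                 # feature map of this target
        preds = {name: head(region) for name, head in task_heads.items()}
        results.append({"box": box, "predictions": preds})
    return results

# Toy stand-ins for the real networks:
out = predict(
    "image",
    shared_net=lambda img: "feat",
    det_head=lambda feat: [(0, 0, 10, 10)],
    task_heads={"color": lambda r: "red", "model": lambda r: "sedan"},
)
```

Each result record carries its box and all task predictions together, which is the property the preceding paragraph contrasts with the prior art.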
Example 3
Referring to fig. 5, fig. 5 is a schematic structural diagram of a training device for a multi-task model according to an embodiment of the application. The multitasking model includes a target detection head and at least one task head, and the training device 400 of the multitasking model in fig. 5 includes:
an image sample acquisition module 410, configured to acquire a plurality of image samples, where the image samples include a label result of at least one task head;
the training set obtaining module 420 is configured to combine the image samples with the same task head as the label result to obtain a training set of each task head, where a name of each training set is the same as a name of a corresponding task head;
a sample feature map obtaining module 430, configured to input each training set to the shared network to obtain a feature map of the image sample;
the target feature map obtaining module 440 is configured to input a feature map of the image sample to the target detection head to obtain a feature map of the target to be detected;
the model iterative training module 450 is configured to input, for each training set, a feature map of a target to be detected to a task head with the same name as that of the training set, and iteratively train the multi-task model.
In an embodiment of the present application, the steps of constructing the multitasking model include:
Acquiring the name of each training set;
constructing a task head with the same name as the training set aiming at each training set;
and constructing a multi-task model based on the shared network, the target detection heads and all task heads.
In an embodiment of the present application, the model iterative training module 450 includes:
the task head loss obtaining submodule is used for inputting a feature map of a target to be detected into a task head with the same name as the training set aiming at each training set to obtain the loss of each task head;
the task head iteration sub-module is used for sequentially carrying out data iteration on each task head based on the loss of each task head;
and the iteration completion sub-module is used for executing the step of inputting the feature map of the target to be detected to the task heads with the same name as the training set aiming at each training set under the condition that all the task heads complete data iteration to obtain the loss of each task head until the loss of each task head is smaller than a preset loss threshold value.
In the embodiment of the application, the loss of each task head comprises regression loss of the target detection head, classification loss of the target detection head and classification loss of the task head.
In an embodiment of the present application, the shared network further includes a backbone network and an intermediate network, and the sample feature map obtaining module 430 includes:
The initial feature map obtaining sub-module is used for inputting the training set into the backbone network to obtain an initial image sample feature map;
and the feature map scale updating sub-module is used for inputting the initial image sample feature map to the intermediate network to obtain the image sample feature map with updated scale.
In an embodiment of the present application, each task header includes a feature aggregation layer and a task network, and the model iterative training module 450 includes:
the preset-scale feature map obtaining sub-module is used for inputting, for each training set, the feature map of the target to be detected into the feature aggregation layer of the task head with the same name as the training set, to obtain the feature map of the target to be detected at the preset scale;
and the iterative training sub-module is used for inputting the characteristic diagram of the target to be detected with the preset scale into the task network and iteratively training the multi-task model.
The training device 400 for the multi-task model is used to perform the corresponding steps in the training method for the multi-task model, and the implementation of each function is not described herein. Furthermore, the alternative example in embodiment 1 is also applicable to the training apparatus 400 of the multitasking model of embodiment 3.
Example 4
Referring to fig. 6, fig. 6 is a schematic structural diagram of a task prediction apparatus according to an embodiment of the present application. The task prediction apparatus 500 in fig. 6 includes:
An image acquisition module 510, configured to acquire a target image;
a prediction result obtaining module 520, configured to input the target image into a multi-task model, and obtain a target detection result and a prediction result of at least one task corresponding to the target detection result, where the multi-task model is obtained by the training method of the multi-task model according to any one of claims 1 to 6.
The task prediction apparatus 500 is configured to perform the corresponding steps in the task prediction method, and specific implementation of each function is not described herein.
The embodiment of the application also provides a computer device, which comprises a memory and a processor, wherein the memory stores a computer program, and the computer program realizes the training method of the multi-task model as in the embodiment 1 or realizes the task prediction method as in the embodiment 2 when the computer program is executed by the processor.
The image sample acquiring module 410, the training set acquiring module 420, the sample feature map acquiring module 430, the target feature map acquiring module 440, the model iterative training module 450, the image acquiring module 510, the prediction result acquiring module 520 and the like in the present embodiment are all stored as program units in the memory, and the processor executes the program units stored in the memory to realize corresponding functions.
The processor includes a kernel, and the kernel fetches the corresponding program unit from the memory. One or more kernels may be provided, and the kernel parameters are adjusted so that the multi-task model is used to detect and classify targets, addressing the problem that accurate results cannot otherwise be obtained.
The memory may include volatile memory in a computer-readable medium, such as random access memory (RAM), and/or nonvolatile memory, such as read-only memory (ROM) or flash memory (flash RAM); the memory includes at least one memory chip.
The embodiment of the present application further provides a machine-readable storage medium, on which a computer program is stored, which when executed by a processor, implements a training method of the multitasking model as in embodiment 1, or implements a task prediction method as in embodiment 2.
It will be appreciated by those skilled in the art that embodiments of the present application may be provided as a method, system, or computer program product. Accordingly, the present application may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present application may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.
The present application is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems) and computer program products according to embodiments of the application. It will be understood that each flow and/or block of the flowchart illustrations and/or block diagrams, and combinations of flows and/or blocks in the flowchart illustrations and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
In one typical configuration, a computing device includes one or more processors (CPUs), input/output interfaces, network interfaces, and memory.
The memory may include volatile memory in a computer-readable medium, such as random access memory (RAM), and/or nonvolatile memory, such as read-only memory (ROM) or flash RAM. Memory is an example of a computer-readable medium.
Machine-readable storage media, including permanent and non-permanent, removable and non-removable media, may implement information storage by any method or technology. The information may be computer-readable instructions, data structures, modules of a program, or other data. Examples of computer storage media include, but are not limited to, phase-change memory (PRAM), static random access memory (SRAM), dynamic random access memory (DRAM), other types of random access memory (RAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), flash memory or other memory technology, compact disc read-only memory (CD-ROM), digital versatile discs (DVD) or other optical storage, magnetic cassettes, magnetic tape, magnetic disk storage or other magnetic storage devices, or any other non-transmission medium, which can be used to store information accessible by a computing device. As defined herein, computer-readable media do not include transitory computer-readable media (transmission media), such as modulated data signals and carrier waves.
It should also be noted that the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising one … …" does not exclude the presence of other like elements in a process, method, article or apparatus that comprises an element.
The foregoing is merely exemplary of the present application and is not intended to limit the present application. Various modifications and variations of the present application will be apparent to those skilled in the art. Any modification, equivalent replacement, improvement, etc. which come within the spirit and principles of the application are to be included in the scope of the claims of the present application.
Claims (11)
1. A method for training a multi-tasking model, wherein the multi-tasking model comprises a shared network, a target detection head and at least one tasking head, the method for training the multi-tasking model comprising:
Acquiring a plurality of image samples, wherein the image samples comprise label results of at least one task head;
combining the image samples with the label results of the same task head to obtain a training set of each task head, wherein the name of each training set is the same as the name of the corresponding task head;
inputting each training set to the shared network to obtain a feature map of the image sample;
inputting the feature map of the image sample to the target detection head to obtain a feature map of a target to be detected;
and inputting the feature map of the target to be detected into the task heads with the same names as the training sets aiming at each training set, and iteratively training the multi-task model.
2. The method for training a multi-task model according to claim 1, wherein the step of constructing the multi-task model comprises:
acquiring the name of each training set;
constructing the task heads with the same name as the training sets aiming at each training set;
and constructing the multi-task model based on the shared network, the target detection heads and all the task heads.
3. The method according to claim 1, wherein the step of inputting the feature map of the object to be detected to the task head having the same name as the training set for each training set, iteratively training the multi-task model, comprises:
inputting the feature images of the targets to be detected into the task heads with the same names as the training sets aiming at each training set to obtain the loss of each task head;
based on the loss of each task head, sequentially carrying out data iteration on each task head;
and under the condition that all task heads complete data iteration, executing the step of inputting the feature map of the target to be detected to the task heads with the same names as the training sets aiming at each training set to obtain the loss of each task head until the loss of each task head is smaller than a preset loss threshold value.
4. A method of training a multitasking model according to claim 3 in which the loss of each task head comprises a regression loss of the target detection head, a classification loss of the target detection head and a classification loss of the task head.
5. The method for training a multi-task model according to claim 1, wherein the shared network further comprises a backbone network and an intermediate network, and the inputting each training set into the shared network, to obtain the feature map of the image sample, includes:
inputting the training set into the backbone network to obtain an initial image sample feature map;
and inputting the initial image sample feature map to the intermediate network to obtain the image sample feature map with updated scale.
6. The method according to claim 1, wherein each task head includes a feature aggregation layer and a task network, the inputting the feature map of the object to be detected to the task head having the same name as the training set for each training set, iteratively training the multi-task model, comprising:
inputting the feature images of the targets to be detected into a feature aggregation layer of the task head, which is the same as the name of the training set, aiming at each training set to obtain feature images of the targets to be detected with preset scales;
inputting the feature map of the target to be detected with the preset scale into the task network, and iteratively training the multi-task model.
7. A task prediction method, characterized in that the task prediction method comprises:
acquiring a target image;
inputting a target image into a multi-task model to obtain a target detection result and a prediction result of at least one task corresponding to the target detection result, wherein the multi-task model is obtained according to the training method of the multi-task model as claimed in any one of claims 1 to 6.
8. A training apparatus for a multitasking model, the multitasking model comprising a target detection head and at least one task head, the training apparatus comprising:
an image sample acquisition module for acquiring a plurality of image samples, wherein the image samples comprise label results of at least one task head;
the training set obtaining module is used for merging the image samples with the same label result as the task heads to obtain a training set of each task head, wherein the name of each training set is the same as the name of the corresponding task head;
the sample feature map obtaining module is used for inputting each training set to the sharing network to obtain a feature map of the image sample;
The target feature map obtaining module is used for inputting the feature map of the image sample to the target detection head to obtain a feature map of a target to be detected;
and the model iteration training module is used for inputting the feature map of the target to be detected into the task heads with the same names as the training sets aiming at each training set, and iteratively training the multi-task model.
9. A task prediction apparatus, characterized in that the task prediction apparatus comprises:
the image acquisition module is used for acquiring a target image;
a prediction result obtaining module, configured to input a target image into a multitasking model, and obtain a target detection result and a prediction result of at least one task corresponding to the target detection result, where the multitasking model is obtained according to the training method of the multitasking model according to any one of claims 1 to 6.
10. A computer device, characterized in that it comprises a memory and a processor, the memory storing a computer program which, when executed by the processor, implements the training method of the multi-task model of any one of claims 1 to 6 or implements the task prediction method of claim 7.
11. A machine-readable storage medium having stored thereon a computer program which, when executed by a processor, implements the training method of the multi-task model of any one of claims 1 to 6 or implements the task prediction method of claim 7.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202310961694.4A CN116977791A (en) | 2023-07-28 | 2023-07-28 | Multi-task model training method, task prediction method, device, computer equipment and medium |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202310961694.4A CN116977791A (en) | 2023-07-28 | 2023-07-28 | Multi-task model training method, task prediction method, device, computer equipment and medium |
Publications (1)
Publication Number | Publication Date |
---|---|
CN116977791A true CN116977791A (en) | 2023-10-31 |
Family
ID=88474580
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202310961694.4A Pending CN116977791A (en) | 2023-07-28 | 2023-07-28 | Multi-task model training method, task prediction method, device, computer equipment and medium |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN116977791A (en) |
Similar Documents
Publication | Title |
---|---|
KR102140805B1 (en) | Neural network learning method and apparatus for object detection of satellite images |
CN112819110B (en) | Incremental small sample target detection method and system based on weight generation |
CN111274981B (en) | Target detection network construction method and device and target detection method |
CN113128478B (en) | Model training method, pedestrian analysis method, device, equipment and storage medium |
CN112906816B (en) | Target detection method and device based on optical differential and two-channel neural network |
CN112598091A (en) | Training model and small sample classification method and device |
CN110689134A (en) | Method, apparatus, device and storage medium for performing machine learning process |
CN110245683A (en) | Residual relation network construction method for few-sample object recognition and application thereof |
CN115115825B (en) | Method, device, computer equipment and storage medium for detecting object in image |
CN109190662A (en) | Three-dimensional vehicle detection method, system, terminal and storage medium based on key point regression |
CN112541394A (en) | Dark circle and rhinitis identification method, system and computer medium |
CN115995042A (en) | Video SAR moving target detection method and device |
CN107316296B (en) | Remote sensing image change detection method and device based on logarithmic transformation |
CN116805393A (en) | Hyperspectral image classification method and system based on 3DUnet spectral-spatial information fusion |
CN112101156A (en) | Target identification method and device and electronic equipment |
CN115661573A (en) | Method and device for detecting infrared dim targets, computing equipment and storage medium |
CN109903246B (en) | Method and device for detecting image change |
CN110866931A (en) | Image segmentation model training method and classification-based enhanced image segmentation method |
CN114494823A (en) | Commodity identification, detection and counting method and system in retail scenes |
CN113963236A (en) | Target detection method and device |
WO2020152487A1 (en) | Methods and apparatus to perform image analyses in a computing environment |
CN112861652A (en) | Method and system for tracking and segmenting video targets based on convolutional neural network |
CN115147348B (en) | Tire defect detection method and system based on improved YOLOv3 |
CN114677578B (en) | Method and device for determining training sample data |
CN113591543B (en) | Traffic sign recognition method, device, electronic equipment and computer storage medium |
Legal Events
Code | Title |
---|---|
PB01 | Publication |
SE01 | Entry into force of request for substantive examination |