CN116740712A - Target labeling method and device for infrared image, electronic equipment and storage medium - Google Patents

Target labeling method and device for infrared image, electronic equipment and storage medium

Info

Publication number
CN116740712A
CN116740712A
Authority
CN
China
Prior art keywords
target
infrared
detection model
training
initial
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202310601213.9A
Other languages
Chinese (zh)
Inventor
徐建国
彭海娟
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Jingxiang Technology Co Ltd
Original Assignee
Beijing Jingxiang Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Jingxiang Technology Co Ltd filed Critical Beijing Jingxiang Technology Co Ltd
Priority to CN202310601213.9A priority Critical patent/CN116740712A/en
Publication of CN116740712A publication Critical patent/CN116740712A/en
Pending legal-status Critical Current


Classifications

    • Y: GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02: TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02T: CLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO TRANSPORTATION
    • Y02T 10/00: Road transport of goods or passengers
    • Y02T 10/10: Internal combustion engine [ICE] based vehicles
    • Y02T 10/40: Engine management systems

Landscapes

  • Image Analysis (AREA)

Abstract

The application discloses a target labeling method and device for infrared images, an electronic device, and a storage medium. The method includes: inputting an infrared image into an initial target detection model to obtain a first infrared target detection result, which includes the confidence of the infrared target and the confidence of the infrared target's category; and, using the first detection result as an initial weight, continuing to train the initial target detection model to obtain a final target detection model that outputs a second infrared target detection result, where the final target detection model includes the labeling result of the infrared target. The application realizes both the annotation of infrared images and the training of an infrared image model, and can be used for infrared detection of targets such as pedestrians and vehicles.

Description

Target labeling method and device for infrared image, electronic equipment and storage medium
Technical Field
The present application relates to the field of autonomous driving, and in particular to a target labeling method and device for infrared images, an electronic device, and a storage medium.
Background
Data is the basis of any effective machine learning algorithm, and data labeling is generally performed by combining large-model pre-labeling with manual review. Traditional machine learning methods need relatively little labeled data; tens of thousands of samples can optimize a model's performance. Deep learning models, however, are far deeper, often tens or even hundreds of layers, so hundreds of thousands of samples are needed to optimize their performance.
To improve a model's generalization, images of different scenes usually must be collected, labeled, and screened, after which the model is retrained and its performance evaluated. This collection and labeling requires a great deal of time and labor.
In the related art, image sensors and models in autonomous driving scenarios are almost all based on RGB images, and labeling results are likewise mostly in RGB image format. Infrared images, by contrast, are rarely labeled and public datasets are few; moreover, infrared cameras from different manufacturers image differently, which makes training an infrared target recognition model relatively difficult and makes labeling the data expensive in manpower and material resources.
Disclosure of Invention
The embodiment of the application provides a target labeling method and device for infrared images, electronic equipment and storage media, and aims to realize infrared image labeling and infrared image model training.
The embodiment of the application adopts the following technical scheme:
in a first aspect, an embodiment of the present application provides a method for labeling an infrared image, where the method includes:
inputting an infrared image into an initial target detection model to obtain a first infrared target detection result, wherein the first infrared target detection result comprises the confidence level of an infrared target and the confidence level of the category of the infrared target;
and, using the first infrared target detection result as an initial weight, continuing to train the initial target detection model to obtain a final target detection model that outputs a second infrared target detection result, where the final target detection model includes the labeling result of the infrared target.
In some embodiments, the method further comprises: screening the confidence of the infrared target and the confidence of the infrared target's category in the first infrared target detection result against a preset threshold to obtain the second infrared target detection result.
In some embodiments, the training in the initial target detection model to obtain a final target detection model includes:
if the confidence coefficient of the labeling result of the infrared target is larger than a first threshold value, taking the infrared target labeled in the labeling result of the infrared target as positive sample training data; and
selecting and marking a non-target to be detected or adding a false mark target sample in the second detection result of the infrared target as negative sample training data;
and training according to the positive sample training data and the negative sample training data to obtain a final target detection model.
In some embodiments, the training in the initial target detection model to obtain a final target detection model includes:
performing multi-scale target detection on the infrared image using a sliding window, and computing the intersection over union (IoU) between the detected multi-scale targets and the sample training data;
and reserving sample training data meeting the IOU calculation result, dividing the sample training data into a training set and a testing set, taking the first infrared target detection result as an initial weight, and carrying out multi-round training on the initial target detection model to obtain a final target detection model.
In some embodiments, the continuing to train in the initial target detection model, using the first infrared target detection result as an initial weight, to obtain a final target detection model further includes:
and removing the target which does not accord with the form of the target to be detected by tracking the target and matching the target template in a final target detection model, wherein the target template at least comprises one of the following steps: target rate, target aspect ratio.
In some embodiments, the continuing to train in the initial target detection model, using the first infrared target detection result as an initial weight, to obtain a final target detection model further includes:
Taking the first detection result of the infrared target as an initial weight, reducing the confidence coefficient of the infrared target and the confidence coefficient of the category of the infrared target after each round of training is finished, and continuing training in the initial target detection model to obtain a final target detection model;
and/or,
taking infrared targets detected simultaneously in at least two layers in the target detection model as sample training data in each training round, and continuing training in the initial target detection model to obtain a final target detection model;
and/or,
tracking the infrared target during each training round, extracting the infrared target's features at the previous moment and at the current moment to compute their similarity, and continuing training in the initial target detection model to obtain a final target detection model;
and/or,
tracking the position, length and width of the central point of the infrared target by adopting Kalman filtering, and continuing training in the initial target detection model to obtain a final target detection model.
In some embodiments, the object detection model comprises:
a backbone network (Backbone) for feature extraction, comprising a p2 layer, a p3 layer, a p4 layer, and a p5 layer, where the p2 layer is used for feature matching during target tracking, and the p4 and p5 layers are respectively used for detecting targets at different scales;
and a detection head (Head) for determining the confidence of the infrared target and the confidence of the infrared target's category according to the length and width of the infrared target.
In a second aspect, an embodiment of the present application further provides a target labeling device for an infrared image, where the device includes:
the initial training module is used for inputting the infrared image into the initial target detection model to obtain an infrared target first detection result, wherein the infrared target first detection result comprises the confidence coefficient of the infrared target and the confidence coefficient of the category of the infrared target;
the optimization module is used for taking the first detection result of the infrared target as an initial weight, continuing training in the initial target detection model to obtain a final target detection model so as to output a second detection result of the infrared target, and the final target detection model comprises a labeling result of the infrared target.
In a third aspect, an embodiment of the present application further provides an electronic device, including: a processor; and a memory arranged to store computer executable instructions that, when executed, cause the processor to perform the above method.
In a fourth aspect, embodiments of the present application also provide a computer-readable storage medium storing one or more programs, which when executed by an electronic device comprising a plurality of application programs, cause the electronic device to perform the above-described method.
The above at least one technical scheme adopted by the embodiment of the application can achieve the following beneficial effects: firstly inputting an infrared image into an initial target detection model to obtain an infrared target first detection result, then taking the infrared target first detection result as an initial weight, and continuing training in the initial target detection model to obtain a final target detection model so as to output an infrared target second detection result. The final target detection model comprises the labeling result of the infrared target, so that the self-adaptive labeling of the infrared image is completed, and the training of the model is completed.
Drawings
The accompanying drawings, which are included to provide a further understanding of the application and are incorporated in and constitute a part of this specification, illustrate embodiments of the application and together with the description serve to explain the application and do not constitute a limitation on the application. In the drawings:
FIG. 1 is a schematic flow chart of a target labeling method of an infrared image;
FIG. 2 is a schematic diagram of a target labeling device for infrared images;
fig. 3 is a schematic structural diagram of an electronic device according to an embodiment of the present application.
Detailed Description
In order to make the objects, technical solutions and advantages of the present application more apparent, the technical solutions of the present application will be clearly and completely described below with reference to specific embodiments of the present application and corresponding drawings. It will be apparent that the described embodiments are only some, but not all, embodiments of the application. All other embodiments, which can be made by those skilled in the art based on the embodiments of the application without making any inventive effort, are intended to be within the scope of the application.
The inventors found that target labeling in the related art is essentially based on combining model pre-labeling with manual adjustment. A high-performance large model achieves good results on target detection, for example detecting pedestrians and automobiles with high precision; manual adjustment is then used to correct oversized or undersized target boxes and false or missed detections.
Existing model pre-labeling covers general scenes and common targets and lacks labels for targets in specific scenes. For example, a model trained on pedestrians and automobiles in RGB images degrades severely on pedestrians in infrared images, requiring more manual post-processing. Meanwhile, the resulting infrared labeling is not accurate enough to be used directly; the target boxes must be corrected and resized manually, and this process must be repeated, consuming considerable manpower and time.
In order to overcome the defects, the target labeling method for the infrared image in the embodiment of the application collects the video of the actual road, uses the model trained by the disclosed data as the initial weight of the subsequent model, and trains the model for identifying the pedestrian target in the labeling process.
The following describes in detail the technical solutions provided by the embodiments of the present application with reference to the accompanying drawings.
The embodiment of the application provides a target labeling method of an infrared image, as shown in fig. 1, and provides a flow chart of the target labeling method of the infrared image in the embodiment of the application, wherein the method at least comprises the following steps S110 to S120:
step S110, inputting an infrared image into an initial target detection model to obtain an infrared target first detection result, wherein the infrared target first detection result comprises the confidence level of an infrared target and the confidence level of the category of the infrared target.
Post-processing is performed on the collected infrared pedestrian video. Although pedestrians in RGB images and pedestrians in infrared images obtained by a camera differ in color and texture information, their shape, outline, and semantic information are almost the same, so a first infrared target detection result can be obtained after inputting the infrared image into the initial target detection model.
In the first detection result, what matters is whether the pedestrian target in the RGB image and the pedestrian in the infrared image share characteristic similarities, represented by the pedestrian's semantic information, the geometric information of the outline, and so on. This common characteristic information can therefore be fully exploited for infrared target labeling and model training.
It should be noted that the first detection result includes the confidence of the infrared target and the confidence of the infrared target's category. That is, the detection result contains both the objectness confidence and the category confidence, and the labeling result of the infrared target can be further evaluated according to them.
Step S120, taking the first detection result of the infrared target as an initial weight, and continuing training in the initial target detection model to obtain a final target detection model, so as to output a second detection result of the infrared target, where the final target detection model includes a labeling result of the infrared target.
Using the first infrared target detection result obtained in the previous step as an initial weight, training continues in the initial target detection model to obtain the final target detection model, which outputs the second infrared target detection result. Note that the second detection result output is the confidence that the infrared target is a pedestrian or a non-pedestrian.
It should be noted that the initial target detection model may be any target detection model from the related art. The result obtained after inputting the infrared image is used as an initial weight to retrain the model: the model is trained on the detected infrared results (for example, pedestrians), with the previously trained model serving as the initial weight of the current round. After each round of training, the model infers on the labeled data and outputs a confidence for each target.
The final target detection model includes the labeling result of the infrared target. Through multiple iterations of the model, data labeling and model training are both realized, with only a small amount of manual intervention needed in the whole process.
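As a rough sketch, the iterative retraining described above resembles a self-training loop. The helpers below are deliberately trivial stand-ins (the patent does not specify them); only the loop structure, detect, screen by confidence, retrain from the previous round's weights, mirrors the text:

```python
def run_detector(weights, images):
    # Stub detector: each "detection" carries a score derived from the weights.
    return [{"box": img, "score": weights} for img in images]

def train_one_round(weights, images, labels):
    # Stub trainer: nudge the weight up when confident pseudo-labels exist.
    return min(1.0, weights + 0.01 * len(labels))

def self_training_loop(weights, images, rounds=3, threshold=0.9):
    """Illustrative self-training loop: detect, keep confident pseudo-labels,
    retrain initialized from the previous round's weights."""
    for _ in range(rounds):
        detections = run_detector(weights, images)
        labels = [d for d in detections if d["score"] >= threshold]
        weights = train_one_round(weights, images, labels)
    return weights
```

In practice `run_detector` and `train_one_round` would be a real detector's inference and training steps; the point is that each round's output seeds the next round's initialization.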
Illustratively, to reduce mislabeling and missed labels, each output target whose confidence exceeds a preset threshold is taken as a positive sample, while non-pedestrian images (for example, targets from some common false detections) are selected as negative samples. A binary classifier is then trained to classify the labeled targets; labeling results whose classification score exceeds a preset threshold are kept and the rest are discarded.
In this way, video captured by cameras on actual roads is used, and training results from several known public datasets serve as initial weights, so a final target detection result is obtained by continually training the initial model. At the same time, the infrared target recognition model is trained during the adaptive target labeling of the infrared images. Note that the method extends well to multi-target labeling, such as pedestrians and vehicles in infrared images.
Compared with the prior art, using an RGB-image target detection model for infrared image detection/labeling degrades the model's performance severely, requiring substantial manual post-processing. With the present method, an infrared image is input into an initial target detection model to obtain a first infrared target detection result, which is then used as an initial weight for continued training to obtain the final target detection model. Because the initial model is trained over multiple rounds, the final model can output both the confidence of the infrared target and its labeling result, so no subsequent manual processing is required, while model performance (high recall, low FPR) is preserved.
Compared with prior-art approaches that rely on manually correcting and resizing target boxes, the method reduces the workers' correction effort, screens target-box sizes in advance, and improves the labeling quality of infrared targets.
The method exploits the characteristic similarity between RGB image targets and infrared image targets, including but not limited to the target's semantic information and the geometric information of its contour. Multi-scale features of the target can thus be extracted, using both high-level semantic features and low-level geometric features; data labeling and model training are then realized through iteration, with only a small amount of manual intervention in the whole process. Finally, a model trained on public data is used as the initial weight of the subsequent model, and a model for infrared target recognition is trained during the labeling process.
In one embodiment of the application, the method further comprises: and screening the confidence coefficient of the infrared target in the infrared target first detection result and the confidence coefficient of the category of the infrared target by a preset threshold value to obtain the infrared target second detection result.
Since pedestrians in RGB images and in infrared images differ in color and texture but are nearly identical in form, outline, and semantic information, the detection output of the initial target detection model can be used for the first detection of infrared pedestrian targets. The thresholds may be set as follows: objectness confidence Po >= 0.95 and class confidence Pc >= 0.95; targets are then further screened using the confidence of tracked targets (preferably, the confidence of the first three detections may be set to 1 and later ones to 0.5). The model is trained on the pedestrians detected in the infrared images (the previously trained model serves as the initial weight of the current round). After training, the model infers on the labeled data and outputs one confidence per target: P = Po × Pc.
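A minimal sketch of this screening step follows. The dictionary layout and threshold names are illustrative assumptions, not from the patent; only the rule (keep detections with Po and Pc above their thresholds, score them as P = Po × Pc) follows the text:

```python
def screen_detections(detections, po_thresh=0.95, pc_thresh=0.95):
    """Keep detections whose objectness Po and class confidence Pc both
    clear their thresholds, attaching the combined score P = Po * Pc."""
    kept = []
    for det in detections:
        if det["po"] >= po_thresh and det["pc"] >= pc_thresh:
            # Hypothetical dict layout: {"po": ..., "pc": ..., "box": ...}
            kept.append({**det, "p": det["po"] * det["pc"]})
    return kept
```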
Preferably, the infrared target is an infrared pedestrian.
In one embodiment of the present application, the training in the initial target detection model to obtain a final target detection model includes: if the confidence of an infrared target's labeling result is greater than a first threshold, taking that labeled infrared target as positive sample training data; selecting non-target objects, or adding mislabeled target samples, from the second infrared target detection result as negative sample training data; and training on the positive and negative sample training data to obtain the final target detection model.
According to the confidences set above, targets with P > 0.6 are taken as positive samples to reduce mislabeling and missed labels. Meanwhile, non-pedestrian images (including some common false-detection targets, such as roadside shrubs) are selected as negative samples; a binary classifier is trained and used to classify the labeled targets, labeling results with a classification score > 0.6 are kept, and the rest are discarded.
In this way, a classification model (generally separating foreground from background) is trained on high-confidence targets and then used to classify the low-confidence targets, which effectively reduces mislabeling and missed labels.
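The positive/negative split and the classifier gate might be sketched as follows. The 0.6 thresholds come from the text above; the `classifier` callable and dict layout are illustrative assumptions:

```python
def split_samples(detections, p_thresh=0.6):
    """Targets with combined confidence P above the threshold become
    positive samples; the rest await classifier or manual review."""
    positives = [d for d in detections if d["p"] > p_thresh]
    uncertain = [d for d in detections if d["p"] <= p_thresh]
    return positives, uncertain

def keep_by_classifier(labels, classifier, score_thresh=0.6):
    """Retain labeled targets the binary classifier scores above 0.6;
    `classifier` is any callable returning a foreground score in [0, 1]."""
    return [lab for lab in labels if classifier(lab) > score_thresh]
```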
In one embodiment of the present application, the training in the initial target detection model to obtain a final target detection model includes: performing multi-scale target detection on the infrared image using a sliding window, and computing the intersection over union (IoU) between the detected multi-scale targets and the sample training data; and retaining the sample training data that satisfies the IoU criterion, dividing it into a training set and a test set, and, using the first infrared target detection result as an initial weight, training the initial target detection model over multiple rounds to obtain the final target detection model.
To prevent missed detections and refine the detected target boxes, the entire image is detected at multiple scales with a sliding-window method; the IoU between each detected target and the labeled samples is computed, targets with IoU < 0.3 are retained, and the rest are discarded. The retained labeled data is divided into a training set and a test set, and the weights of the previous model are used as the initial weights for a new round of training.
Note that the larger the intersection over union (IoU), the more the two targets overlap, so overlapping (duplicate) targets can be discarded via the IoU.
In this way, computing the intersection over union of infrared target boxes guards against false detections arising from missed detections.
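The IoU computation and the "keep IoU < 0.3, discard the rest" rule above can be sketched directly (box format `(x1, y1, x2, y2)` is an assumption; the patent does not state one):

```python
def iou(box_a, box_b):
    """Intersection over union of two (x1, y1, x2, y2) boxes."""
    ix1, iy1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ix2, iy2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0

def keep_novel_detections(detections, labeled_boxes, max_iou=0.3):
    """Retain sliding-window detections that overlap no labeled sample
    by more than max_iou, mirroring the retention rule above."""
    return [d for d in detections
            if all(iou(d, lb) < max_iou for lb in labeled_boxes)]
```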
In one embodiment of the present application, the training in the initial target detection model, using the first infrared target detection result as an initial weight, to obtain a final target detection model includes: removing, in the final target detection model, targets that do not conform to the form of the target to be detected, by tracking the target and matching it against a target template, where the target template includes at least one of: target velocity, target aspect ratio.
Considering that the target detection model may produce false detections, targets that do not conform to the target's form are removed through target tracking and template matching (the target's aspect ratio). Throughout the labeling process, the target's motion information and aspect ratio are used, which can be implemented with target tracking.
Preferably, the above object is a pedestrian.
It will be appreciated that false-detection scenarios include, but are not limited to, infrared images in which roadside trees and shrubs are highly likely to be misidentified as pedestrians.
Illustratively, each target output by the detection model is tracked; the target's features are extracted at the model's feature detection layer, and the tracked target's center point (from which its velocity is obtained) and length and width (from which its aspect ratio is obtained) are filtered with a Kalman filter, in order to compute the similarity between the target's features at the previous moment and at the current moment.
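The frame-to-frame feature similarity mentioned here is commonly computed as a cosine similarity; a minimal version (the patent does not name the similarity measure, so this is an assumption) is:

```python
import math

def cosine_similarity(feat_prev, feat_curr):
    """Cosine similarity between a target's feature vectors at the
    previous and current moments (plain lists of floats)."""
    dot = sum(a * b for a, b in zip(feat_prev, feat_curr))
    norm_prev = math.sqrt(sum(a * a for a in feat_prev))
    norm_curr = math.sqrt(sum(b * b for b in feat_curr))
    if norm_prev == 0.0 or norm_curr == 0.0:
        return 0.0
    return dot / (norm_prev * norm_curr)
```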
Further, it is preferable that when the next observation of a target arrives, targets whose velocity or aspect ratio changes abnormally are discarded, and that targets are divided into static and dynamic targets by their velocity.
Further, taking pedestrians as the target, a dynamic target that matches the pedestrian template is given a confidence of 1. If only the aspect ratio is consistent with a pedestrian's characteristics, a lower confidence of 0.5 is given; otherwise the confidence is assigned based on a normal (Gaussian) distribution over the aspect-ratio template and the target's velocity.
In this method, to ensure accurate target matching, image features are taken from the model's feature extraction layer to match the target features of the preceding and following frames, and confidence is assigned from the target's velocity and aspect ratio via a Gaussian distribution, making target matching more accurate.
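The template-based confidence assignment above might be sketched as follows. All numeric template parameters (aspect-ratio mean/spread, velocity mean/spread, motion threshold) are illustrative assumptions, not values from the patent; only the scheme (1.0 for a moving pedestrian-shaped target, 0.5 for a static one, otherwise a Gaussian score) follows the text:

```python
import math

def gaussian_score(x, mu, sigma):
    """Unnormalized Gaussian membership score in (0, 1]."""
    return math.exp(-0.5 * ((x - mu) / sigma) ** 2)

def template_confidence(velocity, aspect_ratio,
                        ar_mu=2.5, ar_sigma=0.5,
                        v_mu=1.4, v_sigma=0.6,
                        moving_thresh=0.3):
    """Assign confidence from the target template: 1.0 for a moving
    target matching the pedestrian aspect ratio, 0.5 for a matching
    but static target, else a Gaussian score over both attributes."""
    ar_ok = abs(aspect_ratio - ar_mu) <= 2 * ar_sigma
    if velocity > moving_thresh and ar_ok:
        return 1.0
    if ar_ok:
        return 0.5
    return gaussian_score(aspect_ratio, ar_mu, ar_sigma) * \
           gaussian_score(velocity, v_mu, v_sigma)
```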
In one embodiment of the present application, the training in the initial target detection model, using the first infrared target detection result as an initial weight, to obtain a final target detection model includes: using the first detection result as an initial weight, reducing the confidence of the infrared target and the confidence of its category after each round of training, and continuing training in the initial target detection model to obtain the final target detection model; and/or, in each training round, taking infrared targets detected simultaneously at two or more layers of the target detection model as sample training data, and continuing training in the initial target detection model to obtain the final target detection model; and/or, tracking the infrared target during each training round, extracting the infrared target's features at the previous moment and at the current moment to compute their similarity, and continuing training in the initial target detection model to obtain the final target detection model; and/or, tracking the position, length, and width of the infrared target's center point with Kalman filtering, and continuing training in the initial target detection model to obtain the final target detection model.
Let Po denote the confidence that a detection is a target, Pc the confidence of the target's category, and (H, W) the target's length and width in the target detection model.
First, after each round of training, Pc and Po are each reduced by 0.05; in the first three rounds, only targets detected simultaneously at two layers are considered; training may stop once the model's test performance reaches the set performance metric.
second, the values of Pc and Po should not be below 0.5, and when below 0.5 a false positive of the target will occur.
Finally, the quality of the data labeling can be significantly improved by adding some representative mislabeled samples (for example, to train a binary classifier).
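The per-round threshold decay with a 0.5 floor described above reduces to a one-liner (the function name is illustrative):

```python
def decay_thresholds(po, pc, step=0.05, floor=0.5):
    """Lower both confidence thresholds by `step` after a training
    round, never letting either fall below the 0.5 floor noted above."""
    return max(floor, po - step), max(floor, pc - step)
```

Starting from (0.95, 0.95), repeated rounds walk both thresholds down toward the 0.5 floor.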
In one embodiment of the application, the target detection model comprises: a backbone network (Backbone) for feature extraction, comprising a p2 layer, a p3 layer, a p4 layer, and a p5 layer, where the p2 layer is used for feature matching during target tracking, and the p4 and p5 layers are respectively used for detecting targets at different scales; and a detection head (Head) for determining the confidence of the infrared target and the confidence of its category according to the length and width of the infrared target.
Specifically, the backbone network (Backbone) is divided into P2, P3, P4, and P5 layers. The P2 layer is used for feature matching during target tracking, and the P3, P4, and P5 layers are used to detect multi-scale pedestrian targets. The head outputs: the confidence that a detection is a target (Po); the confidence of the category (Pc); and the target's length and width (H, W).
It will be appreciated that for the above P2 layer, P3 layer, and so on, the size of the feature map output by a model block at layer Pn corresponds to the input resolution divided by 2 to the power of n (2^n). For example, with an input of 1024×1024, the resolution of the P1 layer output is 512×512; and so on. It will also be appreciated that layers of lower resolution (e.g., P4) contain more semantic information, while layers of higher resolution (e.g., P2) contain more detailed information.
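The resolution rule above can be illustrated with a short sketch; the function name is invented for illustration.

```python
# Sketch of the pyramid resolution rule: level Pn outputs the input
# resolution divided by 2**n in each dimension (integer division).

def pyramid_resolution(input_hw, level):
    """Return the (H, W) of the feature map output at pyramid level Pn."""
    h, w = input_hw
    stride = 2 ** level
    return h // stride, w // stride

# e.g. a 1024x1024 input gives 512x512 at P1 and 32x32 at P5.
```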
For the target detection model, multi-scale features of the target (the p2, p3 and p4 layers) are extracted so that both the high-level semantic features and the low-level geometric features of the target are utilized.
For the target detection model, the annotation uses the motion information and aspect ratio of the target, which are obtained through target tracking. Meanwhile, to ensure the accuracy of target matching, the p2 layer of the target detection model is used to match the target features of the preceding and following frames.
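Front/back-frame feature matching of this kind can be sketched with cosine similarity. This is a toy illustration: a real pipeline would take the feature vectors from the p2 layer and use a one-to-one assignment (e.g., the Hungarian algorithm) rather than this greedy loop.

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

def match_targets(prev_feats, curr_feats, min_sim=0.8):
    """Greedily pair each current-frame target with the most similar
    previous-frame target, if the similarity exceeds min_sim."""
    matches = []
    for i, cf in enumerate(curr_feats):
        best_j, best_sim = None, min_sim
        for j, pf in enumerate(prev_feats):
            sim = cosine_similarity(pf, cf)
            if sim > best_sim:
                best_j, best_sim = j, sim
        if best_j is not None:
            matches.append((best_j, i))  # (prev-frame index, curr-frame index)
    return matches
```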
Preferably, in an embodiment of the present application, there is provided a target labeling method for an infrared image, where the method includes:
inputting an infrared image into an initial target detection model to obtain a first infrared target detection result, wherein the first infrared target detection result comprises the confidence level of an infrared target and the confidence level of the category of the infrared target;
And taking the first detection result of the infrared target as an initial weight, and continuing training in the initial target detection model to obtain a final target detection model so as to output a second detection result of the infrared target, wherein the final target detection model comprises a labeling result of the infrared target.
Preferably, the method further comprises: and screening the confidence coefficient of the infrared target in the infrared target first detection result and the confidence coefficient of the category of the infrared target by a preset threshold value to obtain the infrared target second detection result.
Preferably, the training in the initial target detection model to obtain a final target detection model includes:
if the confidence coefficient of the labeling result of the infrared target is larger than a first threshold value, taking the infrared target labeled in the labeling result of the infrared target as positive sample training data; and
selecting and labeling non-targets to be detected, or adding falsely-labeled target samples, in the second detection result of the infrared target as negative sample training data;
and training according to the positive sample training data and the negative sample training data to obtain a final target detection model.
Preferably, the training in the initial target detection model to obtain a final target detection model includes:
performing multi-scale target detection on the infrared image by adopting a sliding window, and performing intersection-over-union (IOU) calculation between the detected multi-scale targets and the sample training data;
and reserving sample training data meeting the IOU calculation result, dividing the sample training data into a training set and a testing set, taking the first infrared target detection result as an initial weight, and carrying out multi-round training on the initial target detection model to obtain a final target detection model.
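The IoU filtering step above relies on the standard intersection-over-union computation, which can be sketched as follows (boxes given as (x1, y1, x2, y2) corners):

```python
def iou(box_a, box_b):
    """Intersection-over-union of two axis-aligned boxes (x1, y1, x2, y2)."""
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b
    inter_w = max(0, min(ax2, bx2) - max(ax1, bx1))  # overlap width
    inter_h = max(0, min(ay2, by2) - max(ay1, by1))  # overlap height
    inter = inter_w * inter_h
    union = ((ax2 - ax1) * (ay2 - ay1)
             + (bx2 - bx1) * (by2 - by1) - inter)
    return inter / union if union > 0 else 0.0
```

Detections whose IoU against an existing sample exceeds the chosen cutoff would be retained as sample training data.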
Preferably, the step of taking the first detection result of the infrared target as an initial weight and continuing training in the initial target detection model to obtain a final target detection model includes:
and removing targets which do not accord with the form of the target to be detected by tracking the target and matching the target template in the final target detection model, wherein the target template comprises at least one of the following: target rate, target aspect ratio.
Preferably, the step of taking the first detection result of the infrared target as an initial weight and continuing training in the initial target detection model to obtain a final target detection model includes:
taking the first detection result of the infrared target as an initial weight, reducing the confidence coefficient of the infrared target and the confidence coefficient of the category of the infrared target after each round of training is finished, and continuing training in the initial target detection model to obtain a final target detection model;
Taking infrared targets detected simultaneously in at least two layers in the target detection model as sample training data in each training round, and continuing training in the initial target detection model to obtain a final target detection model;
tracking the infrared target during each round of training, extracting the feature of the infrared target at the previous moment and the feature of the infrared target at the current moment to calculate their similarity, and continuing training in the initial target detection model to obtain a final target detection model;
tracking the center-point position and the length and width of the infrared target by adopting Kalman filtering, and continuing training in the initial target detection model to obtain a final target detection model.
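Kalman tracking of the center point and width/height can be illustrated with a deliberately simplified sketch: each of (cx, cy, w, h) is filtered independently with a scalar constant-state model, whereas practical trackers typically use a joint constant-velocity state.

```python
class ScalarKalman:
    """One-dimensional Kalman filter with a constant-state model."""
    def __init__(self, x0, p0=1.0, q=0.01, r=0.1):
        self.x, self.p = x0, p0   # state estimate and its variance
        self.q, self.r = q, r     # process / measurement noise variances

    def update(self, z):
        self.p += self.q                 # predict: state assumed constant
        k = self.p / (self.p + self.r)   # Kalman gain
        self.x += k * (z - self.x)       # correct towards the measurement
        self.p *= (1.0 - k)
        return self.x

def track_box(measurements):
    """Filter a sequence of (cx, cy, w, h) box measurements independently."""
    filters = [ScalarKalman(v) for v in measurements[0]]
    return [tuple(f.update(z) for f, z in zip(filters, m))
            for m in measurements[1:]]
```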
The embodiment of the present application further provides an infrared image target labeling device 200, as shown in fig. 2, and provides a schematic structural diagram of the infrared image target labeling device in the embodiment of the present application, where the infrared image target labeling device 200 at least includes: a preliminary training module 210 and an optimization module 220, wherein:
in one embodiment of the present application, the preliminary training module 210 is specifically configured to: and inputting the infrared image into an initial target detection model to obtain a first infrared target detection result, wherein the first infrared target detection result comprises the confidence level of the infrared target and the confidence level of the category of the infrared target.
Post-processing is performed on the collected infrared pedestrian video. Since pedestrians in the RGB images obtained by a camera and pedestrians in infrared images have different color and texture information, but are almost identical in shape, outline and semantic information, the first detection result of the infrared target can be obtained after the infrared image is input into the initial target detection model.
In the first detection result, the pedestrian target of the RGB image and the pedestrian of the infrared image are considered to have characteristic similarities, represented by the semantic information of the pedestrian, the geometric information of the outline, and the like. Therefore, this common characteristic information can be fully utilized for infrared target labeling and model training.
It should be noted that the confidence of the infrared target and the confidence of the category of the infrared target are included in the first detection result. That is to say, the detection result includes the confidence of the infrared target and the category of the infrared target, and the labeling result of the infrared target can be further evaluated according to this confidence and category.
In one embodiment of the present application, the optimization module 220 is specifically configured to: and taking the first detection result of the infrared target as an initial weight, and continuing training in the initial target detection model to obtain a final target detection model so as to output a second detection result of the infrared target, wherein the final target detection model comprises a labeling result of the infrared target.
According to the first detection result of the infrared target obtained in the above step, training is continued in the initial target detection model with that result as the initial weight, so as to obtain the final target detection model, which outputs the second detection result of the infrared target. It should be noted that the second detection result output is the confidence that the infrared target is a pedestrian or a non-pedestrian.
It should be noted that the initial target detection model may be an initial target detection model from the related art; the result obtained after inputting the infrared image is used as the initial weight to train the initial model again. That is, the model is trained on the detected infrared image results (for example, pedestrians), with the last trained model serving as the initial weight of the current training; after training, the model performs inference on the data to be labeled and outputs a confidence for each target.
The labeling result of the infrared target is included in the final target detection model. Through multiple iterations of the model, data labeling and model training are both realized, and only a small amount of manual intervention is needed in the whole process.
Illustratively, to reduce mislabeling and missed labeling, each output target whose confidence is greater than a preset threshold is taken as a positive sample. Meanwhile, non-pedestrian images (for example, targets from some common false detections) are selected as negative samples; a binary classifier is trained to classify the labeled targets, labeling results whose classification score is greater than a preset threshold are retained, and the other results are discarded.
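The binary-classifier filtering step can be sketched with a toy nearest-centroid classifier. This is entirely illustrative: a real system would train, e.g., a small CNN on image crops, and feature extraction is omitted here.

```python
def centroid(vectors):
    """Mean vector of a list of equal-length feature vectors."""
    n = len(vectors)
    return [sum(v[i] for v in vectors) / n for i in range(len(vectors[0]))]

def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def make_classifier(positives, negatives):
    """Return a scoring function: closer to the positive centroid => higher."""
    cp, cn = centroid(positives), centroid(negatives)
    def score(v):
        dp, dn = squared_distance(v, cp), squared_distance(v, cn)
        return dn / (dp + dn) if (dp + dn) > 0 else 0.5
    return score

def filter_annotations(annotations, score, thr=0.5):
    """Retain annotations whose feature vector scores above the threshold."""
    return [a for a in annotations if score(a["feat"]) > thr]
```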
It can be understood that the above-mentioned target labeling device for infrared images can implement each step of the target labeling method for infrared images provided in the foregoing embodiment, and the relevant explanation about the target labeling method for infrared images is applicable to the target labeling device for infrared images, which is not described herein.
Fig. 3 is a schematic structural view of an electronic device according to an embodiment of the present application. Referring to fig. 3, at the hardware level, the electronic device includes a processor, and optionally an internal bus, a network interface, and a memory. The memory may include a random-access memory (RAM) and may further include a non-volatile memory, such as at least one disk memory. Of course, the electronic device may also include hardware required for other services.
The processor, network interface, and memory may be interconnected by an internal bus, which may be an ISA (Industry Standard Architecture) bus, a PCI (Peripheral Component Interconnect) bus, or an EISA (Extended Industry Standard Architecture) bus, among others. Buses may be classified as address buses, data buses, control buses, etc. For ease of illustration, only one bi-directional arrow is shown in FIG. 3, but this does not mean there is only one bus or one type of bus.
And the memory is used for storing programs. In particular, the program may include program code including computer-operating instructions. The memory may include memory and non-volatile storage and provide instructions and data to the processor.
The processor reads the corresponding computer program from the nonvolatile memory into the memory and then runs the computer program to form the target marking device of the infrared image on the logic level. The processor is used for executing the programs stored in the memory and is specifically used for executing the following operations:
inputting an infrared image into an initial target detection model to obtain a first infrared target detection result, wherein the first infrared target detection result comprises the confidence level of an infrared target and the confidence level of the category of the infrared target;
and taking the first detection result of the infrared target as an initial weight, and continuing training in the initial target detection model to obtain a final target detection model so as to output a second detection result of the infrared target, wherein the final target detection model comprises a labeling result of the infrared target.
The method executed by the infrared image target marking apparatus disclosed in the embodiment of fig. 1 of the present application may be applied to a processor or implemented by a processor. The processor may be an integrated circuit chip having signal processing capabilities. In implementation, the steps of the above method may be performed by integrated logic circuits of hardware in a processor or by instructions in the form of software. The processor may be a general-purpose processor, including a central processing unit (Central Processing Unit, CPU), a network processor (Network Processor, NP), etc.; but also digital signal processors (Digital Signal Processor, DSP), application specific integrated circuits (Application Specific Integrated Circuit, ASIC), field programmable gate arrays (Field-Programmable Gate Array, FPGA) or other programmable logic devices, discrete gate or transistor logic devices, discrete hardware components. The disclosed methods, steps, and logic blocks in the embodiments of the present application may be implemented or performed. A general purpose processor may be a microprocessor or the processor may be any conventional processor or the like. The steps of the method disclosed in connection with the embodiments of the present application may be embodied directly in the execution of a hardware decoding processor, or in the execution of a combination of hardware and software modules in a decoding processor. The software modules may be located in a random access memory, flash memory, read only memory, programmable read only memory, or electrically erasable programmable memory, registers, etc. as well known in the art. The storage medium is located in a memory, and the processor reads the information in the memory and, in combination with its hardware, performs the steps of the above method.
The electronic device may further execute the method executed by the target labeling device of the infrared image in fig. 1, and implement the function of the target labeling device of the infrared image in the embodiment shown in fig. 1, which is not described herein.
The embodiment of the application also proposes a computer readable storage medium storing one or more programs, the one or more programs comprising instructions, which when executed by an electronic device comprising a plurality of application programs, enable the electronic device to perform a method performed by a target labeling apparatus for infrared images in the embodiment shown in fig. 1, and specifically for performing:
inputting an infrared image into an initial target detection model to obtain a first infrared target detection result, wherein the first infrared target detection result comprises the confidence level of an infrared target and the confidence level of the category of the infrared target;
and taking the first detection result of the infrared target as an initial weight, and continuing training in the initial target detection model to obtain a final target detection model so as to output a second detection result of the infrared target, wherein the final target detection model comprises a labeling result of the infrared target.
It will be appreciated by those skilled in the art that embodiments of the present application may be provided as a method, system, or computer program product. Accordingly, the present application may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present application may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.
The present invention is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems) and computer program products according to embodiments of the invention. It will be understood that each flow and/or block of the flowchart illustrations and/or block diagrams, and combinations of flows and/or blocks in the flowchart illustrations and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
In one typical configuration, a computing device includes one or more processors (CPUs), input/output interfaces, network interfaces, and memory.
The memory may include volatile memory in a computer-readable medium, random Access Memory (RAM) and/or nonvolatile memory, such as Read Only Memory (ROM) or flash memory (flash RAM). Memory is an example of computer-readable media.
Computer readable media, including both non-transitory and non-transitory, removable and non-removable media, may implement information storage by any method or technology. The information may be computer readable instructions, data structures, modules of a program, or other data. Examples of storage media for a computer include, but are not limited to, phase change memory (PRAM), static Random Access Memory (SRAM), dynamic Random Access Memory (DRAM), other types of Random Access Memory (RAM), read Only Memory (ROM), electrically Erasable Programmable Read Only Memory (EEPROM), flash memory or other memory technology, compact disc read only memory (CD-ROM), digital Versatile Discs (DVD) or other optical storage, magnetic cassettes, magnetic tape magnetic disk storage or other magnetic storage devices, or any other non-transmission medium, which can be used to store information that can be accessed by a computing device. Computer-readable media, as defined herein, does not include transitory computer-readable media (transmission media), such as modulated data signals and carrier waves.
It should also be noted that the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising a …" does not exclude the presence of other like elements in a process, method, article or apparatus that comprises the element.
It will be appreciated by those skilled in the art that embodiments of the present application may be provided as a method, system, or computer program product. Accordingly, the present application may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present application may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.
The foregoing is merely exemplary of the present application and is not intended to limit the present application. Various modifications and variations of the present application will be apparent to those skilled in the art. Any modification, equivalent replacement, improvement, etc. which come within the spirit and principles of the application are to be included in the scope of the claims of the present application.

Claims (10)

1. A method for labeling targets of infrared images, wherein the method comprises:
inputting an infrared image into an initial target detection model to obtain a first infrared target detection result, wherein the first infrared target detection result comprises the confidence level of an infrared target and the confidence level of the category of the infrared target;
and taking the first detection result of the infrared target as an initial weight, and continuing training in the initial target detection model to obtain a final target detection model so as to output a second detection result of the infrared target, wherein the final target detection model comprises a labeling result of the infrared target.
2. The method of claim 1, wherein the method further comprises: and screening the confidence coefficient of the infrared target in the infrared target first detection result and the confidence coefficient of the category of the infrared target by a preset threshold value to obtain the infrared target second detection result.
3. The method of claim 2, wherein the continuing training in the initial target detection model to obtain a final target detection model comprises:
if the confidence coefficient of the labeling result of the infrared target is larger than a first threshold value, taking the infrared target labeled in the labeling result of the infrared target as positive sample training data; and
selecting and labeling non-targets to be detected, or adding falsely-labeled target samples, in the second detection result of the infrared target as negative sample training data;
and training according to the positive sample training data and the negative sample training data to obtain a final target detection model.
4. A method according to claim 3, wherein said continuing training in said initial target detection model to obtain a final target detection model comprises:
performing multi-scale target detection on the infrared image by adopting a sliding window, and performing intersection-over-union (IOU) calculation between the detected multi-scale targets and the sample training data;
and reserving sample training data meeting the IOU calculation result, dividing the sample training data into a training set and a testing set, taking the first infrared target detection result as an initial weight, and carrying out multi-round training on the initial target detection model to obtain a final target detection model.
5. The method of claim 1, wherein the training further in the initial target detection model to obtain a final target detection model using the first detection result of the infrared target as an initial weight comprises:
and removing targets which do not accord with the form of the target to be detected by tracking the target and matching the target template in the final target detection model, wherein the target template comprises at least one of the following: target rate, target aspect ratio.
6. The method of claim 1, wherein the training further in the initial target detection model to obtain a final target detection model using the first detection result of the infrared target as an initial weight comprises:
taking the first detection result of the infrared target as an initial weight, reducing the confidence coefficient of the infrared target and the confidence coefficient of the category of the infrared target after each round of training is finished, and continuing training in the initial target detection model to obtain a final target detection model;
and/or the number of the groups of groups,
taking infrared targets detected simultaneously in at least two layers in the target detection model as sample training data in each training round, and continuing training in the initial target detection model to obtain a final target detection model;
and/or the number of the groups of groups,
tracking the infrared target during each round of training, extracting the feature of the infrared target at the previous moment and the feature of the infrared target at the current moment to calculate their similarity, and continuing training in the initial target detection model to obtain a final target detection model;
and/or the number of the groups of groups,
tracking the center-point position and the length and width of the infrared target by adopting Kalman filtering, and continuing training in the initial target detection model to obtain a final target detection model.
7. The method of any one of claims 1 to 6, wherein the object detection model comprises:
a backbone network (Backbone) used for extracting features and comprising a p2 layer, a p3 layer, a p4 layer and a p5 layer, wherein the p2 layer is used for feature matching during target tracking, and the p4 layer and the p5 layer are respectively used for detecting targets of different scales;
and the Head is used for judging the confidence coefficient of the infrared target and the confidence coefficient of the category of the infrared target according to the length and the width of the infrared target.
8. An infrared image target labeling apparatus, wherein the apparatus comprises:
the initial training module is used for inputting the infrared image into the initial target detection model to obtain an infrared target first detection result, wherein the infrared target first detection result comprises the confidence coefficient of the infrared target and the confidence coefficient of the category of the infrared target;
the optimization module is used for taking the first detection result of the infrared target as an initial weight, continuing training in the initial target detection model to obtain a final target detection model so as to output a second detection result of the infrared target, and the final target detection model comprises a labeling result of the infrared target.
9. An electronic device, comprising:
a processor; and
A memory arranged to store computer executable instructions which, when executed, cause the processor to perform the method of any of claims 1 to 7.
10. A computer readable storage medium storing one or more programs, which when executed by an electronic device comprising a plurality of application programs, cause the electronic device to perform the method of any of claims 1-7.
CN202310601213.9A 2023-05-25 2023-05-25 Target labeling method and device for infrared image, electronic equipment and storage medium Pending CN116740712A (en)

Publications (1)

Publication Number Publication Date
CN116740712A true CN116740712A (en) 2023-09-12

Family

ID=87905409



Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination