CN111046880A - Infrared target image segmentation method and system, electronic device and storage medium - Google Patents

Infrared target image segmentation method and system, electronic device and storage medium

Info

Publication number
CN111046880A
CN111046880A (application CN201911195519.9A)
Authority
CN
China
Prior art keywords
infrared target
infrared
target image
candidate frame
roi
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201911195519.9A
Other languages
Chinese (zh)
Other versions
CN111046880B (en)
Inventor
荆楠
张智杰
雷波
谭海
孙钢波
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
717th Research Institute of CSIC
Original Assignee
717th Research Institute of CSIC
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 717th Research Institute of CSIC filed Critical 717th Research Institute of CSIC
Priority to CN201911195519.9A priority Critical patent/CN111046880B/en
Publication of CN111046880A publication Critical patent/CN111046880A/en
Application granted granted Critical
Publication of CN111046880B publication Critical patent/CN111046880B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Links

Images

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/20Image preprocessing
    • G06V10/26Segmentation of patterns in the image field; Cutting or merging of image elements to establish the pattern region, e.g. clustering-based techniques; Detection of occlusion
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/24Classification techniques
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • G06T7/10Segmentation; Edge detection
    • G06T7/11Region-based segmentation
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • G06T7/10Segmentation; Edge detection
    • G06T7/194Segmentation; Edge detection involving foreground-background segmentation
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/20Image preprocessing
    • G06V10/25Determination of region of interest [ROI] or a volume of interest [VOI]
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/20Image preprocessing
    • G06V10/28Quantising the image, e.g. histogram thresholding for discrimination between background and foreground patterns
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/10Image acquisition modality
    • G06T2207/10048Infrared image
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/20Special algorithmic details
    • G06T2207/20081Training; Learning
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/20Special algorithmic details
    • G06T2207/20084Artificial neural networks [ANN]
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V2201/00Indexing scheme relating to image or video recognition or understanding
    • G06V2201/07Target detection

Abstract

The invention provides an infrared target image segmentation method, system, electronic device and storage medium. The method comprises the following steps: collecting infrared target images under various scenes as a training data set; preprocessing the infrared target images, labeling target instances and creating pixel-level binary masks; extracting multi-level-resolution feature maps of the infrared target images and presetting candidate frames of different sizes pixel by pixel; inputting the candidate frames into a region proposal network for binary classification and bounding-box refinement; filtering out background candidate frames and performing an ROI Align operation on the foreground candidate frames to obtain ROI regions; and performing N-class classification, bounding-box regression and binary-mask generation on the ROI regions to finally obtain a trained infrared target image segmentation model. The method addresses the difficulty of guaranteeing real-time performance of existing image processing methods in complex scenes; it is applicable to target detection in infrared images under various complex scenes while achieving real-time infrared image processing and reducing the amount of computation.

Description

Infrared target image segmentation method and system, electronic device and storage medium
Technical Field
The present invention relates to the field of image processing, and in particular, to an infrared target image segmentation method, system, electronic device, and storage medium.
Background
In the field of computer vision, target detection is a classical research direction widely applied to traffic monitoring, image retrieval, human-computer interaction and the like. Infrared target detection, an important branch of computer image processing, suits situations where a target's color and form resemble the surrounding environment, and can be applied in fields such as security monitoring, military reconnaissance, night driving and shipping. An infrared image reflects the relative temperature information of an object and is little affected by weather; compared with visible-spectrum imaging equipment such as illumination cameras and night-vision devices, infrared imaging offers advantages such as longer detection distance and higher detection reliability, but suffers from drawbacks such as low resolution and blurred details.
Among common visible-light target segmentation methods at present, threshold-based segmentation is easy to implement in engineering and computationally cheap, but it struggles with scenes that have complex backgrounds and heavy interference, and is prone to false and missed detections; more complex methods such as convolutional neural networks obtain better detection results but are computationally heavy and hard to run in real time.
Therefore, it is necessary to provide an infrared target image segmentation method capable of processing complex scenes and realizing real-time processing.
Disclosure of Invention
In view of this, the embodiments of the present invention provide an infrared target image segmentation method to solve the problem that existing image segmentation methods are difficult to adapt to complex application scenes while guaranteeing real-time image processing.
In a first aspect of the embodiments of the present invention, a method for segmenting an infrared target image is provided, including:
acquiring infrared target images under various scenes according to the characteristic attributes of the infrared target to be detected;
preprocessing the infrared target image, labeling target instances in the infrared target image with a labeling tool, and creating a pixel-level binary mask;
extracting multi-level-resolution feature maps from the infrared target image through a pre-trained ResNet network, and presetting a predetermined number of prior frames of different sizes, pixel by pixel, in the multi-level-resolution feature maps;
inputting the prior frames as candidate frames into a region proposal network for foreground/background binary classification and bounding-box refinement, removing candidate frames belonging to the background class, and performing an ROI Align operation on the remaining foreground candidate frames to obtain ROI regions;
and performing N-class classification, bounding-box regression and binary-mask generation on the ROI regions, respectively, to obtain a trained infrared target image segmentation model.
In a second aspect of the embodiments of the present invention, there is provided an infrared target image segmentation system, including:
the acquisition module is used for acquiring infrared target images under various scenes according to the characteristic attributes of the infrared target to be detected;
the labeling module is used for preprocessing the infrared target image, labeling target instances in the infrared target image with a labeling tool, and creating a pixel-level binary mask;
the extraction module is used for extracting multi-level-resolution feature maps from the infrared target image through a pre-trained ResNet network and presetting a predetermined number of prior frames of different sizes, pixel by pixel, in the multi-level-resolution feature maps;
the input module is used for inputting the prior frames as candidate frames into a region proposal network for foreground/background binary classification and bounding-box refinement, removing candidate frames belonging to the background class, and performing an ROI Align operation on the remaining foreground candidate frames to obtain ROI regions;
and the processing module is used for performing N-class classification, bounding-box regression and binary-mask generation on the ROI regions, respectively, to obtain a trained infrared target image segmentation model.
In a third aspect of the embodiments of the present invention, there is provided an electronic device, including a memory, a processor, and a computer program stored in the memory and executable by the processor, where the processor executes the computer program to implement the steps of the method according to the first aspect of the embodiments of the present invention.
In a fourth aspect of the embodiments of the present invention, a computer-readable storage medium is provided, which stores a computer program, which when executed by a processor implements the steps of the method provided by the first aspect of the embodiments of the present invention.
In the embodiment of the invention, infrared target images under various scenes are collected as a training data set according to the characteristic attributes of the infrared target to be detected. The infrared target images are preprocessed, a labeling tool labels the target instances and creates pixel-level binary masks, multi-level-resolution feature maps of the infrared target images are extracted through a pre-trained ResNet network, and a number of prior frames of different sizes are preset pixel by pixel in the feature maps. The prior frames are input as candidate frames to a region proposal network for foreground/background binary classification and bounding-box refinement; background candidate frames are filtered out, the proposed regions are aligned to obtain ROI regions, and the ROI regions undergo N-class classification, bounding-box regression and binary-mask generation. The finally trained infrared target image segmentation model performs target detection and segmentation in infrared images. The method adapts to target detection in infrared images under various complex scenes, has strong anti-interference capability, ensures detection accuracy, and keeps the detection and recognition processes simple and fast. It solves the problems that existing infrared image target detection methods have complex calculation processes in complex scenes and struggle to guarantee real-time performance; it reduces the amount of computation, improves infrared image processing and analysis capability, realizes real-time target detection in complex scenes, and provides pixel-level target acquisition and segmentation, giving it high practical value.
Drawings
In order to more clearly illustrate the technical solutions in the embodiments of the present invention, the drawings needed to be used in the embodiments or the prior art descriptions will be briefly described below, and it is obvious that the drawings described below are only some embodiments of the present invention, and it is obvious for those skilled in the art to obtain other drawings based on these drawings without creative efforts.
Fig. 1 is a schematic flowchart of an infrared target image segmentation method according to an embodiment of the present invention;
fig. 2 is a schematic structural diagram of an infrared target image segmentation system according to an embodiment of the present invention.
Detailed Description
In order to make the objects, features and advantages of the present invention more obvious and understandable, the technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the accompanying drawings in the embodiments of the present invention, and it is obvious that the embodiments described below are only a part of the embodiments of the present invention, and not all of the embodiments. Based on the embodiments of the present invention, all other embodiments obtained by persons skilled in the art without any inventive work shall fall within the protection scope of the present invention, and the principle and features of the present invention shall be described below with reference to the accompanying drawings.
The terms "comprises" and "comprising," when used in this specification and claims, and in the accompanying drawings and figures, are intended to cover non-exclusive inclusions, such that a process, method or system, or apparatus that comprises a list of steps or elements is not limited to the listed steps or elements.
Referring to fig. 1, a schematic flow chart of an infrared target image segmentation method according to an embodiment of the present invention includes:
s101, acquiring infrared target images under various scenes according to the characteristic attributes of the infrared target to be detected;
the infrared target to be detected is a target object detected by infrared signals of detected target radiation, and can be a person, a vehicle, an animal and the like generally. The characteristic property of the infrared target may be the size of the target, the radiation intensity, etc. The multiple scenes refer to multiple complex scenes, such as regional scenes with dense people flow and traffic flow, infrared target images under at least three complex scenes can be acquired, and in order to improve the accuracy of infrared target detection, the acquired scenes can be increased.
Optionally, the infrared thermal imager is used for shooting infrared images of different targets in different complex scenes, and focusing, zooming and exposure parameters of the infrared thermal imager are continuously changed to form an infrared target image data set.
The shooting scene, the target, the focal length parameter, the exposure parameter and the like of the infrared image in the infrared target data set can be different, and the diversity of the data set is ensured.
S102, preprocessing the infrared target image, marking a target instance in the infrared target image through a marking tool, and manufacturing a pixel-level binary mask;
the preprocessing process specifically includes methods such as image rotation and translation, random trimming, color dithering, translation transformation, scale transformation, contrast transformation, noise disturbance and the like, and data enhancement is performed on the infrared target image. And labeling the target instance in the infrared target image by using a LabelImg labeling tool.
The binary mask is a binary image composed of 0s and 1s, which controls the image processing flow by occluding the image locally or not at all. Image masks may be defined by specifying data values, data ranges, infinite or finite values, regions of interest, annotation files, and the like.
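As a minimal illustration (not taken from the patent), a pixel-level binary mask for one labeled instance can be built from annotated pixel coordinates as follows; the function name and coordinate format are hypothetical:

```python
import numpy as np

def make_binary_mask(height, width, instance_pixels):
    """Build a pixel-level binary mask: 1 inside the labeled target
    instance, 0 elsewhere. `instance_pixels` is an iterable of (row, col)
    coordinates as an annotation tool might export them."""
    mask = np.zeros((height, width), dtype=np.uint8)
    for r, c in instance_pixels:
        mask[r, c] = 1
    return mask

# A 5x5 image with a 2x2 labeled target region.
pixels = [(1, 1), (1, 2), (2, 1), (2, 2)]
mask = make_binary_mask(5, 5, pixels)
```

In practice the annotation tool would export polygons or run-length encodings rather than raw pixel lists, but the resulting 0/1 image is the same kind of mask the method trains against.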
Specifically, data enhancement is carried out on the infrared target image by image rotation and translation, random cropping, color jittering, scale transformation, contrast transformation and noise perturbation; data augmentation with a class-balance strategy addresses class imbalance in the data; and the infrared target images are randomly shuffled.
In classification learning, large differences in the proportions of different sample classes easily hurt classification accuracy; data augmentation diversifies the data set and improves the generalization ability of the model. During data enhancement the same target picture can appear consecutively, which causes the model to learn the features of one target repeatedly during training and overfit. In this embodiment the data samples are therefore shuffled into random order, which improves model performance during training.
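The class-balancing and shuffling step above can be sketched as follows; the oversampling strategy and function name are illustrative assumptions, not the patent's exact procedure:

```python
import random
from collections import Counter

def balance_and_shuffle(samples, labels, seed=0):
    """Oversample minority classes up to the majority-class count, then
    shuffle the order so the same target never dominates consecutive
    training batches. Returns a list of (sample, label) pairs."""
    rng = random.Random(seed)
    counts = Counter(labels)
    target = max(counts.values())
    balanced = []
    for cls in counts:
        cls_samples = [s for s, l in zip(samples, labels) if l == cls]
        # Repeat minority-class samples cyclically until each class
        # reaches `target` examples.
        reps = [cls_samples[i % len(cls_samples)] for i in range(target)]
        balanced.extend((s, cls) for s in reps)
    rng.shuffle(balanced)
    return balanced
```

Fixing the seed keeps experiments reproducible while still destroying the consecutive-sample ordering that causes overfitting.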
S103, extracting a multi-level resolution size characteristic diagram in the infrared target image through a pre-trained ResNet network, and presetting a predetermined number of prior frames with different sizes for pixel points one by one in the multi-level resolution size characteristic diagram;
the ResNet network model is a residual learning network for feature extraction, the ResNet network is pre-trained through ImageNet classification data sets, and feature maps of infrared target images are extracted by using the trained ResNet network.
Optionally, a convolutional-neural-network-based head outputs a two-branch structure comprising binary classification (foreground and background) and bounding-box refinement regression. Specifically, a 3 × 3 convolution kernel first generates 256- or 512-dimensional features on the multi-level resolution feature maps, and the generated 256- or 512-dimensional features then feed 2 fully connected branches: the regression layer predicts the center-point coordinates and the width and height of the candidate frame, and the classification layer judges whether the candidate frame belongs to the foreground or the background;
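The two-branch head above can be summarized by its output shapes. This is a shape sketch only; the anchor count of 9 (3 scales × 3 aspect ratios) is an assumption commonly used with such heads, not a figure stated in the patent:

```python
def rpn_head_shapes(feat_h, feat_w, channels=256, num_anchors=9):
    """Shape sketch of the two-branch RPN head: a 3x3 conv produces a
    `channels`-dimensional feature at every pixel of the feature map; the
    classification branch outputs 2 scores (foreground/background) per
    prior frame and the regression branch outputs 4 box offsets per
    prior frame."""
    shared = (channels, feat_h, feat_w)            # after the 3x3 conv
    cls_scores = (2 * num_anchors, feat_h, feat_w)  # fg/bg per anchor
    bbox_deltas = (4 * num_anchors, feat_h, feat_w)  # (tx, ty, tw, th)
    return shared, cls_scores, bbox_deltas
```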
The loss function of the RPN (Region Proposal Network) is a weighted sum of a softmax classification loss and a regression loss. The softmax loss is calculated from the foreground/background calibration results and the predictions for the candidate frames; the regression targets are calculated as follows:
tx = (x - xa)/wa, ty = (y - ya)/ha
tw = log(w/wa), th = log(h/ha)
tx* = (x* - xa)/wa, ty* = (y* - ya)/ha
tw* = log(w*/wa), th* = log(h*/ha)
The loss function of the RPN network is calculated as follows:
L({pi}, {ti}) = (1/Ncls) Σi Lcls(pi, pi*) + λ (1/Nreg) Σi pi* Lreg(ti, ti*)
where x and y are the center-point coordinates of the candidate frame predicted by the RPN, w and h are the width and height of the candidate frame, the position information of the prior frame is (xa, ya, wa, ha), and the position information of the ground-truth box is (x*, y*, w*, h*); pi is the predicted foreground probability of the i-th candidate frame, pi* is its ground-truth label, Ncls and Nreg are normalization terms, and λ balances the two loss terms.
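The regression targets above translate directly into code. `encode_box` and `decode_box` are illustrative names, and boxes are assumed to be in (center x, center y, width, height) form as in the formulas:

```python
import math

def encode_box(box, anchor):
    """Compute the RPN regression targets (tx, ty, tw, th) from a box and
    its prior frame (anchor), matching the formulas above."""
    x, y, w, h = box
    xa, ya, wa, ha = anchor
    return ((x - xa) / wa, (y - ya) / ha,
            math.log(w / wa), math.log(h / ha))

def decode_box(t, anchor):
    """Invert the encoding: recover the box from targets and anchor."""
    tx, ty, tw, th = t
    xa, ya, wa, ha = anchor
    return (tx * wa + xa, ty * ha + ya,
            wa * math.exp(tw), ha * math.exp(th))
```

Encoding relative to the anchor makes the regression targets scale-invariant, which is why the same head can refine prior frames of very different sizes.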
S104, inputting the prior frames as candidate frames into a region proposal network for foreground/background binary classification and bounding-box refinement, removing candidate frames belonging to the background class, and performing an ROI Align operation on the remaining foreground candidate frames to obtain ROI regions;
the region nomination network (or region growing network, i.e. RPN) can classify the corresponding target or non-target in the candidate box and perform frame modification on the candidate box. The classified candidate frames can be divided into foreground candidate frames and background candidate frames. The ROI region is a region of interest in image processing.
The ROI Align operation determines the feature value at each sampling point of the RoI region via bilinear interpolation on the feature map, then applies operations such as max or average pooling. This improves precision and solves the misalignment introduced when the pooling process samples directly at rounded (quantized) positions.
Specifically, the ROI region is divided into a predetermined number of bin regions, such as 7 × 7 bin regions, 4 sampling points are selected in each bin region, pixel values of 4 feature points closest to each sampling point are obtained, and the pixel value of each sampling point is determined by bilinear interpolation;
the average or maximum pooling for each bin region is calculated to generate a feature map of the ROI region, i.e., a 7 × 7 size feature map.
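The bilinear interpolation at the heart of ROI Align can be sketched as follows; this minimal version samples one fractional position on a 2-D feature map and is an illustration, not the patent's implementation:

```python
def bilinear_sample(feature, y, x):
    """Sample a value at fractional coordinates (y, x) from a 2-D feature
    map (list of rows) by bilinear interpolation, as ROI Align does for
    each of the 4 sampling points inside a bin."""
    h, w = len(feature), len(feature[0])
    y0, x0 = int(y), int(x)
    y1, x1 = min(y0 + 1, h - 1), min(x0 + 1, w - 1)
    dy, dx = y - y0, x - x0
    # Interpolate horizontally on the two neighboring rows, then
    # vertically between the results.
    top = feature[y0][x0] * (1 - dx) + feature[y0][x1] * dx
    bot = feature[y1][x0] * (1 - dx) + feature[y1][x1] * dx
    return top * (1 - dy) + bot * dy
```

Because the sampling positions stay fractional instead of being rounded to the nearest pixel, the per-bin average or max pooled over these samples avoids the quantization misalignment described above.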
And S105, respectively carrying out N-type classification, bounding box regression and binary mask generation on the ROI to obtain a trained infrared target image segmentation model.
By repeating the steps of S104 (inputting candidate frames into the region proposal network, removing background candidate frames, and obtaining ROI regions) and S105 (classification, bounding-box regression and binary-mask generation on the ROI regions), training of the infrared target image segmentation model can be completed on the infrared target image data; the trained model can then perform fast and accurate target detection and segmentation on infrared images to be detected.
Optionally, the classification branch extracts fully connected features with two fully connected layers and generates N-dimensional features representing the N category scores;
the regression branch extracts fully connected features with two fully connected layers and generates N × 4-dimensional features representing the generated bounding-box coordinates;
the mask branch extracts convolutional features with five fully convolutional layers and generates N × 28 × 28-dimensional features representing the N generated class masks;
wherein the loss function is:
L = Lcls + Lbox + Lmask
where Lcls is the classification branch loss, Lbox is the regression branch loss, and Lmask is the mask branch loss.
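A small sketch of the three head branches' output shapes and the equal-weight multi-task loss described above; the function names and the ROI count are illustrative:

```python
def head_output_shapes(num_classes, num_rois):
    """Output shapes of the three ROI heads: N class scores per ROI,
    N x 4 bounding-box coordinates per ROI, and N binary masks of
    28 x 28 per ROI."""
    cls = (num_rois, num_classes)
    box = (num_rois, num_classes, 4)
    mask = (num_rois, num_classes, 28, 28)
    return cls, box, mask

def total_loss(l_cls, l_box, l_mask):
    """Multi-task loss L = Lcls + Lbox + Lmask with equal weights,
    as stated in the formula above."""
    return l_cls + l_box + l_mask
```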
The method provided by this embodiment solves the problems that existing infrared image detection methods have complex calculation processes in complex scenes and struggle to guarantee real-time performance; it adapts better to complex backgrounds, strengthens anti-interference capability in ground and similar scenes, and expands the application range of the method.
It should be understood that, the sequence numbers of the steps in the foregoing embodiments do not imply an execution sequence, and the execution sequence of each process should be determined by its function and inherent logic, and should not constitute any limitation to the implementation process of the embodiments of the present invention.
Fig. 2 is a schematic structural diagram of an infrared target image segmentation system according to an embodiment of the present invention, where the system includes:
the acquisition module 210 is configured to acquire infrared target images in multiple scenes according to the characteristic attributes of the infrared target to be detected;
optionally, the acquiring infrared target images under multiple scenes according to the characteristic attributes of the infrared target to be detected includes:
the infrared thermal imager is used for shooting infrared images of different targets in different complex scenes, and the focusing, zooming and exposure parameters of the infrared thermal imager are continuously changed to form an infrared target image data set.
The labeling module 220 is configured to preprocess the infrared target image, label target instances in the infrared target image with a labeling tool, and create a pixel-level binary mask;
optionally, the preprocessing the infrared target image includes:
carrying out data enhancement on the infrared target image by image rotation and translation, random cropping, color jittering, scale transformation, contrast transformation and noise perturbation;
performing data augmentation by adopting a class balance strategy based on data class imbalance;
and randomly sequencing the infrared target images.
An extracting module 230, configured to extract a multi-level resolution size feature map in the infrared target image through a pre-trained ResNet network, and preset a predetermined number of prior frames with different sizes for pixel points in the multi-level resolution size feature map one by one;
optionally, the extracting, by the pre-trained ResNet network, the multi-resolution size feature map in the infrared target image, and presetting a predetermined number of prior frames with different sizes for pixel points in the multi-resolution size feature map one by one includes:
firstly, a 3 × 3 convolution kernel generates 256- or 512-dimensional fully connected features on the multi-level resolution feature maps, and the generated 256- or 512-dimensional features then feed 2 fully connected branches;
the regression layer predicts the coordinates and the width and the height of the center point of the candidate frame, and the classification layer judges whether the candidate frame belongs to the foreground or the background;
the loss function of the RPN is a weighted sum of a softmax classification loss and a regression loss. The softmax loss is calculated from the foreground/background calibration results and the predictions for the candidate frames; the regression targets are calculated as follows:
tx = (x - xa)/wa, ty = (y - ya)/ha
tw = log(w/wa), th = log(h/ha)
tx* = (x* - xa)/wa, ty* = (y* - ya)/ha
tw* = log(w*/wa), th* = log(h*/ha)
The loss function of the RPN network is calculated as follows:
L({pi}, {ti}) = (1/Ncls) Σi Lcls(pi, pi*) + λ (1/Nreg) Σi pi* Lreg(ti, ti*)
where x and y are the center-point coordinates of the candidate frame predicted by the RPN, w and h are the width and height of the candidate frame, the position information of the prior frame is (xa, ya, wa, ha), and the position information of the ground-truth box is (x*, y*, w*, h*); pi is the predicted foreground probability of the i-th candidate frame, pi* is its ground-truth label, Ncls and Nreg are normalization terms, and λ balances the two loss terms.
An input module 240, configured to input the prior frames as candidate frames into a region proposal network for foreground/background binary classification and bounding-box refinement, remove candidate frames belonging to the background class, and perform an ROI Align operation on the remaining foreground candidate frames to obtain ROI regions;
optionally, inputting the prior frames as candidate frames into a region proposal network for foreground/background binary classification and bounding-box refinement, removing candidate frames belonging to the background class, and performing an ROI Align operation on the remaining foreground candidate frames to obtain ROI regions includes:
dividing the ROI into a predetermined number of bin regions, selecting 4 sampling points in each bin region, obtaining pixel values of 4 characteristic points nearest to each sampling point, and determining the pixel value of each sampling point through bilinear interpolation;
the average or maximum pooling of each bin region is calculated, generating a feature map of the ROI region.
And the processing module 250 is configured to perform N-class classification, bounding-box regression and binary-mask generation on the ROI regions, respectively, to obtain a trained infrared target image segmentation model.
Optionally, performing N-class classification, bounding-box regression and binary-mask generation on the ROI regions, respectively, to obtain the trained infrared target image segmentation model includes:
the classification branch structure utilizes two full-connection layers to extract full-connection features, and classification branches generate N-dimensional features to express N category scores
the regression branch extracts fully connected features with two fully connected layers and generates N × 4-dimensional features representing the generated bounding-box coordinates;
the mask branch extracts convolutional features with five fully convolutional layers and generates N × 28 × 28-dimensional features representing the N generated class masks;
wherein the loss function is:
L = Lcls + Lbox + Lmask
where Lcls is the classification branch loss, Lbox is the regression branch loss, and Lmask is the mask branch loss.
In an embodiment of the present invention, an electronic device for infrared target image segmentation is provided, which includes a memory, a processor, and a computer program stored in the memory and executable by the processor, and the processor implements the steps of S101 to S105 in the embodiment of the present invention when executing the computer program.
In an embodiment of the present invention, there is also provided a non-transitory computer-readable storage medium storing a computer program which, when executed by a processor, performs the infrared target image segmentation method provided in the above embodiments. Such non-transitory computer-readable storage media include: ROM/RAM, magnetic disks, optical disks, etc.
In the above embodiments, the descriptions of the respective embodiments have respective emphasis, and reference may be made to the related descriptions of other embodiments for parts that are not described or illustrated in a certain embodiment.
The above-mentioned embodiments are only used for illustrating the technical solutions of the present invention, and not for limiting the same; although the present invention has been described in detail with reference to the foregoing embodiments, it will be understood by those of ordinary skill in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some technical features may be equivalently replaced; and such modifications or substitutions do not depart from the spirit and scope of the corresponding technical solutions of the embodiments of the present invention.

Claims (10)

1. An infrared target image segmentation method is characterized by comprising the following steps:
acquiring infrared target images under various scenes according to the characteristic attributes of the infrared target to be detected;
preprocessing the infrared target image, annotating target instances in the infrared target image with an annotation tool, and producing a pixel-level binary mask;
extracting multi-level-resolution feature maps from the infrared target image through a pre-trained ResNet network, and presetting a predetermined number of prior frames of different sizes for each pixel point in the multi-level-resolution feature maps;
inputting the prior frames as candidate frames into a region proposal network for foreground/background binary classification and bounding-box refinement, removing the candidate frames belonging to the background class, and performing an ROI Align operation on the obtained foreground-class candidate frames to obtain ROI regions;
and respectively carrying out N category classification, bounding box regression and binary mask generation on the ROI to obtain a trained infrared target image segmentation model.
2. The method according to claim 1, wherein the acquiring infrared target images in a plurality of scenes according to the characteristic attributes of the infrared target to be detected comprises:
the infrared thermal imager is used for shooting infrared images of different targets in different complex scenes, and the focusing, zooming and exposure parameters of the infrared thermal imager are continuously changed to form an infrared target image data set.
3. The method of claim 1, wherein the pre-processing the infrared target image comprises:
respectively carrying out data enhancement on the infrared target image by methods of image rotation and translation, random cropping, color jittering, translation transformation, scale transformation, contrast transformation and noise disturbance;
performing data augmentation by adopting a class balance strategy based on data class imbalance;
and randomly sequencing the infrared target images.
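The augmentation and class-balance steps of claim 3 could be sketched as follows (a minimal NumPy illustration; all function names, parameter ranges, and class labels are my own, not from the patent):

```python
import numpy as np
from collections import Counter

rng = np.random.default_rng(0)

def augment(img):
    """One pass of simple augmentations in the spirit of claim 3:
    flip, contrast scaling, and additive noise on an image in [0, 1]."""
    out = np.asarray(img, dtype=float).copy()
    if rng.random() < 0.5:
        out = out[:, ::-1]  # horizontal flip as a stand-in for rotation/translation
    out = np.clip(out * rng.uniform(0.8, 1.2), 0.0, 1.0)                 # contrast transformation
    out = np.clip(out + rng.normal(0.0, 0.01, out.shape), 0.0, 1.0)      # noise disturbance
    return out

def oversample_minority(images, labels):
    """Class-balance strategy: duplicate minority-class samples until
    every class has as many samples as the largest class."""
    counts = Counter(labels)
    target = max(counts.values())
    out_imgs, out_labels = list(images), list(labels)
    for cls, n in counts.items():
        pool = [im for im, lb in zip(images, labels) if lb == cls]
        for k in range(target - n):
            out_imgs.append(pool[k % len(pool)])
            out_labels.append(cls)
    return out_imgs, out_labels
```

After augmentation and balancing, shuffling the resulting list gives the random ordering the claim asks for.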
4. The method of claim 1, wherein the extracting multi-level-resolution feature maps from the infrared target image through a pre-trained ResNet network, and presetting a predetermined number of prior frames of different sizes for each pixel point in the multi-level-resolution feature maps comprises:
generating 256- or 512-dimensional fully-connected features on the multi-level-resolution feature maps with a 3 x 3 convolution kernel, and then generating 2 fully-connected layer branches from the generated 256- or 512-dimensional features;
the regression layer predicts the coordinates of the center point and the width and height of the candidate frame, and the classification layer judges whether the candidate frame belongs to the foreground or the background;
the loss function of the RPN network is composed of a softmax loss and a regression loss combined according to a preset weight; the softmax loss is calculated from the calibration results and the prediction results corresponding to the candidate frames, and the regression targets are calculated as follows:

t_x = (x - x_a)/w_a,  t_y = (y - y_a)/h_a

t_w = log(w/w_a),  t_h = log(h/h_a)

t*_x = (x* - x_a)/w_a,  t*_y = (y* - y_a)/h_a

t*_w = log(w*/w_a),  t*_h = log(h*/h_a)

the loss function of the RPN network is calculated as follows:

L({p_i}, {t_i}) = (1/N_cls) Σ_i L_cls(p_i, p*_i) + λ (1/N_reg) Σ_i p*_i L_reg(t_i, t*_i)

wherein x and y are the coordinates of the center point of the candidate frame predicted by the RPN network, w and h are the width and height of the candidate frame, the position information of the prior frame is (x_a, y_a, w_a, h_a), the position information of the real label frame is (x*, y*, w*, h*), p_i is the predicted foreground probability of the i-th candidate frame, and p*_i is its calibration label.
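The box parameterization in claim 4 can be sketched as a pair of inverse functions (a minimal illustration; `encode_box`/`decode_box` and the box layout `(cx, cy, w, h)` are my own conventions, not from the patent):

```python
import math

def encode_box(box, anchor):
    """Regression targets (t_x, t_y, t_w, t_h) of a box (cx, cy, w, h)
    relative to an anchor/prior frame (x_a, y_a, w_a, h_a)."""
    x, y, w, h = box
    xa, ya, wa, ha = anchor
    return ((x - xa) / wa,        # t_x: normalized center-x offset
            (y - ya) / ha,        # t_y: normalized center-y offset
            math.log(w / wa),     # t_w: log width ratio
            math.log(h / ha))     # t_h: log height ratio

def decode_box(t, anchor):
    """Invert the encoding: recover (cx, cy, w, h) from the offsets."""
    tx, ty, tw, th = t
    xa, ya, wa, ha = anchor
    return (tx * wa + xa, ty * ha + ya,
            wa * math.exp(tw), ha * math.exp(th))
```

Encoding a box against its anchor and decoding the result round-trips to the original box, which is why the RPN can regress these offsets instead of raw coordinates.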
5. The method according to claim 1, wherein the inputting the prior frames as candidate frames into a region proposal network for foreground/background binary classification and bounding-box refinement, removing the candidate frames belonging to the background class, and performing an ROI Align operation on the obtained foreground-class candidate frames to obtain ROI regions comprises:
dividing the ROI region into a predetermined number of bins, selecting 4 sampling points in each bin, obtaining the pixel values of the 4 feature points nearest to each sampling point, and determining the pixel value of each sampling point through bilinear interpolation;
the average or maximum pooling of each bin region is calculated, generating a feature map of the ROI region.
6. The method of claim 1, wherein the performing N-class classification, bounding box regression, and binary mask generation on the ROI region respectively to obtain a trained infrared target image segmentation model comprises:
the classification branch extracts fully-connected features using two fully-connected layers and generates N-dimensional features representing the N category scores;
the regression branch extracts fully-connected features using two fully-connected layers and generates N x 4-dimensional features representing the generated bounding-box coordinates;
the mask branch extracts convolution features using five fully-convolutional layers and generates N 28 x 28-dimensional features representing the generated N classes of masks;
wherein the loss function is:
L = L_cls + L_box + L_mask
wherein L_cls is the classification branch loss, L_box is the bounding-box regression branch loss, and L_mask is the mask branch loss.
7. An infrared image target detection system, comprising:
the acquisition module is used for acquiring infrared target images under various scenes according to the characteristic attributes of the infrared target to be detected;
the marking module is used for preprocessing the infrared target image, annotating target instances in the infrared target image with an annotation tool, and producing a pixel-level binary mask;
the extraction module is used for extracting multi-level-resolution feature maps from the infrared target image through a pre-trained ResNet network and presetting a predetermined number of prior frames of different sizes for each pixel point in the multi-level-resolution feature maps;
the input module is used for inputting the prior frames as candidate frames into a region proposal network for foreground/background binary classification and bounding-box refinement, removing the candidate frames belonging to the background class, and performing an ROI Align operation on the obtained foreground-class candidate frames to obtain ROI (region of interest) regions;
and the processing module is used for respectively carrying out N-type classification, bounding box regression and binary mask generation on the ROI area to obtain a trained infrared target image segmentation model.
8. The system according to claim 7, wherein the collecting the infrared target images in a plurality of scenes according to the characteristic attributes of the infrared target to be detected comprises:
the infrared thermal imager is used for shooting infrared images of different targets in different complex scenes, and the focusing, zooming and exposure parameters of the infrared thermal imager are continuously changed to form an infrared target image data set.
9. An electronic device comprising a memory, a processor and a computer program stored in the memory and executable on the processor, characterized in that the processor implements the steps of the infrared target image segmentation method as claimed in any one of claims 1 to 6 when executing the computer program.
10. A computer-readable storage medium, in which a computer program is stored which, when being executed by a processor, carries out the steps of the infrared target image segmentation method as set forth in any one of claims 1 to 6.
CN201911195519.9A 2019-11-28 2019-11-28 Infrared target image segmentation method, system, electronic equipment and storage medium Active CN111046880B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201911195519.9A CN111046880B (en) 2019-11-28 2019-11-28 Infrared target image segmentation method, system, electronic equipment and storage medium


Publications (2)

Publication Number Publication Date
CN111046880A true CN111046880A (en) 2020-04-21
CN111046880B CN111046880B (en) 2023-12-26

Family

ID=70234017

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201911195519.9A Active CN111046880B (en) 2019-11-28 2019-11-28 Infrared target image segmentation method, system, electronic equipment and storage medium

Country Status (1)

Country Link
CN (1) CN111046880B (en)

Cited By (15)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111597920A (en) * 2020-04-27 2020-08-28 东南大学 Full convolution single-stage human body example segmentation method in natural scene
CN111598951A (en) * 2020-05-18 2020-08-28 清华大学 Method, device and storage medium for identifying space target
CN111627029A (en) * 2020-05-28 2020-09-04 北京字节跳动网络技术有限公司 Method and device for acquiring image instance segmentation result
CN111627033A (en) * 2020-05-30 2020-09-04 郑州大学 Hard sample instance segmentation method and device and computer readable storage medium
CN111652930A (en) * 2020-06-04 2020-09-11 上海媒智科技有限公司 Image target detection method, system and equipment
CN112150471A (en) * 2020-09-23 2020-12-29 创新奇智(上海)科技有限公司 Semantic segmentation method and device based on few samples, electronic equipment and storage medium
CN112200115A (en) * 2020-10-21 2021-01-08 平安国际智慧城市科技股份有限公司 Face recognition training method, recognition method, device, equipment and storage medium
CN112307991A (en) * 2020-11-04 2021-02-02 北京临近空间飞行器系统工程研究所 Image recognition method, device and storage medium
CN112614136A (en) * 2020-12-31 2021-04-06 华中光电技术研究所(中国船舶重工集团公司第七一七研究所) Infrared small target real-time instance segmentation method and device
CN112907616A (en) * 2021-04-27 2021-06-04 浙江大学 Pedestrian detection method based on thermal imaging background filtering
CN113177947A (en) * 2021-04-06 2021-07-27 广东省科学院智能制造研究所 Complex environment target segmentation method and device based on multi-module convolutional neural network
CN114034390A (en) * 2021-11-08 2022-02-11 山东大学 Equipment temperature anomaly detection system based on infrared detection
CN114782460A (en) * 2022-06-21 2022-07-22 阿里巴巴达摩院(杭州)科技有限公司 Image segmentation model generation method, image segmentation method and computer equipment
CN117132777A (en) * 2023-10-26 2023-11-28 腾讯科技(深圳)有限公司 Image segmentation method, device, electronic equipment and storage medium
CN112200115B (en) * 2020-10-21 2024-04-19 平安国际智慧城市科技股份有限公司 Face recognition training method, recognition method, device, equipment and storage medium

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20170124415A1 (en) * 2015-11-04 2017-05-04 Nec Laboratories America, Inc. Subcategory-aware convolutional neural networks for object detection
CN107480730A (en) * 2017-09-05 2017-12-15 广州供电局有限公司 Power equipment identification model construction method and system, the recognition methods of power equipment
CN108629354A (en) * 2017-03-17 2018-10-09 杭州海康威视数字技术股份有限公司 Object detection method and device
CN109584248A (en) * 2018-11-20 2019-04-05 西安电子科技大学 Infrared surface object instance dividing method based on Fusion Features and dense connection network
CN109711295A (en) * 2018-12-14 2019-05-03 北京航空航天大学 A kind of remote sensing image offshore Ship Detection





Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant