CN111046880B - Infrared target image segmentation method, system, electronic equipment and storage medium - Google Patents
- Publication number
- CN111046880B (application CN201911195519.9A)
- Authority
- CN
- China
- Prior art keywords
- infrared target
- infrared
- target image
- candidate frame
- frame
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/20—Image preprocessing
- G06V10/26—Segmentation of patterns in the image field; Cutting or merging of image elements to establish the pattern region, e.g. clustering-based techniques; Detection of occlusion
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/24—Classification techniques
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T7/00—Image analysis
- G06T7/10—Segmentation; Edge detection
- G06T7/11—Region-based segmentation
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T7/00—Image analysis
- G06T7/10—Segmentation; Edge detection
- G06T7/194—Segmentation; Edge detection involving foreground-background segmentation
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/20—Image preprocessing
- G06V10/25—Determination of region of interest [ROI] or a volume of interest [VOI]
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/20—Image preprocessing
- G06V10/28—Quantising the image, e.g. histogram thresholding for discrimination between background and foreground patterns
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2207/00—Indexing scheme for image analysis or image enhancement
- G06T2207/10—Image acquisition modality
- G06T2207/10048—Infrared image
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2207/00—Indexing scheme for image analysis or image enhancement
- G06T2207/20—Special algorithmic details
- G06T2207/20081—Training; Learning
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2207/00—Indexing scheme for image analysis or image enhancement
- G06T2207/20—Special algorithmic details
- G06T2207/20084—Artificial neural networks [ANN]
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V2201/00—Indexing scheme relating to image or video recognition or understanding
- G06V2201/07—Target detection
Abstract
The invention provides an infrared target image segmentation method, system, electronic device and storage medium. The method comprises: collecting infrared target images in various scenes as a training data set; preprocessing the images, labeling target instances and producing pixel-level binary masks; extracting multi-level-resolution feature maps of the infrared target images; presetting candidate frames of different sizes at each feature-map pixel; feeding the candidate frames into a region proposal network for foreground/background binary classification and bounding-box refinement; filtering out background candidate frames; performing an ROI Align operation on the foreground candidate frames to obtain ROI regions; and performing N-class classification, bounding-box regression and binary-mask generation on the ROI regions to finally obtain a trained infrared target image segmentation model. The method addresses the difficulty existing image processing methods have in guaranteeing real-time performance in complex scenes: it is applicable to target detection in infrared images under various complex scenes while achieving real-time infrared image processing and reducing the amount of computation.
Description
Technical Field
The present invention relates to the field of image processing, and in particular, to a method, a system, an electronic device, and a storage medium for segmenting an infrared target image.
Background
In the field of computer vision, target detection is a classical research direction widely applied to traffic monitoring, image retrieval, human-computer interaction and the like. Infrared target detection, an important branch of computer image processing, remains applicable even when the color and shape of a target resemble the surrounding environment, and can be used in security monitoring, military reconnaissance, night driving, shipping and other fields. An infrared image reflects the relative temperature of objects and is little affected by weather factors; compared with visible-light cameras, night-vision devices and similar equipment, infrared imaging offers longer detection range and higher detection reliability, but suffers from lower resolution and blurred details.
At present, among common visible-light target segmentation methods, threshold-based segmentation is easy to engineer and computationally cheap, but it struggles with cluttered backgrounds and heavy interference and is prone to false and missed detections. More complex methods, such as convolutional neural networks, achieve better detection results but carry a large computational cost and have difficulty meeting real-time processing requirements.
Therefore, it is necessary to propose an infrared target image segmentation method that can process complex scenes and realize real-time processing.
Disclosure of Invention
In view of the above, the embodiment of the invention provides an infrared target image segmentation method, so as to solve the problem that the existing image segmentation method is difficult to adapt to complex application scenes and ensure real-time processing of images.
In a first aspect of an embodiment of the present invention, there is provided an infrared target image segmentation method, including:
according to the characteristic attribute of the infrared target to be detected, acquiring infrared target images in various scenes;
preprocessing the infrared target image, labeling target instances in the infrared target image with a labeling tool, and producing a pixel-level binary mask;
extracting a multi-resolution feature map from the infrared target image through a pre-trained ResNet network, and presetting a predetermined number of prior frames of different sizes for each pixel point in the multi-resolution feature map;
inputting the prior frames as candidate frames into a region proposal network for foreground/background binary classification and bounding-box refinement, removing candidate frames belonging to the background category, and performing an ROI Align operation on the remaining foreground candidate frames to obtain ROI regions;
and respectively carrying out N-category classification, bounding box regression and binary mask generation on the ROI region to obtain a trained infrared target image segmentation model.
In a second aspect of an embodiment of the present invention, there is provided an infrared target image segmentation system including:
the acquisition module is used for acquiring infrared target images under various scenes according to the characteristic attribute of the infrared target to be detected;
the labeling module is used for preprocessing the infrared target image, labeling target instances in the infrared target image with a labeling tool, and producing a pixel-level binary mask;
the extraction module is used for extracting a multi-resolution size characteristic diagram in the infrared target image through a pre-trained ResNet network, and presetting a preset number of prior frames with different sizes for each pixel point in the multi-resolution size characteristic diagram;
the input module is used for inputting the prior frames as candidate frames into a region proposal network for foreground/background binary classification and bounding-box refinement, removing candidate frames belonging to the background category, and performing an ROI Align operation on the remaining foreground candidate frames to obtain ROI regions;
and the processing module is used for respectively carrying out N-category classification, bounding box regression and binary mask generation on the ROI area to obtain a trained infrared target image segmentation model.
In a third aspect of the embodiments of the present invention, there is provided an electronic device comprising a memory, a processor and a computer program stored in the memory and executable on the processor, the processor implementing the steps of the method according to the first aspect of the embodiments of the present invention when executing the computer program.
In a fourth aspect of the embodiments of the present invention, there is provided a computer readable storage medium storing a computer program which, when executed by a processor, implements the steps of the method provided by the first aspect of the embodiments of the present invention.
According to the method, infrared target images in various scenes are collected as a training data set according to the characteristic attributes of the infrared target to be detected; the images are preprocessed, target instances are labeled with a labeling tool and pixel-level binary masks are produced; multi-level-resolution feature maps are extracted through a pre-trained ResNet network, and a certain number of prior frames of different sizes are preset at each feature-map pixel; the prior frames are fed as candidate frames into a region proposal network for foreground/background binary classification and bounding-box refinement; background candidate frames are filtered out, and an ROI Align operation on the foreground candidate frames yields ROI regions; N-class classification, bounding-box regression and binary-mask generation are then performed on the ROI regions, and a trained infrared target image segmentation model is finally obtained for target detection and segmentation in infrared images. The method adapts to target detection in infrared images under various complex scenes with strong anti-interference capability, while the detection and recognition remain computationally simple enough to guarantee real-time infrared image processing, thereby solving the real-time problem of existing methods in complex scenes and improving detection performance.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present invention, the following description will briefly explain the drawings used in the embodiments or the description of the prior art, and it is obvious that the drawings described below are only some embodiments of the present invention, and other drawings may be obtained according to these drawings without inventive effort for a person skilled in the art.
FIG. 1 is a flow chart of an infrared target image segmentation method according to an embodiment of the present invention;
fig. 2 is a schematic structural diagram of an infrared target image segmentation system according to an embodiment of the present invention.
Detailed Description
In order to make the objects, features and advantages of the present invention more comprehensible, the technical solutions in the embodiments of the present invention are described in detail below with reference to the accompanying drawings. It is apparent that the embodiments described below are only some, not all, embodiments of the present invention. All other embodiments obtained by those skilled in the art without inventive effort on the basis of these embodiments fall within the scope of the present invention; the examples are given for the purpose of illustrating the invention only and not for limiting its scope.
The term "comprising" in the description or claims of the invention and in the above-mentioned figures, and other expressions of similar meaning, denote a non-exclusive inclusion: a process, method, system or apparatus comprising a series of steps or elements is not limited to the steps or elements listed.
Referring to fig. 1, a flowchart of an infrared target image segmentation method according to an embodiment of the present invention includes:
s101, acquiring infrared target images under various scenes according to characteristic attributes of infrared targets to be detected;
the infrared target to be detected is a target object detected through an infrared signal radiated by the detected target, and can be a person, a vehicle, an animal and the like generally. The characteristic properties of the infrared target may be the size of the target, the radiation intensity, etc. The multiple scenes refer to multiple complex scenes, such as region scenes with dense people flow and traffic flow, at least three infrared target images in the complex scenes can be acquired, and the acquired scenes can be increased in order to improve the accuracy of infrared target detection.
Optionally, a thermal imaging camera shoots infrared images of different targets in different complex scenes, with its focus, zoom and exposure parameters continuously varied, to form an infrared target image data set.
The shooting scenes, targets, focal-length parameters, exposure parameters and so on of the infrared images in the data set may all differ, which ensures the diversity of the data set.
S102, preprocessing the infrared target image, labeling target instances in the infrared target image with a labeling tool, and producing a pixel-level binary mask;
the preprocessing process can specifically comprise methods of image rotation translation, random pruning, color dithering, translation transformation, scale transformation, contrast transformation, noise disturbance and the like, and data enhancement is carried out on the infrared target image. And labeling the target instance in the infrared target image by using a LabelImg labeling tool.
The binary mask is a binary image consisting of 0s and 1s; by occluding part (or none) of the image, it controls the region over which image processing operates. An image mask may be defined by specifying data values, data ranges, finite or infinite values, regions of interest, annotation files, etc.
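As a minimal illustration of such a pixel-level binary mask, assume the annotation tool exports a per-pixel instance label map; the `label_map` array and `instance_binary_mask` helper below are hypothetical names introduced for this sketch, not from the patent:

```python
import numpy as np

def instance_binary_mask(label_map: np.ndarray, instance_id: int) -> np.ndarray:
    """Return a 0/1 mask selecting the pixels of one labeled instance."""
    return (label_map == instance_id).astype(np.uint8)

# toy 3x3 label map: 0 = background, 1 and 2 = two target instances
label_map = np.array([[0, 1, 1],
                      [0, 1, 2],
                      [2, 2, 2]])
mask = instance_binary_mask(label_map, 1)
```

Multiplying the image by such a mask occludes everything outside the chosen instance, which is how the mask controls which region subsequent processing sees.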
Specifically, the infrared target image is data-enhanced by image rotation and translation, random cropping, color jittering, translation transformation, scale transformation, contrast transformation and noise disturbance; a class-balancing strategy is adopted to augment the data where classes are imbalanced; and the infrared target images are randomly shuffled.
In classification learning algorithms, a large disparity between the sample proportions of different classes easily degrades classification accuracy; data augmentation diversifies the data set and improves the generalization ability of the model. During data enhancement the same target picture may appear repeatedly in sequence, so the model keeps learning the features of the same target during training and tends to overfit. In this embodiment, the data samples are shuffled into random order, which improves model performance during training.
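The enhancement, class-balancing and shuffling steps above can be sketched as follows; the operation menu and the oversample-to-the-majority-count strategy are illustrative assumptions, not the patent's exact procedure:

```python
import random
from collections import Counter
import numpy as np

def augment(img, rng):
    """Apply one randomly chosen enhancement from the menu in the text
    (rotation, translation, contrast change, noise disturbance)."""
    op = rng.choice(["rotate", "translate", "contrast", "noise"])
    if op == "rotate":
        return np.rot90(img, k=rng.randint(1, 3))
    if op == "translate":
        return np.roll(img, shift=rng.randint(1, 3), axis=1)
    if op == "contrast":
        return np.clip(img.astype(float) * 1.2, 0, 255).astype(img.dtype)
    return np.clip(img.astype(float) + rng.gauss(0, 2), 0, 255).astype(img.dtype)

def build_training_set(images, labels, seed=0):
    """Oversample minority classes with augmented copies until every
    class matches the majority count, then randomly shuffle the set."""
    rng = random.Random(seed)
    data = list(zip(images, labels))
    counts = Counter(labels)
    target = max(counts.values())
    for cls, n in counts.items():
        pool = [d for d in data if d[1] == cls]
        for i in range(target - n):
            img, lab = pool[i % n]
            data.append((augment(img, rng), lab))
    rng.shuffle(data)
    return data
```

The final shuffle is what breaks up runs of the same target picture, addressing the overfitting concern described above.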
S103, extracting a multi-resolution feature map from the infrared target image through a pre-trained ResNet network, and presetting a predetermined number of prior frames of different sizes for each pixel point in the multi-resolution feature map;
the ResNet network model is a residual learning network for feature extraction, the ResNet network is pre-trained through an ImageNet classification data set, and a feature map of an infrared target image is extracted by using the trained ResNet network.
Optionally, based on a convolutional neural network structure, a two-branch head is built that outputs a binary classification (foreground vs. background) and a bounding-box refinement regression. Specifically, a 3×3 convolution kernel first generates 256- or 512-dimensional features on the multi-level-resolution feature maps, and two fully connected branches are then generated from these features: the regression layer predicts the center-point coordinates, width and height of each candidate frame, and the classification layer judges whether the candidate frame belongs to the foreground or the background;
the loss function of the RPN (Region Proposal Network, i.e., the region growing network) network is composed of both softmax loss and regression loss with a certain weight. The Softmax loss is calculated by a background calibration result and a prediction result corresponding to the candidate frame, and the regression loss is calculated as follows;
t_x = (x − x_a)/w_a,  t_y = (y − y_a)/h_a
t_w = log(w/w_a),  t_h = log(h/h_a)
The loss function of the RPN network is calculated as follows:
L({p_i}, {t_i}) = (1/N_cls) Σ_i L_cls(p_i, p_i*) + λ (1/N_reg) Σ_i p_i* L_reg(t_i, t_i*)
wherein p_i is the predicted foreground probability of the i-th candidate frame and p_i* its calibration label; x, y are the center-point coordinates of the candidate frame predicted by the RPN network; w and h are the width and height of the candidate frame; the position information of the prior frame is (x_a, y_a, w_a, h_a); and the position information of the ground-truth annotation frame is (x*, y*, w*, h*).
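The regression targets t_x, t_y, t_w, t_h defined above can be computed directly from a box and its prior frame; `encode_box` is a hypothetical helper name for this sketch:

```python
import math

def encode_box(pred, prior):
    """Compute (t_x, t_y, t_w, t_h) for a box (x, y, w, h) relative to
    a prior frame (x_a, y_a, w_a, h_a), per the formulas above."""
    x, y, w, h = pred
    xa, ya, wa, ha = prior
    return ((x - xa) / wa, (y - ya) / ha,
            math.log(w / wa), math.log(h / ha))

# a predicted box against a 16x16 prior frame centered at (10, 10)
t = encode_box((12.0, 8.0, 32.0, 16.0), (10.0, 10.0, 16.0, 16.0))
```

Encoding offsets relative to the prior frame's size makes the regression targets scale-invariant, which is why the log ratio is used for width and height.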
S104, inputting the prior frames as candidate frames into the region proposal network for foreground/background binary classification and bounding-box refinement, removing candidate frames belonging to the background category, and performing an ROI Align operation on the remaining foreground candidate frames to obtain ROI regions;
the region nomination network (or region growing network, or RPN) may classify corresponding targets or non-targets in the candidate boxes and modify the frames of the candidate boxes. Wherein the classified candidate boxes can be classified into a foreground candidate box and a background candidate box. The ROI region is a region of interest in image processing.
The ROI Align operation determines the feature value at each sampling point of an ROI region by bilinear interpolation on the original feature map, then applies max or average pooling. This improves accuracy by avoiding the misalignment introduced by direct quantized sampling in ordinary ROI pooling.
Specifically, the ROI area is divided into a predetermined number of bin areas, such as 7×7 bin areas, 4 sampling points are selected in each bin area, the pixel values of 4 feature points nearest to each sampling point are obtained, and the pixel value of each sampling point is determined through bilinear interpolation;
the average or maximum pooling of each bin region is calculated, generating a feature map of the ROI region, i.e. a 7 x 7 size feature map.
S105, respectively carrying out N-type classification, bounding box regression and binary mask generation on the ROI area to obtain a trained infrared target image segmentation model.
Candidate frames are repeatedly fed into the region proposal network of S104, background candidate frames are removed to obtain ROI regions, and the classification, bounding-box regression and binary-mask generation of S105 are applied to those regions. Training of the infrared target image segmentation model can thus be completed on the infrared target image data, after which the model can quickly and accurately detect and segment targets in infrared images to be tested.
Optionally, the classification branch extracts features with two fully connected layers and produces an N-dimensional output representing the N category scores;
the regression branch extracts features with two fully connected layers and produces an N×4-dimensional output representing the generated bounding-box coordinates;
the mask branch extracts convolutional features with five fully convolutional layers and produces N category masks, represented as N feature maps of size 28×28;
wherein the loss function is:
L = L_cls + L_box + L_mask
where L_cls is the classification branch loss, L_box the regression branch loss, and L_mask the mask branch loss.
The method provided by this embodiment addresses the computational complexity of existing infrared image detection methods in complex scenes: it guarantees real-time performance, adapts better to complex backgrounds, strengthens anti-interference capability in scenes such as ground backgrounds, and broadens the applicable range of the method.
It should be understood that the sequence number of each step in the above embodiment does not mean the sequence of execution, and the execution sequence of each process should be determined by its function and internal logic, and should not be construed as limiting the implementation process of the embodiment of the present invention.
Fig. 2 is a schematic structural diagram of an infrared target image segmentation system according to an embodiment of the present invention, where the system includes:
the acquisition module 210 is configured to acquire infrared target images under multiple scenes according to the characteristic attribute of the infrared target to be detected;
optionally, the collecting the infrared target images under multiple scenes according to the characteristic attribute of the infrared target to be detected includes:
and shooting infrared images of different targets in different complex scenes through the infrared thermal imager, and continuously changing focusing, zooming and exposure parameters of the infrared thermal imager to form an infrared target image dataset.
The labeling module 220 is configured to preprocess the infrared target image, label target instances in the infrared target image with a labeling tool, and produce a pixel-level binary mask;
optionally, the preprocessing the infrared target image includes:
the infrared target image is subjected to data enhancement by the methods of image rotation translation, random pruning, color dithering, translation transformation, scale transformation, contrast transformation and noise disturbance;
based on data class unbalance, adopting class balancing strategies to amplify data;
and randomly sequencing the infrared target images.
The extracting module 230 is configured to extract a multi-resolution size feature map in the infrared target image through a pre-trained res net network, and preset a predetermined number of prior frames with different sizes for each pixel point in the multi-resolution size feature map;
Optionally, extracting a multi-resolution feature map from the infrared target image through the pre-trained ResNet network, with a predetermined number of prior frames of different sizes preset for each pixel point in the feature map, includes:
first, generating 256- or 512-dimensional features on the multi-level-resolution feature maps with a 3×3 convolution kernel, then generating two fully connected branches from these features;
the regression layer predicts the center-point coordinates, width and height of each candidate frame, and the classification layer judges whether the candidate frame belongs to the foreground or the background;
The loss function of the RPN network is a weighted combination of a softmax classification loss and a regression loss. The softmax loss is computed from the foreground/background calibration result and the prediction for each candidate frame; the regression targets are computed as follows:
t_x = (x − x_a)/w_a,  t_y = (y − y_a)/h_a
t_w = log(w/w_a),  t_h = log(h/h_a)
The loss function of the RPN network is calculated as follows:
L({p_i}, {t_i}) = (1/N_cls) Σ_i L_cls(p_i, p_i*) + λ (1/N_reg) Σ_i p_i* L_reg(t_i, t_i*)
wherein p_i is the predicted foreground probability of the i-th candidate frame and p_i* its calibration label; x, y are the center-point coordinates of the candidate frame predicted by the RPN network; w and h are the width and height of the candidate frame; the position information of the prior frame is (x_a, y_a, w_a, h_a); and the position information of the ground-truth annotation frame is (x*, y*, w*, h*).
The input module 240 is configured to input the prior frames as candidate frames into the region proposal network for foreground/background binary classification and bounding-box refinement, remove candidate frames belonging to the background category, and perform an ROI Align operation on the remaining foreground candidate frames to obtain ROI regions;
Optionally, inputting the prior frames as candidate frames into the region proposal network for foreground/background binary classification and bounding-box refinement, removing candidate frames belonging to the background category, and performing an ROI Align operation on the remaining foreground candidate frames to obtain ROI regions includes:
dividing the ROI area into a predetermined number of bin areas, selecting 4 sampling points in each bin area, acquiring pixel values of 4 characteristic points nearest to each sampling point, and determining the pixel value of each sampling point through bilinear interpolation;
and calculating the average or maximum pooling of each bin region, and generating a feature map of the ROI region.
And the processing module 250 is used for respectively carrying out N-category classification, bounding box regression and binary mask generation on the ROI area to obtain a trained infrared target image segmentation model.
Optionally, performing N-category classification, bounding-box regression and binary-mask generation on the ROI regions respectively to obtain a trained infrared target image segmentation model includes:
the classification branch extracts features with two fully connected layers and produces an N-dimensional output representing the N category scores;
the regression branch extracts features with two fully connected layers and produces an N×4-dimensional output representing the generated bounding-box coordinates;
the mask branch extracts convolutional features with five fully convolutional layers and produces N category masks, represented as N feature maps of size 28×28;
wherein, the loss function is:
L=L cls +L box +L mask
L cls to classify branch loss, L box For regression branch loss, L mask For mask branch loss.
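As a rough illustration of the three parallel head branches and the summed loss L = L_cls + L_box + L_mask, the NumPy sketch below wires per-ROI head outputs of the stated shapes (N class scores, N × 4 box coordinates, N masks of 28 × 28) into typical loss terms. The specific loss choices (softmax cross-entropy, smooth-L1 on the ground-truth class's box, per-pixel sigmoid cross-entropy on the ground-truth class's mask) are Mask R-CNN-style assumptions, not taken from the patent text.

```python
import numpy as np

N = 3                                         # number of categories (illustrative)
rng = np.random.default_rng(0)

# Per-ROI head outputs with the shapes described above (random stand-ins):
cls_logits  = rng.normal(size=(N,))           # N category scores
box_deltas  = rng.normal(size=(N, 4))         # N x 4 bounding-box coordinates
mask_logits = rng.normal(size=(N, 28, 28))    # N masks of 28 x 28

def softmax_ce(logits, label):
    """Softmax cross-entropy for a single ROI (classification branch)."""
    z = logits - logits.max()
    log_p = z - np.log(np.exp(z).sum())
    return -log_p[label]

def smooth_l1(pred, target):
    """Smooth-L1 over the 4 box coordinates (regression branch)."""
    d = np.abs(pred - target)
    return np.where(d < 1, 0.5 * d ** 2, d - 0.5).sum()

def sigmoid_bce(logits, target):
    """Mean per-pixel sigmoid cross-entropy (mask branch)."""
    p = 1.0 / (1.0 + np.exp(-logits))
    eps = 1e-7
    return -(target * np.log(p + eps) + (1 - target) * np.log(1 - p + eps)).mean()

label   = 1                                   # ground-truth category of this ROI
gt_box  = np.zeros(4)
gt_mask = (rng.random((28, 28)) > 0.5).astype(float)

L_cls  = softmax_ce(cls_logits, label)
L_box  = smooth_l1(box_deltas[label], gt_box)      # only the GT class's box
L_mask = sigmoid_bce(mask_logits[label], gt_mask)  # only the GT class's mask
L = L_cls + L_box + L_mask
```

Evaluating the box and mask losses only at the ground-truth class keeps the branches decoupled, which is the usual rationale for the binary (per-class) mask design.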
In one embodiment of the present invention, an electronic device for infrared target image segmentation is provided, including a memory, a processor, and a computer program stored in the memory and executable by the processor; when executing the computer program, the processor implements steps S101 to S105 of the embodiments of the present invention.
One embodiment of the present invention also provides a non-transitory computer readable storage medium storing a computer program which, when executed by a processor, performs the infrared target image segmentation method provided by the foregoing embodiments. The non-transitory computer readable storage medium includes, for example, ROM/RAM, magnetic disks, optical disks, and the like.
Each of the foregoing embodiments is described with its own emphasis; for parts not described or detailed in a particular embodiment, reference may be made to the related descriptions of the other embodiments.
The above embodiments are intended only to illustrate the technical solution of the present invention, not to limit it. Although the invention has been described in detail with reference to the foregoing embodiments, those of ordinary skill in the art will understand that the technical solutions described in the foregoing embodiments may still be modified, or some of their technical features replaced by equivalents; such modifications and substitutions do not depart from the spirit and scope of the technical solutions of the embodiments of the present invention.
Claims (9)
1. An infrared target image segmentation method, comprising:
according to the characteristic attribute of the infrared target to be detected, acquiring infrared target images in various scenes;
preprocessing the infrared target image, annotating the target instances in the infrared target image with an annotation tool, and producing pixel-level binary masks;
extracting multi-resolution feature maps from the infrared target image through a pre-trained ResNet network, and presetting a predetermined number of prior frames of different sizes for each pixel point in the multi-resolution feature maps;
generating fully connected features of 256 or 512 dimensions on the multi-resolution feature maps with a 3×3 convolution kernel, and feeding the generated 256- or 512-dimensional features into 2 fully connected layer branches;
the regression layer predicts the center-point coordinates and the width and height of the candidate frame, and the classification layer judges whether the candidate frame belongs to the foreground or the background;
the loss function of the RPN consists of a softmax loss and a regression loss combined according to a certain weight; the softmax loss is computed from the foreground/background calibration results and the prediction results of the candidate frames, and the regression loss is computed from:
t_x = (x - x_a)/w_a, t_y = (y - y_a)/h_a
t_w = log(w/w_a), t_h = log(h/h_a)
the loss function of the RPN is calculated as follows:
L({p_i}, {t_i}) = (1/N_cls) Σ_i L_cls(p_i, p_i*) + λ·(1/N_reg) Σ_i p_i*·L_reg(t_i, t_i*)
wherein x and y are the center-point coordinates of the candidate frame predicted by the RPN, w and h are the width and height of the candidate frame, the position information of the prior frame is (x_a, y_a, w_a, h_a), the position information of the ground-truth annotation frame is (x*, y*, w*, h*), p_i is the predicted foreground probability of the i-th candidate frame, p_i* is its calibration label, and λ is the balancing weight;
inputting the prior frames as candidate frames into a region proposal network for binary foreground/background classification and bounding-box modification, removing the candidate frames belonging to the background category, and performing an ROI Align operation on the obtained foreground candidate frames to obtain ROI regions;
and performing N-category classification, bounding-box regression and binary mask generation on the ROI regions, respectively, to obtain a trained infrared target image segmentation model.
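The coordinate parameterization defined in claim 1 above (t_x = (x − x_a)/w_a, t_y = (y − y_a)/h_a, t_w = log(w/w_a), t_h = log(h/h_a)) can be checked with a small sketch; `encode`/`decode` are hypothetical helper names, and `decode` is simply the inverse mapping implied by those formulas, not a function named in the patent.

```python
import numpy as np

def encode(box, anchor):
    """Regression targets of claim 1: box and anchor are (x, y, w, h),
    i.e. center coordinates plus width and height."""
    x, y, w, h = box
    xa, ya, wa, ha = anchor
    return np.array([(x - xa) / wa, (y - ya) / ha,
                     np.log(w / wa), np.log(h / ha)])

def decode(t, anchor):
    """Inverse mapping: recover a box from targets t and its anchor."""
    xa, ya, wa, ha = anchor
    tx, ty, tw, th = t
    return np.array([tx * wa + xa, ty * ha + ya,
                     wa * np.exp(tw), ha * np.exp(th)])
```

Normalizing offsets by the prior frame's width and height, and taking logs of the size ratios, keeps the targets scale-invariant across prior frames of different sizes.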
2. The method of claim 1, wherein acquiring infrared target images in various scenes according to the characteristic attributes of the infrared target to be detected comprises:
capturing infrared images of different targets in different complex scenes with an infrared thermal imager, and continuously varying the focus, zoom and exposure parameters of the infrared thermal imager to form an infrared target image dataset.
3. The method of claim 1, wherein the preprocessing the infrared target image comprises:
performing data enhancement on the infrared target image by image rotation, random cropping, color jittering, translation transformation, scale transformation, contrast transformation and noise perturbation;
amplifying the data with a class-balancing strategy to address data class imbalance;
and randomly shuffling the infrared target images.
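A minimal sketch of the data-enhancement step of claim 3 for a normalized single-channel infrared image follows. The specific offset, gain, and noise ranges are illustrative assumptions, and rotation, cropping, and color jittering are omitted for brevity.

```python
import numpy as np

def augment(img, rng):
    """One random enhancement pass over an infrared image of shape (H, W)
    with float values in [0, 1]; parameter ranges are hypothetical."""
    # translation transformation: shift by a random pixel offset
    dy, dx = rng.integers(-4, 5, size=2)
    img = np.roll(img, (dy, dx), axis=(0, 1))
    # contrast transformation: random gain around 1.0
    img = np.clip(img * rng.uniform(0.8, 1.2), 0.0, 1.0)
    # noise perturbation: additive Gaussian noise
    img = np.clip(img + rng.normal(0.0, 0.01, img.shape), 0.0, 1.0)
    return img
```

For the class-balancing step, a simple strategy consistent with the claim would be to apply such passes more often to images of under-represented categories until the class counts are roughly even.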
4. The method according to claim 1, wherein inputting the prior frames as candidate frames into a region proposal network for binary foreground/background classification and bounding-box modification, removing the candidate frames belonging to the background category, and performing an ROI Align operation on the obtained foreground candidate frames to obtain ROI regions comprises:
dividing the ROI region into a predetermined number of bin regions, selecting 4 sampling points in each bin region, obtaining the pixel values of the 4 feature points nearest to each sampling point, and determining the pixel value of each sampling point by bilinear interpolation;
and computing the average or maximum pooling over each bin region to generate the feature map of the ROI region.
5. The method of claim 1, wherein performing N-category classification, bounding-box regression and binary mask generation on the ROI regions to obtain a trained infrared target image segmentation model comprises:
the classification branch extracts fully connected features through two fully connected layers and generates an N-dimensional feature representing the N category scores;
the regression branch extracts fully connected features through two fully connected layers and generates an N × 4-dimensional feature representing the generated bounding-box coordinates;
the mask branch extracts convolutional features through five fully convolutional layers and generates N category masks, represented by N feature maps of 28 × 28 dimensions;
wherein the loss function is:
L = L_cls + L_box + L_mask
where L_cls is the classification branch loss, L_box the regression branch loss, and L_mask the mask branch loss.
6. An infrared target image segmentation system, comprising:
the acquisition module is configured to acquire infrared target images in various scenes according to the characteristic attributes of the infrared target to be detected;
the annotation module is configured to preprocess the infrared target image, annotate the target instances in the infrared target image with an annotation tool, and produce pixel-level binary masks;
the extraction module is configured to extract multi-resolution feature maps from the infrared target image through a pre-trained ResNet network, and preset a predetermined number of prior frames of different sizes for each pixel point in the multi-resolution feature maps;
generating fully connected features of 256 or 512 dimensions on the multi-resolution feature maps with a 3×3 convolution kernel, and feeding the generated 256- or 512-dimensional features into 2 fully connected layer branches;
the regression layer predicts the center-point coordinates and the width and height of the candidate frame, and the classification layer judges whether the candidate frame belongs to the foreground or the background;
the loss function of the RPN consists of a softmax loss and a regression loss combined according to a certain weight; the softmax loss is computed from the foreground/background calibration results and the prediction results of the candidate frames, and the regression loss is computed from:
t_x = (x - x_a)/w_a, t_y = (y - y_a)/h_a
t_w = log(w/w_a), t_h = log(h/h_a)
the loss function of the RPN is calculated as follows:
L({p_i}, {t_i}) = (1/N_cls) Σ_i L_cls(p_i, p_i*) + λ·(1/N_reg) Σ_i p_i*·L_reg(t_i, t_i*)
wherein x and y are the center-point coordinates of the candidate frame predicted by the RPN, w and h are the width and height of the candidate frame, the position information of the prior frame is (x_a, y_a, w_a, h_a), the position information of the ground-truth annotation frame is (x*, y*, w*, h*), p_i is the predicted foreground probability of the i-th candidate frame, p_i* is its calibration label, and λ is the balancing weight;
the input module is configured to input the prior frames as candidate frames into a region proposal network for binary foreground/background classification and bounding-box modification, remove the candidate frames belonging to the background category, and perform an ROI Align operation on the obtained foreground candidate frames to obtain ROI regions;
and the processing module is configured to perform N-category classification, bounding-box regression and binary mask generation on the ROI regions, respectively, to obtain a trained infrared target image segmentation model.
7. The system of claim 6, wherein acquiring infrared target images in various scenes according to the characteristic attributes of the infrared target to be detected comprises:
capturing infrared images of different targets in different complex scenes with an infrared thermal imager, and continuously varying the focus, zoom and exposure parameters of the infrared thermal imager to form an infrared target image dataset.
8. An electronic device comprising a memory, a processor and a computer program stored in the memory and executable on the processor, characterized in that the processor implements the steps of the infrared target image segmentation method according to any one of claims 1 to 5 when the computer program is executed.
9. A computer readable storage medium storing a computer program, characterized in that the computer program when executed by a processor implements the steps of the infrared target image segmentation method according to any one of claims 1 to 5.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201911195519.9A CN111046880B (en) | 2019-11-28 | 2019-11-28 | Infrared target image segmentation method, system, electronic equipment and storage medium |
Publications (2)
Publication Number | Publication Date |
---|---|
CN111046880A CN111046880A (en) | 2020-04-21 |
CN111046880B true CN111046880B (en) | 2023-12-26 |