CN111046880B - Infrared target image segmentation method, system, electronic equipment and storage medium - Google Patents
- Publication number
- CN111046880B (application CN201911195519.9A)
- Authority
- CN
- China
- Prior art keywords
- infrared target
- infrared
- target image
- candidate frame
- frame
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/20—Image preprocessing
- G06V10/26—Segmentation of patterns in the image field; Cutting or merging of image elements to establish the pattern region, e.g. clustering-based techniques; Detection of occlusion
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/24—Classification techniques
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T7/00—Image analysis
- G06T7/10—Segmentation; Edge detection
- G06T7/11—Region-based segmentation
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T7/00—Image analysis
- G06T7/10—Segmentation; Edge detection
- G06T7/194—Segmentation; Edge detection involving foreground-background segmentation
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/20—Image preprocessing
- G06V10/25—Determination of region of interest [ROI] or a volume of interest [VOI]
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/20—Image preprocessing
- G06V10/28—Quantising the image, e.g. histogram thresholding for discrimination between background and foreground patterns
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2207/00—Indexing scheme for image analysis or image enhancement
- G06T2207/10—Image acquisition modality
- G06T2207/10048—Infrared image
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2207/00—Indexing scheme for image analysis or image enhancement
- G06T2207/20—Special algorithmic details
- G06T2207/20081—Training; Learning
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2207/00—Indexing scheme for image analysis or image enhancement
- G06T2207/20—Special algorithmic details
- G06T2207/20084—Artificial neural networks [ANN]
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V2201/00—Indexing scheme relating to image or video recognition or understanding
- G06V2201/07—Target detection
Abstract
The invention provides an infrared target image segmentation method, system, electronic device and storage medium. The method comprises: collecting infrared target images in various scenes as a training data set; preprocessing the images, labeling target instances and producing pixel-level binary masks; extracting multi-level-resolution feature maps of the infrared target images; presetting candidate frames of different sizes at each feature-map pixel; feeding the candidate frames into a region proposal network for foreground/background binary classification and bounding-box refinement; filtering out background candidate frames; performing an ROI Align operation on the foreground candidate frames to obtain ROI regions; and performing N-class classification, bounding-box regression and binary-mask generation on the ROI regions to finally obtain a trained infrared target image segmentation model. The method addresses the difficulty existing image processing methods have in guaranteeing real-time performance in complex scenes: it is applicable to target detection in infrared images under various complex scenes while achieving real-time infrared image processing and reducing the amount of computation.
Description
Technical Field
The present invention relates to the field of image processing, and in particular, to a method, a system, an electronic device, and a storage medium for segmenting an infrared target image.
Background
In the field of computer vision, target detection is a classical research direction widely applied to traffic monitoring, image retrieval, human-computer interaction and the like. Infrared target detection, an important branch of computer image processing, remains applicable even when the color and shape of a target resemble the surrounding environment, and can be used in security monitoring, military reconnaissance, night driving, shipping and other fields. An infrared image reflects the relative temperature of objects and is little affected by weather factors; compared with visible-light cameras, night-vision devices and similar equipment, infrared imaging offers longer detection range and higher detection reliability, but suffers from lower resolution and blurred details.
At present, among common visible-light target segmentation methods, threshold-based segmentation is easy to engineer and computationally cheap, but it struggles with cluttered backgrounds and heavy interference and is prone to false and missed detections. More complex methods, such as convolutional neural networks, achieve better detection results but carry a large computational cost and have difficulty meeting real-time processing requirements.
Therefore, it is necessary to propose an infrared target image segmentation method that can process complex scenes and realize real-time processing.
Disclosure of Invention
In view of the above, the embodiment of the invention provides an infrared target image segmentation method, so as to solve the problem that the existing image segmentation method is difficult to adapt to complex application scenes and ensure real-time processing of images.
In a first aspect of an embodiment of the present invention, there is provided an infrared target image segmentation method, including:
according to the characteristic attribute of the infrared target to be detected, acquiring infrared target images in various scenes;
preprocessing the infrared target image, labeling target instances in the infrared target image with a labeling tool, and producing a pixel-level binary mask;
extracting a multi-resolution feature map from the infrared target image through a pre-trained ResNet network, and presetting a predetermined number of prior frames of different sizes for each pixel point in the multi-resolution feature map;
inputting the prior frames as candidate frames into a region proposal network for foreground/background binary classification and bounding-box refinement, removing candidate frames belonging to the background category, and performing an ROI Align operation on the remaining foreground candidate frames to obtain ROI regions;
and respectively carrying out N-category classification, bounding box regression and binary mask generation on the ROI region to obtain a trained infrared target image segmentation model.
In a second aspect of an embodiment of the present invention, there is provided an infrared target image segmentation system including:
the acquisition module is used for acquiring infrared target images under various scenes according to the characteristic attribute of the infrared target to be detected;
the labeling module is used for preprocessing the infrared target image, labeling target instances in the infrared target image with a labeling tool, and producing a pixel-level binary mask;
the extraction module is used for extracting a multi-resolution size characteristic diagram in the infrared target image through a pre-trained ResNet network, and presetting a preset number of prior frames with different sizes for each pixel point in the multi-resolution size characteristic diagram;
the input module is used for inputting the prior frames as candidate frames into a region proposal network for foreground/background binary classification and bounding-box refinement, removing candidate frames belonging to the background category, and performing an ROI Align operation on the remaining foreground candidate frames to obtain ROI regions;
and the processing module is used for respectively carrying out N-category classification, bounding box regression and binary mask generation on the ROI area to obtain a trained infrared target image segmentation model.
In a third aspect of the embodiments of the present invention, there is provided an electronic device comprising a memory, a processor and a computer program stored in the memory and executable on the processor, the processor implementing the steps of the method according to the first aspect of the embodiments of the present invention when executing the computer program.
In a fourth aspect of the embodiments of the present invention, there is provided a computer readable storage medium storing a computer program which, when executed by a processor, implements the steps of the method provided by the first aspect of the embodiments of the present invention.
According to the method, infrared target images in various scenes are collected as a training data set according to the characteristic attributes of the infrared target to be detected; the images are preprocessed, target instances are labeled with a labeling tool and pixel-level binary masks are produced; multi-level-resolution feature maps are extracted through a pre-trained ResNet network, and a certain number of prior frames of different sizes are preset at each feature-map pixel; the prior frames are fed as candidate frames into a region proposal network for foreground/background binary classification and bounding-box refinement; background candidate frames are filtered out, and an ROI Align operation on the foreground candidate frames yields ROI regions; N-class classification, bounding-box regression and binary-mask generation are then performed on the ROI regions, and a trained infrared target image segmentation model is finally obtained for target detection and segmentation in infrared images. The method adapts to target detection in infrared images under various complex scenes with strong anti-interference capability, while the detection and recognition remain computationally simple enough to guarantee real-time infrared image processing, thereby solving the real-time problem of existing methods in complex scenes and improving detection performance.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present invention, the following description will briefly explain the drawings used in the embodiments or the description of the prior art, and it is obvious that the drawings described below are only some embodiments of the present invention, and other drawings may be obtained according to these drawings without inventive effort for a person skilled in the art.
FIG. 1 is a flow chart of an infrared target image segmentation method according to an embodiment of the present invention;
fig. 2 is a schematic structural diagram of an infrared target image segmentation system according to an embodiment of the present invention.
Detailed Description
In order to make the objects, features and advantages of the present invention more comprehensible, the technical solutions in the embodiments of the present invention are described in detail below with reference to the accompanying drawings. It is apparent that the embodiments described below are only some, not all, embodiments of the present invention. All other embodiments obtained by those skilled in the art without inventive effort on the basis of these embodiments fall within the scope of the present invention; the examples are given for the purpose of illustrating the invention only and not for limiting its scope.
The term "comprising" in the description or claims of the invention and in the above-mentioned figures, and other expressions of similar meaning, denote a non-exclusive inclusion: a process, method, system or apparatus comprising a series of steps or elements is not limited to the steps or elements listed.
Referring to fig. 1, a flowchart of an infrared target image segmentation method according to an embodiment of the present invention includes:
s101, acquiring infrared target images under various scenes according to characteristic attributes of infrared targets to be detected;
the infrared target to be detected is a target object detected through an infrared signal radiated by the detected target, and can be a person, a vehicle, an animal and the like generally. The characteristic properties of the infrared target may be the size of the target, the radiation intensity, etc. The multiple scenes refer to multiple complex scenes, such as region scenes with dense people flow and traffic flow, at least three infrared target images in the complex scenes can be acquired, and the acquired scenes can be increased in order to improve the accuracy of infrared target detection.
Optionally, a thermal imaging camera shoots infrared images of different targets in different complex scenes, with its focus, zoom and exposure parameters continuously varied, to form an infrared target image data set.
The shooting scenes, targets, focal-length parameters, exposure parameters and so on of the infrared images in the data set may all differ, which ensures the diversity of the data set.
S102, preprocessing the infrared target image, labeling target instances in the infrared target image with a labeling tool, and producing a pixel-level binary mask;
the preprocessing process can specifically comprise methods of image rotation translation, random pruning, color dithering, translation transformation, scale transformation, contrast transformation, noise disturbance and the like, and data enhancement is carried out on the infrared target image. And labeling the target instance in the infrared target image by using a LabelImg labeling tool.
The binary mask is a binary image consisting of 0s and 1s; by occluding part (or none) of the image, it controls the region over which image processing operates. An image mask may be defined by specifying data values, data ranges, finite or infinite values, regions of interest, annotation files, etc.
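As a minimal illustration of such a pixel-level binary mask, assume the annotation tool exports a per-pixel instance label map; the `label_map` array and `instance_binary_mask` helper below are hypothetical names introduced for this sketch, not from the patent:

```python
import numpy as np

def instance_binary_mask(label_map: np.ndarray, instance_id: int) -> np.ndarray:
    """Return a 0/1 mask selecting the pixels of one labeled instance."""
    return (label_map == instance_id).astype(np.uint8)

# toy 3x3 label map: 0 = background, 1 and 2 = two target instances
label_map = np.array([[0, 1, 1],
                      [0, 1, 2],
                      [2, 2, 2]])
mask = instance_binary_mask(label_map, 1)
```

Multiplying the image by such a mask occludes everything outside the chosen instance, which is how the mask controls which region subsequent processing sees.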
Specifically, the infrared target image is data-enhanced by image rotation and translation, random cropping, color jittering, translation transformation, scale transformation, contrast transformation and noise disturbance; a class-balancing strategy is adopted to augment the data where classes are imbalanced; and the infrared target images are randomly shuffled.
In classification learning algorithms, a large disparity between the sample proportions of different classes easily degrades classification accuracy; data augmentation diversifies the data set and improves the generalization ability of the model. During data enhancement the same target picture may appear repeatedly in sequence, so the model keeps learning the features of the same target during training and tends to overfit. In this embodiment, the data samples are shuffled into random order, which improves model performance during training.
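The enhancement, class-balancing and shuffling steps above can be sketched as follows; the operation menu and the oversample-to-the-majority-count strategy are illustrative assumptions, not the patent's exact procedure:

```python
import random
from collections import Counter
import numpy as np

def augment(img, rng):
    """Apply one randomly chosen enhancement from the menu in the text
    (rotation, translation, contrast change, noise disturbance)."""
    op = rng.choice(["rotate", "translate", "contrast", "noise"])
    if op == "rotate":
        return np.rot90(img, k=rng.randint(1, 3))
    if op == "translate":
        return np.roll(img, shift=rng.randint(1, 3), axis=1)
    if op == "contrast":
        return np.clip(img.astype(float) * 1.2, 0, 255).astype(img.dtype)
    return np.clip(img.astype(float) + rng.gauss(0, 2), 0, 255).astype(img.dtype)

def build_training_set(images, labels, seed=0):
    """Oversample minority classes with augmented copies until every
    class matches the majority count, then randomly shuffle the set."""
    rng = random.Random(seed)
    data = list(zip(images, labels))
    counts = Counter(labels)
    target = max(counts.values())
    for cls, n in counts.items():
        pool = [d for d in data if d[1] == cls]
        for i in range(target - n):
            img, lab = pool[i % n]
            data.append((augment(img, rng), lab))
    rng.shuffle(data)
    return data
```

The final shuffle is what breaks up runs of the same target picture, addressing the overfitting concern described above.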
S103, extracting a multi-resolution feature map from the infrared target image through a pre-trained ResNet network, and presetting a predetermined number of prior frames of different sizes for each pixel point in the multi-resolution feature map;
the ResNet network model is a residual learning network for feature extraction, the ResNet network is pre-trained through an ImageNet classification data set, and a feature map of an infrared target image is extracted by using the trained ResNet network.
Optionally, based on a convolutional neural network structure, a two-branch head is built that outputs a binary classification (foreground vs. background) and a bounding-box refinement regression. Specifically, a 3×3 convolution kernel first generates 256- or 512-dimensional features on the multi-level-resolution feature maps, and two fully connected branches are then generated from these features: the regression layer predicts the center-point coordinates, width and height of each candidate frame, and the classification layer judges whether the candidate frame belongs to the foreground or the background;
the loss function of the RPN (Region Proposal Network, i.e., the region growing network) network is composed of both softmax loss and regression loss with a certain weight. The Softmax loss is calculated by a background calibration result and a prediction result corresponding to the candidate frame, and the regression loss is calculated as follows;
t_x = (x − x_a)/w_a,  t_y = (y − y_a)/h_a
t_w = log(w/w_a),  t_h = log(h/h_a)
The loss function of the RPN network is calculated as follows:
L({p_i}, {t_i}) = (1/N_cls) Σ_i L_cls(p_i, p_i*) + λ (1/N_reg) Σ_i p_i* L_reg(t_i, t_i*)
wherein p_i is the predicted foreground probability of the i-th candidate frame and p_i* its calibration label; x, y are the center-point coordinates of the candidate frame predicted by the RPN network; w and h are the width and height of the candidate frame; the position information of the prior frame is (x_a, y_a, w_a, h_a); and the position information of the ground-truth annotation frame is (x*, y*, w*, h*).
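The regression targets t_x, t_y, t_w, t_h defined above can be computed directly from a box and its prior frame; `encode_box` is a hypothetical helper name for this sketch:

```python
import math

def encode_box(pred, prior):
    """Compute (t_x, t_y, t_w, t_h) for a box (x, y, w, h) relative to
    a prior frame (x_a, y_a, w_a, h_a), per the formulas above."""
    x, y, w, h = pred
    xa, ya, wa, ha = prior
    return ((x - xa) / wa, (y - ya) / ha,
            math.log(w / wa), math.log(h / ha))

# a predicted box against a 16x16 prior frame centered at (10, 10)
t = encode_box((12.0, 8.0, 32.0, 16.0), (10.0, 10.0, 16.0, 16.0))
```

Encoding offsets relative to the prior frame's size makes the regression targets scale-invariant, which is why the log ratio is used for width and height.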
S104, inputting the prior frames as candidate frames into the region proposal network for foreground/background binary classification and bounding-box refinement, removing candidate frames belonging to the background category, and performing an ROI Align operation on the remaining foreground candidate frames to obtain ROI regions;
the region nomination network (or region growing network, or RPN) may classify corresponding targets or non-targets in the candidate boxes and modify the frames of the candidate boxes. Wherein the classified candidate boxes can be classified into a foreground candidate box and a background candidate box. The ROI region is a region of interest in image processing.
The ROI Align operation determines the feature value at each sampling point of an ROI region by bilinear interpolation on the original feature map, then applies max or average pooling. This improves accuracy by avoiding the misalignment introduced by direct quantized sampling in ordinary ROI pooling.
Specifically, the ROI area is divided into a predetermined number of bin areas, such as 7×7 bin areas, 4 sampling points are selected in each bin area, the pixel values of 4 feature points nearest to each sampling point are obtained, and the pixel value of each sampling point is determined through bilinear interpolation;
the average or maximum pooling of each bin region is calculated, generating a feature map of the ROI region, i.e. a 7 x 7 size feature map.
S105, respectively carrying out N-type classification, bounding box regression and binary mask generation on the ROI area to obtain a trained infrared target image segmentation model.
Candidate frames are repeatedly fed into the region proposal network of S104, background candidate frames are removed to obtain ROI regions, and the classification, bounding-box regression and binary-mask generation of S105 are applied to those regions. Training of the infrared target image segmentation model can thus be completed on the infrared target image data, after which the model can quickly and accurately detect and segment targets in infrared images to be tested.
Optionally, the classification branch extracts features with two fully connected layers and produces an N-dimensional output representing the N category scores;
the regression branch extracts features with two fully connected layers and produces an N×4-dimensional output representing the generated bounding-box coordinates;
the mask branch extracts convolutional features with five fully convolutional layers and produces N category masks, represented as N feature maps of size 28×28;
wherein the loss function is:
L = L_cls + L_box + L_mask
where L_cls is the classification branch loss, L_box the regression branch loss, and L_mask the mask branch loss.
The method provided by this embodiment addresses the computational complexity of existing infrared image detection methods in complex scenes: it guarantees real-time performance, adapts better to complex backgrounds, strengthens anti-interference capability in scenes such as ground backgrounds, and broadens the applicable range of the method.
It should be understood that the sequence number of each step in the above embodiment does not mean the sequence of execution, and the execution sequence of each process should be determined by its function and internal logic, and should not be construed as limiting the implementation process of the embodiment of the present invention.
Fig. 2 is a schematic structural diagram of an infrared target image segmentation system according to an embodiment of the present invention, where the system includes:
the acquisition module 210 is configured to acquire infrared target images under multiple scenes according to the characteristic attribute of the infrared target to be detected;
optionally, the collecting the infrared target images under multiple scenes according to the characteristic attribute of the infrared target to be detected includes:
and shooting infrared images of different targets in different complex scenes through the infrared thermal imager, and continuously changing focusing, zooming and exposure parameters of the infrared thermal imager to form an infrared target image dataset.
The labeling module 220 is configured to preprocess the infrared target image, label target instances in the infrared target image with a labeling tool, and produce a pixel-level binary mask;
optionally, the preprocessing the infrared target image includes:
the infrared target image is subjected to data enhancement by the methods of image rotation translation, random pruning, color dithering, translation transformation, scale transformation, contrast transformation and noise disturbance;
based on data class unbalance, adopting class balancing strategies to amplify data;
and randomly sequencing the infrared target images.
The extracting module 230 is configured to extract a multi-resolution size feature map in the infrared target image through a pre-trained res net network, and preset a predetermined number of prior frames with different sizes for each pixel point in the multi-resolution size feature map;
Optionally, extracting a multi-resolution feature map from the infrared target image through the pre-trained ResNet network, with a predetermined number of prior frames of different sizes preset for each pixel point in the feature map, includes:
first, generating 256- or 512-dimensional features on the multi-level-resolution feature maps with a 3×3 convolution kernel, then generating two fully connected branches from these features;
the regression layer predicts the center-point coordinates, width and height of each candidate frame, and the classification layer judges whether the candidate frame belongs to the foreground or the background;
The loss function of the RPN network is a weighted combination of a softmax classification loss and a regression loss. The softmax loss is computed from the foreground/background calibration result and the prediction for each candidate frame; the regression targets are computed as follows:
t_x = (x − x_a)/w_a,  t_y = (y − y_a)/h_a
t_w = log(w/w_a),  t_h = log(h/h_a)
The loss function of the RPN network is calculated as follows:
L({p_i}, {t_i}) = (1/N_cls) Σ_i L_cls(p_i, p_i*) + λ (1/N_reg) Σ_i p_i* L_reg(t_i, t_i*)
wherein p_i is the predicted foreground probability of the i-th candidate frame and p_i* its calibration label; x, y are the center-point coordinates of the candidate frame predicted by the RPN network; w and h are the width and height of the candidate frame; the position information of the prior frame is (x_a, y_a, w_a, h_a); and the position information of the ground-truth annotation frame is (x*, y*, w*, h*).
The input module 240 is configured to input the prior frames as candidate frames into the region proposal network for foreground/background binary classification and bounding-box refinement, remove candidate frames belonging to the background category, and perform an ROI Align operation on the remaining foreground candidate frames to obtain ROI regions;
Optionally, inputting the prior frames as candidate frames into the region proposal network for foreground/background binary classification and bounding-box refinement, removing candidate frames belonging to the background category, and performing an ROI Align operation on the remaining foreground candidate frames to obtain ROI regions includes:
dividing the ROI area into a predetermined number of bin areas, selecting 4 sampling points in each bin area, acquiring pixel values of 4 characteristic points nearest to each sampling point, and determining the pixel value of each sampling point through bilinear interpolation;
and calculating the average or maximum pooling of each bin region, and generating a feature map of the ROI region.
And the processing module 250 is used for respectively carrying out N-category classification, bounding box regression and binary mask generation on the ROI area to obtain a trained infrared target image segmentation model.
Optionally, performing N-category classification, bounding-box regression and binary-mask generation on the ROI regions respectively to obtain a trained infrared target image segmentation model includes:
the classification branch extracts features with two fully connected layers and produces an N-dimensional output representing the N category scores;
the regression branch extracts features with two fully connected layers and produces an N×4-dimensional output representing the generated bounding-box coordinates;
the mask branch extracts convolutional features with five fully convolutional layers and produces N category masks, represented as N feature maps of size 28×28;
wherein, the loss function is:
L=L cls +L box +L mask
L cls to classify branch loss, L box For regression branch loss, L mask For mask branch loss.
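As a rough illustration of the three parallel head branches and the summed loss L = L_cls + L_box + L_mask, the NumPy sketch below wires per-ROI head outputs of the stated shapes (N class scores, N × 4 box coordinates, N masks of 28 × 28) into typical loss terms. The specific loss choices (softmax cross-entropy, smooth-L1 on the ground-truth class's box, per-pixel sigmoid cross-entropy on the ground-truth class's mask) are Mask R-CNN-style assumptions, not taken from the patent text.

```python
import numpy as np

N = 3                                         # number of categories (illustrative)
rng = np.random.default_rng(0)

# Per-ROI head outputs with the shapes described above (random stand-ins):
cls_logits  = rng.normal(size=(N,))           # N category scores
box_deltas  = rng.normal(size=(N, 4))         # N x 4 bounding-box coordinates
mask_logits = rng.normal(size=(N, 28, 28))    # N masks of 28 x 28

def softmax_ce(logits, label):
    """Softmax cross-entropy for a single ROI (classification branch)."""
    z = logits - logits.max()
    log_p = z - np.log(np.exp(z).sum())
    return -log_p[label]

def smooth_l1(pred, target):
    """Smooth-L1 over the 4 box coordinates (regression branch)."""
    d = np.abs(pred - target)
    return np.where(d < 1, 0.5 * d ** 2, d - 0.5).sum()

def sigmoid_bce(logits, target):
    """Mean per-pixel sigmoid cross-entropy (mask branch)."""
    p = 1.0 / (1.0 + np.exp(-logits))
    eps = 1e-7
    return -(target * np.log(p + eps) + (1 - target) * np.log(1 - p + eps)).mean()

label   = 1                                   # ground-truth category of this ROI
gt_box  = np.zeros(4)
gt_mask = (rng.random((28, 28)) > 0.5).astype(float)

L_cls  = softmax_ce(cls_logits, label)
L_box  = smooth_l1(box_deltas[label], gt_box)      # only the GT class's box
L_mask = sigmoid_bce(mask_logits[label], gt_mask)  # only the GT class's mask
L = L_cls + L_box + L_mask
```

Evaluating the box and mask losses only at the ground-truth class keeps the branches decoupled, which is the usual rationale for the binary (per-class) mask design.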
In one embodiment of the present invention, an electronic device for infrared target image segmentation is provided, including a memory, a processor, and a computer program stored in the memory and executable by the processor; when executing the computer program, the processor implements steps S101 to S105 of the embodiments of the present invention.
One embodiment of the present invention also provides a non-transitory computer readable storage medium storing a computer program which, when executed by a processor, performs the infrared target image segmentation method provided by the foregoing embodiments. The non-transitory computer readable storage medium includes, for example, ROM/RAM, magnetic disks, optical disks, and the like.
Each of the foregoing embodiments is described with its own emphasis; for parts not described or detailed in a particular embodiment, reference may be made to the related descriptions of the other embodiments.
The above embodiments are intended only to illustrate the technical solution of the present invention, not to limit it. Although the invention has been described in detail with reference to the foregoing embodiments, those of ordinary skill in the art will understand that the technical solutions described in the foregoing embodiments may still be modified, or some of their technical features replaced by equivalents; such modifications and substitutions do not depart from the spirit and scope of the technical solutions of the embodiments of the present invention.
Claims (9)
1. An infrared target image segmentation method, comprising:
according to the characteristic attribute of the infrared target to be detected, acquiring infrared target images in various scenes;
preprocessing the infrared target image, annotating the target instances in the infrared target image with an annotation tool, and producing pixel-level binary masks;
extracting multi-resolution feature maps from the infrared target image through a pre-trained ResNet network, and presetting a predetermined number of prior frames of different sizes for each pixel point in the multi-resolution feature maps;
generating fully connected features of 256 or 512 dimensions on the multi-resolution feature maps with a 3×3 convolution kernel, and feeding the generated 256- or 512-dimensional features into 2 fully connected layer branches;
the regression layer predicts the center-point coordinates and the width and height of the candidate frame, and the classification layer judges whether the candidate frame belongs to the foreground or the background;
the loss function of the RPN consists of a softmax loss and a regression loss combined according to a certain weight; the softmax loss is computed from the foreground/background calibration results and the prediction results of the candidate frames, and the regression loss is computed from:
t_x = (x - x_a)/w_a, t_y = (y - y_a)/h_a
t_w = log(w/w_a), t_h = log(h/h_a)
the loss function of the RPN is calculated as follows:
L({p_i}, {t_i}) = (1/N_cls) Σ_i L_cls(p_i, p_i*) + λ·(1/N_reg) Σ_i p_i*·L_reg(t_i, t_i*)
wherein x and y are the center-point coordinates of the candidate frame predicted by the RPN, w and h are the width and height of the candidate frame, the position information of the prior frame is (x_a, y_a, w_a, h_a), the position information of the ground-truth annotation frame is (x*, y*, w*, h*), p_i is the predicted foreground probability of the i-th candidate frame, p_i* is its calibration label, and λ is the balancing weight;
inputting the prior frames as candidate frames into a region proposal network for binary foreground/background classification and bounding-box modification, removing the candidate frames belonging to the background category, and performing an ROI Align operation on the obtained foreground candidate frames to obtain ROI regions;
and performing N-category classification, bounding-box regression and binary mask generation on the ROI regions, respectively, to obtain a trained infrared target image segmentation model.
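The coordinate parameterization defined in claim 1 above (t_x = (x − x_a)/w_a, t_y = (y − y_a)/h_a, t_w = log(w/w_a), t_h = log(h/h_a)) can be checked with a small sketch; `encode`/`decode` are hypothetical helper names, and `decode` is simply the inverse mapping implied by those formulas, not a function named in the patent.

```python
import numpy as np

def encode(box, anchor):
    """Regression targets of claim 1: box and anchor are (x, y, w, h),
    i.e. center coordinates plus width and height."""
    x, y, w, h = box
    xa, ya, wa, ha = anchor
    return np.array([(x - xa) / wa, (y - ya) / ha,
                     np.log(w / wa), np.log(h / ha)])

def decode(t, anchor):
    """Inverse mapping: recover a box from targets t and its anchor."""
    xa, ya, wa, ha = anchor
    tx, ty, tw, th = t
    return np.array([tx * wa + xa, ty * ha + ya,
                     wa * np.exp(tw), ha * np.exp(th)])
```

Normalizing offsets by the prior frame's width and height, and taking logs of the size ratios, keeps the targets scale-invariant across prior frames of different sizes.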
2. The method of claim 1, wherein acquiring infrared target images in various scenes according to the characteristic attributes of the infrared target to be detected comprises:
capturing infrared images of different targets in different complex scenes with an infrared thermal imager, and continuously varying the focus, zoom and exposure parameters of the infrared thermal imager to form an infrared target image dataset.
3. The method of claim 1, wherein the preprocessing the infrared target image comprises:
performing data enhancement on the infrared target image by image rotation, random cropping, color jittering, translation transformation, scale transformation, contrast transformation and noise perturbation;
amplifying the data with a class-balancing strategy to address data class imbalance;
and randomly shuffling the infrared target images.
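A minimal sketch of the data-enhancement step of claim 3 for a normalized single-channel infrared image follows. The specific offset, gain, and noise ranges are illustrative assumptions, and rotation, cropping, and color jittering are omitted for brevity.

```python
import numpy as np

def augment(img, rng):
    """One random enhancement pass over an infrared image of shape (H, W)
    with float values in [0, 1]; parameter ranges are hypothetical."""
    # translation transformation: shift by a random pixel offset
    dy, dx = rng.integers(-4, 5, size=2)
    img = np.roll(img, (dy, dx), axis=(0, 1))
    # contrast transformation: random gain around 1.0
    img = np.clip(img * rng.uniform(0.8, 1.2), 0.0, 1.0)
    # noise perturbation: additive Gaussian noise
    img = np.clip(img + rng.normal(0.0, 0.01, img.shape), 0.0, 1.0)
    return img
```

For the class-balancing step, a simple strategy consistent with the claim would be to apply such passes more often to images of under-represented categories until the class counts are roughly even.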
4. The method according to claim 1, wherein inputting the prior frames as candidate frames into a region proposal network for binary foreground/background classification and bounding-box modification, removing the candidate frames belonging to the background category, and performing an ROI Align operation on the obtained foreground candidate frames to obtain ROI regions comprises:
dividing the ROI region into a predetermined number of bin regions, selecting 4 sampling points in each bin region, obtaining the pixel values of the 4 feature points nearest to each sampling point, and determining the pixel value of each sampling point by bilinear interpolation;
and computing the average or maximum pooling over each bin region to generate the feature map of the ROI region.
5. The method of claim 1, wherein performing N-category classification, bounding-box regression and binary mask generation on the ROI regions to obtain a trained infrared target image segmentation model comprises:
the classification branch extracts fully connected features through two fully connected layers and generates an N-dimensional feature representing the N category scores;
the regression branch extracts fully connected features through two fully connected layers and generates an N × 4-dimensional feature representing the generated bounding-box coordinates;
the mask branch extracts convolutional features through five fully convolutional layers and generates N category masks, represented by N feature maps of 28 × 28 dimensions;
wherein the loss function is:
L = L_cls + L_box + L_mask
where L_cls is the classification branch loss, L_box the regression branch loss, and L_mask the mask branch loss.
6. An infrared target image segmentation system, comprising:
the acquisition module is configured to acquire infrared target images in various scenes according to the characteristic attributes of the infrared target to be detected;
the annotation module is configured to preprocess the infrared target image, annotate the target instances in the infrared target image with an annotation tool, and produce pixel-level binary masks;
the extraction module is configured to extract multi-resolution feature maps from the infrared target image through a pre-trained ResNet network, and preset a predetermined number of prior frames of different sizes for each pixel point in the multi-resolution feature maps;
generating fully connected features of 256 or 512 dimensions on the multi-resolution feature maps with a 3×3 convolution kernel, and feeding the generated 256- or 512-dimensional features into 2 fully connected layer branches;
the regression layer predicts the center-point coordinates and the width and height of the candidate frame, and the classification layer judges whether the candidate frame belongs to the foreground or the background;
the loss function of the RPN consists of a softmax loss and a regression loss combined according to a certain weight; the softmax loss is computed from the foreground/background calibration results and the prediction results of the candidate frames, and the regression loss is computed from:
t_x = (x - x_a)/w_a, t_y = (y - y_a)/h_a
t_w = log(w/w_a), t_h = log(h/h_a)
the loss function of the RPN is calculated as follows:
L({p_i}, {t_i}) = (1/N_cls) Σ_i L_cls(p_i, p_i*) + λ·(1/N_reg) Σ_i p_i*·L_reg(t_i, t_i*)
wherein x and y are the center-point coordinates of the candidate frame predicted by the RPN, w and h are the width and height of the candidate frame, the position information of the prior frame is (x_a, y_a, w_a, h_a), the position information of the ground-truth annotation frame is (x*, y*, w*, h*), p_i is the predicted foreground probability of the i-th candidate frame, p_i* is its calibration label, and λ is the balancing weight;
the input module is configured to input the prior frames as candidate frames into a region proposal network for binary foreground/background classification and bounding-box modification, remove the candidate frames belonging to the background category, and perform an ROI Align operation on the obtained foreground candidate frames to obtain ROI regions;
and the processing module is configured to perform N-category classification, bounding-box regression and binary mask generation on the ROI regions, respectively, to obtain a trained infrared target image segmentation model.
7. The system of claim 6, wherein acquiring infrared target images in various scenes according to the characteristic attributes of the infrared target to be detected comprises:
capturing infrared images of different targets in different complex scenes with an infrared thermal imager, and continuously varying the focus, zoom and exposure parameters of the infrared thermal imager to form an infrared target image dataset.
8. An electronic device comprising a memory, a processor and a computer program stored in the memory and executable on the processor, characterized in that the processor implements the steps of the infrared target image segmentation method according to any one of claims 1 to 5 when the computer program is executed.
9. A computer readable storage medium storing a computer program, characterized in that the computer program when executed by a processor implements the steps of the infrared target image segmentation method according to any one of claims 1 to 5.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201911195519.9A CN111046880B (en) | 2019-11-28 | 2019-11-28 | Infrared target image segmentation method, system, electronic equipment and storage medium |
Publications (2)
Publication Number | Publication Date |
---|---|
CN111046880A CN111046880A (en) | 2020-04-21 |
CN111046880B true CN111046880B (en) | 2023-12-26 |