CN111046880B - Infrared target image segmentation method, system, electronic equipment and storage medium - Google Patents

Publication number
CN111046880B
CN111046880B
Authority: CN (China)
Prior art keywords: infrared target, infrared, target image, candidate frame, frame
Legal status: Active
Application number: CN201911195519.9A
Other languages: Chinese (zh)
Other versions: CN111046880A
Inventors: 荆楠, 张智杰, 雷波, 谭海, 孙钢波
Current Assignee: 717th Research Institute of CSIC
Original Assignee: 717th Research Institute of CSIC
Application filed by 717th Research Institute of CSIC; priority to CN201911195519.9A; publication of application CN111046880A; application granted; publication of granted patent CN111046880B

Classifications

    • G06V 10/26 — Segmentation of patterns in the image field; cutting or merging of image elements to establish the pattern region, e.g. clustering-based techniques; detection of occlusion
    • G06F 18/24 — Classification techniques
    • G06T 7/11 — Region-based segmentation
    • G06T 7/194 — Segmentation or edge detection involving foreground-background segmentation
    • G06V 10/25 — Determination of region of interest [ROI] or a volume of interest [VOI]
    • G06V 10/28 — Quantising the image, e.g. histogram thresholding for discrimination between background and foreground patterns
    • G06T 2207/10048 — Infrared image
    • G06T 2207/20081 — Training; Learning
    • G06T 2207/20084 — Artificial neural networks [ANN]
    • G06V 2201/07 — Target detection


Abstract

The invention provides an infrared target image segmentation method, an infrared target image segmentation system, an electronic device and a storage medium. The method comprises: collecting infrared target images in various scenes as a training data set; preprocessing the infrared target images, labeling target instances, and producing pixel-level binary masks; extracting multi-level-resolution feature maps of the infrared target images; presetting candidate frames of different sizes at each pixel point; inputting the candidate frames into a region proposal network for binary (foreground/background) classification and bounding-box refinement; filtering out background candidate frames; performing an ROI Align operation on the foreground candidate frames to obtain ROI regions; and performing N-category classification, bounding-box regression and binary-mask generation on the ROI regions, finally obtaining a trained infrared target image segmentation model. The method solves the problem that existing image processing methods struggle to guarantee real-time performance in complex scenes; it is applicable to target detection in infrared images across a variety of complex scenes, while achieving real-time infrared image processing with a reduced amount of computation.

Description

Infrared target image segmentation method, system, electronic equipment and storage medium
Technical Field
The present invention relates to the field of image processing, and in particular, to a method, a system, an electronic device, and a storage medium for segmenting an infrared target image.
Background
In the field of computer vision, target detection is a classical research direction widely applied to traffic monitoring, image retrieval, human-computer interaction, and similar areas. Infrared target detection, an important branch of computer image processing, remains usable when the color and shape of the target resemble those of the surrounding environment, and can be applied to security monitoring, military reconnaissance, night driving, shipping, and other fields. An infrared image reflects the relative temperature information of objects and is little affected by weather factors. Compared with visible-light cameras, night-vision devices, and similar equipment, infrared imaging offers advantages such as longer detection range and higher detection reliability, but suffers from drawbacks such as lower resolution and blurred detail.
At present, among common visible-light target segmentation methods, threshold-based segmentation is easy to engineer and computationally cheap, but it struggles with scenes that have complex backgrounds and heavy interference, and is prone to false detections and missed detections. More complex methods, such as convolutional neural networks, can obtain better detection results, but their computation is heavy and they have difficulty meeting real-time processing requirements.
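The threshold-based segmentation mentioned above is indeed simple to implement; a minimal global-threshold sketch (illustrative only, not part of the claimed method) shows why it is cheap and why it fails once background intensities overlap the target's:

```python
import numpy as np

def threshold_segment(img, t):
    """Global threshold segmentation: pixels hotter than t become foreground (1)."""
    return (img > t).astype(np.uint8)

# A toy "infrared" image: a warm 2x2 target on a uniform cool background.
img = np.zeros((4, 4))
img[1:3, 1:3] = 200.0
mask = threshold_segment(img, 128)
# With a cluttered background whose intensities overlap the target's,
# no single threshold separates them, producing the false and missed
# detections described above.
```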
Therefore, it is necessary to propose an infrared target image segmentation method that can process complex scenes and realize real-time processing.
Disclosure of Invention
In view of the above, embodiments of the present invention provide an infrared target image segmentation method to address the difficulty existing image segmentation methods have in adapting to complex application scenes while guaranteeing real-time image processing.
In a first aspect of an embodiment of the present invention, there is provided an infrared target image segmentation method, including:
according to the characteristic attribute of the infrared target to be detected, acquiring infrared target images in various scenes;
preprocessing the infrared target image, marking a target instance in the infrared target image by a marking tool, and manufacturing a pixel-level binary mask;
extracting multi-resolution feature maps from the infrared target image through a pre-trained ResNet network, and presetting a predetermined number of prior frames of different sizes at each pixel point of the multi-resolution feature maps;
inputting the prior frames as candidate frames into a region proposal network for binary classification into foreground or background and for bounding-box refinement, removing candidate frames belonging to the background category, and performing an ROI Align operation on the obtained foreground-category candidate frames to obtain ROI regions;
and respectively carrying out N-category classification, bounding box regression and binary mask generation on the ROI region to obtain a trained infrared target image segmentation model.
In a second aspect of an embodiment of the present invention, there is provided an infrared target image segmentation system including:
the acquisition module is used for acquiring infrared target images under various scenes according to the characteristic attribute of the infrared target to be detected;
the marking module is used for preprocessing the infrared target image, marking a target instance in the infrared target image through a marking tool and manufacturing a pixel-level binary mask;
the extraction module is used for extracting a multi-resolution size characteristic diagram in the infrared target image through a pre-trained ResNet network, and presetting a preset number of prior frames with different sizes for each pixel point in the multi-resolution size characteristic diagram;
the input module is used for inputting the prior frames as candidate frames into a region proposal network for binary classification into foreground or background and for bounding-box refinement, removing candidate frames belonging to the background category, and performing an ROI Align operation on the obtained foreground-category candidate frames to obtain ROI regions;
and the processing module is used for respectively carrying out N-category classification, bounding box regression and binary mask generation on the ROI area to obtain a trained infrared target image segmentation model.
In a third aspect of the embodiments of the present invention, there is provided an electronic device comprising a memory, a processor and a computer program stored in the memory and executable on the processor, the processor implementing the steps of the method according to the first aspect of the embodiments of the present invention when executing the computer program.
In a fourth aspect of the embodiments of the present invention, there is provided a computer readable storage medium storing a computer program which, when executed by a processor, implements the steps of the method provided by the first aspect of the embodiments of the present invention.
According to the method, infrared target images in various scenes are collected, according to the characteristic attributes of the infrared target to be detected, to serve as a training data set. The infrared target images are preprocessed, a labeling tool is used to label target instances and produce pixel-level binary masks, and multi-level-resolution feature maps of the infrared target images are extracted through a pre-trained ResNet network. A certain number of prior frames of different sizes are preset at each pixel point of the multi-level-resolution feature maps; the prior frames are input as candidate frames into a region proposal network for binary (foreground/background) classification and bounding-box refinement; the background candidate frames are filtered out; and an ROI Align operation is performed on the foreground candidate frames to obtain ROI regions. N-category classification, bounding-box regression, and binary-mask generation are then performed on the ROI regions, and a trained infrared target image segmentation model is finally obtained for target detection and segmentation in infrared images. The method adapts to target detection in infrared images under various complex scenes and has strong anti-interference capability; at the same time, filtering out background candidate frames early keeps the amount of computation small, so the real-time performance of infrared image processing can be guaranteed. This solves the problem that existing methods are either too simple to handle complex scenes or too computationally heavy to run in real time.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present invention, the drawings used in the embodiments or in the description of the prior art are briefly described below. Obviously, the drawings described below show only some embodiments of the present invention; other drawings can be obtained from them by a person skilled in the art without inventive effort.
FIG. 1 is a flow chart of an infrared target image segmentation method according to an embodiment of the present invention;
fig. 2 is a schematic structural diagram of an infrared target image segmentation system according to an embodiment of the present invention.
Detailed Description
In order to make the objects, features and advantages of the present invention more comprehensible, the technical solutions in the embodiments of the present invention are described in detail below with reference to the accompanying drawings. Obviously, the embodiments described below are only some, not all, embodiments of the present invention. All other embodiments obtained by those skilled in the art, based on the embodiments of the present invention and without inventive effort, fall within the scope of protection of the present invention. The examples are given for the purpose of illustrating the invention and are not intended to limit its scope.
The term "comprising" in the description, the claims, and the above-mentioned drawings of the present invention, and other expressions of similar meaning, denote a non-exclusive inclusion: for example, a process, method, system, or apparatus comprising a series of steps or elements is not limited to the steps or elements listed.
Referring to fig. 1, which is a flowchart of an infrared target image segmentation method according to an embodiment of the present invention, the method includes:
s101, acquiring infrared target images under various scenes according to characteristic attributes of infrared targets to be detected;
the infrared target to be detected is a target object detected through an infrared signal radiated by the detected target, and can be a person, a vehicle, an animal and the like generally. The characteristic properties of the infrared target may be the size of the target, the radiation intensity, etc. The multiple scenes refer to multiple complex scenes, such as region scenes with dense people flow and traffic flow, at least three infrared target images in the complex scenes can be acquired, and the acquired scenes can be increased in order to improve the accuracy of infrared target detection.
Optionally, an infrared thermal imager shoots infrared images of different targets in different complex scenes, with the focus, zoom, and exposure parameters of the imager continuously varied, to form an infrared target image data set.
The shooting scene, target, focal-length parameters, exposure parameters, and so on of the infrared images in the data set can differ, which ensures the diversity of the data set.
S102, preprocessing the infrared target image, marking a target instance in the infrared target image by a marking tool, and manufacturing a pixel-level binary mask;
the preprocessing process can specifically comprise methods of image rotation translation, random pruning, color dithering, translation transformation, scale transformation, contrast transformation, noise disturbance and the like, and data enhancement is carried out on the infrared target image. And labeling the target instance in the infrared target image by using a LabelImg labeling tool.
A binary mask is a binary image consisting of 0s and 1s, which controls the image processing procedure by occluding the image locally or not at all. An image mask may be defined by specifying data values, data ranges, infinite or finite values, regions of interest, annotation files, and so on.
Specifically, the infrared target image is data-enhanced by image rotation and translation, random cropping, color jitter, translation transformation, scale transformation, contrast transformation, and noise disturbance; in view of data class imbalance, a class balancing strategy is adopted to augment the data; and the infrared target images are randomly ordered.
In classification learning algorithms, large differences in the proportions of samples of different classes easily affect classification accuracy; data augmentation diversifies the data set and improves the generalization ability of the model. During data enhancement, the same target picture may appear consecutively, so the model would repeatedly learn the features of the same target during training and overfit. In this embodiment, the data samples are shuffled into random order, which improves model performance during training.
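The enhancement and random-ordering operations of step S102 can be sketched as follows. This is a minimal NumPy illustration; the crop size, contrast range, and noise level are assumptions, not values from the patent:

```python
import numpy as np

rng = np.random.default_rng(0)

def augment(img):
    """A few of the S102 enhancement operations on one grayscale infrared image."""
    h, w = img.shape
    # random cropping: keep a random 7/8-sized window
    top, left = rng.integers(0, h // 8), rng.integers(0, w // 8)
    img = img[top:top + 7 * h // 8, left:left + 7 * w // 8]
    # contrast transformation: rescale pixel values around the mean
    img = (img - img.mean()) * rng.uniform(0.8, 1.2) + img.mean()
    # noise disturbance: additive Gaussian noise
    img = img + rng.normal(0.0, 2.0, size=img.shape)
    return np.clip(img, 0.0, 255.0)

# enhance the set, then randomly re-order it so that identical targets
# do not appear consecutively during training
dataset = [rng.uniform(0.0, 255.0, size=(64, 64)) for _ in range(10)]
augmented = [augment(x) for x in dataset]
order = rng.permutation(len(augmented))
shuffled = [augmented[i] for i in order]
```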
S103, extracting multi-resolution feature maps from the infrared target image through the pre-trained ResNet network, and presetting a predetermined number of prior frames of different sizes at each pixel point of the multi-resolution feature maps;
the ResNet network model is a residual learning network for feature extraction, the ResNet network is pre-trained through an ImageNet classification data set, and a feature map of an infrared target image is extracted by using the trained ResNet network.
Optionally, a two-branch structure comprising binary classification (foreground and background) and bounding-box refinement regression is built on the convolutional-neural-network backbone. Specifically, a 3×3 convolution kernel first generates 256-dimensional or 512-dimensional fully connected features on the multi-level-resolution feature maps, and the generated 256-dimensional or 512-dimensional features then feed 2 fully connected branches: the regression layer predicts the center-point coordinates and the width and height of the candidate frame, and the classification layer judges whether the candidate frame belongs to the foreground or the background.
The loss function of the RPN (Region Proposal Network) is a weighted combination of a softmax loss and a regression loss. The softmax loss is computed from the foreground/background calibration result and the prediction result of each candidate frame; the regression targets are computed as follows:
t_x = (x - x_a)/w_a,  t_y = (y - y_a)/h_a
t_w = log(w/w_a),  t_h = log(h/h_a)
t*_x = (x* - x_a)/w_a,  t*_y = (y* - y_a)/h_a
t*_w = log(w*/w_a),  t*_h = log(h*/h_a)
The loss function of the RPN network is calculated as follows:
L({p_i}, {t_i}) = (1/N_cls) Σ_i L_cls(p_i, p*_i) + λ (1/N_reg) Σ_i p*_i L_reg(t_i, t*_i)
where x and y are the center-point coordinates of the candidate frame predicted by the RPN network, w and h are the width and height of the candidate frame, the position information of the prior frame is (x_a, y_a, w_a, h_a), and the position information of the ground-truth annotation frame is (x*, y*, w*, h*).
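The regression parameterisation (t_x, t_y, t_w, t_h) encodes a box relative to its prior frame. A small sketch, with the decode step that inverts it (the numeric boxes are made-up examples):

```python
import numpy as np

def encode(box, anchor):
    """t = (t_x, t_y, t_w, t_h) relative to the prior frame, per the formulas above."""
    x, y, w, h = box
    xa, ya, wa, ha = anchor
    return np.array([(x - xa) / wa, (y - ya) / ha,
                     np.log(w / wa), np.log(h / ha)])

def decode(t, anchor):
    """Invert the encoding to recover the predicted candidate frame."""
    xa, ya, wa, ha = anchor
    tx, ty, tw, th = t
    return np.array([tx * wa + xa, ty * ha + ya,
                     wa * np.exp(tw), ha * np.exp(th)])

anchor = (50.0, 60.0, 32.0, 32.0)   # prior frame (x_a, y_a, w_a, h_a)
box = (54.0, 58.0, 40.0, 24.0)      # predicted frame (x, y, w, h)
t = encode(box, anchor)
recovered = decode(t, anchor)       # round-trips back to the original box
```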
S104, inputting the prior frames as candidate frames into the region proposal network for binary classification into foreground or background and for bounding-box refinement, removing candidate frames belonging to the background category, and performing an ROI Align operation on the obtained foreground-category candidate frames to obtain ROI regions;
the region nomination network (or region growing network, or RPN) may classify corresponding targets or non-targets in the candidate boxes and modify the frames of the candidate boxes. Wherein the classified candidate boxes can be classified into a foreground candidate box and a background candidate box. The ROI region is a region of interest in image processing.
The ROI Align operation determines the feature value at each sampling position of the ROI region by bilinear interpolation and then applies maximum or average pooling. This avoids the misalignment caused by coordinate quantization during ordinary pooling and improves accuracy.
Specifically, the ROI region is divided into a predetermined number of bins, for example 7×7 bins; 4 sampling points are selected in each bin; the pixel values of the 4 feature-map points nearest each sampling point are obtained; and the value of each sampling point is determined by bilinear interpolation. The average or maximum pooling of each bin is then computed, generating the feature map of the ROI region, i.e. a 7×7 feature map.
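The ROI Align procedure just described — 7×7 bins, 4 sampling points per bin, bilinear interpolation, then average pooling — can be sketched directly (a plain NumPy illustration for a single-channel feature map):

```python
import numpy as np

def bilinear(feat, y, x):
    """Feature value at fractional position (y, x) via bilinear interpolation."""
    y0, x0 = int(np.floor(y)), int(np.floor(x))
    y1, x1 = min(y0 + 1, feat.shape[0] - 1), min(x0 + 1, feat.shape[1] - 1)
    dy, dx = y - y0, x - x0
    return (feat[y0, x0] * (1 - dy) * (1 - dx) + feat[y0, x1] * (1 - dy) * dx
            + feat[y1, x0] * dy * (1 - dx) + feat[y1, x1] * dy * dx)

def roi_align(feat, roi, out=7, samples=2):
    """out*out bins, samples*samples sampling points per bin, average pooled."""
    y1, x1, y2, x2 = roi                       # ROI in feature-map coordinates
    bh, bw = (y2 - y1) / out, (x2 - x1) / out  # bin height and width
    pooled = np.empty((out, out))
    for i in range(out):
        for j in range(out):
            vals = [bilinear(feat,
                             y1 + (i + (p + 0.5) / samples) * bh,
                             x1 + (j + (q + 0.5) / samples) * bw)
                    for p in range(samples) for q in range(samples)]
            pooled[i, j] = np.mean(vals)       # average pooling of the 4 samples
    return pooled

feat = np.arange(256, dtype=float).reshape(16, 16)
pooled = roi_align(feat, (2.0, 3.0, 9.0, 10.0))  # 7x7 output, no quantization
```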
S105, respectively carrying out N-type classification, bounding box regression and binary mask generation on the ROI area to obtain a trained infrared target image segmentation model.
By repeatedly inputting candidate frames into the region proposal network as in S104, removing background candidate frames to obtain ROI regions, and performing the classification, bounding-box regression, and binary-mask generation of S105 on those regions, training of the infrared target image segmentation model is completed on the infrared target image data. The trained model can then rapidly and accurately perform target detection and segmentation on infrared images to be detected.
Optionally, the classification branch extracts fully connected features with two fully connected layers and generates N-dimensional features representing the N category scores;
the regression branch extracts fully connected features with two fully connected layers and generates N×4-dimensional features representing the generated bounding-box coordinates;
the mask branch extracts convolutional features with five fully convolutional layers and generates N category masks, represented by N feature maps of size 28×28;
wherein the loss function is:
L = L_cls + L_box + L_mask
where L_cls is the classification branch loss, L_box is the regression branch loss, and L_mask is the mask branch loss.
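The three branch outputs and the combined loss L = L_cls + L_box + L_mask can be illustrated with stand-in values; the weights here are random and the branch-loss numbers are made-up examples, purely to show the output shapes and the final mask binarisation:

```python
import numpy as np

N = 5  # assumed number of target categories
rng = np.random.default_rng(2)

# stand-ins for the three branch outputs described above (untrained values)
cls_scores = rng.normal(size=N)              # N category scores
box_deltas = rng.normal(size=(N, 4))         # N*4 bounding-box coordinates
mask_logits = rng.normal(size=(N, 28, 28))   # N masks as 28x28 feature maps

# softmax over the N category scores
probs = np.exp(cls_scores - cls_scores.max())
probs /= probs.sum()
predicted = int(np.argmax(probs))

# binary mask of the predicted category: sigmoid, then threshold at 0.5
mask = (1.0 / (1.0 + np.exp(-mask_logits[predicted])) > 0.5).astype(np.uint8)

# total loss is the sum of the three branch losses
L_cls, L_box, L_mask = 0.7, 0.3, 0.5  # example branch losses
L = L_cls + L_box + L_mask
```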
The method provided by this embodiment solves the problem that existing infrared image detection methods involve complex computation in complex scenes. It guarantees real-time performance, adapts well to complex backgrounds, strengthens anti-interference capability in scenes such as the ground, and broadens the applicable range of the method.
It should be understood that the sequence numbers of the steps in the above embodiment do not imply an order of execution; the execution order of each process is determined by its function and internal logic, and should not be construed as limiting the implementation of the embodiments of the present invention.
Fig. 2 is a schematic structural diagram of an infrared target image segmentation system according to an embodiment of the present invention, where the system includes:
the acquisition module 210 is configured to acquire infrared target images under multiple scenes according to the characteristic attribute of the infrared target to be detected;
optionally, the collecting the infrared target images under multiple scenes according to the characteristic attribute of the infrared target to be detected includes:
and shooting infrared images of different targets in different complex scenes through the infrared thermal imager, and continuously changing focusing, zooming and exposure parameters of the infrared thermal imager to form an infrared target image dataset.
The labeling module 220 is configured to pre-process the infrared target image, label a target instance in the infrared target image by using a labeling tool, and make a pixel-level binary mask;
Optionally, preprocessing the infrared target image includes:
performing data enhancement on the infrared target image by image rotation and translation, random cropping, color jitter, translation transformation, scale transformation, contrast transformation, and noise disturbance;
in view of data class imbalance, adopting a class balancing strategy to augment the data;
and randomly ordering the infrared target images.
The extracting module 230 is configured to extract multi-resolution feature maps from the infrared target image through a pre-trained ResNet network, and to preset a predetermined number of prior frames of different sizes at each pixel point of the multi-resolution feature maps;
optionally, extracting the multi-resolution feature maps from the infrared target image through the pre-trained ResNet network, and presetting a predetermined number of prior frames of different sizes at each pixel point of the multi-resolution feature maps, includes:
first generating 256-dimensional or 512-dimensional fully connected features on the multi-level-resolution feature maps with a 3×3 convolution kernel, and then generating 2 fully connected branches from the generated 256-dimensional or 512-dimensional features;
the regression layer predicts the center-point coordinates and the width and height of the candidate frame, and the classification layer judges whether the candidate frame belongs to the foreground or the background;
The loss function of the RPN network is a weighted combination of a softmax loss and a regression loss. The softmax loss is computed from the foreground/background calibration result and the prediction result of each candidate frame; the regression targets are computed as follows:
t_x = (x - x_a)/w_a,  t_y = (y - y_a)/h_a
t_w = log(w/w_a),  t_h = log(h/h_a)
t*_x = (x* - x_a)/w_a,  t*_y = (y* - y_a)/h_a
t*_w = log(w*/w_a),  t*_h = log(h*/h_a)
The loss function of the RPN network is calculated as follows:
L({p_i}, {t_i}) = (1/N_cls) Σ_i L_cls(p_i, p*_i) + λ (1/N_reg) Σ_i p*_i L_reg(t_i, t*_i)
where x and y are the center-point coordinates of the candidate frame predicted by the RPN network, w and h are the width and height of the candidate frame, the position information of the prior frame is (x_a, y_a, w_a, h_a), and the position information of the ground-truth annotation frame is (x*, y*, w*, h*).
The input module 240 is configured to input the prior frames as candidate frames into the region proposal network for binary classification into foreground or background and for bounding-box refinement, to remove candidate frames belonging to the background category, and to perform an ROI Align operation on the obtained foreground-category candidate frames to obtain ROI regions;
optionally, inputting the prior frames as candidate frames into the region proposal network for binary classification into foreground or background and for bounding-box refinement, removing candidate frames of the background category, and performing the ROI Align operation on the foreground-category candidate frames to obtain ROI regions includes:
dividing the ROI region into a predetermined number of bins, selecting 4 sampling points in each bin, obtaining the pixel values of the 4 feature-map points nearest each sampling point, and determining the value of each sampling point by bilinear interpolation;
and computing the average or maximum pooling of each bin to generate the feature map of the ROI region.
And the processing module 250 is used for respectively carrying out N-category classification, bounding box regression and binary mask generation on the ROI area to obtain a trained infrared target image segmentation model.
Optionally, performing N-category classification, bounding-box regression, and binary-mask generation on the ROI region respectively, to obtain the trained infrared target image segmentation model, includes:
the classification branch extracts fully connected features with two fully connected layers and generates N-dimensional features representing the N category scores;
the regression branch extracts fully connected features with two fully connected layers and generates N×4-dimensional features representing the generated bounding-box coordinates;
the mask branch extracts convolutional features with five fully convolutional layers and generates N category masks, represented by N feature maps of size 28×28;
wherein the loss function is:
L = L_cls + L_box + L_mask
where L_cls is the classification branch loss, L_box is the regression branch loss, and L_mask is the mask branch loss.
In one embodiment of the present invention, an electronic device for infrared target image segmentation is provided, comprising a memory, a processor, and a computer program stored in the memory and executable by the processor, the processor implementing steps S101 to S105 of the embodiments of the present invention when executing the computer program.
An embodiment of the present invention also provides a non-transitory computer-readable storage medium storing a computer program which, when executed by a processor, performs the infrared target image segmentation method provided by the above embodiments. Such non-transitory computer-readable storage media include, for example, ROM/RAM, magnetic disks, and optical disks.
In the foregoing embodiments, the description of each embodiment has its own emphasis; for parts not described or detailed in a particular embodiment, reference may be made to the related descriptions of the other embodiments.
The above embodiments are only for illustrating the technical solution of the present invention, and not for limiting the same; although the invention has been described in detail with reference to the foregoing embodiments, it will be understood by those of ordinary skill in the art that: the technical scheme described in the foregoing embodiments can be modified or some technical features thereof can be replaced by equivalents; such modifications and substitutions do not depart from the spirit and scope of the technical solutions of the embodiments of the present invention.

Claims (9)

1. An infrared target image segmentation method, comprising:
according to the characteristic attribute of the infrared target to be detected, acquiring infrared target images in various scenes;
preprocessing the infrared target image, marking a target instance in the infrared target image with a marking tool, and producing a pixel-level binary mask;
extracting multi-resolution feature maps from the infrared target image through a pre-trained ResNet network, and presetting a predetermined number of prior frames of different sizes for each pixel point in the multi-resolution feature maps;
generating 256-dimensional or 512-dimensional fully connected features on the multi-resolution feature maps by using a 3×3 convolution kernel, and feeding the generated 256-dimensional or 512-dimensional features into two fully connected layer branches;
the regression layer predicts the center-point coordinates and the width and height of the candidate frame, and the classification layer judges whether the candidate frame belongs to the foreground or the background;
the loss function of the RPN network is a weighted combination of a softmax loss and a regression loss; the softmax loss is calculated from the foreground/background calibration result and the prediction result corresponding to the candidate frame, and the regression loss is calculated as follows:
t_x = (x - x_a)/w_a,  t_y = (y - y_a)/h_a
t_w = log(w/w_a),  t_h = log(h/h_a)
the loss function of the RPN network is calculated as follows:
wherein x and y are the center-point coordinates of the candidate frame predicted by the RPN network, w and h are the width and height of the candidate frame, the position information of the prior frame is (x_a, y_a, w_a, h_a), and the position information of the real annotation frame is (x*, y*, w*, h*);
inputting the prior frames as candidate frames into a region proposal network for binary foreground/background classification and bounding box refinement, removing candidate frames belonging to the background category, and performing an ROI alignment operation on the obtained foreground-category candidate frames to obtain ROI regions;
and respectively carrying out N-category classification, bounding box regression and binary mask generation on the ROI region to obtain a trained infrared target image segmentation model.
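The box parameterization in claim 1 can be sketched as follows (a minimal, non-limiting illustration; the function names and the (center x, center y, width, height) tuple layout are assumptions):

```python
import numpy as np

def rpn_targets(box, prior):
    """Regression targets from the claim:
    t_x=(x-x_a)/w_a, t_y=(y-y_a)/h_a, t_w=log(w/w_a), t_h=log(h/h_a)."""
    x, y, w, h = box          # predicted or ground-truth box (center, size)
    xa, ya, wa, ha = prior    # prior frame (anchor)
    return np.array([(x - xa) / wa, (y - ya) / ha,
                     np.log(w / wa), np.log(h / ha)])

def decode(t, prior):
    # inverse mapping: recover (x, y, w, h) from targets and the prior frame
    xa, ya, wa, ha = prior
    return (t[0] * wa + xa, t[1] * ha + ya,
            wa * np.exp(t[2]), ha * np.exp(t[3]))
```

The log parameterization of width and height keeps the predicted sizes positive when decoded, which is why this form is standard for anchor-based regression.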
2. The method of claim 1, wherein capturing infrared target images in a plurality of scenes according to the infrared target feature attributes to be detected comprises:
and shooting infrared images of different targets in different complex scenes through the infrared thermal imager, and continuously changing focusing, zooming and exposure parameters of the infrared thermal imager to form an infrared target image dataset.
3. The method of claim 1, wherein the preprocessing the infrared target image comprises:
performing data enhancement on the infrared target image through image rotation and translation, random cropping, color dithering, translation transformation, scale transformation, contrast transformation and noise disturbance;
amplifying the data with a class balancing strategy when the data classes are unbalanced;
and randomly sequencing the infrared target images.
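A few of the enhancement operations listed in claim 3 can be sketched as follows (a minimal, non-limiting illustration covering only translation, contrast transformation and noise disturbance on a single-channel infrared image in [0, 1]; the parameter ranges are assumptions):

```python
import numpy as np

rng = np.random.default_rng(0)

def augment(img):
    """Apply translation, contrast transformation, and noise disturbance."""
    # translation: shift by a random offset (wrap-around for brevity;
    # real pipelines pad the border instead)
    dy, dx = rng.integers(-4, 5, size=2)
    img = np.roll(img, (dy, dx), axis=(0, 1))
    # contrast transformation: scale deviations from the mean
    c = rng.uniform(0.8, 1.2)
    img = np.clip((img - img.mean()) * c + img.mean(), 0.0, 1.0)
    # noise disturbance: additive Gaussian noise
    img = np.clip(img + rng.normal(0.0, 0.01, img.shape), 0.0, 1.0)
    return img
```

Each call produces a differently perturbed copy of the input, which is how the augmented dataset is amplified.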
4. The method according to claim 1, wherein inputting the prior frame as a candidate frame into a region proposal network for binary foreground/background classification and bounding box refinement, removing the candidate frame belonging to the background category, and performing an ROI alignment operation on the obtained candidate frame of the foreground category to obtain an ROI region comprises:
dividing the ROI region into a predetermined number of bin regions, selecting 4 sampling points in each bin region, acquiring the pixel values of the 4 feature points nearest to each sampling point, and determining the pixel value of each sampling point through bilinear interpolation;
and calculating the average or maximum pooling of each bin region, and generating a feature map of the ROI region.
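The bilinear sampling in claim 4 can be sketched as follows (a minimal, non-limiting illustration of one bin using the average-pooling variant with 4 sample points; function names are assumptions):

```python
import numpy as np

def bilinear(feat, y, x):
    """Interpolate feat (H x W) at continuous point (y, x) from the
    4 nearest feature points, as described in the claim."""
    y0, x0 = int(np.floor(y)), int(np.floor(x))
    y1, x1 = min(y0 + 1, feat.shape[0] - 1), min(x0 + 1, feat.shape[1] - 1)
    dy, dx = y - y0, x - x0
    return (feat[y0, x0] * (1 - dy) * (1 - dx) + feat[y0, x1] * (1 - dy) * dx
            + feat[y1, x0] * dy * (1 - dx) + feat[y1, x1] * dy * dx)

def roi_align_bin(feat, y0, x0, y1, x1):
    """One bin of ROI Align: average 4 regularly spaced sample points,
    each obtained by bilinear interpolation."""
    ys = [y0 + (y1 - y0) * f for f in (0.25, 0.75)]
    xs = [x0 + (x1 - x0) * f for f in (0.25, 0.75)]
    return np.mean([bilinear(feat, y, x) for y in ys for x in xs])
```

Because the sample coordinates stay continuous and are never rounded to the grid, this avoids the quantization misalignment of plain ROI pooling.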
5. The method of claim 1, wherein the performing N-class classification, bounding box regression, and binary mask generation on the ROI region, respectively, to obtain a trained infrared target image segmentation model comprises:
the classification branch structure uses two fully connected layers to extract fully connected features, and the classification branch generates N-dimensional features representing N category scores;
the regression branch structure uses two fully connected layers to extract fully connected features, and generates N×4-dimensional features representing the generated bounding box coordinates;
the mask branch structure uses five fully convolutional layers to extract convolution features, and the mask branch generates N category masks, represented by N feature maps of dimension 28×28;
wherein the loss function is:
L = L_cls + L_box + L_mask
L_cls is the classification branch loss, L_box is the regression branch loss, and L_mask is the mask branch loss.
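The output shapes of the three head branches in claim 5 can be sketched as follows (a minimal, non-limiting illustration; the feature size, the random-weight fully connected layers, and the elided five mask convolution layers are all assumptions):

```python
import numpy as np

rng = np.random.default_rng(0)
N = 5                                 # number of target categories (illustrative)
feat = rng.standard_normal(1024)      # flattened ROI feature vector (size assumed)

def fc(x, out_dim):
    # one fully connected layer with small random weights (sketch only)
    W = rng.standard_normal((out_dim, x.size)) * 0.01
    return W @ x

h = fc(fc(feat, 1024), 1024)          # the two shared fully connected layers
cls_scores = fc(h, N)                 # classification branch: N category scores
box_coords = fc(h, N * 4)             # regression branch: N x 4 box coordinates
masks = rng.standard_normal((N, 28, 28))  # mask branch output shape (conv layers elided)
```

The three branches share the aligned ROI feature but produce independent outputs, so the per-category mask is decoupled from the classification score.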
6. An infrared image target detection system, comprising:
the acquisition module is used for acquiring infrared target images under various scenes according to the characteristic attribute of the infrared target to be detected;
the marking module is used for preprocessing the infrared target image, marking a target instance in the infrared target image with a marking tool, and producing a pixel-level binary mask;
the extraction module is used for extracting multi-resolution feature maps from the infrared target image through a pre-trained ResNet network, and presetting a predetermined number of prior frames of different sizes for each pixel point in the multi-resolution feature maps;
generating 256-dimensional or 512-dimensional fully connected features on the multi-resolution feature maps by using a 3×3 convolution kernel, and feeding the generated 256-dimensional or 512-dimensional features into two fully connected layer branches;
the regression layer predicts the center-point coordinates and the width and height of the candidate frame, and the classification layer judges whether the candidate frame belongs to the foreground or the background;
the loss function of the RPN network is a weighted combination of a softmax loss and a regression loss; the softmax loss is calculated from the foreground/background calibration result and the prediction result corresponding to the candidate frame, and the regression loss is calculated as follows:
t_x = (x - x_a)/w_a,  t_y = (y - y_a)/h_a
t_w = log(w/w_a),  t_h = log(h/h_a)
the loss function of the RPN network is calculated as follows:
wherein x and y are the center-point coordinates of the candidate frame predicted by the RPN network, w and h are the width and height of the candidate frame, the position information of the prior frame is (x_a, y_a, w_a, h_a), and the position information of the real annotation frame is (x*, y*, w*, h*);
the input module is used for inputting the prior frames as candidate frames into a region proposal network for binary foreground/background classification and bounding box refinement, removing candidate frames belonging to the background category, and performing an ROI alignment operation on the obtained foreground-category candidate frames to obtain ROI regions;
and the processing module is used for respectively carrying out N-category classification, bounding box regression and binary mask generation on the ROI area to obtain a trained infrared target image segmentation model.
7. The system of claim 6, wherein capturing infrared target images in a plurality of scenes based on the infrared target feature attributes to be detected comprises:
and shooting infrared images of different targets in different complex scenes through the infrared thermal imager, and continuously changing focusing, zooming and exposure parameters of the infrared thermal imager to form an infrared target image dataset.
8. An electronic device comprising a memory, a processor and a computer program stored in the memory and executable on the processor, characterized in that the processor implements the steps of the infrared target image segmentation method according to any one of claims 1 to 5 when the computer program is executed.
9. A computer readable storage medium storing a computer program, characterized in that the computer program when executed by a processor implements the steps of the infrared target image segmentation method according to any one of claims 1 to 5.
CN201911195519.9A 2019-11-28 2019-11-28 Infrared target image segmentation method, system, electronic equipment and storage medium Active CN111046880B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201911195519.9A CN111046880B (en) 2019-11-28 2019-11-28 Infrared target image segmentation method, system, electronic equipment and storage medium

Publications (2)

Publication Number Publication Date
CN111046880A CN111046880A (en) 2020-04-21
CN111046880B true CN111046880B (en) 2023-12-26

Family

ID=70234017

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201911195519.9A Active CN111046880B (en) 2019-11-28 2019-11-28 Infrared target image segmentation method, system, electronic equipment and storage medium

Country Status (1)

Country Link
CN (1) CN111046880B (en)

Families Citing this family (16)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111597920B (en) * 2020-04-27 2022-11-15 东南大学 Full convolution single-stage human body example segmentation method in natural scene
CN111598951B (en) * 2020-05-18 2022-09-30 清华大学 Method, device and storage medium for identifying space target
CN111627029B (en) * 2020-05-28 2023-06-16 北京字节跳动网络技术有限公司 Image instance segmentation result acquisition method and device
CN111627033B (en) * 2020-05-30 2023-10-20 郑州大学 Method, equipment and computer readable storage medium for dividing difficult sample instance
CN111652930B (en) * 2020-06-04 2024-02-27 上海媒智科技有限公司 Image target detection method, system and equipment
CN112150471B (en) * 2020-09-23 2023-09-05 创新奇智(上海)科技有限公司 Semantic segmentation method and device based on few samples, electronic equipment and storage medium
CN112200115B (en) * 2020-10-21 2024-04-19 平安国际智慧城市科技股份有限公司 Face recognition training method, recognition method, device, equipment and storage medium
CN112307991A (en) * 2020-11-04 2021-02-02 北京临近空间飞行器系统工程研究所 Image recognition method, device and storage medium
CN112614136B (en) * 2020-12-31 2024-05-14 华中光电技术研究所(中国船舶重工集团公司第七一七研究所) Infrared small target real-time instance segmentation method and device
CN113177947B (en) * 2021-04-06 2024-04-26 广东省科学院智能制造研究所 Multi-module convolutional neural network-based complex environment target segmentation method and device
CN112907616B (en) * 2021-04-27 2022-05-03 浙江大学 Pedestrian detection method based on thermal imaging background filtering
CN114034390B (en) * 2021-11-08 2023-11-03 山东大学 Equipment temperature anomaly detection system based on infrared detection
CN114332566A (en) * 2021-12-28 2022-04-12 中国航天空气动力技术研究院 Target detection method, system and device for underwater image
CN114782460B (en) * 2022-06-21 2022-10-18 阿里巴巴达摩院(杭州)科技有限公司 Image segmentation model generation method, image segmentation method and computer equipment
CN116486259B (en) * 2023-04-04 2024-06-04 自然资源部国土卫星遥感应用中心 Method and device for extracting point target in remote sensing image
CN117132777B (en) * 2023-10-26 2024-03-22 腾讯科技(深圳)有限公司 Image segmentation method, device, electronic equipment and storage medium

Citations (4)

Publication number Priority date Publication date Assignee Title
CN107480730A (en) * 2017-09-05 2017-12-15 广州供电局有限公司 Power equipment identification model construction method and system, the recognition methods of power equipment
CN108629354A (en) * 2017-03-17 2018-10-09 杭州海康威视数字技术股份有限公司 Object detection method and device
CN109584248A (en) * 2018-11-20 2019-04-05 西安电子科技大学 Infrared surface object instance dividing method based on Fusion Features and dense connection network
CN109711295A (en) * 2018-12-14 2019-05-03 北京航空航天大学 A kind of remote sensing image offshore Ship Detection

Family Cites Families (1)

Publication number Priority date Publication date Assignee Title
US9965719B2 (en) * 2015-11-04 2018-05-08 Nec Corporation Subcategory-aware convolutional neural networks for object detection



Similar Documents

Publication Publication Date Title
CN111046880B (en) Infrared target image segmentation method, system, electronic equipment and storage medium
Ma et al. GANMcC: A generative adversarial network with multiclassification constraints for infrared and visible image fusion
CN108764372B (en) Construction method and device, mobile terminal, the readable storage medium storing program for executing of data set
CN108304873B (en) Target detection method and system based on high-resolution optical satellite remote sensing image
Xie et al. Multilevel cloud detection in remote sensing images based on deep learning
Wang et al. Data-driven based tiny-YOLOv3 method for front vehicle detection inducing SPP-net
CN113065558A (en) Lightweight small target detection method combined with attention mechanism
CN110929593B (en) Real-time significance pedestrian detection method based on detail discrimination
CN112446270A (en) Training method of pedestrian re-identification network, and pedestrian re-identification method and device
CN107247930A (en) SAR image object detection method based on CNN and Selective Attention Mechanism
CN112598713A (en) Offshore submarine fish detection and tracking statistical method based on deep learning
CN109714526B (en) Intelligent camera and control system
CN109977899B (en) Training, reasoning and new variety adding method and system for article identification
Zhao et al. An adaptation of CNN for small target detection in the infrared
CN111695373A (en) Zebra crossing positioning method, system, medium and device
CN111832508B (en) DIE _ GA-based low-illumination target detection method
CN116612272A (en) Intelligent digital detection system for image processing and detection method thereof
Wang et al. Deep learning-based human activity analysis for aerial images
CN115953312A (en) Joint defogging detection method and device based on single image and storage medium
CN106934344B (en) quick pedestrian detection method based on neural network
Xu et al. Infrared image semantic segmentation based on improved deeplab and residual network
CN115984712A (en) Multi-scale feature-based remote sensing image small target detection method and system
Santhaseelan et al. Automated whale blow detection in infrared video
Yu et al. Haze removal algorithm using color attenuation prior and guided filter
Palanivel et al. Object Detection and Recognition In Dark Using YOLO

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant