CN111046880A - Infrared target image segmentation method and system, electronic device and storage medium - Google Patents

Infrared target image segmentation method and system, electronic device and storage medium

Info

Publication number
CN111046880A
CN111046880A (application CN201911195519.9A)
Authority
CN
China
Prior art keywords
infrared target
infrared
target image
candidate frame
roi
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201911195519.9A
Other languages
Chinese (zh)
Other versions
CN111046880B (en)
Inventor
荆楠
张智杰
雷波
谭海
孙钢波
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
717th Research Institute of CSIC
Original Assignee
717th Research Institute of CSIC
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 717th Research Institute of CSIC filed Critical 717th Research Institute of CSIC
Priority to CN201911195519.9A priority Critical patent/CN111046880B/en
Publication of CN111046880A publication Critical patent/CN111046880A/en
Application granted granted Critical
Publication of CN111046880B publication Critical patent/CN111046880B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Links

Images

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/20Image preprocessing
    • G06V10/26Segmentation of patterns in the image field; Cutting or merging of image elements to establish the pattern region, e.g. clustering-based techniques; Detection of occlusion
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/24Classification techniques
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • G06T7/10Segmentation; Edge detection
    • G06T7/11Region-based segmentation
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • G06T7/10Segmentation; Edge detection
    • G06T7/194Segmentation; Edge detection involving foreground-background segmentation
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/20Image preprocessing
    • G06V10/25Determination of region of interest [ROI] or a volume of interest [VOI]
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/20Image preprocessing
    • G06V10/28Quantising the image, e.g. histogram thresholding for discrimination between background and foreground patterns
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/10Image acquisition modality
    • G06T2207/10048Infrared image
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/20Special algorithmic details
    • G06T2207/20081Training; Learning
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/20Special algorithmic details
    • G06T2207/20084Artificial neural networks [ANN]
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V2201/00Indexing scheme relating to image or video recognition or understanding
    • G06V2201/07Target detection

Abstract

The invention provides an infrared target image segmentation method, system, electronic device and storage medium. The method comprises the following steps: collecting infrared target images under various scenes as a training data set; preprocessing the infrared target images, labeling target instances and creating pixel-level binary masks; extracting multi-level-resolution feature maps of the infrared target images and presetting candidate frames of different sizes pixel by pixel; inputting the candidate frames into a region proposal network for binary classification and bounding-box refinement; filtering out background candidate frames and performing an ROI Align operation on the foreground candidate frames to obtain ROI regions; and performing N-class classification, bounding-box regression and binary-mask generation on the ROI regions to finally obtain a trained infrared target image segmentation model. The method addresses the difficulty of guaranteeing real-time performance of existing image processing methods in complex scenes; it is applicable to target detection in infrared images under various complex scenes while achieving real-time infrared image processing and reducing the amount of computation.

Description

Infrared target image segmentation method and system, electronic device and storage medium
Technical Field
The present invention relates to the field of image processing, and in particular, to an infrared target image segmentation method, system, electronic device, and storage medium.
Background
In the field of computer vision, target detection is a classical research direction widely applied to traffic monitoring, image retrieval, human-computer interaction and the like. Infrared target detection, an important branch of computer image processing, suits situations where a target's color and form resemble the surrounding environment, and can be applied in fields such as security monitoring, military reconnaissance, night driving and shipping. An infrared image reflects the relative temperature information of an object and is little affected by weather; compared with visible-spectrum imaging equipment such as illumination cameras and night-vision devices, infrared imaging offers advantages such as longer detection distance and higher detection reliability, but suffers from drawbacks such as low resolution and blurred details.
Among common visible-light target segmentation methods at present, threshold-based segmentation is easy to implement in engineering and computationally cheap, but it struggles with scenes that have complex backgrounds and heavy interference, and is prone to false and missed detections; more complex methods such as convolutional neural networks obtain better detection results but are computationally heavy and hard to run in real time.
Therefore, it is necessary to provide an infrared target image segmentation method capable of processing complex scenes and realizing real-time processing.
Disclosure of Invention
In view of this, the embodiments of the present invention provide an infrared target image segmentation method to solve the problem that existing image segmentation methods are difficult to adapt to complex application scenes while guaranteeing real-time image processing.
In a first aspect of the embodiments of the present invention, a method for segmenting an infrared target image is provided, including:
acquiring infrared target images under various scenes according to the characteristic attributes of the infrared target to be detected;
preprocessing the infrared target image, labeling target instances in the infrared target image with a labeling tool, and creating a pixel-level binary mask;
extracting multi-level-resolution feature maps from the infrared target image through a pre-trained ResNet network, and presetting a predetermined number of prior frames of different sizes, pixel by pixel, in the multi-level-resolution feature maps;
inputting the prior frames as candidate frames into a region proposal network for foreground/background binary classification and bounding-box refinement, removing candidate frames belonging to the background class, and performing an ROI Align operation on the remaining foreground candidate frames to obtain ROI regions;
and performing N-class classification, bounding-box regression and binary-mask generation on the ROI regions, respectively, to obtain a trained infrared target image segmentation model.
In a second aspect of the embodiments of the present invention, there is provided an infrared target image segmentation system, including:
the acquisition module is used for acquiring infrared target images under various scenes according to the characteristic attributes of the infrared target to be detected;
the labeling module is used for preprocessing the infrared target image, labeling target instances in the infrared target image with a labeling tool, and creating a pixel-level binary mask;
the extraction module is used for extracting multi-level-resolution feature maps from the infrared target image through a pre-trained ResNet network and presetting a predetermined number of prior frames of different sizes, pixel by pixel, in the multi-level-resolution feature maps;
the input module is used for inputting the prior frames as candidate frames into a region proposal network for foreground/background binary classification and bounding-box refinement, removing candidate frames belonging to the background class, and performing an ROI Align operation on the remaining foreground candidate frames to obtain ROI regions;
and the processing module is used for performing N-class classification, bounding-box regression and binary-mask generation on the ROI regions, respectively, to obtain a trained infrared target image segmentation model.
In a third aspect of the embodiments of the present invention, there is provided an electronic device, including a memory, a processor, and a computer program stored in the memory and executable by the processor, where the processor executes the computer program to implement the steps of the method according to the first aspect of the embodiments of the present invention.
In a fourth aspect of the embodiments of the present invention, a computer-readable storage medium is provided, which stores a computer program, which when executed by a processor implements the steps of the method provided by the first aspect of the embodiments of the present invention.
In the embodiment of the invention, infrared target images under various scenes are collected as a training data set according to the characteristic attributes of the infrared target to be detected. The infrared target images are preprocessed, a labeling tool labels the target instances and creates pixel-level binary masks, multi-level-resolution feature maps of the infrared target images are extracted through a pre-trained ResNet network, and a number of prior frames of different sizes are preset pixel by pixel in the feature maps. The prior frames are input as candidate frames to a region proposal network for foreground/background binary classification and bounding-box refinement; background candidate frames are filtered out, the proposed regions are aligned to obtain ROI regions, and the ROI regions undergo N-class classification, bounding-box regression and binary-mask generation. The finally trained infrared target image segmentation model performs target detection and segmentation in infrared images. The method adapts to target detection in infrared images under various complex scenes, has strong anti-interference capability, ensures detection accuracy, and keeps the detection and recognition processes simple and fast. It solves the problems that existing infrared image target detection methods have complex calculation processes in complex scenes and struggle to guarantee real-time performance; it reduces the amount of computation, improves infrared image processing and analysis capability, realizes real-time target detection in complex scenes, and provides pixel-level target acquisition and segmentation, giving it high practical value.
Drawings
In order to more clearly illustrate the technical solutions in the embodiments of the present invention, the drawings needed to be used in the embodiments or the prior art descriptions will be briefly described below, and it is obvious that the drawings described below are only some embodiments of the present invention, and it is obvious for those skilled in the art to obtain other drawings based on these drawings without creative efforts.
Fig. 1 is a schematic flowchart of an infrared target image segmentation method according to an embodiment of the present invention;
fig. 2 is a schematic structural diagram of an infrared target image segmentation system according to an embodiment of the present invention.
Detailed Description
In order to make the objects, features and advantages of the present invention more obvious and understandable, the technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the accompanying drawings in the embodiments of the present invention, and it is obvious that the embodiments described below are only a part of the embodiments of the present invention, and not all of the embodiments. Based on the embodiments of the present invention, all other embodiments obtained by persons skilled in the art without any inventive work shall fall within the protection scope of the present invention, and the principle and features of the present invention shall be described below with reference to the accompanying drawings.
The terms "comprises" and "comprising," when used in this specification and claims, and in the accompanying drawings and figures, are intended to cover non-exclusive inclusions, such that a process, method or system, or apparatus that comprises a list of steps or elements is not limited to the listed steps or elements.
Referring to fig. 1, a schematic flow chart of an infrared target image segmentation method according to an embodiment of the present invention includes:
s101, acquiring infrared target images under various scenes according to the characteristic attributes of the infrared target to be detected;
the infrared target to be detected is a target object detected by infrared signals of detected target radiation, and can be a person, a vehicle, an animal and the like generally. The characteristic property of the infrared target may be the size of the target, the radiation intensity, etc. The multiple scenes refer to multiple complex scenes, such as regional scenes with dense people flow and traffic flow, infrared target images under at least three complex scenes can be acquired, and in order to improve the accuracy of infrared target detection, the acquired scenes can be increased.
Optionally, the infrared thermal imager is used for shooting infrared images of different targets in different complex scenes, and focusing, zooming and exposure parameters of the infrared thermal imager are continuously changed to form an infrared target image data set.
The shooting scene, the target, the focal length parameter, the exposure parameter and the like of the infrared image in the infrared target data set can be different, and the diversity of the data set is ensured.
S102, preprocessing the infrared target image, marking a target instance in the infrared target image through a marking tool, and manufacturing a pixel-level binary mask;
the preprocessing process specifically includes methods such as image rotation and translation, random trimming, color dithering, translation transformation, scale transformation, contrast transformation, noise disturbance and the like, and data enhancement is performed on the infrared target image. And labeling the target instance in the infrared target image by using a LabelImg labeling tool.
The binary mask is a binary image composed of 0s and 1s, which controls the image processing flow by occluding the image locally or not at all. Image masks may be defined by specifying data values, data ranges, infinite or finite values, regions of interest, annotation files, and the like.
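As a minimal illustration (not taken from the patent), a pixel-level binary mask for one labeled instance can be built from annotated pixel coordinates as follows; the function name and coordinate format are hypothetical:

```python
import numpy as np

def make_binary_mask(height, width, instance_pixels):
    """Build a pixel-level binary mask: 1 inside the labeled target
    instance, 0 elsewhere. `instance_pixels` is an iterable of (row, col)
    coordinates as an annotation tool might export them."""
    mask = np.zeros((height, width), dtype=np.uint8)
    for r, c in instance_pixels:
        mask[r, c] = 1
    return mask

# A 5x5 image with a 2x2 labeled target region.
pixels = [(1, 1), (1, 2), (2, 1), (2, 2)]
mask = make_binary_mask(5, 5, pixels)
```

In practice the annotation tool would export polygons or run-length encodings rather than raw pixel lists, but the resulting 0/1 image is the same kind of mask the method trains against.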
Specifically, data enhancement is carried out on the infrared target image by image rotation and translation, random cropping, color jittering, scale transformation, contrast transformation and noise perturbation; data augmentation with a class-balance strategy addresses class imbalance in the data; and the infrared target images are randomly shuffled.
In classification learning, large differences in the proportions of different sample classes easily hurt classification accuracy; data augmentation diversifies the data set and improves the generalization ability of the model. During data enhancement the same target picture can appear consecutively, which causes the model to learn the features of one target repeatedly during training and overfit. In this embodiment the data samples are therefore shuffled into random order, which improves model performance during training.
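The class-balancing and shuffling step above can be sketched as follows; the oversampling strategy and function name are illustrative assumptions, not the patent's exact procedure:

```python
import random
from collections import Counter

def balance_and_shuffle(samples, labels, seed=0):
    """Oversample minority classes up to the majority-class count, then
    shuffle the order so the same target never dominates consecutive
    training batches. Returns a list of (sample, label) pairs."""
    rng = random.Random(seed)
    counts = Counter(labels)
    target = max(counts.values())
    balanced = []
    for cls in counts:
        cls_samples = [s for s, l in zip(samples, labels) if l == cls]
        # Repeat minority-class samples cyclically until each class
        # reaches `target` examples.
        reps = [cls_samples[i % len(cls_samples)] for i in range(target)]
        balanced.extend((s, cls) for s in reps)
    rng.shuffle(balanced)
    return balanced
```

Fixing the seed keeps experiments reproducible while still destroying the consecutive-sample ordering that causes overfitting.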
S103, extracting a multi-level resolution size characteristic diagram in the infrared target image through a pre-trained ResNet network, and presetting a predetermined number of prior frames with different sizes for pixel points one by one in the multi-level resolution size characteristic diagram;
the ResNet network model is a residual learning network for feature extraction, the ResNet network is pre-trained through ImageNet classification data sets, and feature maps of infrared target images are extracted by using the trained ResNet network.
Optionally, a convolutional-neural-network-based head outputs a two-branch structure comprising binary classification (foreground and background) and bounding-box refinement regression. Specifically, a 3 × 3 convolution kernel first generates 256- or 512-dimensional features on the multi-level resolution feature maps, and the generated 256- or 512-dimensional features then feed 2 fully connected branches: the regression layer predicts the center-point coordinates and the width and height of the candidate frame, and the classification layer judges whether the candidate frame belongs to the foreground or the background;
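The two-branch head above can be summarized by its output shapes. This is a shape sketch only; the anchor count of 9 (3 scales × 3 aspect ratios) is an assumption commonly used with such heads, not a figure stated in the patent:

```python
def rpn_head_shapes(feat_h, feat_w, channels=256, num_anchors=9):
    """Shape sketch of the two-branch RPN head: a 3x3 conv produces a
    `channels`-dimensional feature at every pixel of the feature map; the
    classification branch outputs 2 scores (foreground/background) per
    prior frame and the regression branch outputs 4 box offsets per
    prior frame."""
    shared = (channels, feat_h, feat_w)            # after the 3x3 conv
    cls_scores = (2 * num_anchors, feat_h, feat_w)  # fg/bg per anchor
    bbox_deltas = (4 * num_anchors, feat_h, feat_w)  # (tx, ty, tw, th)
    return shared, cls_scores, bbox_deltas
```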
The loss function of the RPN (Region Proposal Network) is a weighted sum of a softmax classification loss and a regression loss. The softmax loss is calculated from the foreground/background calibration results and the predictions for the candidate frames; the regression targets are calculated as follows:
tx = (x - xa)/wa, ty = (y - ya)/ha
tw = log(w/wa), th = log(h/ha)
tx* = (x* - xa)/wa, ty* = (y* - ya)/ha
tw* = log(w*/wa), th* = log(h*/ha)
The loss function of the RPN network is calculated as follows:
L({pi}, {ti}) = (1/Ncls) Σi Lcls(pi, pi*) + λ (1/Nreg) Σi pi* Lreg(ti, ti*)
where x and y are the center-point coordinates of the candidate frame predicted by the RPN, w and h are the width and height of the candidate frame, the position information of the prior frame is (xa, ya, wa, ha), and the position information of the ground-truth box is (x*, y*, w*, h*); pi is the predicted foreground probability of the i-th candidate frame, pi* is its ground-truth label, Ncls and Nreg are normalization terms, and λ balances the two loss terms.
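The regression targets above translate directly into code. `encode_box` and `decode_box` are illustrative names, and boxes are assumed to be in (center x, center y, width, height) form as in the formulas:

```python
import math

def encode_box(box, anchor):
    """Compute the RPN regression targets (tx, ty, tw, th) from a box and
    its prior frame (anchor), matching the formulas above."""
    x, y, w, h = box
    xa, ya, wa, ha = anchor
    return ((x - xa) / wa, (y - ya) / ha,
            math.log(w / wa), math.log(h / ha))

def decode_box(t, anchor):
    """Invert the encoding: recover the box from targets and anchor."""
    tx, ty, tw, th = t
    xa, ya, wa, ha = anchor
    return (tx * wa + xa, ty * ha + ya,
            wa * math.exp(tw), ha * math.exp(th))
```

Encoding relative to the anchor makes the regression targets scale-invariant, which is why the same head can refine prior frames of very different sizes.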
S104, inputting the prior frames as candidate frames into a region proposal network for foreground/background binary classification and bounding-box refinement, removing candidate frames belonging to the background class, and performing an ROI Align operation on the remaining foreground candidate frames to obtain ROI regions;
the region nomination network (or region growing network, i.e. RPN) can classify the corresponding target or non-target in the candidate box and perform frame modification on the candidate box. The classified candidate frames can be divided into foreground candidate frames and background candidate frames. The ROI region is a region of interest in image processing.
The ROI Align operation determines the feature value at each sampling point of the RoI region via bilinear interpolation on the feature map, then applies operations such as max or average pooling. This improves precision and solves the misalignment introduced when the pooling process samples directly at rounded (quantized) positions.
Specifically, the ROI region is divided into a predetermined number of bin regions, such as 7 × 7 bin regions, 4 sampling points are selected in each bin region, pixel values of 4 feature points closest to each sampling point are obtained, and the pixel value of each sampling point is determined by bilinear interpolation;
the average or maximum pooling for each bin region is calculated to generate a feature map of the ROI region, i.e., a 7 × 7 size feature map.
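The bilinear interpolation at the heart of ROI Align can be sketched as follows; this minimal version samples one fractional position on a 2-D feature map and is an illustration, not the patent's implementation:

```python
def bilinear_sample(feature, y, x):
    """Sample a value at fractional coordinates (y, x) from a 2-D feature
    map (list of rows) by bilinear interpolation, as ROI Align does for
    each of the 4 sampling points inside a bin."""
    h, w = len(feature), len(feature[0])
    y0, x0 = int(y), int(x)
    y1, x1 = min(y0 + 1, h - 1), min(x0 + 1, w - 1)
    dy, dx = y - y0, x - x0
    # Interpolate horizontally on the two neighboring rows, then
    # vertically between the results.
    top = feature[y0][x0] * (1 - dx) + feature[y0][x1] * dx
    bot = feature[y1][x0] * (1 - dx) + feature[y1][x1] * dx
    return top * (1 - dy) + bot * dy
```

Because the sampling positions stay fractional instead of being rounded to the nearest pixel, the per-bin average or max pooled over these samples avoids the quantization misalignment described above.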
And S105, respectively carrying out N-type classification, bounding box regression and binary mask generation on the ROI to obtain a trained infrared target image segmentation model.
By repeating the steps of S104 (inputting candidate frames into the region proposal network, removing background candidate frames, and obtaining ROI regions) and S105 (classification, bounding-box regression and binary-mask generation on the ROI regions), training of the infrared target image segmentation model can be completed on the infrared target image data; the trained model can then perform fast and accurate target detection and segmentation on infrared images to be detected.
Optionally, the classification branch extracts fully connected features with two fully connected layers and generates N-dimensional features representing the N category scores;
the regression branch extracts fully connected features with two fully connected layers and generates N × 4-dimensional features representing the generated bounding-box coordinates;
the mask branch extracts convolutional features with five fully convolutional layers and generates N × 28 × 28-dimensional features representing the N generated class masks;
wherein the loss function is:
L = Lcls + Lbox + Lmask
where Lcls is the classification branch loss, Lbox is the regression branch loss, and Lmask is the mask branch loss.
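A small sketch of the three head branches' output shapes and the equal-weight multi-task loss described above; the function names and the ROI count are illustrative:

```python
def head_output_shapes(num_classes, num_rois):
    """Output shapes of the three ROI heads: N class scores per ROI,
    N x 4 bounding-box coordinates per ROI, and N binary masks of
    28 x 28 per ROI."""
    cls = (num_rois, num_classes)
    box = (num_rois, num_classes, 4)
    mask = (num_rois, num_classes, 28, 28)
    return cls, box, mask

def total_loss(l_cls, l_box, l_mask):
    """Multi-task loss L = Lcls + Lbox + Lmask with equal weights,
    as stated in the formula above."""
    return l_cls + l_box + l_mask
```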
The method provided by this embodiment solves the problems that existing infrared image detection methods have complex calculation processes in complex scenes and struggle to guarantee real-time performance; it adapts better to complex backgrounds, strengthens anti-interference capability in ground and similar scenes, and expands the application range of the method.
It should be understood that, the sequence numbers of the steps in the foregoing embodiments do not imply an execution sequence, and the execution sequence of each process should be determined by its function and inherent logic, and should not constitute any limitation to the implementation process of the embodiments of the present invention.
Fig. 2 is a schematic structural diagram of an infrared target image segmentation system according to an embodiment of the present invention, where the system includes:
the acquisition module 210 is configured to acquire infrared target images in multiple scenes according to the characteristic attributes of the infrared target to be detected;
optionally, the acquiring infrared target images under multiple scenes according to the characteristic attributes of the infrared target to be detected includes:
the infrared thermal imager is used for shooting infrared images of different targets in different complex scenes, and the focusing, zooming and exposure parameters of the infrared thermal imager are continuously changed to form an infrared target image data set.
The labeling module 220 is configured to preprocess the infrared target image, label target instances in the infrared target image with a labeling tool, and create a pixel-level binary mask;
optionally, the preprocessing the infrared target image includes:
carrying out data enhancement on the infrared target image by image rotation and translation, random cropping, color jittering, scale transformation, contrast transformation and noise perturbation;
performing data augmentation by adopting a class balance strategy based on data class imbalance;
and randomly sequencing the infrared target images.
An extracting module 230, configured to extract a multi-level resolution size feature map in the infrared target image through a pre-trained ResNet network, and preset a predetermined number of prior frames with different sizes for pixel points in the multi-level resolution size feature map one by one;
optionally, the extracting, by the pre-trained ResNet network, the multi-resolution size feature map in the infrared target image, and presetting a predetermined number of prior frames with different sizes for pixel points in the multi-resolution size feature map one by one includes:
firstly, a 3 × 3 convolution kernel generates 256- or 512-dimensional fully connected features on the multi-level resolution feature maps, and the generated 256- or 512-dimensional features then feed 2 fully connected branches;
the regression layer predicts the coordinates and the width and the height of the center point of the candidate frame, and the classification layer judges whether the candidate frame belongs to the foreground or the background;
the loss function of the RPN is a weighted sum of a softmax classification loss and a regression loss. The softmax loss is calculated from the foreground/background calibration results and the predictions for the candidate frames; the regression targets are calculated as follows:
tx = (x - xa)/wa, ty = (y - ya)/ha
tw = log(w/wa), th = log(h/ha)
tx* = (x* - xa)/wa, ty* = (y* - ya)/ha
tw* = log(w*/wa), th* = log(h*/ha)
The loss function of the RPN network is calculated as follows:
L({pi}, {ti}) = (1/Ncls) Σi Lcls(pi, pi*) + λ (1/Nreg) Σi pi* Lreg(ti, ti*)
where x and y are the center-point coordinates of the candidate frame predicted by the RPN, w and h are the width and height of the candidate frame, the position information of the prior frame is (xa, ya, wa, ha), and the position information of the ground-truth box is (x*, y*, w*, h*); pi is the predicted foreground probability of the i-th candidate frame, pi* is its ground-truth label, Ncls and Nreg are normalization terms, and λ balances the two loss terms.
An input module 240, configured to input the prior frames as candidate frames into a region proposal network for foreground/background binary classification and bounding-box refinement, remove candidate frames belonging to the background class, and perform an ROI Align operation on the remaining foreground candidate frames to obtain ROI regions;
optionally, inputting the prior frames as candidate frames into a region proposal network for foreground/background binary classification and bounding-box refinement, removing candidate frames belonging to the background class, and performing an ROI Align operation on the remaining foreground candidate frames to obtain ROI regions includes:
dividing the ROI into a predetermined number of bin regions, selecting 4 sampling points in each bin region, obtaining pixel values of 4 characteristic points nearest to each sampling point, and determining the pixel value of each sampling point through bilinear interpolation;
the average or maximum pooling of each bin region is calculated, generating a feature map of the ROI region.
And the processing module 250 is configured to perform N-class classification, bounding-box regression and binary-mask generation on the ROI regions, respectively, to obtain a trained infrared target image segmentation model.
Optionally, performing N-class classification, bounding-box regression and binary-mask generation on the ROI regions, respectively, to obtain the trained infrared target image segmentation model includes:
the classification branch structure utilizes two full-connection layers to extract full-connection features, and classification branches generate N-dimensional features to express N category scores
the regression branch extracts fully connected features with two fully connected layers and generates N × 4-dimensional features representing the generated bounding-box coordinates;
the mask branch extracts convolutional features with five fully convolutional layers and generates N × 28 × 28-dimensional features representing the N generated class masks;
wherein the loss function is:
L = Lcls + Lbox + Lmask
where Lcls is the classification branch loss, Lbox is the regression branch loss, and Lmask is the mask branch loss.
In an embodiment of the present invention, an electronic device for infrared target image segmentation is provided, which includes a memory, a processor, and a computer program stored in the memory and executable by the processor, and the processor implements the steps of S101 to S105 in the embodiment of the present invention when executing the computer program.
In an embodiment of the present invention, there is also provided a non-transitory computer-readable storage medium storing a computer program which, when executed by a processor, performs the infrared target image segmentation method provided in the above embodiments. Such non-transitory computer-readable storage media include: ROM/RAM, magnetic disks, optical disks, etc.
In the above embodiments, the descriptions of the respective embodiments have respective emphasis, and reference may be made to the related descriptions of other embodiments for parts that are not described or illustrated in a certain embodiment.
The above-mentioned embodiments are only used for illustrating the technical solutions of the present invention, and not for limiting the same; although the present invention has been described in detail with reference to the foregoing embodiments, it will be understood by those of ordinary skill in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some technical features may be equivalently replaced; and such modifications or substitutions do not depart from the spirit and scope of the corresponding technical solutions of the embodiments of the present invention.

Claims (10)

1. An infrared target image segmentation method is characterized by comprising the following steps:
acquiring infrared target images under various scenes according to the characteristic attributes of the infrared target to be detected;
preprocessing the infrared target image, annotating target instances in the infrared target image with an annotation tool, and producing a pixel-level binary mask;
extracting multi-level-resolution feature maps from the infrared target image through a pre-trained ResNet network, and presetting a predetermined number of prior frames of different sizes for each pixel point in the multi-level-resolution feature maps;
inputting the prior frames as candidate frames into a region proposal network for foreground/background binary classification and bounding-box refinement, removing the candidate frames belonging to the background class, and performing an ROI Align operation on the obtained foreground-class candidate frames to obtain ROI regions;
and respectively carrying out N category classification, bounding box regression and binary mask generation on the ROI to obtain a trained infrared target image segmentation model.
2. The method according to claim 1, wherein the acquiring infrared target images in a plurality of scenes according to the characteristic attributes of the infrared target to be detected comprises:
the infrared thermal imager is used for shooting infrared images of different targets in different complex scenes, and the focusing, zooming and exposure parameters of the infrared thermal imager are continuously changed to form an infrared target image data set.
3. The method of claim 1, wherein the pre-processing the infrared target image comprises:
respectively carrying out data enhancement on the infrared target image by methods of image rotation and translation, random cropping, color jittering, translation transformation, scale transformation, contrast transformation and noise disturbance;
performing data augmentation by adopting a class balance strategy based on data class imbalance;
and randomly sequencing the infrared target images.
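The augmentation and class-balance steps of claim 3 could be sketched as follows (a minimal NumPy illustration; all function names, parameter ranges, and class labels are my own, not from the patent):

```python
import numpy as np
from collections import Counter

rng = np.random.default_rng(0)

def augment(img):
    """One pass of simple augmentations in the spirit of claim 3:
    flip, contrast scaling, and additive noise on an image in [0, 1]."""
    out = np.asarray(img, dtype=float).copy()
    if rng.random() < 0.5:
        out = out[:, ::-1]  # horizontal flip as a stand-in for rotation/translation
    out = np.clip(out * rng.uniform(0.8, 1.2), 0.0, 1.0)                 # contrast transformation
    out = np.clip(out + rng.normal(0.0, 0.01, out.shape), 0.0, 1.0)      # noise disturbance
    return out

def oversample_minority(images, labels):
    """Class-balance strategy: duplicate minority-class samples until
    every class has as many samples as the largest class."""
    counts = Counter(labels)
    target = max(counts.values())
    out_imgs, out_labels = list(images), list(labels)
    for cls, n in counts.items():
        pool = [im for im, lb in zip(images, labels) if lb == cls]
        for k in range(target - n):
            out_imgs.append(pool[k % len(pool)])
            out_labels.append(cls)
    return out_imgs, out_labels
```

After augmentation and balancing, shuffling the resulting list gives the random ordering the claim asks for.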
4. The method of claim 1, wherein the extracting multi-level-resolution feature maps from the infrared target image through a pre-trained ResNet network, and presetting a predetermined number of prior frames of different sizes for each pixel point in the multi-level-resolution feature maps comprises:
generating 256- or 512-dimensional fully-connected features on the multi-level-resolution feature maps with a 3 x 3 convolution kernel, and then generating 2 fully-connected layer branches from the generated 256- or 512-dimensional features;
the regression layer predicts the coordinates of the center point and the width and height of the candidate frame, and the classification layer judges whether the candidate frame belongs to the foreground or the background;
the loss function of the RPN network is composed of a softmax loss and a regression loss combined according to a preset weight; the softmax loss is calculated from the calibration results and the prediction results corresponding to the candidate frames, and the regression targets are calculated as follows:

t_x = (x - x_a)/w_a,  t_y = (y - y_a)/h_a

t_w = log(w/w_a),  t_h = log(h/h_a)

t*_x = (x* - x_a)/w_a,  t*_y = (y* - y_a)/h_a

t*_w = log(w*/w_a),  t*_h = log(h*/h_a)

the loss function of the RPN network is calculated as follows:

L({p_i}, {t_i}) = (1/N_cls) Σ_i L_cls(p_i, p*_i) + λ (1/N_reg) Σ_i p*_i L_reg(t_i, t*_i)

wherein x and y are the coordinates of the center point of the candidate frame predicted by the RPN network, w and h are the width and height of the candidate frame, the position information of the prior frame is (x_a, y_a, w_a, h_a), the position information of the real label frame is (x*, y*, w*, h*), p_i is the predicted foreground probability of the i-th candidate frame, and p*_i is its calibration label.
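The box parameterization in claim 4 can be sketched as a pair of inverse functions (a minimal illustration; `encode_box`/`decode_box` and the box layout `(cx, cy, w, h)` are my own conventions, not from the patent):

```python
import math

def encode_box(box, anchor):
    """Regression targets (t_x, t_y, t_w, t_h) of a box (cx, cy, w, h)
    relative to an anchor/prior frame (x_a, y_a, w_a, h_a)."""
    x, y, w, h = box
    xa, ya, wa, ha = anchor
    return ((x - xa) / wa,        # t_x: normalized center-x offset
            (y - ya) / ha,        # t_y: normalized center-y offset
            math.log(w / wa),     # t_w: log width ratio
            math.log(h / ha))     # t_h: log height ratio

def decode_box(t, anchor):
    """Invert the encoding: recover (cx, cy, w, h) from the offsets."""
    tx, ty, tw, th = t
    xa, ya, wa, ha = anchor
    return (tx * wa + xa, ty * ha + ya,
            wa * math.exp(tw), ha * math.exp(th))
```

Encoding a box against its anchor and decoding the result round-trips to the original box, which is why the RPN can regress these offsets instead of raw coordinates.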
5. The method according to claim 1, wherein the inputting the prior frames as candidate frames into a region proposal network for foreground/background binary classification and bounding-box refinement, removing the candidate frames belonging to the background class, and performing an ROI Align operation on the obtained foreground-class candidate frames to obtain ROI regions comprises:
dividing the ROI region into a predetermined number of bins, selecting 4 sampling points in each bin, obtaining the pixel values of the 4 feature points nearest to each sampling point, and determining the pixel value of each sampling point through bilinear interpolation;
the average or maximum pooling of each bin region is calculated, generating a feature map of the ROI region.
6. The method of claim 1, wherein the performing N-class classification, bounding box regression, and binary mask generation on the ROI region respectively to obtain a trained infrared target image segmentation model comprises:
the classification branch extracts fully-connected features using two fully-connected layers and generates N-dimensional features representing the N category scores;
the regression branch extracts fully-connected features using two fully-connected layers and generates N x 4-dimensional features representing the generated bounding-box coordinates;
the mask branch extracts convolution features using five fully-convolutional layers and generates N 28 x 28-dimensional features representing the generated N classes of masks;
wherein the loss function is:
L = L_cls + L_box + L_mask
wherein L_cls is the classification branch loss, L_box is the bounding-box regression branch loss, and L_mask is the mask branch loss.
7. An infrared image target detection system, comprising:
the acquisition module is used for acquiring infrared target images under various scenes according to the characteristic attributes of the infrared target to be detected;
the marking module is used for preprocessing the infrared target image, annotating target instances in the infrared target image with an annotation tool, and producing a pixel-level binary mask;
the extraction module is used for extracting multi-level-resolution feature maps from the infrared target image through a pre-trained ResNet network and presetting a predetermined number of prior frames of different sizes for each pixel point in the multi-level-resolution feature maps;
the input module is used for inputting the prior frames as candidate frames into a region proposal network for foreground/background binary classification and bounding-box refinement, removing the candidate frames belonging to the background class, and performing an ROI Align operation on the obtained foreground-class candidate frames to obtain ROI (region of interest) regions;
and the processing module is used for respectively carrying out N-type classification, bounding box regression and binary mask generation on the ROI area to obtain a trained infrared target image segmentation model.
8. The system according to claim 7, wherein the collecting the infrared target images in a plurality of scenes according to the characteristic attributes of the infrared target to be detected comprises:
the infrared thermal imager is used for shooting infrared images of different targets in different complex scenes, and the focusing, zooming and exposure parameters of the infrared thermal imager are continuously changed to form an infrared target image data set.
9. An electronic device comprising a memory, a processor and a computer program stored in the memory and executable on the processor, characterized in that the processor implements the steps of the infrared target image segmentation method as claimed in any one of claims 1 to 6 when executing the computer program.
10. A computer-readable storage medium, in which a computer program is stored which, when being executed by a processor, carries out the steps of the infrared target image segmentation method as set forth in any one of claims 1 to 6.
CN201911195519.9A 2019-11-28 2019-11-28 Infrared target image segmentation method, system, electronic equipment and storage medium Active CN111046880B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201911195519.9A CN111046880B (en) 2019-11-28 2019-11-28 Infrared target image segmentation method, system, electronic equipment and storage medium


Publications (2)

Publication Number Publication Date
CN111046880A true CN111046880A (en) 2020-04-21
CN111046880B CN111046880B (en) 2023-12-26

Family

ID=70234017

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201911195519.9A Active CN111046880B (en) 2019-11-28 2019-11-28 Infrared target image segmentation method, system, electronic equipment and storage medium

Country Status (1)

Country Link
CN (1) CN111046880B (en)

Cited By (15)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111597920A (en) * 2020-04-27 2020-08-28 东南大学 Full convolution single-stage human body example segmentation method in natural scene
CN111598951A (en) * 2020-05-18 2020-08-28 清华大学 Method, device and storage medium for identifying space target
CN111627029A (en) * 2020-05-28 2020-09-04 北京字节跳动网络技术有限公司 Method and device for acquiring image instance segmentation result
CN111627033A (en) * 2020-05-30 2020-09-04 郑州大学 Hard sample instance segmentation method and device and computer readable storage medium
CN111652930A (en) * 2020-06-04 2020-09-11 上海媒智科技有限公司 Image target detection method, system and equipment
CN112150471A (en) * 2020-09-23 2020-12-29 创新奇智(上海)科技有限公司 Semantic segmentation method and device based on few samples, electronic equipment and storage medium
CN112200115A (en) * 2020-10-21 2021-01-08 平安国际智慧城市科技股份有限公司 Face recognition training method, recognition method, device, equipment and storage medium
CN112307991A (en) * 2020-11-04 2021-02-02 北京临近空间飞行器系统工程研究所 Image recognition method, device and storage medium
CN112614136A (en) * 2020-12-31 2021-04-06 华中光电技术研究所(中国船舶重工集团公司第七一七研究所) Infrared small target real-time instance segmentation method and device
CN112907616A (en) * 2021-04-27 2021-06-04 浙江大学 Pedestrian detection method based on thermal imaging background filtering
CN113177947A (en) * 2021-04-06 2021-07-27 广东省科学院智能制造研究所 Complex environment target segmentation method and device based on multi-module convolutional neural network
CN114034390A (en) * 2021-11-08 2022-02-11 山东大学 Equipment temperature anomaly detection system based on infrared detection
CN114782460A (en) * 2022-06-21 2022-07-22 阿里巴巴达摩院(杭州)科技有限公司 Image segmentation model generation method, image segmentation method and computer equipment
CN117132777A (en) * 2023-10-26 2023-11-28 腾讯科技(深圳)有限公司 Image segmentation method, device, electronic equipment and storage medium
CN112200115B (en) * 2020-10-21 2024-04-19 平安国际智慧城市科技股份有限公司 Face recognition training method, recognition method, device, equipment and storage medium

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20170124415A1 (en) * 2015-11-04 2017-05-04 Nec Laboratories America, Inc. Subcategory-aware convolutional neural networks for object detection
CN107480730A (en) * 2017-09-05 2017-12-15 广州供电局有限公司 Power equipment identification model construction method and system, the recognition methods of power equipment
CN108629354A (en) * 2017-03-17 2018-10-09 杭州海康威视数字技术股份有限公司 Object detection method and device
CN109584248A (en) * 2018-11-20 2019-04-05 西安电子科技大学 Infrared surface object instance dividing method based on Fusion Features and dense connection network
CN109711295A (en) * 2018-12-14 2019-05-03 北京航空航天大学 A kind of remote sensing image offshore Ship Detection





Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant