CN116935033A - Infrared target detection and identification method based on convolutional neural network - Google Patents
- Publication number
- CN116935033A (Application No. CN202310879338.8A)
- Authority
- CN
- China
- Prior art keywords
- target
- pooling
- layer
- convolutional neural network
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
- G06V10/25 — Determination of region of interest [ROI] or a volume of interest [VOI]
- G06N3/0464 — Convolutional networks [CNN, ConvNet]
- G06N3/084 — Backpropagation, e.g. using gradient descent
- G06T7/246 — Analysis of motion using feature-based methods, e.g. the tracking of corners or segments
- G06V10/82 — Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
- G06T2207/10016 — Video; Image sequence
- G06T2207/10048 — Infrared image
- G06T2207/20081 — Training; Learning
- G06T2207/20084 — Artificial neural networks [ANN]
- Y02T10/40 — Engine management systems
Abstract
The invention relates to an infrared target detection and identification method based on a convolutional neural network, which comprises the following steps: S1, reading pictures: collecting a plurality of frames from the video of a predetermined area as input images; S2, building and training a convolutional neural network model; S3, performing target detection on the images to be identified from S1 with the convolutional neural network model to obtain the target probability; S4, if a target is detected in S3, recording the size and motion information of the target and displaying the detection result; S5, if no target is detected in S3, predicting the area where the target may appear from the motion information recorded in S4 to obtain a predicted area, representing the target position by the predicted position, lowering the detection threshold within the predicted area for re-verification, and finally correcting the target track with the detection result and displaying it. The invention achieves accurate detection of long-range targets in infrared images with low signal-to-noise and signal-to-clutter ratios, offers a high target detection rate, and is applicable to a wide range of environments.
Description
Technical Field
The invention relates to the technical field of image recognition, in particular to an infrared target detection and recognition method based on a convolutional neural network.
Background
In general, owing to the detection environment, sensor performance and other factors, acquired infrared images often have a low signal-to-noise ratio, which manifests as low image contrast and makes target detection difficult. Besides low signal-to-noise ratio, practical scenes also pose a low signal-to-clutter ratio problem: as the target passes through the detection scene, bright background interference (cloud layers, leaves, buildings, etc. in the image) may challenge the detection of the target.
In such low signal-to-noise and low signal-to-clutter scenes, conventional infrared target detection algorithms have two shortcomings. First, the hand-designed structural features they rely on lack robustness, so background and small targets cannot be distinguished in complex scenes. Second, although some approaches employ local contrast, the comparison is performed on low-level visual features and lacks awareness and understanding of high-level features. Conventional methods therefore often suffer from low target detection rates and high false-alarm rates.
Thanks to its strong high-level feature extraction capability, the convolutional neural network can overcome the drawbacks that infrared targets offer few conventional features and are difficult to describe, and it has decisively outperformed conventional methods in many target recognition fields.
A long-range infrared target appears on the image as a point or a blob; such a target image has no geometric or texture features, and only the size and trajectory features of the target can be exploited. Based on these two features, the target can be detected by combining detection with prediction, and the convolutional-neural-network-based long-range infrared target extraction method can accurately detect targets in long-range infrared images with low signal-to-noise and signal-to-clutter ratios.
Disclosure of Invention
Aiming at the above problems, the invention provides an infrared target detection and identification method based on a convolutional neural network that achieves a high detection rate and accurate detection of targets in long-range infrared images with low signal-to-noise and signal-to-clutter ratios.
The technical scheme adopted for solving the technical problems is as follows: the infrared target detection and identification method based on the convolutional neural network comprises the following steps:
S1, reading pictures: collecting a plurality of frames from the video of a predetermined area as input images;
S2, building and training a convolutional neural network model, which comprises the following specific steps:
S201, data set preparation: labeling the targets in the image data, and augmenting the training data set with simulated targets;
S202, network structure design: constructing the network, comprising an input layer, hidden layers and an output layer, and initializing the network parameters;
S203, parameter training: feeding the data set of S201 into the network structure of S202 to train the model parameters, obtaining the network model by determining a cost function, evaluating the forward results and updating the weights by backpropagation;
S3, performing target detection on the image to be identified from S1 with the convolutional neural network model to obtain the target probability;
S4, if a target is detected in S3, recording the size and motion information of the target, displaying the detection result, and marking the target in the image with a box around its position;
S5, if no target is detected in S3, predicting the area where the target may appear from the motion information recorded in S4 to obtain a predicted area; if the target is not detected in the predicted area over several consecutive frames, confirming that the target is lost and representing the target position by the predicted position; lowering the detection threshold within the predicted area for verification; and finally correcting the target track with the detection result and displaying it, marking the target in the image with a box around its position.
Preferably, the hidden layers in step S202 comprise 3 convolution layers, 2 pooling layers and 1 fully connected layer.
Convolution layer: used for feature extraction; a convolution layer contains a plurality of convolution kernels, and each element of a kernel corresponds to a weight coefficient and a bias.
Given an image $X \in \mathbb{R}^{M \times N}$ and a filter $W \in \mathbb{R}^{U \times V}$, generally with $U \ll M$ and $V \ll N$, the convolution is
$$y_{ij} = \sum_{u=1}^{U} \sum_{v=1}^{V} w_{uv}\, x_{i-u+1,\, j-v+1}.$$
Pooling layer: the pooling layer applies a preset pooling function, which replaces the value of a single point in the feature map with a statistic of its neighboring region.
Common pooling operators include max pooling and mean pooling:
Max pooling (Max Pooling): for a region $R_{m,n}^{d}$, take the maximum activity value of all neurons in the region as its representation:
$$y_{m,n}^{d} = \max_{i \in R_{m,n}^{d}} x_i,$$
where $x_i$ is the activity value of each neuron within region $R_{m,n}^{d}$.
Mean pooling (Mean Pooling): for a region $R_{m,n}^{d}$, take the average of all neuron activity values in the region as its representation:
$$y_{m,n}^{d} = \frac{1}{\left|R_{m,n}^{d}\right|} \sum_{i \in R_{m,n}^{d}} x_i.$$
Fully connected layer: the fully connected layer has no feature extraction capability of its own; instead it combines the extracted features nonlinearly to produce the output, so the target feature map loses its spatial topology and is expressed as a vector.
Compared with the prior art, the invention has the following beneficial effects:
the invention constructs a constructed convolutional neural network model for target detection, can realize accurate detection on targets of far-distance infrared images with low signal-to-noise/noise ratio, has high target detection rate, greatly reduces the influence of complex background interference factors, has wider application environment range and high automation degree, and can meet the practical requirements.
Drawings
FIG. 1 is a flow chart of the infrared target detection and identification process of the present invention;
FIG. 2 is a schematic diagram of the convolutional neural network model-based target detection of the present invention;
FIG. 3 is a schematic diagram of the construction of a convolutional neural network of the present invention;
FIG. 4 is a schematic diagram of experimental result 1 in the example of the present invention;
FIG. 5 is a schematic diagram of experimental result 2 in the example of the present invention.
Detailed Description
The present invention will now be described in detail with reference to fig. 1-5, wherein the exemplary embodiments and descriptions of the present invention are provided for illustration of the present invention and are not intended to be limiting.
The infrared target detection and identification method based on the convolutional neural network comprises the following steps:
S1, reading pictures: collecting a plurality of frames from the video of a predetermined area as input images;
S2, building and training a convolutional neural network model, which comprises the following specific steps:
S201, data set preparation: labeling the targets in the image data, and augmenting the training data set with simulated targets;
S202, network structure design: constructing the network, comprising an input layer, hidden layers and an output layer, and initializing the network parameters;
the invention comprehensively considers the balance between efficiency and effect to further determine the layer number of the network. In order to ensure the effect of the method, namely the detection rate and the omission rate of the target, the hidden layers comprise 3 convolution layers, 2 pooling layers and 1 full connection layer,
Convolution layer: used for feature extraction; a convolution layer contains a plurality of convolution kernels, and each element of a kernel corresponds to a weight coefficient and a bias.
Given an image $X \in \mathbb{R}^{M \times N}$ and a filter $W \in \mathbb{R}^{U \times V}$, generally with $U \ll M$ and $V \ll N$, the convolution is
$$y_{ij} = \sum_{u=1}^{U} \sum_{v=1}^{V} w_{uv}\, x_{i-u+1,\, j-v+1}.$$
Pooling layer: the pooling layer applies a preset pooling function, which replaces the value of a single point in the feature map with a statistic of its neighboring region.
Common pooling operators include max pooling and mean pooling:
Max pooling (Max Pooling): for a region $R_{m,n}^{d}$, take the maximum activity value of all neurons in the region as its representation:
$$y_{m,n}^{d} = \max_{i \in R_{m,n}^{d}} x_i,$$
where $x_i$ is the activity value of each neuron within region $R_{m,n}^{d}$.
Mean pooling (Mean Pooling): for a region $R_{m,n}^{d}$, take the average of all neuron activity values in the region as its representation:
$$y_{m,n}^{d} = \frac{1}{\left|R_{m,n}^{d}\right|} \sum_{i \in R_{m,n}^{d}} x_i.$$
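A small numeric sketch of the two pooling operators, assuming non-overlapping 2x2 regions (the region size is an illustrative choice):

```python
import numpy as np

def pool2d(x: np.ndarray, size: int = 2, mode: str = "max") -> np.ndarray:
    """Replace each size x size region R_{m,n} of the feature map with
    its maximum (max pooling) or mean (mean pooling) activity value."""
    M, N = x.shape
    r = x[:M - M % size, :N - N % size].reshape(
        M // size, size, N // size, size)
    return r.max(axis=(1, 3)) if mode == "max" else r.mean(axis=(1, 3))

x = np.array([[1., 3., 2., 0.],
              [4., 2., 1., 5.],
              [0., 1., 3., 2.],
              [2., 2., 4., 0.]])
print(pool2d(x, 2, "max"))   # [[4. 5.] [2. 4.]]
print(pool2d(x, 2, "mean"))  # [[2.5  2.  ] [1.25 2.25]]
```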
Fully connected layer: the fully connected layer has no feature extraction capability of its own; instead it combines the extracted features nonlinearly to produce the output, so the target feature map loses its spatial topology and is expressed as a vector.
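Putting the layers together, the hidden part of the network could be sketched in PyTorch as below; the channel counts, kernel sizes, 32x32 single-channel input and two-class output are assumptions for illustration, since the text fixes only the layer counts (3 convolution, 2 pooling, 1 fully connected):

```python
import torch
import torch.nn as nn

class InfraredTargetNet(nn.Module):
    """Hidden layers per the text: 3 conv, 2 max-pool, 1 fully connected."""
    def __init__(self, num_classes: int = 2):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 16, kernel_size=3, padding=1),   # convolution 1
            nn.ReLU(),
            nn.MaxPool2d(2),                              # pooling 1
            nn.Conv2d(16, 32, kernel_size=3, padding=1),  # convolution 2
            nn.ReLU(),
            nn.MaxPool2d(2),                              # pooling 2
            nn.Conv2d(32, 64, kernel_size=3, padding=1),  # convolution 3
            nn.ReLU(),
        )
        # Flattening discards the spatial topology, leaving a vector that
        # the fully connected layer maps to target/background scores.
        self.classifier = nn.Linear(64 * 8 * 8, num_classes)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.classifier(self.features(x).flatten(1))

model = InfraredTargetNet()
scores = model(torch.randn(1, 1, 32, 32))  # one assumed 32x32 infrared patch
probs = torch.softmax(scores, dim=1)       # target probability as in S3
```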
S203, parameter training: feeding the data set of S201 into the network structure of S202 to train the model parameters, obtaining the network model by determining a cost function, evaluating the forward results and updating the weights by backpropagation.
The training process takes as input single-frame images together with the corresponding target labels such as bounding boxes and masks; the network structure and cost function must be designed, and the parameters updated with a suitable optimizer. The inference process takes a single-frame image as input, and a suitable evaluation scheme must be designed to measure the similarity between the inference result and the ground-truth label.
S3, performing target detection on the image to be identified from S1 with the convolutional neural network model to obtain the target probability;
S4, if a target is detected in S3, recording the size and motion information of the target and displaying the detection result; this step stores the dynamic characteristics of the target and provides a prior for accurately re-screening the target later, marking the target in the image with a box around its position;
S5, if no target is detected in S3, predicting the area where the target may appear from the motion information recorded in S4 to obtain a predicted area; the purpose of this step is to screen out a suspected region and avoid full-image detection, reducing the amount of data processed and hence the running time of the algorithm; if the target is not detected in the predicted area over several consecutive frames, the target is confirmed lost and its position is represented by the predicted position; the detection threshold is lowered within the predicted area for verification, ensuring the target is not missed there; finally the target track is corrected with the detection result and displayed, providing prior knowledge for subsequent detection and marking the target in the image with a box around its position.
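For illustration, the S3-S5 logic could be sketched as the simplified detect-or-predict loop below; `detect` (a callable wrapping the CNN of S3), the threshold values, and the miss count before declaring the target lost are hypothetical placeholders:

```python
from dataclasses import dataclass
from typing import Callable, Iterable, Optional, Tuple

Box = Tuple[int, int, int, int]              # x, y, width, height

@dataclass
class Detection:
    box: Box
    velocity: Tuple[int, int] = (0, 0)       # recorded motion information

def predict_region(track: list) -> Box:
    """S5: extrapolate the last box using the recorded motion."""
    x, y, w, h = track[-1].box
    dx, dy = track[-1].velocity
    return (x + dx, y + dy, w, h)

def run(frames: Iterable, detect: Callable) -> None:
    NORMAL_THR, LOW_THR, LOST_AFTER = 0.5, 0.3, 5   # assumed values
    track, misses = [], 0
    for frame in frames:
        roi = predict_region(track) if track else None
        thr = LOW_THR if roi else NORMAL_THR        # lower threshold in ROI
        hit: Optional[Detection] = detect(frame, roi, thr)
        if hit is not None:                         # S4: record and display
            misses = 0
            track.append(hit)                       # size + motion info
        else:                                       # S5: predict instead
            misses += 1
            if track and misses >= LOST_AFTER:      # target confirmed lost;
                track.append(Detection(predict_region(track)))  # use prediction
```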
During implementation, to verify the feasibility and effectiveness of the method, the project team collected relevant data and ran experiments following the above flow; partial results are shown in Figs. 4 and 5, which are two frames of the captured video. The scene contains a large number of trees and the intensity difference between the target and the interference is very small, so the target is difficult to detect accurately with conventional methods, whereas the technical route of this project detects it reliably, preliminarily verifying the feasibility and effectiveness of the method.
The foregoing describes in detail the technical solutions provided by the embodiments of the present invention, using specific examples to illustrate their principles and implementations; the above description of the embodiments is intended only to aid understanding of those principles. Those skilled in the art may vary the specific embodiments and the scope of application according to the ideas of the embodiments, and this description should not be construed as limiting the invention.
Claims (2)
1. The infrared target detection and identification method based on the convolutional neural network is characterized by comprising the following steps:
S1, reading pictures: collecting a plurality of frames from the video of a predetermined area as input images;
S2, building and training a convolutional neural network model, which comprises the following specific steps:
S201, data set preparation: labeling the targets in the image data, and augmenting the training data set with simulated targets;
S202, network structure design: constructing the network, comprising an input layer, hidden layers and an output layer, and initializing the network parameters;
S203, parameter training: feeding the data set of S201 into the network structure of S202 to train the model parameters, obtaining the network model by determining a cost function, evaluating the forward results and updating the weights by backpropagation;
S3, performing target detection on the image to be identified from S1 with the convolutional neural network model to obtain the target probability;
S4, if a target is detected in S3, recording the size and motion information of the target, displaying the detection result, and marking the target in the image with a box around its position;
S5, if no target is detected in S3, predicting the area where the target may appear from the motion information recorded in S4 to obtain a predicted area; if the target is not detected in the predicted area over several consecutive frames, confirming that the target is lost and representing the target position by the predicted position; lowering the detection threshold within the predicted area for verification; and finally correcting the target track with the detection result and displaying it, marking the target in the image with a box around its position.
2. The method for detecting and identifying an infrared target based on a convolutional neural network according to claim 1, wherein the hidden layers in step S202 comprise 3 convolutional layers, 2 pooling layers and 1 fully connected layer;
convolution layer: used for feature extraction; a convolution layer contains a plurality of convolution kernels, and each element of a kernel corresponds to a weight coefficient and a bias;
given an image $X \in \mathbb{R}^{M \times N}$ and a filter $W \in \mathbb{R}^{U \times V}$, generally with $U \ll M$ and $V \ll N$, the convolution is
$$y_{ij} = \sum_{u=1}^{U} \sum_{v=1}^{V} w_{uv}\, x_{i-u+1,\, j-v+1};$$
pooling layer: the pooling layer applies a preset pooling function, which replaces the value of a single point in the feature map with a statistic of its neighboring region;
common pooling operators include max pooling and mean pooling:
max pooling (Max Pooling): for a region $R_{m,n}^{d}$, take the maximum activity value of all neurons in the region as its representation:
$$y_{m,n}^{d} = \max_{i \in R_{m,n}^{d}} x_i,$$
where $x_i$ is the activity value of each neuron within region $R_{m,n}^{d}$;
mean pooling (Mean Pooling): for a region $R_{m,n}^{d}$, take the average of all neuron activity values in the region as its representation:
$$y_{m,n}^{d} = \frac{1}{\left|R_{m,n}^{d}\right|} \sum_{i \in R_{m,n}^{d}} x_i;$$
fully connected layer: the fully connected layer has no feature extraction capability of its own; instead it combines the extracted features nonlinearly to produce the output, so the target feature map loses its spatial topology and is expressed as a vector.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202310879338.8A CN116935033A (en) | 2023-07-18 | 2023-07-18 | Infrared target detection and identification method based on convolutional neural network |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202310879338.8A CN116935033A (en) | 2023-07-18 | 2023-07-18 | Infrared target detection and identification method based on convolutional neural network |
Publications (1)
Publication Number | Publication Date |
---|---|
CN116935033A true CN116935033A (en) | 2023-10-24 |
Family
ID=88376826
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202310879338.8A Pending CN116935033A (en) | 2023-07-18 | 2023-07-18 | Infrared target detection and identification method based on convolutional neural network |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN116935033A (en) |
Cited By (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN117475262A (en) * | 2023-12-26 | 2024-01-30 | 苏州镁伽科技有限公司 | Image generation method and device, storage medium and electronic equipment |
CN117475262B (en) * | 2023-12-26 | 2024-03-19 | 苏州镁伽科技有限公司 | Image generation method and device, storage medium and electronic equipment |
Legal Events
Date | Code | Title | Description
---|---|---|---
| PB01 | Publication | |
| SE01 | Entry into force of request for substantive examination | |