CN111986240A - Drowning person detection method and system based on visible light and thermal imaging data fusion - Google Patents
- Publication number
- CN111986240A (application number CN202010904133.7A)
- Authority
- CN
- China
- Prior art keywords
- image
- visible light
- fusion
- infrared
- network
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T7/00—Image analysis
- G06T7/30—Determination of transform parameters for the alignment of images, i.e. image registration
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/21—Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
- G06F18/214—Generating training patterns; Bootstrap methods, e.g. bagging or boosting
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/22—Matching criteria, e.g. proximity measures
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/045—Combinations of networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T5/00—Image enhancement or restoration
- G06T5/50—Image enhancement or restoration using two or more images, e.g. averaging or subtraction
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V40/00—Recognition of biometric, human-related or animal-related patterns in image or video data
- G06V40/10—Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2207/00—Indexing scheme for image analysis or image enhancement
- G06T2207/10—Image acquisition modality
- G06T2207/10048—Infrared image
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2207/00—Indexing scheme for image analysis or image enhancement
- G06T2207/20—Special algorithmic details
- G06T2207/20212—Image combination
- G06T2207/20221—Image fusion; Image merging
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V2201/00—Indexing scheme relating to image or video recognition or understanding
- G06V2201/07—Target detection
Abstract
The invention discloses a method and a system for detecting people who have fallen into water based on the fusion of visible light and thermal imaging data. The method comprises the following steps: simultaneously acquiring a visible light image and an infrared image with two cameras, namely an optical camera and an infrared thermal imaging camera; performing image registration on the infrared image and the visible light image; inputting the registered infrared image and visible light image into a pre-trained fusion network, which outputs a fused image; and inputting the fused image into a pre-trained detection network, which outputs a detection result indicating whether a person has fallen into water. Because the method fuses the visible light image and the infrared image, the fused image both highlights the human body and retains texture features, which can greatly improve detection accuracy and recall.
Description
Technical Field
The invention relates to the field of search and rescue, in particular to a drowning person detection method and system based on visible light and thermal imaging data fusion.
Background
Every year, tens of thousands of people die of drowning in accidents such as people, crew members or tourists accidentally falling into water, and ships capsizing or sinking. The main reason is that the water flow is turbulent and the water area is large, so people who have fallen into water are difficult to find and locate. With upgrades in computing hardware and the optimization of artificial intelligence algorithms, image processing and detection have been applied to a variety of problems, but the detection of people who have fallen into water still urgently needs to be solved.
Image fusion is an enhancement technique that combines images acquired by different types of sensors to generate an image that is more robust or more informative, so as to facilitate subsequent processing or support decision making.
The combination of infrared and visible light images is advantageous in several respects. First, the two signals come from different modalities and provide different aspects of scene information: visible light images capture reflected light, while infrared images capture thermal radiation, so the combination is more informative than either single-modality signal. Second, infrared and visible light images exhibit characteristics inherent to almost all objects, and these images can be obtained with relatively simple equipment. Finally, the infrared image and the visible light image have complementary characteristics, so the fused image is robust and rich in information. Visible light images generally have high spatial resolution and considerable detail and contrast, and therefore match human visual perception. However, they are susceptible to adverse conditions such as low light, fog and other bad weather. Infrared images, which describe the thermal radiation of objects, are resistant to these disturbances, but they generally have lower resolution and poorer texture. Because the source images are ubiquitous and complementary, visible and infrared image fusion has a wider range of applications than other fusion techniques.
Visible light and infrared image fusion is of great significance for person detection, especially for detecting people who have fallen into water. If only the visible light image is used, detection is difficult: a person in a turbulent, murky river exposes only a very small part of the body above the water surface and almost blends into the water, so neither the naked eye nor a camera can easily distinguish the person, and even a very good detection algorithm can hardly detect accurately and without omissions. This holds even under good lighting; at night or in fog, detection is not possible at all. The infrared image distinguishes the human body from the background well: the human body is warmer than the river water, so it appears brighter than the water in the infrared image and stands out. However, the infrared image has low resolution and lacks texture features, so only rough contour information can be obtained; if a high-temperature object with a shape similar to that of a person in the water is present in the picture, false detections and missed detections easily occur.
Disclosure of Invention
The present invention aims to overcome the above technical defects. Embodiment 1 of the present invention provides a method for detecting a person who has fallen into water based on visible light and thermal imaging data fusion, the method comprising:
simultaneously acquiring a visible light image and an infrared image with two cameras, namely an optical camera and an infrared thermal imaging camera;
carrying out image registration on the infrared image and the visible light image;
inputting the registered infrared image and visible light image into a pre-trained fusion network, and outputting a fusion image;
and inputting the fused image into a pre-trained detection network, and outputting a detection result of whether the person falls into water or not.
As an improvement of the above method, the image registration of the infrared image and the visible light image specifically includes:
respectively extracting an edge map of the infrared image and an edge map of the visible light image;
aligning the edge map of the infrared image and the edge map of the visible light image to obtain aligned edge maps;
and respectively transforming the infrared image and the visible light image according to the aligned edge maps to obtain the aligned infrared image and visible light image.
As an improvement of the above method, the fusion network comprises a first convolutional layer, a dense block, a fusion layer and a plurality of cascaded convolutional layers which are connected in sequence;
the first convolution layer is used for respectively extracting the depth characteristics of the aligned visible light image and infrared image and outputting the depth characteristics of the visible light image and the infrared image;
the dense block comprises a visible light branch and an infrared branch; the visible light branch comprises three convolution layers which are connected in sequence, and the infrared branch comprises three convolution layers which are connected in sequence; the depth characteristics of the visible light image are respectively used as the input of three convolution layers of the visible light branch, and in the visible light branch, the output of each convolution layer is cascaded into the input of all the convolution layers behind the convolution layer; the depth characteristics of the infrared image are respectively used as the input of three convolution layers of the infrared branch, and in the infrared branch, the output of each convolution layer is cascaded into the input of all the convolution layers behind the convolution layer;
the fusion layer is used for fusing a visible light image characteristic diagram output by the visible light branch and an infrared image characteristic diagram output by the infrared branch by applying L1 norm and softmax operation to output a fusion characteristic diagram;
the plurality of cascaded convolutional layers are used for forming a decoder and converting the fused feature map into a fused picture.
As an improvement of the above method, the loss function $L_{fus}$ of the fusion network is obtained by weighting the pixel loss function $L_p$ and the structural similarity loss function $L_{ssim}$:
$$L_{fus} = \lambda L_{ssim} + L_p$$
$$L_p = \|O - I\|_2$$
$$L_{ssim} = 1 - \mathrm{SSIM}(O, I)$$
where $L_p$ represents the Euclidean distance between the output image $O$ and the input image $I$, and $\mathrm{SSIM}(O, I)$ represents the structural similarity between the output image $O$ and the input image $I$; the structural similarity comprises three components: correlation, luminance loss and contrast distortion; $\lambda = 1000$.
As an improvement of the above method, the detection network is a convolutional neural network (CNN) whose backbone is a modified Darknet-53: the final fully connected layer is removed, and down-sampling is implemented with convolutions instead of pooling layers, forming a fully convolutional network that makes extensive use of residual skip connections;
the input of the detection network is the fused image; the processing is as follows: the fused image is divided into S × S grid cells, and if the center of an object falls in a cell, that cell is responsible for predicting the object; several bounding boxes are predicted for each cell, a confidence is predicted for each bounding box, and prediction is performed cell by cell;
the output of the detection network is three feature maps of different scales, so that targets of different sizes are detected at multiple scales; finally, the predicted bounding boxes, classes and confidences are output to identify people who have fallen into water.
As an improvement of the above method, the loss function $L_{dec}$ of the detection network is the sum of the bounding box error $L_{box}$, the classification error $L_{cls}$ and the confidence error $L_{obj}$:
$$L_{dec} = L_{box} + L_{cls} + L_{obj}$$
where $S$ is the number of grid cells in the horizontal direction, which equals the number in the vertical direction; $B$ refers to the bounding boxes (box); $\mathbb{1}_{ij}^{obj}$ indicates whether the $j$-th anchor box of the $i$-th grid cell is responsible for the object; $w_i$ and $h_i$ are the predicted width and height for the $i$-th grid cell, and $\hat{w}_i$ and $\hat{h}_i$ are the true width and height; $x_i$ and $y_i$ are the predicted center coordinates for the $i$-th grid cell, and $\hat{x}_i$ and $\hat{y}_i$ are the true center coordinates; $\lambda_{coord}$, $\lambda_{class}$, $\lambda_{nobj}$ and $\lambda_{obj}$ are all parameters; $p_i(c)$ is the predicted probability of class $c$, $\hat{p}_i(c)$ is the true probability of class $c$, and classes is the set of classes; $c_i$ is the predicted confidence and $\hat{c}_i$ is the true confidence, whose value is determined by whether the cell is responsible for predicting the object.
As an improvement of the above method, the method further comprises: the step of training the fusion network and the detection network specifically comprises the following steps:
establishing a training set, capturing visible light and infrared images by using a dual-light camera, obtaining a fused image through the registration and fusion processes, and marking the image containing the person falling into the water;
the joint loss function $L$ of the two networks is:
$$L = L_{fus} + L_{dec}$$
and training with the training set, using this loss function and gradient descent, to obtain the parameters of the networks.
The embodiment 2 of the invention provides a drowning person detection system based on visible light and thermal imaging data fusion, which comprises: the system comprises an infrared thermal imaging camera, an optical camera, a trained fusion network, a trained detection network, an image registration module, a fusion module and a detection module;
the image registration module is used for simultaneously acquiring an infrared image acquired by the infrared thermal imaging camera and a visible light image acquired by the optical camera and performing image registration on the infrared image and the visible light image;
the fusion module is used for inputting the registered infrared image and visible light image into a trained fusion network and outputting a fusion image;
and the detection module is used for inputting the fusion image into a trained detection network and outputting a detection result of whether the person falls into water or not.
The invention has the advantages that:
the method and the system fuse the visible light image and the infrared image, so that the image not only highlights the human body, but also contains certain texture characteristics, and the detection accuracy and the recall rate are greatly improved.
Drawings
FIG. 1 is a flow chart of image registration of the present invention;
FIG. 2 is a schematic diagram of the fusion network of the present invention;
FIG. 3 is a schematic diagram of a fusion layer of the fusion network of the present invention;
FIG. 4 is a schematic diagram of a detection network of the present invention;
FIG. 5 is a schematic illustration of the use of multiple scales to detect targets of different sizes;
FIG. 6 is a schematic diagram of the detection-fusion reverse training of the present invention;
FIG. 7 is a flowchart of the method for detecting a person falling into water based on fusion of visible light and thermal imaging data according to the present invention.
Detailed Description
The technical solution of the present invention will be described in detail below with reference to the accompanying drawings.
The embodiment 1 of the invention provides a drowning person detection method based on visible light and thermal imaging data fusion, which comprises the following steps:
Step 1) image acquisition and registration
Step 1-1) image acquisition
A visible light image and an infrared image are acquired simultaneously with two cameras, namely an optical camera and an infrared thermal imaging camera.
Step 1-2) image registration
Since the infrared image and the visible light image are acquired by different sensors, they usually differ in size, perspective and field of view; using two separate cameras also introduces a difference in viewing angle. Successful image fusion, however, requires strict geometric alignment of the images to be fused, so the visible light and infrared images must be registered before fusion. Registration of infrared images with visible light images is a multimodal registration problem.
For the registration problem here, a feature-based registration method is used, which first extracts two groups of salient structures, then determines the correct correspondence between them, and estimates the spatial transformation accordingly, which is then used to align the given image pair.
The first step of the feature-based approach is to extract robust common features that can represent the original images. Edge information is one of the most common choices in infrared and visible image registration, as shown in FIG. 1, because the magnitude and orientation of edges are well preserved across the two modalities. An edge map can be discretized into a set of points, and one popular strategy for solving the point matching problem involves two steps: a set of putative correspondences is computed, and outliers are then removed by geometric constraints. In the first step, feature descriptors are computed at the points, and matches between points whose descriptors differ too much are eliminated. In the second step, random sample consensus (RANSAC) removes false matches from the putative set: following a hypothesize-and-verify scheme, it resamples repeatedly and attempts to obtain the smallest possible outlier-free subset, from which the given parametric model is estimated.
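The hypothesize-and-verify idea can be sketched in a few lines. The example below is only an illustration under stated assumptions: it assumes the putative correspondences between edge points already exist, it assumes a similarity transform (rotation, scale, shift) as the parametric model, which the text does not specify, and the names `fit_similarity` and `ransac_similarity` and all data are hypothetical.

```python
import cmath
import random

def fit_similarity(pair_a, pair_b):
    """Fit w = a*z + b (rotation, scale, shift) from two point pairs.

    Points are complex numbers x + 1j*y; each pair is (source, target).
    """
    (z1, w1), (z2, w2) = pair_a, pair_b
    a = (w1 - w2) / (z1 - z2)
    return a, w1 - a * z1

def ransac_similarity(pairs, iters=200, tol=1.0, seed=0):
    """Hypothesize from a minimal sample, verify by counting inliers."""
    rng = random.Random(seed)
    best_model, best_inliers = None, []
    for _ in range(iters):
        pair_a, pair_b = rng.sample(pairs, 2)
        if pair_a[0] == pair_b[0]:
            continue  # degenerate sample: no transform can be fitted
        a, b = fit_similarity(pair_a, pair_b)
        inliers = [(z, w) for z, w in pairs if abs(a * z + b - w) <= tol]
        if len(inliers) > len(best_inliers):
            best_model, best_inliers = (a, b), inliers
    return best_model, best_inliers

# Hypothetical data: 20 correct matches plus 5 false matches.
true_a, true_b = cmath.rect(0.5, 0.1), complex(10, 5)
good = [(complex(x, y), true_a * complex(x, y) + true_b)
        for x in range(0, 50, 10) for y in range(0, 40, 10)]
bad = [(complex(i + 0.5, 2 * i), complex(500 + 37 * i, -300 - 11 * i * i))
       for i in range(5)]
model, inliers = ransac_similarity(good + bad)
```

The recovered `model` can then be applied to every pixel coordinate of one image to align it with the other. A real system would more likely estimate an affine or projective model from more than two correspondences.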
Step 2) establishing a fusion network for image fusion
A deep learning architecture is employed to address the problem of infrared and visible image fusion. In contrast to a conventional convolutional network, the encoding network combines convolutional layers, a fusion layer and a dense block in which the output of each layer is connected to every subsequent layer. This architecture is used to obtain more useful features from the source images during encoding; the features are then fused with a suitable fusion strategy, and finally the fused image is reconstructed by a decoder.
As shown in FIG. 2, the deep features of the visible light image and the infrared image are extracted before fusion: the first convolutional layer extracts coarse features, and three further convolutional layers (the output of each layer is cascaded into the input of the subsequent layers) constitute a dense block. Such an architecture has two advantages. First, the filter size and the stride of the convolution operations are 3 × 3 and 1, respectively, so the input image can be of any size. Second, the dense block preserves deep features as far as possible in the encoding network, which ensures that all salient features are available to the fusion strategy.
As shown in FIG. 3, the L1 norm and softmax operations are applied in the fusion layer.
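A minimal sketch of such a fusion layer is given below. It assumes that the activity level of each pixel is the L1 norm of its feature vector across channels and that the softmax is taken across the two modalities at each pixel; the text does not spell out these details, and the function names are hypothetical.

```python
import math

def l1_activity(feat):
    """feat: list of C channels, each an H x W nested list.

    Returns an H x W map with the L1 norm across channels at each pixel.
    """
    h, w = len(feat[0]), len(feat[0][0])
    return [[sum(abs(ch[y][x]) for ch in feat) for x in range(w)]
            for y in range(h)]

def softmax_pair(u, v):
    """Numerically stable softmax over two scalars."""
    m = max(u, v)
    eu, ev = math.exp(u - m), math.exp(v - m)
    return eu / (eu + ev), ev / (eu + ev)

def fuse_features(feat_vis, feat_ir):
    """Weight each modality per pixel by the softmax of its L1 activity."""
    act_vis, act_ir = l1_activity(feat_vis), l1_activity(feat_ir)
    fused = []
    for ch_vis, ch_ir in zip(feat_vis, feat_ir):
        out_ch = []
        for y in range(len(ch_vis)):
            row = []
            for x in range(len(ch_vis[0])):
                w_vis, w_ir = softmax_pair(act_vis[y][x], act_ir[y][x])
                row.append(w_vis * ch_vis[y][x] + w_ir * ch_ir[y][x])
            out_ch.append(row)
        fused.append(out_ch)
    return fused

# Hypothetical 2-channel, 1 x 2 feature maps.
feat_vis = [[[2.0, 0.0]], [[-2.0, 0.0]]]
feat_ir = [[[0.0, 3.0]], [[0.0, 3.0]]]
fused = fuse_features(feat_vis, feat_ir)
```

At the first pixel the visible branch is more active and dominates the result; at the second pixel the infrared branch dominates.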
The fusion layer is followed by several convolutional layers (3 × 3 convolutions) that constitute the decoder: the output of the fusion layer is the input of these layers, which convert the fused feature map into the fused image. This simple and efficient architecture is used to reconstruct the final fused image.
The loss function of the fusion network is obtained by weighting the pixel loss function $L_p$ and the structural similarity loss function $L_{ssim}$:
$$L_p = \|O - I\|_2$$
$$L_{ssim} = 1 - \mathrm{SSIM}(O, I)$$
$$L_{fus} = \lambda L_{ssim} + L_p$$
where $O$ and $I$ denote the output image and the input image, respectively. $L_p$ is the Euclidean distance between the output $O$ and the input $I$. SSIM denotes the structural similarity of two images; the index consists mainly of three parts: correlation, luminance loss and contrast distortion, and the product of the three components is the evaluation result of the fused image. Since there is a difference of three orders of magnitude between the pixel loss and the SSIM loss, $\lambda$ is set to 1000 during the training phase.
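The following sketch evaluates this loss for two small images. It is an illustration only: it assumes a global SSIM computed over the whole image rather than over sliding windows, intensities in [0, 1], and the commonly used stabilizing constants 0.01 and 0.03; none of these choices is stated in the text, and `fusion_loss` is a hypothetical name.

```python
import math

def fusion_loss(output, reference, lam=1000.0, data_range=1.0):
    """Return (L_fus, L_p, L_ssim) for two images given as flat lists."""
    n = len(output)
    # Pixel loss: Euclidean distance between output O and input I.
    l_p = math.sqrt(sum((o - i) * (o - i) for o, i in zip(output, reference)))
    # Global SSIM (assumption: a single window covering the whole image).
    mu_o = sum(output) / n
    mu_i = sum(reference) / n
    var_o = sum((o - mu_o) * (o - mu_o) for o in output) / n
    var_i = sum((i - mu_i) * (i - mu_i) for i in reference) / n
    cov = sum((o - mu_o) * (i - mu_i) for o, i in zip(output, reference)) / n
    c1 = (0.01 * data_range) ** 2
    c2 = (0.03 * data_range) ** 2
    ssim = ((2 * mu_o * mu_i + c1) * (2 * cov + c2)) / (
        (mu_o * mu_o + mu_i * mu_i + c1) * (var_o + var_i + c2))
    l_ssim = 1.0 - ssim
    return lam * l_ssim + l_p, l_p, l_ssim

# Hypothetical 4-pixel images.
image = [0.1, 0.5, 0.9, 0.3]
other = [0.2, 0.4, 0.8, 0.35]
same_total, _, _ = fusion_loss(image, image)
diff_total, diff_p, diff_ssim = fusion_loss(other, image)
```

For identical images both terms vanish; for differing images the SSIM term is scaled by λ so that it is comparable in magnitude to the pixel term.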
Step 3) establishing a detection network for detecting the person falling into water
A convolutional neural network (CNN) is adopted to recognize people who have fallen into water. The central idea of the detection network is to divide the picture into S × S grid cells; if the center of an object falls in a cell, that cell is responsible for predicting the object. Several bounding boxes are predicted for each cell, a confidence is predicted for each bounding box, and prediction is performed cell by cell.
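The assignment of an object to a grid cell can be sketched as follows. The function name, the image size of 416 × 416 and the grid size S = 13 are assumptions chosen for illustration.

```python
def responsible_cell(cx, cy, img_w, img_h, s):
    """Return (row, col) of the S x S grid cell containing the object center."""
    col = min(int(cx / img_w * s), s - 1)  # clamp centers on the right edge
    row = min(int(cy / img_h * s), s - 1)  # clamp centers on the bottom edge
    return row, col

# Hypothetical object centered at (208, 104) in a 416 x 416 image.
cell = responsible_cell(208, 104, 416, 416, 13)
```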
The backbone network employs a modified Darknet-53, as shown in FIG. 4. The network has high classification accuracy, high computation speed and fewer network layers. The fully connected layers are removed, so it is a fully convolutional network that makes extensive use of residual skip connections. To reduce the negative effect of pooling on the gradient, pooling layers are abandoned and down-sampling is realized through the stride of the convolutional layers: in this network structure, down-sampling is performed with convolutions of stride 2.
With reference to the feature pyramid network (FPN), the network outputs three feature maps of different scales and detects targets of different sizes at multiple scales, so that finer grid cells can detect finer objects, as shown in FIG. 5.
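The effect of stride-2 convolutions on the feature map size can be checked with the standard convolution output-size formula. The sketch below assumes a 416 × 416 input, 3 × 3 kernels with padding 1, and five down-sampling convolutions as in the published Darknet-53; the modified backbone described here may differ.

```python
def conv_out(n, k=3, stride=2, pad=1):
    """Spatial output size of a convolution along one dimension."""
    return (n + 2 * pad - k) // stride + 1

def feature_map_sizes(n, num_downsamples=5):
    """Sizes after each successive stride-2 convolution."""
    sizes = []
    for _ in range(num_downsamples):
        n = conv_out(n)
        sizes.append(n)
    return sizes

sizes = feature_map_sizes(416)  # assumed input size
scales = sizes[-3:]             # three output scales, coarse cells last
```

Under these assumptions the three output grids are 52 × 52, 26 × 26 and 13 × 13, with the finest grid suited to the smallest targets.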
Before model training, a data set of fused images must first be prepared: visible light and infrared images are captured with the dual-light camera, fused images are obtained through the registration and fusion processes, and the people who have fallen into water are labeled to produce a data set in the format required for training. A pre-trained model is then selected for training, yielding an algorithm model that can identify people who have fallen into water in the fused visible light and infrared images. Indexes such as the accuracy of the model are then evaluated, and the data set, the algorithm and other aspects are optimized to achieve a better recognition effect.
The loss function of the detection network is divided into three parts: the error $L_{box}$ caused by the bounding box, the error $L_{obj}$ caused by the confidence, and the error $L_{cls}$ caused by the class,
where $S$ is the number of grid cells in the horizontal direction, which equals the number in the vertical direction; $B$ refers to the bounding boxes (box); $\mathbb{1}_{ij}^{obj}$ indicates whether the $j$-th anchor box of the $i$-th grid cell is responsible for the object; $w_i$ and $h_i$ are the predicted width and height for the $i$-th grid cell, and $\hat{w}_i$ and $\hat{h}_i$ are the true width and height; $x_i$ and $y_i$ are the predicted center coordinates for the $i$-th grid cell, and $\hat{x}_i$ and $\hat{y}_i$ are the true center coordinates; $\lambda_{coord}$, $\lambda_{class}$, $\lambda_{nobj}$ and $\lambda_{obj}$ are all parameters; $p_i(c)$ is the predicted probability of class $c$, $\hat{p}_i(c)$ is the true probability of class $c$, and classes is the set of classes; $c_i$ is the predicted confidence and $\hat{c}_i$ is the true confidence, whose value is determined by whether the cell is responsible for predicting the object.
The loss function of the detection network is the sum of the above three errors:
$$L_{dec} = L_{box} + L_{cls} + L_{obj}$$
and the joint loss function of the two networks is:
$$L = L_{fus} + L_{dec}$$
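The three error terms are not written out in this text. For orientation only, a squared-error form in the style of grid-based detectors, using the symbols defined above (hats denote true values), might look as follows. This is an assumption and not the patent's formula: the exact terms and weightings may differ, and $\mathbb{1}_{ij}^{noobj}$, the complement of the responsibility indicator $\mathbb{1}_{ij}^{obj}$, is introduced here for illustration.

```latex
\begin{aligned}
L_{box} &= \lambda_{coord} \sum_{i=0}^{S^2} \sum_{j=0}^{B} \mathbb{1}_{ij}^{obj}
  \left[ (x_i - \hat{x}_i)^2 + (y_i - \hat{y}_i)^2
       + (w_i - \hat{w}_i)^2 + (h_i - \hat{h}_i)^2 \right] \\
L_{cls} &= \lambda_{class} \sum_{i=0}^{S^2} \sum_{j=0}^{B} \mathbb{1}_{ij}^{obj}
  \sum_{c \in classes} \left( p_i(c) - \hat{p}_i(c) \right)^2 \\
L_{obj} &= \lambda_{nobj} \sum_{i=0}^{S^2} \sum_{j=0}^{B} \mathbb{1}_{ij}^{noobj}
  (c_i - \hat{c}_i)^2
  + \lambda_{obj} \sum_{i=0}^{S^2} \sum_{j=0}^{B} \mathbb{1}_{ij}^{obj}
  (c_i - \hat{c}_i)^2
\end{aligned}
```

In such a form, the four λ parameters balance localization, classification and the two confidence cases against each other.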
Step 4) detection-fusion reverse training
The purpose of ordinary visible light and infrared image fusion is to make the fused image contain as much information from the two images as possible, losing neither the contrast information of the infrared image nor the texture information of the visible light image; in other words, to make the fused image better suited to the human visual system. The loss function of the initial fusion process is therefore defined as a weighted sum of a pixel loss function and a structural similarity loss function.
The main goal of the method of the invention, however, is to detect people who have fallen into water accurately. The fused image is only an intermediate result, and both the image fusion process and the detection process serve the final goal of accurate detection. To achieve this goal, the training of the image fusion should be corrected so that the loss function of the detection process can guide the fusion, and the final detection result is already optimized in the fusion stage.
As shown in FIG. 6, a person who has fallen into water is first labeled on the registered visible light or infrared image. Because the images are registered and aligned, the position of the target is unchanged after fusion, so the label can be copied onto the fused image as the ground truth. The fused image passes through the detection network to obtain the predicted bounding boxes, classes and confidences, which are compared with the label to obtain the detection error $L_{dec}$. This loss is used to evaluate and optimize not only the detection network but also the fusion network, which is equivalent to correcting the loss function of the fusion network to:
$$L = L_{fus} + L_{dec}$$
In this way, the achievement of the final objective is facilitated.
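The mechanism can be illustrated with a deliberately tiny toy model, in which each network is reduced to a single scalar parameter. Everything in this sketch is an assumption made for illustration: the "fusion network" is a blend weight `a`, the "detection network" is a gain `b`, the fusion loss pulls the fused value toward the mean of the two inputs, and the gradients are derived by hand.

```python
def joint_step(a, b, vis, ir, label, lr=0.1):
    """One gradient-descent step on L = L_fus + L_dec for the toy model."""
    fused = a * vis + (1.0 - a) * ir          # toy fusion network
    ref = 0.5 * (vis + ir)                    # toy reference for L_fus
    l_fus = (fused - ref) ** 2
    l_dec = (b * fused - label) ** 2          # toy detection network and loss
    d_fused_da = vis - ir
    grad_a_fus = 2.0 * (fused - ref) * d_fused_da
    # The detection error also reaches the fusion parameter through `fused`.
    grad_a_dec = 2.0 * (b * fused - label) * b * d_fused_da
    grad_b = 2.0 * (b * fused - label) * fused
    a_new = a - lr * (grad_a_fus + grad_a_dec)
    b_new = b - lr * grad_b
    return a_new, b_new, l_fus + l_dec, grad_a_dec

# Hypothetical pixel intensities and label.
a, b = 0.5, 0.5
losses, dec_grads = [], []
for _ in range(200):
    a, b, loss, grad_a_dec = joint_step(a, b, vis=0.2, ir=0.9, label=1.0)
    losses.append(loss)
    dec_grads.append(grad_a_dec)
```

The nonzero `grad_a_dec` in the first step shows that the detection error changes the fusion parameter even when the fusion loss alone is already at its minimum, which is the point of the corrected loss.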
Step 5) Detecting the person who has fallen into water through image acquisition, image registration, image fusion and target detection, so as to facilitate subsequent positioning and rescue, as shown in FIG. 7.
The embodiment 2 of the invention provides a drowning person detection system based on visible light and thermal imaging data fusion, which comprises: the system comprises an infrared thermal imaging camera, an optical camera, a trained fusion network, a trained detection network, an image registration module, a fusion module and a detection module;
the image registration module is used for simultaneously acquiring an infrared image acquired by the infrared thermal imaging camera and a visible light image acquired by the optical camera and performing image registration on the infrared image and the visible light image;
the fusion module is used for inputting the registered infrared image and visible light image into a trained fusion network and outputting a fusion image;
and the detection module is used for inputting the fusion image into a trained detection network and outputting a detection result of whether the person falls into water or not.
Finally, it should be noted that the above embodiments are only used to illustrate the technical solutions of the present invention and are not limiting. Although the present invention has been described in detail with reference to the embodiments, it will be understood by those skilled in the art that various changes may be made and equivalents may be substituted without departing from the spirit and scope of the invention as defined in the appended claims.
Claims (8)
1. A method of drowning person detection based on visible light and thermal imaging data fusion, the method comprising:
simultaneously acquiring a visible light image and an infrared image with two cameras, namely an optical camera and an infrared thermal imaging camera;
carrying out image registration on the infrared image and the visible light image;
inputting the registered infrared image and visible light image into a pre-trained fusion network, and outputting a fusion image;
and inputting the fused image into a pre-trained detection network, and outputting a detection result of whether the person falls into water or not.
2. The drowning person detection method based on visible light and thermal imaging data fusion according to claim 1, wherein the image registration of the infrared image and the visible light image specifically comprises:
extracting an edge map of the infrared image and an edge map of the visible light image respectively;
aligning the edge map of the infrared image with the edge map of the visible light image to obtain aligned edge maps;
transforming the infrared image and the visible light image respectively according to the aligned edge maps to obtain the aligned infrared image and visible light image.
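The registration steps of claim 2 can be illustrated with a minimal sketch. The claim does not name an edge detector or an alignment model, so the code below assumes a gradient-magnitude edge map and a pure integer translation estimated by phase correlation. The function names `edge_map`, `estimate_shift` and `register` are hypothetical, and a real dual-camera rig would probably need a richer transform (scale, rotation or a homography).

```python
import numpy as np


def edge_map(img):
    """Gradient-magnitude edge map (assumed edge extractor, not from the claim)."""
    gy, gx = np.gradient(img.astype(np.float64))
    return np.hypot(gx, gy)


def estimate_shift(ref_edges, mov_edges):
    """Integer (dy, dx) to apply to mov_edges so that it lines up with ref_edges.

    Uses phase correlation, which assumes the two edge maps differ only by a
    (circular) translation.
    """
    f_ref = np.fft.fft2(ref_edges)
    f_mov = np.fft.fft2(mov_edges)
    cross = f_ref * np.conj(f_mov)
    cross /= np.abs(cross) + 1e-12          # keep phase only
    corr = np.fft.ifft2(cross).real
    dy, dx = np.unravel_index(np.argmax(corr), corr.shape)
    h, w = corr.shape
    # Map peak indices in the upper half of the range to negative shifts.
    if dy > h // 2:
        dy -= h
    if dx > w // 2:
        dx -= w
    return int(dy), int(dx)


def register(visible, infrared):
    """Align the infrared image to the visible image via their edge maps."""
    dy, dx = estimate_shift(edge_map(visible), edge_map(infrared))
    aligned_ir = np.roll(infrared, shift=(dy, dx), axis=(0, 1))
    return visible, aligned_ir, (dy, dx)
```

Aligning on edge maps rather than raw intensities is a natural choice here because infrared and visible images of the same scene can have very different brightness patterns while still sharing object outlines.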
3. The drowning person detection method based on visible light and thermal imaging data fusion according to claim 2, wherein the fusion network comprises a first convolutional layer, a dense block, a fusion layer and a plurality of cascaded convolutional layers connected in sequence;
the first convolutional layer is used for extracting deep features from the aligned visible light image and the aligned infrared image respectively, and outputting the deep features of the visible light image and of the infrared image;
the dense block comprises a visible light branch and an infrared branch, each comprising three convolutional layers connected in sequence; the deep features of the visible light image are used as the input of the three convolutional layers of the visible light branch, and within the visible light branch the output of each convolutional layer is concatenated into the input of every subsequent convolutional layer; the deep features of the infrared image are used as the input of the three convolutional layers of the infrared branch, and within the infrared branch the output of each convolutional layer is concatenated into the input of every subsequent convolutional layer;
the fusion layer is used for fusing the visible light feature map output by the visible light branch and the infrared feature map output by the infrared branch by applying an L1 norm and a softmax operation, and outputting a fused feature map;
the plurality of cascaded convolutional layers form a decoder that converts the fused feature map into a fused image.
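The fusion layer of claim 3 can be sketched as follows. The claim names an L1 norm and a softmax operation but gives no formula, so this sketch assumes that the per-pixel activity of each source is the L1 norm of its feature vector across channels, and that a softmax over the two activities yields the blending weights. The function name `fuse_features` and the `(C, H, W)` layout are hypothetical.

```python
import numpy as np


def fuse_features(feat_vis, feat_ir):
    """Fuse two feature maps of shape (C, H, W) into one (C, H, W) map.

    Assumed scheme: L1 norm over channels gives a per-pixel activity map for
    each source; a softmax across the two sources turns the activities into
    per-pixel weights that sum to 1.
    """
    act_vis = np.abs(feat_vis).sum(axis=0)            # (H, W)
    act_ir = np.abs(feat_ir).sum(axis=0)              # (H, W)
    acts = np.stack([act_vis, act_ir])                # (2, H, W)
    acts = acts - acts.max(axis=0, keepdims=True)     # numerical stability
    weights = np.exp(acts)
    weights /= weights.sum(axis=0, keepdims=True)     # softmax over sources
    # (H, W) weights broadcast over the channel axis.
    return weights[0] * feat_vis + weights[1] * feat_ir
```

With this weighting, a pixel where the infrared features are much stronger than the visible ones (for example a warm body against cold water at night) is taken almost entirely from the infrared branch, and vice versa.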
4. The drowning person detection method based on visible light and thermal imaging data fusion according to claim 3, wherein the loss function $L_{fus}$ of the fusion network is obtained by weighting the pixel loss function $L_p$ and the structural similarity loss function $L_{ssim}$:
$$L_{fus} = \lambda L_{ssim} + L_p$$
$$L_p = \lVert O - I \rVert_2$$
$$L_{ssim} = 1 - \mathrm{SSIM}(O, I)$$
wherein $L_p$ represents the Euclidean distance between the output image $O$ and the input image $I$; $\mathrm{SSIM}(O, I)$ represents the structural similarity between the output image $O$ and the input image $I$, which comprises three components: correlation, luminance loss and contrast distortion; and $\lambda = 1000$.
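The fusion loss of claim 4 can be made concrete with a small numerical sketch. The claim does not say how SSIM is computed, so the code assumes a single global window over the whole image and the commonly used stabilising constants `c1 = (0.01 R)^2` and `c2 = (0.03 R)^2` for data range `R`; practical implementations usually average SSIM over local windows instead. The names `global_ssim` and `fusion_loss` are hypothetical.

```python
import numpy as np


def global_ssim(o, i, data_range=1.0):
    """SSIM computed over one global window (a simplifying assumption)."""
    c1 = (0.01 * data_range) ** 2
    c2 = (0.03 * data_range) ** 2
    mu_o, mu_i = o.mean(), i.mean()
    var_o, var_i = o.var(), i.var()
    cov = ((o - mu_o) * (i - mu_i)).mean()
    num = (2 * mu_o * mu_i + c1) * (2 * cov + c2)
    den = (mu_o ** 2 + mu_i ** 2 + c1) * (var_o + var_i + c2)
    return num / den


def fusion_loss(o, i, lam=1000.0):
    """L_fus = lam * L_ssim + L_p, with lam = 1000 as stated in the claim."""
    l_p = np.linalg.norm(o - i)             # Euclidean distance over all pixels
    l_ssim = 1.0 - global_ssim(o, i)
    return lam * l_ssim + l_p
```

Because SSIM lies in a narrow range while the pixel distance grows with image size, the large weight on the SSIM term keeps the structural term from being swamped by the pixel term.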
5. The drowning person detection method based on visible light and thermal imaging data fusion according to claim 4, wherein the detection network is a convolutional neural network (CNN) whose backbone is a modified Darknet-53, in which the last fully connected layer is removed and convolution is used instead of pooling layers to perform down-sampling, forming a fully convolutional network that uses a plurality of residual skip connections;
the input of the detection network is the fused image; the processing comprises: dividing the fused image into S × S cells, wherein if the center of an object falls in a cell, that cell is responsible for predicting the object; predicting a plurality of bounding boxes for each cell and a confidence for each bounding box, and performing the prediction analysis cell by cell;
the output of the detection network is three feature maps of different scales, so that targets of different sizes are detected at multiple scales; finally, the predicted bounding boxes, classes and confidences are output to identify a person who has fallen into the water.
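The cell-assignment rule of claim 5 (the cell containing an object's center is responsible for it) can be shown in a few lines. The function name `responsible_cell` is hypothetical, and the 416 × 416 image size and S = 13 in the usage line are illustrative assumptions, not values taken from the claims.

```python
def responsible_cell(cx, cy, img_w, img_h, s):
    """Return (row, col) of the S x S grid cell containing the point (cx, cy).

    cx, cy are pixel coordinates of the object's center. Points lying exactly
    on the right or bottom image border are clamped into the last cell.
    """
    col = min(int(cx / img_w * s), s - 1)
    row = min(int(cy / img_h * s), s - 1)
    return row, col


# Illustrative values only: a 416 x 416 fused image divided into 13 x 13 cells.
print(responsible_cell(208, 104, 416, 416, 13))  # (3, 6)
```

Applying the same rule with three different values of S corresponds to the three output scales mentioned in the claim: a coarse grid handles large targets and a fine grid handles small ones.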
6. The drowning person detection method based on visible light and thermal imaging data fusion according to claim 5, wherein the loss function $L_{dec}$ of the detection network is the sum of the bounding box error $L_{box}$, the classification error $L_{cls}$ and the confidence error $L_{obj}$:
$$L_{dec} = L_{box} + L_{cls} + L_{obj}$$
wherein $S$ represents the number of cells in the horizontal direction, which is the same as the number of cells in the vertical direction; $B$ represents the bounding boxes; $\mathbb{1}_{ij}^{obj}$ indicates whether the $j$-th anchor box of the $i$-th cell is responsible for the object; $w_i$ and $h_i$ are the predicted width and height for the $i$-th cell, and $\hat{w}_i$ and $\hat{h}_i$ are the true width and height for the $i$-th cell; $x_i$ and $y_i$ are the predicted center coordinates for the $i$-th cell, and $\hat{x}_i$ and $\hat{y}_i$ are the true center coordinates for the $i$-th cell; $\lambda_{coord}$, $\lambda_{class}$, $\lambda_{nobj}$ and $\lambda_{obj}$ are all parameters; $p_i(c)$ is the predicted probability of class $c$, $\hat{p}_i(c)$ is the true probability of class $c$, and $classes$ is the set of classes; $c_i$ is the predicted confidence and $\hat{c}_i$ is the true confidence, whose value is determined by whether the cell is responsible for predicting the object.
7. The drowning person detection method based on visible light and thermal imaging data fusion according to claim 6, wherein the method further comprises a step of training the fusion network and the detection network, which specifically comprises:
establishing a training set by capturing visible light and infrared images with a dual-light camera, obtaining fused images through the registration and fusion processes, and labeling the images that contain a person who has fallen into the water;
defining the joint loss function $L$ of the two networks as:
$$L = L_{fus} + L_{dec}$$
training on the training set with this loss function and a gradient descent method to obtain the parameters of the networks.
8. A drowning person detection system based on visible light and thermal imaging data fusion, the system comprising: an infrared thermal imaging camera, an optical camera, a trained fusion network, a trained detection network, an image registration module, a fusion module and a detection module;
the image registration module is used for simultaneously obtaining the infrared image captured by the infrared thermal imaging camera and the visible light image captured by the optical camera, and for performing image registration on the infrared image and the visible light image;
the fusion module is used for inputting the registered infrared image and visible light image into the trained fusion network and outputting a fused image;
the detection module is used for inputting the fused image into the trained detection network and outputting a detection result indicating whether a person has fallen into the water.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202010904133.7A CN111986240A (en) | 2020-09-01 | 2020-09-01 | Drowning person detection method and system based on visible light and thermal imaging data fusion |
Publications (1)
Publication Number | Publication Date |
---|---|
CN111986240A (en) | 2020-11-24 |
Family
ID=73447224
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202010904133.7A Pending CN111986240A (en) | 2020-09-01 | 2020-09-01 | Drowning person detection method and system based on visible light and thermal imaging data fusion |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN111986240A (en) |
Citations (13)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN107134144A (en) * | 2017-04-27 | 2017-09-05 | 武汉理工大学 | A kind of vehicle checking method for traffic monitoring |
CN108447036A (en) * | 2018-03-23 | 2018-08-24 | 北京大学 | A kind of low light image Enhancement Method based on convolutional neural networks |
CN108805070A (en) * | 2018-06-05 | 2018-11-13 | 合肥湛达智能科技有限公司 | A kind of deep learning pedestrian detection method based on built-in terminal |
CN109188421A (en) * | 2018-07-25 | 2019-01-11 | 江苏科技大学 | A kind of maritime search and rescue system and method for unmanned rescue boat |
CN110210539A (en) * | 2019-05-22 | 2019-09-06 | 西安电子科技大学 | The RGB-T saliency object detection method of multistage depth characteristic fusion |
CN110473154A (en) * | 2019-07-31 | 2019-11-19 | 西安理工大学 | A kind of image de-noising method based on generation confrontation network |
CN110795991A (en) * | 2019-09-11 | 2020-02-14 | 西安科技大学 | Mining locomotive pedestrian detection method based on multi-information fusion |
CN110929577A (en) * | 2019-10-23 | 2020-03-27 | 桂林电子科技大学 | Improved target identification method based on YOLOv3 lightweight framework |
CN110956094A (en) * | 2019-11-09 | 2020-04-03 | 北京工业大学 | RGB-D multi-mode fusion personnel detection method based on asymmetric double-current network |
US20200143205A1 (en) * | 2017-08-10 | 2020-05-07 | Intel Corporation | Convolutional neural network framework using reverse connections and objectness priors for object detection |
CN111210464A (en) * | 2019-12-30 | 2020-05-29 | 中国船舶重工集团公司第七一一研究所 | System and method for alarming people falling into water based on convolutional neural network and image fusion |
CN111259814A (en) * | 2020-01-17 | 2020-06-09 | 杭州涂鸦信息技术有限公司 | Living body detection method and system |
CN111539247A (en) * | 2020-03-10 | 2020-08-14 | 西安电子科技大学 | Hyper-spectrum face recognition method and device, electronic equipment and storage medium thereof |
Non-Patent Citations (2)
Title |
---|
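Ran Xin et al.: "Method for detecting weak and small targets on water based on visible light video image processing", Journal of Shanghai Maritime University * |
Xie Chunyu et al.: "Infrared and visible light image fusion method based on deep learning", Command Information System and Technology * |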
Cited By (18)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN112418181B (en) * | 2020-12-13 | 2023-05-02 | 西北工业大学 | Personnel falling water detection method based on convolutional neural network |
CN112418181A (en) * | 2020-12-13 | 2021-02-26 | 西北工业大学 | Personnel overboard detection method based on convolutional neural network |
CN112598637A (en) * | 2020-12-21 | 2021-04-02 | 华能安阳能源有限责任公司 | Automatic flight method for routing inspection of blades of wind turbine generator in blade area |
CN112560763A (en) * | 2020-12-24 | 2021-03-26 | 国网上海市电力公司 | Target detection method fusing infrared and visible light images |
CN112927139A (en) * | 2021-03-23 | 2021-06-08 | 广东工业大学 | Binocular thermal imaging system and super-resolution image acquisition method |
CN112927139B (en) * | 2021-03-23 | 2023-06-02 | 广东工业大学 | Binocular thermal imaging system and super-resolution image acquisition method |
CN112990149A (en) * | 2021-05-08 | 2021-06-18 | 创新奇智(北京)科技有限公司 | Multi-mode-based high-altitude safety belt detection method, device, equipment and storage medium |
CN113379658A (en) * | 2021-06-01 | 2021-09-10 | 大连海事大学 | Unmanned aerial vehicle observation target feature double-light fusion method and system |
CN113379658B (en) * | 2021-06-01 | 2024-03-15 | 大连海事大学 | Unmanned aerial vehicle observation target feature double-light fusion method and system |
CN113255797A (en) * | 2021-06-02 | 2021-08-13 | 通号智慧城市研究设计院有限公司 | Dangerous goods detection method and system based on deep learning model |
CN113255797B (en) * | 2021-06-02 | 2024-04-05 | 通号智慧城市研究设计院有限公司 | Dangerous goods detection method and system based on deep learning model |
CN113724250A (en) * | 2021-09-26 | 2021-11-30 | 新希望六和股份有限公司 | Animal target counting method based on double-optical camera |
CN114359776A (en) * | 2021-11-25 | 2022-04-15 | 国网安徽省电力有限公司检修分公司 | Flame detection method and device integrating light imaging and thermal imaging |
CN114359776B (en) * | 2021-11-25 | 2024-04-26 | 国网安徽省电力有限公司检修分公司 | Flame detection method and device integrating light and thermal imaging |
CN114937309A (en) * | 2022-05-12 | 2022-08-23 | 吉林大学 | Pedestrian detection method based on fusion of visible light image and infrared image, model, electronic device and computer readable medium |
CN115690578A (en) * | 2022-10-26 | 2023-02-03 | 中国电子科技集团公司信息科学研究院 | Image fusion method and target identification method and device |
CN115994911A (en) * | 2023-03-24 | 2023-04-21 | 山东上水环境科技集团有限公司 | Natatorium target detection method based on multi-mode visual information fusion |
CN118196909A (en) * | 2024-05-16 | 2024-06-14 | 杭州巨岩欣成科技有限公司 | Swimming pool struggling behavior identification method, device, computer equipment and storage medium |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN111986240A (en) | Drowning person detection method and system based on visible light and thermal imaging data fusion | |
CN113065558A (en) | Lightweight small target detection method combined with attention mechanism | |
CN111783590A (en) | Multi-class small target detection method based on metric learning | |
Kim et al. | GAN-based synthetic data augmentation for infrared small target detection | |
CN112733950A (en) | Power equipment fault diagnosis method based on combination of image fusion and target detection | |
CN109325395A (en) | The recognition methods of image, convolutional neural networks model training method and device | |
CN110796009A (en) | Method and system for detecting marine vessel based on multi-scale convolution neural network model | |
CN111814661A (en) | Human behavior identification method based on residual error-recurrent neural network | |
CN109919026B (en) | Surface unmanned ship local path planning method | |
CN113159466B (en) | Short-time photovoltaic power generation prediction system and method | |
CN110263768A (en) | A kind of face identification method based on depth residual error network | |
CN110334703B (en) | Ship detection and identification method in day and night image | |
CN109635726B (en) | Landslide identification method based on combination of symmetric deep network and multi-scale pooling | |
CN115393404A (en) | Double-light image registration method, device and equipment and storage medium | |
CN115294655A (en) | Method, device and equipment for countermeasures generation pedestrian re-recognition based on multilevel module features of non-local mechanism | |
CN115861756A (en) | Earth background small target identification method based on cascade combination network | |
CN116994135A (en) | Ship target detection method based on vision and radar fusion | |
Lari et al. | Automated building extraction from high-resolution satellite imagery using spectral and structural information based on artificial neural networks | |
CN112069997B (en) | Unmanned aerial vehicle autonomous landing target extraction method and device based on DenseHR-Net | |
CN117576461A (en) | Semantic understanding method, medium and system for transformer substation scene | |
CN117333948A (en) | End-to-end multi-target broiler behavior identification method integrating space-time attention mechanism | |
CN117372697A (en) | Point cloud segmentation method and system for single-mode sparse orbit scene | |
CN111898671A (en) | Target identification method and system based on fusion of laser imager and color camera codes | |
CN115471782B (en) | Unmanned ship-oriented infrared ship target detection method and device | |
WO2023222643A1 (en) | Method for image segmentation matching |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
| PB01 | Publication | |
| SE01 | Entry into force of request for substantive examination | |
| RJ01 | Rejection of invention patent application after publication | Application publication date: 20201124 |