CN111986240A - Drowning person detection method and system based on visible light and thermal imaging data fusion - Google Patents


Info

Publication number
CN111986240A
CN111986240A (application number CN202010904133.7A)
Authority
CN
China
Prior art keywords
image
visible light
fusion
infrared
network
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202010904133.7A
Other languages
Chinese (zh)
Inventor
文捷
祝闯
李春旭
贾昕宇
姚治萱
刘军
耿雄飞
乔媛媛
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing University of Posts and Telecommunications
China Waterborne Transport Research Institute
Original Assignee
Beijing University of Posts and Telecommunications
China Waterborne Transport Research Institute
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing University of Posts and Telecommunications, China Waterborne Transport Research Institute filed Critical Beijing University of Posts and Telecommunications
Priority to CN202010904133.7A
Publication of CN111986240A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 7/00 Image analysis
    • G06T 7/30 Determination of transform parameters for the alignment of images, i.e. image registration
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 18/00 Pattern recognition
    • G06F 18/20 Analysing
    • G06F 18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F 18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 18/00 Pattern recognition
    • G06F 18/20 Analysing
    • G06F 18/22 Matching criteria, e.g. proximity measures
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/04 Architecture, e.g. interconnection topology
    • G06N 3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 5/00 Image enhancement or restoration
    • G06T 5/50 Image enhancement or restoration using two or more images, e.g. averaging or subtraction
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 40/00 Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V 40/10 Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 2207/00 Indexing scheme for image analysis or image enhancement
    • G06T 2207/10 Image acquisition modality
    • G06T 2207/10048 Infrared image
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 2207/00 Indexing scheme for image analysis or image enhancement
    • G06T 2207/20 Special algorithmic details
    • G06T 2207/20212 Image combination
    • G06T 2207/20221 Image fusion; Image merging
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 2201/00 Indexing scheme relating to image or video recognition or understanding
    • G06V 2201/07 Target detection

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Computation (AREA)
  • General Engineering & Computer Science (AREA)
  • Artificial Intelligence (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Evolutionary Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Computational Linguistics (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Health & Medical Sciences (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Human Computer Interaction (AREA)
  • Multimedia (AREA)
  • Image Analysis (AREA)
  • Image Processing (AREA)

Abstract

The invention discloses a method and a system for detecting people who have fallen into water based on the fusion of visible light and thermal imaging data, wherein the method comprises the following steps: a dual-light camera is used to acquire a visible light image and an infrared image simultaneously, the dual-light camera comprising an optical camera and an infrared thermal imaging camera; image registration is carried out on the infrared image and the visible light image; the registered infrared image and visible light image are input into a pre-trained fusion network, which outputs a fused image; and the fused image is input into a pre-trained detection network, which outputs a detection result indicating whether a person has fallen into the water. By fusing the visible light image with the infrared image, the method produces an image that not only highlights the human body but also retains a certain amount of texture, which greatly improves detection accuracy and recall.

Description

Drowning person detection method and system based on visible light and thermal imaging data fusion
Technical Field
The invention relates to the field of search and rescue, in particular to a drowning person detection method and system based on visible light and thermal imaging data fusion.
Background
Every year tens of thousands of people die of drowning as a result of crew members falling overboard, tourists accidentally falling into the water, ships capsizing and sinking, and similar accidents. The main difficulty is that water currents are turbulent and water areas are large, so people who have fallen into the water are hard to find and locate. With the upgrading of computing hardware and the optimization of artificial intelligence algorithms, image processing and detection have been applied to many problems, yet the detection of people who have fallen into the water still urgently needs to be solved.
Image fusion is an enhancement technique that aims to combine images acquired by different types of sensors into a single image that is more robust or more informative, so as to facilitate subsequent processing or support decision making.
Infrared and visible light images are well suited to fusion. First, the two signals come from different modalities and provide different aspects of the scene: visible light images capture reflected light while infrared images capture thermal radiation, so the combination is more informative than either single-modality signal. Second, infrared and visible light images capture characteristics inherent to almost all objects, and both can be obtained with relatively simple equipment. Finally, the two modalities are complementary, so their fusion produces an image that is robust and rich in information. Visible light images generally have high spatial resolution and considerable detail and contrast, and therefore conform to human visual perception; however, they are susceptible to adverse conditions such as low light, fog and other bad weather. Infrared images, which describe the thermal radiation of objects, resist these disturbances, but they are generally of lower resolution and poorer texture. Owing to the ubiquity and complementarity of the images involved, visible and infrared image fusion has a wider range of applications than other fusion techniques.
Visible light and infrared image fusion is of great significance for person detection, and especially for detecting people who have fallen into the water. If only the visible light image is used, a person in a turbulent, murky river exposes very little of the body above the water surface and almost blends into the water; both the naked eye and the camera struggle to distinguish the person, so even an excellent detection algorithm can hardly detect every case accurately, and that is under good lighting, while detection is impossible at night or in fog. The infrared image distinguishes the human body from the background well: the body is warmer than the river water, so it appears brighter in the infrared image and stands out. However, the infrared image has low resolution and lacks texture features, so only a rough contour can be obtained, and if a warm object with a shape similar to a person appears in the picture, false detections and missed detections easily occur.
Disclosure of Invention
The present invention aims to overcome the above technical defects. Embodiment 1 of the present invention provides a method for detecting a person falling into water based on visible light and thermal imaging data fusion, the method comprising:
a dual-light camera is used to acquire a visible light image and an infrared image simultaneously, the dual-light camera comprising: an optical camera and an infrared thermal imaging camera;
carrying out image registration on the infrared image and the visible light image;
inputting the registered infrared image and visible light image into a pre-trained fusion network, and outputting a fusion image;
and inputting the fused image into a pre-trained detection network, and outputting a detection result of whether the person falls into water or not.
As an improvement of the above method, the image registration of the infrared image and the visible light image specifically includes:
respectively extracting an edge map of the infrared image and an edge map of the visible light image;
aligning the edge graph of the infrared image and the edge graph of the visible light image to obtain an aligned edge graph;
and respectively carrying out image conversion on the infrared image and the visible light image according to the aligned edge images to obtain the aligned infrared image and visible light image.
As an improvement of the above method, the fusion network comprises a first convolutional layer, a dense block, a fusion layer and a plurality of cascaded convolutional layers which are connected in sequence;
the first convolution layer is used for respectively extracting the depth characteristics of the aligned visible light image and infrared image and outputting the depth characteristics of the visible light image and the infrared image;
the dense block comprises a visible light branch and an infrared branch; the visible light branch comprises three convolution layers which are connected in sequence, and the infrared branch comprises three convolution layers which are connected in sequence; the depth characteristics of the visible light image are respectively used as the input of three convolution layers of the visible light branch, and in the visible light branch, the output of each convolution layer is cascaded into the input of all the convolution layers behind the convolution layer; the depth characteristics of the infrared image are respectively used as the input of three convolution layers of the infrared branch, and in the infrared branch, the output of each convolution layer is cascaded into the input of all the convolution layers behind the convolution layer;
the fusion layer is used for fusing a visible light image characteristic diagram output by the visible light branch and an infrared image characteristic diagram output by the infrared branch by applying L1 norm and softmax operation to output a fusion characteristic diagram;
the plurality of cascaded convolutional layers are used for forming a decoder and converting the fused feature map into a fused picture.
As an improvement of the above method, the loss function L_fus of the fusion network is obtained by weighting the pixel loss function L_p and the structural-similarity loss function L_ssim:
L_fus = λ·L_ssim + L_p
L_p = ‖O − I‖_2
L_ssim = 1 − SSIM(O, I)
wherein L_p represents the Euclidean distance between the output image O and the input image I, and SSIM(O, I) represents the structural similarity between the output image O and the input image I, the structural similarity comprising three components: correlation, luminance loss and contrast distortion; λ = 1000.
As an improvement of the above method, the detection network is a convolutional neural network (CNN) whose backbone adopts a modified Darknet-53, in which the final fully connected layer is removed and down-sampling is realized by strided convolution instead of pooling layers, forming a fully convolutional network that makes extensive use of residual skip connections;
the input of the detection network is the fused picture; the processing comprises the following steps: dividing the fused picture into S × S cells, a cell being responsible for predicting an object if the object's centre falls on that cell; predicting a plurality of bounding-box values for each cell, predicting a confidence for each bounding box, and carrying out the prediction analysis cell by cell;
the output of the detection network is three feature maps with different scales, so that targets with different sizes are detected by adopting multiple scales, and finally, predicted bounding boxes, classification and confidence coefficients are output to identify people falling into water.
As an improvement of the above method, the loss function L_dec of the detection network is the sum of the error L_box introduced by the bounding box, the error L_cls introduced by the class and the error L_obj introduced by the confidence:
L_dec = L_box + L_cls + L_obj
L_box = λ_coord · Σ_{i=0}^{S²} Σ_{j=0}^{B} 1_{ij}^{obj} [ (x_i − x̂_i)² + (y_i − ŷ_i)² + (√w_i − √ŵ_i)² + (√h_i − √ĥ_i)² ]
L_cls = λ_class · Σ_{i=0}^{S²} 1_{i}^{obj} Σ_{c∈classes} ( p_i(c) − p̂_i(c) )²
L_obj = λ_obj · Σ_{i=0}^{S²} Σ_{j=0}^{B} 1_{ij}^{obj} ( c_i − ĉ_i )² + λ_nobj · Σ_{i=0}^{S²} Σ_{j=0}^{B} 1_{ij}^{noobj} ( c_i − ĉ_i )²
where S is the number of cells along the horizontal direction (the vertical count is the same), B is the number of boxes per cell, 1_{ij}^{obj} indicates whether the j-th anchor box of the i-th cell is responsible for the object (1_{ij}^{noobj} is its complement), w_i and h_i are the predicted width and height for the i-th cell and ŵ_i and ĥ_i the true width and height; x_i and y_i are the predicted centre coordinates of the i-th cell and x̂_i and ŷ_i the true centre coordinates; λ_coord, λ_class, λ_nobj and λ_obj are weighting parameters; p_i(c) is the predicted probability of class c and p̂_i(c) the true probability, classes being the set of classes; c_i is the predicted confidence and ĉ_i the true confidence, whose value is determined by whether the cell is responsible for predicting the object.
As an improvement of the above method, the method further comprises: the step of training the fusion network and the detection network specifically comprises the following steps:
establishing a training set, capturing visible light and infrared images by using a dual-light camera, obtaining a fused image through the registration and fusion processes, and marking the image containing the person falling into the water;
the joint loss function L for both networks is:
L=Lfus+Ldec
and training by using a training set and using the loss function and a gradient descent method to obtain the parameters of the network.
The embodiment 2 of the invention provides a drowning person detection system based on visible light and thermal imaging data fusion, which comprises: the system comprises an infrared thermal imaging camera, an optical camera, a trained fusion network, a trained detection network, an image registration module, a fusion module and a detection module;
the image registration module is used for simultaneously acquiring an infrared image acquired by the infrared thermal imaging camera and a visible light image acquired by the optical camera and performing image registration on the infrared image and the visible light image;
the fusion module is used for inputting the registered infrared image and visible light image into a trained fusion network and outputting a fusion image;
and the detection module is used for inputting the fusion image into a trained detection network and outputting a detection result of whether the person falls into water or not.
The invention has the advantages that:
the method and the system fuse the visible light image and the infrared image, so that the image not only highlights the human body, but also contains certain texture characteristics, and the detection accuracy and the recall rate are greatly improved.
Drawings
FIG. 1 is a flow chart of image registration of the present invention;
FIG. 2 is a schematic diagram of a converged network of the present invention;
FIG. 3 is a schematic diagram of a fusion layer of the fusion network of the present invention;
FIG. 4 is a schematic diagram of a detection network of the present invention;
FIG. 5 is a schematic illustration of the use of multiple scales to detect targets of different sizes;
FIG. 6 is a schematic diagram of the detection-fusion reverse training of the present invention;
fig. 7 is a flowchart of the method for detecting a person falling into water based on fusion of visible light and thermal imaging data according to the present invention.
Detailed Description
The technical solution of the present invention will be described in detail below with reference to the accompanying drawings.
The embodiment 1 of the invention provides a drowning person detection method based on visible light and thermal imaging data fusion, which comprises the following steps:
step 1) image acquisition and registration
Step 1-1) image acquisition
A dual-light camera is used to acquire a visible light image and an infrared image simultaneously; the dual-light camera comprises an optical camera and an infrared thermal imaging camera.
Step 1-2) image registration
Since the infrared image and the visible light image are acquired by different sensors, they usually differ in size, perspective and field of view, and the dual-light camera itself introduces a viewing-angle difference. Successful image fusion, however, requires strict geometric alignment of the images to be fused, so the visible and infrared images must be registered before fusion. Registration of an infrared image with a visible light image is a multimodal registration problem.
For this registration problem a feature-based registration method is used: it first extracts two groups of salient structures, then determines the correct correspondence between them and estimates the spatial transformation accordingly, and the transformation is then used to align the given image pair.
The first step of the feature-based approach is to extract robust common features that can represent the original images. Edge information is one of the most common choices in infrared and visible image registration, as shown in fig. 1, because the size and direction of edges are comparatively well preserved in both modalities. The edge maps can be discretized into sets of points, and a popular strategy for the resulting point-matching problem involves two steps: computing a set of hypothetical correspondences and then removing outliers by geometric constraints. Feature descriptors are computed at the points, matches between points with overly different descriptors are eliminated, false matches are removed from the hypothesis set using random sample consensus (RANSAC), and the given parametric model is estimated by resampling with the hypothesize-and-verify method so that the retained correspondence set contains as few outliers as possible.
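By way of illustration, this registration pipeline can be sketched roughly as follows. The sketch assumes OpenCV, Canny edge extraction and ORB descriptors; the description above only specifies edge features, descriptor matching and RANSAC, so these particular operators are assumptions.

```python
# Hedged sketch of feature-based IR/visible registration: edge maps, point
# descriptors, RANSAC outlier removal, and warping. Canny + ORB are assumed
# choices; only edge features and RANSAC are specified in the text above.
import cv2
import numpy as np

def register_ir_to_visible(ir_gray, vis_gray):
    # 1) extract edge maps from both modalities
    ir_edges = cv2.Canny(ir_gray, 50, 150)
    vis_edges = cv2.Canny(vis_gray, 50, 150)

    # 2) describe salient points on the edge maps and hypothesise correspondences
    orb = cv2.ORB_create(nfeatures=2000)
    kp_ir, des_ir = orb.detectAndCompute(ir_edges, None)
    kp_vis, des_vis = orb.detectAndCompute(vis_edges, None)
    matcher = cv2.BFMatcher(cv2.NORM_HAMMING, crossCheck=True)
    matches = sorted(matcher.match(des_ir, des_vis), key=lambda m: m.distance)

    # 3) remove false matches with RANSAC while estimating the transform
    src = np.float32([kp_ir[m.queryIdx].pt for m in matches]).reshape(-1, 1, 2)
    dst = np.float32([kp_vis[m.trainIdx].pt for m in matches]).reshape(-1, 1, 2)
    H, inlier_mask = cv2.findHomography(src, dst, cv2.RANSAC, 5.0)

    # 4) warp the infrared image into the visible-light frame
    h, w = vis_gray.shape[:2]
    ir_aligned = cv2.warpPerspective(ir_gray, H, (w, h))
    return ir_aligned, H
```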
Step 2) establishing a fusion network for image fusion
A deep learning architecture is adopted to address the problem of infrared and visible image fusion. Compared with a conventional convolutional network, the encoding network combines convolutional layers, a fusion layer and dense blocks, in which the output of each layer is connected to all subsequent layers. This structure acquires more useful features from the source images during encoding; an appropriate fusion strategy is selected to fuse the features, and the fused image is finally reconstructed by a decoder.
As shown in fig. 2, the depth features of the visible light image and the infrared image are extracted before fusion: the first convolutional layer extracts coarse features, and three further convolutional layers, the output of each being cascaded as input to the subsequent layers, form a dense block. Such an architecture has two advantages. First, the filter size and the stride of the convolution operations are 3 × 3 and 1 respectively, so the input image can be of any size. Second, the dense block preserves the depth features as much as possible in the encoding network, which ensures that all salient features are available to the fusion strategy.
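A minimal PyTorch sketch of one such encoder branch is shown below; the channel widths are illustrative assumptions, since only the 3 × 3 kernels, stride 1 and dense connectivity are specified above.

```python
# Hedged sketch of one encoder branch: a first 3x3 convolution for coarse
# features followed by a dense block of three 3x3 convolutions whose outputs
# are concatenated into the inputs of all later layers. Channel widths are
# illustrative assumptions.
import torch
import torch.nn as nn

class DenseBlockEncoder(nn.Module):
    def __init__(self, in_ch=1, base_ch=16, growth=16):
        super().__init__()
        self.first = nn.Sequential(nn.Conv2d(in_ch, base_ch, 3, 1, 1), nn.ReLU(inplace=True))
        self.dense = nn.ModuleList([
            nn.Sequential(nn.Conv2d(base_ch + i * growth, growth, 3, 1, 1), nn.ReLU(inplace=True))
            for i in range(3)
        ])

    def forward(self, x):
        feats = self.first(x)                               # coarse depth features
        for conv in self.dense:
            feats = torch.cat([feats, conv(feats)], dim=1)  # dense connectivity
        return feats                                        # (N, base_ch + 3*growth, H, W)

# one branch each for the registered visible and infrared images
vis_encoder, ir_encoder = DenseBlockEncoder(), DenseBlockEncoder()
```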
As shown in fig. 3, the L1 norm and softmax operations are applied in the fusion layer.
The decoder consists of a plurality of 3 × 3 convolutional layers; the output of the fusion layer is their input, and these convolutional layers reconstruct the fused image, converting the fused feature map into the fused picture. This simple and efficient architecture is used to reconstruct the final fused image.
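A rough sketch of the fusion layer and decoder is given below, under the assumption that the L1 norm is taken over channels at each pixel and the softmax is taken over the two modalities; the exact weighting scheme and the decoder depth are assumptions rather than specified values.

```python
# Hedged sketch of the L1-norm / softmax fusion layer and the convolutional
# decoder. The per-pixel weighting scheme and decoder depth are assumptions.
import torch
import torch.nn as nn

def fuse_l1_softmax(feat_vis, feat_ir):
    # per-pixel activity = L1 norm over channels; softmax over the two modalities
    act = torch.stack([feat_vis.abs().sum(1), feat_ir.abs().sum(1)], dim=0)  # (2, N, H, W)
    w = torch.softmax(act, dim=0)
    return w[0].unsqueeze(1) * feat_vis + w[1].unsqueeze(1) * feat_ir

class Decoder(nn.Module):
    def __init__(self, in_ch=64):  # 64 matches the encoder sketch above (16 + 3*16)
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(in_ch, 32, 3, 1, 1), nn.ReLU(inplace=True),
            nn.Conv2d(32, 16, 3, 1, 1), nn.ReLU(inplace=True),
            nn.Conv2d(16, 1, 3, 1, 1), nn.Sigmoid(),   # reconstructed fused image
        )

    def forward(self, fused_feat):
        return self.net(fused_feat)
```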
The loss function of the fusion network is obtained by weighting the pixel loss function L_p and the structural-similarity loss function L_ssim:
L_p = ‖O − I‖_2
L_ssim = 1 − SSIM(O, I)
L_fus = λ·L_ssim + L_p
where O and I denote the output image and the input image, respectively. L_p is the Euclidean distance between the output O and the input I, and SSIM represents the structural similarity of the two images; this index consists mainly of three parts, correlation, luminance loss and contrast distortion, and the product of the three components is the evaluation result for the fused image. Since there is a difference of about three orders of magnitude between the pixel loss and the SSIM loss, λ is set to 1000 during the training phase.
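A sketch of this loss in PyTorch could look as follows; the ssim() helper from the pytorch_msssim package is an assumed dependency, since no particular SSIM implementation is named here.

```python
# Hedged sketch of L_fus = lambda * L_ssim + L_p. The ssim() helper from the
# pytorch_msssim package is an assumed dependency.
import torch
from pytorch_msssim import ssim

def fusion_loss(output, target, lam=1000.0):
    # pixel loss: Euclidean distance between output O and input I
    l_p = torch.norm(output - target, p=2)
    # structural-similarity loss: 1 - SSIM(O, I)
    l_ssim = 1.0 - ssim(output, target, data_range=1.0)
    return lam * l_ssim + l_p
```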
Step 3) establishing a detection network for detecting the person falling into water
A convolutional neural network (CNN) is adopted for target recognition of people falling into water. The central idea of the detection network is to divide the picture into S × S cells; if the centre of an object falls on a certain cell, that cell is responsible for predicting the object. Multiple bounding-box values are predicted for each cell, a confidence is predicted for each bounding box, and the prediction analysis is performed cell by cell.
The backbone network employs a modified Darknet-53, as shown in fig. 4. The network offers high classification accuracy, fast computation and relatively few layers. It removes the fully connected layers and is a fully convolutional network that makes extensive use of residual skip connections; to reduce the negative effect of pooling on gradients, it abandons pooling layers and realizes down-sampling with the stride of the convolutional layers. In this network structure, down-sampling is performed with convolutions of stride 2.
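One Darknet-53-style stage of the kind described, i.e. a stride-2 convolution for down-sampling followed by residual units with skip connections, can be sketched as follows; the channel counts and activation choices are illustrative assumptions.

```python
# Hedged sketch of one Darknet-53-style stage: stride-2 convolution for
# down-sampling (no pooling) followed by a residual unit with a skip connection.
import torch
import torch.nn as nn

def conv_bn_leaky(in_ch, out_ch, k, s):
    return nn.Sequential(
        nn.Conv2d(in_ch, out_ch, k, s, k // 2, bias=False),
        nn.BatchNorm2d(out_ch),
        nn.LeakyReLU(0.1, inplace=True),
    )

class ResidualUnit(nn.Module):
    def __init__(self, ch):
        super().__init__()
        self.block = nn.Sequential(conv_bn_leaky(ch, ch // 2, 1, 1),
                                   conv_bn_leaky(ch // 2, ch, 3, 1))

    def forward(self, x):
        return x + self.block(x)          # residual skip connection

class DarknetStage(nn.Module):
    def __init__(self, in_ch, out_ch, n_units):
        super().__init__()
        self.down = conv_bn_leaky(in_ch, out_ch, 3, 2)   # down-sample by stride 2
        self.units = nn.Sequential(*[ResidualUnit(out_ch) for _ in range(n_units)])

    def forward(self, x):
        return self.units(self.down(x))
```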
With reference to the FPN, the network outputs three feature maps of different scales, so that targets of different sizes are detected at multiple scales and finer cells can detect smaller objects, as shown in fig. 5.
Before model training, a data set of fused images must first be made: visible light images and infrared images are captured with the dual-light camera, fused images are obtained through the registration and fusion processes described above, and the persons in the water are annotated to produce a data set in the format required for training. A pre-training model is then selected for training, yielding an algorithm model capable of recognizing people falling into water in the fused visible-infrared images. Afterwards, indexes such as the accuracy of the model are evaluated, and optimization is carried out on the data set, the algorithm and other aspects so as to achieve a better recognition effect.
The loss function of the detection network is divided into three parts: the error L_box introduced by the bounding box, the error L_obj introduced by the confidence, and the error L_cls introduced by the class:
L_box = λ_coord · Σ_{i=0}^{S²} Σ_{j=0}^{B} 1_{ij}^{obj} [ (x_i − x̂_i)² + (y_i − ŷ_i)² + (√w_i − √ŵ_i)² + (√h_i − √ĥ_i)² ]
L_cls = λ_class · Σ_{i=0}^{S²} 1_{i}^{obj} Σ_{c∈classes} ( p_i(c) − p̂_i(c) )²
L_obj = λ_obj · Σ_{i=0}^{S²} Σ_{j=0}^{B} 1_{ij}^{obj} ( c_i − ĉ_i )² + λ_nobj · Σ_{i=0}^{S²} Σ_{j=0}^{B} 1_{ij}^{noobj} ( c_i − ĉ_i )²
where S is the number of cells along the horizontal direction (the vertical count is the same), B is the number of boxes per cell, 1_{ij}^{obj} indicates whether the j-th anchor box of the i-th cell is responsible for the object (1_{ij}^{noobj} is its complement), w_i and h_i are the predicted width and height for the i-th cell and ŵ_i and ĥ_i the true width and height; x_i and y_i are the predicted centre coordinates of the i-th cell and x̂_i and ŷ_i the true centre coordinates; λ_coord, λ_class, λ_nobj and λ_obj are weighting parameters; p_i(c) is the predicted probability of class c and p̂_i(c) the true probability, classes being the set of classes; c_i is the predicted confidence and ĉ_i the true confidence, whose value is determined by whether the cell is responsible for predicting the object.
The detection loss is the sum of the above three errors:
L_dec = L_box + L_cls + L_obj
and, as described in step 4 below, it is combined with the fusion loss into the overall loss:
L = L_fus + L_dec
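A much-simplified sketch of how the three error terms might be combined is given below; the tensor layout, the use of squared error throughout and the weighting values are illustrative assumptions rather than the exact formulation above.

```python
# Hedged sketch of the three-part detection loss L_dec = L_box + L_cls + L_obj.
# Shapes, squared-error form and weights are illustrative assumptions.
import torch

def detection_loss(pred, target, obj_mask, noobj_mask,
                   l_coord=5.0, l_cls=1.0, l_obj=1.0, l_noobj=0.5):
    """pred/target: dicts with 'box' (N,S*S,B,4), 'conf' (N,S*S,B), 'cls' (N,S*S,C).
    obj_mask / noobj_mask: (N,S*S,B) booleans marking (non-)responsible anchors."""
    # L_box: squared error on centre and size, only for responsible anchors
    box_err = ((pred['box'] - target['box']) ** 2).sum(-1)            # (N, S*S, B)
    l_box = l_coord * (box_err * obj_mask).sum()

    # L_obj: confidence error, weighted differently for object / no-object anchors
    conf_err = (pred['conf'] - target['conf']) ** 2
    l_conf = l_obj * (conf_err * obj_mask).sum() + l_noobj * (conf_err * noobj_mask).sum()

    # L_cls: class-probability error for cells that contain an object
    cls_err = ((pred['cls'] - target['cls']) ** 2).sum(-1)             # (N, S*S)
    cell_has_obj = obj_mask.any(-1).float()
    l_class = l_cls * (cls_err * cell_has_obj).sum()

    return l_box + l_class + l_conf

# Example shapes: batch 2, 13x13 grid, 3 anchors, 1 class ("person in water")
N, S2, B, C = 2, 13 * 13, 3, 1
pred = {'box': torch.rand(N, S2, B, 4), 'conf': torch.rand(N, S2, B), 'cls': torch.rand(N, S2, C)}
target = {k: torch.rand_like(v) for k, v in pred.items()}
obj_mask = torch.rand(N, S2, B) > 0.95
loss = detection_loss(pred, target, obj_mask, ~obj_mask)
```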
step 4) detection-fusion reverse training
The purpose of conventional visible-infrared image fusion is to make the fused image contain as much information from both images as possible, losing neither the contrast information of the infrared image nor the texture information of the visible light image; in other words, it makes the fused image conform better to the human visual system. The loss function of the initial fusion process is therefore defined as the weighted sum of a pixel loss function and a structural-similarity loss function.
The main aim of the method of the invention, however, is the accurate detection of people who have fallen into the water; the image fusion result is only an intermediate product. Both the image fusion stage and the detection stage should therefore be optimized toward the final goal of accurate detection. To this end, the training of the image fusion network is corrected so that the loss function of the detection process can guide the fusion, and the final detection result is already optimized at the fusion stage.
As shown in fig. 6, the person in the water is first annotated on the registered visible light or infrared image. Since the images are registered and aligned and the position of the target is unchanged after fusion, the label can be copied onto the fused image as ground truth. The fused image passes through the detection network to obtain predicted bounding boxes, classes and confidences, which are compared with the label to obtain the detection error, i.e. L_dec. This loss function is used not only to evaluate and optimize the detection network but also to evaluate and optimize the fusion network, which is equivalent to correcting the loss function of the fusion network to:
L = L_fus + L_dec
In this way the final objective is easier to achieve.
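This reverse-training idea can be sketched as a joint optimization loop in which the detection loss back-propagates through the fusion network as well, so that L = L_fus + L_dec updates both networks. The optimizer choice, learning rate and the exact way L_fus is computed against the two source images are assumptions in this sketch.

```python
# Hedged sketch of detection-fusion joint (reverse) training: the detection
# loss L_dec flows back through the fusion network, so L = L_fus + L_dec
# optimizes both networks. fusion_loss / detection_loss are the sketches above.
import torch

def train_step(fusion_net, detection_net, batch, optimizer, fusion_loss, detection_loss):
    vis, ir, labels = batch               # registered visible/IR images + copied labels
    optimizer.zero_grad()

    fused = fusion_net(vis, ir)           # intermediate fused image
    # L_fus evaluated against both source images (an assumed interpretation)
    l_fus = fusion_loss(fused, vis) + fusion_loss(fused, ir)

    preds = detection_net(fused)          # predicted boxes / classes / confidences
    l_dec = detection_loss(preds, labels)

    loss = l_fus + l_dec                  # L = L_fus + L_dec
    loss.backward()                       # gradients also reach the fusion network
    optimizer.step()
    return loss.item()

# optimizer over both networks so the detection error also corrects the fusion:
# params = list(fusion_net.parameters()) + list(detection_net.parameters())
# optimizer = torch.optim.Adam(params, lr=1e-4)
```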
Step 5) The person falling into the water is detected through image acquisition, image registration, image fusion and target detection, so as to facilitate subsequent positioning and rescue, as shown in fig. 7.
The embodiment 2 of the invention provides a drowning person detection system based on visible light and thermal imaging data fusion, which comprises: the system comprises an infrared thermal imaging camera, an optical camera, a trained fusion network, a trained detection network, an image registration module, a fusion module and a detection module;
the image registration module is used for simultaneously acquiring an infrared image acquired by the infrared thermal imaging camera and a visible light image acquired by the optical camera and performing image registration on the infrared image and the visible light image;
the fusion module is used for inputting the registered infrared image and visible light image into a trained fusion network and outputting a fusion image;
and the detection module is used for inputting the fusion image into a trained detection network and outputting a detection result of whether the person falls into water or not.
Finally, it should be noted that the above embodiments are intended only to illustrate the technical solution of the present invention and not to limit it. Although the present invention has been described in detail with reference to the embodiments, those skilled in the art will understand that various changes may be made and equivalents substituted without departing from the spirit and scope of the invention as defined in the appended claims.

Claims (8)

1. A method of drowning person detection based on visible light and thermal imaging data fusion, the method comprising:
a dual-light camera is used to acquire a visible light image and an infrared image simultaneously, the dual-light camera comprising: an optical camera and an infrared thermal imaging camera;
carrying out image registration on the infrared image and the visible light image;
inputting the registered infrared image and visible light image into a pre-trained fusion network, and outputting a fusion image;
and inputting the fused image into a pre-trained detection network, and outputting a detection result of whether the person falls into water or not.
2. The method for detecting man overboard based on fusion of visible light and thermal imaging data as claimed in claim 1, wherein the image registration of the infrared image and the visible light image specifically comprises:
respectively extracting an edge map of the infrared image and an edge map of the visible light image;
aligning the edge graph of the infrared image and the edge graph of the visible light image to obtain an aligned edge graph;
and respectively carrying out image conversion on the infrared image and the visible light image according to the aligned edge images to obtain the aligned infrared image and visible light image.
3. The method for detecting man-in-the-water based on fusion of visible light and thermal imaging data according to claim 2, wherein the fusion network comprises a first convolutional layer, a dense block, a fusion layer and a plurality of cascaded convolutional layers connected in sequence;
the first convolution layer is used for respectively extracting the depth characteristics of the aligned visible light image and infrared image and outputting the depth characteristics of the visible light image and the infrared image;
the dense block comprises a visible light branch and an infrared branch; the visible light branch comprises three convolution layers which are connected in sequence, and the infrared branch comprises three convolution layers which are connected in sequence; the depth characteristics of the visible light image are respectively used as the input of three convolution layers of the visible light branch, and in the visible light branch, the output of each convolution layer is cascaded into the input of all the convolution layers behind the convolution layer; the depth characteristics of the infrared image are respectively used as the input of three convolution layers of the infrared branch, and in the infrared branch, the output of each convolution layer is cascaded into the input of all the convolution layers behind the convolution layer;
the fusion layer is used for fusing a visible light image characteristic diagram output by the visible light branch and an infrared image characteristic diagram output by the infrared branch by applying L1 norm and softmax operation to output a fusion characteristic diagram;
the plurality of cascaded convolutional layers are used for forming a decoder and converting the fused feature map into a fused picture.
4. The method for drowning person detection based on fusion of visible light and thermal imaging data according to claim 3, characterized in that the loss function L_fus of the fusion network is obtained by weighting the pixel loss function L_p and the structural-similarity loss function L_ssim:
L_fus = λ·L_ssim + L_p
L_p = ‖O − I‖_2
L_ssim = 1 − SSIM(O, I)
wherein L_p represents the Euclidean distance between the output image O and the input image I, and SSIM(O, I) represents the structural similarity between the output image O and the input image I, the structural similarity comprising three components: correlation, luminance loss and contrast distortion; λ = 1000.
5. The method for detecting man-in-water based on fusion of visible light and thermal imaging data as claimed in claim 4, wherein the detection network is a convolutional neural network (CNN) whose backbone adopts a modified Darknet-53, in which the final fully connected layer is removed and down-sampling is realized by strided convolution instead of pooling layers, forming a fully convolutional network that makes extensive use of residual skip connections;
the input of the detection network is the fused picture; the processing comprises: dividing the fused picture into S × S cells, a cell being responsible for predicting an object if the object's centre falls on that cell; predicting a plurality of bounding-box values for each cell, predicting a confidence for each bounding box, and carrying out the prediction analysis cell by cell;
the output of the detection network is three feature maps with different scales, so that targets with different sizes are detected by adopting multiple scales, and finally, predicted bounding boxes, classification and confidence coefficients are output to identify people falling into water.
6. The drowning person detection method based on visible light and thermal imaging data fusion of claim 5, characterized in that the loss function L_dec of the detection network is the sum of the error L_box introduced by the bounding box, the error L_cls introduced by the class and the error L_obj introduced by the confidence:
L_dec = L_box + L_cls + L_obj
L_box = λ_coord · Σ_{i=0}^{S²} Σ_{j=0}^{B} 1_{ij}^{obj} [ (x_i − x̂_i)² + (y_i − ŷ_i)² + (√w_i − √ŵ_i)² + (√h_i − √ĥ_i)² ]
L_cls = λ_class · Σ_{i=0}^{S²} 1_{i}^{obj} Σ_{c∈classes} ( p_i(c) − p̂_i(c) )²
L_obj = λ_obj · Σ_{i=0}^{S²} Σ_{j=0}^{B} 1_{ij}^{obj} ( c_i − ĉ_i )² + λ_nobj · Σ_{i=0}^{S²} Σ_{j=0}^{B} 1_{ij}^{noobj} ( c_i − ĉ_i )²
wherein S represents the number of cells along the horizontal direction (the vertical count is the same), B represents the number of boxes per cell, 1_{ij}^{obj} indicates whether the j-th anchor box of the i-th cell is responsible for the object (1_{ij}^{noobj} is its complement), w_i and h_i are the predicted width and height for the i-th cell and ŵ_i and ĥ_i the true width and height; x_i and y_i are the predicted centre coordinates of the i-th cell and x̂_i and ŷ_i the true centre coordinates; λ_coord, λ_class, λ_nobj and λ_obj are all parameters; p_i(c) is the predicted probability of class c and p̂_i(c) the true probability, classes being the set of classes; c_i is the predicted confidence and ĉ_i the true confidence, whose value is determined by whether the cell is responsible for predicting the object.
7. The method for drowning person detection based on fusion of visible light and thermal imaging data according to claim 6, characterized in that the method further comprises: the step of training the fusion network and the detection network specifically comprises the following steps:
establishing a training set, capturing visible light and infrared images by using a dual-light camera, obtaining a fused image through the registration and fusion processes, and marking the image containing the person falling into the water;
the joint loss function L for both networks is:
L=Lfus+Ldec
and training by using a training set and using the loss function and a gradient descent method to obtain the parameters of the network.
8. A drowning person detection system based on visible light and thermal imaging data fusion, the system comprising: the system comprises an infrared thermal imaging camera, an optical camera, a trained fusion network, a trained detection network, an image registration module, a fusion module and a detection module;
the image registration module is used for simultaneously acquiring an infrared image acquired by the infrared thermal imaging camera and a visible light image acquired by the optical camera and performing image registration on the infrared image and the visible light image;
the fusion module is used for inputting the registered infrared image and visible light image into a trained fusion network and outputting a fusion image;
and the detection module is used for inputting the fusion image into a trained detection network and outputting a detection result of whether the person falls into water or not.
CN202010904133.7A 2020-09-01 2020-09-01 Drowning person detection method and system based on visible light and thermal imaging data fusion Pending CN111986240A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010904133.7A CN111986240A (en) 2020-09-01 2020-09-01 Drowning person detection method and system based on visible light and thermal imaging data fusion

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010904133.7A CN111986240A (en) 2020-09-01 2020-09-01 Drowning person detection method and system based on visible light and thermal imaging data fusion

Publications (1)

Publication Number Publication Date
CN111986240A true CN111986240A (en) 2020-11-24

Family

ID=73447224

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010904133.7A Pending CN111986240A (en) 2020-09-01 2020-09-01 Drowning person detection method and system based on visible light and thermal imaging data fusion

Country Status (1)

Country Link
CN (1) CN111986240A (en)

Cited By (13)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112418181A (en) * 2020-12-13 2021-02-26 西北工业大学 Personnel overboard detection method based on convolutional neural network
CN112560763A (en) * 2020-12-24 2021-03-26 国网上海市电力公司 Target detection method fusing infrared and visible light images
CN112598637A (en) * 2020-12-21 2021-04-02 华能安阳能源有限责任公司 Automatic flight method for routing inspection of blades of wind turbine generator in blade area
CN112927139A (en) * 2021-03-23 2021-06-08 广东工业大学 Binocular thermal imaging system and super-resolution image acquisition method
CN112990149A (en) * 2021-05-08 2021-06-18 创新奇智(北京)科技有限公司 Multi-mode-based high-altitude safety belt detection method, device, equipment and storage medium
CN113255797A (en) * 2021-06-02 2021-08-13 通号智慧城市研究设计院有限公司 Dangerous goods detection method and system based on deep learning model
CN113379658A (en) * 2021-06-01 2021-09-10 大连海事大学 Unmanned aerial vehicle observation target feature double-light fusion method and system
CN113724250A (en) * 2021-09-26 2021-11-30 新希望六和股份有限公司 Animal target counting method based on double-optical camera
CN114359776A (en) * 2021-11-25 2022-04-15 国网安徽省电力有限公司检修分公司 Flame detection method and device integrating light imaging and thermal imaging
CN114937309A (en) * 2022-05-12 2022-08-23 吉林大学 Pedestrian detection method based on fusion of visible light image and infrared image, model, electronic device and computer readable medium
CN115690578A (en) * 2022-10-26 2023-02-03 中国电子科技集团公司信息科学研究院 Image fusion method and target identification method and device
CN115994911A (en) * 2023-03-24 2023-04-21 山东上水环境科技集团有限公司 Natatorium target detection method based on multi-mode visual information fusion
CN118196909A (en) * 2024-05-16 2024-06-14 杭州巨岩欣成科技有限公司 Swimming pool struggling behavior identification method, device, computer equipment and storage medium

Citations (13)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107134144A (en) * 2017-04-27 2017-09-05 武汉理工大学 A kind of vehicle checking method for traffic monitoring
CN108447036A (en) * 2018-03-23 2018-08-24 北京大学 A kind of low light image Enhancement Method based on convolutional neural networks
CN108805070A (en) * 2018-06-05 2018-11-13 合肥湛达智能科技有限公司 A kind of deep learning pedestrian detection method based on built-in terminal
CN109188421A (en) * 2018-07-25 2019-01-11 江苏科技大学 A kind of maritime search and rescue system and method for unmanned rescue boat
CN110210539A (en) * 2019-05-22 2019-09-06 西安电子科技大学 The RGB-T saliency object detection method of multistage depth characteristic fusion
CN110473154A (en) * 2019-07-31 2019-11-19 西安理工大学 A kind of image de-noising method based on generation confrontation network
CN110795991A (en) * 2019-09-11 2020-02-14 西安科技大学 Mining locomotive pedestrian detection method based on multi-information fusion
CN110929577A (en) * 2019-10-23 2020-03-27 桂林电子科技大学 Improved target identification method based on YOLOv3 lightweight framework
CN110956094A (en) * 2019-11-09 2020-04-03 北京工业大学 RGB-D multi-mode fusion personnel detection method based on asymmetric double-current network
US20200143205A1 (en) * 2017-08-10 2020-05-07 Intel Corporation Convolutional neural network framework using reverse connections and objectness priors for object detection
CN111210464A (en) * 2019-12-30 2020-05-29 中国船舶重工集团公司第七一一研究所 System and method for alarming people falling into water based on convolutional neural network and image fusion
CN111259814A (en) * 2020-01-17 2020-06-09 杭州涂鸦信息技术有限公司 Living body detection method and system
CN111539247A (en) * 2020-03-10 2020-08-14 西安电子科技大学 Hyper-spectrum face recognition method and device, electronic equipment and storage medium thereof

Patent Citations (13)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107134144A (en) * 2017-04-27 2017-09-05 武汉理工大学 A kind of vehicle checking method for traffic monitoring
US20200143205A1 (en) * 2017-08-10 2020-05-07 Intel Corporation Convolutional neural network framework using reverse connections and objectness priors for object detection
CN108447036A (en) * 2018-03-23 2018-08-24 北京大学 A kind of low light image Enhancement Method based on convolutional neural networks
CN108805070A (en) * 2018-06-05 2018-11-13 合肥湛达智能科技有限公司 A kind of deep learning pedestrian detection method based on built-in terminal
CN109188421A (en) * 2018-07-25 2019-01-11 江苏科技大学 A kind of maritime search and rescue system and method for unmanned rescue boat
CN110210539A (en) * 2019-05-22 2019-09-06 西安电子科技大学 The RGB-T saliency object detection method of multistage depth characteristic fusion
CN110473154A (en) * 2019-07-31 2019-11-19 西安理工大学 A kind of image de-noising method based on generation confrontation network
CN110795991A (en) * 2019-09-11 2020-02-14 西安科技大学 Mining locomotive pedestrian detection method based on multi-information fusion
CN110929577A (en) * 2019-10-23 2020-03-27 桂林电子科技大学 Improved target identification method based on YOLOv3 lightweight framework
CN110956094A (en) * 2019-11-09 2020-04-03 北京工业大学 RGB-D multi-mode fusion personnel detection method based on asymmetric double-current network
CN111210464A (en) * 2019-12-30 2020-05-29 中国船舶重工集团公司第七一一研究所 System and method for alarming people falling into water based on convolutional neural network and image fusion
CN111259814A (en) * 2020-01-17 2020-06-09 杭州涂鸦信息技术有限公司 Living body detection method and system
CN111539247A (en) * 2020-03-10 2020-08-14 西安电子科技大学 Hyper-spectrum face recognition method and device, electronic equipment and storage medium thereof

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
冉鑫 (Ran Xin) et al.: "Detection method for weak and small targets on water based on visible-light video image processing", Journal of Shanghai Maritime University (《上海海事大学学报》) *
谢春宇 (Xie Chunyu) et al.: "Infrared and visible light image fusion method based on deep learning", Command Information System and Technology (《指挥信息系统与技术》) *

Cited By (18)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112418181B (en) * 2020-12-13 2023-05-02 西北工业大学 Personnel falling water detection method based on convolutional neural network
CN112418181A (en) * 2020-12-13 2021-02-26 西北工业大学 Personnel overboard detection method based on convolutional neural network
CN112598637A (en) * 2020-12-21 2021-04-02 华能安阳能源有限责任公司 Automatic flight method for routing inspection of blades of wind turbine generator in blade area
CN112560763A (en) * 2020-12-24 2021-03-26 国网上海市电力公司 Target detection method fusing infrared and visible light images
CN112927139A (en) * 2021-03-23 2021-06-08 广东工业大学 Binocular thermal imaging system and super-resolution image acquisition method
CN112927139B (en) * 2021-03-23 2023-06-02 广东工业大学 Binocular thermal imaging system and super-resolution image acquisition method
CN112990149A (en) * 2021-05-08 2021-06-18 创新奇智(北京)科技有限公司 Multi-mode-based high-altitude safety belt detection method, device, equipment and storage medium
CN113379658A (en) * 2021-06-01 2021-09-10 大连海事大学 Unmanned aerial vehicle observation target feature double-light fusion method and system
CN113379658B (en) * 2021-06-01 2024-03-15 大连海事大学 Unmanned aerial vehicle observation target feature double-light fusion method and system
CN113255797A (en) * 2021-06-02 2021-08-13 通号智慧城市研究设计院有限公司 Dangerous goods detection method and system based on deep learning model
CN113255797B (en) * 2021-06-02 2024-04-05 通号智慧城市研究设计院有限公司 Dangerous goods detection method and system based on deep learning model
CN113724250A (en) * 2021-09-26 2021-11-30 新希望六和股份有限公司 Animal target counting method based on double-optical camera
CN114359776A (en) * 2021-11-25 2022-04-15 国网安徽省电力有限公司检修分公司 Flame detection method and device integrating light imaging and thermal imaging
CN114359776B (en) * 2021-11-25 2024-04-26 国网安徽省电力有限公司检修分公司 Flame detection method and device integrating light and thermal imaging
CN114937309A (en) * 2022-05-12 2022-08-23 吉林大学 Pedestrian detection method based on fusion of visible light image and infrared image, model, electronic device and computer readable medium
CN115690578A (en) * 2022-10-26 2023-02-03 中国电子科技集团公司信息科学研究院 Image fusion method and target identification method and device
CN115994911A (en) * 2023-03-24 2023-04-21 山东上水环境科技集团有限公司 Natatorium target detection method based on multi-mode visual information fusion
CN118196909A (en) * 2024-05-16 2024-06-14 杭州巨岩欣成科技有限公司 Swimming pool struggling behavior identification method, device, computer equipment and storage medium

Similar Documents

Publication Publication Date Title
CN111986240A (en) Drowning person detection method and system based on visible light and thermal imaging data fusion
CN113065558A (en) Lightweight small target detection method combined with attention mechanism
CN111783590A (en) Multi-class small target detection method based on metric learning
Kim et al. GAN-based synthetic data augmentation for infrared small target detection
CN112733950A (en) Power equipment fault diagnosis method based on combination of image fusion and target detection
CN109325395A (en) The recognition methods of image, convolutional neural networks model training method and device
CN110796009A (en) Method and system for detecting marine vessel based on multi-scale convolution neural network model
CN111814661A (en) Human behavior identification method based on residual error-recurrent neural network
CN109919026B (en) Surface unmanned ship local path planning method
CN113159466B (en) Short-time photovoltaic power generation prediction system and method
CN110263768A (en) A kind of face identification method based on depth residual error network
CN110334703B (en) Ship detection and identification method in day and night image
CN109635726B (en) Landslide identification method based on combination of symmetric deep network and multi-scale pooling
CN115393404A (en) Double-light image registration method, device and equipment and storage medium
CN115294655A (en) Method, device and equipment for countermeasures generation pedestrian re-recognition based on multilevel module features of non-local mechanism
CN115861756A (en) Earth background small target identification method based on cascade combination network
CN116994135A (en) Ship target detection method based on vision and radar fusion
Lari et al. Automated building extraction from high-resolution satellite imagery using spectral and structural information based on artificial neural networks
CN112069997B (en) Unmanned aerial vehicle autonomous landing target extraction method and device based on DenseHR-Net
CN117576461A (en) Semantic understanding method, medium and system for transformer substation scene
CN117333948A (en) End-to-end multi-target broiler behavior identification method integrating space-time attention mechanism
CN117372697A (en) Point cloud segmentation method and system for single-mode sparse orbit scene
CN111898671A (en) Target identification method and system based on fusion of laser imager and color camera codes
CN115471782B (en) Unmanned ship-oriented infrared ship target detection method and device
WO2023222643A1 (en) Method for image segmentation matching

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication (application publication date: 20201124)