CN114511879A - Multisource fusion human body target detection method based on VIS-IR image - Google Patents


Info

Publication number
CN114511879A
Authority
CN
China
Prior art keywords
image
vis
detection
human body
target
Prior art date
Legal status
Pending
Application number
CN202210072239.4A
Other languages
Chinese (zh)
Inventor
顾晶晶
陈俊义
Current Assignee
Nanjing University of Aeronautics and Astronautics
Original Assignee
Nanjing University of Aeronautics and Astronautics
Priority date
Filing date
Publication date
Application filed by Nanjing University of Aeronautics and Astronautics filed Critical Nanjing University of Aeronautics and Astronautics
Priority to CN202210072239.4A priority Critical patent/CN114511879A/en
Publication of CN114511879A publication Critical patent/CN114511879A/en
Pending legal-status Critical Current

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/23Clustering techniques
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/24Classification techniques

Landscapes

  • Engineering & Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Theoretical Computer Science (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Artificial Intelligence (AREA)
  • Evolutionary Biology (AREA)
  • Evolutionary Computation (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Image Processing (AREA)

Abstract

The invention discloses a multisource fusion human body target detection method based on VIS-IR images, comprising the following steps: acquire an infrared image (IR image) and a visible light image (VIS image) of the same scene; detect human body targets in the IR image and record the four-point coordinates of each detection frame framing a target; detect human body targets in the VIS image and record the four-point coordinates of each detection frame framing a target; register the IR image to the VIS image and map the human target coordinates detected in the IR image onto the VIS image; filter out duplicate detection frames and add the remaining frames to the VIS image, completing the whole detection process. Based on VIS-IR images, the invention improves the detection of human targets in complex environments by detecting on each source independently first and fusing afterwards.

Description

Multisource fusion human body target detection method based on VIS-IR image
Technical Field
The invention belongs to the technical field of target detection, and particularly relates to a multisource fusion human body target detection method based on VIS-IR images.
Background
Visible-light images offer high imaging resolution and rich target detail, and therefore attract wide attention in both scientific research and civil applications under ordinary conditions. Target detection based on deep learning is a major hotspot of current research. Classical visible-light target detection methods fall mainly into single-stage methods (YOLO, SSD, RetinaNet, etc.) and multi-stage methods (Fast RCNN, etc.). Among these, YOLOv4 is the most versatile and effective detection method in the YOLO series.
Infrared imaging offers a long working distance, strong anti-interference capability, high measurement precision, immunity to weather, day-and-night operation, and strong smoke-penetration capability, so since its introduction it has attracted wide attention in scientific research and civil use, and market demand for infrared target detection has grown. However, infrared images suffer from blurred imaging, poor resolution, low signal-to-noise ratio, and low contrast, and their gray-level distribution bears no linear relation to the target's reflection characteristics. It is therefore difficult to perform target detection on infrared images with mainstream deep neural networks. Conventional digital-image target detection methods, in contrast, need no large body of training data collected in advance to train a detection model, and their detection capability remains respectable, so they are still in common use today.
It is very difficult to achieve high-precision detection of human targets in complex environments from a single data source. On the one hand, visible-light images are hard to apply to human target detection in severe conditions such as rain, snow, fog, or dense obstacles. On the other hand, infrared thermal images have blurred edges whose features are difficult to extract, and the thermal signatures of obstacles such as animals, street lamps, and vehicles are bright and similar to that of the human body, so they are easily confused with it; moreover, a thermal image depends not only on the object's temperature but also on external factors such as the object's surface characteristics and the radiation wavelength.
To improve the detection of human targets in complex environments, current practice is to fuse the information of visible-light and infrared images so as to achieve high-precision human recognition. Typical multi-source image fusion methods are feature-level fusion and discriminator-level fusion. Feature-level fusion extracts features from the images of each modality, fuses them (e.g., by concatenation), and finally trains a predictor on the fused features; its drawback is that feature extraction, transformation, and fusion consume large amounts of training compute and time. Discriminator-level fusion fuses the discriminators' prediction scores: several discriminators are trained, each yields a prediction score, and the results of all models are fused by weighted summation.
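As a hedged illustration of the discriminator-level fusion just described, the weighted score summation can be sketched as follows (the function name, the two-detector setup, and the weights 0.4/0.6 are illustrative assumptions, not part of the patent):

```python
def fuse_scores(scores, weights):
    """Discriminator-level fusion: weighted sum of the per-detector
    prediction scores, normalized by the total weight."""
    assert len(scores) == len(weights) and len(weights) > 0
    return sum(w * s for w, s in zip(weights, scores)) / sum(weights)

# Illustrative example: an IR detector and a VIS detector score the
# same candidate region; the weights are assumed, not specified.
fused = fuse_scores([0.8, 0.6], weights=[0.4, 0.6])
```

Compared with feature-level fusion, this operates only on final scores, which is why it consumes little compute relative to fusing feature maps.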
Disclosure of Invention
The invention aims to provide a multisource fusion human body target detection method based on VIS-IR images, aiming at the problems in the prior art.
The technical solution for realizing the purpose of the invention is as follows: a multisource fusion human body target detection method based on VIS-IR images, comprising the following steps:
step1, acquiring an infrared image (IR image) and a visible light image (VIS image) of the same scene;
step2, detecting the human body target of the IR image, and recording four-point coordinates of a detection frame framing the target;
step3, detecting the human body target of the VIS image, and recording four-point coordinates of a detection frame framing the target;
step 4, carrying out image registration on the IR image and the VIS image, and mapping the human body target coordinates detected by the IR image to the VIS image;
and 5, filtering the repeated detection frames, and adding the rest detection frames into the VIS image to complete the whole target detection process.
Further, the step2 of detecting the human body target on the IR image and recording four-point coordinates of a detection frame framing the target specifically includes:
step2-1, processing the IR image with a min-max ("most-value") normalization method so as to map the IR image onto an electronic display device;
step2-2, filtering the IR image;
2-3, segmenting the human body target through an edge detection operator;
2-4, extracting a human body target based on a Fourier descriptor;
and 2-5, classifying and detecting the human body target based on an Adaboost algorithm, and recording four-point coordinates of a detection frame framing the image target.
Further, the step2-1 specifically comprises:
recording the pixel value of the infrared image as f (x, y), wherein x and y are the positions of the pixel value in the transverse direction and the longitudinal direction respectively;
step 2-1-1, counting the pixel gray-value histogram p(k) of the IR image, k = 0, 1, 2, ..., L-1, where k denotes a gray value and L is the number of gray levels;
step 2-1-2, accumulating the pixel counts from the minimum and the maximum gray value of the histogram toward the middle; accumulation stops once the sum accumulated from the minimum gray value exceeds a preset threshold S1 and the sum accumulated from the maximum gray value exceeds a preset threshold S2; the minimum accumulated gray value is recorded as fmin and the maximum accumulated gray value as fmax;
Step 2-1-3, normalizing the IR image pixel values:
f'(x, y) = 255 × (f(x, y) - fmin) / (fmax - fmin)
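A minimal sketch of this normalization in Python with NumPy (expressing the thresholds S1/S2 as fractions s1, s2 of the pixel count, and assuming 256 gray levels, are illustrative choices; the patent leaves them as preset constants):

```python
import numpy as np

def minmax_normalize_ir(img, s1=0.0, s2=0.0, levels=256):
    """Sketch of steps 2-1-1..2-1-3: build the gray-level histogram,
    accumulate from the dark and bright ends until the running sums
    exceed the thresholds, then linearly stretch to [0, 255]."""
    img = np.asarray(img, dtype=np.float64)
    hist, _ = np.histogram(img, bins=levels, range=(0, levels))
    n = img.size
    cum_lo = np.cumsum(hist)                 # accumulate from the dark end
    fmin = int(np.searchsorted(cum_lo, s1 * n, side="right"))
    cum_hi = np.cumsum(hist[::-1])           # accumulate from the bright end
    fmax = levels - 1 - int(np.searchsorted(cum_hi, s2 * n, side="right"))
    out = (img - fmin) / max(fmax - fmin, 1) * 255.0
    return np.clip(out, 0.0, 255.0)

# Example: a full-range ramp image is mapped onto [0, 255] unchanged.
ramp = np.arange(256, dtype=np.float64).reshape(16, 16)
stretched = minmax_normalize_ir(ramp)
```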
further, the specific process of step 4 includes:
step 4-1, contour extraction
Carrying out edge detection on the IR image by adopting a Sobel operator, and carrying out edge detection on the processed VIS image by adopting a Canny operator;
step 4-2, corner point detection
performing corner detection on the edge contour points obtained in the step 4-1: a gray-value threshold is set for the contour points, and if the gray value of a contour point obtained in the step 4-1 exceeds the threshold, that point is regarded as a corner;
step 4-3, clustering the corner points
Clustering the corner points obtained in the step 4-2: first randomly select three corner points as the class-1, class-2 and class-3 initial cluster points, then compute the Euclidean distance from every other corner point to the three initial points, and assign each corner point to the cluster at minimum distance; the coordinates of all points in each cluster are then averaged:
x̄ = (1/n) · Σ xi,  ȳ = (1/n) · Σ yi
where (xi, yi) is the coordinate of the i-th clustered point and n is the number of clustered points; this yields the class-1, 2, 3 coordinates (x11, y11), (x12, y12), (x13, y13) of the infrared image and the class-1, 2, 3 coordinates (x21, y21), (x22, y22), (x23, y23) of the visible-light image;
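The assignment-and-averaging of step 4-3 can be sketched in a single pass (pure-Python sketch; taking the three seed corners as given rather than randomly drawn is a simplification):

```python
def cluster_centroids(points, seeds):
    """Step 4-3 sketch: assign every corner to the nearest of the three
    seed corners by Euclidean distance, then return the mean coordinates
    (x-bar, y-bar) of each cluster."""
    clusters = [[s] for s in seeds]
    for p in points:
        if p in seeds:
            continue
        dists = [((p[0] - s[0]) ** 2 + (p[1] - s[1]) ** 2) ** 0.5
                 for s in seeds]
        clusters[dists.index(min(dists))].append(p)
    return [(sum(x for x, _ in c) / len(c), sum(y for _, y in c) / len(c))
            for c in clusters]

# Illustrative corners: three seeds plus one extra point near each seed.
seeds = [(0.0, 0.0), (10.0, 0.0), (0.0, 10.0)]
corners = seeds + [(1.0, 1.0), (9.0, 1.0), (1.0, 9.0)]
centers = cluster_centroids(corners, seeds)
```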
Step 4-4, image automatic registration
Take (x11, y11), (x12, y12), (x13, y13) as the reference dataset and (x21, y21), (x22, y22), (x23, y23) as the dataset to be registered, then register the two datasets using the cp2tform function in MATLAB software.
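cp2tform is MATLAB-specific; outside MATLAB, the same affine mapping can be estimated from the three matched point pairs by solving a small linear system. A sketch under that assumption (not the patent's implementation):

```python
import numpy as np

def estimate_affine(src, dst):
    """Estimate the 2x3 affine transform mapping the three src points
    onto the three dst points -- the role cp2tform plays in MATLAB.
    Each pair contributes x' = a*x + b*y + c and y' = d*x + e*y + f."""
    src = np.asarray(src, dtype=np.float64)
    dst = np.asarray(dst, dtype=np.float64)
    A = np.hstack([src, np.ones((3, 1))])   # rows [x, y, 1]
    params = np.linalg.solve(A, dst)        # columns (a, d), (b, e), (c, f)
    return params.T                         # 2x3 matrix [[a, b, c], [d, e, f]]

def apply_affine(M, pt):
    x, y = pt
    return (M[0, 0] * x + M[0, 1] * y + M[0, 2],
            M[1, 0] * x + M[1, 1] * y + M[1, 2])

# Illustrative point pairs: a pure translation by (2, 3).
M = estimate_affine([(0, 0), (1, 0), (0, 1)], [(2, 3), (3, 3), (2, 4)])
mapped = apply_affine(M, (5, 5))
```

Three non-collinear pairs determine the affine transform exactly, which is why the corner clustering of step 4-3 produces exactly three cluster centers per image.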
Further, step 5 is specifically implemented by an IOU filter fusion algorithm, and includes:
step 5-1, calculating the intersection ratio IOU of the two detection frames in the step2 and the step3, specifically: calculating the area ratio IOU of the intersection part of the detection frame of the IR image and the detection frame of the VIS image and the union part of the areas of the two frames;
step 5-2, judging whether the IOU is larger than or equal to a preset threshold value, if so, regarding the two detection frames as the detection frames of the same target, only reserving one of the detection frames to be drawn in the VIS image, and fusing the corresponding coordinate information and confidence information into the detection information of the VIS image; otherwise, drawing both the two detection frames in the VIS image, and fusing the corresponding coordinate information and the corresponding confidence coefficient information into the detection information of the VIS image.
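The IOU filter fusion of steps 5-1 and 5-2 can be sketched as follows (representing boxes as (x1, y1, x2, y2) corner tuples is an assumed convention; 0.75 is the threshold the patent later states as preferred):

```python
def iou(a, b):
    """Intersection over union of two axis-aligned boxes (x1, y1, x2, y2)."""
    ix = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    iy = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = ix * iy
    union = ((a[2] - a[0]) * (a[3] - a[1])
             + (b[2] - b[0]) * (b[3] - b[1]) - inter)
    return inter / union

def fuse_boxes(ir_boxes, vis_boxes, threshold=0.75):
    """Step 5-2 sketch: an IR box whose IOU with some VIS box reaches
    the threshold is treated as a duplicate of that target and dropped;
    otherwise it is added to the VIS detections."""
    kept = list(vis_boxes)
    for box in ir_boxes:
        if all(iou(box, v) < threshold for v in vis_boxes):
            kept.append(box)
    return kept

# Example: the first IR box duplicates the VIS box; the second is new.
overlap = iou((0, 0, 2, 2), (1, 1, 3, 3))
fused = fuse_boxes([(0, 0, 2, 2), (10, 10, 12, 12)], [(0, 0, 2, 2)])
```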
A multi-source fusion human body target detection system based on VIS-IR images, the system comprising:
the acquisition module is used for acquiring an infrared image (IR image) and a visible light image (VIS image) of the same scene;
the first target detection module is used for detecting a human body target of the IR image and recording four-point coordinates of a detection frame framing the target;
the second target detection module is used for detecting the human body target of the VIS image and recording four-point coordinates of a detection frame framing the target;
the registration module is used for carrying out image registration on the IR image and the VIS image and mapping the human body target coordinate detected by the IR image to the VIS image;
and the filtering module is used for filtering the repeated detection frames and adding the rest detection frames into the VIS image to complete the whole target detection process.
Compared with the prior art, the invention has the following notable advantages: 1) infrared images can detect human targets in occluded environments such as jungle and darkness, where visible-light images struggle, but under good illumination with few occlusions their detection performance is clearly weaker than that of visible-light images because of their poorer pixel quality; the invention combines the strengths of both image types and thus improves human target detection in the same scene. 2) The invention uses image registration, so the IOU filter fusion algorithm works even when the infrared and visible-light images are acquired from different positions, improving the generality of the method. 3) The invention is a discriminator-level fusion method and can improve the recognition accuracy of human targets in complex environments while consuming only a small amount of computing resources and computing time.
The present invention is described in further detail below with reference to the attached drawing figures.
Drawings
FIG. 1 is a schematic frame diagram of the multi-source fusion human body target detection method based on VIS-IR images.
FIG. 2 is a block diagram of IR image processing according to the present invention.
FIG. 3 is a diagram of the VIS image processing framework of the present invention.
FIG. 4 is a diagram of multi-source fusion human target detection results in one embodiment.
Detailed Description
In order to make the objects, technical solutions and advantages of the present application more apparent, the present application is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are merely illustrative of and not restrictive on the broad application.
In one embodiment, with reference to fig. 1 to 3, the invention provides a multisource fusion human body target detection method based on VIS-IR images, which comprises the following steps:
step1, acquiring an infrared image (IR image) and a visible light image (VIS image) of the same scene;
step2, detecting the human body target of the IR image, and recording four-point coordinates of a detection frame framing the target;
step3, detecting the human body target of the VIS image, and recording four-point coordinates of a detection frame framing the target;
step 4, carrying out image registration on the IR image and the VIS image, and mapping the human body target coordinates detected by the IR image to the VIS image;
and 5, filtering the repeated detection frames, and adding the rest detection frames into the VIS image to complete the whole target detection process.
Further, in one embodiment, the step2 of detecting the human body target on the IR image and recording coordinates of four points of a detection frame framing the target specifically includes:
step2-1, processing the IR image with a min-max ("most-value") normalization method so as to map the IR image onto an electronic display device;
and 2-2, performing median filtering on the IR image: take a sub-matrix window centered on the target pixel of the image's pixel matrix (window size usually a 3 × 3 or 5 × 5 area), sort the gray values of the pixels in the window, take the sorted middle value as the new gray value of the target pixel, and iterate in turn until all image data have been processed. For a pixel (x, y), let f(x, y) and g(x, y) be the gray value of the original image pixel and of the processed image pixel respectively; W is the sub-matrix window, and k and l are the horizontal and vertical coordinates within the window W. The mathematical expression is as follows:
g(x,y)=med(f(x-k,y-l)),(k,l∈W)
the med () function is used for sorting the pixel gray values in the k × l sub-matrix window with the target pixel (x, y) as the center and taking the middle value to operate;
2-3, segmenting the human body target through an edge detection operator; specifically, the Sobel operator is used. The Sobel operator is a discrete differential operator that computes an approximation of the gradient of the image brightness function; the algorithm first takes a weighted average and then differences. The two directional operators are computed as follows:
Δxf(x,y)=[f(x-1,y+1)+2f(x,y+1)+f(x+1,y+1)]-[f(x-1,y-1)+2f(x,y-1)+f(x+1,y-1)]
Δyf(x,y)=[f(x-1,y-1)+2f(x-1,y)+f(x-1,y+1)]-[f(x+1,y-1)+2f(x+1,y)+f(x+1,y+1)]
where (x, y) is a pixel of the infrared thermogram and f(x, y) is the pixel gray value at (x, y); Δx f(x, y) and Δy f(x, y) denote the difference operations on the gray value f(x, y) along the x-axis and the y-axis respectively;
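The two difference expressions of step 2-3 correspond to the standard 3 × 3 Sobel kernels; a sketch (the axis convention is an assumption):

```python
import numpy as np

def sobel_gradients(img):
    """Step 2-3 sketch: weighted-average-then-difference written as the
    standard 3x3 Sobel kernels; returns gradients over the valid
    interior region only (no border padding)."""
    img = np.asarray(img, dtype=np.float64)
    kx = np.array([[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]])  # horizontal difference
    ky = np.array([[-1, -2, -1], [0, 0, 0], [1, 2, 1]])  # vertical difference
    h, w = img.shape
    gx = np.zeros((h - 2, w - 2))
    gy = np.zeros((h - 2, w - 2))
    for y in range(h - 2):
        for x in range(w - 2):
            patch = img[y:y + 3, x:x + 3]
            gx[y, x] = (patch * kx).sum()
            gy[y, x] = (patch * ky).sum()
    return gx, gy

# Example: a vertical step edge produces a horizontal gradient only.
step_img = np.tile(np.array([0.0, 0.0, 1.0, 1.0]), (4, 1))
gx, gy = sobel_gradients(step_img)
```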
and 2-4, extracting the human body target based on the Fourier descriptor. The idea is as follows: suppose the boundary of the human target has N boundary points; taking (x0, y0) as the starting point, the boundary curve traversed counterclockwise is represented as the coordinate sequence S(n) = [x(n), y(n)], n = 0, 1, ..., N-1. To convert the two-dimensional representation into a one-dimensional one, the target boundary curve is written in complex form, so the counterclockwise traversal can be expressed as the complex function S(n):
S(n) = x(n) + j·y(n),  n = 0, 1, 2, ..., N-1
The discrete Fourier transform of this sequence yields:
a(k) = (1/N) · Σ_{n=0}^{N-1} S(n) · e^(-j2πnk/N),  k = 0, 1, ..., N-1
Applying the Fourier-descriptor transform to the complex sequence S(n), the transformation result is:
D(k) = |a(k)|,  k = 0, 1, ..., N-1
intensive research shows that when the infrared thermography is subjected to human body target extraction, normalization processing is carried out on S (n), so that the description effect of a Fourier descriptor on the human body target can be obviously improved. The feature vector of the final fourier descriptor D is denoted v (n):
v(n) = |a(n)| / |a(1)|,  n = 2, 3, ..., N-1
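A sketch of the descriptor computation with NumPy (retaining eight coefficients, and the |a(1)| normalization giving scale invariance, follow the common formulation and are assumptions where the patent is terse):

```python
import numpy as np

def fourier_descriptor(boundary, keep=8):
    """Step 2-4 sketch: encode the N boundary points as the complex
    sequence S(n) = x(n) + j*y(n), take its DFT, and normalize the
    magnitudes by |a(1)| (scale invariance; dropping a(0) gives
    translation invariance). Assumes a non-degenerate boundary."""
    pts = np.asarray(boundary, dtype=np.float64)
    s = pts[:, 0] + 1j * pts[:, 1]
    a = np.fft.fft(s) / len(s)          # DFT coefficients a(k)
    mags = np.abs(a)
    return mags[1:keep + 1] / mags[1]   # v(n) = |a(n)| / |a(1)|

# Example: descriptors of a shape and its 2x scaled copy coincide.
theta = np.linspace(0.0, 2.0 * np.pi, 16, endpoint=False)
shape = np.stack([np.cos(theta), np.sin(theta)], axis=1)
d1 = fourier_descriptor(shape)
d2 = fourier_descriptor(2.0 * shape)
```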
and 2-5, classifying and detecting the human body target based on an Adaboost algorithm, and recording four-point coordinates of a detection frame framing the image target. The Adaboost algorithm is implemented as follows:
step 1: and (5) initializing. Each training sample is given the same weight, as follows:
Figure BDA0003482532480000064
step 2: and (5) performing iterative operation. For T rounds of training, for T1, 2.
Step 2-1: weak learning algorithm is set at weight DtTraining to obtain a prediction function, as follows:
ht=X→{-1,1}
step 2-2: calculating the error rate of the prediction function as follows:
εt = Σ_{i: ht(xi) ≠ yi} Dt(i)
step 2-3: let atComprises the following steps:
Figure BDA0003482532480000066
step 2-4: updating the weight according to the error rate, as follows:
Dt+1(i) = Dt(i) · exp(-αt · yi · ht(xi)) / Zt
where Zt is the normalization factor that makes Σi Dt+1(i) = 1.
Step 3: after the T round training is completed, the final prediction function is:
H(x) = sign( Σ_{t=1}^{T} αt · ht(x) )
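The training loop of Steps 1-3 can be sketched as discrete AdaBoost over a pool of candidate weak classifiers (the threshold-stump pool and the toy 1-D data are illustrative assumptions, and the error is clamped away from 0 and 1 to keep the logarithm finite):

```python
import math

def adaboost_train(samples, labels, stumps, rounds):
    """Discrete AdaBoost following Steps 1-3: uniform initial weights
    D1(i) = 1/m; each round selects the weak classifier with the lowest
    weighted error, computes its vote weight a = 0.5*ln((1-eps)/eps),
    and reweights the samples with normalization Z_t."""
    m = len(samples)
    D = [1.0 / m] * m                                    # Step 1
    ensemble = []
    for _ in range(rounds):                              # Step 2
        errs = [sum(D[i] for i in range(m) if h(samples[i]) != labels[i])
                for h in stumps]                         # Step 2-2
        h = stumps[errs.index(min(errs))]                # Step 2-1 (selection)
        eps = min(max(min(errs), 1e-10), 1.0 - 1e-10)    # clamp away from 0/1
        a = 0.5 * math.log((1.0 - eps) / eps)            # Step 2-3
        D = [D[i] * math.exp(-a * labels[i] * h(samples[i]))
             for i in range(m)]                          # Step 2-4
        z = sum(D)                                       # normalization Z_t
        D = [d / z for d in D]
        ensemble.append((a, h))
    def predict(x):                                      # Step 3: H(x)
        return 1 if sum(a * h(x) for a, h in ensemble) >= 0 else -1
    return predict

# Toy 1-D example (illustrative): two threshold stumps, three rounds.
samples = [0.0, 1.0, 2.0, 3.0]
labels = [-1, -1, 1, 1]
stumps = [lambda x: 1 if x > 1.5 else -1,
          lambda x: 1 if x > 0.5 else -1]
predict = adaboost_train(samples, labels, stumps, rounds=3)
```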
further, in one embodiment, step2-1 specifically includes:
recording the pixel value of the infrared image as f (x, y), wherein x and y are the positions of the pixel value in the transverse direction and the longitudinal direction respectively;
step 2-1-1, counting the pixel gray-value histogram p(k) of the IR image, k = 0, 1, 2, ..., L-1, where k represents a gray value and L is the number of gray levels;
step 2-1-2, accumulating the pixel counts from the minimum and the maximum gray value of the histogram toward the middle; accumulation stops once the sum accumulated from the minimum gray value exceeds a preset threshold S1 and the sum accumulated from the maximum gray value exceeds a preset threshold S2; the minimum accumulated gray value is recorded as fmin and the maximum accumulated gray value as fmax;
Step 2-1-3, normalizing the IR image pixel values:
f'(x, y) = 255 × (f(x, y) - fmin) / (fmax - fmin)
further, in one embodiment, the specific process of step 4 includes:
step 4-1, contour extraction
Carrying out edge detection on the IR image by adopting a Sobel operator, and carrying out edge detection on the processed VIS image by adopting a Canny operator;
step 4-2, corner detection
Performing corner detection on the edge contour points obtained in the step 4-1: a gray-value threshold is set for the contour points, and if the gray value of a contour point obtained in the step 4-1 exceeds the threshold, that point is regarded as a corner;
step 4-3, clustering the corner points
Clustering the corner points obtained in the step 4-2: first randomly select three corner points as the class-1, class-2 and class-3 initial cluster points, then compute the Euclidean distance from every other corner point to the three initial points, and assign each corner point to the cluster at minimum distance; the coordinates of all points in each cluster are then averaged:
x̄ = (1/n) · Σ xi,  ȳ = (1/n) · Σ yi
where (xi, yi) is the coordinate of the i-th clustered point and n is the number of clustered points; the class-1, 2, 3 coordinates (x11, y11), (x12, y12), (x13, y13) of the infrared image and the class-1, 2, 3 coordinates (x21, y21), (x22, y22), (x23, y23) of the visible-light image are thus obtained;
Step 4-4, image automatic registration
Take (x11, y11), (x12, y12), (x13, y13) as the reference dataset and (x21, y21), (x22, y22), (x23, y23) as the dataset to be registered, then register the two datasets using the cp2tform function in MATLAB software.
Further, in one embodiment, the step 5 is specifically implemented by an IOU filter fusion algorithm, and includes:
step 5-1, calculating the intersection ratio IOU of the two detection frames in the step2 and the step3, specifically: calculating the area ratio IOU of the intersection part of the detection frame of the IR image and the detection frame of the VIS image and the union part of the areas of the two frames;
step 5-2, judging whether the IOU is greater than or equal to a preset threshold (preferably, the threshold is 0.75), if so, regarding the two detection frames as detection frames of the same target, only keeping one detection frame to be drawn in the VIS image, and fusing the corresponding coordinate information and confidence information into the detection information of the VIS image; otherwise, drawing both the two detection frames in the VIS image, and fusing the corresponding coordinate information and the confidence information into the detection information of the VIS image (as shown in fig. 4).
In one embodiment, a VIS-IR image-based multi-source fused human target detection system is provided, the system comprising:
the acquisition module is used for acquiring an infrared image (IR image) and a visible light image (VIS image) of the same scene;
the first target detection module is used for detecting a human body target of the IR image and recording four-point coordinates of a detection frame framing the target;
the second target detection module is used for detecting the human body target of the VIS image and recording four-point coordinates of a detection frame framing the target;
the registration module is used for carrying out image registration on the IR image and the VIS image and mapping the human body target coordinate detected by the IR image to the VIS image;
and the filtering module is used for filtering the repeated detection frames and adding the rest detection frames into the VIS image to complete the whole target detection process.
For specific limitations of the multi-source fused human target detection system based on VIS-IR images, reference may be made to the above limitations of the multi-source fused human target detection method based on VIS-IR images, which are not described herein again. All modules in the multi-source fusion human body target detection system based on the VIS-IR images can be completely or partially realized through software, hardware and a combination thereof. The modules can be embedded in a hardware form or independent from a processor in the computer device, and can also be stored in a memory in the computer device in a software form, so that the processor can call and execute operations corresponding to the modules.
In one embodiment, a computer device is provided, comprising a memory, a processor, and a computer program stored on the memory and executable on the processor, the processor implementing the following steps when executing the computer program:
step1, acquiring an infrared image (IR image) and a visible light image (VIS image) of the same scene;
step2, detecting the human body target of the IR image, and recording four-point coordinates of a detection frame framing the target;
step3, detecting the human body target of the VIS image, and recording four-point coordinates of a detection frame framing the target;
step 4, carrying out image registration on the IR image and the VIS image, and mapping the human body target coordinates detected by the IR image to the VIS image;
and 5, filtering the repeated detection frames, and adding the rest detection frames into the VIS image to complete the whole target detection process.
For specific definition of each step, reference may be made to the above definition of the multi-source fusion human body target detection method based on VIS-IR images, which is not described herein again.
In one embodiment, a computer-readable storage medium is provided, having a computer program stored thereon, which when executed by a processor, performs the steps of:
step1, acquiring an infrared image (IR image) and a visible light image (VIS image) of the same scene;
step2, detecting the human body target of the IR image, and recording four-point coordinates of a detection frame framing the target;
step3, detecting the human body target of the VIS image, and recording four-point coordinates of a detection frame framing the target;
step 4, carrying out image registration on the IR image and the VIS image, and mapping the human body target coordinates detected by the IR image to the VIS image;
and 5, filtering the repeated detection frames, and adding the rest detection frames into the VIS image to complete the whole target detection process.
For specific definition of each step, reference may be made to the above definition of the multi-source fusion human body target detection method based on VIS-IR images, which is not described herein again.
The foregoing illustrates and describes the principles, main features, and advantages of the present invention. It will be understood by those skilled in the art that the invention is not limited to the embodiments described above, which merely illustrate its principles; various changes and modifications may be made without departing from the spirit and scope of the invention, and such changes and modifications fall within the scope of the invention as claimed.

Claims (9)

1. A multisource fusion human body target detection method based on VIS-IR images is characterized by comprising the following steps:
step1, acquiring an infrared image (IR image) and a visible light image (VIS image) of the same scene;
step2, detecting the human body target of the IR image, and recording four-point coordinates of a detection frame framing the target;
step3, detecting the human body target of the VIS image, and recording four-point coordinates of a detection frame framing the target;
step 4, carrying out image registration on the IR image and the VIS image, and mapping the human body target coordinates detected by the IR image to the VIS image;
and 5, filtering the repeated detection frames, and adding the rest detection frames into the VIS image to complete the whole target detection process.
2. The multi-source fusion human body target detection method based on VIS-IR images as claimed in claim 1, wherein the step2 of performing human body target detection on the IR images and recording four-point coordinates of a detection frame framing the target specifically comprises:
step2-1, processing the IR image with a min-max ("most-value") normalization method so as to map the IR image onto an electronic display device;
step2-2, filtering the IR image;
2-3, segmenting the human body target through an edge detection operator;
2-4, extracting a human body target based on a Fourier descriptor;
and 2-5, classifying and detecting the human body target based on an Adaboost algorithm, and recording four-point coordinates of a detection frame framing the image target.
3. The multi-source fusion human body target detection method based on VIS-IR images as claimed in claim 2, wherein the step2-1 specifically comprises:
recording the pixel value of the infrared image as f (x, y), wherein x and y are the positions of the pixel value in the transverse direction and the longitudinal direction respectively;
step 2-1-1, counting the pixel gray-value histogram p(k) of the IR image, k = 0, 1, 2, ..., L-1, where k represents a gray value and L is the number of gray levels;
step 2-1-2, accumulating the pixel counts from the minimum and the maximum gray value of the histogram toward the middle; accumulation stops once the sum accumulated from the minimum gray value exceeds a preset threshold S1 and the sum accumulated from the maximum gray value exceeds a preset threshold S2; the minimum accumulated gray value is recorded as fmin and the maximum accumulated gray value as fmax;
step 2-1-3, normalizing the IR image pixel values:

f'(x, y) = (L - 1) · (f(x, y) - f_min(x, y)) / (f_max(x, y) - f_min(x, y))
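As an illustration, steps 2-1-1 through 2-1-3 can be sketched as follows (Python/NumPy). Treating the thresholds S1 and S2 as fractions of the total pixel count, and the final clipping and rounding, are assumptions of the sketch, not details fixed by the claim:

```python
import numpy as np

def normalize_ir(img, s1=0.01, s2=0.01, levels=256):
    """Sketch of steps 2-1-1..2-1-3: min-max normalization of an IR image,
    with f_min / f_max picked via cumulative histogram thresholds.
    s1 and s2 (fractions of the pixel count) stand in for S1 / S2."""
    hist = np.bincount(img.ravel(), minlength=levels)        # step 2-1-1: p(k)
    total = img.size
    # step 2-1-2: accumulate from the minimum gray value upward...
    csum = np.cumsum(hist)
    f_min = int(np.searchsorted(csum, s1 * total))
    # ...and from the maximum gray value downward
    csum_hi = np.cumsum(hist[::-1])
    f_max = levels - 1 - int(np.searchsorted(csum_hi, s2 * total))
    # step 2-1-3: linear stretch to [0, L-1]; out-of-range pixels are clipped
    out = (img.astype(np.float64) - f_min) / max(f_max - f_min, 1) * (levels - 1)
    return np.clip(np.rint(out), 0, levels - 1).astype(np.uint8)
```

With s1 = s2 = 0 this reduces to a plain min-max stretch over the full histogram; nonzero fractions discard outlier gray values before stretching.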
4. The multi-source fusion human body target detection method based on VIS-IR images as claimed in claim 3, wherein the filtering in step 2-2 is specifically median filtering.
5. The multi-source fusion human body target detection method based on VIS-IR images as claimed in claim 4, wherein the edge detection operator of step 2-3 is the Sobel operator.
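A minimal sketch of the filtering and edge detection of steps 2-2 and 2-3, assuming a 3×3 median window and the standard 3×3 Sobel kernels (both window sizes are illustrative):

```python
import numpy as np

SOBEL_X = np.array([[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]], float)
SOBEL_Y = SOBEL_X.T

def _filter3(img, kernel):
    """3x3 'same' cross-correlation with edge replication (helper)."""
    p = np.pad(img, 1, mode='edge')
    out = np.zeros_like(img, dtype=float)
    for i in range(3):
        for j in range(3):
            out += kernel[i, j] * p[i:i + img.shape[0], j:j + img.shape[1]]
    return out

def median3(img):
    """Step 2-2 sketch: 3x3 median filter."""
    p = np.pad(img, 1, mode='edge')
    stack = [p[i:i + img.shape[0], j:j + img.shape[1]]
             for i in range(3) for j in range(3)]
    return np.median(np.stack(stack), axis=0)

def sobel_edges(img):
    """Step 2-3 sketch: Sobel gradient magnitude of the median-filtered image."""
    s = median3(np.asarray(img, float))
    return np.hypot(_filter3(s, SOBEL_X), _filter3(s, SOBEL_Y))
```

Thresholding the returned magnitude map would then give the edge points used to segment the human target.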
6. The multi-source fusion human body target detection method based on VIS-IR images as claimed in claim 5, wherein step 3 specifically adopts a single-stage detection algorithm, the YOLOv4 algorithm, to perform human body target detection on the visible light image.
7. The method for detecting the multi-source fusion human body target based on the VIS-IR image according to claim 6, wherein the specific process of step 4 comprises the following steps:
step 4-1, contour extraction
Carrying out edge detection on the IR image by adopting a Sobel operator, and carrying out edge detection on the processed VIS image by adopting a Canny operator;
step 4-2, corner detection
Performing corner detection on the edge contour points from step 4-1: setting a gray-value threshold for the contour points, and regarding a contour point obtained in step 4-1 as a corner if its gray value is greater than the threshold;
step 4-3, clustering the corner points
Clustering the corners obtained in step 4-2: first, randomly selecting three corners as the class-1, class-2 and class-3 initial convergence points; then computing the Euclidean distance from each remaining corner to the three initial convergence points and assigning it to the class of the nearest convergence point; finally, averaging the coordinates of all points in each class:

x̄ = (1/n) Σ x_i,  ȳ = (1/n) Σ y_i

where (x_i, y_i) is the coordinate of the i-th convergence point and n is the number of convergence points in the class; this yields the class 1, 2, 3 centroid coordinates (x11, y11), (x12, y12), (x13, y13) of the infrared image and (x21, y21), (x22, y22), (x23, y23) of the visible light image;
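The single assignment-and-average pass of step 4-3 can be sketched as follows (Python/NumPy). The claim selects the three initial convergence points at random; `init_idx` fixes them here purely for reproducibility:

```python
import numpy as np

def cluster_corners(corners, init_idx=(0, 1, 2)):
    """Sketch of step 4-3: assign every corner to the nearest of three initial
    convergence points by Euclidean distance, then return the mean (x, y)
    coordinate of each class."""
    corners = np.asarray(corners, float)
    seeds = corners[list(init_idx)]                  # 3 initial convergence points
    # pairwise Euclidean distances: shape (n_corners, 3)
    d = np.linalg.norm(corners[:, None, :] - seeds[None, :, :], axis=2)
    labels = d.argmin(axis=1)                        # nearest-seed assignment
    # class centroids: x_bar = (1/n) * sum(x_i), y_bar = (1/n) * sum(y_i)
    return np.array([corners[labels == k].mean(axis=0) for k in range(3)])
```

Running this once on the IR corners and once on the VIS corners produces the two sets of three centroids that feed the registration of step 4-4.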
step 4-4, automatic image registration
Taking (x11, y11), (x12, y12), (x13, y13) as the reference dataset and (x21, y21), (x22, y22), (x23, y23) as the dataset to be registered, then registering the two datasets using the cp2tform function in MATLAB.
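Outside MATLAB, the role cp2tform plays in step 4-4 can be approximated by solving directly for the transform defined by the three point pairs (a sketch; cp2tform supports several transform types, and the affine model assumed here is an illustrative choice):

```python
import numpy as np

def affine_from_points(src, dst):
    """Sketch of step 4-4: estimate the affine transform that maps the three
    source centroids onto the three destination centroids, by solving the
    linear system [x y 1] @ T = [x' y']."""
    A = np.hstack([np.asarray(src, float), np.ones((3, 1))])
    T = np.linalg.solve(A, np.asarray(dst, float))   # 3x2 transform matrix
    return T

def apply_affine(T, pts):
    """Map points (e.g. IR detection-frame corners) with the estimated transform."""
    P = np.hstack([np.asarray(pts, float), np.ones((len(pts), 1))])
    return P @ T
```

Once T is estimated from the centroids, `apply_affine` can map the four-point coordinates of the IR detection frames into the VIS image, as required by step 4 of claim 1.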
8. The method for detecting the multi-source fusion human body target based on the VIS-IR image according to claim 7, wherein step 5 is realized by an IOU filtering fusion algorithm comprising the following steps:
step 5-1, calculating the intersection-over-union (IOU) of the detection frames from step 2 and step 3, specifically: the ratio of the area of the intersection of an IR-image detection frame and a VIS-image detection frame to the area of the union of the two frames;
step 5-2, judging whether the IOU is greater than or equal to a preset threshold: if so, regarding the two detection frames as detections of the same target, retaining only one of them to be drawn in the VIS image, and merging its coordinate and confidence information into the detection information of the VIS image; otherwise, drawing both detection frames in the VIS image and merging both sets of coordinate and confidence information into the detection information of the VIS image.
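The IOU filtering fusion of steps 5-1 and 5-2 can be sketched as follows (Python). Keeping the VIS-image frame when two frames coincide, and the threshold `thr = 0.5`, are illustrative choices not fixed by the claim:

```python
def iou(a, b):
    """Step 5-1 sketch: intersection-over-union of boxes (x1, y1, x2, y2)."""
    ix = max(0, min(a[2], b[2]) - max(a[0], b[0]))   # intersection width
    iy = max(0, min(a[3], b[3]) - max(a[1], b[1]))   # intersection height
    inter = ix * iy
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter)

def fuse_boxes(ir_boxes, vis_boxes, thr=0.5):
    """Step 5-2 sketch: keep every VIS frame; add an IR frame only if it does
    not overlap any VIS frame with IOU >= thr (i.e. drop duplicates)."""
    fused = list(vis_boxes)
    for b in ir_boxes:
        if all(iou(b, v) < thr for v in vis_boxes):
            fused.append(b)
    return fused
```

This way an IR-only detection (e.g. a person invisible in the VIS image at night) survives into the fused result, while a target seen by both sensors is drawn once.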
9. A VIS-IR image-based multi-source fusion human body target detection system based on the method of any one of claims 1 to 8, characterized in that the system comprises:
an acquisition module, used for acquiring an infrared image (IR image) and a visible light image (VIS image) of the same scene;
the first target detection module is used for detecting a human body target of the IR image and recording four-point coordinates of a detection frame framing the target;
the second target detection module is used for detecting the human body target of the VIS image and recording four-point coordinates of a detection frame framing the target;
the registration module is used for carrying out image registration on the IR image and the VIS image and mapping the human body target coordinate detected by the IR image to the VIS image;
and the filtering module is used for filtering the repeated detection frames and adding the rest detection frames into the VIS image to finish the whole target detection process.
CN202210072239.4A 2022-01-21 2022-01-21 Multisource fusion human body target detection method based on VIS-IR image Pending CN114511879A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210072239.4A CN114511879A (en) 2022-01-21 2022-01-21 Multisource fusion human body target detection method based on VIS-IR image

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210072239.4A CN114511879A (en) 2022-01-21 2022-01-21 Multisource fusion human body target detection method based on VIS-IR image

Publications (1)

Publication Number Publication Date
CN114511879A true CN114511879A (en) 2022-05-17

Family

ID=81549211

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210072239.4A Pending CN114511879A (en) 2022-01-21 2022-01-21 Multisource fusion human body target detection method based on VIS-IR image

Country Status (1)

Country Link
CN (1) CN114511879A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115880292A (en) * 2023-02-22 2023-03-31 和普威视光电股份有限公司 Method, device, terminal and storage medium for detecting sea and lake surface targets


Similar Documents

Publication Publication Date Title
CN107680054B (en) Multi-source image fusion method in haze environment
CN110097044B (en) One-stage license plate detection and identification method based on deep learning
Kong et al. General road detection from a single image
CN111079556A (en) Multi-temporal unmanned aerial vehicle video image change area detection and classification method
Zhang et al. Vehicle recognition algorithm based on Haar-like features and improved Adaboost classifier
Li et al. Road lane detection with gabor filters
CN113420607A (en) Multi-scale target detection and identification method for unmanned aerial vehicle
CN112906583B (en) Lane line detection method and device
CN107944354B (en) Vehicle detection method based on deep learning
US20190114753A1 (en) Video Background Removal Method
CN105160649A (en) Multi-target tracking method and system based on kernel function unsupervised clustering
Naufal et al. Preprocessed mask RCNN for parking space detection in smart parking systems
CN109685827B (en) Target detection and tracking method based on DSP
CN109711256B (en) Low-altitude complex background unmanned aerial vehicle target detection method
CN107992856B (en) High-resolution remote sensing building shadow detection method under urban scene
CN112560717B (en) Lane line detection method based on deep learning
CN113608663B (en) Fingertip tracking method based on deep learning and K-curvature method
CN112200746A (en) Defogging method and device for traffic scene image in foggy day
CN109635649B (en) High-speed detection method and system for unmanned aerial vehicle reconnaissance target
CN107045630B (en) RGBD-based pedestrian detection and identity recognition method and system
CN114511879A (en) Multisource fusion human body target detection method based on VIS-IR image
FAN et al. Robust lane detection and tracking based on machine vision
Bisht et al. Integration of hough transform and inter-frame clustering for road lane detection and tracking
KR102171384B1 (en) Object recognition system and method using image correction filter
CN108985216B (en) Pedestrian head detection method based on multivariate logistic regression feature fusion

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination