CN114511879A - Multisource fusion human body target detection method based on VIS-IR image - Google Patents


Info

Publication number
CN114511879A
Authority
CN
China
Prior art keywords
image
vis
detection
human body
target
Prior art date
Legal status
Pending
Application number
CN202210072239.4A
Other languages
Chinese (zh)
Inventor
顾晶晶
陈俊义
Current Assignee
Nanjing University of Aeronautics and Astronautics
Original Assignee
Nanjing University of Aeronautics and Astronautics
Priority date
Filing date
Publication date
Application filed by Nanjing University of Aeronautics and Astronautics filed Critical Nanjing University of Aeronautics and Astronautics
Priority to CN202210072239.4A priority Critical patent/CN114511879A/en
Publication of CN114511879A publication Critical patent/CN114511879A/en
Pending legal-status Critical Current

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/23Clustering techniques
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/24Classification techniques

Landscapes

  • Engineering & Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Theoretical Computer Science (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Artificial Intelligence (AREA)
  • Evolutionary Biology (AREA)
  • Evolutionary Computation (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Image Processing (AREA)

Abstract

The invention discloses a multisource fusion human body target detection method based on VIS-IR images, comprising the following steps: acquire an infrared image (IR image) and a visible light image (VIS image) of the same scene; detect human body targets in the IR image and record the four-point coordinates of each detection frame framing a target; detect human body targets in the VIS image and record the four-point coordinates of each detection frame framing a target; register the IR image to the VIS image and map the human target coordinates detected in the IR image onto the VIS image; filter out duplicate detection frames and add the remaining frames to the VIS image, completing the whole detection process. Based on VIS-IR images, the invention improves the detection of human targets in complex environments by detecting on each source independently first and fusing afterwards.

Description

Multisource fusion human body target detection method based on VIS-IR image
Technical Field
The invention belongs to the technical field of target detection, and particularly relates to a multisource fusion human body target detection method based on VIS-IR images.
Background
Visible-light images offer high imaging resolution and rich target detail, and therefore attract wide attention in both scientific research and civil applications under ordinary conditions. Target detection based on deep learning is a major hotspot of current research. Classical visible-light target detection methods fall mainly into single-stage methods (YOLO, SSD, RetinaNet, etc.) and multi-stage methods (Fast RCNN, etc.). Among these, YOLOv4 is the most versatile and effective detection method in the YOLO series.
Infrared imaging offers a long working distance, strong anti-interference capability, high measurement precision, immunity to weather, day-and-night operation, and strong smoke-penetration capability, so since its introduction it has attracted wide attention in scientific research and civil use, and market demand for infrared target detection has grown. However, infrared images suffer from blurred imaging, poor resolution, low signal-to-noise ratio, and low contrast, and their gray-level distribution bears no linear relation to the target's reflection characteristics. It is therefore difficult to perform target detection on infrared images with mainstream deep neural networks. Conventional digital-image target detection methods, in contrast, need no large body of training data collected in advance to train a detection model, and their detection capability remains respectable, so they are still in common use today.
It is very difficult to achieve high-precision detection of human targets in complex environments from a single data source. On the one hand, visible-light images are hard to apply to human target detection in severe conditions such as rain, snow, fog, or dense obstacles. On the other hand, infrared thermal images have blurred edges whose features are difficult to extract, and the thermal signatures of obstacles such as animals, street lamps, and vehicles are bright and similar to that of the human body, so they are easily confused with it; moreover, a thermal image depends not only on the object's temperature but also on external factors such as the object's surface characteristics and the radiation wavelength.
To improve the detection of human targets in complex environments, current practice is to fuse the information of visible-light and infrared images so as to achieve high-precision human recognition. Typical multi-source image fusion methods are feature-level fusion and discriminator-level fusion. Feature-level fusion extracts features from the images of each modality, fuses them (e.g., by concatenation), and finally trains a predictor on the fused features; its drawback is that feature extraction, transformation, and fusion consume large amounts of training compute and time. Discriminator-level fusion fuses the discriminators' prediction scores: several discriminators are trained, each yields a prediction score, and the results of all models are fused by weighted summation.
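As a hedged illustration of the discriminator-level fusion just described, the weighted score summation can be sketched as follows (the function name, the two-detector setup, and the weights 0.4/0.6 are illustrative assumptions, not part of the patent):

```python
def fuse_scores(scores, weights):
    """Discriminator-level fusion: weighted sum of the per-detector
    prediction scores, normalized by the total weight."""
    assert len(scores) == len(weights) and len(weights) > 0
    return sum(w * s for w, s in zip(weights, scores)) / sum(weights)

# Illustrative example: an IR detector and a VIS detector score the
# same candidate region; the weights are assumed, not specified.
fused = fuse_scores([0.8, 0.6], weights=[0.4, 0.6])
```

Compared with feature-level fusion, this operates only on final scores, which is why it consumes little compute relative to fusing feature maps.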
Disclosure of Invention
The invention aims to provide a multisource fusion human body target detection method based on VIS-IR images, aiming at the problems in the prior art.
The technical solution for realizing the purpose of the invention is as follows: a multisource fusion human body target detection method based on VIS-IR images, comprising the following steps:
step1, acquiring an infrared image (IR image) and a visible light image (VIS image) of the same scene;
step2, detecting the human body target of the IR image, and recording four-point coordinates of a detection frame framing the target;
step3, detecting the human body target of the VIS image, and recording four-point coordinates of a detection frame framing the target;
step 4, carrying out image registration on the IR image and the VIS image, and mapping the human body target coordinates detected by the IR image to the VIS image;
and 5, filtering the repeated detection frames, and adding the rest detection frames into the VIS image to complete the whole target detection process.
Further, the step2 of detecting the human body target on the IR image and recording four-point coordinates of a detection frame framing the target specifically includes:
step2-1, processing the IR image with a min-max ("most-value") normalization method so as to map the IR image onto an electronic display device;
step2-2, filtering the IR image;
2-3, segmenting the human body target through an edge detection operator;
2-4, extracting a human body target based on a Fourier descriptor;
and 2-5, classifying and detecting the human body target based on an Adaboost algorithm, and recording four-point coordinates of a detection frame framing the image target.
Further, the step2-1 specifically comprises:
recording the pixel value of the infrared image as f (x, y), wherein x and y are the positions of the pixel value in the transverse direction and the longitudinal direction respectively;
step 2-1-1, counting the pixel gray-value histogram p(k) of the IR image, k = 0, 1, 2, ..., L-1, where k denotes a gray value and L is the number of gray levels;
step 2-1-2, accumulating the pixel counts from the minimum and the maximum gray value of the histogram toward the middle; accumulation stops once the sum accumulated from the minimum gray value exceeds a preset threshold S1 and the sum accumulated from the maximum gray value exceeds a preset threshold S2; the minimum accumulated gray value is recorded as fmin and the maximum accumulated gray value as fmax;
Step 2-1-3, normalizing the IR image pixel values:
f'(x, y) = 255 × (f(x, y) - fmin) / (fmax - fmin)
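A minimal sketch of this normalization in Python with NumPy (expressing the thresholds S1/S2 as fractions s1, s2 of the pixel count, and assuming 256 gray levels, are illustrative choices; the patent leaves them as preset constants):

```python
import numpy as np

def minmax_normalize_ir(img, s1=0.0, s2=0.0, levels=256):
    """Sketch of steps 2-1-1..2-1-3: build the gray-level histogram,
    accumulate from the dark and bright ends until the running sums
    exceed the thresholds, then linearly stretch to [0, 255]."""
    img = np.asarray(img, dtype=np.float64)
    hist, _ = np.histogram(img, bins=levels, range=(0, levels))
    n = img.size
    cum_lo = np.cumsum(hist)                 # accumulate from the dark end
    fmin = int(np.searchsorted(cum_lo, s1 * n, side="right"))
    cum_hi = np.cumsum(hist[::-1])           # accumulate from the bright end
    fmax = levels - 1 - int(np.searchsorted(cum_hi, s2 * n, side="right"))
    out = (img - fmin) / max(fmax - fmin, 1) * 255.0
    return np.clip(out, 0.0, 255.0)

# Example: a full-range ramp image is mapped onto [0, 255] unchanged.
ramp = np.arange(256, dtype=np.float64).reshape(16, 16)
stretched = minmax_normalize_ir(ramp)
```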
further, the specific process of step 4 includes:
step 4-1, contour extraction
Carrying out edge detection on the IR image by adopting a Sobel operator, and carrying out edge detection on the processed VIS image by adopting a Canny operator;
step 4-2, corner point detection
performing corner detection on the edge contour points obtained in the step 4-1: a gray-value threshold is set for the contour points, and if the gray value of a contour point obtained in the step 4-1 exceeds the threshold, that point is regarded as a corner;
step 4-3, clustering the corner points
Clustering the corner points obtained in the step 4-2: first randomly select three corner points as the class-1, class-2 and class-3 initial cluster points, then compute the Euclidean distance from every other corner point to the three initial points, and assign each corner point to the cluster at minimum distance; the coordinates of all points in each cluster are then averaged:
x̄ = (1/n) · Σ xi,  ȳ = (1/n) · Σ yi
where (xi, yi) is the coordinate of the i-th clustered point and n is the number of clustered points; this yields the class-1, 2, 3 coordinates (x11, y11), (x12, y12), (x13, y13) of the infrared image and the class-1, 2, 3 coordinates (x21, y21), (x22, y22), (x23, y23) of the visible-light image;
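The assignment-and-averaging of step 4-3 can be sketched in a single pass (pure-Python sketch; taking the three seed corners as given rather than randomly drawn is a simplification):

```python
def cluster_centroids(points, seeds):
    """Step 4-3 sketch: assign every corner to the nearest of the three
    seed corners by Euclidean distance, then return the mean coordinates
    (x-bar, y-bar) of each cluster."""
    clusters = [[s] for s in seeds]
    for p in points:
        if p in seeds:
            continue
        dists = [((p[0] - s[0]) ** 2 + (p[1] - s[1]) ** 2) ** 0.5
                 for s in seeds]
        clusters[dists.index(min(dists))].append(p)
    return [(sum(x for x, _ in c) / len(c), sum(y for _, y in c) / len(c))
            for c in clusters]

# Illustrative corners: three seeds plus one extra point near each seed.
seeds = [(0.0, 0.0), (10.0, 0.0), (0.0, 10.0)]
corners = seeds + [(1.0, 1.0), (9.0, 1.0), (1.0, 9.0)]
centers = cluster_centroids(corners, seeds)
```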
Step 4-4, image automatic registration
Take (x11, y11), (x12, y12), (x13, y13) as the reference dataset and (x21, y21), (x22, y22), (x23, y23) as the dataset to be registered, then register the two datasets using the cp2tform function in MATLAB software.
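cp2tform is MATLAB-specific; outside MATLAB, the same affine mapping can be estimated from the three matched point pairs by solving a small linear system. A sketch under that assumption (not the patent's implementation):

```python
import numpy as np

def estimate_affine(src, dst):
    """Estimate the 2x3 affine transform mapping the three src points
    onto the three dst points -- the role cp2tform plays in MATLAB.
    Each pair contributes x' = a*x + b*y + c and y' = d*x + e*y + f."""
    src = np.asarray(src, dtype=np.float64)
    dst = np.asarray(dst, dtype=np.float64)
    A = np.hstack([src, np.ones((3, 1))])   # rows [x, y, 1]
    params = np.linalg.solve(A, dst)        # columns (a, d), (b, e), (c, f)
    return params.T                         # 2x3 matrix [[a, b, c], [d, e, f]]

def apply_affine(M, pt):
    x, y = pt
    return (M[0, 0] * x + M[0, 1] * y + M[0, 2],
            M[1, 0] * x + M[1, 1] * y + M[1, 2])

# Illustrative point pairs: a pure translation by (2, 3).
M = estimate_affine([(0, 0), (1, 0), (0, 1)], [(2, 3), (3, 3), (2, 4)])
mapped = apply_affine(M, (5, 5))
```

Three non-collinear pairs determine the affine transform exactly, which is why the corner clustering of step 4-3 produces exactly three cluster centers per image.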
Further, step 5 is specifically implemented by an IOU filter fusion algorithm, and includes:
step 5-1, calculating the intersection ratio IOU of the two detection frames in the step2 and the step3, specifically: calculating the area ratio IOU of the intersection part of the detection frame of the IR image and the detection frame of the VIS image and the union part of the areas of the two frames;
step 5-2, judging whether the IOU is larger than or equal to a preset threshold value, if so, regarding the two detection frames as the detection frames of the same target, only reserving one of the detection frames to be drawn in the VIS image, and fusing the corresponding coordinate information and confidence information into the detection information of the VIS image; otherwise, drawing both the two detection frames in the VIS image, and fusing the corresponding coordinate information and the corresponding confidence coefficient information into the detection information of the VIS image.
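The IOU filter fusion of steps 5-1 and 5-2 can be sketched as follows (representing boxes as (x1, y1, x2, y2) corner tuples is an assumed convention; 0.75 is the threshold the patent later states as preferred):

```python
def iou(a, b):
    """Intersection over union of two axis-aligned boxes (x1, y1, x2, y2)."""
    ix = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    iy = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = ix * iy
    union = ((a[2] - a[0]) * (a[3] - a[1])
             + (b[2] - b[0]) * (b[3] - b[1]) - inter)
    return inter / union

def fuse_boxes(ir_boxes, vis_boxes, threshold=0.75):
    """Step 5-2 sketch: an IR box whose IOU with some VIS box reaches
    the threshold is treated as a duplicate of that target and dropped;
    otherwise it is added to the VIS detections."""
    kept = list(vis_boxes)
    for box in ir_boxes:
        if all(iou(box, v) < threshold for v in vis_boxes):
            kept.append(box)
    return kept

# Example: the first IR box duplicates the VIS box; the second is new.
overlap = iou((0, 0, 2, 2), (1, 1, 3, 3))
fused = fuse_boxes([(0, 0, 2, 2), (10, 10, 12, 12)], [(0, 0, 2, 2)])
```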
A multi-source fusion human body target detection system based on VIS-IR images, the system comprising:
the acquisition module is used for acquiring an infrared image (IR image) and a visible light image (VIS image) of the same scene;
the first target detection module is used for detecting a human body target of the IR image and recording four-point coordinates of a detection frame framing the target;
the second target detection module is used for detecting the human body target of the VIS image and recording four-point coordinates of a detection frame framing the target;
the registration module is used for carrying out image registration on the IR image and the VIS image and mapping the human body target coordinate detected by the IR image to the VIS image;
and the filtering module is used for filtering the repeated detection frames and adding the rest detection frames into the VIS image to complete the whole target detection process.
Compared with the prior art, the invention has the following notable advantages: 1) infrared images can detect human targets in occluded environments such as jungle and darkness, where visible-light images struggle, but under good illumination with few occlusions their detection performance is clearly weaker than that of visible-light images because of their poorer pixel quality; the invention combines the strengths of both image types and thus improves human target detection in the same scene. 2) The invention uses image registration, so the IOU filter fusion algorithm works even when the infrared and visible-light images are acquired from different positions, improving the generality of the method. 3) The invention is a discriminator-level fusion method and can improve the recognition accuracy of human targets in complex environments while consuming only a small amount of computing resources and computing time.
The present invention is described in further detail below with reference to the attached drawing figures.
Drawings
FIG. 1 is a schematic frame diagram of the multi-source fusion human body target detection method based on VIS-IR images.
FIG. 2 is a block diagram of IR image processing according to the present invention.
FIG. 3 is a diagram of the VIS image processing framework of the present invention.
FIG. 4 is a diagram of multi-source fusion human target detection results in one embodiment.
Detailed Description
In order to make the objects, technical solutions and advantages of the present application more apparent, the present application is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are merely illustrative of and not restrictive on the broad application.
In one embodiment, with reference to fig. 1 to 3, the invention provides a multisource fusion human body target detection method based on VIS-IR images, which comprises the following steps:
step1, acquiring an infrared image (IR image) and a visible light image (VIS image) of the same scene;
step2, detecting the human body target of the IR image, and recording four-point coordinates of a detection frame framing the target;
step3, detecting the human body target of the VIS image, and recording four-point coordinates of a detection frame framing the target;
step 4, carrying out image registration on the IR image and the VIS image, and mapping the human body target coordinates detected by the IR image to the VIS image;
and 5, filtering the repeated detection frames, and adding the rest detection frames into the VIS image to complete the whole target detection process.
Further, in one embodiment, the step2 of detecting the human body target on the IR image and recording coordinates of four points of a detection frame framing the target specifically includes:
step2-1, processing the IR image with a min-max ("most-value") normalization method so as to map the IR image onto an electronic display device;
and 2-2, performing median filtering on the IR image: take a sub-matrix window centered on the target pixel of the image's pixel matrix (window size usually a 3 × 3 or 5 × 5 area), sort the gray values of the pixels in the window, take the sorted middle value as the new gray value of the target pixel, and iterate in turn until all image data have been processed. For a pixel (x, y), let f(x, y) and g(x, y) be the gray value of the original image pixel and of the processed image pixel respectively; W is the sub-matrix window, and k and l are the horizontal and vertical coordinates within the window W. The mathematical expression is as follows:
g(x,y)=med(f(x-k,y-l)),(k,l∈W)
the med () function is used for sorting the pixel gray values in the k × l sub-matrix window with the target pixel (x, y) as the center and taking the middle value to operate;
2-3, segmenting the human body target through an edge detection operator; specifically, the Sobel operator is used. The Sobel operator is a discrete differential operator that computes an approximation of the gradient of the image brightness function; the algorithm first takes a weighted average and then differences. The two directional operators are computed as follows:
Δxf(x,y)=[f(x-1,y+1)+2f(x,y+1)+f(x+1,y+1)]-[f(x-1,y-1)+2f(x,y-1)+f(x+1,y-1)]
Δyf(x,y)=[f(x-1,y-1)+2f(x-1,y)+f(x-1,y+1)]-[f(x+1,y-1)+2f(x+1,y)+f(x+1,y+1)]
where (x, y) is a pixel of the infrared thermogram and f(x, y) is the pixel gray value at (x, y); Δx f(x, y) and Δy f(x, y) denote the difference operations on the gray value f(x, y) along the x-axis and the y-axis respectively;
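The two difference expressions of step 2-3 correspond to the standard 3 × 3 Sobel kernels; a sketch (the axis convention is an assumption):

```python
import numpy as np

def sobel_gradients(img):
    """Step 2-3 sketch: weighted-average-then-difference written as the
    standard 3x3 Sobel kernels; returns gradients over the valid
    interior region only (no border padding)."""
    img = np.asarray(img, dtype=np.float64)
    kx = np.array([[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]])  # horizontal difference
    ky = np.array([[-1, -2, -1], [0, 0, 0], [1, 2, 1]])  # vertical difference
    h, w = img.shape
    gx = np.zeros((h - 2, w - 2))
    gy = np.zeros((h - 2, w - 2))
    for y in range(h - 2):
        for x in range(w - 2):
            patch = img[y:y + 3, x:x + 3]
            gx[y, x] = (patch * kx).sum()
            gy[y, x] = (patch * ky).sum()
    return gx, gy

# Example: a vertical step edge produces a horizontal gradient only.
step_img = np.tile(np.array([0.0, 0.0, 1.0, 1.0]), (4, 1))
gx, gy = sobel_gradients(step_img)
```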
and 2-4, extracting the human body target based on the Fourier descriptor. The idea is as follows: suppose the boundary of the human target has N boundary points; taking (x0, y0) as the starting point, the boundary curve traversed counterclockwise is represented as the coordinate sequence S(n) = [x(n), y(n)], n = 0, 1, ..., N-1. To convert the two-dimensional representation into a one-dimensional one, the target boundary curve is written in complex form, so the counterclockwise traversal can be expressed as the complex function S(n):
S(n) = x(n) + j·y(n),  n = 0, 1, 2, ..., N-1
The discrete Fourier transform of this sequence yields:
a(k) = (1/N) · Σ_{n=0}^{N-1} S(n) · e^(-j2πnk/N),  k = 0, 1, ..., N-1
Applying the Fourier-descriptor transform to the complex sequence S(n), the transformation result is:
D(k) = |a(k)|,  k = 0, 1, ..., N-1
intensive research shows that when the infrared thermography is subjected to human body target extraction, normalization processing is carried out on S (n), so that the description effect of a Fourier descriptor on the human body target can be obviously improved. The feature vector of the final fourier descriptor D is denoted v (n):
v(n) = |a(n)| / |a(1)|,  n = 2, 3, ..., N-1
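A sketch of the descriptor computation with NumPy (retaining eight coefficients, and the |a(1)| normalization giving scale invariance, follow the common formulation and are assumptions where the patent is terse):

```python
import numpy as np

def fourier_descriptor(boundary, keep=8):
    """Step 2-4 sketch: encode the N boundary points as the complex
    sequence S(n) = x(n) + j*y(n), take its DFT, and normalize the
    magnitudes by |a(1)| (scale invariance; dropping a(0) gives
    translation invariance). Assumes a non-degenerate boundary."""
    pts = np.asarray(boundary, dtype=np.float64)
    s = pts[:, 0] + 1j * pts[:, 1]
    a = np.fft.fft(s) / len(s)          # DFT coefficients a(k)
    mags = np.abs(a)
    return mags[1:keep + 1] / mags[1]   # v(n) = |a(n)| / |a(1)|

# Example: descriptors of a shape and its 2x scaled copy coincide.
theta = np.linspace(0.0, 2.0 * np.pi, 16, endpoint=False)
shape = np.stack([np.cos(theta), np.sin(theta)], axis=1)
d1 = fourier_descriptor(shape)
d2 = fourier_descriptor(2.0 * shape)
```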
and 2-5, classifying and detecting the human body target based on an Adaboost algorithm, and recording four-point coordinates of a detection frame framing the image target. The Adaboost algorithm is implemented as follows:
step 1: and (5) initializing. Each training sample is given the same weight, as follows:
Figure BDA0003482532480000064
step 2: and (5) performing iterative operation. For T rounds of training, for T1, 2.
Step 2-1: weak learning algorithm is set at weight DtTraining to obtain a prediction function, as follows:
ht=X→{-1,1}
step 2-2: calculating the error rate of the prediction function as follows:
εt = Σ_{i: ht(xi) ≠ yi} Dt(i)
step 2-3: let atComprises the following steps:
Figure BDA0003482532480000066
step 2-4: updating the weight according to the error rate, as follows:
Dt+1(i) = Dt(i) · exp(-αt · yi · ht(xi)) / Zt
where Zt is the normalization factor that makes Σi Dt+1(i) = 1.
Step 3: after the T round training is completed, the final prediction function is:
H(x) = sign( Σ_{t=1}^{T} αt · ht(x) )
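The training loop of Steps 1-3 can be sketched as discrete AdaBoost over a pool of candidate weak classifiers (the threshold-stump pool and the toy 1-D data are illustrative assumptions, and the error is clamped away from 0 and 1 to keep the logarithm finite):

```python
import math

def adaboost_train(samples, labels, stumps, rounds):
    """Discrete AdaBoost following Steps 1-3: uniform initial weights
    D1(i) = 1/m; each round selects the weak classifier with the lowest
    weighted error, computes its vote weight a = 0.5*ln((1-eps)/eps),
    and reweights the samples with normalization Z_t."""
    m = len(samples)
    D = [1.0 / m] * m                                    # Step 1
    ensemble = []
    for _ in range(rounds):                              # Step 2
        errs = [sum(D[i] for i in range(m) if h(samples[i]) != labels[i])
                for h in stumps]                         # Step 2-2
        h = stumps[errs.index(min(errs))]                # Step 2-1 (selection)
        eps = min(max(min(errs), 1e-10), 1.0 - 1e-10)    # clamp away from 0/1
        a = 0.5 * math.log((1.0 - eps) / eps)            # Step 2-3
        D = [D[i] * math.exp(-a * labels[i] * h(samples[i]))
             for i in range(m)]                          # Step 2-4
        z = sum(D)                                       # normalization Z_t
        D = [d / z for d in D]
        ensemble.append((a, h))
    def predict(x):                                      # Step 3: H(x)
        return 1 if sum(a * h(x) for a, h in ensemble) >= 0 else -1
    return predict

# Toy 1-D example (illustrative): two threshold stumps, three rounds.
samples = [0.0, 1.0, 2.0, 3.0]
labels = [-1, -1, 1, 1]
stumps = [lambda x: 1 if x > 1.5 else -1,
          lambda x: 1 if x > 0.5 else -1]
predict = adaboost_train(samples, labels, stumps, rounds=3)
```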
further, in one embodiment, step2-1 specifically includes:
recording the pixel value of the infrared image as f (x, y), wherein x and y are the positions of the pixel value in the transverse direction and the longitudinal direction respectively;
step 2-1-1, counting the pixel gray-value histogram p(k) of the IR image, k = 0, 1, 2, ..., L-1, where k represents a gray value and L is the number of gray levels;
step 2-1-2, accumulating the pixel counts from the minimum and the maximum gray value of the histogram toward the middle; accumulation stops once the sum accumulated from the minimum gray value exceeds a preset threshold S1 and the sum accumulated from the maximum gray value exceeds a preset threshold S2; the minimum accumulated gray value is recorded as fmin and the maximum accumulated gray value as fmax;
Step 2-1-3, normalizing the IR image pixel values:
f'(x, y) = 255 × (f(x, y) - fmin) / (fmax - fmin)
further, in one embodiment, the specific process of step 4 includes:
step 4-1, contour extraction
Carrying out edge detection on the IR image by adopting a Sobel operator, and carrying out edge detection on the processed VIS image by adopting a Canny operator;
step 4-2, corner detection
Performing corner detection on the edge contour points obtained in the step 4-1: a gray-value threshold is set for the contour points, and if the gray value of a contour point obtained in the step 4-1 exceeds the threshold, that point is regarded as a corner;
step 4-3, clustering the corner points
Clustering the corner points obtained in the step 4-2: first randomly select three corner points as the class-1, class-2 and class-3 initial cluster points, then compute the Euclidean distance from every other corner point to the three initial points, and assign each corner point to the cluster at minimum distance; the coordinates of all points in each cluster are then averaged:
x̄ = (1/n) · Σ xi,  ȳ = (1/n) · Σ yi
where (xi, yi) is the coordinate of the i-th clustered point and n is the number of clustered points; the class-1, 2, 3 coordinates (x11, y11), (x12, y12), (x13, y13) of the infrared image and the class-1, 2, 3 coordinates (x21, y21), (x22, y22), (x23, y23) of the visible-light image are thus obtained;
Step 4-4, image automatic registration
Take (x11, y11), (x12, y12), (x13, y13) as the reference dataset and (x21, y21), (x22, y22), (x23, y23) as the dataset to be registered, then register the two datasets using the cp2tform function in MATLAB software.
Further, in one embodiment, the step 5 is specifically implemented by an IOU filter fusion algorithm, and includes:
step 5-1, calculating the intersection ratio IOU of the two detection frames in the step2 and the step3, specifically: calculating the area ratio IOU of the intersection part of the detection frame of the IR image and the detection frame of the VIS image and the union part of the areas of the two frames;
step 5-2, judging whether the IOU is greater than or equal to a preset threshold (preferably, the threshold is 0.75), if so, regarding the two detection frames as detection frames of the same target, only keeping one detection frame to be drawn in the VIS image, and fusing the corresponding coordinate information and confidence information into the detection information of the VIS image; otherwise, drawing both the two detection frames in the VIS image, and fusing the corresponding coordinate information and the confidence information into the detection information of the VIS image (as shown in fig. 4).
In one embodiment, a VIS-IR image-based multi-source fused human target detection system is provided, the system comprising:
the acquisition module is used for acquiring an infrared image (IR image) and a visible light image (VIS image) of the same scene;
the first target detection module is used for detecting a human body target of the IR image and recording four-point coordinates of a detection frame framing the target;
the second target detection module is used for detecting the human body target of the VIS image and recording four-point coordinates of a detection frame framing the target;
the registration module is used for carrying out image registration on the IR image and the VIS image and mapping the human body target coordinate detected by the IR image to the VIS image;
and the filtering module is used for filtering the repeated detection frames and adding the rest detection frames into the VIS image to complete the whole target detection process.
For specific limitations of the multi-source fused human target detection system based on VIS-IR images, reference may be made to the above limitations of the multi-source fused human target detection method based on VIS-IR images, which are not described herein again. All modules in the multi-source fusion human body target detection system based on the VIS-IR images can be completely or partially realized through software, hardware and a combination thereof. The modules can be embedded in a hardware form or independent from a processor in the computer device, and can also be stored in a memory in the computer device in a software form, so that the processor can call and execute operations corresponding to the modules.
In one embodiment, a computer device is provided, comprising a memory, a processor, and a computer program stored on the memory and executable on the processor, the processor implementing the following steps when executing the computer program:
step1, acquiring an infrared image (IR image) and a visible light image (VIS image) of the same scene;
step2, detecting the human body target of the IR image, and recording four-point coordinates of a detection frame framing the target;
step3, detecting the human body target of the VIS image, and recording four-point coordinates of a detection frame framing the target;
step 4, carrying out image registration on the IR image and the VIS image, and mapping the human body target coordinates detected by the IR image to the VIS image;
and 5, filtering the repeated detection frames, and adding the rest detection frames into the VIS image to complete the whole target detection process.
For specific definition of each step, reference may be made to the above definition of the multi-source fusion human body target detection method based on VIS-IR images, which is not described herein again.
In one embodiment, a computer-readable storage medium is provided, having a computer program stored thereon, which when executed by a processor, performs the steps of:
step1, acquiring an infrared image (IR image) and a visible light image (VIS image) of the same scene;
step2, detecting the human body target of the IR image, and recording four-point coordinates of a detection frame framing the target;
step3, detecting the human body target of the VIS image, and recording four-point coordinates of a detection frame framing the target;
step 4, carrying out image registration on the IR image and the VIS image, and mapping the human body target coordinates detected by the IR image to the VIS image;
and 5, filtering the repeated detection frames, and adding the rest detection frames into the VIS image to complete the whole target detection process.
For specific definition of each step, reference may be made to the above definition of the multi-source fusion human body target detection method based on VIS-IR images, which is not described herein again.
The foregoing illustrates and describes the principles, main features, and advantages of the present invention. It will be understood by those skilled in the art that the invention is not limited to the embodiments described above, which merely illustrate its principles; various changes and modifications may be made without departing from the spirit and scope of the invention, and such changes and modifications fall within the scope of the invention as claimed.

Claims (9)

1. A multisource fusion human body target detection method based on VIS-IR images is characterized by comprising the following steps:
step1, acquiring an infrared image (IR image) and a visible light image (VIS image) of the same scene;
step2, detecting the human body target of the IR image, and recording four-point coordinates of a detection frame framing the target;
step3, detecting the human body target of the VIS image, and recording four-point coordinates of a detection frame framing the target;
step 4, carrying out image registration on the IR image and the VIS image, and mapping the human body target coordinates detected by the IR image to the VIS image;
and 5, filtering the repeated detection frames, and adding the rest detection frames into the VIS image to complete the whole target detection process.
2. The multi-source fusion human body target detection method based on VIS-IR images as claimed in claim 1, wherein the step2 of performing human body target detection on the IR images and recording four-point coordinates of a detection frame framing the target specifically comprises:
step2-1, processing the IR image with a min-max ("most-value") normalization method so as to map the IR image onto an electronic display device;
step2-2, filtering the IR image;
2-3, segmenting the human body target through an edge detection operator;
2-4, extracting a human body target based on a Fourier descriptor;
and 2-5, classifying and detecting the human body target based on an Adaboost algorithm, and recording four-point coordinates of a detection frame framing the image target.
3. The multi-source fusion human body target detection method based on VIS-IR images as claimed in claim 2, wherein the step2-1 specifically comprises:
recording the pixel value of the infrared image as f (x, y), wherein x and y are the positions of the pixel value in the transverse direction and the longitudinal direction respectively;
step 2-1-1, counting the pixel gray-value histogram p(k) of the IR image, k = 0, 1, 2, ..., L-1, where k represents a gray value and L is the number of gray levels;
step 2-1-2, accumulating the pixel counts from the minimum and the maximum gray value of the histogram toward the middle; accumulation stops once the sum accumulated from the minimum gray value exceeds a preset threshold S1 and the sum accumulated from the maximum gray value exceeds a preset threshold S2; the minimum accumulated gray value is recorded as fmin and the maximum accumulated gray value as fmax;
step 2-1-3, normalizing the IR image pixel values:

f'(x, y) = (L - 1) · (f(x, y) - f_min(x, y)) / (f_max(x, y) - f_min(x, y))
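As an illustration, steps 2-1-1 through 2-1-3 can be sketched as follows (Python/NumPy). Treating the thresholds S1 and S2 as fractions of the total pixel count, and the final clipping and rounding, are assumptions of the sketch, not details fixed by the claim:

```python
import numpy as np

def normalize_ir(img, s1=0.01, s2=0.01, levels=256):
    """Sketch of steps 2-1-1..2-1-3: min-max normalization of an IR image,
    with f_min / f_max picked via cumulative histogram thresholds.
    s1 and s2 (fractions of the pixel count) stand in for S1 / S2."""
    hist = np.bincount(img.ravel(), minlength=levels)        # step 2-1-1: p(k)
    total = img.size
    # step 2-1-2: accumulate from the minimum gray value upward...
    csum = np.cumsum(hist)
    f_min = int(np.searchsorted(csum, s1 * total))
    # ...and from the maximum gray value downward
    csum_hi = np.cumsum(hist[::-1])
    f_max = levels - 1 - int(np.searchsorted(csum_hi, s2 * total))
    # step 2-1-3: linear stretch to [0, L-1]; out-of-range pixels are clipped
    out = (img.astype(np.float64) - f_min) / max(f_max - f_min, 1) * (levels - 1)
    return np.clip(np.rint(out), 0, levels - 1).astype(np.uint8)
```

With s1 = s2 = 0 this reduces to a plain min-max stretch over the full histogram; nonzero fractions discard outlier gray values before stretching.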
4. The multi-source fusion human body target detection method based on VIS-IR images as claimed in claim 3, wherein the filtering in step 2-2 is specifically median filtering.
5. The multi-source fusion human body target detection method based on VIS-IR images as claimed in claim 4, wherein the edge detection operator of step 2-3 is the Sobel operator.
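A minimal sketch of the filtering and edge detection of steps 2-2 and 2-3, assuming a 3×3 median window and the standard 3×3 Sobel kernels (both window sizes are illustrative):

```python
import numpy as np

SOBEL_X = np.array([[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]], float)
SOBEL_Y = SOBEL_X.T

def _filter3(img, kernel):
    """3x3 'same' cross-correlation with edge replication (helper)."""
    p = np.pad(img, 1, mode='edge')
    out = np.zeros_like(img, dtype=float)
    for i in range(3):
        for j in range(3):
            out += kernel[i, j] * p[i:i + img.shape[0], j:j + img.shape[1]]
    return out

def median3(img):
    """Step 2-2 sketch: 3x3 median filter."""
    p = np.pad(img, 1, mode='edge')
    stack = [p[i:i + img.shape[0], j:j + img.shape[1]]
             for i in range(3) for j in range(3)]
    return np.median(np.stack(stack), axis=0)

def sobel_edges(img):
    """Step 2-3 sketch: Sobel gradient magnitude of the median-filtered image."""
    s = median3(np.asarray(img, float))
    return np.hypot(_filter3(s, SOBEL_X), _filter3(s, SOBEL_Y))
```

Thresholding the returned magnitude map would then give the edge points used to segment the human target.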
6. The multi-source fusion human body target detection method based on VIS-IR images as claimed in claim 5, wherein step 3 specifically adopts a single-stage detection algorithm, the YOLOv4 algorithm, to perform human body target detection on the visible light image.
7. The method for detecting the multi-source fusion human body target based on the VIS-IR image according to claim 6, wherein the specific process of step 4 comprises the following steps:
step 4-1, contour extraction
Carrying out edge detection on the IR image by adopting a Sobel operator, and carrying out edge detection on the processed VIS image by adopting a Canny operator;
step 4-2, corner detection
Performing corner detection on the edge contour points from step 4-1: setting a gray-value threshold for the contour points, and regarding a contour point obtained in step 4-1 as a corner if its gray value is greater than the threshold;
step 4-3, clustering the corner points
Clustering the corners obtained in step 4-2: first, randomly selecting three corners as the class-1, class-2 and class-3 initial convergence points; then computing the Euclidean distance from each remaining corner to the three initial convergence points and assigning it to the class of the nearest convergence point; finally, averaging the coordinates of all points in each class:

x̄ = (1/n) Σ x_i,  ȳ = (1/n) Σ y_i

where (x_i, y_i) is the coordinate of the i-th convergence point and n is the number of convergence points in the class; this yields the class 1, 2, 3 centroid coordinates (x11, y11), (x12, y12), (x13, y13) of the infrared image and (x21, y21), (x22, y22), (x23, y23) of the visible light image;
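The single assignment-and-average pass of step 4-3 can be sketched as follows (Python/NumPy). The claim selects the three initial convergence points at random; `init_idx` fixes them here purely for reproducibility:

```python
import numpy as np

def cluster_corners(corners, init_idx=(0, 1, 2)):
    """Sketch of step 4-3: assign every corner to the nearest of three initial
    convergence points by Euclidean distance, then return the mean (x, y)
    coordinate of each class."""
    corners = np.asarray(corners, float)
    seeds = corners[list(init_idx)]                  # 3 initial convergence points
    # pairwise Euclidean distances: shape (n_corners, 3)
    d = np.linalg.norm(corners[:, None, :] - seeds[None, :, :], axis=2)
    labels = d.argmin(axis=1)                        # nearest-seed assignment
    # class centroids: x_bar = (1/n) * sum(x_i), y_bar = (1/n) * sum(y_i)
    return np.array([corners[labels == k].mean(axis=0) for k in range(3)])
```

Running this once on the IR corners and once on the VIS corners produces the two sets of three centroids that feed the registration of step 4-4.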
step 4-4, automatic image registration
Taking (x11, y11), (x12, y12), (x13, y13) as the reference dataset and (x21, y21), (x22, y22), (x23, y23) as the dataset to be registered, then registering the two datasets using the cp2tform function in MATLAB.
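Outside MATLAB, the role cp2tform plays in step 4-4 can be approximated by solving directly for the transform defined by the three point pairs (a sketch; cp2tform supports several transform types, and the affine model assumed here is an illustrative choice):

```python
import numpy as np

def affine_from_points(src, dst):
    """Sketch of step 4-4: estimate the affine transform that maps the three
    source centroids onto the three destination centroids, by solving the
    linear system [x y 1] @ T = [x' y']."""
    A = np.hstack([np.asarray(src, float), np.ones((3, 1))])
    T = np.linalg.solve(A, np.asarray(dst, float))   # 3x2 transform matrix
    return T

def apply_affine(T, pts):
    """Map points (e.g. IR detection-frame corners) with the estimated transform."""
    P = np.hstack([np.asarray(pts, float), np.ones((len(pts), 1))])
    return P @ T
```

Once T is estimated from the centroids, `apply_affine` can map the four-point coordinates of the IR detection frames into the VIS image, as required by step 4 of claim 1.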
8. The method for detecting the multi-source fusion human body target based on the VIS-IR image according to claim 7, wherein step 5 is realized by an IOU filtering fusion algorithm comprising the following steps:
step 5-1, calculating the intersection-over-union (IOU) of the detection frames from step 2 and step 3, specifically: the ratio of the area of the intersection of an IR-image detection frame and a VIS-image detection frame to the area of the union of the two frames;
step 5-2, judging whether the IOU is greater than or equal to a preset threshold: if so, regarding the two detection frames as detections of the same target, retaining only one of them to be drawn in the VIS image, and merging its coordinate and confidence information into the detection information of the VIS image; otherwise, drawing both detection frames in the VIS image and merging both sets of coordinate and confidence information into the detection information of the VIS image.
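The IOU filtering fusion of steps 5-1 and 5-2 can be sketched as follows (Python). Keeping the VIS-image frame when two frames coincide, and the threshold `thr = 0.5`, are illustrative choices not fixed by the claim:

```python
def iou(a, b):
    """Step 5-1 sketch: intersection-over-union of boxes (x1, y1, x2, y2)."""
    ix = max(0, min(a[2], b[2]) - max(a[0], b[0]))   # intersection width
    iy = max(0, min(a[3], b[3]) - max(a[1], b[1]))   # intersection height
    inter = ix * iy
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter)

def fuse_boxes(ir_boxes, vis_boxes, thr=0.5):
    """Step 5-2 sketch: keep every VIS frame; add an IR frame only if it does
    not overlap any VIS frame with IOU >= thr (i.e. drop duplicates)."""
    fused = list(vis_boxes)
    for b in ir_boxes:
        if all(iou(b, v) < thr for v in vis_boxes):
            fused.append(b)
    return fused
```

This way an IR-only detection (e.g. a person invisible in the VIS image at night) survives into the fused result, while a target seen by both sensors is drawn once.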
9. A VIS-IR image-based multi-source fusion human body target detection system based on the method of any one of claims 1 to 8, characterized in that the system comprises:
an acquisition module, used for acquiring an infrared image (IR image) and a visible light image (VIS image) of the same scene;
the first target detection module is used for detecting a human body target of the IR image and recording four-point coordinates of a detection frame framing the target;
the second target detection module is used for detecting the human body target of the VIS image and recording four-point coordinates of a detection frame framing the target;
the registration module is used for carrying out image registration on the IR image and the VIS image and mapping the human body target coordinate detected by the IR image to the VIS image;
and the filtering module is used for filtering the repeated detection frames and adding the rest detection frames into the VIS image to finish the whole target detection process.
CN202210072239.4A 2022-01-21 2022-01-21 Multisource fusion human body target detection method based on VIS-IR image Pending CN114511879A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210072239.4A CN114511879A (en) 2022-01-21 2022-01-21 Multisource fusion human body target detection method based on VIS-IR image

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210072239.4A CN114511879A (en) 2022-01-21 2022-01-21 Multisource fusion human body target detection method based on VIS-IR image

Publications (1)

Publication Number Publication Date
CN114511879A true CN114511879A (en) 2022-05-17

Family

ID=81549211

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210072239.4A Pending CN114511879A (en) 2022-01-21 2022-01-21 Multisource fusion human body target detection method based on VIS-IR image

Country Status (1)

Country Link
CN (1) CN114511879A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115880292A (en) * 2023-02-22 2023-03-31 和普威视光电股份有限公司 Method, device, terminal and storage medium for detecting sea and lake surface targets


Similar Documents

Publication Publication Date Title
CN107680054B (en) Multi-source image fusion method in haze environment
CN110097044B (en) One-stage license plate detection and identification method based on deep learning
Kong et al. General road detection from a single image
CN111079556A (en) Multi-temporal unmanned aerial vehicle video image change area detection and classification method
Zhang et al. Vehicle recognition algorithm based on Haar-like features and improved Adaboost classifier
Li et al. Road lane detection with gabor filters
CN113420607A (en) Multi-scale target detection and identification method for unmanned aerial vehicle
CN112906583B (en) Lane line detection method and device
CN107944354B (en) Vehicle detection method based on deep learning
US20190114753A1 (en) Video Background Removal Method
CN105160649A (en) Multi-target tracking method and system based on kernel function unsupervised clustering
Naufal et al. Preprocessed mask RCNN for parking space detection in smart parking systems
CN109685827B (en) Target detection and tracking method based on DSP
CN109711256B (en) Low-altitude complex background unmanned aerial vehicle target detection method
CN107992856B (en) High-resolution remote sensing building shadow detection method under urban scene
CN112560717B (en) Lane line detection method based on deep learning
CN113608663B (en) Fingertip tracking method based on deep learning and K-curvature method
CN112200746A (en) Defogging method and device for traffic scene image in foggy day
CN109635649B (en) High-speed detection method and system for unmanned aerial vehicle reconnaissance target
CN107045630B (en) RGBD-based pedestrian detection and identity recognition method and system
CN114511879A (en) Multisource fusion human body target detection method based on VIS-IR image
FAN et al. Robust lane detection and tracking based on machine vision
Bisht et al. Integration of hough transform and inter-frame clustering for road lane detection and tracking
KR102171384B1 (en) Object recognition system and method using image correction filter
CN108985216B (en) Pedestrian head detection method based on multivariate logistic regression feature fusion

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination