CN111597967B

CN111597967B - Infrared image multi-target pedestrian identification method

Info

Publication number: CN111597967B
Application number: CN202010404659.9A
Authority: CN
Inventors: 杨光临; 孙越
Original assignee: Peking University
Current assignee: Peking University
Priority date: 2020-05-12
Filing date: 2020-05-12
Publication date: 2023-04-07
Anticipated expiration: 2040-05-12
Also published as: CN111597967A

Abstract

The invention provides an infrared image multi-target pedestrian recognition method and belongs to the field of infrared image target recognition. The method first constructs an infrared complex sample data set and then, based on the WDSR algorithm, obtains an infrared image data set with improved visual quality and clearer pedestrian targets; next, k-means cluster analysis is performed on the pedestrian bounding boxes in the infrared complex sample data set to obtain a bounding box clustering result, and an output scale is added to the YOLOv3 network according to that result to construct an infrared image multi-target pedestrian recognition network; finally, infrared pedestrian recognition is performed using the trained infrared image multi-target pedestrian recognition network. The invention can reduce missed detections and false identifications of pedestrians and improve the accuracy of pedestrian recognition.

Description

Infrared image multi-target pedestrian identification method

Technical Field

The invention relates to the field of infrared image target recognition, and in particular provides an infrared image multi-target pedestrian recognition method.

Background Art

Infrared imaging is based on thermal imaging, so it can detect pedestrians promptly in special environments such as at night or on cloudy days, and has the advantage of all-weather operation. Pedestrian recognition based on infrared images can be applied in many military and civil fields, such as safety monitoring of the elderly, automatic driving, intelligent transportation, and equipment exploration, and therefore has considerable social significance. However, infrared pedestrian targets suffer from blurred edges and indistinct features, which reduce the accuracy of pedestrian recognition and localization. The invention provides an infrared image multi-target pedestrian recognition method that improves the accuracy of infrared pedestrian recognition.

In machine-vision-based pedestrian recognition research, applying convolutional neural networks to infrared image pedestrian recognition keeps the experimental cost low while achieving good accuracy. Among such methods, the You Only Look Once (YOLO) series of algorithms currently offers high accuracy, and the YOLOv3 algorithm performs best [1]. However, the traditional YOLOv3 target recognition algorithm is suited to application scenes with large targets and recognizes relatively clear RGB images well; its recognition accuracy for pedestrian targets in infrared images, which have blurred edges, are small in size, and are often occluded, is not high [2]. Because infrared images have blurred edges, low resolution, and poor image quality, pedestrian targets lack sufficient appearance feature information, which degrades both the learning of the model and the accuracy of infrared image pedestrian recognition; a super-resolution algorithm can effectively address these problems. The WDSR algorithm (Wide Activation for Efficient and Accurate Image Super-Resolution) [3] is one of the best-performing image super-resolution algorithms at present, but its effect on the visual quality of infrared images and on the clarity of pedestrian targets remains to be verified.

Disclosure of Invention

In order to overcome the defects, the invention provides the infrared image multi-target pedestrian recognition method, which can complete a multi-target and multi-state pedestrian recognition task, reduce the phenomena of missing detection and false recognition of pedestrians, and effectively improve the accuracy of infrared image pedestrian recognition.

The technical scheme provided by the invention is as follows:

an infrared image multi-target pedestrian identification method comprises the following steps:

1) Constructing an infrared complex sample data set;

2) Constructing an infrared image data set with clear pedestrian targets based on a WDSR algorithm;

3) Performing k-means clustering analysis on the boundary box of the pedestrian in the infrared complex sample data set to obtain a boundary box clustering result, and then increasing an output scale aiming at the boundary box clustering result on the basis of a YOLOv3 network to construct an infrared image multi-target pedestrian recognition network;

4) Training the infrared image multi-target pedestrian recognition network constructed in the step 3) by using the infrared image data set obtained in the step 2) as a training set; and performing infrared pedestrian recognition by using the trained infrared image multi-target pedestrian recognition network.

Step 1) specifically comprises the following: first, a complex sample is defined as a pedestrian target that is small in size, incomplete in shape, and blurred in its features; then infrared complex sample images are selected from the CVC-14 infrared image data set provided by the Computer Vision Center to form the infrared complex sample data set, wherein the data set provides the labeling results of all pedestrians.

The step 2) specifically comprises the following steps:

21) Using the CVC-14 infrared image data set, converting high-resolution images into low-resolution images by bicubic-interpolation down-sampling, and establishing a mapping relation between the high-resolution data set and the low-resolution data set;

22) Taking the low-resolution data set as input and the high-resolution data set with the mapping relation as output, and training an infrared image network based on WDSR;

23) Inputting the images of the infrared complex sample data set from step 1) into the trained infrared image network based on the WDSR algorithm to obtain an infrared image data set with higher resolution.

In step 3), k-means cluster analysis is performed on the pedestrian bounding boxes in the infrared image data set with K = 4, and the resulting bounding box sizes are: (4,13), (23,29), (15,42), (55,135).

According to the bounding box clustering result, the output scale added in step 3) is as follows: a 2-times up-sampling operation is applied to the output of the 8-times down-sampling layer, and the result is fused with the 4-times down-sampled feature map and then output.

The step 4) specifically comprises the following steps:

41) Randomly dividing the infrared image data set with clear pedestrian targets obtained in step 2) into a training set and a test set;

42) Training the infrared image multi-target pedestrian recognition network constructed in step 3) using the training set from step 41) until the loss function tends to be stable;

43) Inputting the test set from step 41) into the trained infrared image multi-target pedestrian recognition network obtained in step 42) to obtain the infrared image multi-target pedestrian recognition result.

Compared with the prior art, the invention has the beneficial effects that: firstly, constructing an infrared complex sample data set, and then inputting the data set into an infrared image network based on a WDSR algorithm to obtain an infrared image data set with improved visual effect and clearer pedestrian target; then, combining a basic target detection network of YOLOv3, constructing an infrared image multi-target pedestrian recognition network, and training the infrared image multi-target pedestrian recognition network; and finally, performing infrared pedestrian recognition by using the trained infrared image multi-target pedestrian recognition network.

The advantages of the invention are mainly shown in the following aspects:

the infrared image multi-target pedestrian identification method can identify pedestrians in time in special environments such as at night, cloudy days and the like, and has the advantage of all-weather work;

the invention can effectively improve the quality of the infrared image and highlight the details of the pedestrian target;

compared with the traditional method, the pedestrian detection missing and false identification phenomena are reduced, and the identification accuracy is improved.

Drawings

FIG. 1 is a design drawing of an infrared image multi-target pedestrian identification scheme;

FIG. 2 shows the super-resolution experimental results of 4 methods on the "butterfly" RGB image;

FIG. 3 shows the super-resolution experimental results of 4 methods on an infrared image;

FIG. 4 is a structural diagram of the infrared image multi-target pedestrian recognition network of the present invention;

FIG. 5 shows the occluded-target experimental results of infrared image pedestrian recognition using three networks, wherein (a) is the YOLOv3 network; (b) is the infrared image multi-target pedestrian recognition network; and (c) is the WDSR + infrared image multi-target pedestrian recognition network;

FIG. 6 shows the small-target experimental results of infrared image pedestrian recognition using three networks, wherein (a) is the YOLOv3 network; (b) is the infrared image multi-target pedestrian recognition network; and (c) is the WDSR + infrared image multi-target pedestrian recognition network.

Detailed Description

The invention will be further described below by way of examples of implementation with reference to the accompanying drawings, without limiting the scope of the invention in any way.

The scheme design of the infrared image multi-target pedestrian recognition method is shown in the attached figure 1. In the embodiment of the present invention, the method provided by the present invention specifically includes the following steps:

1) Constructing an infrared complex sample data set:

The invention constructs an infrared complex sample data set on the basis of the CVC-14 infrared image data set provided by the Computer Vision Center. Because infrared images suffer from blurred pedestrian edges, poor clarity, occlusion, and similar problems, the invention defines a complex sample as a pedestrian target that is small in size, incomplete in shape, and blurred in its features. According to this definition, infrared complex samples under various backgrounds are screened out to form the data set.

2) Based on WDSR algorithm, obtaining infrared image data set with clear pedestrian target:

An infrared image enhancement network is trained based on the WDSR algorithm. First, 800 clear infrared images are selected from the CVC-14 data set; the high-resolution images are converted into low-resolution images by bicubic-interpolation down-sampling to form the input-output image pairs of the WDSR network, and the infrared image network based on the WDSR algorithm is trained on these pairs.

In the testing process, each picture is first down-sampled by bicubic interpolation to 1/2 of its original size; the nearest-neighbor interpolation method, the bilinear interpolation method, the bicubic interpolation method, and the trained infrared image network based on the WDSR algorithm are then each used in a super-resolution experiment that enlarges the image back to its original size.
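The test procedure above can be illustrated on a toy grayscale image. This is a minimal sketch using only the Python standard library, not the experiment code of the patent: the function names are hypothetical, images are plain lists of rows, and 2x2 box averaging is used as a stand-in for the bicubic down-sampling named in the text. Only the nearest-neighbor and bilinear baselines are shown.

```python
def downsample_2x(img):
    """Halve a grayscale image (list of rows) by 2x2 box averaging.

    Assumption: box averaging replaces the bicubic down-sampling of the text.
    """
    h, w = len(img) // 2, len(img[0]) // 2
    return [[(img[2 * y][2 * x] + img[2 * y][2 * x + 1]
              + img[2 * y + 1][2 * x] + img[2 * y + 1][2 * x + 1]) / 4.0
             for x in range(w)] for y in range(h)]


def upscale_nearest(img, factor=2):
    """Enlarge by copying each source pixel into a factor x factor block."""
    return [[img[y // factor][x // factor]
             for x in range(len(img[0]) * factor)]
            for y in range(len(img) * factor)]


def upscale_bilinear(img, factor=2):
    """Enlarge by linear interpolation between the four nearest source pixels."""
    h, w = len(img), len(img[0])
    out = []
    for yo in range(h * factor):
        # Map the output pixel centre into source coordinates, clamped to the image.
        sy = min(max((yo + 0.5) / factor - 0.5, 0.0), h - 1)
        y0 = int(sy)
        y1 = min(y0 + 1, h - 1)
        fy = sy - y0
        row = []
        for xo in range(w * factor):
            sx = min(max((xo + 0.5) / factor - 0.5, 0.0), w - 1)
            x0 = int(sx)
            x1 = min(x0 + 1, w - 1)
            fx = sx - x0
            top = img[y0][x0] * (1 - fx) + img[y0][x1] * fx
            bottom = img[y1][x0] * (1 - fx) + img[y1][x1] * fx
            row.append(top * (1 - fy) + bottom * fy)
        out.append(row)
    return out


# Toy 4x4 "high-resolution" image: reduce to 2x2, then enlarge back to 4x4.
high_res = [[float(4 * y + x) for x in range(4)] for y in range(4)]
low_res = downsample_2x(high_res)
restored_nearest = upscale_nearest(low_res)
restored_bilinear = upscale_bilinear(low_res)
```

Each restored image has the original size again, so it can be compared pixel by pixel with the reference image.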

The evaluation index for super-resolution is the peak signal-to-noise ratio (PSNR), which measures model performance by the difference between the high-resolution image output by the model and the high-resolution reference image. Assuming that the image resolution is m × n, that the model output is denoted P(i, j), and that the high-resolution reference image is denoted Q(i, j), the mean-square error (MSE) of P and Q can be expressed by formula 3:

$$\mathrm{MSE} = \frac{1}{mn}\sum_{i=1}^{m}\sum_{j=1}^{n}\bigl(P(i,j)-Q(i,j)\bigr)^{2} \quad \text{(formula 3)}$$

The PSNR value can be calculated from the MSE as follows:

$$\mathrm{PSNR} = 10\log_{10}\frac{Q_m^{2}}{\mathrm{MSE}} \quad \text{(formula 4)}$$

where Q_m denotes the maximum gray level of an image pixel; for example, for 8-bit pixels, Q_m = 255.
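As a minimal sketch of this evaluation index, the following computes the MSE and PSNR for two grayscale images given as lists of rows, assuming the standard definition PSNR = 10·log10(Q_m² / MSE). The function names and the `q_max` parameter are illustrative and do not come from the patent.

```python
import math


def mse(p, q):
    """Mean-square error between two equally sized grayscale images."""
    m, n = len(p), len(p[0])
    total = sum((p[i][j] - q[i][j]) ** 2 for i in range(m) for j in range(n))
    return total / (m * n)


def psnr(p, q, q_max=255):
    """Peak signal-to-noise ratio in dB; q_max is 255 for 8-bit pixels."""
    err = mse(p, q)
    if err == 0:
        return math.inf  # identical images: no noise at all
    return 10.0 * math.log10(q_max ** 2 / err)


output = [[10, 20], [30, 40]]
reference = [[12, 16], [30, 40]]
print(f"MSE = {mse(output, reference):.2f}, PSNR = {psnr(output, reference):.2f} dB")
```

A higher PSNR means the output is closer to the reference image, which is how the four enlargement methods are ranked.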

FIG. 2 shows the super-resolution experimental results of the 4 methods on the "butterfly" RGB image, and FIG. 3 shows the super-resolution experimental results of the 4 methods on an infrared image; in each figure, detail regions of the same size are cropped and enlarged so that the results can be seen more clearly. Table 1 lists the super-resolution PSNR values of the 4 methods on the "butterfly" image and on the infrared image.

Table 1 PSNR comparison of super-resolution algorithms

The experimental results show that the WDSR algorithm performs best among the four super-resolution methods: its PSNR reaches 25.97 dB on the visible-light image and 24.25 dB on the infrared image. The infrared image enhancement method based on the WDSR algorithm can therefore alleviate, to a certain extent, the blurred edges of pedestrian targets in infrared images and effectively improve the clarity and visual quality of pedestrian targets.

The infrared complex sample data set is then input into the trained infrared image network based on the WDSR algorithm to obtain enhanced infrared images with improved image quality and clearer pedestrian targets.

3) Performing k-means cluster analysis on the pedestrian boundary frame according to the infrared complex sample constructed in the step 1) to obtain a boundary frame cluster result of the infrared complex sample, and then increasing an output scale according to the boundary frame cluster result on the basis of a YOLOv3 network to construct an infrared image multi-target pedestrian recognition network:

The infrared complex sample data set constructed by the invention is clustered. K = 1, 2, ..., 15 is selected in turn, and k-means clustering is performed on the samples with the following distance metric:

d(box, centroid) = 1 - IoU(box, centroid) (formula 1)

In formula 1, IoU denotes the intersection over union of the cluster bounding box and the labeled bounding box.

The average IoU tends to be stable at K = 4, so the invention chooses K = 4, i.e., 4 bounding boxes. Since the position of a bounding box cannot be determined in advance, only the width and height of each bounding box are recorded, and the calculation formula is shown in formula 2.

According to the clustering result, the sizes of the 4 bounding boxes are: (4,13), (23,29), (15,42), (55,135). The clustering result matches the size characteristics of pedestrians in infrared images and has a certain robustness to occluded pedestrians.
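The clustering step can be sketched as follows. This is an illustrative standard-library implementation, not the patent's code: it assumes the common anchor-clustering convention in which two (width, height) boxes are aligned at a shared corner when computing IoU, since only widths and heights are recorded. The function names, the `seed` parameter, and the toy boxes are hypothetical.

```python
import random


def iou_wh(box, centroid):
    """IoU of two (width, height) boxes aligned at a common corner (assumption)."""
    inter = min(box[0], centroid[0]) * min(box[1], centroid[1])
    union = box[0] * box[1] + centroid[0] * centroid[1] - inter
    return inter / union


def kmeans_iou(boxes, k, iters=100, seed=0):
    """k-means on (width, height) pairs with distance d = 1 - IoU."""
    rng = random.Random(seed)
    centroids = rng.sample(boxes, k)  # initial centroids drawn from the data
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for b in boxes:
            nearest = min(range(k), key=lambda c: 1.0 - iou_wh(b, centroids[c]))
            clusters[nearest].append(b)
        updated = []
        for c, members in enumerate(clusters):
            if members:
                updated.append((sum(w for w, _ in members) / len(members),
                                sum(h for _, h in members) / len(members)))
            else:
                updated.append(centroids[c])  # keep a centroid that lost all members
        if updated == centroids:
            break  # assignments no longer change
        centroids = updated
    return centroids


def mean_iou(boxes, centroids):
    """Average IoU between each box and its best-matching centroid."""
    return sum(max(iou_wh(b, c) for c in centroids) for b in boxes) / len(boxes)


# Toy data: two small and two large pedestrian boxes (hypothetical values).
toy_boxes = [(4, 13), (5, 12), (55, 135), (50, 140)]
anchors = kmeans_iou(toy_boxes, k=2)
```

Running `mean_iou` for K = 1, 2, ..., 15 on the real labels and looking for the point where the curve flattens is one way to reproduce the kind of analysis that leads to a choice such as K = 4.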

The YOLOv3 algorithm uses 8-times down-sampling for small-target detection, but when the target is smaller than 8 pixels, feature extraction and target detection cannot be performed accurately. On the basis of the YOLOv3 network, the invention additionally performs target detection using the 4-times down-sampling layer in the Darknet-53 network structure to obtain more small-target information. Specifically, a 2-times up-sampling operation is applied to the output of the 8-times down-sampling layer in the YOLOv3 network, and the result is fused with the 4-times down-sampled feature map and then output.

The input image size of the YOLOv3 network is 416 × 416 pixels; cropping and scaling to this size can deform the image, leave targets incomplete, and shrink small targets, all of which directly affect target detection performance. The input image size of the infrared image multi-target pedestrian recognition network is therefore set to 512 × 512 pixels. The structure of the infrared image multi-target pedestrian recognition network is shown in FIG. 4.
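The effect of the added scale and the larger input can be checked with a little arithmetic. The sketch below is illustrative only: it assumes the standard YOLOv3 strides of 32, 16, and 8, nearest-neighbor 2-times up-sampling, and channel-wise concatenation as the fusion operation (the text above says only that the feature maps are fused). Feature maps are nested lists of shape channels × height × width, and the function names are hypothetical.

```python
def grid_sizes(input_size, strides):
    """Side length of the detection grid at each down-sampling stride."""
    return {stride: input_size // stride for stride in strides}


def upsample2x_and_concat(coarse, fine):
    """Nearest-neighbor 2x up-sampling of `coarse`, then channel concatenation.

    coarse: C1 x H x W, fine: C2 x 2H x 2W  ->  (C1 + C2) x 2H x 2W
    """
    up = [[[channel[y // 2][x // 2] for x in range(2 * len(channel[0]))]
           for y in range(2 * len(channel))]
          for channel in coarse]
    if len(up[0]) != len(fine[0]) or len(up[0][0]) != len(fine[0][0]):
        raise ValueError("spatial sizes do not match after up-sampling")
    return up + fine


# Original YOLOv3: three scales on a 416 x 416 input.
print(grid_sizes(416, (32, 16, 8)))     # {32: 13, 16: 26, 8: 52}
# Modified network: four scales on a 512 x 512 input.
print(grid_sizes(512, (32, 16, 8, 4)))  # {32: 16, 16: 32, 8: 64, 4: 128}

# Tiny stand-in feature maps: 1 channel at 2x2 fused with 2 channels at 4x4.
coarse_map = [[[1, 2], [3, 4]]]
fine_map = [[[0] * 4 for _ in range(4)] for _ in range(2)]
fused = upsample2x_and_concat(coarse_map, fine_map)
```

With a stride of 4, each grid cell covers a 4 × 4 pixel area, so the smallest clustered box, (4,13), spans at least one full cell instead of half of a stride-8 cell.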

4) Training the constructed infrared image multi-target pedestrian recognition network using the infrared image data set with clear pedestrian targets obtained in step 2) as the training set, and performing pedestrian recognition using the trained infrared image multi-target pedestrian recognition network.

The experimental environmental parameters of the present invention are shown in table 2.

TABLE 2 Infrared image target identification experiment platform

500 infrared images containing 2518 infrared pedestrian targets are selected; 400 of them are randomly chosen as the training set, containing 1961 infrared pedestrian targets; the remaining 100 infrared images form the test set, containing 557 infrared pedestrian targets.
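A random 400/100 split of this kind can be sketched as below. The image identifiers and the fixed seed are hypothetical and serve only to make the example reproducible; the patent does not state how the random selection was implemented.

```python
import random


def split_dataset(items, n_train, seed=0):
    """Shuffle a copy of `items` and cut it into training and test parts."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    return shuffled[:n_train], shuffled[n_train:]


# Hypothetical identifiers standing in for the 500 selected infrared images.
image_ids = [f"ir_{i:04d}" for i in range(500)]
train_ids, test_ids = split_dataset(image_ids, n_train=400)

# Target counts reported in the text: 1961 (training) + 557 (test) = 2518.
total_targets = 1961 + 557
```

Fixing the seed keeps the same split across the three networks being compared, so that differences in the results are not caused by different test images.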

The YOLOv3 network and the infrared image multi-target pedestrian recognition network are each trained on the infrared complex sample data set, and the infrared image multi-target pedestrian recognition network is additionally trained on the enhanced infrared sample data set. Each iteration trains on 32 infrared images; the input image size is 512 × 512 pixels; the momentum coefficient is 0.9; the weight decay coefficient is 0.0005; the learning rate is 0.001; and the maximum number of iterations is set to 50000. To expand the data set, a saturation and exposure transformation of the images is performed once per iteration. The learning rate follows a step-wise adjustment strategy to prevent overfitting. Three trained pedestrian recognition networks are thus obtained.

A pedestrian recognition experiment is carried out on single infrared images using the traditional YOLOv3 network, the infrared image multi-target pedestrian recognition network, and the WDSR + infrared image multi-target pedestrian recognition network. FIG. 5 shows the experimental results when pedestrians are occluded, and FIG. 6 shows the recognition results for small targets. It can be seen directly that the YOLOv3 network performs poorly in infrared image pedestrian recognition, with missed detections and false detections, whereas the infrared image multi-target pedestrian recognition network and the WDSR + infrared image multi-target pedestrian recognition network can recognize partially occluded targets and small targets. The scheme provided by the invention is therefore better suited to infrared image pedestrian recognition in natural environments and is more robust to complex environments.
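The training settings listed above can be collected in one place, together with a step-wise learning-rate schedule. This is a sketch under stated assumptions: the key names follow the style of a Darknet configuration file, although the text does not name the training framework, and the step points and scale factors are hypothetical because the source gives only the initial learning rate and the type of strategy.

```python
TRAIN_CONFIG = {
    "batch": 32,             # infrared images per iteration
    "width": 512,            # input image width in pixels
    "height": 512,           # input image height in pixels
    "momentum": 0.9,
    "decay": 0.0005,         # weight decay coefficient
    "learning_rate": 0.001,
    "max_batches": 50000,    # maximum number of iterations
    "policy": "steps",
    # Hypothetical values: the source does not state the step points or scales.
    "steps": (40000, 45000),
    "scales": (0.1, 0.1),
}


def learning_rate(iteration, cfg=TRAIN_CONFIG):
    """Step-wise schedule: multiply the rate by each scale once its step is reached."""
    rate = cfg["learning_rate"]
    for step, scale in zip(cfg["steps"], cfg["scales"]):
        if iteration >= step:
            rate *= scale
    return rate


for it in (0, 40000, 45000):
    print(it, learning_rate(it))
```

Lowering the rate late in training lets the loss settle, which fits the stopping criterion of training until the loss function tends to be stable.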

The experimental results are shown in Table 3, which lists the pedestrian recognition accuracy and recall of three networks: the YOLOv3 network, the infrared image multi-target pedestrian recognition network, and the WDSR + infrared image multi-target pedestrian recognition network. The traditional YOLOv3 algorithm achieves an accuracy of only 69.5% and a recall of 70.38% in infrared image pedestrian recognition; the infrared image multi-target pedestrian recognition network achieves an accuracy of 87.5% and a recall of 89.23%; and the WDSR + infrared image multi-target pedestrian recognition network achieves an accuracy of 90.69% and a recall of 92.64%. The experimental results show that the proposed infrared image pedestrian recognition method can effectively alleviate the edge blurring of infrared images: relative to the traditional YOLOv3 algorithm, the pedestrian recognition accuracy is improved by 21.19 percentage points and the recall by 22.26 percentage points.

TABLE 3 pedestrian recognition experiment results of three algorithms

In addition, the AP of the YOLOv3 algorithm in infrared image pedestrian recognition is 68.73%, the AP of the infrared image multi-target pedestrian recognition network is 88.19%, and the AP of the WDSR + infrared image multi-target pedestrian recognition network reaches 91.37%, which is 22.64 percentage points higher than that of the traditional YOLOv3 algorithm.
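The reported improvements are differences in percentage points between the WDSR + multi-target network and the YOLOv3 baseline, which a short check confirms. The dictionary below only restates the figures given in the text; its layout and the shortened network names are illustrative.

```python
# Accuracy, recall, and AP (all in %) as reported in the text.
RESULTS = {
    "YOLOv3": {"accuracy": 69.5, "recall": 70.38, "ap": 68.73},
    "multi-target network": {"accuracy": 87.5, "recall": 89.23, "ap": 88.19},
    "WDSR + multi-target network": {"accuracy": 90.69, "recall": 92.64, "ap": 91.37},
}


def gain(metric, baseline="YOLOv3", model="WDSR + multi-target network"):
    """Improvement of `model` over `baseline` in percentage points."""
    return round(RESULTS[model][metric] - RESULTS[baseline][metric], 2)


for metric in ("accuracy", "recall", "ap"):
    print(f"{metric}: +{gain(metric)} percentage points")
```

The same function with `model="multi-target network"` isolates the contribution of the network changes alone, without the WDSR enhancement.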

It is noted that the disclosed embodiments are intended to aid in further understanding of the invention, but those skilled in the art will appreciate that: various substitutions and modifications are possible without departing from the spirit and scope of the invention and appended claims. Therefore, the invention should not be limited to the embodiments disclosed, but the scope of the invention is defined by the appended claims.

Reference documents:

[1] Redmon J, Farhadi A. YOLOv3: An Incremental Improvement[J]. arXiv:1804.02767, 2018.

[2] Tan Kangxia, Ping Cheng, Qin Wenhu. Infrared image pedestrian detection method based on YOLO model[J]. Laser and Infrared, 48(11): 1436-1442, 2018.

[3] Yu J, Fan Y, Yang J, et al. Wide Activation for Efficient and Accurate Image Super-Resolution[J]. arXiv:1808.08718v2, 2018.

[4] Krishna H, Jawahar C V. Improving Small Object Detection[C]. 2017 4th IAPR Asian Conference on Pattern Recognition (ACPR), Nanjing, pp. 340-345, 2017.

Claims

1. An infrared image multi-target pedestrian identification method comprises the following steps:

1) Constructing an infrared complex sample data set; first, a complex sample is defined as: pedestrian targets with small size, incomplete shape and fuzzy characteristics; then selecting an infrared complex sample image from a CVC-14 infrared image data set provided by a computer vision center to form an infrared complex sample data set, wherein the data set comprises a pedestrian target labeling result;

2) Constructing an infrared image data set with clear pedestrian targets based on a WDSR algorithm; the method specifically comprises the following steps:

22) Taking the low-resolution data set as input and the high-resolution data set with the mapping relation as output, and training the infrared image network based on WDSR;

23) Inputting the images of the infrared complex sample data set from step 1) into the trained infrared image network based on the WDSR algorithm to obtain an infrared image data set with higher resolution;

3) Performing k-means cluster analysis on the pedestrian bounding boxes in the infrared complex sample data set with K = 4, the resulting bounding box sizes being (4,13), (23,29), (15,42), (55,135); then, on the basis of the YOLOv3 network, adding an output scale according to the bounding box clustering result by applying a 2-times up-sampling operation to the output of the 8-times down-sampling layer, fusing the result with the 4-times down-sampled feature map, and outputting it, so as to construct the infrared image multi-target pedestrian recognition network;

4) Training the infrared image multi-target pedestrian recognition network constructed in the step 3) by using the infrared image data set obtained in the step 2) as a training set; the method for recognizing the infrared pedestrians with the multiple targets by using the trained infrared image multiple target pedestrian recognition network specifically comprises the following steps: