CN110837769B - Image processing and deep learning embedded far infrared pedestrian detection method - Google Patents


Info

Publication number
CN110837769B
Authority
CN
China
Prior art keywords
pedestrian
candidate region
local
threshold
double
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201910745838.6A
Other languages
Chinese (zh)
Other versions
CN110837769A (en)
Inventor
郑永森
王国华
李进业
周殿清
周伟滨
林琳
李卓思
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Zhongshan Sanzhuo Intelligent Technology Co ltd
Original Assignee
Zhongshan Sanzhuo Intelligent Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Zhongshan Sanzhuo Intelligent Technology Co ltd filed Critical Zhongshan Sanzhuo Intelligent Technology Co ltd
Priority to CN201910745838.6A priority Critical patent/CN110837769B/en
Publication of CN110837769A publication Critical patent/CN110837769A/en
Application granted granted Critical
Publication of CN110837769B publication Critical patent/CN110837769B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical


Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00Scenes; Scene-specific elements
    • G06V20/50Context or environment of the image
    • G06V20/52Surveillance or monitoring of activities, e.g. for recognising suspicious objects
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/21Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/24Classification techniques
    • G06F18/241Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06F18/2411Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches based on the proximity to a decision surface, e.g. support vector machines
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • G06T7/10Segmentation; Edge detection
    • G06T7/136Segmentation; Edge detection involving thresholding
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • G06T7/20Analysis of motion
    • G06T7/277Analysis of motion involving stochastic approaches, e.g. using Kalman filters
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/20Image preprocessing
    • G06V10/26Segmentation of patterns in the image field; Cutting or merging of image elements to establish the pattern region, e.g. clustering-based techniques; Detection of occlusion
    • G06V10/267Segmentation of patterns in the image field; Cutting or merging of image elements to establish the pattern region, e.g. clustering-based techniques; Detection of occlusion by performing operations on regions, e.g. growing, shrinking or watersheds
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/10Image acquisition modality
    • G06T2207/10048Infrared image
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/20Special algorithmic details
    • G06T2207/20081Training; Learning
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/20Special algorithmic details
    • G06T2207/20084Artificial neural networks [ANN]
    • YGENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02TCLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO TRANSPORTATION
    • Y02T10/00Road transport of goods or passengers
    • Y02T10/10Internal combustion engine [ICE] based vehicles
    • Y02T10/40Engine management systems

Abstract

The invention discloses an embedded far infrared pedestrian detection method combining image processing and deep learning. A fast local dual-threshold algorithm and a local sliding-window technique are used to obtain pedestrian candidate regions; a classifier that jointly combines an Alexnet network and a VGGnet deep network classifies the candidate regions to produce pedestrian detection frames; on this basis, the fast local dual-threshold segmentation result is used as the observation value for Kalman tracking of the detection results. The system comprises: a candidate region generation module that obtains pedestrian candidate regions with the fast local dual-threshold and local sliding-window techniques; a candidate region classification module that classifies the candidate regions with the classifier jointly combining the Alexnet and VGGnet deep networks; an offline training module that trains the Alexnet and VGGnet network weights based on a support vector machine; and a pedestrian tracking module that performs Kalman tracking of the detection results using the fast local dual-threshold segmentation results as observation values. The method balances pedestrian detection accuracy and real-time performance.

Description

Image processing and deep learning embedded far infrared pedestrian detection method
Technical Field
The invention belongs to the fields of computer vision and pattern recognition, image processing, and computer-vision-based driver-assistance systems, and particularly relates to an embedded far infrared pedestrian detection method combining image processing and deep learning.
Background
In everyday driving, a driver's field of view and visibility are easily impaired at night, in bad weather, and under strong or rapidly changing light. If sensor devices can improve the driver's field of view and visibility and detect pedestrians on the road, traffic accidents can be effectively prevented. Research on vehicle-mounted far infrared pedestrian detection algorithms is key to achieving this. Because far infrared imaging works on temperature differences, it effectively suppresses the effects of night, severe weather, and strong light; research on vehicle-mounted pedestrian detection based on thermal imaging is therefore key to safeguarding road pedestrians during driving, and has great research and social value.
Wang Xiaolei (Infrared pedestrian detection research [J]. Journal of North China University (Natural Science Edition), 2019, 40(1): 73-80) obtains segmentation results through a selective search algorithm, merges them using prior knowledge to obtain candidate regions, and on that basis applies an Adaboost classifier based on integral channel features to realize far infrared pedestrian detection. Although the method runs in real time, it extracts infrared pedestrian features with traditional feature-extraction methods rather than deep learning, so its accuracy is low.
Dan Yongbiao et al (infrared pedestrian detection method based on aggregate channel characteristics [ J ]. Infrared, 2018, v.39 (05): 44-50.) in the classification stage, far infrared pedestrian detection was achieved using an Adaboost classifier. Because only one classifier is used for completing detection, higher precision is difficult to achieve in complex and various vehicle-mounted outdoor scenes. The invention proposes to use multiple classifiers for joint decision, and the weights of the classifiers are not determined manually, but are learned by a support vector machine.
Wang Dianwei (Improved YOLOv3 pedestrian detection algorithm for infrared video images [J]. Journal of Xi'an University of Posts and Telecommunications, 2018, 23(04): 52-56) improves the end-to-end deep detection network YOLOv3 by dimensional cluster analysis of target candidate frames on an infrared image dataset, adjustment of the classification-network pre-training process, and multi-scale network training, obtaining higher accuracy. However, YOLOv3 still localizes pedestrians imprecisely and detects distant targets poorly; the method therefore performs badly on distant pedestrian targets at high vehicle speeds and estimates pedestrian-to-vehicle distance inaccurately.
The patent "Infrared pedestrian detection method based on image-block deep learning features" (Chinese patent publication number CN106096561A, published November 9, 2016) slides over positive and negative samples of an infrared pedestrian dataset to extract small image blocks, clusters them, and trains a convolutional neural network for each block class, yielding a group of convolutional neural networks. At test time, this network group classifies candidate regions to complete infrared pedestrian detection. Although accurate, the method is computationally expensive because the network group contains multiple deep networks, making real-time operation difficult on embedded hardware.
The patent "Night pedestrian detection method based on infrared pedestrian brightness statistical features" (Chinese patent publication number CN104778453A, published July 15, 2015) constructs a brightness-histogram feature with discriminative voting-interval division, concatenates it with the histogram-of-oriented-gradients feature to form the final descriptor, and classifies candidate regions with Adaboost combined with decision trees. Although the algorithm runs in real time, its accuracy is poor because deep learning is not used for feature extraction.
In summary, although research on thermal-imaging-based vehicle-mounted pedestrian detection has made progress, practical applications urgently require further improvements in detection accuracy and real-time performance, and algorithms need to run in embedded systems rather than as simulations on personal computers.
Disclosure of Invention
The embodiments of the invention aim to provide an embedded far infrared pedestrian detection method combining image processing and deep learning, to address three problems of existing vehicle-mounted far infrared pedestrian detection methods: recognition accuracy that does not meet practical requirements, real-time performance that needs further improvement, and algorithms that normally do not run on embedded devices.
The embedded far infrared pedestrian detection method combining image processing and deep learning obtains pedestrian candidate regions with a fast local dual-threshold algorithm and a local sliding-window technique, then jointly classifies the candidate regions with a deep-learning dual classifier whose weights are learned by a support vector machine, and performs Kalman tracking of the detection results using the segmentation result as the observation value, thereby completing pedestrian detection. The method specifically comprises the following steps:
step one, obtaining a pedestrian candidate area by utilizing a rapid local double-threshold and local sliding window technology;
step two, carrying out joint classification of the candidate regions with the deep-learning dual classifier based on support-vector-machine-learned weights;
step three, carrying out Kalman tracking on the detection result by taking the segmentation result as an observation value;
the method is characterized in that the step one is that the selective search algorithm is combined with a local sliding window technology to obtain a preliminary candidate region, and then the selective search algorithm performs local sliding window on the basis of the preliminary candidate region so as to obtain a final candidate region, so that the defect that the current selective search algorithm cannot obtain all pedestrian candidate regions in various scenes is overcome; the local sliding window technology refers to that the sitting angular coordinates of each rectangular frame obtained by selective search are respectively according to 10 multiplied by 20 pixels by taking the left upper angular coordinates as the sitting angular coordinates of the sliding window 2 24×48 pixels 2 32×64 pixels 2 The local window size of 48 x 96 pixels is windowed to obtain the final infrared pedestrian candidate region. The method is characterized in that the deep learning double classifier joint classification in the second step refers to classification of candidate areas by an Alexnet network and a VGGnet network through weight joint; the learning weights based on the support vector machine refer to weights occupied by an Alexnet network and a VGGnet network respectively, which are obtained by learning the support vector machine.
The method is further characterized in that the segmentation result in step three refers to the result of the local adaptive dual-threshold segmentation obtained in step one; using the segmentation result as the observation value for Kalman tracking of the detection results means that the observation value required by the Kalman tracking algorithm is provided by the segmentation result.
Compared with existing pedestrian detection techniques based on vehicle-mounted far infrared cameras, the method has the following advantages and effects. Candidate regions are obtained by applying four-scale local sliding windows to the local dual-threshold segmentation result, which compensates for the shortcomings of current local dual-threshold segmentation of infrared images and yields higher-quality pedestrian candidate regions. Joint classification of the candidate regions with the deep-learning dual classifier, whose weights are learned by a support vector machine, fully exploits the complementary strengths of the different classifiers' feature extraction and classification, giving a more robust joint decision than existing single-classifier, single-feature methods; the weights of the two deep networks in the joint decision are obtained by support vector machine learning. Furthermore, in the tracking stage, the segmentation result obtained by the fast local dual threshold serves as the observation value for far infrared pedestrian tracking, which significantly improves tracking accuracy. In addition, the system runs in real time in an embedded system under various outdoor traffic scenes; tests in real scenes and varied weather show high accuracy that meets the requirements of practical application.
Drawings
FIG. 1 is a flow chart of the embedded far infrared pedestrian detection method for image processing and deep learning according to an embodiment of the present invention;
FIG. 2 is a structural schematic of the embedded far infrared pedestrian detection system for image processing and deep learning according to an embodiment of the present invention;
in the figure: A. a candidate region generation module; B. a candidate region classification training module; C. a pedestrian tracking module; D. and the classifier offline training module.
FIG. 3 is a diagram of an embodiment of a deep learning dual classifier structure based on support vector machine learning weights provided by an embodiment of the present invention;
Detailed Description
The present invention will be described in further detail with reference to the following examples in order to make the objects, technical solutions and advantages of the present invention more apparent. It should be understood that the specific embodiments described herein are for purposes of illustration only and are not intended to limit the scope of the invention.
The principles of the invention will be further described with reference to the drawings and specific examples.
As shown in fig. 1, an embedded far infrared pedestrian detection method for image processing and deep learning according to an embodiment of the present invention includes the following steps:
s101, obtaining a pedestrian candidate region by utilizing a rapid local double-threshold and local sliding window technology;
s102, carrying out joint classification on candidate areas by adopting a deep learning double classifier based on a learning weight of a support vector machine;
s103, carrying out Kalman tracking on the detection result by taking the segmentation result as an observation value;
In step S101, after a preliminary candidate region is obtained with the fast local dual-threshold algorithm, a local sliding window is applied on its basis to obtain the final candidate region, compensating for the current fast local dual-threshold algorithm's inability to obtain all pedestrian candidate regions in varied scenes. The fast local dual-threshold algorithm computes a high threshold and a low threshold from the 24 pixels on the same horizontal line as each pixel to segment the image, and obtains the preliminary pedestrian candidate regions with a 4-connected-component labeling algorithm. The local sliding-window technique takes the top-left corner coordinate of each preliminary rectangular frame as the top-left corner of sliding windows of four sizes, 10×20, 24×48, 32×64, and 48×96 pixels, to obtain the final infrared pedestrian candidate regions.
Step S102, the deep learning double classifier joint classification refers to classifying candidate areas through weight combination by an Alexnet network and a VGGnet network; the learning weights based on the support vector machine refer to weights occupied by an Alexnet network and a VGGnet network respectively, which are obtained by learning the support vector machine.
The segmentation result in step S103 refers to the result of the local adaptive dual-threshold segmentation in step S101; using the segmentation result as the observation value for Kalman tracking of the detection results means that the observation value required by the Kalman tracking algorithm is provided by the segmentation result.
As shown in fig. 2, the embedded far infrared pedestrian detection system for image processing and deep learning in the embodiment of the invention mainly comprises a candidate region generation module A, a candidate region classification module B, a pedestrian tracking module C, and a classifier offline training module D.
The candidate region generation module A quickly and accurately acquires the pedestrian candidate regions by combining the fast local dual-threshold segmentation algorithm with the local sliding-window technique.
The candidate region classification module B, connected with the candidate region generation module A and the classifier offline training module D, performs online joint classification of the candidate regions using the two deep-learning classifiers and the decision weights obtained offline.
And the pedestrian tracking module C is used for tracking pedestrian targets obtained according to the deep learning classification by taking a segmentation result obtained by a local double-threshold algorithm as an observation value, so that a detection frame for pedestrians is more stable.
And the classifier offline training module D is used for collecting samples, offline training Alexnet and VGGnet deep learning network classifiers and offline determining weights of the two classifiers when in joint decision.
Specific examples of the invention:
the overall flow of the method is shown in fig. 1, and the main body of the method comprises three parts: 1. obtaining a pedestrian candidate region by utilizing a rapid local double-threshold and local sliding window technology; 2. carrying out joint classification on the candidate areas by adopting a deep learning double classifier based on the learning weight of the support vector machine; 3. and taking the segmentation result as an observation value to carry out Kalman tracking on the detection result. All algorithms of the present invention are implemented in the Nvidia JetsonTX2 embedded computer of inflight corporation.
1. Obtaining pedestrian candidate areas by using a rapid local double-threshold and local sliding window technology
The candidate region generation method first obtains low-precision candidate regions with the fast local dual-threshold algorithm specialized for far infrared pedestrian segmentation, then obtains the final far infrared pedestrian candidate regions from the top-left corner coordinates of all low-precision regions with the local sliding-window technique. The candidate region generation stage thus comprises two main steps. First: run the fast local dual-threshold algorithm on the original infrared image to obtain low-precision candidate regions. Second: obtain the infrared pedestrian candidate regions from the top-left corner coordinates of the low-precision regions with the local sliding-window technique.
1.1 Running the fast local dual-threshold segmentation algorithm on the original infrared image to obtain low-precision candidate regions
The fast local dual-threshold segmentation algorithm exploits the fact that, on the same horizontal line, pixels inside infrared pedestrians have a higher mean value than the surrounding pixels, and segments infrared pedestrians as follows: the original infrared image is taken as input, the image is segmented into a binary image, and the 4-connected components of the binary image are the low-precision candidate regions. Segmentation proceeds per pixel: for each pixel of the image (except the leftmost and rightmost 12 pixels), two segmentation thresholds are computed dynamically, a low threshold T_L from formula (1) and a high threshold T_H from formula (2):
T_H(i, j) = T_L(i, j) + θ (2)
If the pixel value is lower than T_L, the pixel is segmented as background; if it is higher than T_H, the pixel is segmented as foreground; pixels whose values fall within [T_L, T_H] are segmented as foreground. Here T_L(i, j) is the low threshold of the current pixel (i, j), T_H(i, j) is its high threshold, L is the width of the horizontal line through the current pixel, and θ is 8.
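As an illustrative sketch of the segmentation step above: formula (1) is not reproduced in this text, so the low threshold is assumed here to be the mean of the 24 same-row neighbours plus θ, and the treatment of values inside [T_L, T_H] follows the rule stated above; names and the exact form of (1) are assumptions, not the patented formula.

```python
THETA = 8    # offset between low and high thresholds (theta in formula (2))
HALF = 12    # 12 neighbours on each side -> 24 row pixels per threshold

def local_dual_threshold(img):
    """Per-pixel dual-threshold segmentation of a grayscale image
    (given as a list of rows). Returns a binary map of the same size."""
    h, w = len(img), len(img[0])
    out = [[0] * w for _ in range(h)]
    for i in range(h):
        row = img[i]
        for j in range(HALF, w - HALF):   # leftmost/rightmost 12 pixels skipped
            neigh = row[j - HALF:j] + row[j + 1:j + 1 + HALF]
            t_low = sum(neigh) / len(neigh) + THETA   # assumed form of formula (1)
            t_high = t_low + THETA                    # formula (2)
            v = row[j]
            if v < t_low:
                out[i][j] = 0             # background
            elif v > t_high:
                out[i][j] = 1             # foreground
            else:
                out[i][j] = 1             # in-band pixels kept as foreground
    return out
```

A 4-connected-component labeling pass over the returned binary map would then yield the low-precision candidate regions.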
1.2 acquiring an infrared pedestrian candidate region by utilizing a local sliding window technology according to the left upper corner coordinates of the low-precision candidate region
In the invention, the candidate regions obtained by the fast local dual-threshold segmentation algorithm are preliminary regions of low precision. On their basis, the invention applies a local sliding window to obtain the final candidate regions, compensating for the algorithm's inability to obtain all pedestrian candidate regions in varied scenes. Specifically, the top-left corner coordinate of each preliminary rectangular frame is taken as the top-left corner of local windows of four sizes, 10×20, 24×48, 32×64, and 48×96 pixels, yielding the final infrared pedestrian candidate regions in preparation for subsequent candidate-region feature extraction.
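The four-scale window generation can be sketched as follows; the function name and the clipping of windows that fall outside the image are illustrative assumptions.

```python
# The four window sizes (width, height) in pixels stated in the text.
SCALES = [(10, 20), (24, 48), (32, 64), (48, 96)]

def candidate_windows(top_lefts, img_w, img_h):
    """For each preliminary region's top-left corner (x, y), emit the four
    fixed-size candidate windows anchored at that corner, discarding any
    window that would extend beyond the image (assumed boundary handling)."""
    boxes = []
    for (x, y) in top_lefts:
        for (w, h) in SCALES:
            if x + w <= img_w and y + h <= img_h:
                boxes.append((x, y, w, h))
    return boxes
```

Each returned box is then cropped from the infrared frame and passed to the dual classifier.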
2. The candidate areas are subjected to joint classification by adopting a deep learning dual classifier based on the learning weight of the support vector machine
Joint classification with the deep-learning dual classifier based on support-vector-machine-learned weights comprises two parts: training-sample preparation with offline training of the dual classifier, and support-vector-machine learning of the dual classifier's decision weights with joint online detection.
2.1 training sample preparation and Dual classifier offline training
1) Training sample preparation
Data covering expressway, national road, urban, and suburban scenes were collected with a vehicle-mounted far infrared camera, yielding 300 hours of video, from which frames were randomly sampled. From 1,000,000 original infrared images, all pedestrians appearing were manually annotated to obtain the positive samples; the positive samples from 500,000 annotated images form dataset Dataset1, and those from the other 500,000 form Dataset2. From 100,000 far infrared images containing no pedestrians, non-pedestrian samples were obtained with the candidate-region method of step one, i.e., the fast dual-threshold segmentation algorithm and local sliding-window technique, forming the non-pedestrian dataset Dataset3. All pedestrian images of Dataset1 together with all non-pedestrian images of Dataset3 form Dataset4; all pedestrian images of Dataset2 together with all non-pedestrian images of Dataset3 form Dataset5.
2) Offline training of double classifiers
The dual classifier of the invention comprises an Alexnet deep convolutional neural network and a VGGnet deep convolutional neural network. On Dataset4, the Alexnet and VGGnet networks already trained on the ImageNet dataset were each fine-tuned. The hyperparameters were set as follows: (1) optimizer: the adaptive optimization algorithm Adam; (2) learning rate: 0.01; (3) batch size: 32; (4) images: single-channel grayscale; (5) no dropout; (6) data augmentation of the original images: translation and horizontal flipping; (7) input images scaled to 224×224 with bilinear interpolation. The VGGnet of the invention is the VGG19 network; the specific network structure is shown in Table 1.
TABLE 1 VGGnet network Structure diagram (VGG 19-net)
Where "conv" denotes a convolution operation, "relu" a ReLU (rectified linear unit) activation layer, "fc" a fully connected operation, and "prob" the softmax classifier output.
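Table 1 itself is not reproduced in this text. The published VGG19 configuration, which the table presumably follows, can be written out as below; the final fc layer sized for the two-class pedestrian task is an assumption.

```python
# Standard VGG19 layer sequence ("conv3-N" = 3x3 convolution with N channels);
# the fc-2 output for pedestrian / non-pedestrian is assumed, since the
# original ImageNet VGG19 ends in fc-1000.
VGG19_LAYERS = (
    ["conv3-64"] * 2 + ["maxpool"]
    + ["conv3-128"] * 2 + ["maxpool"]
    + ["conv3-256"] * 4 + ["maxpool"]
    + ["conv3-512"] * 4 + ["maxpool"]
    + ["conv3-512"] * 4 + ["maxpool"]
    + ["fc-4096", "fc-4096", "fc-2", "prob(softmax)"]
)

def count_weight_layers(layers):
    """Count the layers with learnable weights (conv + fc); VGG19 owes its
    name to having 19 of them."""
    return sum(1 for layer in layers if layer.startswith(("conv", "fc")))
```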
Table 2 Alexnet network structure diagram.
2.2 support vector machine learning decision weights for Dual classifiers and Dual classifier Joint Online detection
The classifier results of the two classifiers Alexnet and VGGnet are combined for classification to finish classification of all candidate areas, and fusion is carried out in a weighting mode. The specific weight value obtaining method is obtained through learning of a nonlinear support vector machine. More specifically, for any sample S of Dataset5, classification is performed using a trained Alexnet classifier, assuming that the output Score of the classifier is Score 1 The method comprises the steps of carrying out a first treatment on the surface of the Classifying with a trained VGGnet classifier, assuming that the output Score of the classifier is Score 2 . Will (Score) 1 ,Score 2 ) Forming new characteristics, representing the new characteristics of the sample S, training a linear support vector machine classifier together with the original labels of the sample S, thereby obtaining decision weights w when the double-classifier Alexnet and VGGnet classifier are used for joint classification 1 And w 2 And a bias b. And (3) completing joint classification of the candidate areas according to a formula (3).
Score = w₁ × Score₁ + w₂ × Score₂ + b (3)
where Score is the final output of the double-classifier joint classification: when Score > 0, the joint classification result is pedestrian; otherwise it is non-pedestrian.
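Formula (3) and its decision rule can be sketched as follows; the weight values shown are hypothetical stand-ins for whatever the linear support vector machine actually learns:

```python
def joint_classify(score1, score2, w1, w2, b):
    """Formula (3): fused decision of the Alexnet/VGGnet double classifier.
    Returns True ('pedestrian') when the fused score is positive."""
    score = w1 * score1 + w2 * score2 + b
    return score > 0

# Hypothetical weights standing in for the SVM-learned values w1, w2, b.
W1, W2, B = 0.6, 0.5, -0.55
```

At run time, each candidate region is scored by both networks and the pair of scores is passed through this single linear decision.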
3. Carrying out Kalman tracking on the detection result by taking the segmentation result as an observation value
The Kalman tracking algorithm corrects the predicted estimate of the state variables with observation data to obtain the optimal estimate of the state variables. When it is used for multi-target pedestrian tracking, it can directly give the possible position of each pedestrian in the next frame; by matching the pedestrian targets of the previous frame against the image at the predicted position, the detection position of each pedestrian in the next frame can be located, compensating for possible missed detections. Considering that the Kalman observation value has a large influence on tracking accuracy, and that the local double-threshold segmentation algorithm of the invention generally yields an accurate segmentation result, the invention proposes using the segmentation result as the observation value of the conventional Kalman algorithm, so as to obtain a more accurate Kalman prediction. Specifically, the center position of a pedestrian target obtained through multi-frame verification (a candidate region detected as a pedestrian in three consecutive frames is regarded as a pedestrian target), together with the height and width of its detection frame, is tracked, so that the state vector of the pedestrian is expressed as formula (4).
Xₜ = (xₜ, yₜ, hₜ, wₜ, Δxₜ, Δyₜ, Δhₜ, Δwₜ)ᵀ (4)
where (xₜ, yₜ) are the center coordinates of the pedestrian detection frame in frame t, and (hₜ, wₜ) are the height and width of the detection frame in frame t; (Δxₜ, Δyₜ) is the change in the center point of the detection frame, and (Δhₜ, Δwₜ) is the change in the height and width of the detection frame. Since the frame rate of the video is 25 frames per second, the motion of a pedestrian's rectangular frame between two adjacent frames can be regarded as uniform motion; the Kalman state transition matrix Ω is expressed as formula (5), and the system measurement matrix H as formula (6).
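Formulas (5) and (6) are referenced but not reproduced in this text. Under the uniform-motion assumption stated above, Ω would take the standard constant-velocity form and H would select the four observed components; the sketch below is that standard form, not necessarily the patent's exact matrices:

```python
N = 8  # state (x, y, h, w, dx, dy, dh, dw), as in formula (4)

# Constant-velocity transition matrix: identity plus coupling of each
# position/size component to its delta (x' = x + dx, etc.).
Omega = [[1.0 if i == j else 0.0 for j in range(N)] for i in range(N)]
for i in range(4):
    Omega[i][i + 4] = 1.0

# Measurement matrix: the segmentation observation supplies only
# (x, y, h, w), so H selects the first four state components.
H = [[1.0 if i == j else 0.0 for j in range(N)] for i in range(4)]

def predict_mean(state):
    """One Kalman prediction step for the state mean: X_pred = Omega @ X."""
    return [sum(Omega[i][j] * state[j] for j in range(N)) for i in range(N)]
```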
The invention uses the rapid double-threshold segmentation result as the observation value of the conventional Kalman algorithm. To find the observation value corresponding to a detection result, matching is performed with the nearest-neighbor method of formula (7). When no match can be made according to formula (7), the Kalman predicted value is used directly as the observation value to complete the update of the Kalman tracker.
|x₁ − x₂| < T₁ && |y₁ − y₂| < T₂ && |w₁ − w₂| < T₁ && |h₁ − h₂| < T₂ (7)
where w₁ and h₁ are the width and height of a detection-frame rectangle whose center point is (x₁, y₁); w₂ and h₂ are the width and height of a rectangle from the rapid local double-threshold segmentation result, whose center point is (x₂, y₂); and T₁ and T₂ (both set to 7) are the nearest-neighbor distance thresholds in the transverse and longitudinal directions, respectively.
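The matching and fallback logic of this step can be sketched as follows. The published formula (7) writes T₁ in all four comparisons while the text distinguishes the transverse threshold T₁ from the longitudinal threshold T₂; the sketch follows the text (with both thresholds equal to 7 the two readings coincide). Function names and the (x, y, w, h) tuple layout are assumptions:

```python
def nn_match(det, seg, t1=7, t2=7):
    """Formula (7): a detection box matches a segmentation box when their
    centres, widths and heights differ by less than the transverse (t1)
    and longitudinal (t2) thresholds.  Boxes are (x, y, w, h) tuples."""
    x1, y1, w1, h1 = det
    x2, y2, w2, h2 = seg
    return (abs(x1 - x2) < t1 and abs(y1 - y2) < t2 and
            abs(w1 - w2) < t1 and abs(h1 - h2) < t2)

def pick_observation(det, seg_boxes, t1=7, t2=7):
    """Return the first matching segmentation box as the Kalman observation;
    fall back to the detection itself (the predicted value) when none match."""
    for seg in seg_boxes:
        if nn_match(det, seg, t1, t2):
            return seg
    return det
```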

Claims (2)

1. The embedded far infrared pedestrian detection method for image processing and deep learning is characterized in that pedestrian candidate regions are obtained using a rapid local double-threshold and local sliding window technique, the candidate regions are then jointly classified by a deep learning double classifier based on support-vector-machine-learned weights, and Kalman tracking of the detection result is carried out with the segmentation result as the observation value, thereby completing pedestrian detection; the method specifically comprises: step one, obtaining pedestrian candidate regions using the rapid local double-threshold and local sliding window technique; step two, jointly classifying the candidate regions with the deep learning double classifier based on support-vector-machine-learned weights; step three, carrying out Kalman tracking on the detection result with the segmentation result as the observation value; in step one, the rapid local double-threshold and local sliding window technique means that after the rapid local double-threshold algorithm obtains a preliminary candidate region, a local sliding window is applied on the basis of the preliminary candidate region to obtain the final candidate region, overcoming the defect that the current rapid local double-threshold algorithm cannot obtain all pedestrian candidate regions in all scenes; the rapid local double-threshold algorithm means that a high threshold and a low threshold are computed from the nearest 24 pixels on the same horizontal line as each pixel, thereby realizing image segmentation, and a preliminary pedestrian candidate region is obtained with a 4-connected-region labeling algorithm; the local sliding window technique means that, at the upper-left corner coordinates of each rectangular frame obtained by selective search, windows of 10×20, 24×48, 32×64 and 48×96 pixels are slid to obtain the final infrared pedestrian candidate region; in step two, deep learning double-classifier joint classification means that an Alexnet network and a VGGnet network classify the candidate regions through a weighted combination; the support-vector-machine-learned weights are the respective weights of the Alexnet network and the VGGnet network obtained by support vector machine learning.
2. The embedded far infrared pedestrian detection method for image processing and deep learning according to claim 1, characterized in that the segmentation result in step three is the segmentation result obtained by the local adaptive double-threshold segmentation in step one; carrying out Kalman tracking on the detection result with the segmentation result as the observation value means that the observation value required by the Kalman tracking algorithm is provided by the segmentation result.
CN201910745838.6A 2019-08-13 2019-08-13 Image processing and deep learning embedded far infrared pedestrian detection method Active CN110837769B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910745838.6A CN110837769B (en) 2019-08-13 2019-08-13 Image processing and deep learning embedded far infrared pedestrian detection method


Publications (2)

Publication Number Publication Date
CN110837769A CN110837769A (en) 2020-02-25
CN110837769B (en) 2023-08-29

Family

ID=69573984

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910745838.6A Active CN110837769B (en) 2019-08-13 2019-08-13 Image processing and deep learning embedded far infrared pedestrian detection method

Country Status (1)

Country Link
CN (1) CN110837769B (en)

Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111626334B (en) * 2020-04-28 2023-07-14 东风汽车集团有限公司 Key control target selection method for vehicle-mounted advanced auxiliary driving system
CN111814755A (en) * 2020-08-18 2020-10-23 深延科技(北京)有限公司 Multi-frame image pedestrian detection method and device for night motion scene
CN114255373B (en) * 2021-12-27 2024-02-02 中国电信股份有限公司 Sequence anomaly detection method, device, electronic equipment and readable medium

Citations (9)

Publication number Priority date Publication date Assignee Title
CN103198332A (en) * 2012-12-14 2013-07-10 华南理工大学 Real-time robust far infrared vehicle-mounted pedestrian detection method
CN104091171A (en) * 2014-07-04 2014-10-08 华南理工大学 Vehicle-mounted far infrared pedestrian detection system and method based on local features
CN106096561A (en) * 2016-06-16 2016-11-09 重庆邮电大学 Infrared pedestrian detection method based on image block degree of depth learning characteristic
CN106156401A (en) * 2016-06-07 2016-11-23 西北工业大学 Data-driven system state model on-line identification methods based on many assembled classifiers
CN107563347A (en) * 2017-09-20 2018-01-09 南京行者易智能交通科技有限公司 A kind of passenger flow counting method and apparatus based on TOF camera
KR101869442B1 (en) * 2017-11-22 2018-06-20 공주대학교 산학협력단 Fire detecting apparatus and the method thereof
CN108460336A (en) * 2018-01-29 2018-08-28 南京邮电大学 A kind of pedestrian detection method based on deep learning
US10108867B1 (en) * 2017-04-25 2018-10-23 Uber Technologies, Inc. Image-based pedestrian detection
CN109886245A (en) * 2019-03-02 2019-06-14 山东大学 A kind of pedestrian detection recognition methods based on deep learning cascade neural network

Family Cites Families (3)

Publication number Priority date Publication date Assignee Title
AU2003295318A1 (en) * 2002-06-14 2004-04-19 Honda Giken Kogyo Kabushiki Kaisha Pedestrian detection and tracking with night vision
KR101543105B1 (en) * 2013-12-09 2015-08-07 현대자동차주식회사 Method And Device for Recognizing a Pedestrian and Vehicle supporting the same
WO2016094330A2 (en) * 2014-12-08 2016-06-16 20/20 Genesystems, Inc Methods and machine learning systems for predicting the liklihood or risk of having cancer



Similar Documents

Publication Publication Date Title
CN111209810B (en) Boundary frame segmentation supervision deep neural network architecture for accurately detecting pedestrians in real time through visible light and infrared images
CN110175576B (en) Driving vehicle visual detection method combining laser point cloud data
CN109447018B (en) Road environment visual perception method based on improved Faster R-CNN
CN110443827B (en) Unmanned aerial vehicle video single-target long-term tracking method based on improved twin network
CN106096561B (en) Infrared pedestrian detection method based on image block deep learning features
CN105046196B (en) Front truck information of vehicles structuring output method based on concatenated convolutional neutral net
CN110837769B (en) Image processing and deep learning embedded far infrared pedestrian detection method
CN106919902B (en) Vehicle identification and track tracking method based on CNN
CN111368830B (en) License plate detection and recognition method based on multi-video frame information and kernel correlation filtering algorithm
CN110287826B (en) Video target detection method based on attention mechanism
CN105989334B (en) Road detection method based on monocular vision
CN111340855A (en) Road moving target detection method based on track prediction
CN111160212B (en) Improved tracking learning detection system and method based on YOLOv3-Tiny
CN110781744A (en) Small-scale pedestrian detection method based on multi-level feature fusion
CN111695514A (en) Vehicle detection method in foggy days based on deep learning
Xiao et al. Real-time object detection algorithm of autonomous vehicles based on improved yolov5s
CN109800714A (en) A kind of ship detecting system and method based on artificial intelligence
CN111915583A (en) Vehicle and pedestrian detection method based on vehicle-mounted thermal infrared imager in complex scene
CN116434159A (en) Traffic flow statistics method based on improved YOLO V7 and Deep-Sort
Tarchoun et al. Hand-Crafted Features vs Deep Learning for Pedestrian Detection in Moving Camera.
Ghahremannezhad et al. Automatic road detection in traffic videos
CN113221739B (en) Monocular vision-based vehicle distance measuring method
CN114743126A (en) Lane line sign segmentation method based on graph attention machine mechanism network
CN109215059A (en) Local data&#39;s correlating method of moving vehicle tracking in a kind of video of taking photo by plane
CN117036412A (en) Twin network infrared pedestrian target tracking method integrating deformable convolution

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
TA01 Transfer of patent application right

Effective date of registration: 20230803

Address after: One of the fourth floors, No. 107, Jucheng Avenue East, Xiaolan Town, Zhongshan City, Guangdong Province, 528400

Applicant after: Zhongshan sanzhuo Intelligent Technology Co.,Ltd.

Address before: Unit C403A, No. 205 Changfu Road, Tianhe District, Guangzhou City, Guangdong Province, 510000 (for office use only) (not intended for use as a factory building)

Applicant before: Guangzhou Sanmu Intelligent Technology Co.,Ltd.

GR01 Patent grant