CN110837769A - Embedded far infrared pedestrian detection method based on image processing and deep learning - Google Patents



Publication number
CN110837769A
Authority
CN
China
Prior art keywords
pedestrian
local
threshold
candidate region
dual
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201910745838.6A
Other languages
Chinese (zh)
Other versions
CN110837769B (en)
Inventor
郑永森
王国华
李进业
周殿清
周伟滨
林琳
李卓思
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Zhongshan Sanzhuo Intelligent Technology Co ltd
Original Assignee
Guangzhou Sanmu Intelligent Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Guangzhou Sanmu Intelligent Technology Co Ltd filed Critical Guangzhou Sanmu Intelligent Technology Co Ltd
Priority to CN201910745838.6A priority Critical patent/CN110837769B/en
Publication of CN110837769A publication Critical patent/CN110837769A/en
Application granted granted Critical
Publication of CN110837769B publication Critical patent/CN110837769B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 20/00 Scenes; Scene-specific elements
    • G06V 20/50 Context or environment of the image
    • G06V 20/52 Surveillance or monitoring of activities, e.g. for recognising suspicious objects
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 18/00 Pattern recognition
    • G06F 18/20 Analysing
    • G06F 18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F 18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 18/00 Pattern recognition
    • G06F 18/20 Analysing
    • G06F 18/24 Classification techniques
    • G06F 18/241 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06F 18/2411 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches based on the proximity to a decision surface, e.g. support vector machines
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/08 Learning methods
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 7/00 Image analysis
    • G06T 7/10 Segmentation; Edge detection
    • G06T 7/136 Segmentation; Edge detection involving thresholding
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 7/00 Image analysis
    • G06T 7/20 Analysis of motion
    • G06T 7/277 Analysis of motion involving stochastic approaches, e.g. using Kalman filters
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 10/00 Arrangements for image or video recognition or understanding
    • G06V 10/20 Image preprocessing
    • G06V 10/26 Segmentation of patterns in the image field; Cutting or merging of image elements to establish the pattern region, e.g. clustering-based techniques; Detection of occlusion
    • G06V 10/267 Segmentation of patterns in the image field; Cutting or merging of image elements to establish the pattern region, e.g. clustering-based techniques; Detection of occlusion by performing operations on regions, e.g. growing, shrinking or watersheds
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 2207/00 Indexing scheme for image analysis or image enhancement
    • G06T 2207/10 Image acquisition modality
    • G06T 2207/10048 Infrared image
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 2207/00 Indexing scheme for image analysis or image enhancement
    • G06T 2207/20 Special algorithmic details
    • G06T 2207/20081 Training; Learning
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 2207/00 Indexing scheme for image analysis or image enhancement
    • G06T 2207/20 Special algorithmic details
    • G06T 2207/20084 Artificial neural networks [ANN]
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02T CLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO TRANSPORTATION
    • Y02T 10/00 Road transport of goods or passengers
    • Y02T 10/10 Internal combustion engine [ICE] based vehicles
    • Y02T 10/40 Engine management systems

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • General Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Evolutionary Computation (AREA)
  • Evolutionary Biology (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • General Health & Medical Sciences (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Computational Linguistics (AREA)
  • Biophysics (AREA)
  • Biomedical Technology (AREA)
  • Traffic Control Systems (AREA)
  • Image Analysis (AREA)

Abstract

The invention discloses an embedded far-infrared pedestrian detection method based on image processing and deep learning. The method obtains pedestrian candidate regions with a fast local dual-threshold and local sliding window technique; classifies the candidate regions with a classifier based on the joint classification of an Alexnet network and a VGGnet deep network to obtain pedestrian detection frames; and, on this basis, performs Kalman tracking of the detection results with the fast local dual-threshold segmentation result as the observation value. The system comprises: a candidate region generation module that obtains pedestrian candidate regions with the fast local dual-threshold and local sliding window technique; a candidate region classification module that classifies the candidate regions with a classifier based on Alexnet and VGGnet deep-network joint classification; an offline training module for training the Alexnet and VGGnet deep networks and for learning the Alexnet and VGGnet network weights with a support vector machine; and a pedestrian tracking module that performs Kalman tracking of the detection results with the fast local dual-threshold segmentation result as the observation value. The method balances the accuracy and real-time performance of pedestrian detection, can run on embedded hardware, and can be used to realize a driver-assistance system based on night-time pedestrian detection with a vehicle-mounted camera.

Description

Embedded far infrared pedestrian detection method based on image processing and deep learning
Technical Field
The invention belongs to the field of driver-assistance systems based on computer vision, pattern recognition, and image processing, and particularly relates to an embedded far-infrared pedestrian detection method based on image processing and deep learning.
Background
During everyday driving, a driver's field of view and visibility are easily impaired at night, in bad weather, and under strong or rapidly changing light. If a sensor device can improve the driver's field of view and visibility and detect pedestrians on the road, traffic accidents can be effectively prevented. Research on vehicle-mounted far-infrared pedestrian detection algorithms is the key to achieving these effects. Because far infrared images by temperature difference, it works effectively at night, in severe weather, and under strong light; research on thermal-imaging-based vehicle-mounted pedestrian detection is therefore key to safeguarding pedestrians on the road during driving, and has great research and social value.
One prior study (Infrared pedestrian detection research based on candidate region enumeration [J]. Journal of Huaibei Normal University (Natural Science Edition), 2019, 40(1):73-80) obtained a segmentation result with a selective search algorithm, merged the segments using prior knowledge to obtain candidate regions, and, on that basis, applied an Adaboost classifier over integral channel features to realize far-infrared pedestrian detection. Although the method achieves good real-time performance, it extracts infrared pedestrian features with a traditional feature-extraction method rather than deep learning, so its accuracy is low.
Shi Yongbiao et al. (An infrared pedestrian detection method based on aggregate channel features [J]. Infrared, 2018, 39(05):44-50) use an Adaboost classifier in the classification stage to realize far-infrared pedestrian detection. Because only one classifier completes the detection, high accuracy is difficult to achieve in the complex and varied outdoor scenes of vehicle-mounted use. The invention instead proposes joint decision by multiple classifiers, with the weight of each classifier not set manually but learned by a support vector machine.
Wang et al. (An improved YOLOv3 pedestrian detection algorithm for infrared video images [J]. Journal of Xi'an University of Posts and Telecommunications, 2018, 23(04):52-56) improve the end-to-end deep target detection network YOLOv3 by dimension-clustering analysis of the target candidate frames of an infrared image dataset, adjusting the classification-network pre-training process, and multi-scale network training, thereby obtaining higher accuracy. However, YOLOv3's inherent drawbacks of inaccurate pedestrian localization and low accuracy on distant targets remain difficult to avoid. The method therefore detects distant pedestrian targets poorly at high vehicle speed, and estimates the pedestrian-vehicle distance with low accuracy.
The patent "An infrared pedestrian detection method based on image-block deep learning features" (Chinese patent grant publication No. CN106096561A, grant publication date: November 9, 2016) extracts small image blocks by sliding over the positive and negative samples of an infrared pedestrian dataset, clusters the small blocks, and trains a convolutional neural network for each block class, thereby obtaining a convolutional neural network group. During testing, the obtained network group classifies the candidate regions to complete infrared pedestrian detection. Although the method is accurate, the network group contains several deep networks, so the computational cost is high and real-time performance is difficult to guarantee on embedded hardware.
The patent "A night pedestrian detection method based on statistical brightness features of infrared pedestrians" (Chinese patent grant publication No. CN104778453A, grant publication date: July 15, 2015) constructs a brightness-histogram feature with discriminative voting-interval division, concatenates it with histogram-of-oriented-gradients features to form the final descriptor, and classifies candidate regions with Adaboost combined with decision trees to complete pedestrian detection. Although the algorithm has good real-time performance, its accuracy is poor because features are not extracted with deep learning.
In summary, although research on thermal-imaging-based vehicle-mounted pedestrian detection has achieved certain results, further improvements in detection accuracy and real-time performance are urgently needed to meet practical requirements, and an algorithm implemented on an embedded system rather than simulated on a personal computer is needed.
Disclosure of Invention
The embodiment of the invention aims to provide an embedded far-infrared pedestrian detection method based on image processing and deep learning, so as to solve the problems that the recognition accuracy of existing pedestrian detection methods based on a vehicle-mounted far-infrared camera does not meet practical requirements, that real-time performance needs further improvement, and that such algorithms do not usually run on embedded devices.
The embedded far-infrared pedestrian detection method obtains pedestrian candidate regions with a fast local dual-threshold and local sliding window technique, then jointly classifies the candidate regions with a deep learning dual classifier whose weights are learned by a support vector machine, and performs Kalman tracking of the detection results with the segmentation result as the observation value to complete pedestrian detection. It specifically comprises the following steps:
Step one, acquiring pedestrian candidate regions with the fast local dual-threshold and local sliding window technique;
Step two, jointly classifying the candidate regions with the deep learning dual classifier based on support-vector-machine-learned weights;
Step three, performing Kalman tracking of the detection results with the segmentation result as the observation value.
further, the embedded far-infrared pedestrian detection method based on image processing and deep learning according to claim 1, characterized in that the selective search algorithm in step one is combined with a local sliding window technique, and after the selective search algorithm obtains a preliminary candidate region, local sliding window is performed on the basis of the preliminary candidate region, so as to obtain a final candidate region, thereby making up for the defect that the current selective search algorithm cannot obtain all pedestrian candidate regions in various scenes; the local sliding window technology refers to that the sitting corner coordinate of each rectangular frame obtained by selective search is respectively 10 multiplied by 20 pixels by taking the upper left corner coordinate as the sitting corner coordinate of the sliding window224 x 48 pixels232 x 64 pixels2The local window size of 48 × 96 pixels is subjected to sliding window to obtain a final infrared pedestrian candidate region.
Further, in the embedded far-infrared pedestrian detection method based on image processing and deep learning according to claim 1, the deep learning dual-classifier joint classification in step two classifies the candidate regions through a weighted combination of an Alexnet network and a VGGnet network; the support-vector-machine-learned weights means that the weights of the Alexnet network and the VGGnet network are obtained by support vector machine learning.
Further, in the embedded far-infrared pedestrian detection method based on image processing and deep learning according to claim 1, the segmentation result in step three is the segmentation result obtained by the local adaptive dual-threshold segmentation in step one; performing Kalman tracking of the detection results with the segmentation result as the observation value means that the observation value required by the Kalman tracking algorithm is provided by the segmentation result.
Compared with existing pedestrian detection technology based on a vehicle-mounted far-infrared camera, the vehicle-mounted far-infrared pedestrian detection method of the invention has the following advantages and effects. The candidate regions are obtained by running local sliding windows of four scales on the local dual-threshold segmentation result, which overcomes the shortcomings of current local dual-threshold segmentation of infrared images and yields higher-quality pedestrian candidate regions. The deep learning dual classifier with support-vector-machine-learned weights is designed for joint classification of the candidate regions; compared with existing single-classifier, single-feature methods, it can fully exploit the respective advantages of different classifiers in feature extraction and classification and obtain a more robust classification result through joint decision, and the weights of the two deep networks in the joint decision are themselves obtained by learning. Furthermore, in the tracking stage, the invention proposes using the segmentation result obtained by the fast local dual threshold as the observation value for far-infrared pedestrian tracking, which markedly improves tracking accuracy. In addition, the system runs in real time on an embedded system in various outdoor traffic scenes, and tests in real scenes and various weather conditions show that it is accurate and meets the requirements of practical application.
Drawings
Fig. 1 is a flowchart of the embedded far-infrared pedestrian detection method based on image processing and deep learning according to an embodiment of the present invention;
fig. 2 is a schematic structural diagram of an embedded far-infrared pedestrian detection method based on image processing and deep learning according to an embodiment of the present invention;
in the figure: A. a candidate region generation module; B. a candidate region classification training module; C. a pedestrian tracking module; D. and a classifier offline training module.
FIG. 3 is a structural diagram of the deep learning dual classifier based on support-vector-machine-learned weights according to an embodiment of the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention more apparent, the present invention is further described in detail with reference to the following embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the invention and are not intended to limit the invention.
The application of the principles of the present invention will be further described with reference to the accompanying drawings and specific embodiments.
As shown in fig. 1, an embedded far infrared pedestrian detection method with image processing and deep learning according to an embodiment of the present invention includes the following steps:
s101, acquiring a pedestrian candidate area by using a rapid local double threshold and local sliding window technology;
s102, performing combined classification on the candidate regions by adopting a deep learning double classifier based on the learning weight of the support vector machine;
s103, performing Kalman tracking on the detection result by taking the segmentation result as an observation value;
In step S101, the fast local dual-threshold and local sliding window technique means that, after the fast local dual-threshold algorithm obtains preliminary candidate regions, local sliding windows are run on them to obtain the final candidate regions, making up for the defect that the current fast local dual-threshold algorithm cannot obtain all pedestrian candidate regions in varied scenes. The fast local dual-threshold algorithm computes a high threshold and a low threshold for each pixel from the 24 pixels on the same horizontal line, realizing image segmentation, and obtains the preliminary pedestrian candidate regions through a 4-connected region labeling algorithm. The local sliding window technique means that, for each rectangular frame obtained, its top-left corner coordinate is taken as the top-left corner of the sliding window, and local windows of 10×20, 24×48, 32×64, and 48×96 pixels are slid to obtain the final infrared pedestrian candidate regions.
The deep learning dual-classifier joint classification in step S102 classifies the candidate regions through a weighted combination of the Alexnet network and the VGGnet network; the support-vector-machine-learned weights means that the weights of the Alexnet network and the VGGnet network are obtained by support vector machine learning.
The segmentation result in step S103 is the segmentation result obtained by the local adaptive dual-threshold segmentation in step S101; performing Kalman tracking of the detection results with the segmentation result as the observation value means that the observation value required by the Kalman tracking algorithm is provided by the segmentation result.
As shown in fig. 2, an embedded far-infrared pedestrian detection method of image processing and deep learning according to an embodiment of the present invention mainly includes a candidate region generation module a; a candidate region classification training module B; a pedestrian tracking module C; and a classifier off-line training module D.
The candidate region generation module A combines the fast local dual-threshold segmentation algorithm with the local sliding window technique to acquire pedestrian candidate regions quickly and accurately.
The candidate region classification module B is connected with the candidate region generation module A and the classifier offline training module D, and performs online joint classification of the candidate regions with the deep-learning-based dual classifiers and the learned decision weights.
The pedestrian tracking module C tracks the pedestrian targets obtained by deep learning classification, taking the segmentation result of the local dual-threshold algorithm as the observation value, so that the pedestrian detection frames are more stable.
The classifier offline training module D collects samples, trains the Alexnet and VGGnet deep learning network classifiers offline, and determines offline the weights of the two classifiers in the joint decision process.
The specific embodiment of the invention:
the overall flow of the method of the invention is shown in figure 1, and the main body of the method of the invention comprises three parts: 1. acquiring a pedestrian candidate area by using a rapid local double threshold and local sliding window technology; 2. performing joint classification on the candidate regions by adopting a deep learning double classifier based on the learning weight of the support vector machine; 3. and taking the segmentation result as an observation value to carry out Kalman tracking on the detection result. All algorithms of the present invention are implemented in Nvidia Jetson TX2 embedded computer, engida.
1. Pedestrian candidate area acquisition by using fast local dual-threshold and local sliding window technology
The candidate region generation method first obtains low-precision candidate regions with a fast local dual-threshold algorithm specialized for far-infrared pedestrian segmentation, and then, using the top-left corner coordinates of all the low-precision candidate regions, obtains the final far-infrared pedestrian candidate regions with a local sliding window technique. Through these two main steps, pedestrian candidate regions are acquired with the fast local dual-threshold and local sliding window technique. The candidate region generation stage therefore comprises two steps. First: execute the fast local dual-threshold algorithm on the original infrared image to obtain low-precision candidate regions. Second: acquire the infrared pedestrian candidate regions with the local sliding window technique, using the top-left corner coordinates of the low-precision candidate regions.
1.1 Executing the fast local dual-threshold segmentation algorithm on the original infrared image to obtain low-precision candidate regions
The fast local dual-threshold segmentation algorithm exploits the fact that pixels inside an infrared pedestrian are brighter than the average of the surrounding pixels on the same horizontal line, and thereby segments the infrared pedestrians. The specific execution steps are: take the original infrared image as input and segment it to obtain a binary image; each 4-connected region of the binary image is a low-precision candidate region. The specific steps for performing the image segmentation are as follows: for each pixel of the image (except the 12 leftmost and 12 rightmost pixels of each row), two segmentation thresholds are computed dynamically according to equations (1) and (2): equation (1) gives the low threshold T_L and equation (2) gives the high threshold T_H. If the pixel value of the current pixel is below T_L, the pixel is labeled background; if above T_H, the pixel is labeled foreground; if the pixel value lies in [T_L, T_H], the label is decided by the segmentation result of the pixel to its left: when the left neighbour is foreground the current pixel is also labeled foreground, otherwise it is labeled background.
T_L(i, j) = (1/(2L)) × Σ_{k=1}^{L} [ I(i, j−k) + I(i, j+k) ]  (1)
T_H(i, j) = T_L(i, j) + θ  (2)
where T_L(i, j) is the low threshold of the current pixel (i, j), T_H(i, j) is the high threshold of the current pixel (i, j), I(i, j) is the gray value at (i, j), L is the half-width of the window along the same horizontal line (L = 12, i.e. 24 neighbouring pixels), and θ has a value of 8.
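The row-wise dual-threshold rule of equations (1) and (2), together with the 4-connected region labeling that turns the binary mask into preliminary boxes, can be sketched as follows. This is a minimal NumPy sketch, not the patent's implementation: the function names, the blob-to-bounding-box conversion, and the interpretation of T_L as the mean of the 2L horizontal neighbours are assumptions consistent with the surrounding description.

```python
import numpy as np
from collections import deque

def local_dual_threshold(img, L=12, theta=8):
    """Per-pixel dual-threshold segmentation of a grayscale far-infrared image.
    T_L is the mean of the L pixels left and L pixels right of the current
    pixel on the same row (cf. equation (1)); T_H = T_L + theta (equation (2))."""
    img = img.astype(np.float32)
    h, w = img.shape
    mask = np.zeros((h, w), dtype=np.uint8)
    for i in range(h):
        row = img[i]
        for j in range(L, w - L):              # skip the 12 leftmost/rightmost pixels
            t_low = (row[j - L:j].sum() + row[j + 1:j + L + 1].sum()) / (2 * L)
            t_high = t_low + theta
            if row[j] > t_high:
                mask[i, j] = 1                 # foreground
            elif row[j] >= t_low:
                mask[i, j] = mask[i, j - 1]    # inherit the left neighbour's label
            # below t_low: stays background (0)
    return mask

def connected_regions(mask):
    """Bounding boxes (x, y, w, h) of 4-connected foreground regions (BFS flood fill)."""
    h, w = mask.shape
    seen = np.zeros_like(mask, dtype=bool)
    boxes = []
    for i in range(h):
        for j in range(w):
            if mask[i, j] and not seen[i, j]:
                q = deque([(i, j)]); seen[i, j] = True
                y0, y1, x0, x1 = i, i, j, j
                while q:
                    y, x = q.popleft()
                    y0, y1 = min(y0, y), max(y1, y)
                    x0, x1 = min(x0, x), max(x1, x)
                    for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                        if 0 <= ny < h and 0 <= nx < w and mask[ny, nx] and not seen[ny, nx]:
                            seen[ny, nx] = True
                            q.append((ny, nx))
                boxes.append((x0, y0, x1 - x0 + 1, y1 - y0 + 1))
    return boxes
```

On a uniform background, warm pixels well above their horizontal neighbourhood mean are marked foreground and grouped into one box per blob.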
1.2 Acquiring the infrared pedestrian candidate regions with the local sliding window technique, using the top-left corner coordinates of the low-precision candidate regions
The candidate regions obtained by the fast local dual-threshold segmentation algorithm are preliminary, low-precision candidate regions. On this basis, the invention runs local sliding windows over them to obtain the final candidate regions, making up for the defect that the current fast local dual-threshold segmentation algorithm cannot obtain all pedestrian candidate regions in varied scenes. Specifically, for each rectangular frame obtained, its top-left corner coordinate is taken as the top-left corner of the sliding window, and local windows of 10×20, 24×48, 32×64, and 48×96 pixels are slid to obtain the final infrared pedestrian candidate regions, in preparation for subsequent candidate-region feature extraction.
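The expansion of each preliminary box into fixed-size local windows can be sketched as below. The four window sizes come from the text; the sliding stride and the ±stride local search range around the top-left corner are illustrative assumptions, since the patent does not state them here.

```python
def local_sliding_windows(boxes, img_w, img_h,
                          sizes=((10, 20), (24, 48), (32, 64), (48, 96)),
                          stride=4):
    """Expand each preliminary box (x, y, w, h) into candidate windows of the
    four fixed pedestrian sizes, sliding locally around its top-left corner.
    stride and the +/-stride search range are illustrative assumptions."""
    candidates = []
    for (x, y, _, _) in boxes:
        for (w, h) in sizes:
            for dx in range(-stride, stride + 1, stride):
                for dy in range(-stride, stride + 1, stride):
                    nx, ny = x + dx, y + dy
                    # keep only windows fully inside the image
                    if nx >= 0 and ny >= 0 and nx + w <= img_w and ny + h <= img_h:
                        candidates.append((nx, ny, w, h))
    return candidates
```

Each surviving window is then passed on to the classification stage as one candidate region.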
2. Joint classification of candidate regions with the deep learning dual classifier based on support-vector-machine-learned weights
The deep learning dual classifier with support-vector-machine-learned weights comprises two parts: (1) training-sample preparation and offline training of the dual classifiers; (2) support-vector-machine learning of the decision weights and joint online detection of the dual classifiers.
2.1 Training sample preparation and dual-classifier offline training
1) Training sample preparation
Data from expressway, national-road, urban, and suburban scenes were collected with a vehicle-mounted far-infrared camera, yielding some 300 hours of video, from which pictures were obtained by random sampling. In total 1 million original infrared images were obtained and all pedestrians appearing in them were manually labeled; the positive samples from 500,000 of the labeled images constitute dataset Dataset1, and the positive samples from the other 500,000 labeled images constitute dataset Dataset2. From 100,000 far-infrared images without pedestrians, non-pedestrian samples were obtained by the candidate region acquisition method of step one of this patent, i.e. with the fast dual-threshold segmentation algorithm and the local sliding window technique, forming the non-pedestrian dataset Dataset3. All pedestrian pictures in Dataset1 together with all non-pedestrian pictures of Dataset3 form dataset Dataset4; all pedestrian pictures in Dataset2 together with all non-pedestrian pictures of Dataset3 form dataset Dataset5.
2) Dual classifier offline training
The dual classifiers of this patent are the Alexnet deep convolutional neural network and the VGGnet deep convolutional neural network. Starting from Alexnet and VGGnet networks pre-trained on the ImageNet dataset, each was fine-tuned on Dataset4. The hyper-parameters are set as follows: (1) the optimizer is the adaptive optimization algorithm Adam; (2) the learning rate is 0.01; (3) the batch size is 32; (4) the images are single-channel grayscale; (5) dropout is not used; (6) data augmentation of the original pictures comprises translation and left-right flipping; (7) input images are scaled to 224 × 224 with a bilinear interpolation algorithm. The VGGnet of the invention is the VGG19 network; the specific network structure is shown in Table 1.
TABLE 1 VGGnet network Structure (VGG19-net)
Wherein "conv" represents convolution operation, "relu" represents linear rectification function as activation function, "fc" is full-connection operation, "prob" is classification layer of function with softmax as classifier.
Table 2 Alexnet network architecture diagram.
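The preprocessing of hyper-parameter items (6) and (7) — translation and left-right-flip augmentation plus bilinear rescaling to 224 x 224 — can be sketched as follows; the translation offsets dx and dy are illustrative assumptions.

```python
import numpy as np

def bilinear_resize(img, out_h=224, out_w=224):
    """Scale a single-channel image to out_h x out_w by bilinear
    interpolation (hyper-parameter item (7))."""
    h, w = img.shape
    ys = np.linspace(0, h - 1, out_h)
    xs = np.linspace(0, w - 1, out_w)
    y0 = np.floor(ys).astype(int)
    x0 = np.floor(xs).astype(int)
    y1 = np.minimum(y0 + 1, h - 1)
    x1 = np.minimum(x0 + 1, w - 1)
    wy = (ys - y0)[:, None]   # vertical weights, shape (out_h, 1)
    wx = (xs - x0)[None, :]   # horizontal weights, shape (1, out_w)
    top = img[np.ix_(y0, x0)] * (1 - wx) + img[np.ix_(y0, x1)] * wx
    bot = img[np.ix_(y1, x0)] * (1 - wx) + img[np.ix_(y1, x1)] * wx
    return top * (1 - wy) + bot * wy

def augment(img, dx=2, dy=2):
    """Augmentation of item (6): the original, a translated copy
    (offsets dx/dy are assumed values) and a left-right flipped copy."""
    shifted = np.roll(np.roll(img, dy, axis=0), dx, axis=1)
    flipped = img[:, ::-1]
    return [img, shifted, flipped]
```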
2.2 Support vector machine learning of decision weights for the dual classifiers, and dual-classifier joint online detection
The dual classifiers Alexnet and VGGnet jointly classify all candidate regions, and the results of the two classifiers are fused by weighting. The specific weights are obtained by training a support vector machine. More specifically, any sample S of Dataset5 is classified with the trained Alexnet classifier, whose output score is denoted Score1, and with the trained VGGnet classifier, whose output score is denoted Score2. The pair (Score1, Score2) forms a new feature representing sample S and, together with the original label of S, is used to train a linear support vector machine classifier, which yields the decision weights w1 and w2 and the bias b for the joint classification of the dual classifiers Alexnet and VGGnet. The joint classification of a candidate region is completed according to equation (3).
Score = w1 × Score1 + w2 × Score2 + b (3)
where Score is the final output of the dual-classifier joint classification: when Score is greater than 0 the joint classification result is pedestrian, otherwise it is non-pedestrian.
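The weight-learning step and the decision rule of equation (3) can be sketched as follows. The score distributions here are synthetic stand-ins for the actual Dataset5 classifier outputs.

```python
import numpy as np
from sklearn.svm import LinearSVC

rng = np.random.default_rng(0)
# Hypothetical (Score1, Score2) pairs for 200 samples: pedestrians
# (label 1) tend to score high under both networks, non-pedestrians low.
ped = rng.normal(loc=[2.0, 2.0], scale=0.5, size=(100, 2))
non = rng.normal(loc=[-2.0, -2.0], scale=0.5, size=(100, 2))
X = np.vstack([ped, non])            # columns: (Score1, Score2)
y = np.array([1] * 100 + [0] * 100)  # original labels

# Linear SVM on the two-dimensional score feature yields the
# decision weights w1, w2 and bias b of equation (3).
svm = LinearSVC(C=1.0, max_iter=10000).fit(X, y)
w1, w2 = svm.coef_[0]
b = svm.intercept_[0]

def joint_classify(score1, score2):
    """Equation (3): pedestrian iff w1*Score1 + w2*Score2 + b > 0."""
    return w1 * score1 + w2 * score2 + b > 0
```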
3. Taking the segmentation result as an observation value to carry out Kalman tracking on the detection result
The Kalman tracking algorithm corrects the prediction of the state variable with observation data to obtain an optimal estimate of the state variable. When it is used for multi-target pedestrian tracking, it directly gives the positions where each pedestrian may appear in the next frame; by similarity matching between the pedestrian target of the previous frame and the image at the predicted position, the detection position of the pedestrian in the next frame can be located, compensating for possible missed detections. Since the Kalman observation strongly influences the accuracy of the tracker, and the local dual-threshold segmentation algorithm generally yields an accurate segmentation result, this method supplies the segmentation result as the observation of the conventional Kalman algorithm so as to obtain a more accurate Kalman prediction. Specifically, the center position of the pedestrian target and the height and width of the detection frame that pass multi-frame verification (a candidate region is detected as a pedestrian target in three consecutive frames) are tracked, so the state vector of a pedestrian is expressed as formula (4).
Xt = (xt, yt, ht, wt, Δxt, Δyt, Δht, Δwt)T (4)
where (xt, yt) are the coordinates of the center of the pedestrian detection frame in the t-th frame, and (ht, wt) are the height and width of the pedestrian detection frame of the t-th frame; (Δxt, Δyt) is the change of the center point of the detection frame and (Δht, Δwt) the change of its height and width. Because the frame rate of the video is 25 frames per second, the motion of the rectangular frames of a pedestrian in two adjacent frames can be regarded as uniform motion; the Kalman state transition matrix Ω is expressed as formula (5), and the system measurement matrix H as formula (6).
Ω = [[I4, I4], [04, I4]] (5)

H = [I4, 04] (6)

where I4 denotes the 4 × 4 identity matrix and 04 the 4 × 4 zero matrix.
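A minimal sketch of one predict/update cycle of this constant-velocity model, with the segmentation box supplying the observation, might look as follows; the noise covariances Q and R are assumed values, not taken from the patent.

```python
import numpy as np

I4, Z4 = np.eye(4), np.zeros((4, 4))
Omega = np.block([[I4, I4], [Z4, I4]])  # constant-velocity transition
H = np.block([I4, Z4])                  # measurement matrix

def kalman_step(x, P, z, Q=None, R=None):
    """One predict/update cycle on the 8-d pedestrian state of formula (4).

    z is the observation (cx, cy, h, w) taken from the fast dual-threshold
    segmentation result; Q and R are assumed noise covariances."""
    Q = np.eye(8) * 1e-2 if Q is None else Q
    R = np.eye(4) * 1e-1 if R is None else R
    # Predict the next state from the uniform-motion model.
    x_pred = Omega @ x
    P_pred = Omega @ P @ Omega.T + Q
    # Update with the segmentation observation.
    S = H @ P_pred @ H.T + R
    K = P_pred @ H.T @ np.linalg.inv(S)
    x_new = x_pred + K @ (z - H @ x_pred)
    P_new = (np.eye(8) - K @ H) @ P_pred
    return x_new, P_new
```

When no matching segmentation box is found (see the nearest-neighbor criterion below in the text), the caller can pass z = H @ (Omega @ x), i.e. the prediction itself, as the observation.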
This invention uses the fast dual-threshold segmentation result as the observation of the conventional Kalman algorithm and, in order to find the observation corresponding to each detection result, matches them with the nearest-neighbor criterion of formula (7). When the Kalman tracker cannot be matched according to formula (7), the Kalman prediction is taken directly as the observation to complete the update of the Kalman tracker.
|x1 - x2| < T1 && |y1 - y2| < T2 && |w1 - w2| < T1 && |h1 - h2| < T2 (7)
where w1, h1 are the width and height of a detection frame rectangle whose center point is (x1, y1); w2, h2 are the width and height of a fast local dual-threshold segmentation rectangle whose center point is (x2, y2); and T1 and T2 (both taking the value 7) are the nearest-neighbor distance thresholds in the horizontal and vertical directions, respectively.
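The nearest-neighbor criterion of formula (7) can be sketched as follows; boxes are given as (cx, cy, w, h) tuples, and returning None signals the caller to fall back on the Kalman prediction as the observation.

```python
def match_observation(det, segs, T1=7, T2=7):
    """Find the segmentation rectangle matching a detection, formula (7).

    det and each element of segs are (cx, cy, w, h); T1/T2 are the
    horizontal/vertical nearest-neighbor thresholds (7 pixels in the text).
    Returns the first matching segmentation box, or None if no box
    satisfies the criterion."""
    x1, y1, w1, h1 = det
    for seg in segs:
        x2, y2, w2, h2 = seg
        if (abs(x1 - x2) < T1 and abs(y1 - y2) < T2
                and abs(w1 - w2) < T1 and abs(h1 - h2) < T2):
            return seg
    return None
```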

Claims (4)

1. An embedded far-infrared pedestrian detection method based on image processing and deep learning, characterized in that a fast local dual-threshold and local sliding window technique is used to obtain pedestrian candidate regions, a deep-learning dual classifier whose weights are learned by a support vector machine then jointly classifies the candidate regions, and the segmentation result is used as the observation for Kalman tracking of the detection result to complete pedestrian detection, specifically comprising the following steps:
acquiring a pedestrian candidate area by using a rapid local double threshold and local sliding window technology;
step two, performing combined classification on the candidate regions by adopting a deep learning double classifier based on the learning weight of the support vector machine;
and step three, taking the segmentation result as an observation value to perform Kalman tracking on the detection result.
2. The embedded far-infrared pedestrian detection method based on image processing and deep learning according to claim 1, wherein using the fast local dual-threshold and local sliding window technique in step one means that, after the fast local dual-threshold algorithm obtains preliminary candidate regions, a local sliding window is applied on the basis of the preliminary candidate regions to obtain the final candidate regions, remedying the defect that the fast local dual-threshold algorithm alone cannot obtain all pedestrian candidate regions in various scenes; the fast local dual-threshold algorithm means that for each pixel a high threshold and a low threshold are calculated from its 24 nearest neighboring pixels on the same horizontal line so as to realize image segmentation, and preliminary pedestrian candidate regions are obtained by a 4-connected region labeling algorithm; the local sliding window technique means that, taking the upper-left corner coordinate of each preliminarily obtained rectangular frame as the starting coordinate of the sliding window, local windows of sizes 10 × 20, 24 × 48, 32 × 64 and 48 × 96 pixels are slid to obtain the final infrared pedestrian candidate regions.
3. The embedded far-infrared pedestrian detection method based on image processing and deep learning according to claim 1, wherein the deep-learning dual-classifier joint classification in step two means that the candidate regions are classified by an Alexnet network and a VGGnet network combined by weights; the learning of weights by the support vector machine means that the weights assigned to the Alexnet network and the VGGnet network are obtained by support vector machine learning.
4. The embedded far-infrared pedestrian detection method based on image processing and deep learning according to claim 1, wherein the segmentation result used in step three is the result of the local adaptive dual-threshold segmentation of step one; taking the segmentation result as the observation for Kalman tracking of the detection result means that the observation required by the Kalman tracking algorithm is provided by the segmentation result.
CN201910745838.6A 2019-08-13 2019-08-13 Image processing and deep learning embedded far infrared pedestrian detection method Active CN110837769B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910745838.6A CN110837769B (en) 2019-08-13 2019-08-13 Image processing and deep learning embedded far infrared pedestrian detection method

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910745838.6A CN110837769B (en) 2019-08-13 2019-08-13 Image processing and deep learning embedded far infrared pedestrian detection method

Publications (2)

Publication Number Publication Date
CN110837769A true CN110837769A (en) 2020-02-25
CN110837769B CN110837769B (en) 2023-08-29

Family

ID=69573984

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910745838.6A Active CN110837769B (en) 2019-08-13 2019-08-13 Image processing and deep learning embedded far infrared pedestrian detection method

Country Status (1)

Country Link
CN (1) CN110837769B (en)


Citations (12)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20060177097A1 (en) * 2002-06-14 2006-08-10 Kikuo Fujimura Pedestrian detection and tracking with night vision
CN103198332A (en) * 2012-12-14 2013-07-10 华南理工大学 Real-time robust far infrared vehicle-mounted pedestrian detection method
CN104091171A (en) * 2014-07-04 2014-10-08 华南理工大学 Vehicle-mounted far infrared pedestrian detection system and method based on local features
US20150161796A1 (en) * 2013-12-09 2015-06-11 Hyundai Motor Company Method and device for recognizing pedestrian and vehicle supporting the same
CN106096561A (en) * 2016-06-16 2016-11-09 重庆邮电大学 Infrared pedestrian detection method based on image block degree of depth learning characteristic
CN106156401A (en) * 2016-06-07 2016-11-23 西北工业大学 Data-driven system state model on-line identification methods based on many assembled classifiers
CN107563347A (en) * 2017-09-20 2018-01-09 南京行者易智能交通科技有限公司 A kind of passenger flow counting method and apparatus based on TOF camera
US20180068083A1 (en) * 2014-12-08 2018-03-08 20/20 Gene Systems, Inc. Methods and machine learning systems for predicting the likelihood or risk of having cancer
KR101869442B1 (en) * 2017-11-22 2018-06-20 공주대학교 산학협력단 Fire detecting apparatus and the method thereof
CN108460336A (en) * 2018-01-29 2018-08-28 南京邮电大学 A kind of pedestrian detection method based on deep learning
US10108867B1 (en) * 2017-04-25 2018-10-23 Uber Technologies, Inc. Image-based pedestrian detection
CN109886245A (en) * 2019-03-02 2019-06-14 山东大学 A kind of pedestrian detection recognition methods based on deep learning cascade neural network


Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111626334A (en) * 2020-04-28 2020-09-04 东风汽车集团有限公司 Key control target selection method of vehicle-mounted advanced auxiliary driving system
CN111814755A (en) * 2020-08-18 2020-10-23 深延科技(北京)有限公司 Multi-frame image pedestrian detection method and device for night motion scene
CN114255373A (en) * 2021-12-27 2022-03-29 中国电信股份有限公司 Sequence anomaly detection method and device, electronic equipment and readable medium
CN114255373B (en) * 2021-12-27 2024-02-02 中国电信股份有限公司 Sequence anomaly detection method, device, electronic equipment and readable medium

Also Published As

Publication number Publication date
CN110837769B (en) 2023-08-29

Similar Documents

Publication Publication Date Title
CN110175576B (en) Driving vehicle visual detection method combining laser point cloud data
WO2019196130A1 (en) Classifier training method and device for vehicle-mounted thermal imaging pedestrian detection
WO2019196131A1 (en) Method and apparatus for filtering regions of interest for vehicle-mounted thermal imaging pedestrian detection
CN110837769B (en) Image processing and deep learning embedded far infrared pedestrian detection method
US9626599B2 (en) Reconfigurable clear path detection system
CN106919902B (en) Vehicle identification and track tracking method based on CNN
CN105989334B (en) Road detection method based on monocular vision
CN106023257A (en) Target tracking method based on rotor UAV platform
CN111340855A (en) Road moving target detection method based on track prediction
CN115311241B (en) Underground coal mine pedestrian detection method based on image fusion and feature enhancement
CN116434159A (en) Traffic flow statistics method based on improved YOLO V7 and Deep-Sort
CN111915583A (en) Vehicle and pedestrian detection method based on vehicle-mounted thermal infrared imager in complex scene
CN115601717B (en) Deep learning-based traffic offence behavior classification detection method and SoC chip
He et al. A novel multi-source vehicle detection algorithm based on deep learning
Tarchoun et al. Hand-Crafted Features vs Deep Learning for Pedestrian Detection in Moving Camera.
CN113538585B (en) High-precision multi-target intelligent identification, positioning and tracking method and system based on unmanned aerial vehicle
CN109934096B (en) Automatic driving visual perception optimization method based on characteristic time sequence correlation
CN114743126A (en) Lane line sign segmentation method based on graph attention machine mechanism network
CN112347967B (en) Pedestrian detection method fusing motion information in complex scene
CN117036412A (en) Twin network infrared pedestrian target tracking method integrating deformable convolution
CN116934820A (en) Cross-attention-based multi-size window Transformer network cloth image registration method and system
Phu et al. Traffic sign recognition system using feature points
Zhang et al. IQ-STAN: Image quality guided spatio-temporal attention network for license plate recognition
CN115457420A (en) Low-contrast vehicle weight detection method based on unmanned aerial vehicle shooting at night
Wang et al. Vehicle recognition based on saliency detection and color histogram

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
TA01 Transfer of patent application right

Effective date of registration: 20230803

Address after: One of the fourth floors, No. 107, Jucheng Avenue East, Xiaolan Town, Zhongshan City, Guangdong Province, 528400

Applicant after: Zhongshan sanzhuo Intelligent Technology Co.,Ltd.

Address before: Unit C403A, No. 205 Changfu Road, Tianhe District, Guangzhou City, Guangdong Province, 510000 (for office use only) (not intended for use as a factory building)

Applicant before: Guangzhou Sanmu Intelligent Technology Co.,Ltd.

GR01 Patent grant