CN107563284B - Pedestrian tracking method and device

Info

Publication number: CN107563284B
Application number: CN201710620228.4A
Authority: CN
Inventors: 高万林; 陈治昌; 于丽娜; 仲贞; 张国锋; 于丽敏
Original assignee: China Agricultural University
Current assignee: China Agricultural University
Priority date: 2017-07-26
Filing date: 2017-07-26
Publication date: 2020-10-30
Anticipated expiration: 2037-07-26
Also published as: CN107563284A

Abstract

The embodiment of the invention provides a pedestrian tracking method and device. The method comprises the following steps: determining a candidate region in a current frame image according to a region that includes pedestrian image features in the previous frame image corresponding to the current frame image, and acquiring a candidate region image corresponding to the candidate region; acquiring a depth channel feature map corresponding to the candidate region image, and mapping each sub-region image in the candidate region image to the depth channel feature map to obtain a sub-feature map corresponding to each sub-region image; and obtaining a prediction result corresponding to each sub-feature map according to a preset algorithm, and determining the position of the pedestrian in the candidate region image according to the prediction result. The pedestrian tracking method provided by the embodiment of the invention can improve the accuracy of pedestrian tracking.

Description

Pedestrian tracking method and device

Technical Field

The embodiment of the invention relates to the technical field of target tracking, in particular to a pedestrian tracking method and device.

Background

In recent years, artificial intelligence led by machine learning has developed rapidly and achieved major breakthroughs in many fields. Computer vision is an important and difficult research area within artificial intelligence; target detection is one of its research focuses, and pedestrian tracking is among its most active research directions. Pedestrian tracking estimates the position and motion parameters of a pedestrian in an image sequence, given the pedestrian's initial position in the first frame. Pedestrian tracking is widely used in practice, for example in video surveillance, intelligent robotics, driver assistance and autonomous driving, where it is an indispensable technology. As society becomes increasingly intelligent, pedestrian tracking will play an ever larger role.

However, human postures vary, clothing is diverse, images are easily affected by illumination and the environment, and parts of the human body may be occluded; under these conditions the pedestrian tracking result becomes inaccurate. How to improve the accuracy and robustness of pedestrian tracking is therefore a problem that urgently needs to be solved.

Disclosure of Invention

To solve the problems in the prior art, embodiments of the present invention provide a pedestrian tracking method and apparatus.

In a first aspect, an embodiment of the present invention provides a pedestrian tracking method, where the method includes:

determining a candidate region in a current frame image according to a region that includes pedestrian image features in the previous frame image corresponding to the current frame image, and acquiring a candidate region image corresponding to the candidate region;

acquiring a depth channel feature map corresponding to the candidate region image, and mapping each sub-region image in the candidate region image to the depth channel feature map to obtain a sub-feature map corresponding to each sub-region image;

and obtaining a prediction result corresponding to each sub-feature map according to a preset algorithm, and determining the position of the pedestrian in the candidate region image according to the prediction result.

In a second aspect, an embodiment of the present invention provides a pedestrian tracking device, including:

a first obtaining module, configured to determine a candidate region in a current frame image according to a region that includes pedestrian image features in the previous frame image corresponding to the current frame image, and to obtain a candidate region image corresponding to the candidate region;

a second obtaining module, configured to obtain a depth channel feature map corresponding to the candidate region image, and map each sub-region image in the candidate region image to the depth channel feature map to obtain a sub-feature map corresponding to each sub-region image;

and a positioning module, configured to obtain a prediction result corresponding to each sub-feature map according to a preset algorithm, and to determine the position of the pedestrian in the candidate region image according to the prediction result.

In a third aspect, an embodiment of the present invention provides a pedestrian tracking device, where the device includes a memory and a processor that communicate with each other through a bus; the memory stores program instructions executable by the processor, and the processor invokes the program instructions to perform the pedestrian tracking method described above.

In a fourth aspect, an embodiment of the present invention provides a computer-readable storage medium, on which a computer program is stored, and the computer program, when executed by a processor, implements the pedestrian tracking method described above.

According to the pedestrian tracking method and device provided by the embodiment of the invention, the candidate region is determined, the depth channel feature map of the candidate region image corresponding to the candidate region is obtained, and the depth channel features are combined with a preset algorithm to determine the position of the pedestrian in the candidate region image, so that the accuracy of pedestrian tracking can be improved.

Drawings

In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art will be briefly described below, and it is obvious that the drawings in the following description are some embodiments of the present invention, and those skilled in the art can also obtain other drawings according to the drawings without creative efforts.

Fig. 1 is a flowchart of a pedestrian tracking method according to an embodiment of the present invention;

Fig. 2 is a schematic structural diagram of a pedestrian tracking device according to an embodiment of the present invention;

Fig. 3 is a schematic structural diagram of a pedestrian tracking device according to an embodiment of the present invention.

Detailed Description

In order to make the objects, technical solutions and advantages of the embodiments of the present invention clearer, the technical solutions in the embodiments of the present invention will be clearly described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are some embodiments, but not all embodiments, of the present invention. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.

Fig. 1 is a flowchart of a pedestrian tracking method according to an embodiment of the present invention. As shown in Fig. 1, the method includes:

step 11, determining a candidate region in a current frame image according to a region, including a pedestrian image feature, in a previous frame image corresponding to the current frame image, and acquiring a candidate region image corresponding to the candidate region;

step 12, obtaining a depth channel feature map corresponding to the candidate region image, and mapping each sub-region image in the candidate region image to the depth channel feature map to obtain a sub-feature map corresponding to each sub-region image;

and step 13, obtaining a prediction result corresponding to each sub-feature map according to a preset algorithm, and determining the position of the pedestrian in the candidate region image according to the prediction result.

Specifically, the previous frame image is first acquired from the video data. Because video data consists of a set of images that are correlated in time, identifying the previous frame image makes it possible to track the pedestrian in the current frame image. In the embodiment of the present invention, the region of the previous frame image that includes the pedestrian image features may be marked, for example with a rectangular frame, so as to determine the state of the pedestrian.

Because of the spatio-temporal correlation between frames in video data, a pedestrian appearing in the previous frame image will appear in a nearby area in the subsequent frame. This property can be used to obtain the candidate region in the current frame image, that is, the region in which the pedestrian is likely to appear in the current frame image: a region near the region that includes the pedestrian image features in the previous frame image is taken as the candidate region in the current frame image. The corresponding candidate region image is then acquired according to the candidate region. Since the candidate region is usually much smaller than the whole image, the computational complexity is reduced and the calculation is faster.
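As an illustration of how a candidate region might be derived from the previous frame's rectangular frame, here is a minimal Python sketch. The enlargement factor `scale`, the `(x, y, w, h)` box layout and the function names are assumptions made for this example; the description only states that a region near the previous pedestrian region is used.

```python
def candidate_region(prev_box, image_size, scale=2.0):
    """Enlarge the previous frame's pedestrian box into a search region.

    prev_box   -- (x, y, w, h) of the pedestrian in the previous frame
    image_size -- (width, height) of the current frame
    scale      -- hypothetical enlargement factor (not given in the source)
    """
    x, y, w, h = prev_box
    img_w, img_h = image_size
    cx, cy = x + w / 2.0, y + h / 2.0          # centre of the previous box
    new_w, new_h = w * scale, h * scale
    # Clip the enlarged box to the image borders.
    x0 = max(0, int(round(cx - new_w / 2.0)))
    y0 = max(0, int(round(cy - new_h / 2.0)))
    x1 = min(img_w, int(round(cx + new_w / 2.0)))
    y1 = min(img_h, int(round(cy + new_h / 2.0)))
    return x0, y0, x1 - x0, y1 - y0


def crop(image, box):
    """Cut the candidate region image out of an image stored as a list of rows."""
    x, y, w, h = box
    return [row[x:x + w] for row in image[y:y + h]]
```

Because only the cropped region is processed further, later stages work on far fewer pixels than the full frame.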

Then, the candidate region image is input into a convolutional neural network to obtain the depth channel feature map corresponding to the candidate region image. The depth channel feature map produced by the convolutional neural network preserves the spatial characteristics of each position in the candidate region image, so the spatial coordinates of the candidate region image can be converted by a mapping method into the corresponding spatial coordinates in the depth channel feature map. In this way, each sub-region image in the candidate region image can be mapped to the depth channel feature map, and the sub-feature map corresponding to each sub-region image is obtained.
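The coordinate mapping can be sketched as follows, assuming the feature map is a uniformly downsampled version of the candidate region image. The default factor of 8 matches the 1/8 size ratio that the description gives for its feature maps; the rounding policy and function names are assumptions for this example.

```python
def map_to_feature_map(sub_box, stride=8):
    """Map an (x, y, w, h) sub-region in image pixels to feature-map cells."""
    x, y, w, h = sub_box
    fx0 = x // stride                 # floor: first cell touched by the box
    fy0 = y // stride
    fx1 = -(-(x + w) // stride)       # ceiling division: one past the last cell
    fy1 = -(-(y + h) // stride)
    return fx0, fy0, max(1, fx1 - fx0), max(1, fy1 - fy0)


def sub_feature_map(feature_map, sub_box, stride=8):
    """Slice the cells for one sub-region out of a [channel][row][col] map."""
    fx, fy, fw, fh = map_to_feature_map(sub_box, stride)
    return [[row[fx:fx + fw] for row in channel[fy:fy + fh]]
            for channel in feature_map]
```

With this approach the network runs once per candidate region, and every sub-region reuses the same feature map instead of being passed through the network separately.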

Next, a prediction result corresponding to each sub-feature map is obtained according to a preset algorithm. Specifically, all sub-feature maps are input into an online boosting method, which yields a prediction result for each sub-feature map. All prediction results are then screened to obtain a target prediction result, whose corresponding position is where the pedestrian is most likely to appear. The position information obtained from the target prediction result is the position of the pedestrian in the candidate region image.

According to the pedestrian tracking method provided by the embodiment of the invention, the candidate region is determined, the depth channel feature map of the candidate region image corresponding to the candidate region is obtained, and the depth channel features are combined with the preset algorithm to determine the position of the pedestrian in the candidate region image, so that the accuracy of pedestrian tracking can be improved.

Optionally, on the basis of the above embodiment, the obtaining of the depth channel feature map corresponding to the candidate region image specifically includes:

calculating a first depth feature map of the candidate region image;

extracting channel features of the candidate region images, and calculating a second depth feature map of the channel features;

and generating a depth channel feature map of the candidate region image according to the first depth feature map and the second depth feature map.

Specifically, the obtaining of the depth channel feature map corresponding to the candidate region image mentioned in the above embodiment specifically includes:

Firstly, a first depth feature map of the candidate region image is calculated using a deep convolutional neural network. The embodiment of the invention adopts a four-layer deep convolutional neural network structure: the first layer and the second layer each comprise two convolutional layers and a max pooling layer, the third layer comprises three convolutional layers and a max pooling layer, and the fourth layer comprises only three convolutional layers, so the first depth feature map of the candidate region image is output directly after the fourth layer's convolutions. Each convolutional layer has a kernel size of 3 and a stride of 1, and each max pooling layer has a window size of 2 and a stride of 2. The numbers of output channels of the first, second, third and fourth layers are 64, 128, 256 and 512 respectively, so the final output is a 512-channel first depth feature map whose size is 1/8 of the candidate region image.
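The 1/8 size ratio can be checked by tracing the feature-map shape through the four layers. The sketch below assumes a padding of 1 on every 3x3 convolution so that convolutions preserve spatial size; the description does not state the padding, but without it the output would not be exactly 1/8 of the input. The names `FIRST_NETWORK` and `trace_shape` are hypothetical.

```python
# One tuple per layer group: (number of 3x3 convolutions, output channels,
# whether a 2x2 max pooling with stride 2 follows).
FIRST_NETWORK = [(2, 64, True), (2, 128, True), (3, 256, True), (3, 512, False)]


def trace_shape(height, width, stages, conv_padding=1):
    """Return (channels, height, width) after passing through all stages."""
    channels = None
    for n_convs, out_channels, has_pool in stages:
        for _ in range(n_convs):
            # 3x3 kernel, stride 1: out = in + 2 * padding - 3 + 1
            height = height + 2 * conv_padding - 3 + 1
            width = width + 2 * conv_padding - 3 + 1
        channels = out_channels
        if has_pool:
            # 2x2 window, stride 2: out = (in - 2) // 2 + 1
            height = (height - 2) // 2 + 1
            width = (width - 2) // 2 + 1
    return channels, height, width
```

For a 128x64 candidate region image, `trace_shape(128, 64, FIRST_NETWORK)` gives 512 channels at 16x8, since only the three pooling layers reduce the spatial size.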

Secondly, the channel features of the candidate region image are extracted and a second depth feature map of the channel features is calculated. Specifically, the input candidate region image is preprocessed: in the preprocessing stage a series of linear and nonlinear transformations is applied to the candidate region image, and each transformation yields a channel; LUV channels, gradient magnitude channels and gradient histogram channels, for example, are very effective channels. The channel features are then input into a three-layer convolutional neural network in which each layer comprises two convolutional layers and a max pooling layer; each convolutional layer has a kernel size of 3 and a stride of 1, and each max pooling layer has a window size of 2 and a stride of 2. The numbers of output channels of the first, second and third layers are 64, 128 and 256 respectively, so the final output is a 256-channel second depth feature map whose size is 1/8 of the channel features.
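To make the idea of a channel concrete, the following sketch computes one such channel, a gradient magnitude map, from a grayscale image stored as a list of rows. The use of unscaled central differences with replicated borders is an assumption for this example; the description does not specify how the gradient is computed.

```python
import math


def gradient_magnitude(gray):
    """Compute a gradient-magnitude channel from a 2-D grayscale image."""
    h, w = len(gray), len(gray[0])
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            # Differences between the right/left and lower/upper neighbours,
            # with border pixels replicated at the image edge.
            gx = gray[y][min(x + 1, w - 1)] - gray[y][max(x - 1, 0)]
            gy = gray[min(y + 1, h - 1)][x] - gray[max(y - 1, 0)][x]
            out[y][x] = math.hypot(gx, gy)
    return out
```

Other channels, such as the LUV colour channels or gradient histograms, would be computed from the same input and stacked with this one before being passed to the three-layer network.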

Finally, the depth feature map of the candidate region image (the first depth feature map) and the depth feature map of the channel features (the second depth feature map) are concatenated to obtain the depth channel feature map of the candidate region image.
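A minimal sketch of this combination step, assuming both feature maps are stored as `[channel][row][col]` nested lists and are joined along the channel axis. The function name and the size check are assumptions for this example.

```python
def concat_channels(first_map, second_map):
    """Stack two [channel][row][col] feature maps along the channel axis."""
    size_1 = (len(first_map[0]), len(first_map[0][0]))
    size_2 = (len(second_map[0]), len(second_map[0][0]))
    if size_1 != size_2:
        # Both maps are 1/8 of the same input, so their sizes should agree.
        raise ValueError("feature maps must have the same height and width")
    return first_map + second_map
```

With a 512-channel first depth feature map and a 256-channel second depth feature map, the result under this assumption has 768 channels at the same spatial size.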

According to the pedestrian tracking method provided by the embodiment of the invention, the first depth feature map of the candidate area image and the second depth feature map of the candidate area image channel feature are calculated, and the depth channel feature map of the candidate area image is generated according to the first depth feature map and the second depth feature map, so that the pedestrian tracking method is more scientific and reasonable.

Optionally, on the basis of the foregoing embodiments, the determining, according to the prediction result, the position of the pedestrian in the candidate region image specifically includes:

obtaining a first prediction result which is larger than a preset threshold value from the prediction results;

and obtaining a second prediction result according to a non-maximum value suppression method and the first prediction result, and determining the position information corresponding to the second prediction result as the position of the pedestrian in the candidate region image.

Specifically, the determining the position of the pedestrian in the candidate region image according to the prediction result mentioned in the above embodiment specifically includes:

First, a first prediction result larger than a preset threshold value is obtained from the prediction results. The sub-feature maps are input into an online boosting method, which yields a prediction result for each sub-feature map; the value of the prediction result represents the probability that the pedestrian appears in the sub-region corresponding to that sub-feature map. To improve the accuracy of the pedestrian tracking result, a threshold value is preset, and the first prediction results, those larger than the threshold, are screened out from all prediction results.

Then, in order to obtain a more accurate pedestrian position, the embodiment of the invention applies a non-maximum suppression method to the first prediction result to obtain a second prediction result, and determines the position information corresponding to the second prediction result as the position of the pedestrian in the candidate region image.
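The two screening steps, thresholding followed by non-maximum suppression, can be sketched as below. This is a common greedy formulation based on intersection over union (IoU); the overlap threshold, the default values and the `(box, score)` layout are assumptions for this example, not values given in the description.

```python
def iou(a, b):
    """Intersection over union of two (x, y, w, h) boxes."""
    ax, ay, aw, ah = a
    bx, by, bw, bh = b
    ix = max(0, min(ax + aw, bx + bw) - max(ax, bx))
    iy = max(0, min(ay + ah, by + bh) - max(ay, by))
    inter = ix * iy
    union = aw * ah + bw * bh - inter
    return inter / union if union > 0 else 0.0


def locate_pedestrian(predictions, score_threshold=0.5, iou_threshold=0.5):
    """predictions: list of (box, score); returns the surviving (box, score) pairs."""
    # Step 1: keep only the "first prediction results" above the threshold.
    first = [p for p in predictions if p[1] > score_threshold]
    first.sort(key=lambda p: p[1], reverse=True)
    # Step 2: greedy non-maximum suppression, highest score first.
    kept = []
    for box, score in first:
        if all(iou(box, k[0]) <= iou_threshold for k in kept):
            kept.append((box, score))
    return kept
```

The first element of the returned list is the highest-scoring surviving box; an empty list means no sub-region exceeded the score threshold.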

The online boosting tracking algorithm treats tracking as a classification problem between target and background. During tracking, it exploits its ability to adaptively select discriminative features, adapting the classifier as the target changes. The final online boosting tracker consists of N selectors $h_i(x)$ with weights $\alpha_i$, which are accumulated linearly to form a strong classifier $H(x)$:

$$H(x) = \sum_{i=1}^{N} \alpha_i h_i(x)$$

The strong classifier is used to detect the position of the target in the next frame. Each selector corresponds to a feature pool and is composed of M weak classifiers, each of which corresponds to one feature. When a new video frame arrives, each selector updates all of its weak classifiers and selects the one with the minimum error; the selected weak classifiers are accumulated linearly to form the strong classifier.
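The selector mechanism can be sketched in a simplified form. In the sketch below the weak classifiers are fixed callables returning +1 or -1 and only their running error estimates are updated, whereas a full online boosting tracker also updates the weak classifiers themselves. The class name, the count initialisation and the AdaBoost-style weight formula are assumptions for this example.

```python
import math


class Selector:
    """Holds M weak classifiers and tracks the one with the lowest error."""

    def __init__(self, weak_classifiers):
        self.weak = weak_classifiers                  # callables: x -> +1 or -1
        self.correct = [1.0] * len(weak_classifiers)  # smoothed hit counts
        self.wrong = [1.0] * len(weak_classifiers)    # smoothed miss counts
        self.best = 0
        self.alpha = 0.0

    def update(self, x, label):
        """Update error estimates with one labelled sample and reselect."""
        for m, h in enumerate(self.weak):
            if h(x) == label:
                self.correct[m] += 1.0
            else:
                self.wrong[m] += 1.0
        errors = [w / (c + w) for c, w in zip(self.correct, self.wrong)]
        self.best = min(range(len(errors)), key=errors.__getitem__)
        err = errors[self.best]   # strictly between 0 and 1 due to the initial counts
        self.alpha = 0.5 * math.log((1.0 - err) / err)

    def predict(self, x):
        return self.weak[self.best](x)


def strong_confidence(selectors, x):
    """Weighted linear combination of the selectors' chosen weak classifiers."""
    return sum(s.alpha * s.predict(x) for s in selectors)
```

A positive confidence indicates the target class and a negative one the background; how the confidence is converted into the probability-like prediction value is not specified in the description.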

After a prediction result is obtained, positive and negative samples are selected and fed into the classifier for training: some sub-feature maps scoring above the threshold and some scoring below it are put into the online boosting algorithm for training, and all weak classifiers are updated. The main idea of the online boosting target tracking algorithm is the introduction of selectors. Each selector is randomly initialized and is assigned a number of weak classifiers. When a new training sample arrives, the weak classifiers of each selector are updated. Each weak classifier classifies the sample, and the selector selects the weak classifier with the best classification performance.

If the online boosting algorithm yields no first prediction result exceeding the threshold for any of the sub-feature maps, the initial online boosting classifier, which has not been updated, is used to evaluate all sub-feature maps again. If any sub-feature map then exceeds the threshold, the previous process is repeated to obtain the optimal result and the weak classifiers are updated. If no sub-feature map exceeds the threshold, pedestrian tracking ends.
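The fallback to the initial classifier can be expressed as a small control-flow sketch. Both classifiers are modelled here as plain callables that return a score for one sub-feature map; this interface and the function name are assumptions for this example.

```python
def evaluate_with_fallback(sub_feature_maps, current_classifier,
                           initial_classifier, threshold):
    """Return (index, score) pairs above the threshold, or None to stop tracking."""
    # Try the updated classifier first, then the initial, never-updated one.
    for classifier in (current_classifier, initial_classifier):
        scored = [(i, classifier(f)) for i, f in enumerate(sub_feature_maps)]
        hits = [(i, s) for i, s in scored if s > threshold]
        if hits:
            return hits
    return None
```

Keeping the initial classifier unchanged gives the tracker a reference that is unaffected by drift from online updates, which is presumably why it is consulted before tracking is abandoned.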

According to the pedestrian tracking method provided by the embodiment of the invention, the first prediction result which is larger than the preset threshold value is obtained from the prediction results, the second prediction result is obtained according to the non-maximum suppression method and the first prediction result, and the position information corresponding to the second prediction result is determined as the position of the pedestrian in the candidate region image, so that the accuracy of the pedestrian tracking result can be improved.

Optionally, on the basis of the foregoing embodiments, the sub-region images are obtained through a sliding window in the candidate region image.

Specifically, a sliding window is moved over the candidate region to obtain each sub-region including the pedestrian image features, and the sub-region image corresponding to each sub-region is obtained according to that sub-region, so that the pedestrian tracking method provided by the embodiment of the invention is more scientific.
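A sliding window over the candidate region can be sketched as follows. The window size and step are hypothetical parameters; the description does not give values for them.

```python
def sliding_windows(region_size, window_size, step):
    """Enumerate (x, y, w, h) sub-regions inside a candidate region.

    region_size -- (width, height) of the candidate region image
    window_size -- (width, height) of the sliding window
    step        -- shift in pixels between neighbouring windows
    """
    region_w, region_h = region_size
    win_w, win_h = window_size
    boxes = []
    # Only positions where the window lies fully inside the region are used.
    for y in range(0, region_h - win_h + 1, step):
        for x in range(0, region_w - win_w + 1, step):
            boxes.append((x, y, win_w, win_h))
    return boxes
```

For example, a 64x128 candidate region scanned with a 32x64 window and a step of 16 yields 3 horizontal by 5 vertical positions, that is, 15 sub-regions.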

Optionally, on the basis of the foregoing embodiments, the method further includes:

and marking the position of the pedestrian in the candidate region image by using a preset symbol.

Specifically, after the position of the pedestrian in the candidate area image is determined, the position of the pedestrian in the candidate area image may be marked using a preset symbol, such as a rectangular bounding box, in such a way that the position of the pedestrian can be made more intuitive.

Fig. 2 is a schematic structural diagram of a pedestrian tracking device according to an embodiment of the present invention. As shown in Fig. 2, the device includes: a first obtaining module 21, a second obtaining module 22 and a positioning module 23, wherein:

the first obtaining module 21 is configured to determine a candidate region in a current frame image according to a region, which includes a pedestrian image feature, in a previous frame image corresponding to the current frame image, and obtain a candidate region image corresponding to the candidate region; the second obtaining module 22 is configured to obtain a depth channel feature map corresponding to the candidate region image, and map each sub-region image in the candidate region image into the depth channel feature map to obtain a sub-feature map corresponding to each sub-region image; the positioning module 23 is configured to obtain a prediction result corresponding to each sub-feature map according to a preset algorithm, and determine a position of the pedestrian in the candidate region image according to the prediction result.

Specifically, the first obtaining module 21 obtains the region including the pedestrian image features in the previous frame image corresponding to the current frame image, determines the candidate region of the pedestrian in the current frame image according to the spatio-temporal correlation between frames in the video data, and obtains the candidate region image corresponding to the candidate region. The second obtaining module 22 inputs the candidate region image obtained by the first obtaining module 21 into a convolutional neural network to obtain the depth channel feature map corresponding to the candidate region image, uses the spatial coordinates of the candidate region image to calculate the corresponding spatial coordinates in the depth channel feature map, and maps each sub-region image in the candidate region image onto the depth channel feature map to obtain the sub-feature map corresponding to each sub-region image. The positioning module 23 obtains the prediction result corresponding to each sub-feature map according to a preset algorithm, which specifically is: inputting all the sub-feature maps into an online boosting method, which calculates a prediction result for each sub-feature map; screening all the prediction results to obtain a target prediction result; and obtaining the corresponding position information according to the target prediction result, where the position information is the position of the pedestrian in the candidate region image.

The functions of the pedestrian tracking device provided by the embodiment of the invention specifically refer to the method embodiment, and are not described herein again.

According to the pedestrian tracking device provided by the embodiment of the invention, the candidate region is determined, the depth channel feature map of the candidate region image corresponding to the candidate region is obtained, and the depth channel features are combined with the preset algorithm to determine the position of the pedestrian in the candidate region image, so that the accuracy of pedestrian tracking can be improved.

Optionally, on the basis of the above embodiment, the second obtaining module is specifically configured to:

calculating a first depth feature map of the candidate region image; extracting channel features of the candidate region images, and calculating a second depth feature map of the channel features; and generating a depth channel feature map of the candidate region image according to the first depth feature map and the second depth feature map.

Specifically, the second obtaining module mentioned in the above embodiment may use a deep convolutional neural network to calculate the first depth feature map of the candidate region image; for example, it may use a four-layer deep convolutional neural network structure to output a 512-channel first depth feature map whose size is 1/8 of the candidate region image. It then extracts the channel features of the candidate region image and calculates a second depth feature map of the channel features: the input candidate region image is preprocessed, a series of linear and nonlinear transformations is applied to it in the preprocessing stage, and each transformation yields a channel. The extracted channel features are input into a three-layer convolutional neural network, which outputs a 256-channel second depth feature map whose size is 1/8 of the channel features. Finally, the depth feature map of the candidate region image and the depth feature map of the channel features are concatenated to obtain the depth channel feature map of the candidate region image.

According to the pedestrian tracking device provided by the embodiment of the invention, the depth channel feature map of the candidate area image is generated by calculating the first depth feature map of the candidate area image and the second depth feature map of the candidate area image channel feature according to the first depth feature map and the second depth feature map, so that the pedestrian tracking is more scientific and more reasonable.

Optionally, on the basis of the foregoing embodiments, the positioning module is specifically configured to:

obtaining a first prediction result which is larger than a preset threshold value from the prediction results; and obtaining a second prediction result according to a non-maximum value suppression method and the first prediction result, and determining the position information corresponding to the second prediction result as the position of the pedestrian in the candidate region image.

Specifically, the positioning module mentioned in the foregoing embodiment may be configured to obtain a first prediction result that is greater than a preset threshold from the prediction results: the sub-feature maps mentioned in the foregoing method embodiment are input into an online boosting method, which yields a prediction result for each sub-feature map; a threshold is preset, and the first prediction result that is greater than the threshold is screened out from all the prediction results. A non-maximum suppression method is then applied to the first prediction result to obtain a second prediction result, and the position information corresponding to the second prediction result is determined as the position of the pedestrian in the candidate region image.

According to the pedestrian tracking device provided by the embodiment of the invention, the first prediction result which is larger than the preset threshold value is obtained from the prediction results, the second prediction result is obtained according to the non-maximum suppression method and the first prediction result, and the position information corresponding to the second prediction result is determined as the position of the pedestrian in the candidate region image, so that the accuracy of pedestrian tracking can be improved.

Fig. 3 is a schematic structural diagram of a pedestrian tracking device according to an embodiment of the present invention. As shown in Fig. 3, the pedestrian tracking device includes: a processor 31, a memory 32, and a bus 33, wherein:

the processor 31 and the memory 32 communicate with each other through the bus 33; the processor 31 is configured to call program instructions in the memory 32 to perform the methods provided by the above-mentioned method embodiments, for example, including: determining a candidate region in a current frame image according to a region that includes pedestrian image features in the previous frame image corresponding to the current frame image, and acquiring a candidate region image corresponding to the candidate region; acquiring a depth channel feature map corresponding to the candidate region image, and mapping each sub-region image in the candidate region image to the depth channel feature map to obtain a sub-feature map corresponding to each sub-region image; and obtaining a prediction result corresponding to each sub-feature map according to a preset algorithm, and determining the position of the pedestrian in the candidate region image according to the prediction result.

An embodiment of the present invention discloses a computer program product, which includes a computer program stored on a non-transitory computer-readable storage medium. The computer program includes program instructions which, when executed by a computer, enable the computer to execute the methods provided by the above method embodiments, for example, including: determining a candidate region in a current frame image according to a region that includes pedestrian image features in the previous frame image corresponding to the current frame image, and acquiring a candidate region image corresponding to the candidate region; acquiring a depth channel feature map corresponding to the candidate region image, and mapping each sub-region image in the candidate region image to the depth channel feature map to obtain a sub-feature map corresponding to each sub-region image; and obtaining a prediction result corresponding to each sub-feature map according to a preset algorithm, and determining the position of the pedestrian in the candidate region image according to the prediction result.

Embodiments of the present invention provide a non-transitory computer-readable storage medium, which stores computer instructions that cause a computer to perform the methods provided by the above method embodiments, for example, including: determining a candidate region in a current frame image according to a region that includes pedestrian image features in the previous frame image corresponding to the current frame image, and acquiring a candidate region image corresponding to the candidate region; acquiring a depth channel feature map corresponding to the candidate region image, and mapping each sub-region image in the candidate region image to the depth channel feature map to obtain a sub-feature map corresponding to each sub-region image; and obtaining a prediction result corresponding to each sub-feature map according to a preset algorithm, and determining the position of the pedestrian in the candidate region image according to the prediction result.

Those of ordinary skill in the art will understand that all or part of the steps of the method embodiments may be implemented by hardware controlled by program instructions. The program may be stored in a computer-readable storage medium and, when executed, performs the steps of the method embodiments. The storage medium includes various media that can store program code, such as ROM, RAM, magnetic disks or optical disks.

The embodiments of the pedestrian tracking apparatus described above are merely illustrative. Units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units; they may be located in one place or distributed over a plurality of network units. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of this embodiment. One of ordinary skill in the art can understand and implement the solution without inventive effort.

From the above description of the embodiments, those skilled in the art will clearly understand that each embodiment can be implemented by software plus a necessary general-purpose hardware platform, or alternatively by hardware. On this understanding, the above technical solutions may be embodied in the form of a software product, which can be stored in a computer-readable storage medium such as ROM/RAM, a magnetic disk, or an optical disk, and which includes instructions for causing a computer device (for example a personal computer, a server, or a network device) to execute the methods described in the embodiments or in parts of the embodiments.

Finally, it should be noted that the above embodiments are only intended to illustrate the technical solutions of the embodiments of the present invention, not to limit them. Although the embodiments of the present invention have been described in detail with reference to the foregoing embodiments, those skilled in the art will understand that the technical solutions described in the foregoing embodiments may still be modified, or some or all of their technical features may be equivalently replaced, and that such modifications or substitutions do not cause the essence of the corresponding technical solutions to depart from the scope of the technical solutions of the embodiments of the present invention.

Claims

1. A pedestrian tracking method, comprising:

obtaining a prediction result corresponding to each sub-feature map according to a preset algorithm, and determining the position of the pedestrian in the candidate region image according to the prediction result;

the obtaining of the depth channel feature map corresponding to the candidate region image specifically includes:

calculating a first depth feature map of the candidate region image;

extracting channel features of the candidate region image, and calculating a second depth feature map of the channel features, wherein the channel features are obtained by performing linear and nonlinear transformations on the candidate region image, and each transformation yields one channel;

2. The method according to claim 1, wherein the determining the position of the pedestrian in the candidate region image according to the prediction result comprises:

3. The method of claim 1, wherein each sub-region image is acquired through a sliding window in the candidate region image.
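By way of illustration only, acquiring sub-region images through a sliding window can be sketched as follows. The window size and `stride` are hypothetical parameters that the text does not specify, and `sliding_windows` is an illustrative name rather than part of the claimed method.

```python
from typing import Iterator, Tuple

Window = Tuple[int, int, int, int]  # (left, top, width, height)


def sliding_windows(region_w: int, region_h: int,
                    win_w: int, win_h: int, stride: int) -> Iterator[Window]:
    """Yield every window position that fits fully inside the candidate region.

    Coordinates are relative to the top-left corner of the candidate
    region image; windows are visited row by row.
    """
    if stride <= 0:
        raise ValueError("stride must be positive")
    for top in range(0, region_h - win_h + 1, stride):
        for left in range(0, region_w - win_w + 1, stride):
            yield (left, top, win_w, win_h)
```

Each yielded window identifies one sub-region image. A window larger than the candidate region produces no positions at all.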

4. The method of claim 1, further comprising:

5. A pedestrian tracking device, comprising:

a positioning module, configured to obtain a prediction result corresponding to each sub-feature map according to a preset algorithm, and to determine the position of the pedestrian in the candidate region image according to the prediction result;

the second obtaining module is specifically configured to:

calculating a first depth feature map of the candidate region image; extracting channel features of the candidate region image, and calculating a second depth feature map of the channel features, wherein the channel features are obtained by performing linear and nonlinear transformations on the candidate region image, and each transformation yields one channel; and generating a depth channel feature map of the candidate region image according to the first depth feature map and the second depth feature map.
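By way of illustration only, the idea that each linear or nonlinear transformation of the candidate region image yields one channel can be sketched as follows. The particular transformations chosen here (horizontal and vertical gradients as linear transformations, gradient magnitude as a nonlinear one) are assumptions, since the text does not name them, and all function names are hypothetical. Computing the depth feature maps themselves is not shown.

```python
import math
from typing import List

Image = List[List[float]]  # single-channel image as rows of pixel values


def horizontal_gradient(img: Image) -> Image:
    """Linear transformation: central difference along x, borders replicated."""
    h, w = len(img), len(img[0])
    return [[(img[y][min(x + 1, w - 1)] - img[y][max(x - 1, 0)]) / 2.0
             for x in range(w)] for y in range(h)]


def vertical_gradient(img: Image) -> Image:
    """Linear transformation: central difference along y, borders replicated."""
    h, w = len(img), len(img[0])
    return [[(img[min(y + 1, h - 1)][x] - img[max(y - 1, 0)][x]) / 2.0
             for x in range(w)] for y in range(h)]


def gradient_magnitude(img: Image) -> Image:
    """Nonlinear transformation: Euclidean norm of the two gradients."""
    gx, gy = horizontal_gradient(img), vertical_gradient(img)
    return [[math.hypot(gx[y][x], gy[y][x]) for x in range(len(img[0]))]
            for y in range(len(img))]


def channel_features(img: Image) -> List[Image]:
    """Apply each transformation once; each result is one channel."""
    return [horizontal_gradient(img), vertical_gradient(img), gradient_magnitude(img)]
```

Every channel has the same height and width as the input image, so the channels can be stacked and passed together to whatever computes the second depth feature map.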

6. The apparatus of claim 5, wherein the positioning module is specifically configured to:

7. A pedestrian tracking device, comprising a memory and a processor, wherein the processor and the memory communicate with each other through a bus; the memory stores program instructions executable by the processor, and the processor invokes the program instructions to perform the method of any one of claims 1 to 4.

8. A computer-readable storage medium on which a computer program is stored, wherein the computer program, when executed by a processor, implements the method according to any one of claims 1 to 4.