WO2017059576A1 - Apparatus and method for pedestrian detection - Google Patents

Apparatus and method for pedestrian detection Download PDF

Info

Publication number
WO2017059576A1
Authority
WO
WIPO (PCT)
Prior art keywords
training
testing
patches
detectors
generating
Prior art date
Application number
PCT/CN2015/091517
Other languages
French (fr)
Inventor
Xiaoou Tang
Yonglong TIAN
Ping Luo
Xiaogang Wang
Original Assignee
Beijing Sensetime Technology Development Co., Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Sensetime Technology Development Co., Ltd filed Critical Beijing Sensetime Technology Development Co., Ltd
Priority to PCT/CN2015/091517 priority Critical patent/WO2017059576A1/en
Priority to CN201610876667.7A priority patent/CN106570453B/en
Publication of WO2017059576A1 publication Critical patent/WO2017059576A1/en

Links

Images

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00 Scenes; Scene-specific elements
    • G06V20/50 Context or environment of the image
    • G06V20/52 Surveillance or monitoring of activities, e.g. for recognising suspicious objects
    • G06V20/53 Recognition of crowd images, e.g. recognition of crowd congestion
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/77 Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
    • G06V10/774 Generating sets of training patterns; Bootstrap methods, e.g. bagging or boosting
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques
    • G06F18/241 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06F18/2411 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches based on the proximity to a decision surface, e.g. support vector machines
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00 Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10 Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/103 Static body considered as a whole, e.g. static pedestrian or occupant recognition

Definitions

  • occlusion has various patterns.
  • the left or right half body part may be occluded by a tree, and the lower half body part may also be occluded by a car.
  • a part pool which contains various semantic body parts may be extensively constructed.
  • pedestrian can be considered as a rigid object with a 2m × m grid, where 2m and m indicate the numbers of grids in the vertical and horizontal dimensions, respectively.
  • Each grid is square and has equal size.
  • the grid is defined as the minimum unit, and each part prototype is constrained to be a rectangle.
  • the sizes for part prototypes are defined in terms of grids, where w and h indicate the width and height of a part prototype.
  • W_min and H_min are used to avoid over-local parts, since the focus is on middle-level semantic parts.
  • x and y are the coordinates of the top-left grid in the part prototype, and i is a unique id.
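The part-pool construction above can be sketched as follows; the grid size m and the minimum width and height are hypothetical values, not fixed by the text here:

```python
# Sketch of building the part-prototype pool: every axis-aligned
# rectangle of grid cells inside the 2m x m pedestrian grid, subject
# to a minimum width/height. M, W_MIN and H_MIN are illustrative.
from itertools import count

M = 3                  # horizontal grids; vertical grids = 2 * M
W_MIN, H_MIN = 1, 2    # hypothetical minimum part size, in grids

def build_part_pool(m=M, w_min=W_MIN, h_min=H_MIN):
    """Enumerate rectangular part prototypes (x, y, w, h, i)."""
    ids = count()
    pool = []
    for h in range(h_min, 2 * m + 1):        # height in grids
        for w in range(w_min, m + 1):        # width in grids
            for y in range(2 * m - h + 1):   # top-left row
                for x in range(m - w + 1):   # top-left column
                    pool.append((x, y, w, h, next(ids)))
    return pool
```

Each tuple matches the (x, y, w, h, i) notation above; the loop bounds keep x + w ≤ m and y + h ≤ 2m, so every prototype stays inside the grid.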
  • the first or second box generator 100 or 500 utilizes static images, such as training or testing images, as inputs and employs a pedestrian detector to detect the pedestrians in these images.
  • a region proposal method such as “selective search”, “Edgebox”, and “LDCF” may be used to generate candidate bounding boxes.
  • the size of the training or testing dataset is crucial for deep models, e.g., ConvNets.
  • the Caltech dataset, which is now the largest pedestrian benchmark, consists of ~250k labeled frames and ~350k annotated bounding boxes.
  • unlike the typical Reasonable training setting, which uses every 30th image in the video and is composed of ~1.7k pedestrians, we utilize every frame and employ ~50k pedestrian bounding boxes as positive training patches.
  • Negative patches have < 0.5 IoU with any ground truth and are proposed by LDCF.
  • the training patch generator 200 further comprises a labeling module 201 for labeling the candidate boxes as negative or positive candidate boxes by comparison with the ground truth boxes, and an extracting module 202 for extracting negative and positive training part patches from the negative and positive candidate boxes for each body part such as leg, head, and upper body.
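A minimal sketch of the labeling step performed by module 201, assuming boxes in (x1, y1, x2, y2) format and the 0.5 IoU threshold used for negative patches:

```python
# Label candidate boxes against ground truth by IoU. The threshold and
# box format are assumptions for illustration.

def iou(a, b):
    """Intersection-over-union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0

def label_candidates(candidates, ground_truths, thresh=0.5):
    """Return (box, is_positive) pairs for each candidate box."""
    return [(c, any(iou(c, g) >= thresh for g in ground_truths))
            for c in candidates]
```

Part patches for each body part are then cropped from the labeled candidate boxes, as module 202 does.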
  • Fig. 3 is an illustration of the training part patches, namely the output of the generator 200.
  • Fig. 4 is an example of generating training data for each part detector.
  • (1) Given a part prototype, the corresponding region within a negative pedestrian proposal is used as a negative sample for this part detector. This assumption rests on the fact that most upright pedestrians are well aligned, so the corresponding regions in negative and positive pedestrian patches should differ. For example, if a head-shoulder part occupied the upper one-third region of a negative proposal, that proposal would have been regarded as a positive pedestrian patch according to prior knowledge.
  • Each pedestrian is annotated with two BBs that denote the visible (B_vis) and full (B_full) parts. We divide the full part (B_full) into a 2m × m grid and compute the IoU between the visible part (B_vis) and each grid. Then the visible map is obtained by thresholding on the IoU value of each grid. If the visible grids of a ground truth can cover the template grids of a given part prototype, the corresponding region can be extracted as a positive sample.
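The visibility-map construction above can be sketched as follows; the grid size m, the threshold value, and the per-cell normalisation are assumptions for illustration:

```python
# Divide the full box into a 2m x m grid, mark each cell visible when
# its overlap with the visible box exceeds a threshold, then accept a
# part prototype if all of its template cells are visible. The overlap
# is normalised by the cell area here (an assumption; the text does
# not spell out the exact normalisation).
import numpy as np

def visible_map(b_full, b_vis, m=3, thresh=0.5):
    """b_full/b_vis: (x1, y1, x2, y2). Returns a (2m, m) boolean map."""
    x1, y1, x2, y2 = b_full
    gw, gh = (x2 - x1) / m, (y2 - y1) / (2 * m)
    vis = np.zeros((2 * m, m), dtype=bool)
    for r in range(2 * m):
        for c in range(m):
            cx1, cy1 = x1 + c * gw, y1 + r * gh
            cx2, cy2 = cx1 + gw, cy1 + gh
            ix = max(0.0, min(cx2, b_vis[2]) - max(cx1, b_vis[0]))
            iy = max(0.0, min(cy2, b_vis[3]) - max(cy1, b_vis[1]))
            if gw * gh > 0 and (ix * iy) / (gw * gh) >= thresh:
                vis[r, c] = True
    return vis

def covers(vis, part):
    """part = (x, y, w, h) in grid coordinates; True if fully visible."""
    x, y, w, h = part
    return bool(vis[y:y + h, x:x + w].all())
```

With an upper-half-visible pedestrian, for example, only prototypes confined to the top rows of the grid yield positive samples.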
  • the detector training unit 300 further comprises a mixing module 301 for mixing the positive and negative training part patches and splitting them into batches, a training module 302 for iteratively training each part detector by using the batches of part patches until each of all part detectors converges, and a parameter learning module 303 for learning parameters for handling shifting for each part detector.
  • each body part would shift from its fixed template position, and different parts of the same pedestrian may shift towards different orientations.
  • the positive training samples for each part detector are well aligned while the testing proposals may shift at all orientations. Thus, handling shifting for both the full body and parts is necessary.
  • the input size of our fully convolutional ConvNet can be changed.
  • the original input size of the AlexNet is 227 × 227.
  • the fully convolutional AlexNet is able to receive an expanded input size because the convolution and pooling operations are unrelated to input size. Since the step size of the receptive field for the classification layer is 32, the expanded input should be (227 + 32n) × (227 + 32n) in order to keep the forward procedure applicable, where n indicates the expanded step size and is a non-negative integer.
  • the expanded cropping patch is (X_min′, Y_min′, w′, h′).
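The input-size rule can be illustrated as below; the symmetric crop expansion is an assumption, since the defining formula for (X_min′, Y_min′, w′, h′) is not reproduced here:

```python
# A fully convolutional AlexNet with receptive-field stride 32 accepts
# inputs of side length 227 + 32n. The crop expansion below simply
# centres the original patch inside the enlarged input (our assumption).

def expanded_size(n):
    """Valid input side length for expansion step n (n >= 0)."""
    assert n >= 0
    return 227 + 32 * n

def expand_crop(x_min, y_min, w, h, n, scale_x, scale_y):
    """Hypothetical symmetric expansion of a crop (x_min, y_min, w, h).

    scale_x/scale_y convert the extra 32n input pixels into image
    coordinates.
    """
    dx, dy = 32 * n * scale_x, 32 * n * scale_y
    return (x_min - dx / 2, y_min - dy / 2, w + dx, h + dy)

print(expanded_size(0), expanded_size(2))  # prints: 227 291
```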
  • P_i,j is a penalty term with respect to the relative shifting distance from the proposed part box.
  • a is the single-orientation shifting penalty weight and b is a geometrical distance penalty weight.
  • n is fixed to 2 for all part prototypes, and the values of a and b for each part prototype are searched by a 6-fold cross-validation on the training set.
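The exact formula for the penalty P_i,j is not reproduced in the text; the sketch below (per-orientation terms weighted by a, plus a Euclidean-distance term weighted by b) is only one plausible form consistent with the two weights described above:

```python
# Hypothetical shifting penalty. Only the roles of a and b are taken
# from the text; the functional form is an assumption.
import math

def shift_penalty(dx, dy, a, b):
    """Penalty for a part box shifted by (dx, dy) from its template."""
    return a * (abs(dx) + abs(dy)) + b * math.hypot(dx, dy)

print(shift_penalty(0, 0, 0.1, 0.2))  # 0.0 -- no shift, no penalty
```

A different functional form may well be used in the actual application; cross-validation over a and b would proceed the same way either way.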
  • the detector selecting unit 400 further comprises a weight learning module 401 for learning combination weights of all part detectors, a selection module 402 for selecting one or more part detectors according to the combination weights, and a relearning module 403 for relearning the combination weights of the selected part detectors.
  • the output of its ConvNet detector may be directly used as the visible score, instead of stacking a linear SVM on top as in the RCNN framework. It is found that appending an SVM detector for mining hard negatives does not show significant improvement over directly using the ConvNet output, especially for GoogLeNet. This may be due to the fact that the training proposals generated by LDCF are already hard negatives. Thus, the SVM training stage is safely removed to save the time of feature extraction.
  • a linear SVM is employed to learn complementarity over the 45 part detector scores.
  • the 6 parts with the highest SVM weights are simply selected, yielding approximately the same performance. It is also shown that the performance improvement mainly benefits from the part complementarity.
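As a sketch of this selection stage: the linear SVM over the part-detector scores is emulated below with a tiny subgradient-descent trainer (the application does not name a solver; the trainer and all hyper-parameters are assumptions), after which the k parts with the largest weights are kept:

```python
# Learn complementarity weights over per-part detector scores with a
# linear SVM (hinge loss + L2, trained by subgradient descent), then
# select the top-k parts by weight.
import numpy as np

def train_linear_svm(scores, labels, lam=0.01, lr=0.1, epochs=200, seed=0):
    """scores: (N, P) detector scores; labels: (N,) in {-1, +1}."""
    rng = np.random.default_rng(seed)
    w = np.zeros(scores.shape[1])
    for _ in range(epochs):
        for i in rng.permutation(len(scores)):
            margin = labels[i] * (scores[i] @ w)
            # Hinge-loss subgradient plus L2 regularisation.
            grad = lam * w - (labels[i] * scores[i] if margin < 1 else 0)
            w -= lr * grad
    return w

def select_parts(w, k=6):
    """Indices of the k parts with the largest SVM weights."""
    return np.argsort(w)[::-1][:k]
```

After selection, the weights of the kept parts are relearned, mirroring the relearning module 403.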
  • Fig. 8 is an illustration of the selected parts and their weights.
  • the testing patch generator 600 further comprises an extracting module for extracting testing part patches from the candidate boxes generated by the second box generator 500 as the generated testing patches for each body part corresponding to the selected part detectors.
  • the testing unit 700 further comprises an evaluation module 701 and a result generation module 702.
  • the evaluation module 701 may be configured to evaluate a score of each body part using the corresponding part detector from the testing part patches, the selected part detectors and the relearned combination weights.
  • the result generation module 702 may be configured to generate a detection score by combining the score of each body part in a weighted manner.
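The weighted combination performed by the result generation module 702 can be sketched as follows; the scores and weights are illustrative values only:

```python
# Combine per-part detector scores into one detection score using the
# relearned combination weights (a plain weighted sum here).
import numpy as np

def detection_score(part_scores, weights):
    """part_scores, weights: 1-D sequences over the selected parts."""
    part_scores = np.asarray(part_scores, dtype=float)
    weights = np.asarray(weights, dtype=float)
    return float(part_scores @ weights)

print(detection_score([1.0, 1.0], [0.25, 0.5]))  # prints: 0.75
```

An occluded part then contributes a low score without vetoing the detection outright, which is the point of combining complementary parts.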
  • Fig. 10 is a schematic flowchart illustrating a method 2000 for pedestrian detection according to an embodiment of the present application.
  • the method 2000 may be described in detail with respect to Fig. 10.
  • candidate boxes are generated from a plurality of pedestrian training images, for example, by employing a region proposal method such as Selective Search, Edgebox, and LDCF.
  • a region proposal method such as Selective Search, Edgebox, and LDCF.
  • training part patches are generated from the ground truth boxes and the candidate boxes, which are generated from the plurality of pedestrian training images.
  • the step S220 of generating training part patches comprises the following steps.
  • the candidate boxes are labeled as negative or positive candidate boxes by comparison with the ground truth boxes.
  • negative and positive training part patches are extracted as the training part patches from the negative and positive candidate boxes.
  • step S230 at which part detectors are trained from the training part patches.
  • the step S230 of training part detectors comprises the following steps.
  • the positive and negative training part patches are mixed and split into batches.
  • each part detector is iteratively trained by using these batches until it converges.
  • parameters are learned for handling shifting.
  • the step S240 of selecting complementary part detectors comprises a step S241 of learning combination weights of all part detectors, a step S242 of selecting one or more part detectors according to the combination weights, and a step of S243 of relearning the combination weights of the selected part detectors.
  • step S250 at which corresponding candidate boxes are generated from a plurality of pedestrian testing images.
  • at step S260, testing part patches are generated from the candidate boxes generated from the plurality of pedestrian testing images.
  • the step S260 of generating testing part patches further comprises extracting testing part patches from the candidate boxes generated from the plurality of pedestrian testing images as the generated testing part patches for each body part corresponding to the selected part detectors.
  • at step S270, a detection result is generated from the testing part patches and the selected part detectors.
  • the step S270 of generating the detection result comprises the following steps.
  • a score of each body part is evaluated using the corresponding part detector from the testing part patches, the selected part detectors and the relearned combination weights.
  • a detection result is generated by combining the score of each body part in a weighted manner.
  • Fig. 15 shows a system 3000 for pedestrian detection.
  • the system 3000 comprises: a memory 310 that stores executable components; and a processor 320 electrically coupled to the memory 310 that executes the executable components to perform operations of the system 3000.
  • the executable components comprise: a first box generating component 311 configured for generating candidate boxes from a plurality of pedestrian training images; a training patch generating component 312 configured for generating training part patches from the candidate boxes generated by the first box generating component 311 and ground truth boxes; a detector training component 313 configured for training one or more part detectors from the generated training part patches; a detector selecting component 314 configured for selecting complementary part detectors from all the trained part detectors; a second box generating component 315 configured for generating candidate boxes from a plurality of pedestrian testing images; a testing patch generating component 316 configured for generating testing part patches from the candidate boxes generated by the second box generating component 315; and a testing component 317 configured for generating a detection result from the testing part patches and the selected part detectors.
  • the present application is from “Deep Learning Strong Parts for Pedestrian Detection”, and is intended to address the problem of detecting pedestrians in a single image, aiming at constructing a pedestrian detector that can handle occlusion at different levels.
  • the input is a single static image, and the output consists of detected bounding boxes and confidence scores.

Abstract

Disclosed is an apparatus for pedestrian detection. The apparatus comprises: a first box generator for generating candidate boxes from a plurality of pedestrian training images; a training patch generator for generating training part patches from the candidate boxes generated by the first box generator and ground truth boxes; a detector training unit for training part detectors from the training part patches; a detector selecting unit for selecting complementary part detectors from all the trained part detectors; a second box generator for generating candidate boxes from a plurality of pedestrian testing images; a testing patch generator for generating testing part patches from the candidate boxes generated by the second box generator; and a testing unit for generating a detection result from the testing part patches and the selected part detectors. A method and a system for pedestrian detection are also disclosed.

Description

APPARATUS AND METHOD FOR PEDESTRIAN DETECTION
Technical Field
The present application generally relates to a field of pedestrian detection, more particularly, to an apparatus and a method for pedestrian detection.
Background
Pedestrian detection has numerous applications in video surveillance, robotics and automotive safety, and has been studied extensively in recent years. While pedestrian detection quality has achieved steady improvements over the last several years, occlusion is still an obstacle to constructing a good pedestrian detector. For example, the current best performing detector, SpatialPooling+, attains a 75% average miss rate reduction over the VJ detector at the no-occlusion level, while attaining only 21% over VJ at the heavy-occlusion level. Occlusion is frequent: around 70% of all pedestrians in street scenes are occluded in at least one frame. Current pedestrian detectors for occlusion handling can generally be grouped into two categories: training specific detectors for different occlusion types, and modeling part visibility as latent variables. In the first category, constructing a specific detector requires prior knowledge of the occlusion types. The second category of approaches divides the pedestrian template into several parts and infers the visibility with latent variables. Though these methods achieve promising results, manually selecting parts may not be the optimal solution and may fail when handling pedestrian detection in scenarios beyond the street, such as crowded scenes and market surveillance, where occlusion types may change. Thus, there is a need to utilize extensive part detectors to handle pedestrian occlusion at different levels and thereby improve pedestrian detection.
Summary
According to an embodiment of the present application, disclosed is an apparatus for pedestrian detection. The apparatus comprises: a first box generator for generating candidate boxes from a plurality of pedestrian training images; a training patch generator for generating training part patches from the candidate boxes generated by the first box generator and ground truth boxes; a detector training unit for training one or more part detectors from the generated training part patches; a detector selecting unit for selecting complementary part detectors from all the trained part detectors; a second box generator for generating candidate boxes from a plurality of pedestrian testing images; a testing patch generator for generating testing part patches from the candidate boxes generated by the second box generator; and a testing unit for generating a detection result from the testing part patches and the selected part detectors.
According to another embodiment of the present application, disclosed is a method for pedestrian detection. The method comprises: generating candidate boxes from a plurality of pedestrian training images; generating training part patches from the candidate boxes generated from the plurality of pedestrian training images and ground truth boxes; training one or more part detectors from the generated training part patches; selecting complementary part detectors from all the trained part detectors; generating candidate boxes from a plurality of pedestrian testing images; generating testing part patches from the candidate boxes generated from the plurality of pedestrian testing images; and generating a detection result from the testing part patches and the selected part detectors.
According to yet another embodiment of the present application, disclosed is a system for pedestrian detection. The system comprises: a memory that stores executable components; and a processor electrically coupled to the memory that executes the executable components to perform operations of the system, wherein the executable components comprise: a first box generating component configured for generating candidate boxes from a plurality of pedestrian training images; a training patch generating component configured for generating training part patches from the candidate boxes generated by the first box generating component and ground truth boxes; a detector training component configured for training one or more part detectors from the generated training part patches; a detector selecting component configured for selecting complementary part detectors from all the trained part detectors; a second box generating component configured for generating candidate boxes from a plurality of pedestrian testing images; a testing patch generating component configured for generating testing part patches from the candidate boxes generated by the second box generating component; and a testing component configured for generating a detection result from the testing part patches and the selected part detectors.
The present invention has following characteristics:
1) hard negative reduction - with the assistance of deep learning of pedestrian attribute and scene attribute tasks, the number of hard negatives is significantly decreased;
2) weakly supervised training - this system can be trained only with weakly labeled data, i.e., the required supervision is the pedestrian bounding box instead of strong part annotations such as leg and arm;
3) strong part detectors - each part detector is already a strong detector, capable of detecting a pedestrian by observing only part of a candidate box; and
4) complementary parts selection - since not all part detectors are equally weighted and necessary in different scenarios, the present system can automatically select complementary parts and decide their weights.
Brief Description of the Drawing
Exemplary non-limiting embodiments of the present invention are described below with reference to the attached drawings. The drawings are illustrative and generally not to an exact scale. The same or similar elements on different figures are referenced with the same reference numbers.
Fig. 1 is a schematic diagram illustrating a system for pedestrian detection according to an embodiment of the present application.
Fig. 2 is a schematic diagram illustrating a training patch generator according to an embodiment of the present application.
Fig. 3 is an illustration of the training part patches according to an embodiment of the present application.
Fig. 4. is an example of generating training data for each part detector.
Fig. 5. is a schematic diagram illustrating a detector training unit according to another embodiment of the present application.
Fig. 6a shows how rapidly the IoU decreases with small shifts in the horizontal and vertical orientations.
Fig. 6b shows how the shifting problem is handled in AlexNet.
Fig. 7. is a schematic diagram illustrating a detector selecting unit according to an embodiment of the present application.
Fig. 8 is an example of the selected parts and their weights.
Fig. 9. is a schematic diagram illustrating a testing unit according to an embodiment of the present application.
Fig. 10 is a schematic flowchart illustrating a method for pedestrian detection according to an embodiment of the present application.
Fig. 11 is a schematic flowchart illustrating a process for generating training part patches according to an embodiment of the present application.
Fig. 12 is a schematic flowchart illustrating a process for training part detectors according to an embodiment of the present application.
Fig. 13 is a schematic flowchart illustrating a process for selecting complementary part detectors according to an embodiment of the present application.
Fig. 14 is a schematic flowchart illustrating a process for generating a detection result according to an embodiment of the present application.
Fig. 15 illustrates a system for pedestrian detection according to an embodiment of the present application.
Detailed Description
Reference will now be made in detail to exemplary embodiments, examples of which are illustrated in the accompanying drawings. When appropriate, the same reference numbers are used throughout the drawings to refer to the same or like parts. Fig. 1 is a schematic diagram illustrating an exemplary apparatus 1000 for pedestrian detection with some disclosed embodiments.
It shall be appreciated that the apparatus 1000 may be implemented using certain hardware, software, or a combination thereof. In addition, the embodiments of the present invention may be adapted to a computer program product embodied on one or more computer-readable storage media (comprising but not limited to disk storage, CD-ROM, optical memory and the like) containing computer program code.
In the case that the apparatus 1000 is implemented with software, the apparatus 1000 can run in one or more systems that may include a general purpose computer, a computer cluster, a mainstream computer, a computing device dedicated to providing online contents, or a computer network comprising a group of computers operating in a centralized or distributed fashion.
Referring to Fig. 1 again, where the apparatus 1000 is implemented by hardware, it may comprise a first box generator 100, a training patch generator 200, a detector training unit 300, a detector selecting unit 400, a second box generator 500, a testing patch generator 600, and a testing unit 700. In the embodiment shown in Fig. 1, the first box generator 100 may be configured to generate candidate boxes from a plurality of pedestrian training images. In particular, most pedestrian patches are kept while most negative patches are filtered out. The training patch generator 200 may be configured to generate training part patches from the candidate boxes generated by the first box generator 100 and ground truth boxes. In particular, extensive part patches, such as the leg, head and upper body, are extracted for each candidate box. The detector training unit 300 may be configured to train one or more part detectors from the training part patches. The detector selecting unit 400 may be configured to select complementary part detectors from all the trained part detectors. The output of the detector selecting unit 400 may be a combination of the selected complementary part detectors. Each of the complementary part detectors may be selected based on its weight in a support vector machine (SVM). In some embodiments, the complementary part detectors may be those having the largest weights in the SVM. The second box generator 500 may be configured to generate candidate boxes from a plurality of pedestrian testing images. The testing patch generator 600 may be configured to generate testing part patches from the candidate boxes generated by the second box generator 500. The testing unit 700 may be configured to generate a detection result, such as a confidence score, from the testing part patches and the selected part detectors.
Normally, occlusion has various patterns. For instance, the left or right half of the body may be occluded by a tree, and the lower half of the body may be occluded by a car. Thus, a part pool which contains various semantic body parts may be extensively constructed.
In some embodiments, a pedestrian can be considered as a rigid object on a 2m × m grid, where 2m and m indicate the numbers of grids in the vertical and horizontal dimensions, respectively. Each grid is square and of equal size. Hereinafter, the grid is defined as the minimum unit, and each part prototype is constrained to be a rectangle. The sizes of part prototypes are defined as
S = { (w, h) | W_min ≤ w ≤ m, H_min ≤ h ≤ 2m }
where w and h indicate the width and height of a part prototype, in terms of grids. W_min and H_min are used to avoid over-local parts, since we focus on middle-level semantic parts.
Then, for each (w, h) ∈ S, sliding an h × w rectangle over the grid template generates part prototypes at different positions. The full part pool can be expressed as follows:
P = { (x, y, w, h, i) | (w, h) ∈ S, 1 ≤ x ≤ m − w + 1, 1 ≤ y ≤ 2m − h + 1 }
where x and y are the coordinates of the top-left grid in the part prototype and i is a unique id. Specifically, the full body part prototype is (1, 1, m, 2m, i_full). Setting m to a much larger number would generate an overlarge pool, which would incur too much computation in the training and testing stages. Also, setting W_min or H_min too small, such as W_min = 0.1 × m, would result in over-local part prototypes.
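The enumeration of the part pool defined above can be sketched as follows; the function name and the example values of m, W_min and H_min are illustrative choices, not values claimed by the present application.

```python
# Sketch: enumerate the part pool P over an m-wide, 2m-tall grid.
# Widths range over [W_min, m], heights over [H_min, 2m]; an h x w
# rectangle is slid to every valid 1-indexed top-left position (x, y).

def build_part_pool(m, w_min, h_min):
    pool = []
    part_id = 0
    for w in range(w_min, m + 1):           # widths in [W_min, m]
        for h in range(h_min, 2 * m + 1):   # heights in [H_min, 2m]
            for x in range(1, m - w + 2):           # 1 <= x <= m - w + 1
                for y in range(1, 2 * m - h + 2):   # 1 <= y <= 2m - h + 1
                    pool.append((x, y, w, h, part_id))
                    part_id += 1
    return pool
```

For example, with m = 3, W_min = H_min = 2, the pool contains exactly one full-body prototype of size (m, 2m) = (3, 6) at position (1, 1).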
The first and second box generators 100 and 500 take static images, i.e., training or testing images, as inputs and employ a pedestrian detector to detect the pedestrians in these images. For example, a region proposal method such as Selective Search, EdgeBox, or LDCF may be used to generate candidate bounding boxes.
The size of the training or testing dataset is crucial for deep models such as ConvNets. For example, consider the Caltech dataset, currently the largest pedestrian benchmark, which consists of ~250k labeled frames and ~350k annotated bounding boxes. Instead of using the typical Reasonable training setting, which uses every 30th image in the video and is composed of ~1.7k pedestrians, we utilize every frame and employ ~50k pedestrian bounding boxes as positive training patches. Negative patches have IoU < 0.5 with every ground truth box and are proposed by LDCF.
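The IoU-based labeling of candidate boxes described above can be sketched as follows; the helper names and the box representation (x, y, w, h) are illustrative, and only the 0.5 threshold is taken from the text.

```python
# Sketch: label candidate boxes against ground truth using IoU.
# A candidate is negative if its IoU with every ground truth box is
# below neg_thresh (0.5 in the text above).

def iou(box_a, box_b):
    ax, ay, aw, ah = box_a
    bx, by, bw, bh = box_b
    ix = max(0.0, min(ax + aw, bx + bw) - max(ax, bx))  # intersection width
    iy = max(0.0, min(ay + ah, by + bh) - max(ay, by))  # intersection height
    inter = ix * iy
    union = aw * ah + bw * bh - inter
    return inter / union if union > 0 else 0.0

def label_candidates(candidates, ground_truths, neg_thresh=0.5):
    labels = []
    for c in candidates:
        best = max((iou(c, g) for g in ground_truths), default=0.0)
        labels.append(1 if best >= neg_thresh else 0)
    return labels
```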
As shown in Fig. 2, the training patch generator 200 further comprises a labeling module 201 for labeling the candidate boxes as negative or positive candidate boxes by comparison with the ground truth boxes, and an extracting module 202 for extracting negative and positive training part patches from the negative and positive candidate boxes for each body part, such as the leg, head, and upper body. Fig. 3 is an illustration of the training part patches, namely the output of the generator 200.
Fig. 4 is an example of generating training data for each part detector. (1) Given a part prototype, the corresponding region within a negative pedestrian proposal is used as a negative sample for this part detector. This assumption rests on the fact that most upright pedestrians are well aligned, so the corresponding regions in negative and positive pedestrian patches should differ. For example, if a head-shoulder part occupied the upper one-third region of a negative proposal, that proposal would be regarded as a positive pedestrian patch according to prior knowledge. (2) Each pedestrian is annotated with two bounding boxes that denote the visible (Bvis) and full (Bfull) parts. We divide the full part (Bfull) into 2m × m grids and compute the IoU between the visible part (Bvis) and each grid. The visible map is then obtained by thresholding the IoU value of each grid. If the visible grids of a ground truth cover the template grids of a given part prototype, the corresponding region can be extracted as a positive sample.
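The visible-map construction in step (2) can be sketched as follows. As an assumption for illustration, each grid cell is marked visible when the fraction of the cell covered by the visible box exceeds a threshold (a simple stand-in for the per-grid IoU thresholding described above); the function names and the 0.5 threshold are not claimed values.

```python
# Sketch: threshold per-cell overlap with Bvis to get a 2m x m visible map,
# then test whether all cells of a part prototype are visible.

def visible_map(b_full, b_vis, m, thresh=0.5):
    x0, y0, w, h = b_full
    vx, vy, vw, vh = b_vis
    cell_w, cell_h = w / m, h / (2 * m)
    vis = [[False] * m for _ in range(2 * m)]
    for row in range(2 * m):
        for col in range(m):
            cx, cy = x0 + col * cell_w, y0 + row * cell_h
            ix = max(0.0, min(cx + cell_w, vx + vw) - max(cx, vx))
            iy = max(0.0, min(cy + cell_h, vy + vh) - max(cy, vy))
            # fraction of this cell covered by the visible box
            if (ix * iy) / (cell_w * cell_h) >= thresh:
                vis[row][col] = True
    return vis

def part_is_visible(vis, x, y, w, h):
    """All cells of prototype (x, y, w, h), 1-indexed, must be visible."""
    return all(vis[r][c] for r in range(y - 1, y - 1 + h)
                         for c in range(x - 1, x - 1 + w))
```

For instance, if only the upper half of a full box is visible, an upper-body prototype passes the check while a full-body prototype does not.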
As shown in Fig. 5, the detector training unit 300 further comprises a mixing module 301 for mixing the positive and negative training part patches and splitting them into batches, a training module 302 for iteratively training each part detector using these batches until all part detectors converge, and a parameter learning module 303 for learning parameters for handling shifting for each part detector.
It is known that fine-tuning a CNN pre-trained for the ImageNet classification task on object detection and segmentation data can significantly improve performance. In particular, the parameters learned in the pre-training phase are directly used as initial values for the fine-tuning stage. A similar strategy can be directly adopted to fine-tune generic CNN image classification models for part recognition. The main disparity between the pre-training and fine-tuning tasks is the type of input data: the image classification task takes the full image or whole object as input, which contains rich context information, while the part recognition task can only observe a middle-level part patch. Evaluated deep models include AlexNet, Clarifai, and GoogLeNet, the winning models of the ImageNet classification challenge in the past three years. AlexNet and Clarifai have ~60 million parameters and share a similar structure, while GoogLeNet uses 12× fewer parameters but employs a much deeper structure. The framework in the present invention is flexible enough to incorporate other generic deep models.
In a recognition-by-proposal detection scheme, i.e., with deep detectors, the location quality of proposals is key for the recognition stage. Pedestrian proposals usually suffer from poor location quality. As is known, the best proposal method, SpatialPooling+, recalls 93% of pedestrians at a 0.5 IoU threshold but only 10% at a 0.9 IoU threshold. Shifting is one of the major causes of low IoU. As shown in Fig. 6a, shifting a ground truth bounding box by 10% in the horizontal or vertical direction results in a 0.9 IoU value, which is a high-quality proposal. However, shifting in both directions leads to a 0.68 IoU value, which is less effective for the feature extraction and classification stages. Besides whole-body shifting, each body part may shift from its fixed template position, and different parts of the same pedestrian may shift in different directions. In our framework, the positive training samples for each part detector are well aligned, while the testing proposals may shift in all directions. Thus, handling shifting for both the full body and the parts is necessary.
A straightforward way to handle this problem is to crop multiple patches around each proposal with jitter, feed the cropped patches into the deep model, and choose the highest or average score, with a penalty, as the detection score. However, this method would increase the testing time by a factor of k, where k is the number of cropped patches for each proposal.
To reduce the testing computation, we first reformulate the generic ConvNet model with fully connected layers as a fully convolutional neural network, which does not require a fixed input size and can process multiple neighboring patches in only one forward pass. The input size of our fully convolutional ConvNet can therefore be changed. Take AlexNet as an example, whose original input size is 227 × 227. As illustrated in Table 1, after reformulating fc6, fc7, and fc8 as conv6 (1 × 1 × 4096), conv7 (1 × 1 × 4096), and conv8 (1 × 1 × 2), the fully convolutional AlexNet is able to receive an expanded input, because the convolution and pooling operations are unrelated to the input size. Since the step size of the receptive field of the classification layer is 32, the expanded input should be (227 + 32n) × (227 + 32n) in order to keep the forward procedure applicable, where n indicates the expansion step size and is a non-negative integer.
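The size relation stated above can be made concrete with two small helpers: an expanded input of side 227 + 32n contains (1 + n) sliding 227-pixel windows per axis at stride 32, which is exactly the side of the resulting score map. The function names are illustrative.

```python
# Sketch of the input-size / score-map-size arithmetic for the fully
# convolutional AlexNet described above.

def expanded_input_side(n):
    assert n >= 0, "n must be a non-negative integer"
    return 227 + 32 * n

def score_map_side(n):
    # number of 227-pixel windows at stride 32 inside the expanded input
    return (expanded_input_side(n) - 227) // 32 + 1
```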
Given a proposed part patch (Xmin, Ymin, w, h) and n, the expanded cropping patch is (Xmin', Ymin', w', h'), where
Xmin' = Xmin − (16n / 227) × w
Ymin' = Ymin − (16n / 227) × h
w' = ((227 + 32n) / 227) × w
h' = ((227 + 32n) / 227) × h
Then we resize the patch to (227 + 32n) × (227 + 32n) and feed it into the fully convolutional AlexNet. As a result, (1 + n) × (1 + n) neighboring 227 × 227 patches are explored simultaneously, while the expanded scale remains the same as the proposal scale. The final output of conv8 can be viewed as a (1 + n) × (1 + n) score map S, in which each score corresponds to a 227 × 227 region. The final score of the part patch is defined as
s = max_{i,j} ( S_{i,j} − P_{i,j} )
where P_{i,j} is a penalty term with respect to the relative shifting distance from the proposed part box. Denoting by c the index of the center (un-shifted) patch in the score map, the penalty may take the form

P_{i,j} = a × ( |i − c| + |j − c| ) + b × sqrt( (i − c)^2 + (j − c)^2 )

where a is the single-orientation shifting penalty weight, and b is the geometrical distance penalty weight.
In this implementation, we set n = 2 for all part prototypes and search for the values of a and b for each part prototype by 6-fold cross validation on the training set. Fig. 6b shows an example of the full body part detector with 9 neighboring patches evaluated, where a = 2 and b = 10. Shifting handling is a kind of context modeling that keeps scale invariance, whereas simply cropping a larger region with padding and resizing it to 227 × 227 would cause a scale gap between the training and testing stages.
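The shifting-handling computation above can be sketched as follows. The crop expansion follows the (16n/227, 32n/227) relation given earlier; the penalty expression is an illustration consistent with the a/b weights described in the text, assuming the un-shifted proposal sits at the center of the score map, and is not the claimed formula itself.

```python
import math

def expand_patch(x_min, y_min, w, h, n):
    """Expanded crop so that (1+n)^2 neighboring 227x227 windows are covered."""
    x_min2 = x_min - (16 * n / 227.0) * w
    y_min2 = y_min - (16 * n / 227.0) * h
    w2 = (227 + 32 * n) / 227.0 * w
    h2 = (227 + 32 * n) / 227.0 * h
    return x_min2, y_min2, w2, h2

def part_score(score_map, a, b):
    """Max over the (1+n) x (1+n) score map of score minus shift penalty."""
    size = len(score_map)
    c = (size - 1) / 2.0  # center index = un-shifted proposal
    best = -float("inf")
    for i in range(size):
        for j in range(size):
            penalty = a * (abs(i - c) + abs(j - c)) + b * math.hypot(i - c, j - c)
            best = max(best, score_map[i][j] - penalty)
    return best
```

With n = 2 the score map is 3 × 3, and a shifted position must beat the center score by more than its penalty to be selected.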
As shown in Fig. 7, the detector selecting unit 400 further comprises a weight learning module 401 for learning combination weights of all part detectors, a selection module 402 for selecting one or more part detectors according to the combination weights, and a relearning module 403 for relearning the combination weights of the selected part detectors.
For each part prototype, the output of its ConvNet detector may be used directly as the visible score, instead of stacking a linear SVM on top as in the RCNN framework. It is found that appending an SVM detector for mining hard negatives does not show significant improvement over directly using the ConvNet output, especially for GoogLeNet. This may be due to the fact that the training proposals generated by LDCF are already hard negatives. Thus, the SVM training stage is safely removed to save feature extraction time.
Then a linear SVM is employed to learn complementarity over the 45 part detector scores. To alleviate the testing computation cost, the 6 parts with the highest SVM weights are simply selected, yielding approximately the same performance. It is also shown that the performance improvement mainly benefits from part complementarity. Fig. 8 is an illustration of the selected parts and their weights.
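The select-by-weight step can be sketched as follows. For a self-contained example, a tiny logistic-regression loop stands in for the linear SVM named in the text; the function names, learning rate, and epoch count are illustrative assumptions.

```python
import math

# Sketch: learn linear combination weights over per-sample part-score
# vectors, then keep the k parts with the largest weights (the text keeps
# 6 of 45; a weight relearning pass on the kept parts would follow).

def learn_weights(scores, labels, lr=0.1, epochs=200):
    """scores: list of per-sample part-score vectors; labels: 0/1."""
    d = len(scores[0])
    w = [0.0] * d
    b = 0.0
    for _ in range(epochs):
        for x, y in zip(scores, labels):
            z = sum(wi * xi for wi, xi in zip(w, x)) + b
            p = 1.0 / (1.0 + math.exp(-z))   # sigmoid prediction
            g = p - y                        # gradient of the log loss
            w = [wi - lr * g * xi for wi, xi in zip(w, x)]
            b -= lr * g
    return w, b

def select_top_parts(w, k=6):
    """Indices of the k part detectors with the largest weights."""
    return sorted(range(len(w)), key=lambda i: w[i], reverse=True)[:k]
```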
The testing patch generator 600 further comprises an extracting module for extracting testing part patches from the candidate boxes generated by the second box generator 500 as the generated testing patches for each body part corresponding to the selected part detectors.
As shown in Fig. 9, the testing unit 700 further comprises an evaluation module 701 and a result generation module 702. The evaluation module 701 may be configured to evaluate a score of each body part using the corresponding part detector from the testing part  patches, the selected part detectors and the relearned combination weights. The result generation module 702 may be configured to generate a detection score by combining the score of each body part in a weighted manner.
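The testing unit's weighted fusion can be sketched as follows; `detectors` maps a part id to a scoring function and `weights` holds the relearned combination weights, with all names being illustrative stand-ins.

```python
# Sketch: each selected part detector scores its testing patch, and the
# detection score is the weighted sum of the per-part scores.

def detection_score(part_patches, detectors, weights):
    total = 0.0
    for part_id, patch in part_patches.items():
        total += weights[part_id] * detectors[part_id](patch)
    return total
```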
Fig. 10 is a schematic flowchart illustrating a method 2000 for pedestrian detection according to an embodiment of the present application. Hereinafter, the method 2000 may be described in detail with respect to Fig. 10.
At step S210, candidate boxes are generated from a plurality of pedestrian training images, for example, by employing a region proposal method such as Selective Search, Edgebox, and LDCF.
At step S220, training part patches are generated from ground truth boxes and the candidate boxes generated from the plurality of pedestrian training images.
As shown in Fig. 11, the step S220 of generating training part patches comprises the following steps. Specifically, at step S221, the candidate boxes are labeled as negative or positive candidate boxes by comparison with the ground truth boxes. At step S222, for each body part, negative and positive training part patches are extracted as the training part patches from the negative and positive candidate boxes.
And then the method 2000 proceeds with step S230, at which part detectors are trained from the training part patches.
As shown in Fig. 12, the step S230 of training part detectors comprises the following steps. Specifically, at step S231, the positive and negative training part patches are mixed and split into batches. At step S232, each part detector is iteratively trained using these batches until all part detectors converge. At step S233, for each part detector, parameters are learned for handling shifting.
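The mixing and batching of step S231 can be sketched as follows; the fixed seed and batch size are illustrative choices, and the part detector models themselves (step S232) are the ConvNets described earlier.

```python
import random

# Sketch of step S231: mix positive and negative part patches with their
# labels, shuffle them, and split the result into fixed-size batches.

def make_batches(positives, negatives, batch_size, seed=0):
    samples = [(p, 1) for p in positives] + [(n, 0) for n in negatives]
    random.Random(seed).shuffle(samples)         # mix
    return [samples[i:i + batch_size]            # split into batches
            for i in range(0, len(samples), batch_size)]
```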
And then the method 2000 proceeds with step S240 of selecting complementary part detectors from all the trained part detectors.
As shown in Fig. 13, the step S240 of selecting complementary part detectors comprises a step S241 of learning combination weights of all part detectors, a step S242 of selecting one or more part detectors according to the combination weights, and a step S243 of relearning the combination weights of the selected part detectors.
And then the method 2000 proceeds with step S250 at which corresponding candidate boxes are generated from a plurality of pedestrian testing images.
And then the method 2000 proceeds with step S260 at which testing part patches are generated from the candidate boxes generated from the plurality of pedestrian testing images.
The step S260 of generating testing part patches further comprises extracting testing part patches from the candidate boxes generated from the plurality of pedestrian testing images as the generated testing part patches for each body part corresponding to the selected part detectors.
And then the method 2000 proceeds with step S270 at which a detection result is generated from the testing part patches and the selected part detectors.
As shown in Fig. 14, the step S270 of generating a detection result comprises the following steps. At step S271, a score of each body part is evaluated using the corresponding part detector, from the testing part patches, the selected part detectors and the relearned combination weights. At step S272, a detection result is generated by combining the score of each body part in a weighted manner.
Fig. 15 shows a system 3000 for pedestrian detection. The system 3000 comprises: a memory 310 that stores executable components; and a processor 320 electrically coupled to the memory 310 that executes the executable components to perform operations of the system 3000. The executable components comprise: a first box generating component 311 configured for generating candidate boxes from a plurality of pedestrian training images; a training patch generating component 312 configured for generating training part patches from the candidate boxes generated by the first box generator and ground truth boxes; a detector training component 313 configured for training one or more part detectors from the generated training part patches; a detector selecting component 314 configured for selecting complementary part detectors from all the trained part detectors; a second box generating component 315 configured for generating candidate boxes from a plurality of pedestrian testing images; a testing patch generating component 316 configured for generating testing part patches from the candidate boxes generated by the second box generator; and a testing component 317 configured for generating a detection result from the testing part patches and the selected part  detectors.
The present application is based on “Deep Learning Strong Parts for Pedestrian Detection” and addresses the problem of detecting pedestrians in a single image, aiming at constructing a pedestrian detector that can handle occlusion at different levels. The input is a single static image, and the output consists of detected bounding boxes and confidence scores.
Obviously, those skilled in the art can make variations or modifications to the present invention without departing from the spirit and scope of the present invention. As such, if these variations or modifications belong to the scope of the claims and equivalent technique, they may also fall into the scope of the present invention.

Claims (25)

  1. An apparatus for pedestrian detection, comprising:
    a first box generator for generating candidate boxes from a plurality of pedestrian training images;
    a training patch generator for generating training part patches from the candidate boxes generated by the first box generator and ground truth boxes;
    a detector training unit for training one or more part detectors from the generated training part patches;
    a detector selecting unit for selecting complementary part detectors from all the trained part detectors;
    a second box generator for generating candidate boxes from a plurality of pedestrian testing images;
    a testing patch generator for generating testing part patches from the candidate boxes generated by the second box generator; and
    a testing unit for generating a detection result from the testing part patches and the selected part detectors.
  2. The apparatus of claim 1, wherein the training patch generator comprises:
    a labeling module configured to label the candidate boxes as negative or positive candidate boxes by comparison with the ground truth boxes; and
    an extracting module configured to extract negative and positive training part patches, as the generated training part patches, from the negative and positive candidate boxes for each body part.
  3. The apparatus of claim 2, wherein the detector training unit comprises:
    a mixing module configured to mix the positive and negative training part patches and split them into batches;
    a training module configured to iteratively train each part detector by using the batches until each of all part detectors converges.
  4. The apparatus of claim 2, wherein the detector training unit further comprises:
    a parameter learning module configured to learn parameters for handling shifting for each part detector.
  5. The apparatus of claim 3, wherein the detector selecting unit comprises:
    a weight learning module configured to learn combination weights of all part detectors; and
    a selection module configured to select the complementary part detectors according to the combination weights.
  6. The apparatus of claim 5, wherein the detector selecting unit further comprises:
    a relearning module configured to relearn the combination weights of the selected complementary part detectors.
  7. The apparatus of claim 5, wherein the testing patch generator further comprises:
    an extracting module configured to extract testing part patches from the candidate boxes generated by the second box generator as the generated testing patches for each body part corresponding to the selected part detectors.
  8. The apparatus of claim 7, wherein the testing unit further comprises:
    an evaluation module configured to evaluate a score of each body part using the corresponding part detector from the testing part patches, the selected part detectors and the relearned combination weights; and
    a result generation module configured to generate a detection result by combining the score of each body part in a weighted manner.
  9. A method for pedestrian detection, comprising:
    generating candidate boxes from a plurality of pedestrian training images;
    generating training part patches from the candidate boxes generated from the plurality of pedestrian training images and ground truth boxes;
    training one or more part detectors from the training part patches;
    selecting complementary part detectors from all the trained part detectors;
    generating candidate boxes from a plurality of pedestrian testing images;
    generating testing part patches from the candidate boxes generated from the plurality of pedestrian testing images; and
    generating a detection result from the testing part patches and the selected part detectors.
  10. The method of claim 9, wherein the step of generating training part patches comprises:
    labeling the candidate boxes as negative or positive candidate boxes by comparison with the ground truth boxes; and
    extracting negative and positive training part patches, as the generated training part patches, from the negative and positive candidate boxes for each body part.
  11. The method of claim 10, wherein the step of training part detectors comprises:
    mixing the positive and negative training part patches and splitting them into batches; and
    iteratively training each part detector by using the batches until each of all part detectors converges.
  12. The method of claim 11, wherein the step of training part detectors further comprises:
    for each part detector, learning parameters for handling shifting.
  13. The method of claim 11, wherein the step of selecting complementary part detectors comprises:
    learning combination weights of all part detectors; and
    selecting the complementary part detectors according to the combination weights.
  14. The method of claim 13, wherein the step of selecting complementary part detectors further comprises:
    relearning the combination weights of the selected complementary part detectors.
  15. The method of claim 13, wherein the step of generating testing part patches comprises:
    for each body part corresponding to the selected part detectors, extracting testing part patches from the candidate boxes generated from the plurality of pedestrian testing images as the generated testing part patches.
  16. The method of claim 15, wherein the step of generating detection result comprises:
    evaluating a score of each body part using the corresponding part detector from the testing part patches, the selected part detectors and the relearned combination weights; and
    generating a detection result by combining the score of each body part in a weighted manner.
  17. A system for pedestrian detection, comprising:
    a memory that stores executable components; and
    a processor electrically coupled to the memory that executes the executable components to perform operations of the system, wherein the executable components comprise:
    a first box generating component configured for generating candidate boxes from a plurality of pedestrian training images;
    a training patch generating component configured for generating training part patches from the candidate boxes generated by the first box generator and ground truth boxes;
    a detector training component configured for training one or more part detectors from the generated training part patches;
    a detector selecting component configured for selecting complementary part detectors from all the trained part detectors;
    a second box generating component configured for generating candidate boxes from a plurality of pedestrian testing images;
    a testing patch generating component configured for generating testing part patches from the candidate boxes generated by the second box generator; and
    a testing component configured for generating a detection result from the testing part patches and the selected part detectors.
  18. The system according to claim 17, wherein the training patch generating component further comprises:
    a labeling sub-component configured to label the candidate boxes as negative or positive candidate boxes by comparison with the ground truth boxes; and
    an extracting sub-component configured to extract negative and positive training part patches, as the generated training part patches, from the negative and positive candidate boxes for each body part.
  19. The system according to claim 18, wherein the detector training component further comprises:
    a mixing sub-component configured to mix the positive and negative training part patches and split them into batches;
    a training sub-component configured to iteratively train each part detector by using the batches until each of all part detectors converges.
  20. The system according to claim 19, wherein the detector training component further comprises:
    a parameter learning sub-component configured to learn parameters for handling shifting for each part detector.
  21. The system according to claim 19, wherein the detector selecting component further comprises:
    a weight learning sub-component configured to learn combination weights of all part detectors; and
    a selection sub-component configured to select the complementary part detectors according to the combination weights.
  22. The system according to claim 21, wherein the detector selecting component further comprises:
    a relearning sub-component configured to relearn the combination weights of the selected complementary part detectors.
  23. The system according to claim 21, wherein the testing patch generating component further comprises:
    an extracting sub-component configured to extract testing part patches from the candidate boxes generated by the second box generator as the generated testing patches for each body part corresponding to the selected part detectors.
  24. The system according to claim 21, wherein the testing patch generating component further comprises:
    an extracting sub-component configured to extract testing part patches from the candidate boxes generated by the second box generator as the generated testing patches for each body part corresponding to the selected part detectors.
  25. The system according to claim 24, wherein the testing component further comprises:
    an evaluation sub-component configured to evaluate a score of each body part using the corresponding part detector from the testing part patches, the selected part detectors and the relearned combination weights; and
    a result generation sub-component configured to generate a detection result by combining the score of each body part in a weighted manner.
PCT/CN2015/091517 2015-10-09 2015-10-09 Apparatus and method for pedestrian detection WO2017059576A1 (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
PCT/CN2015/091517 WO2017059576A1 (en) 2015-10-09 2015-10-09 Apparatus and method for pedestrian detection
CN201610876667.7A CN106570453B (en) 2015-10-09 2016-09-29 Method, device and system for pedestrian detection

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
PCT/CN2015/091517 WO2017059576A1 (en) 2015-10-09 2015-10-09 Apparatus and method for pedestrian detection

Publications (1)

Publication Number Publication Date
WO2017059576A1 true WO2017059576A1 (en) 2017-04-13

Family

ID=58487177

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2015/091517 WO2017059576A1 (en) 2015-10-09 2015-10-09 Apparatus and method for pedestrian detection

Country Status (2)

Country Link
CN (1) CN106570453B (en)
WO (1) WO2017059576A1 (en)

Families Citing this family (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US11188794B2 (en) 2017-08-10 2021-11-30 Intel Corporation Convolutional neural network framework using reverse connections and objectness priors for object detection
CN109697441B (en) 2017-10-23 2021-02-12 杭州海康威视数字技术股份有限公司 Target detection method and device and computer equipment
CN109447276B (en) * 2018-09-17 2021-11-02 烽火通信科技股份有限公司 Machine learning system, equipment and application method
CN109359558B (en) * 2018-09-26 2020-12-25 腾讯科技(深圳)有限公司 Image labeling method, target detection method, device and storage medium
CN110298302B (en) * 2019-06-25 2023-09-08 腾讯科技(深圳)有限公司 Human body target detection method and related equipment

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20090097739A1 (en) * 2007-10-10 2009-04-16 Honeywell International Inc. People detection in video and image data
US8131011B2 (en) * 2006-09-25 2012-03-06 University Of Southern California Human detection and tracking system
CN104217225A (en) * 2014-09-02 2014-12-17 中国科学院自动化研究所 A visual target detection and labeling method
US9042601B2 (en) * 2013-03-14 2015-05-26 Nec Laboratories America, Inc. Selective max-pooling for object detection

Family Cites Families (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102136075B (en) * 2011-03-04 2013-05-15 杭州海康威视数字技术股份有限公司 Multiple-viewing-angle human face detecting method and device thereof under complex scene
EP2574958B1 (en) * 2011-09-28 2017-02-22 Honda Research Institute Europe GmbH Road-terrain detection method and system for driver assistance systems
CN102609682B (en) * 2012-01-13 2014-02-05 北京邮电大学 Feedback pedestrian detection method for region of interest
CN103440487B (en) * 2013-08-27 2016-11-02 电子科技大学 A kind of natural scene text location method of local tone difference

Cited By (19)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US11182889B2 (en) 2017-09-29 2021-11-23 Alibaba Group Holding Limited System and method for authenticating physical objects based on captured images
US10762334B2 (en) 2017-09-29 2020-09-01 Alibaba Group Holding Limited System and method for entity recognition
US11720572B2 (en) 2018-01-08 2023-08-08 Advanced New Technologies Co., Ltd. Method and system for content recommendation
US11978000B2 (en) 2018-02-01 2024-05-07 Advanced New Technologies Co., Ltd. System and method for determining a decision-making strategy
TWI761642B (en) * 2018-02-01 2022-04-21 開曼群島商創新先進技術有限公司 Method, device and electronic device for determining decision-making strategy corresponding to business
US11790632B2 (en) 2018-08-24 2023-10-17 Advanced New Technologies Co., Ltd. Method and apparatus for sample labeling, and method and apparatus for identifying damage classification
US11010838B2 (en) 2018-08-31 2021-05-18 Advanced New Technologies Co., Ltd. System and method for optimizing damage detection results
US11113582B2 (en) 2018-08-31 2021-09-07 Advanced New Technologies Co., Ltd. Method and system for facilitating detection and identification of vehicle parts
US11080839B2 (en) 2018-08-31 2021-08-03 Advanced New Technologies Co., Ltd. System and method for training a damage identification model
US11216690B2 (en) 2018-08-31 2022-01-04 Alibaba Group Holding Limited System and method for performing image processing based on a damage assessment image judgement model
US11475660B2 (en) 2018-08-31 2022-10-18 Advanced New Technologies Co., Ltd. Method and system for facilitating recognition of vehicle parts based on a neural network
US11748399B2 (en) 2018-08-31 2023-09-05 Advanced New Technologies Co., Ltd. System and method for training a damage identification model
US11972599B2 (en) 2018-09-04 2024-04-30 Advanced New Technologies Co., Ltd. Method and apparatus for generating vehicle damage image on the basis of GAN network
US11069048B2 (en) 2018-09-07 2021-07-20 Advanced New Technologies Co., Ltd. System and method for facilitating efficient damage assessments
WO2020051545A1 (en) * 2018-09-07 2020-03-12 Alibaba Group Holding Limited Method and computer-readable storage medium for generating training samples for training a target detector
CN111914863A (en) * 2019-05-09 2020-11-10 顺丰科技有限公司 Target detection method and device, terminal equipment and computer readable storage medium
WO2021212737A1 (en) * 2020-04-23 2021-10-28 苏州浪潮智能科技有限公司 Person re-identification method, system, and device, and computer readable storage medium
CN111523469B (en) * 2020-04-23 2022-02-18 苏州浪潮智能科技有限公司 Pedestrian re-identification method, system, equipment and computer readable storage medium
CN111523469A (en) * 2020-04-23 2020-08-11 苏州浪潮智能科技有限公司 Pedestrian re-identification method, system, equipment and computer readable storage medium

Also Published As

Publication number Publication date
CN106570453B (en) 2020-03-03
CN106570453A (en) 2017-04-19

Similar Documents

Publication Publication Date Title
WO2017059576A1 (en) Apparatus and method for pedestrian detection
US9965719B2 (en) Subcategory-aware convolutional neural networks for object detection
Zhang et al. Self-produced guidance for weakly-supervised object localization
JP6188400B2 (en) Image processing apparatus, program, and image processing method
US10860837B2 (en) Deep multi-task learning framework for face detection, landmark localization, pose estimation, and gender recognition
US10275719B2 (en) Hyper-parameter selection for deep convolutional networks
Endres et al. Category-independent object proposals with diverse ranking
Vicente et al. Leave-one-out kernel optimization for shadow detection
CN110383291B (en) System, method, and computer-readable medium for understanding machine learning decisions
Wang et al. Probabilistic inference for occluded and multiview on-road vehicle detection
US8965115B1 (en) Adaptive multi-modal detection and fusion in videos via classification-based-learning
CN110084299B (en) Target detection method and device based on multi-head fusion attention
US11803971B2 (en) Generating improved panoptic segmented digital images based on panoptic segmentation neural networks that utilize exemplar unknown object classes
KR102655789B1 (en) Face detecting method and apparatus
WO2016179808A1 (en) An apparatus and a method for face parts and face detection
KR102138680B1 (en) Apparatus for Video Recognition and Method thereof
Abbott et al. Deep object classification in low resolution lwir imagery via transfer learning
Khellal et al. Pedestrian classification and detection in far infrared images
Le et al. Co-localization with category-consistent features and geodesic distance propagation
Juang et al. Stereo-camera-based object detection using fuzzy color histograms and a fuzzy classifier with depth and shape estimations
Smitha et al. Optimal feed forward neural network based automatic moving vehicle detection system in traffic surveillance system
CN112418358A (en) Vehicle multi-attribute classification method for strengthening deep fusion network
Wu et al. Detection algorithm for dense small objects in high altitude image
Lotfi Trajectory clustering and behaviour retrieval from traffic surveillance videos
Anitha An Efficient Region Based Object Detection method using Deep learning Algorithm

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 15905674

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 15905674

Country of ref document: EP

Kind code of ref document: A1