WO2017059576A1 - Apparatus and method for pedestrian detection - Google Patents
- Publication number
- WO2017059576A1 (PCT/CN2015/091517)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- training
- testing
- patches
- detectors
- generating
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V20/00—Scenes; Scene-specific elements
- G06V20/50—Context or environment of the image
- G06V20/52—Surveillance or monitoring of activities, e.g. for recognising suspicious objects
- G06V20/53—Recognition of crowd images, e.g. recognition of crowd congestion
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/70—Arrangements for image or video recognition or understanding using pattern recognition or machine learning
- G06V10/77—Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
- G06V10/774—Generating sets of training patterns; Bootstrap methods, e.g. bagging or boosting
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/21—Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
- G06F18/214—Generating training patterns; Bootstrap methods, e.g. bagging or boosting
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/24—Classification techniques
- G06F18/241—Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
- G06F18/2411—Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches based on the proximity to a decision surface, e.g. support vector machines
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V40/00—Recognition of biometric, human-related or animal-related patterns in image or video data
- G06V40/10—Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
- G06V40/103—Static body considered as a whole, e.g. static pedestrian or occupant recognition
Definitions
- Occlusion has various patterns: the left or right half of the body may be occluded by a tree, and the lower half of the body may be occluded by a car.
- a part pool which contains various semantic body parts may be extensively constructed.
- A pedestrian can be considered as a rigid object on a 2m × m grid, where 2m and m indicate the numbers of grids in the vertical and horizontal dimensions, respectively. Each grid is square and of equal size. The grid is defined as the minimum unit, and each part prototype is constrained to be a rectangle.
- The sizes for part prototypes are defined over the grid as pairs (w, h) with w ≥ W_min and h ≥ H_min, where w and h indicate the width and height of a part prototype in terms of grids. W_min and H_min are used to avoid over-local parts, since the focus is on middle-level semantic parts. x and y are the coordinates of the top-left grid in the part prototype, and i is a unique id.
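The part pool construction described above can be sketched as follows. The values m = 3 and W_min = H_min = 2 are assumptions, chosen because they reproduce the pool of 45 prototypes referred to later in this document:

```python
from itertools import product

def build_part_pool(m=3, w_min=2, h_min=2):
    """Enumerate rectangular part prototypes on a 2m x m grid.

    The pedestrian template is 2m grids tall and m grids wide; each
    prototype is a rectangle (x, y, w, h) of whole grids with w >= w_min
    and h >= h_min to avoid over-local parts. (x, y) is the top-left grid.
    """
    rows, cols = 2 * m, m
    pool = []
    for w, h in product(range(w_min, cols + 1), range(h_min, rows + 1)):
        # Slide each admissible size over every valid top-left position.
        for x, y in product(range(cols - w + 1), range(rows - h + 1)):
            pool.append((x, y, w, h))
    return pool

pool = build_part_pool()
print(len(pool))  # 45 prototypes for m=3, w_min=h_min=2
```

With these assumed parameters the enumeration yields exactly 45 prototypes, matching the 45 part detector scores combined by the SVM below.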
- The first or second box generator 100 or 500 utilizes static images, such as training or testing images, as inputs and employs a pedestrian detector to detect the pedestrians in these images.
- A region proposal method such as “selective search”, “Edgebox”, or “LDCF” may be used to generate candidate bounding boxes.
- The size of the training or testing dataset is crucial for deep models, i.e., ConvNets.
- The Caltech dataset, now the largest pedestrian benchmark, consists of ~250k labeled frames and ~350k annotated bounding boxes.
- Unlike the typical Reasonable training setting, which uses every 30th image in the video and is composed of ~1.7k pedestrians, every frame is utilized here, employing ~50k pedestrian bounding boxes as positive training patches.
- Negative patches have < 0.5 IoU with any ground truth and are proposed by LDCF.
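The IoU-based labeling of proposals can be sketched as follows; the (x, y, w, h) box format and the 0.5 threshold follow the text, while the function names are illustrative:

```python
def iou(box_a, box_b):
    """Intersection-over-union of two boxes given as (x, y, w, h)."""
    ax, ay, aw, ah = box_a
    bx, by, bw, bh = box_b
    ix = max(0, min(ax + aw, bx + bw) - max(ax, bx))  # overlap width
    iy = max(0, min(ay + ah, by + bh) - max(ay, by))  # overlap height
    inter = ix * iy
    union = aw * ah + bw * bh - inter
    return inter / union if union else 0.0

def label_proposals(proposals, ground_truths, thr=0.5):
    """Label a proposal 1 (positive) if it overlaps any ground truth by
    at least `thr`, else 0 (negative, i.e. < thr IoU with every GT)."""
    return [
        1 if any(iou(p, gt) >= thr for gt in ground_truths) else 0
        for p in proposals
    ]
```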
- the training patch generator 200 further comprises a labeling module 201 for labeling the candidate boxes as negative or positive candidate boxes by comparison with the ground truth boxes, and an extracting module 202 for extracting negative and positive training part patches from the negative and positive candidate boxes for each body part such as leg, head, and upper body.
- Fig. 3 is an illustration of the training part patches, namely the output of the generator 200.
- Fig. 4 is an example of generating training data for each part detector.
- (1) Given a part prototype, the corresponding region within a negative pedestrian proposal is used as a negative sample for this part detector. This assumption owes to the fact that most upright pedestrians are well aligned, so the corresponding regions in negative and positive pedestrian patches should be different. For example, if a head-shoulder part occupied the upper one-third region of a negative proposal, that proposal would have to be regarded as a positive pedestrian patch according to prior knowledge.
- (2) Each pedestrian is annotated with two bounding boxes (BBs) that denote the visible part (B_vis) and the full part (B_full). The full part (B_full) is divided into 2m × m grids, and the IoU between the visible part (B_vis) and each grid is computed. The visible map is then obtained by thresholding the IoU value of each grid. If the visible grids of a ground truth cover the template grids of a given part prototype, the corresponding region can be extracted as a positive sample.
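A minimal sketch of the visible-map construction and the coverage test, assuming boxes in (x, y, w, h) form and a per-grid coverage fraction standing in for the IoU thresholding described above:

```python
def visible_map(b_full, b_vis, m=3, thr=0.5):
    """Binary 2m x m visibility map: a grid cell counts as visible when
    at least `thr` of its area lies inside the visible box b_vis.
    Both boxes are (x, y, w, h) in image coordinates."""
    fx, fy, fw, fh = b_full
    vx, vy, vw, vh = b_vis
    rows, cols = 2 * m, m
    cw, ch = fw / cols, fh / rows  # size of one grid cell
    vis = [[0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            gx, gy = fx + c * cw, fy + r * ch
            ox = max(0.0, min(gx + cw, vx + vw) - max(gx, vx))
            oy = max(0.0, min(gy + ch, vy + vh) - max(gy, vy))
            if ox * oy / (cw * ch) >= thr:
                vis[r][c] = 1
    return vis

def covers(vis, proto):
    """True when the visible grids cover the prototype (x, y, w, h) in
    grid units, i.e. the region can serve as a positive part sample."""
    x, y, w, h = proto
    return all(vis[r][c] for r in range(y, y + h) for c in range(x, x + w))
```

For a pedestrian whose lower half is occluded, an upper-body prototype passes the coverage test while a leg prototype does not.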
- The detector training unit 300 further comprises a mixing module 301 for mixing the positive and negative training part patches and splitting them into batches, a training module 302 for iteratively training each part detector using the batches of part patches until every part detector converges, and a parameter learning module 303 for learning parameters that handle shifting for each part detector.
- each body part would shift from its fixed template position, and different parts of the same pedestrian may shift towards different orientations.
- the positive training samples for each part detector are well aligned while the testing proposals may shift at all orientations. Thus, handling shifting for both the full body and parts is necessary.
- The input size of the fully convolutional ConvNet can be changed; the original input size is 227 × 227.
- The fully convolutional AlexNet is able to receive an expanded input because the convolution and pooling operations are unrelated to the input size. Since the step size of the receptive field for the classification layer is 32, the expanded input should be (227 + 32n) × (227 + 32n) in order to keep the forward procedure applicable, where n indicates the expanded step size and is a non-negative integer.
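The valid expanded input sizes follow directly from the 32-pixel stride described above:

```python
def expanded_input_size(n):
    """Valid input side length for the fully convolutional AlexNet
    variant: the base 227 plus n steps of the 32-pixel receptive-field
    stride of the classification layer; n must be a non-negative
    integer for the forward procedure to stay applicable."""
    if not (isinstance(n, int) and n >= 0):
        raise ValueError("n must be a non-negative integer")
    return 227 + 32 * n

print([expanded_input_size(n) for n in range(4)])  # [227, 259, 291, 323]
```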
- The expanded cropping patch is denoted (x_min′, y_min′, w′, h′).
- P_{i,j} is a penalty term with respect to the relative shifting distance from the proposed part box, where a is the single-orientation shifting penalty weight and b is a geometrical-distance penalty weight. n is set to 2 for all part prototypes, and the values of a and b are searched for each part prototype by 6-fold cross-validation on the training set.
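The exact form of P_{i,j} is not reproduced in this excerpt; the sketch below is a plausible instance combining a per-orientation term weighted by a with a geometric-distance term weighted by b, and the functional form itself is an assumption:

```python
import math

def shift_penalty(i, j, a, b):
    """Hypothetical instance of the penalty P_{i,j} for a part box
    shifted by (i, j) stride units: `a` weights the per-orientation
    shift magnitudes and `b` weights the geometric distance. The exact
    formula in the source is not reproduced here."""
    return a * (abs(i) + abs(j)) + b * math.hypot(i, j)

def best_shifted_score(scores, a, b):
    """Pick the shifted part box whose detector score, minus its shift
    penalty, is highest. `scores` maps a shift (i, j) to a score."""
    return max(s - shift_penalty(i, j, a, b) for (i, j), s in scores.items())
```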
- the detector selecting unit 400 further comprises a weight learning module 401 for learning combination weights of all part detectors, a selection module 402 for selecting one or more part detectors according to the combination weights, and a relearning module 403 for relearning the combination weights of the selected part detectors.
- The output of its ConvNet detector may be used directly as the visible score, instead of stacking a linear SVM on top as in the RCNN framework. It is found that appending an SVM detector for mining hard negatives does not show significant improvement over directly using the ConvNet output, especially for GoogLeNet. This may be due to the fact that the training proposals generated by LDCF are already hard negatives. Thus, the SVM training stage is safely removed to save feature extraction time.
- A linear SVM is employed to learn complementarity over the 45 part detector scores.
- Simply selecting the 6 parts with the highest SVM weights yields approximately the same performance, which shows that the performance improvement mainly benefits from part complementarity.
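Selecting the complementary parts by SVM weight can be sketched as follows (the function name and plain-list interface are illustrative; relearning the combination weights over the selected subset would follow):

```python
def select_complementary_parts(weights, k=6):
    """Return the indices of the k part detectors with the highest SVM
    combination weights, in ascending index order. After selection, the
    combination weights are relearned over just these detectors."""
    ranked = sorted(range(len(weights)), key=lambda i: weights[i], reverse=True)
    return sorted(ranked[:k])
```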
- Fig. 8 is an illustration of the selected parts and their weights.
- the testing patch generator 600 further comprises an extracting module for extracting testing part patches from the candidate boxes generated by the second box generator 500 as the generated testing patches for each body part corresponding to the selected part detectors.
- the testing unit 700 further comprises an evaluation module 701 and a result generation module 702.
- the evaluation module 701 may be configured to evaluate a score of each body part using the corresponding part detector from the testing part patches, the selected part detectors and the relearned combination weights.
- the result generation module 702 may be configured to generate a detection score by combining the score of each body part in a weighted manner.
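The weighted combination performed by the result generation module 702 amounts to a linear fusion of per-part scores; this sketch assumes scores and relearned weights are supplied as parallel lists:

```python
def detection_score(part_scores, part_weights):
    """Final confidence for one candidate box: the selected part
    detector scores combined linearly with the relearned SVM weights."""
    assert len(part_scores) == len(part_weights)
    return sum(s * w for s, w in zip(part_scores, part_weights))
```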
- Fig. 10 is a schematic flowchart illustrating a method 2000 for pedestrian detection according to an embodiment of the present application.
- the method 2000 may be described in detail with respect to Fig. 10.
- Candidate boxes are generated from a plurality of pedestrian training images, for example, by employing a region proposal method such as Selective Search, Edgebox, or LDCF.
- Training part patches are generated from ground truth boxes and the candidate boxes, which are generated from the plurality of pedestrian training images.
- The step S220 of generating training part patches comprises the following steps.
- The candidate boxes are labeled as negative or positive candidate boxes by comparison with the ground truth boxes.
- Negative and positive training part patches are extracted as the training part patches from the negative and positive candidate boxes.
- The method proceeds to step S230, at which part detectors are trained from the training part patches.
- The step S230 of training part detectors comprises the following steps.
- The positive and negative training part patches are mixed and split into batches.
- Each part detector is iteratively trained using these batches until all part detectors converge.
- Parameters for handling shifting are learned.
- The step S240 of selecting complementary part detectors comprises a step S241 of learning combination weights of all part detectors, a step S242 of selecting one or more part detectors according to the combination weights, and a step S243 of relearning the combination weights of the selected part detectors.
- The method proceeds to step S250, at which corresponding candidate boxes are generated from a plurality of pedestrian testing images.
- At step S260, testing part patches are generated from the candidate boxes generated from the plurality of pedestrian testing images.
- The step S260 of generating testing part patches further comprises extracting testing part patches from the candidate boxes generated from the plurality of pedestrian testing images, as the generated testing part patches for each body part corresponding to the selected part detectors.
- At step S270, a detection result is generated from the testing part patches and the selected part detectors.
- The step S270 of generating the detection result comprises the following steps.
- a score of each body part is evaluated using the corresponding part detector from the testing part patches, the selected part detectors and the relearned combination weights.
- a detection result is generated by combining the score of each body part in a weighted manner.
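The flow of method 2000 (S210 through S270) can be sketched as a single driver that receives the concrete operations as callables; all names and the dict interface here are illustrative assumptions, not the patent's API:

```python
def run_pipeline(train_images, test_images, gt_boxes, ops):
    """ops is a dict of callables standing in for the units described
    above: propose, make_train_patches, train, select, make_test_patches,
    and score (all hypothetical names)."""
    boxes = [ops["propose"](im) for im in train_images]        # S210
    patches = ops["make_train_patches"](boxes, gt_boxes)       # S220
    detectors = ops["train"](patches)                          # S230
    selected, weights = ops["select"](detectors)               # S240
    results = []
    for im in test_images:
        tboxes = ops["propose"](im)                            # S250
        tpatches = ops["make_test_patches"](tboxes, selected)  # S260
        results.append(ops["score"](tpatches, selected, weights))  # S270
    return results
```

With trivial stub callables this driver exercises the step ordering; in a real system each entry would wrap the corresponding generator, training, selection, and testing unit.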
- Fig. 15 shows a system 3000 for pedestrian detection.
- the system 3000 comprises: a memory 310 that stores executable components; and a processor 320 electrically coupled to the memory 310 that executes the executable components to perform operations of the system 3000.
- the executable components comprise: a first box generating component 311 configured for generating candidate boxes from a plurality of pedestrian training images; a training patch generating component 312 configured for generating training part patches from the candidate boxes generated by the first box generator and ground truth boxes; a detector training component 313 configured for training one or more part detectors from the generated training part patches; a detector selecting component 314 configured for selecting complementary part detectors from all the trained part detectors; a second box generating component 315 configured for generating candidate boxes from a plurality of pedestrian testing images; a testing patch generating component 316 configured for generating testing part patches from the candidate boxes generated by the second box generator; and a testing component 317 configured for generating a detection result from the testing part patches and the selected part detectors.
- The present application derives from “Deep Learning Strong Parts for Pedestrian Detection” and is intended to address the problem of detecting pedestrians in a single image, aiming at constructing a pedestrian detector that can handle occlusion at different levels.
- the input is a single static image, and the output consists of detected bounding boxes and confidence scores.
Abstract
Disclosed is an apparatus for pedestrian detection. The apparatus comprises: a first box generator for generating candidate boxes from a plurality of pedestrian training images; a training patch generator for generating training part patches from the candidate boxes generated by the first box generator and ground truth boxes; a detector training unit for training part detectors from the training part patches; a detector selecting unit for selecting complementary part detectors from all the trained part detectors; a second box generator for generating candidate boxes from a plurality of pedestrian testing images; a testing patch generator for generating testing part patches from the candidate boxes generated by the second box generator; and a testing unit for generating a detection result from the testing part patches and the selected part detectors. A method and a system for pedestrian detection are also disclosed.
Description
The present application generally relates to a field of pedestrian detection, more particularly, to an apparatus and a method for pedestrian detection.
Pedestrian detection has numerous applications in video surveillance, robotics and automotive safety, and has been studied extensively in recent years. While pedestrian detection quality has achieved steady improvements over the last several years, occlusion is still an obstacle to constructing a good pedestrian detector. For example, the current best performing detector, SpatialPooling+, attains a 75% average miss rate reduction over the VJ detector at the no-occlusion level, while attaining only 21% over VJ at the heavy-occlusion level. Occlusion is frequent: around 70% of all pedestrians in street scenes are occluded in at least one frame. Current pedestrian detectors for occlusion handling can generally be grouped into two categories: training specific detectors for different occlusion types, and modeling part visibility with latent variables. In the first category, constructing a specific detector requires prior knowledge of occlusion types. The second kind of approach divides the pedestrian template into several parts and infers the visibility with latent variables. Though these methods achieve promising results, manually selecting parts may not be the optimal solution and may fail when handling pedestrian detection in scenarios beyond the street, such as crowded scenes and market surveillance, where occlusion types may change. Thus there is a need to utilize extensive part detectors to handle pedestrian occlusion at different levels and thereby improve pedestrian detection.
Summary
According to an embodiment of the present application, disclosed is an apparatus for pedestrian detection. The apparatus comprises: a first box generator for generating candidate boxes from a plurality of pedestrian training images; a training patch generator for generating training part patches from the candidate boxes generated by the first box generator and ground truth boxes; a detector training unit for training one or more part detectors from the generated training part patches; a detector selecting unit for selecting complementary part detectors from all the trained part detectors; a second box generator for generating candidate boxes from a plurality of pedestrian testing images; a testing patch generator for generating testing part patches from the candidate boxes generated by the second box generator; and a testing unit for generating a detection result from the testing part patches and the selected part detectors.
According to another embodiment of the present application, disclosed is a method for pedestrian detection. The method comprises: generating candidate boxes from a plurality of pedestrian training images; generating training part patches from the candidate boxes generated from the plurality of pedestrian training images and ground truth boxes; training one or more part detectors from the generated training part patches; selecting complementary part detectors from all the trained part detectors; generating candidate boxes from a plurality of pedestrian testing images; generating testing part patches from the candidate boxes generated from the plurality of pedestrian testing images; and generating a detection result from the testing part patches and the selected part detectors.
According to yet another embodiment of the present application, disclosed is a system for pedestrian detection. The system comprises: a memory that stores executable components; and a processor electrically coupled to the memory that executes the executable components to perform operations of the system, wherein the executable components comprise: a first box generating component configured for generating candidate boxes from a plurality of pedestrian training images; a training patch generating component configured for generating training part patches from the candidate boxes generated by the first box generator and ground truth boxes; a detector training component configured for training one or more part detectors from the generated training part patches; a detector selecting component configured for selecting complementary part detectors from all the trained part detectors; a second box generating component configured for generating candidate boxes from a plurality of pedestrian testing images; a testing patch generating component configured for generating testing part patches from the candidate boxes generated by the second box generator; and a
testing component configured for generating a detection result from the testing part patches and the selected part detectors.
The present invention has following characteristics:
1) hard negative reduction - with the assistance of deep learning pedestrian attribute and scene attribute tasks, the number of hard negatives is significantly decreased;
2) weakly supervised training - this system can be trained with only weakly labeled data, i.e., the required supervision is a pedestrian bounding box instead of strong part annotations such as leg and arm;
3) strong part detectors - each part detector is already a strong detector, capable of detecting a pedestrian by observing only part of a candidate box; and
4) complementary parts selection - since not all part detectors are equally weighted and necessary in different scenarios, the present system can automatically select complementary parts and decide their weights.
Exemplary non-limiting embodiments of the present invention are described below with reference to the attached drawings. The drawings are illustrative and generally not to an exact scale. The same or similar elements on different figures are referenced with the same reference numbers.
Fig. 1 is a schematic diagram illustrating a system for pedestrian detection according to an embodiment of the present application.
Fig. 2 is a schematic diagram illustrating a training patch generator according to an embodiment of the present application.
Fig. 3 is an illustration of the training part patches according to an embodiment of the present application.
Fig. 4 is an example of generating training data for each part detector.
Fig. 5 is a schematic diagram illustrating a detector training unit according to another embodiment of the present application.
Fig. 6a shows how rapidly IoU decreases with small shifts in the horizontal and vertical orientations.
Fig. 6b shows how the shifting problem is handled in AlexNet.
Fig. 7 is a schematic diagram illustrating a detector selecting unit according to an embodiment of the present application.
Fig. 8 is an example of the selected parts and their weights.
Fig. 9 is a schematic diagram illustrating a testing unit according to an embodiment of the present application.
Fig. 10 is a schematic flowchart illustrating a method for pedestrian detection according to an embodiment of the present application.
Fig. 11 is a schematic flowchart illustrating a process for generating training part patches according to an embodiment of the present application.
Fig. 12 is a schematic flowchart illustrating a process for training part detectors according to an embodiment of the present application.
Fig. 13 is a schematic flowchart illustrating a process for selecting complementary part detectors according to an embodiment of the present application.
Fig. 14 is a schematic flowchart illustrating a process for generating detection result according to an embodiment of the present application.
Fig. 15 illustrates a system for pedestrian detection according to an embodiment of the present application.
Reference will now be made in detail to exemplary embodiments, examples of which are illustrated in the accompanying drawings. Where appropriate, the same reference numbers are used throughout the drawings to refer to the same or like parts. Fig. 1 is a schematic diagram illustrating an exemplary apparatus 1000 for pedestrian detection consistent with some disclosed embodiments.
It shall be appreciated that the apparatus 1000 may be implemented using certain hardware, software, or a combination thereof. In addition, the embodiments of the present invention may be adapted to a computer program product embodied on one or more computer
readable storage media (comprising but not limited to disk storage, CD-ROM, optical memory and the like) containing computer program codes.
In the case that the apparatus 1000 is implemented with software, the apparatus 1000 can be run in one or more systems that may include a general-purpose computer, a computer cluster, a mainstream computer, a computing device dedicated to providing online contents, or a computer network comprising a group of computers operating in a centralized or distributed fashion.
Referring to Fig. 1 again, where the apparatus 1000 is implemented by hardware, it may comprise a first box generator 100, a training patch generator 200, a detector training unit 300, a detector selecting unit 400, a second box generator 500, a testing patch generator 600, and a testing unit 700. In the embodiment shown in Fig. 1, the first box generator 100 may be configured to generate candidate boxes from a plurality of pedestrian training images. Particularly, most pedestrian patches are kept while most negative patches are filtered out. The training patch generator 200 may be configured to generate training part patches from the candidate boxes generated by the first box generator 100 and ground truth boxes. Particularly, extensive part patches, such as the leg, head, and upper body, are extracted for each candidate box. The detector training unit 300 may be configured to train one or more part detectors from the training part patches. The detector selecting unit 400 may be configured to select complementary part detectors from all the trained part detectors. The output of the detector selecting unit 400 may be a combination of the selected complementary part detectors. Each of the complementary part detectors may be selected based on its weight in a support vector machine (SVM). In some embodiments, the complementary part detectors may be those having the largest weights in the SVM. The second box generator 500 may be configured to generate candidate boxes from a plurality of pedestrian testing images. The testing patch generator 600 may be configured to generate testing part patches from the candidate boxes generated by the second box generator 500. The testing unit 700 may be configured to generate a detection result, such as a confidence score, from the testing part patches and the selected part detectors.
Normally, occlusion has various patterns. For instance, the left or right half of the body may be occluded by a tree, and the lower half of the body may be occluded by a car. Thus, a part pool which contains various semantic body parts may be extensively constructed.
In some embodiments, a pedestrian can be considered a rigid object on a 2m×m grid, where m and 2m indicate the numbers of grids in the horizontal and vertical dimensions, respectively. Each grid is square and of equal size. Hereinafter, the grid is defined as the minimum unit, and each part prototype is constrained to be rectangular. The sizes of the part prototypes are defined as

S = { (w, h) | Wmin ≤ w ≤ m, Hmin ≤ h ≤ 2m }

where w and h indicate the width and height of a part prototype in terms of grids. Wmin and Hmin are used to avoid over-local parts, since the focus is on middle-level semantic parts.
Then, for each (w, h) ∈ S, sliding an h×w rectangle over the grid template generates part prototypes at different positions. The full part pool can be expressed as

P = { (x, y, w, h, i) | (w, h) ∈ S, 1 ≤ x ≤ m − w + 1, 1 ≤ y ≤ 2m − h + 1 }

where x and y are the coordinates of the top-left grid of the part prototype and i is a unique id. Specifically, the full-body part prototype is (1, 1, m, 2m, i_full). Setting m to a much larger value would generate an overlarge pool, causing too much computation in the training and testing stages. Likewise, setting Wmin or Hmin too small, e.g., Wmin = 0.1 × m, would result in over-local part prototypes.
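For illustration only, the enumeration of the part pool above can be sketched in Python; the function name and the concrete values m = 3, Wmin = Hmin = 2 used below are assumptions for the sketch, not taken from the claims:

```python
def build_part_pool(m, w_min, h_min):
    """Enumerate rectangular part prototypes (x, y, w, h, i) on a 2m x m grid,
    following the definitions of S and the full part pool above."""
    pool = []
    i = 0
    for w in range(w_min, m + 1):            # widths allowed by S
        for h in range(h_min, 2 * m + 1):    # heights allowed by S
            # slide the h x w rectangle over all valid top-left grid positions
            for x in range(1, m - w + 2):
                for y in range(1, 2 * m - h + 2):
                    pool.append((x, y, w, h, i))
                    i += 1
    return pool
```

With the assumed values m = 3 and Wmin = Hmin = 2, this enumeration yields 45 prototypes, matching the 45 part detector scores mentioned later, and it always contains the full-body prototype (1, 1, m, 2m, i_full).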
The first and second box generators 100 and 500 take static images, such as training or testing images, as inputs and employ a pedestrian detector to detect the pedestrians in these images. For example, a region proposal method such as "selective search", "Edgebox", or "LDCF" may be used to generate candidate bounding boxes.
The size of the training or testing dataset is crucial for deep models such as ConvNets. Consider the Caltech dataset, currently the largest pedestrian benchmark, which consists of ~250k labeled frames and ~350k annotated bounding boxes. Instead of using the typical Reasonable training setting, which uses every 30th image in the video and is composed of ~1.7k pedestrians, every frame is utilized and ~50k pedestrian bounding boxes are employed as positive training patches. Negative patches have < 0.5 IoU with any ground truth and are proposed by LDCF.
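The IoU criterion used above to label negative patches can be sketched as follows; `iou` and `label_proposal` are illustrative names, not part of the disclosed apparatus, and boxes are assumed to be (x_min, y_min, w, h) tuples:

```python
def iou(box_a, box_b):
    """Intersection-over-union of two boxes given as (x_min, y_min, w, h)."""
    ax, ay, aw, ah = box_a
    bx, by, bw, bh = box_b
    ix = max(0.0, min(ax + aw, bx + bw) - max(ax, bx))   # overlap width
    iy = max(0.0, min(ay + ah, by + bh) - max(ay, by))   # overlap height
    inter = ix * iy
    union = aw * ah + bw * bh - inter
    return inter / union if union > 0 else 0.0

def label_proposal(proposal, ground_truths, thr=0.5):
    """A proposal is negative when its IoU with every ground truth is below thr."""
    if any(iou(proposal, g) >= thr for g in ground_truths):
        return "positive"
    return "negative"
```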
As shown in Fig. 2, the training patch generator 200 further comprises a labeling module 201 for labeling the candidate boxes as negative or positive candidate boxes by comparison with the ground truth boxes, and an extracting module 202 for extracting negative and positive training part patches from the negative and positive candidate boxes for each body part such as leg, head, and upper body. Fig. 3 is an illustration of the training part patches, namely the output of the generator 200.
Fig. 4 is an example of generating training data for each part detector. (1) Given a part prototype, the corresponding region within a negative pedestrian proposal is used as a negative sample for this part detector. This assumption is owing to the fact that most upright pedestrians are well aligned, so the corresponding regions in negative and positive pedestrian patches should be different. For example, if a head-shoulder part occupied the upper one-third region of a negative proposal, that proposal should instead be regarded as a positive pedestrian patch according to prior knowledge. (2) Each pedestrian is annotated with two BBs that denote the visible (Bvis) and full (Bfull) parts. We divide the full part (Bfull) into 2m × m grids and compute the IoU between the visible part (Bvis) and each grid. Then the visible map is obtained by thresholding the IoU value of each grid. If the visible grids of a ground truth cover the template grids of a given part prototype, the corresponding region can be extracted as a positive sample.
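A minimal sketch of the visible-map computation in (2) follows; interpreting the per-grid "IoU" as the overlap fraction of each grid cell with Bvis is an assumption of this sketch, as are the function names:

```python
import numpy as np

def visibility_map(b_full, b_vis, m, thr=0.5):
    """Mark each cell of the 2m x m grid over B_full as visible when its
    overlap fraction with B_vis passes thr. Boxes are (x_min, y_min, w, h)."""
    x0, y0, w, h = b_full
    gw, gh = w / m, h / (2 * m)                  # grid cell width and height
    vx0, vy0, vw, vh = b_vis
    vis = np.zeros((2 * m, m), dtype=bool)
    for r in range(2 * m):
        for c in range(m):
            gx, gy = x0 + c * gw, y0 + r * gh
            ix = max(0.0, min(gx + gw, vx0 + vw) - max(gx, vx0))
            iy = max(0.0, min(gy + gh, vy0 + vh) - max(gy, vy0))
            vis[r, c] = (ix * iy) / (gw * gh) >= thr
    return vis

def covers(vis, part):
    """True when the visible grids cover all template grids of prototype
    (x, y, w, h, i), with 1-based top-left grid coordinates."""
    x, y, w, h, _ = part
    return bool(vis[y - 1:y - 1 + h, x - 1:x - 1 + w].all())
```

For a pedestrian whose lower half is occluded, only parts whose template grids lie in the visible upper rows would yield positive samples.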
As shown in Fig. 5, the detector training unit 300 further comprises a mixing module 301 for mixing the positive and negative training part patches and splitting them into batches, a training module 302 for iteratively training each part detector on these batches until all part detectors converge, and a parameter learning module 303 for learning parameters for handling shifting for each part detector.
It is known that fine-tuning a CNN pre-trained on the ImageNet classification task with object detection and segmentation data can significantly improve performance. Particularly, the parameters learned at the pre-training phase are directly used as initial values for the fine-tuning stage. A similar strategy can be directly adopted to fine-tune generic CNN image classification models for part recognition. The main disparity between the pre-training and fine-tuning tasks is the type of input data: the image classification task takes a full image or whole object as input, which contains rich context information, while the part recognition task can only observe a middle-level part patch. The evaluated deep models include AlexNet, Clarifai, and GoogLeNet, the winning models of the ImageNet classification challenge in the past three years. AlexNet and Clarifai have ~60 million parameters and share a similar structure, while GoogLeNet uses 12× fewer parameters but employs a much deeper structure. The framework in the present invention is flexible enough to incorporate other generic deep models.
In a recognition-by-proposal detection scheme, i.e., with deep detectors, the location quality of proposals is key to the recognition stage. Pedestrian proposals usually suffer from poor location quality. As is known, the best proposal method, SpatialPooling+, recalls 93% of pedestrians at a 0.5 IoU threshold but only 10% at a 0.9 IoU threshold. Shifting is one of the major causes of low IoU. As shown in Fig. 6a, shifting a ground truth bounding box by 10% in the horizontal or vertical orientation results in a 0.9 IoU value, which is still a high-quality proposal. However, shifting in both orientations leads to a 0.68 IoU value, which is less effective for the feature extraction and classification stages. Besides whole-body shifting, each body part may shift from its fixed template position, and different parts of the same pedestrian may shift in different orientations. In our framework, the positive training samples for each part detector are well aligned while the testing proposals may shift in all orientations. Thus, handling shifting for both the full body and the parts is necessary.
A straightforward way to handle this problem is to crop multiple jittered patches around each proposal, feed the cropped patches into the deep model, and choose the highest or average score, with a penalty, as the detection score. However, this method would increase the testing time by k times, where k is the number of cropped patches per proposal.
To reduce the testing computation, we first reformulate the generic ConvNet model with fully connected layers as a fully convolutional neural network, which does not require a fixed input size and can process multiple neighboring patches in a single forward pass. Afterwards, the input size of the fully convolutional ConvNet can be changed. Take AlexNet as an example, whose original input size is 227 × 227. As illustrated in Table 1, after reformulating fc6, fc7, and fc8 as conv6 (1 × 1 × 4096), conv7 (1 × 1 × 4096), and conv8 (1 × 1 × 2), the fully convolutional AlexNet is able to receive an expanded input size because the convolution and pooling operations are unrelated to the input size. Since the step size of the receptive field for the classification layer is 32, the expanded input should be (227 + 32n) × (227 + 32n) in order to keep the forward procedure applicable, where n indicates the expanded step size and is a non-negative integer.
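The relationship between the expansion step n, the valid input side, and the resulting score-map side can be captured by two small helper functions (illustrative names, assuming the 227-pixel base size and stride-32 receptive field stated above):

```python
def expanded_input_size(n):
    """Valid input side for the fully convolutional AlexNet variant: 227 + 32n."""
    assert n >= 0
    return 227 + 32 * n

def score_map_side(size):
    """Number of 227x227 receptive fields along one side, given stride 32."""
    assert (size - 227) % 32 == 0, "input side must be 227 + 32n"
    return (size - 227) // 32 + 1
```

For example, n = 2 gives a 291 × 291 input and a 3 × 3 score map, i.e., 9 neighboring patches evaluated in one forward pass.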
Given a proposed part patch (Xmin, Ymin, w, h) and n, the expanded cropping patch is (Xmin′, Ymin′, w′, h′), where

Xmin′ = Xmin − (16n/227) · w,  Ymin′ = Ymin − (16n/227) · h,
w′ = ((227 + 32n)/227) · w,  h′ = ((227 + 32n)/227) · h,

so that the crop is expanded symmetrically around the proposal while keeping the proposal scale.
Then we resize the patch to (227 + 32n) × (227 + 32n) and feed it into the fully convolutional AlexNet. As a result, (1 + n) × (1 + n) neighboring 227 × 227 patches are explored simultaneously while the expanded scale stays the same as the proposal scale. The final output of conv8 can be viewed as a (1 + n) × (1 + n) score map S, and each score corresponds to a 227 × 227 region. The final score of the part patch is defined as

f = max_{i,j} (S_{i,j} − P_{i,j})
where P_{i,j} is a penalty term with respect to the relative shifting distance of patch (i, j) from the proposed part box and may, for example, be defined as

P_{i,j} = a · (d_x + d_y) + b · sqrt(d_x² + d_y²)

where d_x and d_y are the horizontal and vertical shifting distances, a is the single-orientation shifting penalty weight, and b is a geometrical-distance penalty weight.
In this implementation, n = 2 is set for all part prototypes, and the values of a and b are searched for each part prototype by 6-fold cross-validation on the training set. Fig. 6b shows an example of the full-body part detector with 9 neighboring patches evaluated, where a = 2 and b = 10. Shifting handling is a kind of context modeling that keeps the scale invariant, while simply cropping a larger region with padding and resizing it to 227 × 227 would cause a scale gap between the training and testing stages.
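A sketch of the penalized maximum over the score map follows. The exact penalty form is an assumption: a per-axis term weighted by a plus a Euclidean-distance term weighted by b, consistent with the descriptions of a and b above; the function name is illustrative:

```python
import numpy as np

def shifted_part_score(score_map, a, b):
    """Best penalized score over a (1+n) x (1+n) map of neighboring patches.

    Each entry corresponds to one 227x227 patch; the penalty grows with the
    shift of that patch from the centered, unshifted position."""
    k = score_map.shape[0]
    c = (k - 1) / 2.0                  # index of the unshifted, centered patch
    best = -np.inf
    for i in range(k):
        for j in range(k):
            di, dj = abs(i - c), abs(j - c)
            penalty = a * (di + dj) + b * float(np.hypot(di, dj))
            best = max(best, float(score_map[i, j]) - penalty)
    return best
```

With a = b = 0 this reduces to taking the highest score; large a and b effectively keep only the centered patch.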
As shown in Fig. 7, the detector selecting unit 400 further comprises a weight learning module 401 for learning combination weights of all part detectors, a selection module 402 for selecting one or more part detectors according to the combination weights, and a relearning module 403 for relearning the combination weights of the selected part detectors.
For each part prototype, the output of its ConvNet detector may be directly used as the visibility score instead of stacking a linear SVM on top as in the RCNN framework. It is found that appending an SVM detector for mining hard negatives does not show significant improvement over directly using the ConvNet output, especially for GoogLeNet. This may be due to the fact that the training proposals generated by LDCF are already hard negatives. Thus, the SVM training stage is safely removed to save feature extraction time.
Then a linear SVM is employed to learn complementarity over the 45 part detector scores. To alleviate the testing computation cost, the 6 parts with the highest SVM weights are simply selected, yielding approximately the same performance. It is also shown that the performance improvement mainly benefits from part complementarity. Fig. 8 is an illustration of the selected parts and their weights.
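Selecting the parts with the highest SVM weights can be sketched as follows (illustrative function name; the default k = 6 follows the selection described above):

```python
import numpy as np

def select_complementary_parts(svm_weights, k=6):
    """Indices of the k part detectors with the largest SVM combination weights."""
    order = np.argsort(np.asarray(svm_weights))[::-1]   # descending by weight
    return sorted(int(i) for i in order[:k])
```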
The testing patch generator 600 further comprises an extracting module for extracting testing part patches from the candidate boxes generated by the second box generator 500 as the generated testing patches for each body part corresponding to the selected part detectors.
As shown in Fig. 9, the testing unit 700 further comprises an evaluation module 701 and a result generation module 702. The evaluation module 701 may be configured to evaluate a score of each body part using the corresponding part detector from the testing part
patches, the selected part detectors and the relearned combination weights. The result generation module 702 may be configured to generate a detection score by combining the score of each body part in a weighted manner.
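The weighted combination performed by the result generation module 702 can be sketched as below; the function name and the optional bias term are assumptions of this sketch:

```python
def detection_score(part_scores, weights, bias=0.0):
    """Combine per-part scores into one detection score using learned weights."""
    assert len(part_scores) == len(weights)
    return sum(w * s for w, s in zip(weights, part_scores)) + bias
```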
Fig. 10 is a schematic flowchart illustrating a method 2000 for pedestrian detection according to an embodiment of the present application. Hereinafter, the method 2000 may be described in detail with respect to Fig. 10.
At step S210, candidate boxes are generated from a plurality of pedestrian training images, for example, by employing a region proposal method such as Selective Search, Edgebox, and LDCF.
At step S220, training part patches are generated from ground truth boxes and the candidate boxes, which are generated from the plurality of pedestrian training images.
As shown in Fig. 11, the step S220 of generating training part patches comprises the following steps. To be specific, at step S221, the candidate boxes are labeled as negative or positive candidate boxes by comparison with the ground truth boxes. At step S222, for each body part, negative and positive training part patches are extracted as the training part patches from the negative and positive candidate boxes.
And then the method 2000 proceeds with step S230, at which part detectors are trained from the training part patches.
As shown in Fig. 12, the step S230 of training part detectors comprises the following steps. To be specific, at step S231, the positive and negative training part patches are mixed and split into batches. At step S232, each part detector is iteratively trained by using these batches until all part detectors converge. At step S233, for each part detector, parameters are learned for handling shifting.
And then the method 2000 proceeds with step S240 of selecting complementary part detectors from all the trained part detectors.
As shown in Fig. 13, the step S240 of selecting complementary part detectors comprises a step S241 of learning combination weights of all part detectors, a step S242 of selecting one or more part detectors according to the combination weights, and a step S243 of relearning the combination weights of the selected part detectors.
And then the method 2000 proceeds with step S250 at which corresponding candidate boxes are generated from a plurality of pedestrian testing images.
And then the method 2000 proceeds with step S260 at which testing part patches are generated from the candidate boxes generated from the plurality of pedestrian testing images.
The step S260 of generating testing part patches further comprises extracting testing part patches from the candidate boxes generated from the plurality of pedestrian testing images as the generated testing part patches for each body part corresponding to the selected part detectors.
And then the method 2000 proceeds with step S270 at which a detection result is generated from the testing part patches and the selected part detectors.
As shown in Fig. 14, the step S270 of generating a detection result comprises the following steps. At step S271, a score of each body part is evaluated, using the corresponding part detector, from the testing part patches, the selected part detectors, and the relearned combination weights. At step S272, a detection result is generated by combining the scores of the body parts in a weighted manner.
Fig. 15 shows a system 3000 for pedestrian detection. The system 3000 comprises: a memory 310 that stores executable components; and a processor 320 electrically coupled to the memory 310 that executes the executable components to perform operations of the system 3000. The executable components comprise: a first box generating component 311 configured for generating candidate boxes from a plurality of pedestrian training images; a training patch generating component 312 configured for generating training part patches from the candidate boxes generated by the first box generator and ground truth boxes; a detector training component 313 configured for training one or more part detectors from the generated training part patches; a detector selecting component 314 configured for selecting complementary part detectors from all the trained part detectors; a second box generating component 315 configured for generating candidate boxes from a plurality of pedestrian testing images; a testing patch generating component 316 configured for generating testing part patches from the candidate boxes generated by the second box generator; and a testing component 317 configured for generating a detection result from the testing part patches and the selected part
detectors.
The present application is derived from "Deep Learning Strong Parts for Pedestrian Detection" and addresses the problem of detecting pedestrians in a single image, aiming at constructing a pedestrian detector that can handle occlusion at different levels. The input is a single static image, and the output consists of detected bounding boxes and confidence scores.
Obviously, those skilled in the art can make variations or modifications to the present invention without departing from the spirit and scope of the present invention. As such, if these variations or modifications belong to the scope of the claims and equivalent techniques, they may also fall into the scope of the present invention.
Claims (25)
- An apparatus for pedestrian detection, comprising:a first box generator for generating candidate boxes from a plurality of pedestrian training images;a training patch generator for generating training part patches from the candidate boxes generated by the first box generator and ground truth boxes;a detector training unit for training one or more part detectors from the generated training part patches;a detector selecting unit for selecting complementary part detectors from all the trained part detectors;a second box generator for generating candidate boxes from a plurality of pedestrian testing images;a testing patch generator for generating testing part patches from the candidate boxes generated by the second box generator; anda testing unit for generating a detection result from the testing part patches and the selected part detectors.
- The apparatus of claim 1, wherein the training patch generator comprises:a labeling module configured to label the candidate boxes as negative or positive candidate boxes by comparison with the ground truth boxes; andan extracting module configured to extract negative and positive training part patches, as the generated training part patches, from the negative and positive candidate boxes for each body part.
- The apparatus of claim 2, wherein the detector training unit comprises:a mixing module configured to mix the positive and negative training part patches and split them into batches;a training module configured to iteratively train each part detector by using the batches until each of all part detectors converges.
- The apparatus of claim 2, wherein the detector training unit further comprises:a parameter learning module configured to learn parameters for handling shifting for each part detector.
- The apparatus of claim 3, wherein the detector selecting unit comprises:a weight learning module configured to learn combination weights of all part detectors; anda selection module configured to select the complementary part detectors according to the combination weights.
- The apparatus of claim 5, wherein the detector selecting unit further comprises:a relearning module configured to relearn the combination weights of the selected complementary part detectors.
- The apparatus of claim 5, wherein the testing patch generator further comprises:an extracting module configured to extract testing part patches from the candidate boxes generated by the second box generator as the generated testing patches for each body part corresponding to the selected part detectors.
- The apparatus of claim 7, wherein the testing unit further comprises:an evaluation module configured to evaluate a score of each body part using the corresponding part detector from the testing part patches, the selected part detectors and the relearned combination weights; anda result generation module configured to generate a detection result by combining the score of each body part in a weighted manner.
- A method for pedestrian detection, comprising:generating candidate boxes from a plurality of pedestrian training images;generating training part patches from the candidate boxes generated from the plurality of pedestrian training images and ground truth boxes;training one or more part detectors from the training part patches;selecting complementary part detectors from all the trained part detectors;generating candidate boxes from a plurality of pedestrian testing images;generating testing part patches from the candidate boxes generated from the plurality of pedestrian testing images; andgenerating a detection result from the testing part patches and the selected part detectors.
- The method of claim 8, wherein the step of generating training part patches comprises:labeling the candidate boxes as negative or positive candidate boxes by comparison with the ground truth boxes; andextracting negative and positive training part patches, as the generated training part patches, from the negative and positive candidate boxes for each body part.
- The method of claim 10, wherein the step of training part detectors comprises:mixing the positive and negative training part patches and splitting them into batches; anditeratively training each part detector by using the batches until each of all part detectors converges.
- The method of claim 11, wherein the step of training part detectors further comprises:for each part detector, learning parameters for handling shifting.
- The method of claim 11, wherein the step of selecting complementary part detectors comprises:learning combination weights of all part detectors; andselecting the complementary part detectors according to the combination weights.
- The method of claim 13, wherein the step of selecting complementary part detectors further comprises:relearning the combination weights of the selected complementary part detectors.
- The method of claim 13, wherein the step of generating testing part patches comprises:for each body part corresponding to the selected part detectors, extracting testing part patches from the candidate boxes generated from the plurality of pedestrian testing images as the generated testing part patches.
- The method of claim 15, wherein the step of generating detection result comprises:evaluating a score of each body part using the corresponding part detector from the testing part patches, the selected part detectors and the relearned combination weights; andgenerating a detection result by combining the score of each body part in a weighted manner.
- A system for pedestrian detection, comprising:a memory that stores executable components; anda processor electrically coupled to the memory that executes the executable components to perform operations of the system, wherein the executable components comprise:a first box generating component configured for generating candidate boxes from a plurality of pedestrian training images;a training patch generating component configured for generating training part patches from the candidate boxes generated by the first box generator and ground truth boxes;a detector training component configured for training one or more part detectors from the generated training part patches;a detector selecting component configured for selecting complementary part detectors from all the trained part detectors;a second box generating component configured for generating candidate boxes from a plurality of pedestrian testing images;a testing patch generating component configured for generating testing part patches from the candidate boxes generated by the second box generator; anda testing component configured for generating a detection result from the testing part patches and the selected part detectors.
- The system according to claim 17, wherein the training patch generating component further comprises:a labeling sub-component configured to label the candidate boxes as negative or positive candidate boxes by comparison with the ground truth boxes; andan extracting sub-component configured to extract negative and positive training part patches, as the generated training part patches, from the negative and positive candidate boxes for each body part.
- The system according to claim 18, wherein the detector training component further comprises:a mixing sub-component configured to mix the positive and negative training part patches and split them into batches;a training sub-component configured to iteratively train each part detector by using the batches until each of all part detectors converges.
- The system according to claim 19, wherein the detector training component further comprises:a parameter learning sub-component configured to learn parameters for handling shifting for each part detector.
- The system according to claim 19, wherein the detector selecting component further comprises:a weight learning sub-component configured to learn combination weights of all part detectors; anda selection sub-component configured to select the complementary part detectors according to the combination weights.
- The system according to claim 21, wherein the detector selecting component further comprises:a relearning sub-component configured to relearn the combination weights of the selected complementary part detectors.
- The system according to claim 21, wherein the testing patch generating component further comprises:an extracting sub-component configured to extract testing part patches from the candidate boxes generated by the second box generator as the generated testing patches for each body part corresponding to the selected part detectors.
- The system according to claim 21, wherein the testing patch generating component further comprises:an extracting sub-component configured to extract testing part patches from the candidate boxes generated by the second box generator as the generated testing patches for each body part corresponding to the selected part detectors.
- The system according to claim 24, wherein the testing component further comprises:an evaluation sub-component configured to evaluate a score of each body part using the corresponding part detector from the testing part patches, the selected part detectors and the relearned combination weights; anda result generation sub-component configured to generate a detection result by combining the score of each body part in a weighted manner.
Priority Applications (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
PCT/CN2015/091517 WO2017059576A1 (en) | 2015-10-09 | 2015-10-09 | Apparatus and method for pedestrian detection |
CN201610876667.7A CN106570453B (en) | 2015-10-09 | 2016-09-29 | Method, device and system for pedestrian detection |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
PCT/CN2015/091517 WO2017059576A1 (en) | 2015-10-09 | 2015-10-09 | Apparatus and method for pedestrian detection |
Publications (1)
Publication Number | Publication Date |
---|---|
WO2017059576A1 true WO2017059576A1 (en) | 2017-04-13 |
Family
ID=58487177
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
PCT/CN2015/091517 WO2017059576A1 (en) | 2015-10-09 | 2015-10-09 | Apparatus and method for pedestrian detection |
Country Status (2)
Country | Link |
---|---|
CN (1) | CN106570453B (en) |
WO (1) | WO2017059576A1 (en) |
Cited By (15)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
WO2020051545A1 (en) * | 2018-09-07 | 2020-03-12 | Alibaba Group Holding Limited | Method and computer-readable storage medium for generating training samples for training a target detector |
CN111523469A (en) * | 2020-04-23 | 2020-08-11 | 苏州浪潮智能科技有限公司 | Pedestrian re-identification method, system, equipment and computer readable storage medium |
US10762334B2 (en) | 2017-09-29 | 2020-09-01 | Alibaba Group Holding Limited | System and method for entity recognition |
CN111914863A (en) * | 2019-05-09 | 2020-11-10 | 顺丰科技有限公司 | Target detection method and device, terminal equipment and computer readable storage medium |
US11010838B2 (en) | 2018-08-31 | 2021-05-18 | Advanced New Technologies Co., Ltd. | System and method for optimizing damage detection results |
US11069048B2 (en) | 2018-09-07 | 2021-07-20 | Advanced New Technologies Co., Ltd. | System and method for facilitating efficient damage assessments |
US11080839B2 (en) | 2018-08-31 | 2021-08-03 | Advanced New Technologies Co., Ltd. | System and method for training a damage identification model |
US11113582B2 (en) | 2018-08-31 | 2021-09-07 | Advanced New Technologies Co., Ltd. | Method and system for facilitating detection and identification of vehicle parts |
US11182889B2 (en) | 2017-09-29 | 2021-11-23 | Alibaba Group Holding Limited | System and method for authenticating physical objects based on captured images |
US11216690B2 (en) | 2018-08-31 | 2022-01-04 | Alibaba Group Holding Limited | System and method for performing image processing based on a damage assessment image judgement model |
TWI761642B (en) * | 2018-02-01 | 2022-04-21 | 開曼群島商創新先進技術有限公司 | Method, device and electronic device for determining decision-making strategy corresponding to business |
US11475660B2 (en) | 2018-08-31 | 2022-10-18 | Advanced New Technologies Co., Ltd. | Method and system for facilitating recognition of vehicle parts based on a neural network |
US11720572B2 (en) | 2018-01-08 | 2023-08-08 | Advanced New Technologies Co., Ltd. | Method and system for content recommendation |
US11790632B2 (en) | 2018-08-24 | 2023-10-17 | Advanced New Technologies Co., Ltd. | Method and apparatus for sample labeling, and method and apparatus for identifying damage classification |
US11972599B2 (en) | 2018-09-04 | 2024-04-30 | Advanced New Technologies Co., Ltd. | Method and apparatus for generating vehicle damage image on the basis of GAN network |
Families Citing this family (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US11188794B2 (en) | 2017-08-10 | 2021-11-30 | Intel Corporation | Convolutional neural network framework using reverse connections and objectness priors for object detection |
CN109697441B (en) | 2017-10-23 | 2021-02-12 | Hangzhou Hikvision Digital Technology Co., Ltd. | Target detection method and device and computer equipment |
CN109447276B (en) * | 2018-09-17 | 2021-11-02 | FiberHome Telecommunication Technologies Co., Ltd. | Machine learning system, equipment and application method |
CN109359558B (en) * | 2018-09-26 | 2020-12-25 | Tencent Technology (Shenzhen) Co., Ltd. | Image labeling method, target detection method, device and storage medium |
CN110298302B (en) * | 2019-06-25 | 2023-09-08 | Tencent Technology (Shenzhen) Co., Ltd. | Human body target detection method and related equipment |
Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20090097739A1 (en) * | 2007-10-10 | 2009-04-16 | Honeywell International Inc. | People detection in video and image data |
US8131011B2 (en) * | 2006-09-25 | 2012-03-06 | University Of Southern California | Human detection and tracking system |
CN104217225A (en) * | 2014-09-02 | 2014-12-17 | Institute of Automation, Chinese Academy of Sciences | A visual target detection and labeling method |
US9042601B2 (en) * | 2013-03-14 | 2015-05-26 | Nec Laboratories America, Inc. | Selective max-pooling for object detection |
Family Cites Families (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN102136075B (en) * | 2011-03-04 | 2013-05-15 | Hangzhou Hikvision Digital Technology Co., Ltd. | Multi-view face detection method and device for complex scenes |
EP2574958B1 (en) * | 2011-09-28 | 2017-02-22 | Honda Research Institute Europe GmbH | Road-terrain detection method and system for driver assistance systems |
CN102609682B (en) * | 2012-01-13 | 2014-02-05 | Beijing University of Posts and Telecommunications | Region-of-interest feedback pedestrian detection method |
CN103440487B (en) * | 2013-08-27 | 2016-11-02 | University of Electronic Science and Technology of China | Natural-scene text localization method based on local tone difference |
- 2015-10-09: WO application PCT/CN2015/091517, published as WO2017059576A1 (active, Application Filing)
- 2016-09-29: CN application CN201610876667.7A, granted as CN106570453B (active)
Cited By (19)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US11182889B2 (en) | 2017-09-29 | 2021-11-23 | Alibaba Group Holding Limited | System and method for authenticating physical objects based on captured images |
US10762334B2 (en) | 2017-09-29 | 2020-09-01 | Alibaba Group Holding Limited | System and method for entity recognition |
US11720572B2 (en) | 2018-01-08 | 2023-08-08 | Advanced New Technologies Co., Ltd. | Method and system for content recommendation |
US11978000B2 (en) | 2018-02-01 | 2024-05-07 | Advanced New Technologies Co., Ltd. | System and method for determining a decision-making strategy |
TWI761642B (en) * | 2018-02-01 | 2022-04-21 | Advanced New Technologies Co., Ltd. (Cayman Islands) | Method, device and electronic device for determining decision-making strategy corresponding to business |
US11790632B2 (en) | 2018-08-24 | 2023-10-17 | Advanced New Technologies Co., Ltd. | Method and apparatus for sample labeling, and method and apparatus for identifying damage classification |
US11010838B2 (en) | 2018-08-31 | 2021-05-18 | Advanced New Technologies Co., Ltd. | System and method for optimizing damage detection results |
US11113582B2 (en) | 2018-08-31 | 2021-09-07 | Advanced New Technologies Co., Ltd. | Method and system for facilitating detection and identification of vehicle parts |
US11080839B2 (en) | 2018-08-31 | 2021-08-03 | Advanced New Technologies Co., Ltd. | System and method for training a damage identification model |
US11216690B2 (en) | 2018-08-31 | 2022-01-04 | Alibaba Group Holding Limited | System and method for performing image processing based on a damage assessment image judgement model |
US11475660B2 (en) | 2018-08-31 | 2022-10-18 | Advanced New Technologies Co., Ltd. | Method and system for facilitating recognition of vehicle parts based on a neural network |
US11748399B2 (en) | 2018-08-31 | 2023-09-05 | Advanced New Technologies Co., Ltd. | System and method for training a damage identification model |
US11972599B2 (en) | 2018-09-04 | 2024-04-30 | Advanced New Technologies Co., Ltd. | Method and apparatus for generating vehicle damage image on the basis of GAN network |
US11069048B2 (en) | 2018-09-07 | 2021-07-20 | Advanced New Technologies Co., Ltd. | System and method for facilitating efficient damage assessments |
WO2020051545A1 (en) * | 2018-09-07 | 2020-03-12 | Alibaba Group Holding Limited | Method and computer-readable storage medium for generating training samples for training a target detector |
CN111914863A (en) * | 2019-05-09 | 2020-11-10 | SF Technology Co., Ltd. | Target detection method and device, terminal equipment and computer readable storage medium |
WO2021212737A1 (en) * | 2020-04-23 | 2021-10-28 | Suzhou Inspur Intelligent Technology Co., Ltd. | Person re-identification method, system, and device, and computer readable storage medium |
CN111523469B (en) * | 2020-04-23 | 2022-02-18 | Suzhou Inspur Intelligent Technology Co., Ltd. | Pedestrian re-identification method, system, equipment and computer readable storage medium |
CN111523469A (en) * | 2020-04-23 | 2020-08-11 | Suzhou Inspur Intelligent Technology Co., Ltd. | Pedestrian re-identification method, system, equipment and computer readable storage medium |
Also Published As
Publication number | Publication date |
---|---|
CN106570453B (en) | 2020-03-03 |
CN106570453A (en) | 2017-04-19 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
WO2017059576A1 (en) | Apparatus and method for pedestrian detection | |
US9965719B2 (en) | Subcategory-aware convolutional neural networks for object detection | |
Zhang et al. | Self-produced guidance for weakly-supervised object localization | |
JP6188400B2 (en) | Image processing apparatus, program, and image processing method | |
US10860837B2 (en) | Deep multi-task learning framework for face detection, landmark localization, pose estimation, and gender recognition | |
US10275719B2 (en) | Hyper-parameter selection for deep convolutional networks | |
Endres et al. | Category-independent object proposals with diverse ranking | |
Vicente et al. | Leave-one-out kernel optimization for shadow detection | |
CN110383291B (en) | System, method, and computer-readable medium for understanding machine learning decisions | |
Wang et al. | Probabilistic inference for occluded and multiview on-road vehicle detection | |
US8965115B1 (en) | Adaptive multi-modal detection and fusion in videos via classification-based-learning | |
CN110084299B (en) | Target detection method and device based on multi-head fusion attention | |
US11803971B2 (en) | Generating improved panoptic segmented digital images based on panoptic segmentation neural networks that utilize exemplar unknown object classes | |
KR102655789B1 (en) | Face detecting method and apparatus | |
WO2016179808A1 (en) | An apparatus and a method for face parts and face detection | |
KR102138680B1 (en) | Apparatus for Video Recognition and Method thereof | |
Abbott et al. | Deep object classification in low resolution lwir imagery via transfer learning | |
Khellal et al. | Pedestrian classification and detection in far infrared images | |
Le et al. | Co-localization with category-consistent features and geodesic distance propagation | |
Juang et al. | Stereo-camera-based object detection using fuzzy color histograms and a fuzzy classifier with depth and shape estimations | |
Smitha et al. | Optimal feed forward neural network based automatic moving vehicle detection system in traffic surveillance system | |
CN112418358A | Vehicle multi-attribute classification method based on an enhanced deep fusion network | |
Wu et al. | Detection algorithm for dense small objects in high altitude image | |
Lotfi | Trajectory clustering and behaviour retrieval from traffic surveillance videos | |
Anitha | An Efficient Region Based Object Detection method using Deep learning Algorithm |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
| 121 | Ep: the epo has been informed by wipo that ep was designated in this application | Ref document number: 15905674; Country of ref document: EP; Kind code of ref document: A1 |
| NENP | Non-entry into the national phase | Ref country code: DE |
| 122 | Ep: pct application non-entry in european phase | Ref document number: 15905674; Country of ref document: EP; Kind code of ref document: A1 |