CN112215103A - Vehicle and pedestrian multi-class detection method and device based on improved ACF
- Publication number
- CN112215103A CN202011034733.9A
- Authority
- CN
- China
- Prior art keywords
- vehicle
- pedestrian
- detection
- height
- calibration frame
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V20/00—Scenes; Scene-specific elements
- G06V20/50—Context or environment of the image
- G06V20/56—Context or environment of the image exterior to a vehicle by using sensors mounted on the vehicle
- G06V20/58—Recognition of moving objects or obstacles, e.g. vehicles or pedestrians; Recognition of traffic objects, e.g. traffic signs, traffic lights or roads
- G06V20/584—Recognition of moving objects or obstacles, e.g. vehicles or pedestrians; Recognition of traffic objects, e.g. traffic signs, traffic lights or roads of vehicle lights or traffic lights
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/21—Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
- G06F18/214—Generating training patterns; Bootstrap methods, e.g. bagging or boosting
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/23—Clustering techniques
- G06F18/232—Non-hierarchical techniques
- G06F18/2321—Non-hierarchical techniques using statistics or function optimisation, e.g. modelling of probability density functions
- G06F18/23213—Non-hierarchical techniques using statistics or function optimisation, e.g. modelling of probability density functions with fixed number of clusters, e.g. K-means clustering
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V40/00—Recognition of biometric, human-related or animal-related patterns in image or video data
- G06V40/10—Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V2201/00—Indexing scheme relating to image or video recognition or understanding
- G06V2201/08—Detecting or categorising vehicles
-
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y02—TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
- Y02T—CLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO TRANSPORTATION
- Y02T10/00—Road transport of goods or passengers
- Y02T10/10—Internal combustion engine [ICE] based vehicles
- Y02T10/40—Engine management systems
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Data Mining & Analysis (AREA)
- Physics & Mathematics (AREA)
- General Physics & Mathematics (AREA)
- Life Sciences & Earth Sciences (AREA)
- Multimedia (AREA)
- Artificial Intelligence (AREA)
- Bioinformatics & Cheminformatics (AREA)
- Bioinformatics & Computational Biology (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Evolutionary Biology (AREA)
- Evolutionary Computation (AREA)
- General Engineering & Computer Science (AREA)
- Probability & Statistics with Applications (AREA)
- Human Computer Interaction (AREA)
- Traffic Control Systems (AREA)
- Image Analysis (AREA)
Abstract
The invention relates to a vehicle and pedestrian multi-class detection method and device based on an improved ACF. The method comprises the following steps: obtaining vehicle training samples and pedestrian training samples and preprocessing them; using a vehicle and pedestrian detection framework, extracting multi-view aggregated channel features from the preprocessed vehicle training samples and context pixel aggregated channel features from the preprocessed pedestrian training samples, establishing a vehicle detector from the multi-view aggregated channel features, and establishing a pedestrian detector from the context pixel aggregated channel features; sharing the aggregated channel features of the image to be detected between the vehicle detector and the pedestrian detector to obtain a vehicle detection result and a pedestrian detection result; and applying a false-detection elimination strategy based on road constraints to the vehicle detection result and the pedestrian detection result. The invention addresses the problems of existing methods: a single detection target class, low detection accuracy, and frequent false detections.
Description
Technical Field
The invention relates to the technical field of visual analysis for unmanned driving, and in particular to a vehicle and pedestrian multi-class detection method, device and storage medium based on an improved ACF.
Background
With advances in science and technology and rising living standards, vehicle ownership has increased greatly, and a variety of factors lead to frequent traffic accidents. Unmanned driving seeks to alleviate this problem, and vehicle and pedestrian detection is one of its most important techniques. The accuracy and real-time performance of the vehicle and pedestrian detection algorithm directly affect the safety of the unmanned vehicle.
Current mainstream vehicle and pedestrian detection algorithms fall into deep learning detection algorithms and statistical feature detection algorithms. In deep learning, CNN features take a long time to train and are computationally expensive. Depending on the detection strategy, statistical feature detection methods can be subdivided into the DPM method and the decision tree method; the DPM method is rarely applied in unmanned driving systems because of its high complexity and low running speed. In the decision tree method, the design of the feature descriptor is the key to the detection algorithm and is currently the most studied topic; descriptors are mainly based on gradient, texture, and color features and their fusion. Haar features mainly extract the texture information of a target and are widely used in vehicle detection. HOG features and similar descriptors capture information such as the contour and shape of a target, are representative of gradient features, and are generally used for pedestrian detection. In addition, color features such as grayscale, RGB, and LUV can also be used to represent a target. However, these features can generally be used only to detect specific targets, and their expressive power is limited in complex road scenes. To address these problems, Integrated Channel Features (ICF), which fuse gradient, color, texture, and other features, were proposed first, and Aggregated Channel Features (ACF) were then proposed to improve detection performance. Later, LDCF introduced a filtering operation on top of ACF to strengthen its expressive power; this further improved detection accuracy but also added a large amount of computation, so real-time performance dropped considerably.
Although LDCF greatly improves detection accuracy over ACF, its reduced real-time performance makes it difficult to apply to lightweight vehicle and pedestrian detection.
Therefore, when applied to road scenes, existing vehicle and pedestrian detection methods suffer from a single detection target class, low detection accuracy, and frequent false detections.
Disclosure of Invention
In view of the above, it is desirable to provide a vehicle and pedestrian multi-class detection method, apparatus and storage medium based on an improved ACF, so as to solve the problems of a single detection target class, low detection accuracy, and frequent false detections when detecting vehicles and pedestrians.
In a first aspect, the invention provides a vehicle and pedestrian multi-class detection method based on an improved ACF, comprising the following steps:
obtaining a vehicle training sample and a pedestrian training sample, and preprocessing the vehicle training sample and the pedestrian training sample;
extracting the multi-view aggregation channel characteristics of the preprocessed vehicle training samples by using a vehicle and pedestrian detection framework, and establishing a vehicle detector according to the multi-view aggregation channel characteristics;
extracting context pixel aggregation channel characteristics of the preprocessed pedestrian training sample by using a vehicle and pedestrian detection framework, and establishing a pedestrian detector according to the context pixel aggregation channel characteristics;
acquiring a preprocessed image to be detected, and sharing the aggregation channel characteristics of the image to be detected into the vehicle detector and the pedestrian detector to obtain a vehicle detection result and a pedestrian detection result;
and applying a false-detection elimination strategy based on road constraints to eliminate false detections from the vehicle detection result and the pedestrian detection result.
Preferably, in the method for detecting multiple categories of vehicles and pedestrians based on the improved ACF, the method for preprocessing the vehicle training samples and the pedestrian training samples specifically includes:
scaling the vehicle training sample and the pedestrian training sample in horizontal and vertical directions, and keeping the center position of each target in the vehicle training sample and the pedestrian training sample normalized.
Preferably, in the method for detecting multiple classes of vehicles and pedestrians based on the improved ACF, the step of extracting the multi-view aggregation channel feature of the vehicle training sample after the preprocessing by using the vehicle and pedestrian detection framework and establishing the vehicle detector according to the multi-view aggregation channel feature specifically includes:
calculating a similarity (affinity) matrix among the sample points of the preprocessed vehicle training samples using a spectral clustering algorithm, obtaining eigenvectors of multiple dimensions through spectral decomposition of the matrix, clustering the eigenvectors with the K-means algorithm to extract the aggregated channel features of multiple views, and then training the vehicle detector of each view with the aggregated channel features of that view.
Preferably, in the method for detecting multiple classes of vehicles and pedestrians based on the improved ACF, the step of extracting the context pixel aggregation channel feature of the preprocessed pedestrian training sample by using the vehicle and pedestrian detection framework, and establishing the pedestrian detector according to the context pixel aggregation channel feature specifically includes:
extracting 10 feature channels of the preprocessed pedestrian training sample; processing the ten channels with 2 × 2 average pooling to obtain the aggregated channel feature $F_{2\times2}$ (n = 2); performing 2 × 2 average pooling twice more on the aggregated channel feature $F_{2\times2}$ to obtain the region context pixel aggregated channel features $F_{4\times4}$ and $F_{8\times8}$; upsampling the $F_{4\times4}$ and $F_{8\times8}$ features to the resolution of $F_{2\times2}$; combining them to form 30 deformation-resistant context pixel aggregated channel features of the same size; and establishing the pedestrian detector according to the context pixel aggregated channel features.
Preferably, in the method for detecting multiple categories of vehicles and pedestrians based on the improved ACF, the step of obtaining the preprocessed image to be detected and sharing the aggregate channel characteristics of the image to be detected to the vehicle detector and the pedestrian detector to obtain the vehicle detection result and the pedestrian detection result further includes:
and carrying out confidence score calibration on the vehicle detection result by adopting a parameterized Logistic regression calibration method.
Preferably, in the vehicle and pedestrian multi-class detection method based on the improved ACF, the step of applying a false-detection elimination strategy based on road constraints to eliminate false detections from the vehicle detection result and the pedestrian detection result includes:
normalizing the height of the vehicle calibration frame of the vehicle training sample and the position coordinate of the lower edge of the vehicle calibration frame, and the height of the pedestrian calibration frame of the pedestrian training sample and the position coordinate of the lower edge of the pedestrian calibration frame;
training a first regression model between the height of the vehicle calibration frame and the position coordinates of the lower edge of the vehicle calibration frame and a second regression model between the height of the pedestrian calibration frame and the position coordinates of the lower edge of the pedestrian calibration frame by using a support vector machine;
calculating the height of a predicted vehicle calibration frame corresponding to the lower edge position of the vehicle detection result by adopting a first regression model, and calculating the height of a predicted pedestrian calibration frame corresponding to the lower edge position of a pedestrian in a pedestrian detection result by adopting a second regression model;
calculating a first error value between the height of the predicted vehicle calibration frame and the height of an actual vehicle calibration frame in the vehicle detection result, and a second error value between the height of the predicted pedestrian calibration frame and the height of the actual pedestrian calibration frame in the pedestrian detection result;
when the first error value is larger than a first threshold, judging the vehicle detection result to be a false detection, and otherwise accepting the vehicle detection result; and when the second error value is larger than a second threshold, judging the pedestrian detection result to be a false detection, and otherwise accepting the pedestrian detection result.
Preferably, in the method for detecting multiple categories of vehicles and pedestrians based on the improved ACF, the first regression model is:
$H_1 = f(Y_1)$,
where $H_1$ denotes the height of the vehicle calibration frame and $Y_1$ denotes the position coordinate of the lower edge of the vehicle calibration frame;
the first error value is calculated by:
where $E_1$ denotes the first error value, $h_1$ denotes the height of the actual vehicle calibration frame, $h'_1$ denotes the height of the predicted vehicle calibration frame, and abs denotes the absolute value.
Preferably, in the vehicle and pedestrian multi-category detection method based on the improved ACF, the second regression model is:
$H_2 = f(Y_2)$,
where $H_2$ denotes the height of the pedestrian calibration frame and $Y_2$ denotes the position coordinate of the lower edge of the pedestrian calibration frame;
the second error value is calculated by:
where $E_2$ denotes the second error value, $h_2$ denotes the height of the actual pedestrian calibration frame, $h'_2$ denotes the height of the predicted pedestrian calibration frame, and abs denotes the absolute value.
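The road-constraint check described above can be illustrated with a minimal Python sketch. It is not the patented implementation: the regression model f is replaced here by an ordinary least-squares line (the text specifies a support vector machine), the error is assumed to be the plain absolute difference between actual and predicted heights (the source does not reproduce the error formula), and the names `fit_line` and `filter_detections` are hypothetical.

```python
def fit_line(ys, hs):
    """Least-squares fit of h = a * y + b.

    Stand-in for the regression model H = f(Y); the text trains f with a
    support vector machine, which is swapped for a simple line here.
    """
    n = len(ys)
    mean_y = sum(ys) / n
    mean_h = sum(hs) / n
    var_y = sum((y - mean_y) ** 2 for y in ys)
    cov = sum((y - mean_y) * (h - mean_h) for y, h in zip(ys, hs))
    a = cov / var_y
    b = mean_h - a * mean_y
    return a, b


def filter_detections(detections, model, threshold):
    """Keep detections whose box height agrees with the road prior.

    detections: list of (lower_edge_y, height) pairs, both normalized.
    """
    a, b = model
    kept = []
    for y_bottom, height in detections:
        predicted = a * y_bottom + b
        # Assumed error form: absolute difference of actual and predicted height.
        error = abs(height - predicted)
        if error <= threshold:
            kept.append((y_bottom, height))
    return kept


# Normalized training boxes: lower-edge coordinate and calibration frame height.
model = fit_line([0.2, 0.4, 0.6, 0.8], [0.2, 0.3, 0.4, 0.5])
kept = filter_detections([(0.8, 0.5), (0.8, 0.9)], model, threshold=0.1)
```

One model and one threshold would be fitted per class, so vehicles and pedestrians are filtered independently.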
In a second aspect, the present invention further provides a vehicle and pedestrian multi-category detection apparatus based on the improved ACF, including: a processor and a memory;
the memory has stored thereon a computer readable program executable by the processor;
the processor, when executing the computer readable program, implements the steps in the improved ACF-based vehicle pedestrian multi-class detection method as described above.
In a third aspect, the present invention also provides a computer readable storage medium storing one or more programs, which are executable by one or more processors to implement the steps in the improved ACF-based vehicle pedestrian multi-class detection method as described above.
Beneficial Effects
In the vehicle and pedestrian multi-class detection method, device and storage medium based on the improved ACF provided by the invention, a multi-class detection framework is adopted to detect vehicles and pedestrians simultaneously, which overcomes the limitation that the Adaboost classifier in the ACF detection algorithm handles only a single detection class. To raise vehicle and pedestrian detection accuracy, a multi-view vehicle detector and a context pixel pedestrian detector are adopted, which effectively capture the view differences among vehicle samples and the deformation caused by a pedestrian's posture while walking. To suppress false detections during vehicle and pedestrian detection, road prior information is used to eliminate them.
Drawings
FIG. 1 is a flow chart of a vehicle and pedestrian multi-class detection method based on an improved ACF according to a preferred embodiment of the present invention;
FIG. 2 is a flowchart of the operation of a preferred embodiment of the vehicle pedestrian detection framework of the present invention;
FIG. 3 is a schematic illustration of a training process for the vehicle detector of the present invention;
FIG. 4 is a schematic diagram of a training process for a pedestrian detector according to the present invention;
FIG. 5 is a statistical graph of the relationship between the target height of the calibration frame and the coordinates of its lower edge position according to the present invention;
FIG. 6 is a schematic diagram of the operating environment of a preferred embodiment of a vehicle and pedestrian multi-class detection program based on an improved ACF.
Detailed Description
The accompanying drawings, which are incorporated in and constitute a part of this application, illustrate preferred embodiments of the invention and together with the description, serve to explain the principles of the invention and not to limit the scope of the invention.
Referring to fig. 1, a vehicle and pedestrian multi-category detection method based on an improved ACF provided in an embodiment of the present invention includes the following steps:
S100, obtaining a vehicle training sample and a pedestrian training sample, and preprocessing the vehicle training sample and the pedestrian training sample.
In this embodiment, detecting vehicles and pedestrians first requires training on samples. To ensure detection accuracy, the sample data are first augmented and preprocessed. Specifically, in step S100, the samples are preprocessed as follows:
scaling the vehicle training sample and the pedestrian training sample in horizontal and vertical directions, and keeping the center position of each target in the vehicle training sample and the pedestrian training sample normalized.
Specifically, the current ACF detection algorithm augments the data set by horizontal flipping and ignores the influence of labeling errors in the data set. In addition, images are generally standardized during training, and targets in the scaled images are easily misaligned, which seriously affects detection accuracy. Therefore, the invention removes horizontal flipping and adds multi-scale data augmentation: the original training samples are scaled by a factor of 1.1 in the horizontal direction, the vertical direction, and so on, while the target center position is kept normalized. Multi-scale augmentation reduces sensitivity to the background around the labeling frame and improves classification robustness.
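As a rough illustration of the multi-scale augmentation, the sketch below enlarges a labeling frame by a factor of 1.1 while keeping its center fixed. The interpretation is an assumption: the text does not say whether scaling is applied to the image or to the crop window, nor which combinations of directions are used. The box layout `(x, y, w, h)` and the function names are hypothetical.

```python
def scale_box_about_center(box, sx=1.0, sy=1.0):
    """Scale a box (x, y, w, h) by sx horizontally and sy vertically.

    The box center stays where it was, mirroring the requirement that the
    target center position remains normalized after scaling.
    """
    x, y, w, h = box  # top-left corner plus width and height
    cx, cy = x + w / 2.0, y + h / 2.0
    new_w, new_h = w * sx, h * sy
    return (cx - new_w / 2.0, cy - new_h / 2.0, new_w, new_h)


def multi_scale_variants(box, factor=1.1):
    """Horizontal-only, vertical-only, and combined variants (assumed set)."""
    return [
        scale_box_about_center(box, sx=factor),
        scale_box_about_center(box, sy=factor),
        scale_box_about_center(box, sx=factor, sy=factor),
    ]


variants = multi_scale_variants((10.0, 20.0, 40.0, 80.0))
```

Each variant would then be cropped and resized to the detector's model size in the same way as the original sample.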
S200, extracting multi-view aggregation channel characteristics of the preprocessed vehicle training samples by using a vehicle and pedestrian detection frame, and establishing a vehicle detector according to the multi-view aggregation channel characteristics;
S300, extracting context pixel aggregation channel characteristics of the preprocessed pedestrian training sample by using a vehicle and pedestrian detection framework, and establishing a pedestrian detector according to the context pixel aggregation channel characteristics.
In this embodiment, because the traditional ACF algorithm handles only a single detection target class, a multi-class vehicle and pedestrian detection framework is introduced. Features are extracted from the vehicle samples and the pedestrian samples at the same time, and the ACF features are shared between the vehicle detector and the pedestrian detector for vehicle and pedestrian detection. Specifically, the invention provides a vehicle and pedestrian detection framework based on feature sharing, in which the vehicle detector and the pedestrian detector share the ACF features; this improves detection efficiency and completes multi-class detection at the same time. In the training stage, the framework trains the pedestrian detector and the vehicle detector simultaneously on shared ACF features, which significantly improves training efficiency. The framework is also general: further detectors can be added, so it is easy to extend to other detection classes.
Further, the step S200 specifically includes:
calculating a similarity (affinity) matrix among the sample points of the preprocessed vehicle training samples using a spectral clustering algorithm, obtaining eigenvectors of multiple dimensions through spectral decomposition of the matrix, clustering the eigenvectors with the K-means algorithm to extract the aggregated channel features of multiple views, and then training the vehicle detector of each view with the aggregated channel features of that view.
Specifically, as shown in FIG. 3, when training the Multi-view Aggregated Channel Features vehicle detector (Mv-ACF), the invention clusters the vehicle training samples, extracts the features of the samples of each view, and then trains the corresponding vehicle detector. Because the extracted ACF features are high-dimensional, the invention uses the unsupervised K-means algorithm. To counter the clustering degradation that may result, the invention adopts a spectral clustering algorithm: it first calculates the similarity (affinity) matrix among the sample points, obtains eigenvectors through spectral decomposition of the matrix to construct a new feature space, and then clusters in that space with the K-means algorithm.
The effectiveness of the spectral clustering algorithm used by the invention was verified experimentally. K-means clustering was applied to the training samples, grouping them into 20 classes. The clustered samples were used to train and test an Mv-ACF detector, and the results were compared with those of the spectral clustering algorithm, as shown in Table 1. The AP of the spectral clustering algorithm is higher than that of the K-means clustering algorithm at all levels, so the spectral clustering algorithm achieves the expected effect.
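The clustering pipeline (similarity matrix, spectral decomposition, K-means in the new feature space) can be sketched with NumPy as follows. This is a generic spectral clustering sketch, not the patented procedure: the Gaussian similarity with bandwidth `sigma`, the symmetric normalization, the row normalization of the embedding, and the deterministic farthest-point initialization of K-means are all assumptions made for illustration.

```python
import numpy as np


def spectral_embedding(X, n_components, sigma=1.0):
    """Map samples to the top eigenvectors of a normalized similarity matrix."""
    sq = np.sum(X ** 2, axis=1)
    d2 = np.maximum(sq[:, None] + sq[None, :] - 2.0 * X @ X.T, 0.0)
    W = np.exp(-d2 / (2.0 * sigma ** 2))  # Gaussian similarity (assumed kernel)
    np.fill_diagonal(W, 0.0)
    deg = W.sum(axis=1)
    d_inv_sqrt = 1.0 / np.sqrt(np.maximum(deg, 1e-12))
    M = d_inv_sqrt[:, None] * W * d_inv_sqrt[None, :]  # D^-1/2 W D^-1/2
    _, vecs = np.linalg.eigh(M)  # eigenvalues are returned in ascending order
    U = vecs[:, -n_components:]  # eigenvectors of the largest eigenvalues
    norms = np.linalg.norm(U, axis=1, keepdims=True)
    return U / np.maximum(norms, 1e-12)  # unit-length rows


def kmeans(Z, k, n_iter=50):
    """Plain K-means with farthest-point initialization (deterministic)."""
    centers = [Z[0]]
    for _ in range(1, k):
        d = np.min([np.sum((Z - c) ** 2, axis=1) for c in centers], axis=0)
        centers.append(Z[np.argmax(d)])
    centers = np.array(centers)
    for _ in range(n_iter):
        dists = ((Z[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = dists.argmin(axis=1)
        new_centers = np.array([
            Z[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(k)
        ])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return labels


# Toy data: two well-separated groups standing in for two vehicle views.
X = np.array([[0.0, 0.0], [0.1, 0.0], [0.0, 0.1],
              [5.0, 5.0], [5.1, 5.0], [5.0, 5.1]])
view_labels = kmeans(spectral_embedding(X, n_components=2), k=2)
```

In the setting described above, each row of `X` would be the ACF feature vector of one vehicle training sample, and each resulting cluster would supply the training set for one view-specific detector.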
In a preferred embodiment, the step S300 specifically includes:
extracting 10 feature channels of the preprocessed pedestrian training sample; processing the ten channels with 2 × 2 average pooling to obtain the aggregated channel feature $F_{2\times2}$ (n = 2); performing 2 × 2 average pooling twice more on the aggregated channel feature $F_{2\times2}$ to obtain the region context pixel aggregated channel features $F_{4\times4}$ and $F_{8\times8}$; upsampling the $F_{4\times4}$ and $F_{8\times8}$ features to the resolution of $F_{2\times2}$; combining them to form 30 deformation-resistant context pixel aggregated channel features of the same size; and establishing the pedestrian detector according to the context pixel aggregated channel features.
Specifically, a pedestrian's posture while walking causes deformation, which makes pedestrian detection more difficult. To this end, the invention proposes the Context Pixel Aggregated Channel Features (CP-ACF). As shown in FIG. 4, 10 feature channels are first extracted as in the ACF algorithm, and the ten channels are processed with 2 × 2 average pooling to obtain the ACF feature $F_{2\times2}$ (n = 2). On this basis, 2 × 2 average pooling is performed twice more to obtain the region context pixel aggregated features $F_{4\times4}$ and $F_{8\times8}$. Finally, $F_{4\times4}$ and $F_{8\times8}$ are upsampled to the resolution of $F_{2\times2}$ and combined with it to form 30 deformation-resistant CP-ACF channel features of the same size, which fuses local and context features. During soft-cascade Adaboost classification, the weak classifiers can adaptively select local and context features from different regions of the CP-ACF channels; compared with ACF, which can only select features from fixed regions, CP-ACF is more resistant to deformation. The AP of CP-ACF and ACF at different levels on the KITTI validation set is shown in the following table.
The invention designs the vehicle detector and the pedestrian detector within the framework separately and integrates road information to improve detection accuracy, while sharing features between the vehicle detector and the pedestrian detector to improve the real-time performance of the algorithm.
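The CP-ACF construction can be sketched with NumPy as below. The pooling cascade and the 30-channel concatenation follow the description; the nearest-neighbor upsampling, the channel ordering (local channels first), and the requirement that the input height and width be divisible by 8 are assumptions of this sketch, and the function names are hypothetical.

```python
import numpy as np


def avg_pool_2x2(x):
    """2 x 2 average pooling over an array of shape (channels, H, W)."""
    c, h, w = x.shape
    return x.reshape(c, h // 2, 2, w // 2, 2).mean(axis=(2, 4))


def upsample_nearest(x, factor):
    """Nearest-neighbor upsampling (assumed; the text does not name a method)."""
    return x.repeat(factor, axis=1).repeat(factor, axis=2)


def cp_acf(channels):
    """Build 30 context pixel aggregated channels from 10 base channels."""
    _, h, w = channels.shape
    if h % 8 or w % 8:
        raise ValueError("sketch assumes height and width divisible by 8")
    f2 = avg_pool_2x2(channels)  # F_{2x2}: local aggregated channels
    f4 = avg_pool_2x2(f2)        # F_{4x4}: wider context
    f8 = avg_pool_2x2(f4)        # F_{8x8}: widest context
    # Bring the context maps back to the F_{2x2} resolution and stack them.
    return np.concatenate(
        [f2, upsample_nearest(f4, 2), upsample_nearest(f8, 4)], axis=0
    )


base = np.random.default_rng(0).random((10, 64, 32))  # 10 ACF-style channels
features = cp_acf(base)  # shape (30, 32, 16)
```

Because all 30 channels share one resolution, a weak classifier indexing a single location can choose between a local value and its 4 × 4 or 8 × 8 context average, which is the selection behavior the description attributes to the soft-cascade Adaboost stage.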
S400, acquiring the preprocessed image to be detected, and sharing the aggregation channel characteristics of the image to be detected into the vehicle detector and the pedestrian detector to obtain a vehicle detection result and a pedestrian detection result.
Specifically, image feature extraction is usually the most time-consuming part of a detection algorithm. Conventional vehicle and pedestrian detection algorithms usually perform vehicle detection and pedestrian detection separately, extracting features from the image once for each, which takes a long time. The invention therefore proposes a vehicle and pedestrian detection framework based on feature sharing, as shown in FIG. 2, in which the vehicle detector and the pedestrian detector share the ACF features; this improves detection efficiency and completes multi-class detection at the same time.
Further, the Mv-ACF and the CP-ACF use the same 10 original ACF feature channels: the former extracts features with 2 × 2 average pooling, and the latter extracts features with three kinds of average pooling, namely 2 × 2, 4 × 4 and 8 × 8. The feature channels used by the latter therefore contain those of the former, and the two construct their feature pyramids in the same way, so the vehicle detector and the pedestrian detector can share the feature pyramid of the latter.
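The feature-sharing idea can be illustrated with a small Python sketch in which features are extracted once per image and each detector reads the channels it needs. The class name, the `register` and `detect` methods, and the convention that the first 10 channels are the 2 × 2 pooled ones are all hypothetical; the detectors below are stand-ins, not trained classifiers.

```python
class SharedFeatureFramework:
    """Runs one feature extraction per image and feeds several detectors."""

    def __init__(self, extract_features):
        self._extract = extract_features
        self._detectors = {}

    def register(self, name, detector, channel_slice):
        # channel_slice selects the subset of shared channels this detector uses.
        self._detectors[name] = (detector, channel_slice)

    def detect(self, image):
        features = self._extract(image)  # computed once, shared by all detectors
        return {
            name: detector(features[channel_slice])
            for name, (detector, channel_slice) in self._detectors.items()
        }


calls = []


def fake_extract(image):
    calls.append(image)      # record how often extraction runs
    return list(range(30))   # stand-in for 30 CP-ACF channels


framework = SharedFeatureFramework(fake_extract)
framework.register("vehicle", len, slice(0, 10))     # Mv-ACF: 2 x 2 channels only
framework.register("pedestrian", len, slice(0, 30))  # CP-ACF: all 30 channels
results = framework.detect("frame-0")
```

Adding another detection class under this design would amount to one more `register` call, in line with the extensibility claimed for the framework.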
In a preferred embodiment, after the step S400, the method further includes:
and carrying out confidence score calibration on the vehicle detection result by adopting a parameterized Logistic regression calibration method.
Specifically, during testing, each subclass detector of the Mv-ACF uses a model trained on data from a different view, so the detection results include detection frames whose confidence scores follow different distributions and whose geometric features (such as aspect ratio) are inconsistent. Merging them directly introduces noise, which destabilizes the subsequent NMS and reduces accuracy. The invention therefore introduces a parameterized Logistic regression calibration method to calibrate the confidence scores of the detection results so that their distribution is more reasonable.
Specifically, let $Det_i = \{d_{i1}, d_{i2}, \dots, d_{ij}, \dots, d_{ir}\}$ be the r detection results of the i-th subclass detector, where $d_{ij} = \{R_{ij}, c_{ij}\}$ denotes the j-th detection result, which consists of the detection frame $R_{ij}$ and the confidence score $c_{ij}$. Let $mDet_i = \{md_{i1}, md_{i2}, \dots, md_{ij}, \dots, md_{ir}\}$ be the calibrated result, where $md_{ij} = \{R_{ij}, c'_{ij}\}$. The purpose of the confidence score calibration is to find a calibration function $g_i$ such that $c'_{ij} = g_i(c_{ij})$. A parameterized Logistic regression calibration method is introduced to normalize the scores, namely:
wherein the parameters $A_i$ and $B_i$ of the i-th subclass detector are obtained by solving a regularized maximum likelihood problem:
substituting formula (1) into formula (2) gives:
wherein $r_+$ and $r_-$ are respectively the numbers of positive and negative samples used to train the parameters $A_i$ and $B_i$ of the i-th subclass; $y_j$ is the label of the j-th sample, where $y_j = 1$ denotes a target and $y_j = -1$ denotes background. Through the above process, the confidence score calibration of the vehicle subclass detectors is completed.
S500, adopting a false detection rejection strategy based on road constraints to reject false detections from the vehicle detection result and the pedestrian detection result.
In this embodiment, in order to reduce the false detection rate, the output result is further subjected to a false detection rejection step. A false detection rejection strategy based on road constraints is introduced, which reduces the false detection rate and completes the detection algorithm, making it suitable for lightweight vehicle and pedestrian detection. Specifically, the step S500 includes:
normalizing the height of the vehicle calibration frame of the vehicle training sample and the position coordinate of the lower edge of the vehicle calibration frame, and the height of the pedestrian calibration frame of the pedestrian training sample and the position coordinate of the lower edge of the pedestrian calibration frame;
training a first regression model between the height of the vehicle calibration frame and the position coordinates of the lower edge of the vehicle calibration frame and a second regression model between the height of the pedestrian calibration frame and the position coordinates of the lower edge of the pedestrian calibration frame by using a support vector machine;
calculating the height of a predicted vehicle calibration frame corresponding to the lower edge position of the vehicle detection result by adopting a first regression model, and calculating the height of a predicted pedestrian calibration frame corresponding to the lower edge position of a pedestrian in a pedestrian detection result by adopting a second regression model;
calculating a first error value between the height of the predicted vehicle calibration frame and the height of an actual vehicle calibration frame in the vehicle detection result, and a second error value between the height of the predicted pedestrian calibration frame and the height of the actual pedestrian calibration frame in the pedestrian detection result;
when the first error value is larger than a first threshold value, judging the vehicle detection result to be a false detection, and otherwise accepting the vehicle detection result; and when the second error value is larger than a second threshold value, judging the pedestrian detection result to be a false detection, and otherwise accepting the pedestrian detection result.
Wherein the first regression model is:
$H_1 = f(Y_1)$,
wherein $H_1$ denotes the height of the vehicle calibration frame and $Y_1$ denotes the position coordinate of the lower edge of the vehicle calibration frame;
the first error value is calculated by:
wherein $E_1$ represents the first error value, $h_1$ denotes the height of the actual vehicle calibration frame, $h'_1$ denotes the height of the predicted vehicle calibration frame, and abs denotes the absolute value.
The second regression model is:
$H_2 = f(Y_2)$,
wherein $H_2$ denotes the height of the pedestrian calibration frame and $Y_2$ denotes the position coordinate of the lower edge of the pedestrian calibration frame;
the second error value is calculated by:
wherein $E_2$ represents the second error value, $h_2$ denotes the height of the actual pedestrian calibration frame, $h'_2$ denotes the height of the predicted pedestrian calibration frame, and abs denotes the absolute value.
In other words, false detections in the vehicle and pedestrian detection process can be eliminated by using road prior information. To use this prior information, the heights H of the 12186 pedestrian and 15891 vehicle calibration frames in the Caltech and KITTI training data sets and the position coordinates Y of the lower edges of the calibration frames are first normalized and counted; as shown in fig. 5, a certain statistical relationship exists between H and Y. According to this relationship, the invention provides a simple and efficient road constraint (GPC) false detection rejection strategy, in which a target that does not conform to the relationship is regarded as a false detection. The statistical relationship f is determined by a regression model between H and Y: H and Y of the training samples are first normalized, and the regression model W between H and Y is then trained using an SVM. The vehicle and pedestrian calibration frames given in the detection result are then compared with the corresponding ground truth to obtain the calibration frames closest to the real values. After the model is trained, for a detection frame {x, y, w, h} obtained after NMS, the lower edge position of the detection frame is y + h; the trained regression model W is used to calculate the corresponding h', and finally the relative error between h and h' is calculated.
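The rejection rule can be sketched as follows. To keep the example dependency-free, an ordinary least-squares line replaces the SVM regression named above, so it is a stand-in rather than the regression model of the invention. The normalization by image height, the relative error taken as abs(h - h') / h, and the names `fit_line` and `is_false_detection` are likewise assumptions made for illustration.

```python
def fit_line(bottoms, heights):
    """Least-squares fit of height = a * bottom + b (stand-in for the SVM model)."""
    n = len(bottoms)
    mean_y = sum(bottoms) / n
    mean_h = sum(heights) / n
    var = sum((y - mean_y) ** 2 for y in bottoms)
    cov = sum((y - mean_y) * (h - mean_h) for y, h in zip(bottoms, heights))
    a = cov / var
    b = mean_h - a * mean_y
    return a, b

def is_false_detection(box, model, image_height, threshold):
    """Return True if the box height disagrees with the road constraint.

    box is (x, y, w, h) in pixels, with y the top edge of the detection frame.
    """
    x, y, w, h = box
    a, b = model
    bottom = (y + h) / image_height      # normalized lower edge position
    h_actual = h / image_height          # normalized detected height
    h_pred = a * bottom + b              # height expected at this position
    error = abs(h_actual - h_pred) / h_actual
    return error > threshold
```

In use, one model would be fitted for vehicles and one for pedestrians from the normalized training annotations, each with its own threshold, and every frame that survives NMS would be passed through `is_false_detection`.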
As shown in fig. 6, based on the above-mentioned multi-class vehicle and pedestrian detection method based on the improved ACF, the present invention further provides a multi-class vehicle and pedestrian detection device based on the improved ACF, where the multi-class vehicle and pedestrian detection device based on the improved ACF may be a mobile terminal, a desktop computer, a notebook, a palmtop computer, a server, and other computing devices. The improved ACF-based vehicle pedestrian multi-category detection apparatus includes a processor 10, a memory 20, and a display 30. Fig. 6 shows only some of the components of the improved ACF-based pedestrian multi-category detection apparatus, but it should be understood that not all of the shown components need be implemented, and that more or fewer components may be implemented instead.
The memory 20 may be an internal storage unit of the ACF-based vehicular pedestrian multi-class detection apparatus in some embodiments, for example, a hard disk or a memory of the ACF-based vehicular pedestrian multi-class detection apparatus. The memory 20 may also be an external storage device of the ACF-based vehicle pedestrian multi-class detection device in other embodiments, such as a plug-in hard disk, a Smart Media Card (SMC), a Secure Digital (SD) Card, a Flash Card (Flash Card), and the like, which are equipped on the ACF-based vehicle pedestrian multi-class detection device. Further, the memory 20 may also include both an internal memory unit and an external memory device of the ACF-based vehicle and pedestrian multi-category detection device. The memory 20 is used for storing application software installed in the improved ACF-based multi-class vehicle and pedestrian detection device and various types of data, such as program codes of the improved ACF-based multi-class vehicle and pedestrian detection device. The memory 20 may also be used to temporarily store data that has been output or is to be output. In one embodiment, the memory 20 stores an improved ACF-based vehicle and pedestrian multi-class detection program 40, and the improved ACF-based vehicle and pedestrian multi-class detection program 40 can be executed by the processor 10, so as to implement the improved ACF-based vehicle and pedestrian multi-class detection method according to the embodiments of the present application.
The processor 10 may be, in some embodiments, a Central Processing Unit (CPU), microprocessor or other data Processing chip for running program codes stored in the memory 20 or Processing data, such as executing the improved ACF-based vehicle and pedestrian multi-category detection method.
The display 30 may be an LED display, a liquid crystal display, a touch-sensitive liquid crystal display, an OLED (Organic Light-Emitting Diode) touch panel, or the like in some embodiments. The display 30 is used for displaying information at the improved ACF-based vehicular pedestrian multi-category detection apparatus and for displaying a visual user interface. The components 10-30 of the ACF-based vehicular pedestrian multi-category detection apparatus communicate with each other via a system bus.
In one embodiment, the improved ACF-based vehicle and pedestrian multi-class detection method according to the above embodiment is implemented when the processor 10 executes the improved ACF-based vehicle and pedestrian multi-class detection program 40 in the memory 20, and since the improved ACF-based vehicle and pedestrian multi-class detection method has been described in detail above, it will not be described herein again.
In summary, in the vehicle and pedestrian multi-class detection method, device and storage medium based on the improved ACF provided by the invention, a multi-class detection framework is adopted to detect vehicles and pedestrians simultaneously, which solves the problem that the Adaboost classifier in the ACF detection algorithm detects only a single class; to address the low accuracy of vehicle and pedestrian detection, a multi-view vehicle detector and a context pixel pedestrian detector are adopted, which effectively capture the viewing-angle differences among vehicle samples and the deformation caused by a pedestrian's posture while walking, thereby improving detection accuracy; and to overcome false detections in the vehicle and pedestrian detection process, road prior information is used to remove them effectively.
Of course, it will be understood by those skilled in the art that all or part of the processes of the methods of the above embodiments may be implemented by a computer program instructing relevant hardware (such as a processor, a controller, etc.), and the program may be stored in a computer readable storage medium, and when executed, the program may include the processes of the above method embodiments. The storage medium may be a memory, a magnetic disk, an optical disk, etc.
The above description is only for the preferred embodiment of the present invention, but the scope of the present invention is not limited thereto, and any changes or substitutions that can be easily conceived by those skilled in the art within the technical scope of the present invention are included in the scope of the present invention.
Claims (10)
1. A vehicle and pedestrian multi-class detection method based on an improved ACF is characterized by comprising the following steps:
obtaining a vehicle training sample and a pedestrian training sample, and preprocessing the vehicle training sample and the pedestrian training sample;
extracting the multi-view aggregation channel characteristics of the preprocessed vehicle training samples by using a vehicle and pedestrian detection framework, and establishing a vehicle detector according to the multi-view aggregation channel characteristics;
extracting context pixel aggregation channel characteristics of the preprocessed pedestrian training sample by using a vehicle and pedestrian detection framework, and establishing a pedestrian detector according to the context pixel aggregation channel characteristics;
acquiring a preprocessed image to be detected, and sharing the aggregation channel characteristics of the image to be detected into the vehicle detector and the pedestrian detector to obtain a vehicle detection result and a pedestrian detection result;
and adopting a false detection rejection strategy based on road constraints to reject false detections from the vehicle detection result and the pedestrian detection result.
2. The improved ACF-based vehicle and pedestrian multi-category detection method according to claim 1, wherein the method for preprocessing the vehicle training samples and the pedestrian training samples is specifically:
scaling the vehicle training sample and the pedestrian training sample in horizontal and vertical directions, and keeping the center position of each target in the vehicle training sample and the pedestrian training sample normalized.
3. The improved ACF-based vehicle and pedestrian multi-category detection method according to claim 1, wherein the step of extracting the multi-view aggregated channel features of the vehicle training samples after preprocessing by using a vehicle and pedestrian detection framework and building a vehicle detector according to the multi-view aggregated channel features specifically comprises:
calculating a similarity matrix among the sample points in the preprocessed vehicle training sample by adopting a spectral clustering algorithm, obtaining eigenvectors of multiple dimensions through spectral decomposition of the matrix, clustering the eigenvectors by adopting a K-means algorithm to extract the aggregation channel features of multiple viewing angles, and then training the vehicle detector of the corresponding viewing angle by utilizing the aggregation channel features of each viewing angle.
4. The method as claimed in claim 1, wherein the step of extracting the context pixel aggregation channel feature of the preprocessed pedestrian training sample by the vehicle and pedestrian detection framework and building the pedestrian detector according to the context pixel aggregation channel feature specifically comprises:
extracting 10 feature channels of the preprocessed pedestrian training sample; processing the 10 channels by using 2 × 2 average pooling (n = 2) to obtain the aggregation channel features $F_{2\times2}$; applying 2 × 2 average pooling twice to the aggregation channel features $F_{2\times2}$ to obtain the regional context pixel aggregation channel features $F_{4\times4}$ and $F_{8\times8}$; upsampling the $F_{4\times4}$ features and the $F_{8\times8}$ features to the resolution of $F_{2\times2}$ and combining them to form 30 deformation-resistant context pixel aggregation channel features of the same size; and establishing the pedestrian detector according to the context pixel aggregation channel features.
5. The improved ACF-based vehicle and pedestrian multi-category detection method according to claim 1, wherein the step of acquiring the pre-processed image to be detected, sharing the aggregated channel features of the image to be detected into the vehicle detector and the pedestrian detector to obtain vehicle detection results and pedestrian detection results further comprises:
and carrying out confidence score calibration on the vehicle detection result by adopting a parameterized Logistic regression calibration method.
6. The improved ACF-based vehicle and pedestrian multi-category detection method according to claim 1, wherein the step of rejecting false detections from the vehicle detection result and the pedestrian detection result by using a false detection rejection strategy based on road constraints includes:
normalizing the height of the vehicle calibration frame of the vehicle training sample and the position coordinate of the lower edge of the vehicle calibration frame, and the height of the pedestrian calibration frame of the pedestrian training sample and the position coordinate of the lower edge of the pedestrian calibration frame;
training a first regression model between the height of the vehicle calibration frame and the position coordinates of the lower edge of the vehicle calibration frame and a second regression model between the height of the pedestrian calibration frame and the position coordinates of the lower edge of the pedestrian calibration frame by using a support vector machine;
calculating the height of a predicted vehicle calibration frame corresponding to the lower edge position of the vehicle detection result by adopting a first regression model, and calculating the height of a predicted pedestrian calibration frame corresponding to the lower edge position of a pedestrian in a pedestrian detection result by adopting a second regression model;
calculating a first error value between the height of the predicted vehicle calibration frame and the height of an actual vehicle calibration frame in the vehicle detection result, and a second error value between the height of the predicted pedestrian calibration frame and the height of the actual pedestrian calibration frame in the pedestrian detection result;
when the first error value is larger than a first threshold value, judging the vehicle detection result to be a false detection, and otherwise accepting the vehicle detection result; and when the second error value is larger than a second threshold value, judging the pedestrian detection result to be a false detection, and otherwise accepting the pedestrian detection result.
7. The improved ACF-based vehicle and pedestrian multi-class detection method as claimed in claim 6, wherein the first regression model is:
$H_1 = f(Y_1)$,
wherein $H_1$ denotes the height of the vehicle calibration frame and $Y_1$ denotes the position coordinate of the lower edge of the vehicle calibration frame;
the first error value is calculated by:
wherein $E_1$ represents the first error value, $h_1$ denotes the height of the actual vehicle calibration frame, $h'_1$ denotes the height of the predicted vehicle calibration frame, and abs denotes the absolute value.
8. The improved ACF-based vehicle and pedestrian multi-category detection method of claim 6, wherein the second regression model is:
$H_2 = f(Y_2)$,
wherein $H_2$ denotes the height of the pedestrian calibration frame and $Y_2$ denotes the position coordinate of the lower edge of the pedestrian calibration frame;
the second error value is calculated by:
wherein $E_2$ represents the second error value, $h_2$ denotes the height of the actual pedestrian calibration frame, $h'_2$ denotes the height of the predicted pedestrian calibration frame, and abs denotes the absolute value.
9. A vehicle pedestrian multi-category detection device based on an improved ACF, comprising: a processor and a memory;
the memory has stored thereon a computer readable program executable by the processor;
the processor, when executing the computer readable program, implements the steps in the improved ACF-based vehicle pedestrian multi-class detection method according to any one of claims 1 to 8.
10. A computer readable storage medium, wherein the computer readable storage medium stores one or more programs which are executable by one or more processors to implement the steps in the improved ACF-based vehicle pedestrian multi-class detection method according to any one of claims 1 to 8.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202011034733.9A CN112215103B (en) | 2020-09-27 | 2020-09-27 | Vehicle pedestrian multi-category detection method and device based on improved ACF |
Publications (2)
Publication Number | Publication Date |
---|---|
CN112215103A (en) | 2021-01-12 |
CN112215103B (en) | 2024-02-23 |
Family
ID=74050818
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202011034733.9A Active CN112215103B (en) | 2020-09-27 | 2020-09-27 | Vehicle pedestrian multi-category detection method and device based on improved ACF |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN112215103B (en) |
Also Published As
Publication number | Publication date |
---|---|
CN112215103B (en) | 2024-02-23 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||