CN115565207A - Pedestrian detection method for occlusion scenes fusing feature mimicking - Google Patents

Pedestrian detection method for occlusion scenes fusing feature mimicking

Info

Publication number
CN115565207A
Authority
CN
China
Prior art keywords
pedestrian
feature
heatmap
detection
occlusion
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202211510002.6A
Other languages
Chinese (zh)
Other versions
CN115565207B (en)
Inventor
韩守东
潘孝枫
丁绘霖
刘东海生
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Hangzhou Tuke Intelligent Information Technology Co ltd
Original Assignee
Wuhan Tuke Intelligent Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Wuhan Tuke Intelligent Technology Co ltd filed Critical Wuhan Tuke Intelligent Technology Co ltd
Priority to CN202211510002.6A
Publication of CN115565207A
Application granted
Publication of CN115565207B
Legal status: Active

Links

Images

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00 Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10 Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/77 Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
    • G06V10/80 Fusion, i.e. combining data from various sources at the sensor level, preprocessing level, feature extraction level or classification level
    • G06V10/806 Fusion, i.e. combining data from various sources at the sensor level, preprocessing level, feature extraction level or classification level of extracted features
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/82 Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02T CLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO TRANSPORTATION
    • Y02T10/00 Road transport of goods or passengers
    • Y02T10/10 Internal combustion engine [ICE] based vehicles
    • Y02T10/40 Engine management systems

Abstract

The invention relates to a pedestrian detection method for occlusion scenes fusing feature mimicking. In the training stage, pedestrian features are extracted with a feature-extraction network and classified according to the annotation information, and for each class of pedestrian features a feature-mimicking strategy is learned by a separate branch. In the inference stage, the features extracted by the backbone network pass through two parallel feature-mimicking branches to produce center-point maps with different responses, and an effective fusion strategy yields a more representative center-point response map. An occlusion attribute of the detection box is designed to address the missed detection of pedestrians in dense areas, and an occlusion-aware non-maximum suppression method is designed that deletes redundant pedestrian detection boxes in the post-processing stage while retaining the detection boxes of occluded pedestrians. Pedestrian detection performance in occlusion scenes is effectively improved.

Description

Pedestrian detection method for occlusion scenes fusing feature mimicking
Technical Field
The invention relates to the field of pedestrian detection research in image processing and machine vision, and in particular to a pedestrian detection method for occlusion scenes fusing feature mimicking.
Background
Pedestrian detection in occlusion scenes is an important research topic in the field of computer vision, and as an important upstream task it provides key cues for downstream tasks such as pedestrian tracking, pedestrian re-identification and automatic driving. A pedestrian detection algorithm that works in a variety of complex scenes is therefore significant for improving the performance of these downstream tasks.
Existing pedestrian detection methods comprise traditional machine-vision methods based on texture and similar features, and feature-extraction methods based on deep learning. Limited by the dependence of these methods on appearance features, existing pedestrian detection algorithms perform poorly in complex occlusion scenes.
In complex scenes, occlusion of pedestrians includes intra-class occlusion between pedestrians and inter-class occlusion between pedestrians and other surrounding objects. Occlusion reduces the apparent features of pedestrians, so that the detector cannot distinguish occluded pedestrians from the background well, causing a high miss rate.
Disclosure of Invention
Aiming at the above technical problems in the prior art, the invention provides a pedestrian detection method for occlusion scenes fusing feature mimicking, which reduces intra-class feature differences among pedestrians and increases the difference between pedestrian and background features through feature-mimicking learning, thereby improving the detection rate of occluded pedestrians. An occlusion attribute is designed as additional semantic information, and an occlusion-aware non-maximum suppression (NMS) algorithm is designed that considers both the predicted confidence and the occlusion attribute of each pedestrian detection box, so that detection boxes whose confidence scores are lowered by occlusion are effectively retained while redundant detection boxes are suppressed.
According to a first aspect of the invention, a pedestrian detection method for occlusion scenes fusing feature mimicking is provided, comprising: step 1, training a feature-mimicking learning network, whose input is the high-level features of an image obtained through a backbone network and whose output is a third center-point response heatmap obtained by fusing a first center-point response heatmap and a second center-point response heatmap; the first center-point response heatmap is the occlusion-non-occlusion feature-mimicking learning center-point response heatmap, and the second center-point response heatmap is the whole-body-visible feature-mimicking learning center-point response heatmap;
step 2, acquiring the high-level features of the image to be detected through the backbone network, and inputting them into the feature-mimicking learning network to obtain the first center-point response heatmap and the second center-point response heatmap;
step 3, fusing the first and second center-point response heatmaps by weighted fusion in the feature-mimicking learning network, and activating with a sigmoid to obtain the third center-point response heatmap;
step 4, considering both the occlusion attribute and the classification confidence of each detection box, performing occlusion-aware non-maximum suppression on the third center-point response heatmap to obtain the detection result of the image to be detected.
On the basis of the technical scheme, the invention can be improved as follows.
Optionally, the training process of the feature-mimicking learning network comprises:
step 101, acquiring the high-level features of a training image and the visible-part detection box and whole-body detection box of each target pedestrian;
step 102, according to the annotation information of the visible part and the whole body of each pedestrian, extracting the high-level features with RoI-Align to obtain the pedestrian whole-body features and pedestrian visible-part features; computing visibility as the ratio of the areas of a pedestrian's visible-part detection box and whole-body detection box, and classifying the pedestrian whole-body features into occluded pedestrian features and non-occluded pedestrian features according to the visibility;
step 103, inputting the occluded pedestrian features and non-occluded pedestrian features into an occlusion-non-occlusion feature-mimicking module for learning, making the occluded pedestrian features learn to mimic the feature representation of the non-occluded pedestrian features, to obtain the first center-point response heatmap; inputting the pedestrian whole-body features and pedestrian visible-part features into a whole-body-visible feature-mimicking module for learning, making the whole-body features learn the feature representation of the visible-part features, to obtain the second center-point response heatmap;
step 104, fusing the first and second center-point response heatmaps by weighted fusion, and activating with a sigmoid to obtain the third center-point response heatmap.
Optionally, classifying the pedestrian whole-body features into occluded pedestrian features and non-occluded pedestrian features in step 102 comprises:
calculating the visibility $v$ of each pedestrian:

$$v = \frac{S_{vis}}{S_{full}}$$

where $S_{vis}$ is the area of the pedestrian visible-part box and $S_{full}$ is the area of the pedestrian whole-body box;
classifying the pedestrian whole-body features into occluded and non-occluded pedestrian features according to the visibility:

$$f_i \in \begin{cases} F^{occ}, & v_i < \theta_v \\ F^{non}, & v_i \ge \theta_v \end{cases}$$

where $f_i^{occ}$ denotes the $i$-th occluded pedestrian feature and $F^{occ}$ denotes the set of $N_{occ}$ occluded pedestrian features; $f_i^{non}$ denotes the $i$-th non-occluded pedestrian feature and $F^{non}$ denotes the set of $N_{non}$ non-occluded pedestrian features; $\theta_v$ is a set visibility threshold.
Optionally, the process of training the occlusion-non-occlusion feature-mimicking module and the whole-body-visible feature-mimicking module in step 103 comprises:
dividing the features of the target pedestrians in each batch into mimicked features and features to be mimicked; the pedestrian features include: pedestrian whole-body features, pedestrian visible-part features, occluded pedestrian features and non-occluded pedestrian features;
extracting each pedestrian feature to a fixed size with RoI-Align, calculating the mean of the mimicked features on each channel, and using this feature mean as the mimicking target;
imposing a Smooth-L1 mimicking constraint $L_m$ on each feature that needs to mimic:

$$L_m = \frac{1}{M}\sum_{i=1}^{M} \mathrm{SmoothL1}\!\left(f_i,\ \frac{1}{N}\sum_{j=1}^{N}\hat{f}_j\right)$$

where $\hat{f}_j$ denotes the $j$-th mimicked feature, $\frac{1}{N}\sum_{j=1}^{N}\hat{f}_j$ is the mean of the $N$ mimicked features, $f_i$ denotes the $i$-th feature to be mimicked, and $M$ is the number of features to be mimicked.
Optionally, the fusion strategy for the center-point response heatmaps in step 3 and step 104 is:

$$H_3 = \mathrm{sigmoid}(w_1 H_1 + w_2 H_2)$$

where $H_1$ denotes the first center-point response heatmap, $H_2$ denotes the second center-point response heatmap, $H_3$ denotes the third center-point response heatmap, and $w_1$ and $w_2$ are fusion weights.
optionally, the features model a loss function of the learning network
Figure 226538DEST_PATH_IMAGE022
Comprises the following steps:
Figure 226724DEST_PATH_IMAGE023
Figure 255860DEST_PATH_IMAGE024
and
Figure 168452DEST_PATH_IMAGE025
a loss function for the first centroid response thermodynamic diagram and the second centroid response thermodynamic diagram respectively,
Figure 804576DEST_PATH_IMAGE026
learning constraint penalty functions for the feature emulation of the occlusion-nonocclusion feature emulation module,
Figure 218240DEST_PATH_IMAGE027
learning a constraint loss function for the feature emulation of the whole-body-partial occlusion feature emulation module;
Figure 977249DEST_PATH_IMAGE028
is the equilibrium coefficient;
Figure 450956DEST_PATH_IMAGE029
wherein Lm is a loss calculation function; f denotes a set of pedestrian whole-body features, and V denotes a set of pedestrian visible features.
Optionally, the occlusion-aware non-maximum suppression in step 4 examines each detection box in descending order of detection confidence score, and comprises:
step 401, for any detection box, judging whether the intersection-over-union (IoU) with an intersecting detection box is larger than a set threshold, and if so, executing step 402;
step 402, calculating the occlusion-attribute difference of the two intersecting detection boxes; when the difference exceeds a set threshold, retaining both intersecting detection boxes; when the difference does not exceed the set threshold, deleting one of the two intersecting detection boxes;
the occlusion attribute being the ratio of the visible length of each edge of a detection box to the length of that edge.
Optionally, the occlusion attribute of a detection box is:
O = {o_i | i = 1, 2, 3, 4}
where o_1, o_2, o_3, o_4 respectively denote the visible-length ratios of the top, right, bottom and left edges, and O is the occlusion-attribute vector of the detection box.
Optionally, step 4 comprises:
step 401', initialize the detection-box sequence $B = \{b_1, b_2, \ldots, b_n\}$ and the corresponding confidence-score sequence $S = \{s_1, s_2, \ldots, s_n\}$, where $b_i$ denotes the $i$-th detection box and $s_i$ is the confidence score of $b_i$;
step 402', when the $m$-th value in the sequence $S$ is the maximum, let $M$ be the detection box with the highest current confidence score, i.e. $M = b_m$; take $M$ out of the detection-box sequence $B$ and put it into a set $F$;
step 403', for each remaining box $b_i$, when $\mathrm{IoU}(M, b_i) \ge N_t$ and $|o_j^M - o_j^{b_i}| \le \delta_o$ for every edge $j$, let $B \leftarrow B - \{b_i\}$ and $S \leftarrow S - \{s_i\}$; where $\mathrm{IoU}$ is the intersection-over-union function, $N_t$ is a set IoU threshold, $\delta_o$ is an occlusion-attribute difference threshold, $j = 1, 2, 3, 4$ indexes the top, right, bottom and left edges respectively, and $o_j^M$ and $o_j^{b_i}$ are the occlusion attributes of the $j$-th edge of the detection box $M$ and the detection box $b_i$ respectively;
step 404', loop over steps 402'-403' until the sequence $B$ is empty, and return the final sets $F$ and $S$ as the final detection-box sequence and corresponding confidence-score sequence respectively.
The invention provides a pedestrian detection method for occlusion scenes fusing feature mimicking. First, feature mimicking is proposed to reduce intra-class feature differences and increase the inter-class feature difference between pedestrians and the background. Second, a fused feature-mimicking learning strategy is proposed to realize complementary differences and improve the detection rate in occlusion scenes. Third, an occlusion attribute is constructed and occlusion-aware non-maximum suppression is proposed, effectively retaining detection boxes that would otherwise be suppressed because of occlusion. By fusing these innovations, a pedestrian detection method for occlusion scenes fusing feature mimicking is constructed to improve pedestrian detection performance in occlusion scenes.
Drawings
Fig. 1 is a structural diagram of the pedestrian detection method for occlusion scenes fusing feature mimicking provided by an embodiment of the present invention;
FIG. 2 is a schematic diagram of feature-mimicking learning provided by an embodiment of the present invention;
Fig. 3 is a pseudocode listing of the occlusion-aware non-maximum suppression algorithm provided by an embodiment of the present invention.
Detailed Description
The principles and features of this invention are described below in conjunction with the following drawings, which are set forth by way of illustration only and are not intended to limit the scope of the invention.
Fig. 1 is a structural diagram of the pedestrian detection method for occlusion scenes fusing feature mimicking provided by an embodiment of the present invention. As shown in fig. 1, the method comprises:
step 1, training a feature-mimicking learning network, whose input is the high-level features of an image obtained through a backbone network and whose output is a third center-point response heatmap obtained by fusing a first center-point response heatmap and a second center-point response heatmap; the first center-point response heatmap is the occlusion-non-occlusion feature-mimicking learning center-point response heatmap, and the second center-point response heatmap is the whole-body-visible feature-mimicking learning center-point response heatmap.
step 2, acquiring the high-level features of the image to be detected through the backbone network, and inputting them into the feature-mimicking learning network to obtain the first and second center-point response heatmaps.
step 3, fusing the first and second center-point response heatmaps by weighted fusion in the feature-mimicking learning network, and activating with a sigmoid to obtain the third center-point response heatmap for subsequent post-processing.
step 4, considering both the occlusion attribute and the classification confidence of each detection box, performing occlusion-aware non-maximum suppression on the third center-point response heatmap to post-process the prediction results and obtain the detection result of the image to be detected. A sketch of this inference pipeline is given below.
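The following is a minimal sketch of steps 2-3 at inference time; the module names (backbone, branch1, branch2, head) and the head's outputs are assumptions for illustration, not the patent's API. Occlusion-aware NMS (step 4) is sketched separately after step 404' in Example 1 below.

```python
import torch

def detect(image, backbone, branch1, branch2, head, w1=0.5, w2=0.5):
    """Steps 2-3 of the inference pipeline (module names are assumptions).

    backbone: maps the image to high-level features.
    branch1, branch2: parallel feature-mimicking branches producing the
        first and second center-point response heatmaps.
    head: decodes the fused heatmap into boxes, scores and occlusion
        attributes, which step 4 post-processes with occlusion-aware NMS.
    w1, w2: fusion weights; the patent sets them by experiment, the
        values here are placeholders.
    """
    feats = backbone(image)                # step 2: high-level features
    h1 = branch1(feats)                    # first center-point response heatmap
    h2 = branch2(feats)                    # second center-point response heatmap
    h3 = torch.sigmoid(w1 * h1 + w2 * h2)  # step 3: weighted fusion + sigmoid
    return head(h3, feats)                 # candidate detections for step 4
```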
According to the pedestrian detection method for occlusion scenes fusing feature mimicking provided by the embodiment of the invention, intra-class feature differences among pedestrians are reduced and the difference between pedestrian and background features is increased through feature-mimicking learning, improving the detection rate of occluded pedestrians. The occlusion attribute is designed as extra semantic information, and an occlusion-aware non-maximum suppression algorithm is designed that considers both the predicted confidence and the occlusion attribute of each pedestrian detection box, effectively retaining detection boxes whose confidence scores are lowered by occlusion while suppressing redundant detection boxes.
Example 1
Embodiment 1 of the present invention is an embodiment of the pedestrian detection method for occlusion scenes fusing feature mimicking. As can be seen from fig. 1, the embodiment comprises:
step 1, training a feature-mimicking learning network, whose input is the high-level features of an image obtained through a backbone network and whose output is a third center-point response heatmap obtained by fusing a first center-point response heatmap and a second center-point response heatmap; the first center-point response heatmap is the occlusion-non-occlusion feature-mimicking learning center-point response heatmap, and the second center-point response heatmap is the whole-body-visible feature-mimicking learning center-point response heatmap.
In one possible embodiment, the training process of the feature-mimicking learning network comprises:
step 101, acquiring the high-level features of a training image and the visible-part detection box and whole-body detection box of each target pedestrian.
step 102, according to the annotation information of the visible part and the whole body of each pedestrian, extracting the high-level features with RoI-Align to obtain the pedestrian whole-body features and pedestrian visible-part features; computing visibility as the ratio of the areas of the visible-part detection box and whole-body detection box, and classifying the pedestrian whole-body features into occluded pedestrian features and non-occluded pedestrian features according to the visibility.
In one possible embodiment, classifying the pedestrian whole-body features into occluded and non-occluded pedestrian features in step 102 comprises:
calculating the visibility $v$ of each pedestrian:

$$v = \frac{S_{vis}}{S_{full}}$$

where $S_{vis}$ is the area of the pedestrian visible-part box and $S_{full}$ is the area of the pedestrian whole-body box.
Classifying the pedestrian whole-body features into occluded and non-occluded pedestrian features according to the visibility:

$$f_i \in \begin{cases} F^{occ}, & v_i < \theta_v \\ F^{non}, & v_i \ge \theta_v \end{cases}$$

where $f_i^{occ}$ denotes the $i$-th occluded pedestrian feature and $F^{occ}$ denotes the set of $N_{occ}$ occluded pedestrian features; $f_i^{non}$ denotes the $i$-th non-occluded pedestrian feature and $F^{non}$ denotes the set of $N_{non}$ non-occluded pedestrian features; $\theta_v$ is a set visibility threshold. A sketch of this classification step is given below.
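As a concrete illustration, the sketch below splits RoI-Align features by visibility; the threshold value theta_v = 0.5 and the tensor shapes are assumptions for illustration, since the patent fixes only the ratio itself.

```python
import torch

def split_by_visibility(full_feats, vis_boxes, full_boxes, theta_v=0.5):
    """Split whole-body pedestrian features into occluded / non-occluded sets.

    full_feats: (P, C, 7, 7) RoI-Align features of the P whole-body boxes.
    vis_boxes, full_boxes: (P, 4) boxes as (x1, y1, x2, y2).
    theta_v: visibility threshold (illustrative value, not given in the patent).
    """
    def area(b):
        return ((b[:, 2] - b[:, 0]).clamp(min=0)
                * (b[:, 3] - b[:, 1]).clamp(min=0))

    v = area(vis_boxes) / area(full_boxes).clamp(min=1e-6)  # v = S_vis / S_full
    occluded = full_feats[v < theta_v]        # F^occ: occluded pedestrian features
    non_occluded = full_feats[v >= theta_v]   # F^non: non-occluded features
    return occluded, non_occluded
```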
step 103, inputting the occluded pedestrian features and non-occluded pedestrian features into the occlusion-non-occlusion feature-mimicking module for learning, making the occluded pedestrian features learn the feature representation of the non-occluded pedestrian features, to obtain the first center-point response heatmap; inputting the pedestrian whole-body features and pedestrian visible-part features into the whole-body-visible feature-mimicking module for learning, making the whole-body features learn the feature representation of the visible-part features, to obtain the second center-point response heatmap.
Fig. 2 is a schematic diagram of feature-mimicking learning provided by an embodiment of the present invention. In one possible embodiment, with reference to fig. 1 and fig. 2, the process of training the occlusion-non-occlusion feature-mimicking module and the whole-body-visible feature-mimicking module in step 103 comprises:
dividing the features of the target pedestrians in each batch into mimicked features and features to be mimicked; the pedestrian features include: pedestrian whole-body features, pedestrian visible-part features, occluded pedestrian features and non-occluded pedestrian features.
Each pedestrian feature is first extracted to a fixed size (7 × 256) with RoI-Align, then the mean of the mimicked features on each channel is calculated and this feature mean is used as the mimicking target.
A Smooth-L1 mimicking constraint $L_m$ is imposed on each feature that needs to mimic:

$$L_m = \frac{1}{M}\sum_{i=1}^{M} \mathrm{SmoothL1}\!\left(f_i,\ \frac{1}{N}\sum_{j=1}^{N}\hat{f}_j\right)$$

where $\hat{f}_j$ denotes the $j$-th mimicked feature, $\frac{1}{N}\sum_{j=1}^{N}\hat{f}_j$ is the mean of the $N$ mimicked features, $f_i$ denotes the $i$-th feature to be mimicked, and $M$ is the number of features to be mimicked. A sketch of this constraint follows.
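A minimal PyTorch sketch of the constraint, assuming the mean over the mimicked features is taken per channel at each spatial position; names and shapes are illustrative:

```python
import torch
import torch.nn.functional as F

def mimic_loss(feats_to_mimic, mimicked_feats):
    """Smooth-L1 mimicking constraint L_m.

    feats_to_mimic: (M, C, 7, 7) features that must learn the target.
    mimicked_feats: (N, C, 7, 7) features serving as the mimicking target.
    """
    # Mean of the N mimicked features, taken per channel (and position).
    target = mimicked_feats.mean(dim=0, keepdim=True)  # (1, C, 7, 7)
    target = target.expand_as(feats_to_mimic)          # broadcast to the M features
    # Average Smooth-L1 distance over the M features to be mimicked.
    return F.smooth_l1_loss(feats_to_mimic, target, reduction='mean')
```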
Two different occlusion mimicking strategies are thus proposed: the occlusion-non-occlusion feature-mimicking learning module and the whole-body-visible feature-mimicking learning module.
step 104, fusing the first and second center-point response heatmaps by weighted fusion, and activating with a sigmoid to obtain the third center-point response heatmap.
In one possible embodiment, the fusion strategy for the center-point response heatmaps in step 3 and step 104 is:

$$H_3 = \mathrm{sigmoid}(w_1 H_1 + w_2 H_2)$$

where $H_1$ denotes the first center-point response heatmap, $H_2$ denotes the second center-point response heatmap, $H_3$ denotes the third center-point response heatmap, and the fusion weights $w_1$ and $w_2$ are obtained by experiment.
In one possible embodiment, the loss function $L$ of the feature-mimicking learning network is:

$$L = L_{c1} + L_{c2} + \lambda\,(L_{m1} + L_{m2})$$

where $L_{c1}$ and $L_{c2}$ are the loss functions of the first and second center-point response heatmaps respectively, $L_{m1}$ is the feature-mimicking constraint loss of the occlusion-non-occlusion feature-mimicking module, $L_{m2}$ is the feature-mimicking constraint loss of the whole-body-visible feature-mimicking module, and $\lambda$ is a balance coefficient set by experiment;

$$L_{m2} = L_m(F, V)$$

where $L_m$ is the mimicking loss function above, $F$ denotes the set of $N$ pedestrian whole-body features, and $V$ denotes the set of $N$ pedestrian visible-part features. The sketch below shows how the terms combine.
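A brief sketch of assembling the overall loss, assuming the terms combine additively as written above; the value of lam is a placeholder since the patent sets lambda by experiment:

```python
def total_loss(l_c1, l_c2, l_m1, l_m2, lam=0.1):
    """Overall training loss of the feature-mimicking learning network.

    l_c1, l_c2: losses of the first / second center-point response heatmaps.
    l_m1: mimicking constraint of the occlusion-non-occlusion module.
    l_m2: mimicking constraint of the whole-body-visible module.
    lam:  balance coefficient (placeholder; the patent sets it by experiment).
    """
    return l_c1 + l_c2 + lam * (l_m1 + l_m2)
```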
step 2, acquiring the high-level features of the image to be detected through the backbone network, and inputting them into the feature-mimicking learning network to obtain the first and second center-point response heatmaps.
step 3, fusing the first and second center-point response heatmaps by weighted fusion in the feature-mimicking learning network, and activating with a sigmoid to obtain the third center-point response heatmap for subsequent post-processing.
step 4, considering both the occlusion attribute and the classification confidence of each detection box, performing occlusion-aware non-maximum suppression on the third center-point response heatmap to post-process the prediction results and obtain the detection result of the image to be detected.
In one possible embodiment, in the post-processing stage, the occlusion-aware non-maximum suppression of step 4 examines each detection box in descending order of detection confidence score, and comprises:
step 401, for any detection box, judging whether the intersection-over-union (IoU) with an intersecting detection box is larger than a set threshold, and if so, executing step 402.
step 402, calculating the occlusion-attribute difference of the two intersecting detection boxes; when the difference exceeds a set threshold, the boxes have different occlusion attributes and both need to be retained; when the difference does not exceed the set threshold, one is a redundant detection box and needs to be suppressed and deleted.
The occlusion attribute is the ratio of the visible length of each edge of a detection box to the length of that edge.
In a vehicle-mounted camera scene, as a target recedes from its initial position toward infinity it shrinks toward the middle of the image, and the ordinate of the lower boundary of its detection box decreases with the target's depth in the image. Based on this phenomenon, for detection boxes that intersect each other, the occlusion relationship between pedestrians is determined from the ordinate of the lower boundary of the detection box, and the occlusion attribute of the box is defined from this occlusion relationship.
It can be understood that the occlusion attribute of a detection box is:
O = {o_i | i = 1, 2, 3, 4}
where o_1, o_2, o_3, o_4 respectively denote the visible-length ratios of the top, right, bottom and left edges, and O is the occlusion-attribute vector of the detection box. The occlusion attributes of the four edges together form the occlusion attribute of the whole detection box. A sketch of computing this vector is given below.
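The sketch below illustrates one way such a vector could be computed; the front/behind rule (a box with a larger lower-boundary ordinate is in front) follows the description above, while the clipping arithmetic and the simplification that overlapping occluders are not merged are assumptions.

```python
def occlusion_attributes(box, others):
    """Occlusion-attribute vector O = {o_i | i = 1..4} of one detection box.

    box:    (x1, y1, x2, y2), with y increasing downward.
    others: intersecting boxes; a box whose lower boundary y2 is larger is
            treated as being in front (closer to the camera) and can occlude.
    Returns the visible-length ratios of the (top, right, bottom, left) edges.
    """
    x1, y1, x2, y2 = box
    w, h = max(x2 - x1, 1e-6), max(y2 - y1, 1e-6)
    visible = [w, h, w, h]        # visible length of top, right, bottom, left
    for ox1, oy1, ox2, oy2 in others:
        if oy2 <= y2:
            continue              # the other box is behind; it cannot occlude
        ix1, ix2 = max(x1, ox1), min(x2, ox2)
        iy1, iy2 = max(y1, oy1), min(y2, oy2)
        if ix1 >= ix2 or iy1 >= iy2:
            continue              # boxes do not actually overlap
        if iy1 <= y1:
            visible[0] -= ix2 - ix1   # overlap covers part of the top edge
        if ix2 >= x2:
            visible[1] -= iy2 - iy1   # overlap covers part of the right edge
        if iy2 >= y2:
            visible[2] -= ix2 - ix1   # overlap covers part of the bottom edge
        if ix1 <= x1:
            visible[3] -= iy2 - iy1   # overlap covers part of the left edge
    return [max(v, 0.0) / l for v, l in zip(visible, [w, h, w, h])]
```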
Fig. 3 is a pseudocode listing of the occlusion-aware non-maximum suppression algorithm provided by an embodiment of the present invention. As can be seen from fig. 1 and fig. 3, in another possible embodiment, step 4 comprises:
step 401', initialize the detection-box sequence $B = \{b_1, b_2, \ldots, b_n\}$ and the corresponding confidence-score sequence $S = \{s_1, s_2, \ldots, s_n\}$, where $b_i$ denotes the $i$-th detection box and $s_i$ is the confidence score of $b_i$.
step 402', when the $m$-th value in the sequence $S$ is the maximum, let $M$ be the detection box with the highest current confidence score, i.e. $M = b_m$, take $M$ out of the detection-box sequence $B$ and put it into a set $F$.
step 403', for each remaining box $b_i$, when $\mathrm{IoU}(M, b_i) \ge N_t$ and $|o_j^M - o_j^{b_i}| \le \delta_o$ for every edge $j$, let $B \leftarrow B - \{b_i\}$ and $S \leftarrow S - \{s_i\}$, where $\mathrm{IoU}$ is the intersection-over-union function, $N_t$ is a set IoU threshold, $\delta_o$ is an occlusion-attribute difference threshold, $j = 1, 2, 3, 4$ indexes the top, right, bottom and left edges respectively, and $o_j^M$ and $o_j^{b_i}$ are the occlusion attributes of the $j$-th edge of the detection box $M$ and the detection box $b_i$ respectively.
step 404', loop over steps 402'-403' until the sequence $B$ is empty, and return the final sets $F$ and $S$ as the final detection-box sequence and corresponding confidence-score sequence respectively. A sketch of this procedure is given below.
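A runnable sketch of steps 401'-404' follows; the threshold values N_t and delta_o are placeholders, and reading "the occlusion-attribute difference does not exceed the threshold" as a per-edge comparison is an assumption consistent with the per-edge attributes defined above.

```python
import numpy as np

def occlusion_aware_nms(boxes, scores, occ_attrs, iou_thr=0.5, occ_diff_thr=0.3):
    """Occlusion-aware NMS following steps 401'-404' (thresholds are placeholders).

    boxes:     (n, 4) array of (x1, y1, x2, y2).
    scores:    (n,) confidence scores.
    occ_attrs: (n, 4) occlusion-attribute vectors O (top, right, bottom, left).
    Returns indices of the kept boxes, highest score first.
    """
    def iou(a, b):
        ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
        ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
        inter = max(ix2 - ix1, 0.0) * max(iy2 - iy1, 0.0)
        area = lambda r: max(r[2] - r[0], 0.0) * max(r[3] - r[1], 0.0)
        return inter / max(area(a) + area(b) - inter, 1e-6)

    order = list(np.argsort(scores)[::-1])   # step 401': sort B by descending score
    keep = []
    while order:                             # step 404': loop until B is empty
        m = order.pop(0)                     # step 402': box M with highest score
        keep.append(m)                       # move M into the kept set F
        survivors = []
        for i in order:                      # step 403': suppress redundant boxes
            if iou(boxes[m], boxes[i]) >= iou_thr:
                diff = np.abs(occ_attrs[m] - occ_attrs[i])
                if np.all(diff <= occ_diff_thr):
                    continue                 # similar occlusion: delete b_i from B, S
            survivors.append(i)              # disjoint or differently occluded: keep
        order = survivors
    return keep
```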
To address the defects in the background art, the embodiment of the invention provides a pedestrian detection method for occlusion scenes fusing feature mimicking. 1. Feature mimicking is innovatively used to reduce intra-class differences among pedestrian features, and an effective heatmap fusion strategy is proposed in combination with the model, effectively improving the detection rate of occluded pedestrians. 2. A pedestrian occlusion attribute is constructed from existing information and can serve as semantic information for other related vision tasks. 3. An occlusion-aware non-maximum suppression algorithm is designed that deletes redundant detection boxes while retaining the detection boxes of occluded pedestrians.
It should be noted that, in the foregoing embodiments, the descriptions of the respective embodiments have respective emphasis, and reference may be made to relevant descriptions of other embodiments for parts that are not described in detail in a certain embodiment.
As will be appreciated by one skilled in the art, embodiments of the present invention may be provided as a method, system, or computer program product. Accordingly, the present invention may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present invention may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.
The present invention is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems), and computer program products according to embodiments of the invention. It will be understood that each flow and/or block of the flow diagrams and/or block diagrams, and combinations of flows and/or blocks in the flow diagrams and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded computer, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
While the preferred embodiments of the present invention have been described, additional variations and modifications in those embodiments may occur to those skilled in the art once they learn of the basic inventive concepts. Therefore, it is intended that the appended claims be interpreted as including preferred embodiments and all such alterations and modifications as fall within the scope of the invention.
It will be apparent to those skilled in the art that various changes and modifications may be made in the present invention without departing from the spirit and scope of the invention. Thus, if such modifications and variations of the present invention fall within the scope of the claims of the present invention and their equivalents, the present invention is also intended to include such modifications and variations.

Claims (9)

1. A pedestrian detection method for occlusion scenes fusing feature mimicking, characterized by comprising the following steps:
step 1, training a feature-mimicking learning network, whose input is the high-level features of an image acquired through a backbone network and whose output is a third center-point response heatmap obtained by fusing a first center-point response heatmap and a second center-point response heatmap; the first center-point response heatmap is the occlusion-non-occlusion feature-mimicking learning center-point response heatmap, and the second center-point response heatmap is the whole-body-visible feature-mimicking learning center-point response heatmap;
step 2, acquiring the high-level features of an image to be detected through the backbone network, and inputting them into the feature-mimicking learning network to obtain the first center-point response heatmap and the second center-point response heatmap;
step 3, fusing the first and second center-point response heatmaps by weighted fusion in the feature-mimicking learning network, and activating with a sigmoid to obtain the third center-point response heatmap;
step 4, considering both the occlusion attribute and the classification confidence of each detection box, performing occlusion-aware non-maximum suppression on the third center-point response heatmap to obtain the detection result of the image to be detected.
2. The detection method according to claim 1, wherein the training process of the feature-mimicking learning network comprises:
step 101, acquiring the high-level features of a training image and the visible-part detection box and whole-body detection box of each target pedestrian;
step 102, according to the annotation information of the visible part and the whole body of each pedestrian, extracting the high-level features with RoI-Align to obtain the pedestrian whole-body features and pedestrian visible-part features; computing visibility as the ratio of the areas of a pedestrian's visible-part detection box and whole-body detection box, and classifying the pedestrian whole-body features into occluded pedestrian features and non-occluded pedestrian features according to the visibility;
step 103, inputting the occluded pedestrian features and non-occluded pedestrian features into an occlusion-non-occlusion feature-mimicking module for learning, making the occluded pedestrian features learn to mimic the feature representation of the non-occluded pedestrian features, to obtain the first center-point response heatmap; inputting the pedestrian whole-body features and pedestrian visible-part features into a whole-body-visible feature-mimicking module for learning, making the whole-body features learn the feature representation of the visible-part features, to obtain the second center-point response heatmap;
step 104, fusing the first and second center-point response heatmaps by weighted fusion, and activating with a sigmoid to obtain the third center-point response heatmap.
3. The detection method according to claim 2, wherein classifying the pedestrian whole-body features into occluded pedestrian features and non-occluded pedestrian features in step 102 comprises:
calculating the visibility $v$ of each pedestrian:

$$v = \frac{S_{vis}}{S_{full}}$$

where $S_{vis}$ is the area of the pedestrian visible-part box and $S_{full}$ is the area of the pedestrian whole-body box;
classifying the pedestrian whole-body features into occluded and non-occluded pedestrian features according to the visibility:

$$f_i \in \begin{cases} F^{occ}, & v_i < \theta_v \\ F^{non}, & v_i \ge \theta_v \end{cases}$$

where $f_i^{occ}$ denotes the $i$-th occluded pedestrian feature and $F^{occ}$ denotes the set of $N_{occ}$ occluded pedestrian features; $f_i^{non}$ denotes the $i$-th non-occluded pedestrian feature and $F^{non}$ denotes the set of $N_{non}$ non-occluded pedestrian features; $\theta_v$ is a set visibility threshold.
4. The detection method according to claim 2, wherein the process of training the occlusion-non-occlusion feature-mimicking module and the whole-body-visible feature-mimicking module in step 103 comprises:
dividing the features of the target pedestrians in each batch into mimicked features and features to be mimicked; the pedestrian features include: pedestrian whole-body features, pedestrian visible-part features, occluded pedestrian features and non-occluded pedestrian features;
extracting each pedestrian feature to a fixed size with RoI-Align, calculating the mean of the mimicked features on each channel, and using this feature mean as the mimicking target;
imposing a Smooth-L1 mimicking constraint $L_m$ on each feature that needs to mimic:

$$L_m = \frac{1}{M}\sum_{i=1}^{M} \mathrm{SmoothL1}\!\left(f_i,\ \frac{1}{N}\sum_{j=1}^{N}\hat{f}_j\right)$$

where $\hat{f}_j$ denotes the $j$-th mimicked feature, $\frac{1}{N}\sum_{j=1}^{N}\hat{f}_j$ is the mean of the $N$ mimicked features, $f_i$ denotes the $i$-th feature to be mimicked, and $M$ is the number of features to be mimicked.
5. The method of claim 2, wherein the fusion strategy for the center-point response heatmaps in step 3 and step 104 is:

$$H_3 = \mathrm{sigmoid}(w_1 H_1 + w_2 H_2)$$

where $H_1$ denotes the first center-point response heatmap, $H_2$ denotes the second center-point response heatmap, $H_3$ denotes the third center-point response heatmap, and $w_1$ and $w_2$ are fusion weights.
6. The detection method according to claim 2, wherein the loss function $L$ of the feature-mimicking learning network is:

$$L = L_{c1} + L_{c2} + \lambda\,(L_{m1} + L_{m2})$$

where $L_{c1}$ and $L_{c2}$ are the loss functions of the first and second center-point response heatmaps respectively, $L_{m1}$ is the feature-mimicking constraint loss of the occlusion-non-occlusion feature-mimicking module, $L_{m2}$ is the feature-mimicking constraint loss of the whole-body-visible feature-mimicking module, and $\lambda$ is a balance coefficient;

$$L_{m2} = L_m(F, V)$$

where $L_m$ is the mimicking loss function, $F$ denotes the set of pedestrian whole-body features, and $V$ denotes the set of pedestrian visible-part features.
7. The detection method according to claim 1, wherein the occlusion-aware non-maximum suppression in step 4 examines each detection box in descending order of detection confidence score, and comprises:
step 401, for any detection box, judging whether the intersection-over-union (IoU) with an intersecting detection box is larger than a set threshold, and if so, executing step 402;
step 402, calculating the occlusion-attribute difference of the two intersecting detection boxes; when the difference exceeds a set threshold, retaining both intersecting detection boxes; when the difference does not exceed the set threshold, deleting one of the two intersecting detection boxes;
the occlusion attribute being the ratio of the visible length of each edge of a detection box to the length of that edge.
8. The detection method according to claim 7, wherein the occlusion attribute of a detection box is:
O = {o_i | i = 1, 2, 3, 4}
where o_1, o_2, o_3, o_4 respectively denote the visible-length ratios of the top, right, bottom and left edges, and O is the occlusion-attribute vector of the detection box.
9. The detection method according to claim 1, wherein the step 4 comprises:
step 401', initializing the detection-box sequence $B = \{b_1, b_2, \ldots, b_n\}$ and the corresponding confidence-score sequence $S = \{s_1, s_2, \ldots, s_n\}$, where $b_i$ denotes the $i$-th detection box and $s_i$ is the confidence score of $b_i$;
step 402', when the $m$-th value in the sequence $S$ is the maximum, letting $M$ be the detection box with the highest current confidence score, i.e. $M = b_m$, taking $M$ out of the detection-box sequence $B$ and putting it into a set $F$;
step 403', for each remaining box $b_i$, when $\mathrm{IoU}(M, b_i) \ge N_t$ and $|o_j^M - o_j^{b_i}| \le \delta_o$ for every edge $j$, letting $B \leftarrow B - \{b_i\}$ and $S \leftarrow S - \{s_i\}$, where $\mathrm{IoU}$ is the intersection-over-union function, $N_t$ is a set IoU threshold, $\delta_o$ is an occlusion-attribute difference threshold, $j = 1, 2, 3, 4$ indexes the top, right, bottom and left edges respectively, and $o_j^M$ and $o_j^{b_i}$ are the occlusion attributes of the $j$-th edge of the detection box $M$ and the detection box $b_i$ respectively;
step 404', looping over steps 402'-403' until the sequence $B$ is empty, and returning the final sets $F$ and $S$ as the final detection-box sequence and corresponding confidence-score sequence respectively.
CN202211510002.6A 2022-11-29 2022-11-29 Pedestrian detection method for occlusion scenes fusing feature mimicking Active CN115565207B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211510002.6A CN115565207B (en) Pedestrian detection method for occlusion scenes fusing feature mimicking

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202211510002.6A CN115565207B (en) Pedestrian detection method for occlusion scenes fusing feature mimicking

Publications (2)

Publication Number Publication Date
CN115565207A true CN115565207A (en) 2023-01-03
CN115565207B CN115565207B (en) 2023-04-07

Family

ID=84769737

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202211510002.6A Active CN115565207B (en) Pedestrian detection method for occlusion scenes fusing feature mimicking

Country Status (1)

Country Link
CN (1) CN115565207B (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115713731A (en) * 2023-01-10 2023-02-24 武汉图科智能科技有限公司 Crowd scene pedestrian detection model construction method and crowd scene pedestrian detection method
CN115937906A (en) * 2023-02-16 2023-04-07 武汉图科智能科技有限公司 Occlusion scene pedestrian re-identification method based on occlusion inhibition and feature reconstruction

Citations (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110598601A (en) * 2019-08-30 2019-12-20 电子科技大学 Face 3D key point detection method and system based on distributed thermodynamic diagram
CN111738091A (en) * 2020-05-27 2020-10-02 复旦大学 Posture estimation and human body analysis system based on multi-task deep learning
CN112836676A (en) * 2021-03-01 2021-05-25 创新奇智(北京)科技有限公司 Abnormal behavior detection method and device, electronic equipment and storage medium
CN113191204A (en) * 2021-04-07 2021-07-30 华中科技大学 Multi-scale blocking pedestrian detection method and system
CN113239885A (en) * 2021-06-04 2021-08-10 新大陆数字技术股份有限公司 Face detection and recognition method and system
CN114419671A (en) * 2022-01-18 2022-04-29 北京工业大学 Hypergraph neural network-based occluded pedestrian re-identification method
CN114419568A (en) * 2022-01-18 2022-04-29 东北大学 Multi-view pedestrian detection method based on feature fusion
EP4002198A1 (en) * 2019-12-24 2022-05-25 Tencent Technology (Shenzhen) Company Limited Posture acquisition method and device, and key point coordinate positioning model training method and device
CN114639042A (en) * 2022-03-17 2022-06-17 哈尔滨理工大学 Video target detection algorithm based on improved CenterNet backbone network

Patent Citations (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110598601A (en) * 2019-08-30 2019-12-20 电子科技大学 Face 3D key point detection method and system based on distributed thermodynamic diagram
EP4002198A1 (en) * 2019-12-24 2022-05-25 Tencent Technology (Shenzhen) Company Limited Posture acquisition method and device, and key point coordinate positioning model training method and device
CN111738091A (en) * 2020-05-27 2020-10-02 复旦大学 Posture estimation and human body analysis system based on multi-task deep learning
CN112836676A (en) * 2021-03-01 2021-05-25 创新奇智(北京)科技有限公司 Abnormal behavior detection method and device, electronic equipment and storage medium
CN113191204A (en) * 2021-04-07 2021-07-30 华中科技大学 Multi-scale blocking pedestrian detection method and system
CN113239885A (en) * 2021-06-04 2021-08-10 新大陆数字技术股份有限公司 Face detection and recognition method and system
CN114419671A (en) * 2022-01-18 2022-04-29 北京工业大学 Hypergraph neural network-based occluded pedestrian re-identification method
CN114419568A (en) * 2022-01-18 2022-04-29 东北大学 Multi-view pedestrian detection method based on feature fusion
CN114639042A (en) * 2022-03-17 2022-06-17 哈尔滨理工大学 Video target detection algorithm based on improved CenterNet backbone network

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
GENG CHEN et al.: "Automatic Schelling Points Detection from Meshes", IEEE TRANSACTIONS ON VISUALIZATION AND COMPUTER GRAPHICS (Early Access)
朱肖磊 et al.: "Crowded pedestrian detection based on visible regions", Electronic Measurement Technology (《电子测量技术》)
李翔 et al.: "An improved YOLOv3 algorithm for occluded pedestrian detection", Acta Optica Sinica (《光学学报》)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115713731A (en) * 2023-01-10 2023-02-24 武汉图科智能科技有限公司 Crowd scene pedestrian detection model construction method and crowd scene pedestrian detection method
CN115937906A (en) * 2023-02-16 2023-04-07 武汉图科智能科技有限公司 Occlusion scene pedestrian re-identification method based on occlusion inhibition and feature reconstruction

Also Published As

Publication number Publication date
CN115565207B (en) 2023-04-07

Similar Documents

Publication Publication Date Title
CN115565207B (en) Pedestrian detection method for occlusion scenes fusing feature mimicking
CN111489403B (en) Method and device for generating virtual feature map by using GAN
CN110210551A (en) A kind of visual target tracking method based on adaptive main body sensitivity
CN112966697B (en) Target detection method, device and equipment based on scene semantics and storage medium
CN108805016B (en) Head and shoulder area detection method and device
CN111709285A (en) Epidemic situation protection monitoring method and device based on unmanned aerial vehicle and storage medium
CN105144239A (en) Image processing device, program, and image processing method
JP2022174707A (en) Pedestrian re-identification system and method based on space sequence feature learning
CN108446694A (en) A kind of object detection method and device
CN107633242A (en) Training method, device, equipment and the storage medium of network model
CN105303163B (en) A kind of method and detection device of target detection
CN112200154A (en) Face recognition method and device for mask, electronic equipment and storage medium
CN109034086A (en) Vehicle recognition methods, apparatus and system again
EP3859673A1 (en) Model generation
CN111783716A (en) Pedestrian detection method, system and device based on attitude information
CN111797769A (en) Small target sensitive vehicle detection system
CN113033523B (en) Method and system for constructing falling judgment model and falling judgment method and system
CN113570615A (en) Image processing method based on deep learning, electronic equipment and storage medium
CN115713731B (en) Crowd scene pedestrian detection model construction method and crowd scene pedestrian detection method
CN114494893B (en) Remote sensing image feature extraction method based on semantic reuse context feature pyramid
CN111160219B (en) Object integrity evaluation method and device, electronic equipment and storage medium
CN114360026A (en) Natural occlusion expression recognition method and system with accurate attention
CN114359892A (en) Three-dimensional target detection method and device and computer readable storage medium
CN112733671A (en) Pedestrian detection method, device and readable storage medium
CN111274894A (en) Improved YOLOv 3-based method for detecting on-duty state of personnel

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant
CP03 Change of name, title or address

Address after: No. 548, 5th Floor, Building 10, No. 28 Linping Avenue, Donghu Street, Linping District, Hangzhou City, Zhejiang Province

Patentee after: Hangzhou Tuke Intelligent Information Technology Co.,Ltd.

Address before: 430000 B033, No. 05, 4th floor, building 2, international enterprise center, No. 1, Guanggu Avenue, Donghu New Technology Development Zone, Wuhan, Hubei (Wuhan area of free trade zone)

Patentee before: Wuhan Tuke Intelligent Technology Co.,Ltd.

CP03 Change of name, title or address