CN115565207A - Pedestrian detection method for occlusion scenes with fused feature simulation - Google Patents
Pedestrian detection method for occlusion scenes with fused feature simulation
- Publication number
- CN115565207A CN115565207A CN202211510002.6A CN202211510002A CN115565207A CN 115565207 A CN115565207 A CN 115565207A CN 202211510002 A CN202211510002 A CN 202211510002A CN 115565207 A CN115565207 A CN 115565207A
- Authority
- CN
- China
- Prior art keywords
- pedestrian
- feature
- heatmap
- detection
- occlusion
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V40/00—Recognition of biometric, human-related or animal-related patterns in image or video data
- G06V40/10—Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/70—Arrangements for image or video recognition or understanding using pattern recognition or machine learning
- G06V10/77—Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
- G06V10/80—Fusion, i.e. combining data from various sources at the sensor level, preprocessing level, feature extraction level or classification level
- G06V10/806—Fusion, i.e. combining data from various sources at the sensor level, preprocessing level, feature extraction level or classification level of extracted features
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/70—Arrangements for image or video recognition or understanding using pattern recognition or machine learning
- G06V10/82—Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
-
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y02—TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
- Y02T—CLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO TRANSPORTATION
- Y02T10/00—Road transport of goods or passengers
- Y02T10/10—Internal combustion engine [ICE] based vehicles
- Y02T10/40—Engine management systems
Abstract
The invention relates to a pedestrian detection method for occlusion scenes that fuses feature simulation. In the training stage, pedestrian features are extracted with a feature extraction network and classified according to the annotation information. For each class of pedestrian features, a feature simulation strategy is learned through a separate branch. In the inference stage, the features extracted by the backbone network pass through two parallel feature simulation branches to obtain central point heatmaps with different response results, and a more representative central point response heatmap is obtained through an effective fusion strategy. An occlusion attribute of the detection frame is designed to solve the problem of missed detection of pedestrians in dense areas, and an occlusion-aware non-maximum suppression method is designed that deletes redundant pedestrian detection frames in the post-processing stage while retaining the detection frames of occluded pedestrians. Pedestrian detection performance in occlusion scenes is effectively improved.
Description
Technical Field
The invention relates to the field of pedestrian target detection research in image processing and machine vision, and in particular to a pedestrian detection method for occlusion scenes that fuses feature simulation.
Background
Pedestrian detection in occlusion scenes is an important research topic in the field of computer vision applications; as an important upstream task, it provides important cues for downstream tasks such as pedestrian tracking, pedestrian re-identification and automatic driving. A pedestrian detection algorithm suitable for various complex scenes is therefore of great significance for improving the performance of downstream tasks.
Existing pedestrian detection methods include traditional machine-vision methods based on texture features and the like, and feature extraction methods based on deep learning. Limited by their reliance on appearance features, existing pedestrian detection algorithms perform poorly in complex occlusion scenes.
In complex scenes, pedestrian occlusion includes intra-class occlusion between pedestrians and inter-class occlusion between pedestrians and other surrounding objects. Occlusion reduces the apparent features of pedestrians, so that the detector cannot distinguish occluded pedestrians from the background well, causing more missed detections.
Disclosure of Invention
Aiming at the technical problems in the prior art, the invention provides a pedestrian detection method for occlusion scenes fusing feature simulation, which reduces intra-class feature differences among pedestrians and increases the differences between pedestrian and background features through feature simulation learning, so as to improve the detection rate of pedestrians in occlusion scenes. Meanwhile, the occlusion attribute is designed as additional semantic information, and an occlusion-aware non-maximum suppression algorithm is designed that considers not only the predicted attributes of the pedestrian detection frames but also their occlusion attributes, effectively retaining detection frames whose confidence scores are low because of occlusion while suppressing redundant detection frames.
According to a first aspect of the invention, a pedestrian detection method for occlusion scenes fusing feature simulation is provided, which includes: step 1, training to obtain a feature simulation learning network, wherein the input of the feature simulation learning network is the high-level features of an image obtained through a backbone network, and the output is a third central point response heatmap obtained by fusing a first central point response heatmap and a second central point response heatmap; the first central point response heatmap is the central point response heatmap of occlusion-non-occlusion feature simulation learning, and the second central point response heatmap is the central point response heatmap of whole-body-visible feature simulation learning;
step 2, acquiring the high-level features of the image to be detected through the backbone network, and inputting the high-level features into the feature simulation learning network to obtain the first central point response heatmap and the second central point response heatmap;
step 3, fusing the first central point response heatmap and the second central point response heatmap in the feature simulation learning network by weighted fusion, and activating with a sigmoid to obtain the third central point response heatmap;
step 4, considering the occlusion attributes and classification confidences of the detection frames, applying occlusion-aware non-maximum suppression to the third central point response heatmap to obtain the detection result of the image to be detected.
On the basis of the technical scheme, the invention can be improved as follows.
Optionally, the training process of the feature simulation learning network includes:
step 101, acquiring the high-level features of a training image and the visible-part detection frame and whole-body detection frame of each target pedestrian;
step 102, extracting pedestrian whole-body features and pedestrian visible-part features from the high-level features with RoI-Align according to the annotation information of the visible part and whole body of each pedestrian; calculating the visibility as the ratio of the areas of the visible-part detection frame and the whole-body detection frame of the pedestrian, and classifying the pedestrian whole-body features into occluded-pedestrian features and non-occluded-pedestrian features according to the visibility;
step 103, inputting the occluded-pedestrian features and the non-occluded-pedestrian features into an occlusion-non-occlusion feature simulation module for learning, making the occluded-pedestrian features learn to mimic the feature representation of the non-occluded-pedestrian features, and obtaining the first central point response heatmap; inputting the pedestrian whole-body features and the pedestrian visible-part features into a whole-body-visible feature simulation module for learning, making the pedestrian whole-body features learn the feature representation of the pedestrian visible-part features, and obtaining the second central point response heatmap;
step 104, fusing the first central point response heatmap and the second central point response heatmap by weighted fusion, and activating with a sigmoid to obtain the third central point response heatmap.
Optionally, the classification of the pedestrian whole-body features into occluded-pedestrian features and non-occluded-pedestrian features in step 102 includes: calculating the visibility

v = S_v / S_f

where S_v is the area of the pedestrian visible-part frame and S_f is the area of the pedestrian whole-body frame;

classifying the pedestrian whole-body features into occluded-pedestrian features and non-occluded-pedestrian features according to the visibility of the pedestrian:

F^occ = {f_i^occ | v_i < t},  F^non = {f_i^non | v_i ≥ t}

where f_i^occ denotes the i-th occluded-pedestrian feature and F^occ denotes the set of occluded-pedestrian features; f_i^non denotes the i-th non-occluded-pedestrian feature and F^non denotes the set of non-occluded-pedestrian features; t is the set visibility threshold.
Optionally, the process of training the occlusion-non-occlusion feature simulation module and the whole-body-visible feature simulation module in step 103 includes:
dividing the target-pedestrian features in each batch into mimicked features and features to be mimicked; the pedestrian features include: pedestrian whole-body features, pedestrian visible-part features, occluded-pedestrian features and non-occluded-pedestrian features;
extracting each pedestrian feature to a fixed size with RoI-Align, calculating the mean of the mimicked features on each channel, and using the feature mean as the mimicking target:

f̄ = (1/N) · Σ_{j=1}^{N} f_j

where f_j denotes the j-th mimicked feature, f̄ denotes the mean of the N mimicked features, g_i denotes the i-th feature to be mimicked, and M is the number of features to be mimicked; the mimicking constraint measures the distance between each g_i (i = 1, …, M) and f̄.
Optionally, the fusion strategy for the central point response heatmaps in step 3 and step 104 is:

H_3 = σ(w_1 · H_1 + w_2 · H_2)

where H_1 denotes the first central point response heatmap, H_2 denotes the second central point response heatmap, H_3 denotes the third central point response heatmap, σ is the sigmoid activation, and w_1 and w_2 are the fusion weights.

Optionally, the loss function L of the feature simulation learning network is:

L = L_{H1} + L_{H2} + λ_1 · L_m^{on} + λ_2 · L_m^{wv}

where L_{H1} and L_{H2} are the loss functions of the first central point response heatmap and the second central point response heatmap respectively, L_m^{on} is the feature-mimicking constraint loss of the occlusion-non-occlusion feature simulation module, L_m^{wv} is the feature-mimicking constraint loss of the whole-body-visible feature simulation module, and λ_1, λ_2 are balance coefficients;

L_m^{wv} = L_m(F, V)

where L_m is the loss calculation function, F denotes the set of pedestrian whole-body features, and V denotes the set of pedestrian visible-part features.
Optionally, the occlusion-aware non-maximum suppression in step 4 examines each detection frame in descending order of detection confidence score, and includes:
step 401, for any detection frame, judging whether the intersection-over-union with an intersecting detection frame is larger than a set threshold, and if so, executing step 402;
step 402, calculating the occlusion attribute difference of the two intersecting detection frames; when the occlusion attribute difference exceeds a set threshold, retaining both intersecting detection frames; when the occlusion attribute difference does not exceed the set threshold, deleting one of the two intersecting detection frames;
wherein the occlusion attribute is the ratio of the visible length of each border of the detection frame to the length of that border.
Optionally, the occlusion attribute of a detection frame is:

O = {o_i | i = 1, 2, 3, 4}

where o_1, o_2, o_3, o_4 respectively denote the visible length ratios of the upper, right, lower and left borders, and O is the occlusion attribute vector of the detection frame.
Optionally, step 4 includes:
step 401', initializing the detection frame sequence B = {b_i | i = 1, …, n} and the corresponding confidence score sequence S = {s_i | i = 1, …, n}, where b_i denotes the i-th detection frame and s_i is the confidence score of b_i;
step 402', when the m-th value in the sequence S is the maximum, letting M = b_m be the detection frame with the highest current confidence score, and taking M out of the detection frame sequence B and putting it into a set F;
step 403', for each detection frame b_i remaining in B whose intersection-over-union with M exceeds the set threshold, comparing the occlusion attribute difference |o_j^M − o_j^{b_i}| with the occlusion attribute difference threshold ε, retaining b_i when the difference exceeds ε and otherwise deleting b_i from B, where j = 1, 2, 3 or 4 indexes the upper, right, lower or left border, and o_j^M and o_j^{b_i} are the occlusion attributes of the j-th border of detection frames M and b_i respectively;
step 404', executing step 402' to step 403' in a loop until the sequence B is empty, and returning the final sets F and S as the final detection frame sequence and the corresponding confidence score sequence respectively.
The invention provides a pedestrian detection method for occlusion scenes that fuses feature simulation. First, feature simulation is proposed to reduce intra-class feature differences and to enlarge the inter-class feature differences between pedestrians and the background. Second, a fused feature simulation learning strategy is proposed to make the two branches complement each other and improve the detection rate in occlusion scenes. Third, an occlusion attribute is constructed and occlusion-aware non-maximum suppression is proposed, effectively retaining detection frames that would otherwise be suppressed because of occlusion. By fusing these innovations, a pedestrian detection method for occlusion scenes with fused feature simulation is constructed to improve pedestrian detection performance in occlusion scenes.
Drawings
Fig. 1 is a structural diagram of the pedestrian detection method for occlusion scenes with fused feature simulation according to an embodiment of the present invention;
Fig. 2 is a schematic diagram of feature simulation learning provided by an embodiment of the present invention;
Fig. 3 is a pseudocode diagram of the occlusion-aware non-maximum suppression algorithm according to an embodiment of the present invention.
Detailed Description
The principles and features of this invention are described below in conjunction with the following drawings, which are set forth by way of illustration only and are not intended to limit the scope of the invention.
Fig. 1 is a structural diagram of the pedestrian detection method for occlusion scenes with fused feature simulation according to an embodiment of the present invention. As shown in fig. 1, the method includes:
step 1, training to obtain a feature simulation learning network, whose input is the high-level features of an image obtained through a backbone network and whose output is a third central point response heatmap obtained by fusing a first central point response heatmap and a second central point response heatmap;
step 2, acquiring the high-level features of the image to be detected through the backbone network, and inputting the high-level features into the feature simulation learning network to obtain the first central point response heatmap and the second central point response heatmap;
step 3, fusing the first central point response heatmap and the second central point response heatmap in the feature simulation learning network by weighted fusion, and activating with a sigmoid to obtain the third central point response heatmap for subsequent post-processing;
step 4, considering the occlusion attributes and classification confidences of the detection frames, applying occlusion-aware non-maximum suppression to the third central point response heatmap to post-process the prediction results and obtain the detection result of the image to be detected.
According to the pedestrian detection method for occlusion scenes fusing feature simulation provided by the embodiment of the invention, intra-class feature differences among pedestrians are reduced through feature simulation learning while the differences between pedestrian and background features are increased, improving the detection rate of pedestrians in occlusion scenes. Meanwhile, the occlusion attribute is designed as extra semantic information, and an occlusion-aware non-maximum suppression algorithm is designed that considers not only the predicted attributes of the pedestrian detection frames but also their occlusion attributes, effectively retaining detection frames whose confidence scores are low because of occlusion while suppressing redundant detection frames.
Example 1
In one possible embodiment, the training process of the feature simulation learning network includes:
Step 101, acquiring the high-level features of a training image and the visible-part detection frame and whole-body detection frame of each target pedestrian.
Step 102, extracting pedestrian whole-body features and pedestrian visible-part features from the high-level features with RoI-Align according to the annotation information of the visible part and whole body of each pedestrian; calculating the visibility as the ratio of the areas of the visible-part detection frame and the whole-body detection frame of the pedestrian, and classifying the pedestrian whole-body features into occluded-pedestrian features and non-occluded-pedestrian features according to the visibility.
In one possible embodiment, the classification of the pedestrian whole-body features into occluded-pedestrian features and non-occluded-pedestrian features in step 102 includes: calculating the visibility

v = S_v / S_f

where S_v is the area of the pedestrian visible-part frame and S_f is the area of the pedestrian whole-body frame.

The pedestrian whole-body features are classified into occluded-pedestrian features and non-occluded-pedestrian features according to the visibility of the pedestrian:

F^occ = {f_i^occ | v_i < t},  F^non = {f_i^non | v_i ≥ t}

where f_i^occ denotes the i-th occluded-pedestrian feature and F^occ denotes the set of occluded-pedestrian features; f_i^non denotes the i-th non-occluded-pedestrian feature and F^non denotes the set of non-occluded-pedestrian features; t is the set visibility threshold.
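For illustration, the visibility computation and the resulting split can be sketched in plain Python as follows; the function names and the 0.65 threshold are assumptions made for the sketch, since the embodiment leaves the exact threshold value open:

```python
def visibility(vis_box, full_box):
    """Visibility = area of visible-part frame / area of whole-body frame.
    Boxes are (x1, y1, x2, y2)."""
    def area(b):
        return max(0.0, b[2] - b[0]) * max(0.0, b[3] - b[1])
    full = area(full_box)
    return area(vis_box) / full if full > 0 else 0.0

def split_by_visibility(features, vis_boxes, full_boxes, thresh=0.65):
    """Split whole-body features into occluded / non-occluded sets by visibility."""
    occluded, non_occluded = [], []
    for feat, vb, fb in zip(features, vis_boxes, full_boxes):
        (occluded if visibility(vb, fb) < thresh else non_occluded).append(feat)
    return occluded, non_occluded
```

A pedestrian whose visible frame covers only a quarter of the whole-body frame (v = 0.25) would fall into the occluded set under this threshold.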
Step 103, inputting the occluded-pedestrian features and the non-occluded-pedestrian features into the occlusion-non-occlusion feature simulation module for learning, making the occluded-pedestrian features learn to mimic the feature representation of the non-occluded-pedestrian features, and obtaining the first central point response heatmap; inputting the pedestrian whole-body features and the pedestrian visible-part features into the whole-body-visible feature simulation module for learning, making the pedestrian whole-body features learn the feature representation of the pedestrian visible-part features, and obtaining the second central point response heatmap.
Fig. 2 is a schematic diagram of feature simulation learning provided by an embodiment of the present invention. In one possible embodiment, with reference to fig. 1 and fig. 2, the process of training the occlusion-non-occlusion feature simulation module and the whole-body-visible feature simulation module in step 103 includes:
dividing the target-pedestrian features in each batch into mimicked features and features to be mimicked; the pedestrian features include: pedestrian whole-body features, pedestrian visible-part features, occluded-pedestrian features and non-occluded-pedestrian features.
Each pedestrian feature is first extracted to a fixed size (7 × 256) with RoI-Align, and then the mean of the mimicked features on each channel is calculated and used as the mimicking target:

f̄ = (1/N) · Σ_{j=1}^{N} f_j

where f_j denotes the j-th mimicked feature, f̄ denotes the mean of the N mimicked features, g_i denotes the i-th feature to be mimicked, and M is the number of features to be mimicked.
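The channel-wise mean and the mimicking constraint can be sketched as follows; since the exact distance measure of the loss calculation function is not spelled out here, the mean-squared distance used below is an assumption, and flat feature vectors stand in for the RoI-Align crops:

```python
def mimic_loss(mimicked, to_mimic):
    """Mean-squared distance between each feature to be mimicked and the
    channel-wise mean of the mimicked features (the mimicking target)."""
    n, dim = len(mimicked), len(mimicked[0])
    mean = [sum(f[c] for f in mimicked) / n for c in range(dim)]  # channel-wise mean
    total = sum(sum((f[c] - mean[c]) ** 2 for c in range(dim)) for f in to_mimic)
    return total / (len(to_mimic) * dim)
```

The loss is zero when a feature to be mimicked already coincides with the mean of the mimicked features, and grows with the squared per-channel deviation otherwise.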
Two different occlusion simulation strategies are proposed: the occlusion-non-occlusion feature simulation module and the whole-body-visible feature simulation module.
Step 104, fusing the first central point response heatmap and the second central point response heatmap by weighted fusion, and activating with a sigmoid to obtain the third central point response heatmap.
In one possible embodiment, the fusion strategy for the central point response heatmaps in step 3 and step 104 is:

H_3 = σ(w_1 · H_1 + w_2 · H_2)

where H_1 denotes the first central point response heatmap, H_2 denotes the second central point response heatmap, H_3 denotes the third central point response heatmap, σ is the sigmoid activation, and the weights w_1 and w_2 are obtained by experiment.
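A minimal sketch of this fusion strategy, with illustrative weights w1 = w2 = 0.5 standing in for the experimentally obtained values:

```python
import math

def fuse_heatmaps(h1, h2, w1=0.5, w2=0.5):
    """Weighted sum of the two central point response heatmaps, then a sigmoid."""
    return [[1.0 / (1.0 + math.exp(-(w1 * a + w2 * b)))
             for a, b in zip(row1, row2)]
            for row1, row2 in zip(h1, h2)]
```

Cells where both branch responses are strong saturate toward 1, while cells where both are near zero fuse to 0.5 before any further thresholding.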
In one possible embodiment, the loss function L of the feature simulation learning network is:

L = L_{H1} + L_{H2} + λ_1 · L_m^{on} + λ_2 · L_m^{wv}

where L_{H1} and L_{H2} are the loss functions of the first central point response heatmap and the second central point response heatmap respectively, L_m^{on} is the feature-mimicking constraint loss of the occlusion-non-occlusion feature simulation module, L_m^{wv} is the feature-mimicking constraint loss of the whole-body-visible feature simulation module, and the balance coefficients λ_1, λ_2 are set by experiment;

L_m^{wv} = L_m(F, V)

where L_m is the loss calculation function, F denotes the set of whole-body features of the pedestrians, and V denotes the set of visible-part features of the pedestrians.
Step 2, acquiring the high-level features of the image to be detected through the backbone network, and inputting the high-level features into the feature simulation learning network to obtain the first central point response heatmap and the second central point response heatmap.
Step 3, fusing the first central point response heatmap and the second central point response heatmap in the feature simulation learning network by weighted fusion, and activating with a sigmoid to obtain the third central point response heatmap for subsequent post-processing.
Step 4, considering the occlusion attributes and classification confidences of the detection frames, applying occlusion-aware non-maximum suppression to the third central point response heatmap to post-process the prediction results and obtain the detection result of the image to be detected.
In one possible embodiment, in the post-processing stage, the occlusion-aware non-maximum suppression in step 4 examines each detection frame in descending order of detection confidence score, and includes:
step 401, for any detection frame, judging whether the intersection-over-union with an intersecting detection frame is larger than a set threshold, and if so, executing step 402.
Step 402, calculating the occlusion attribute difference of the two intersecting detection frames; when the occlusion attribute difference exceeds a set threshold, the two frames have different occlusion attributes and both need to be retained; when the occlusion attribute difference does not exceed the set threshold, the lower-scoring frame is a redundant detection frame and is suppressed and deleted.
The occlusion attribute is the ratio of the visible length of each border of the detection frame to the length of that border.
In the scene of a vehicle-mounted camera, as a target recedes from its initial position toward infinity it shrinks toward the middle of the image, and the ordinate of the lower border of its detection frame gradually decreases with the target's depth in the image. Based on this phenomenon, for detection frames that intersect each other, the occlusion relationship between pedestrians is determined from the ordinate of the lower boundary of the detection frame, and the occlusion attribute of the detection frame is defined based on this occlusion relationship.
It can be understood that the occlusion attribute of a detection frame is:

O = {o_i | i = 1, 2, 3, 4}

where o_1, o_2, o_3, o_4 respectively denote the visible length ratios of the upper, right, lower and left borders, and O is the occlusion attribute vector of the detection frame. The occlusion attributes of the four borders together form the occlusion attribute of the whole detection frame.
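The occlusion attribute and the lower-boundary depth ordering can be sketched as follows; the helper names and the rule that only boxes with a larger lower-boundary ordinate (closer to the camera, y pointing down) occlude a given box are a simplified reading of the description above:

```python
def _seg_overlap(a1, a2, b1, b2):
    """Length of the overlap between 1-D segments [a1, a2] and [b1, b2]."""
    return max(0.0, min(a2, b2) - max(a1, b1))

def occlusion_attribute(box, others):
    """O = [o1, o2, o3, o4]: visible length ratio of the upper, right, lower
    and left borders of `box`. Boxes are (x1, y1, x2, y2), y pointing down."""
    x1, y1, x2, y2 = box
    w, h = x2 - x1, y2 - y1
    cov = [0.0, 0.0, 0.0, 0.0]  # covered length of top, right, bottom, left border
    for ox1, oy1, ox2, oy2 in others:
        if oy2 <= y2:            # lower boundary not below ours: not in front, no occlusion
            continue
        if oy1 <= y1 <= oy2:     # top border lies inside the occluder's vertical span
            cov[0] = max(cov[0], _seg_overlap(x1, x2, ox1, ox2))
        if ox1 <= x2 <= ox2:     # right border inside the occluder's horizontal span
            cov[1] = max(cov[1], _seg_overlap(y1, y2, oy1, oy2))
        if oy1 <= y2 <= oy2:     # bottom border
            cov[2] = max(cov[2], _seg_overlap(x1, x2, ox1, ox2))
        if ox1 <= x1 <= ox2:     # left border
            cov[3] = max(cov[3], _seg_overlap(y1, y2, oy1, oy2))
    return [1.0 - cov[0] / w, 1.0 - cov[1] / h, 1.0 - cov[2] / w, 1.0 - cov[3] / h]
```

A fully unoccluded frame gets the attribute [1.0, 1.0, 1.0, 1.0]; a pedestrian overlapped on the lower right by someone nearer the camera loses visible length on the right and bottom borders.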
Fig. 3 shows the pseudocode of the occlusion-aware non-maximum suppression algorithm provided by an embodiment of the present invention. As can be seen from fig. 1 and fig. 3, in another possible embodiment, step 4 includes:
step 401', initializing the detection frame sequence B = {b_i | i = 1, …, n} and the corresponding confidence score sequence S = {s_i | i = 1, …, n}, where b_i denotes the i-th detection frame and s_i is the confidence score of b_i.
Step 402', when the m-th value in the sequence S is the maximum, letting M = b_m be the detection frame with the highest current confidence score, and taking M out of the detection frame sequence B and putting it into a set F.
Step 403', for each detection frame b_i remaining in B whose intersection-over-union with M exceeds the set threshold, comparing the occlusion attribute difference |o_j^M − o_j^{b_i}| with the occlusion attribute difference threshold ε, retaining b_i when the difference exceeds ε and otherwise deleting b_i from B, where j = 1, 2, 3 or 4 indexes the upper, right, lower or left border, and o_j^M and o_j^{b_i} are the occlusion attributes of the j-th border of detection frames M and b_i respectively.
Step 404', executing step 402' to step 403' in a loop until the sequence B is empty, and returning the final sets F and S as the final detection frame sequence and the corresponding confidence score sequence respectively.
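The suppression loop described above can be sketched in Python as follows; the IoU and occlusion-difference thresholds, and the use of the maximum per-border attribute difference, are illustrative assumptions for the sketch:

```python
def iou(a, b):
    """Intersection-over-union of boxes (x1, y1, x2, y2)."""
    ix = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    iy = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = ix * iy
    area = lambda r: (r[2] - r[0]) * (r[3] - r[1])
    union = area(a) + area(b) - inter
    return inter / union if union > 0 else 0.0

def occlusion_aware_nms(boxes, scores, occ_attrs, iou_thresh=0.5, occ_thresh=0.3):
    """Keep indices of surviving frames. Overlapping frames whose occlusion
    attributes differ strongly are both kept (likely mutually occluding
    pedestrians); otherwise the lower-scoring frame is suppressed."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    while order:
        m = order.pop(0)          # current highest-scoring frame
        keep.append(m)
        survivors = []
        for i in order:
            if iou(boxes[m], boxes[i]) <= iou_thresh:
                survivors.append(i)   # little overlap: decide in later rounds
            else:
                # overlapping: retain if any border's occlusion attribute differs enough
                diff = max(abs(a - b) for a, b in zip(occ_attrs[m], occ_attrs[i]))
                if diff > occ_thresh:
                    survivors.append(i)
        order = survivors
    return keep
```

An overlapping frame with a clearly different occlusion attribute survives the round and can still be selected as a kept detection later, which is exactly how an occluded pedestrian behind the top-scoring one is preserved.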
Addressing the defects described in the background art, the embodiment of the invention provides a pedestrian detection method for occlusion scenes fusing feature simulation. 1. Feature simulation is innovatively used to reduce the differences among pedestrian features, an effective heatmap fusion strategy is proposed in combination with the model, and the detection rate of pedestrians in occlusion scenes is effectively improved. 2. A pedestrian occlusion attribute is constructed from existing information and can serve as semantic information for other related visual tasks. 3. An occlusion-aware non-maximum suppression algorithm is designed, which deletes redundant detection frames while retaining the detection frames of occluded pedestrians.
It should be noted that, in the foregoing embodiments, the descriptions of the respective embodiments have respective emphasis, and reference may be made to relevant descriptions of other embodiments for parts that are not described in detail in a certain embodiment.
As will be appreciated by one skilled in the art, embodiments of the present invention may be provided as a method, system, or computer program product. Accordingly, the present invention may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present invention may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.
The present invention is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems), and computer program products according to embodiments of the invention. It will be understood that each flow and/or block of the flow diagrams and/or block diagrams, and combinations of flows and/or blocks in the flow diagrams and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded computer, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
While the preferred embodiments of the present invention have been described, additional variations and modifications in those embodiments may occur to those skilled in the art once they learn of the basic inventive concepts. Therefore, it is intended that the appended claims be interpreted as including preferred embodiments and all such alterations and modifications as fall within the scope of the invention.
It will be apparent to those skilled in the art that various changes and modifications may be made in the present invention without departing from the spirit and scope of the invention. Thus, if such modifications and variations of the present invention fall within the scope of the claims of the present invention and their equivalents, the present invention is also intended to include such modifications and variations.
Claims (9)
1. A method for detecting pedestrians in an occlusion scene by fusing feature simulation is characterized by comprising the following steps:
step 1, training to obtain a feature simulation learning network, wherein the input of the feature simulation learning network is the high-level features of an image acquired through a backbone network, and the output of the feature simulation learning network is a third central point response thermodynamic diagram obtained by fusing a first central point response thermodynamic diagram and a second central point response thermodynamic diagram; the first centroid response thermodynamic diagram is an occlusion-non-occlusion feature mimicking learning centroid response thermodynamic diagram and the second centroid response thermodynamic diagram is a whole-body-visible feature mimicking learning centroid response thermodynamic diagram;
step 2, acquiring high-level features of an image to be detected through the backbone network, and inputting the high-level features into the feature simulation learning network to obtain the first central point response thermodynamic diagram and the second central point response thermodynamic diagram;
step 3, fusing the first central point response thermodynamic diagram and the second central point response thermodynamic diagram by the feature simulation learning network in a weighted fusion mode, and activating by sigmoid to obtain a third central point response thermodynamic diagram;
and 4, considering the occlusion attribute and the classification confidence of the detection boxes, performing occlusion-aware non-maximum suppression on the third central point response thermodynamic diagram to obtain the detection result of the image to be detected.
2. The detection method according to claim 1, wherein the training process of the feature emulation learning network comprises:
101, acquiring high-level features of a training image, a visible part detection frame and a whole body part detection frame of a target pedestrian;
102, extracting, by RoI-Align and according to the annotation information of the visible part and the whole body part of the pedestrian, the pedestrian whole-body part features and the pedestrian visible part features from the high-level features; calculating the visibility as the ratio of the area of the visible part detection box to the area of the whole body part detection box of the pedestrian, and classifying the pedestrian whole-body part features into occluded-pedestrian features and non-occluded-pedestrian features according to the visibility;
step 103, inputting the occluded-pedestrian features and the non-occluded-pedestrian features into an occlusion-non-occlusion feature simulation module for learning, so that the occluded-pedestrian features learn to mimic the feature representation of the non-occluded-pedestrian features, thereby obtaining the first central point response thermodynamic diagram; inputting the pedestrian whole-body part features and the pedestrian visible part features into a whole-body-visible feature simulation module for learning, so that the whole-body features learn to mimic the feature representation of the visible part features, thereby obtaining the second central point response thermodynamic diagram;
and 104, fusing the first central point response thermodynamic diagram and the second central point response thermodynamic diagram in a weighted fusion mode, and activating by sigmoid to obtain a third central point response thermodynamic diagram.
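A minimal sketch of the weighted fusion and sigmoid activation of step 104. The equal default weights and the function names are assumptions; the claim fixes only the weighted-fusion-plus-sigmoid structure, not the weight values.

```python
import numpy as np

def sigmoid(x):
    """Element-wise logistic sigmoid."""
    return 1.0 / (1.0 + np.exp(-x))

def fuse_heatmaps(h1, h2, w1=0.5, w2=0.5):
    """Step 104 sketch: weighted fusion of the first and second
    center-point response heat maps ("thermodynamic diagrams" in the
    text) followed by sigmoid activation. Equal weights are assumed."""
    return sigmoid(w1 * h1 + w2 * h2)
```

For two all-zero input maps the fused response is 0.5 everywhere, i.e. the sigmoid of zero.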
3. The detection method according to claim 2, wherein the step 102 of classifying the pedestrian whole-body part features into occluded-pedestrian features and non-occluded-pedestrian features comprises:
calculating the visibility v of the pedestrian as v = S v / S f , wherein S v is the area of the pedestrian visible box and S f is the area of the pedestrian whole-body box;
classifying the pedestrian whole-body part features into the occluded-pedestrian features and the non-occluded-pedestrian features according to the visibility v of the pedestrian.
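For illustration, the visibility computation and classification of claim 3 might look like the following; the function name and the 0.65 threshold are hypothetical, since the claim does not fix a threshold value in this excerpt.

```python
def classify_by_visibility(vis_box_area, full_box_area, thr=0.65):
    """Claim 3 sketch: visibility v = S_v / S_f; whole-body features
    whose visibility falls below thr are treated as occluded-pedestrian
    features. The 0.65 threshold is illustrative, not from the claim."""
    v = vis_box_area / full_box_area
    label = "occluded" if v < thr else "non_occluded"
    return label, v
```

A pedestrian whose visible box covers half the full-body box is classified as occluded; one at 90% visibility is not.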
4. The detection method according to claim 2, wherein the step 103 of training the occlusion-non-occlusion feature simulation module and the whole-body-visible feature simulation module comprises:
dividing the features of the target pedestrians in each batch into mimicked features and features that need to mimic them; the pedestrian features include: the pedestrian whole-body part feature, the pedestrian visible part feature, the occluded-pedestrian feature and the non-occluded-pedestrian feature;
extracting each pedestrian feature to a fixed size by RoI-Align, calculating the mean value of the mimicked features on each channel, and using the feature mean value as the object to be mimicked;
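The mimic-target construction of claim 4 (per-channel mean of the mimicked features after RoI-Align) can be sketched as below; the (N, C, H, W) layout and function name are assumptions about the implementation.

```python
import numpy as np

def mimic_target(features):
    """Claim 4 sketch: after RoI-Align crops each pedestrian feature to a
    fixed size, stack them as (N, C, H, W), average over the batch and
    spatial axes, and use the resulting (C,) per-channel mean as the
    object to be mimicked."""
    feats = np.asarray(features)           # (N, C, H, W)
    return feats.mean(axis=(0, 2, 3))      # per-channel mean, shape (C,)
```

For two constant 3-channel feature maps of ones, the target is simply the vector of ones per channel.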
5. The method of claim 2, wherein the fusion strategy of the central point response thermodynamic diagrams in step 3 and step 104 is: H 3 = sigmoid(w 1 H 1 + w 2 H 2 ), wherein H 1 , H 2 and H 3 are the first, second and third central point response thermodynamic diagrams respectively, and w 1 and w 2 are the fusion weights.
6. The detection method according to claim 2, wherein the loss function L of the feature simulation learning network is:
L = L 1 + L 2 + λ(L m1 + L m2 )
wherein L 1 and L 2 are the loss functions of the first central point response thermodynamic diagram and the second central point response thermodynamic diagram respectively; L m1 is the feature simulation constraint loss function of the occlusion-non-occlusion feature simulation module; L m2 is the feature simulation constraint loss function of the whole-body-visible feature simulation module; λ is the balance coefficient;
wherein Lm is a loss calculation function; f denotes a set of pedestrian whole-body features, and V denotes a set of pedestrian visible features.
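One plausible form of the constraint loss Lm(F, V) in claim 6, assumed here as an L2 distance from each whole-body feature in F to the per-channel mean of the visible features V; the exact form of Lm is not reproduced in this excerpt.

```python
import numpy as np

def mimic_loss(F_feats, V_feats):
    """Claim 6 sketch: a feature simulation constraint loss Lm(F, V)
    that pulls the whole-body feature set F toward the visible feature
    set V. An L2 distance to the per-channel mean of V is assumed;
    features are (N, C) vectors here for simplicity."""
    F_feats = np.asarray(F_feats, dtype=float)       # (N, C)
    target = np.asarray(V_feats, dtype=float).mean(axis=0)  # (C,) mimic target
    return float(np.mean((F_feats - target) ** 2))
```

When F already matches the mean of V the loss is zero; a unit offset on every channel gives a loss of one.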
7. The detection method according to claim 1, wherein the occlusion-aware non-maximum suppression of step 4 judges each detection box in turn, in descending order of detection confidence score, and comprises:
step 401, for any detection box, judging whether the intersection-over-union with any detection box intersecting it is larger than a set threshold, and if so, executing step 402;
step 402, calculating the occlusion attribute difference of the two intersecting detection boxes; when the occlusion attribute difference exceeds a set threshold, retaining both intersecting detection boxes; when the occlusion attribute difference does not exceed the set threshold, deleting the lower-scoring one of the two intersecting detection boxes;
wherein the occlusion attribute is the ratio of the visible length of each edge of the detection box to the length of that edge.
8. The detection method according to claim 7, wherein the occlusion property of the detection frame is:
O = {o i | i = 1, 2, 3, 4}
wherein o 1 , o 2 , o 3 , o 4 respectively denote the visible length ratios of the top edge, right edge, bottom edge and left edge, and O denotes the occlusion attribute vector of the detection box.
9. The detection method according to claim 1, wherein the step 4 comprises:
step 401', initializing the detection box sequence B = {b 1 , b 2 , ..., b N } and the corresponding confidence score sequence S = {s 1 , s 2 , ..., s N }, wherein b i denotes the i-th detection box and s i is the confidence score of b i ;
step 402', when the m-th value in the sequence S is the maximum value, letting M be the detection box b m with the highest current confidence score, and taking M out of the detection box sequence B and putting it into a set F;
step 403', for each detection box b i remaining in B whose intersection-over-union with M exceeds a set threshold, deleting b i from B when max j |o j M - o j bi | ≤ ε; wherein ε is the occlusion attribute difference threshold; j = 1, 2, 3 or 4 indexes the j-th edge: the top, right, bottom or left edge respectively; o j M and o j bi are the occlusion attributes of the j-th edge of the detection box M and the detection box b i respectively;
and step 404', cyclically executing step 402' to step 403' until the sequence B is empty, and returning the final sets F and S as the final detection box sequence and the corresponding confidence score sequence respectively.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202211510002.6A CN115565207B (en) | 2022-11-29 | 2022-11-29 | Occlusion scene downlink person detection method with feature simulation fused |
Publications (2)
Publication Number | Publication Date |
---|---|
CN115565207A true CN115565207A (en) | 2023-01-03 |
CN115565207B CN115565207B (en) | 2023-04-07 |
Family
ID=84769737
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202211510002.6A Active CN115565207B (en) | 2022-11-29 | 2022-11-29 | Occlusion scene downlink person detection method with feature simulation fused |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN115565207B (en) |
Cited By (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN115713731A (en) * | 2023-01-10 | 2023-02-24 | 武汉图科智能科技有限公司 | Crowd scene pedestrian detection model construction method and crowd scene pedestrian detection method |
CN115937906A (en) * | 2023-02-16 | 2023-04-07 | 武汉图科智能科技有限公司 | Occlusion scene pedestrian re-identification method based on occlusion inhibition and feature reconstruction |
Citations (9)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110598601A (en) * | 2019-08-30 | 2019-12-20 | 电子科技大学 | Face 3D key point detection method and system based on distributed thermodynamic diagram |
CN111738091A (en) * | 2020-05-27 | 2020-10-02 | 复旦大学 | Posture estimation and human body analysis system based on multi-task deep learning |
CN112836676A (en) * | 2021-03-01 | 2021-05-25 | 创新奇智(北京)科技有限公司 | Abnormal behavior detection method and device, electronic equipment and storage medium |
CN113191204A (en) * | 2021-04-07 | 2021-07-30 | 华中科技大学 | Multi-scale blocking pedestrian detection method and system |
CN113239885A (en) * | 2021-06-04 | 2021-08-10 | 新大陆数字技术股份有限公司 | Face detection and recognition method and system |
CN114419671A (en) * | 2022-01-18 | 2022-04-29 | 北京工业大学 | Hypergraph neural network-based occluded pedestrian re-identification method |
CN114419568A (en) * | 2022-01-18 | 2022-04-29 | 东北大学 | Multi-view pedestrian detection method based on feature fusion |
EP4002198A1 (en) * | 2019-12-24 | 2022-05-25 | Tencent Technology (Shenzhen) Company Limited | Posture acquisition method and device, and key point coordinate positioning model training method and device |
CN114639042A (en) * | 2022-03-17 | 2022-06-17 | 哈尔滨理工大学 | Video target detection algorithm based on improved CenterNet backbone network |
Non-Patent Citations (3)
Title |
---|
GENG CHEN et al.: "Automatic Schelling Points Detection from Meshes", IEEE Transactions on Visualization and Computer Graphics (Early Access) *
ZHU Xiaolei et al.: "Crowded pedestrian detection based on visible regions", Electronic Measurement Technology *
LI Xiang et al.: "An improved YOLOv3 algorithm for occluded pedestrian detection", Acta Optica Sinica *
Also Published As
Publication number | Publication date |
---|---|
CN115565207B (en) | 2023-04-07 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN115565207B (en) | Occlusion scene downlink person detection method with feature simulation fused | |
CN111489403B (en) | Method and device for generating virtual feature map by using GAN | |
CN112966697B (en) | Target detection method, device and equipment based on scene semantics and storage medium | |
CN112926410B (en) | Target tracking method, device, storage medium and intelligent video system | |
CN110210551A (en) | A kind of visual target tracking method based on adaptive main body sensitivity | |
CN111709285A (en) | Epidemic situation protection monitoring method and device based on unmanned aerial vehicle and storage medium | |
CN108805016B (en) | Head and shoulder area detection method and device | |
CN105144239A (en) | Image processing device, program, and image processing method | |
JP2022174707A (en) | Pedestrian re-identification system and method based on space sequence feature learning | |
CN110582783B (en) | Training device, image recognition device, training method, and computer-readable information storage medium | |
CN108446694A (en) | A kind of object detection method and device | |
CN107633242A (en) | Training method, device, equipment and the storage medium of network model | |
CN105303163B (en) | A kind of method and detection device of target detection | |
KR102117007B1 (en) | Method and apparatus for recognizing object on image | |
CN113033523B (en) | Method and system for constructing falling judgment model and falling judgment method and system | |
CN112200154A (en) | Face recognition method and device for mask, electronic equipment and storage medium | |
CN111797769A (en) | Small target sensitive vehicle detection system | |
CN112560584A (en) | Face detection method and device, storage medium and terminal | |
CN111783716A (en) | Pedestrian detection method, system and device based on attitude information | |
CN111967399A (en) | Improved fast RCNN behavior identification method | |
CN113570615A (en) | Image processing method based on deep learning, electronic equipment and storage medium | |
CN115713731B (en) | Crowd scene pedestrian detection model construction method and crowd scene pedestrian detection method | |
CN114494893B (en) | Remote sensing image feature extraction method based on semantic reuse context feature pyramid | |
CN110956097A (en) | Method and module for extracting occluded human body and method and device for scene conversion | |
CN111160219B (en) | Object integrity evaluation method and device, electronic equipment and storage medium |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication ||
SE01 | Entry into force of request for substantive examination ||
GR01 | Patent grant ||
CP03 | Change of name, title or address |
Address after: No. 548, 5th Floor, Building 10, No. 28 Linping Avenue, Donghu Street, Linping District, Hangzhou City, Zhejiang Province Patentee after: Hangzhou Tuke Intelligent Information Technology Co.,Ltd. Address before: 430000 B033, No. 05, 4th floor, building 2, international enterprise center, No. 1, Guanggu Avenue, Donghu New Technology Development Zone, Wuhan, Hubei (Wuhan area of free trade zone) Patentee before: Wuhan Tuke Intelligent Technology Co.,Ltd. |