CN111783716A - Pedestrian detection method, system and device based on attitude information

Pedestrian detection method, system and device based on attitude information

Info

Publication number
CN111783716A
CN111783716A (application CN202010664330.6A)
Authority
CN
China
Prior art keywords
pedestrian
description
network
confidence score
module
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202010664330.6A
Other languages
Chinese (zh)
Inventor
徐常胜
姚涵涛
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Institute of Automation of Chinese Academy of Science
Original Assignee
Institute of Automation of Chinese Academy of Science
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Institute of Automation of Chinese Academy of Science
Priority to CN202010664330.6A
Publication of CN111783716A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00 Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10 Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/103 Static body considered as a whole, e.g. static pedestrian or occupant recognition
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks

Abstract

The invention belongs to the field of pedestrian detection, and particularly relates to a pedestrian detection method, system and device based on attitude (i.e., pose) information, aiming at solving the problem that the accuracy of existing pedestrian detection methods cannot meet requirements in multi-person environments. The method comprises the following steps: obtaining pedestrian candidate boxes and corresponding first confidence scores score_r based on a pre-trained region extraction network; acquiring a comprehensive description of each pedestrian candidate box based on a pre-trained pedestrian recognition network, performing binary classification based on the description, and taking the classification result as a second confidence score score_p, the comprehensive description comprising a visual description f_v and a pose description f_p; and acquiring a third confidence score based on score_r and score_p, then thresholding it against a set confidence threshold to determine pedestrians. The invention alleviates the occlusion and false-detection problems commonly encountered in pedestrian detection tasks and improves the accuracy of pedestrian detection.

Description

Pedestrian detection method, system and device based on attitude information
Technical Field
The invention belongs to the field of pedestrian detection, and particularly relates to a pedestrian detection method, system and device based on attitude information.
Background
As a special branch of object detection, pedestrian detection has received great attention in both academia and industry; its goal is to predict where pedestrians are located in a given image and to represent them by a series of bounding boxes. Beyond early studies based on hand-crafted features, pedestrian detection using convolutional neural networks has made tremendous progress over the past few years.
Recently, researchers have demonstrated that models based on convolutional neural networks help improve the performance of pedestrian detection. These models can be divided into two categories: pedestrian detection with anchor points and pedestrian detection without anchor points. Generally, a detection model with anchor points generates a large number of target candidate boxes and then judges, through a classifier, whether each candidate box contains a pedestrian. The disadvantage of this approach is that most candidate boxes are redundant, so much time is wasted in learning the feature representation. To avoid this problem, researchers have designed anchor-free detectors that predict pedestrians directly from pictures. While existing methods can locate pedestrians in a given picture, they are not robust to occluded-pedestrian detection.
Because real-world scenes such as streets are often crowded with pedestrians and various objects, occlusion is a key problem in pedestrian detection. To address this challenge, researchers have attempted to model pedestrians using visual descriptions. However, when the background is similar to a pedestrian, visual descriptions alone are not sufficient to distinguish occluded pedestrians from the background. Since a detection model with anchor points can generate candidate boxes for occluded pedestrians, the core problem in occlusion detection is how to generate a robust description to filter occluded pedestrians.
Disclosure of Invention
In order to solve the above problems in the prior art, that is, to solve the problem that the accuracy of the existing pedestrian detection method cannot meet the requirement in a multi-person environment, a first aspect of the present invention provides a pedestrian detection method based on attitude information, the method including the following steps:
Step S100, acquiring pedestrian candidate boxes and corresponding first confidence scores score_r based on a pre-trained region extraction network;
Step S200, acquiring a comprehensive description of each pedestrian candidate box based on a pre-trained pedestrian recognition network, performing binary classification based on the description, and taking the classification result as a second confidence score score_p; the comprehensive description comprises a visual description f_v and a pose description f_p;
Step S300, acquiring a third confidence score based on score_r and score_p, and determining pedestrians by thresholding against a set confidence threshold;
wherein,
the pedestrian recognition network comprises a visual feature module, a human pose module and a classification module; the visual feature module is constructed based on a feature extraction network and is used for acquiring the visual description; the human pose module is constructed based on a convolutional neural network and is used for acquiring the pose description f_p; the classification module is a binary classification network and is used for acquiring the second confidence score score_p based on the comprehensive description.
In some preferred embodiments, the region extraction network is constructed based on an object detection network, and its loss function L_rpn is
$$L_{rpn} = \sum_i L_{cls}(p_i, p_i^*) + \gamma \sum_i p_i^* L_{reg}(t_i, t_i^*)$$
where L_cls is a binary cross-entropy loss, L_reg is the regression loss, γ is a preset coordination parameter, p_i is the predicted probability of the i-th pedestrian candidate box, p_i* is the ground-truth classification label of the i-th pedestrian candidate box, t_i is the vector of coordinates of the i-th pedestrian candidate box, and t_i* is the vector of coordinates of the ground-truth pedestrian annotation box corresponding to the i-th candidate box.
In some preferred embodiments, the classification loss L_cls is:
$$L_{cls}(p_i, p_i^*) = -\log\left[p_i p_i^* + (1 - p_i)(1 - p_i^*)\right]$$
and the regression loss L_reg is:
$$L_{reg}(t_i, t_i^*) = \sum_{j \in \{x, y, w, h\}} \mathrm{smooth}_{L_1}\left(t_i^j - t_i^{*j}\right)$$
$$\mathrm{smooth}_{L_1}(x) = \begin{cases} 0.5 x^2 & \text{if } |x| < 1 \\ |x| - 0.5 & \text{otherwise} \end{cases}$$
in some preferred embodiments, the visual feature module is composed of a top 10 layer network of VGG-19 and a convolution block, and obtains the visual description f based on the pedestrian candidate boxvDescription of the vision f by a full link layervCarry out two classifications to obtain confidence score1
In some preferred embodiments, the human pose module comprises a feature extraction network, a first sub-network, a second sub-network and a fully connected layer;
the feature extraction network is constructed based on the convolutional layers of VGG-19 and is used for extracting a feature map F of the pedestrian candidate box;
the first sub-network and the second sub-network are each constructed based on a convolutional neural network, and predict the confidence map S and association field L of the corresponding pedestrian candidate box based on the feature map F;
the fully connected layer is used for obtaining the pose description f_p based on the confidence map S and the association field L, and obtaining a confidence score_2.
In some preferred embodiments, the classification module is configured to obtain a confidence score_3 based on the visual description f_v and the pose description f_p, and to obtain the second confidence score score_p by weighted summation of score_1, score_2 and score_3 with preset weighting coefficients.
In some preferred embodiments, the third confidence score is calculated by:
score = α·score_r + β·score_p
wherein α and β are preset weighting parameters.
In some preferred embodiments, one or more of the visual feature module, the human pose module and the classification module are each constrained by a corresponding cross-entropy loss function during training.
In a second aspect of the present invention, a pedestrian detection system based on attitude information is provided, the system comprising a first unit, a second unit, and a third unit:
the first unit is configured to acquire a pedestrian candidate frame and a corresponding first confidence score based on a pre-trained region extraction networkr
The second unit is configured to acquire a comprehensive description of the pedestrian candidate frame based on a pre-trained pedestrian recognition network, perform secondary classification based on the description, and use a classification result as a second confidence scorep(ii) a The integrated description comprises a visual description fvAnd attitude description fp
The third unit is configured to calculate score based on a preset weightrAnd scorepTaking the sum as a third confidence score, and then, executing the range on the set confidence threshold value to determine the pedestrian;
wherein the content of the first and second substances,
the pedestrian recognition network comprises a visual feature module, a human body posture module and a classification module; the visual feature module is constructed based on a feature extraction network and is used for acquiring the visual description; the human body posture module is constructed based on a convolutional neural network and is used for acquiring the posture description fp(ii) a The classification module is a two-classification network and is used for acquiring a second confidence score based on the comprehensive descriptionp
In a third aspect of the present invention, a storage device is provided, in which a plurality of programs are stored, the programs being adapted to be loaded and executed by a processor to implement the above-described pedestrian detection method based on attitude information.
In a fourth aspect of the present invention, a processing apparatus is provided, which includes a processor, a storage device; a processor adapted to execute various programs; a storage device adapted to store a plurality of programs; the program is adapted to be loaded and executed by a processor to implement the above-described pedestrian detection method based on attitude information.
The invention has the beneficial effects that:
the pedestrian detection method and the pedestrian detection device can well solve the problems of shielding and false detection commonly existing in the pedestrian detection task, and improve the accuracy of pedestrian detection. The invention can be well embedded into any existing detector (with or without anchor points), thereby greatly improving the detection efficiency and the generalization.
Drawings
Other features, objects and advantages of the present application will become more apparent upon reading of the following detailed description of non-limiting embodiments thereof, made with reference to the accompanying drawings in which:
FIG. 1 is a flow chart of a pedestrian detection method based on attitude information according to an embodiment of the present invention;
FIG. 2 is a block diagram of a pedestrian detection network based on attitude information in one embodiment of the present invention;
fig. 3 is a detailed structural diagram of a pedestrian identification network according to an embodiment of the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention clearer, the technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the accompanying drawings, and it is apparent that the described embodiments are some, but not all embodiments of the present invention. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
The present application will be described in further detail with reference to the following drawings and examples. It is to be understood that the specific embodiments described herein are merely illustrative of the relevant invention and not restrictive of the invention. It should be noted that, for convenience of description, only the portions related to the related invention are shown in the drawings.
It should be noted that the embodiments and features of the embodiments in the present application may be combined with each other without conflict.
The invention discloses a pedestrian detection method based on attitude information, which, as shown in FIG. 1, comprises the following steps:
Step S100, acquiring pedestrian candidate boxes and corresponding first confidence scores score_r based on a pre-trained region extraction network;
Step S200, acquiring a comprehensive description of each pedestrian candidate box based on a pre-trained pedestrian recognition network, performing binary classification based on the description, and taking the classification result as a second confidence score score_p; the comprehensive description comprises a visual description f_v and a pose description f_p;
Step S300, acquiring a third confidence score based on score_r and score_p, and determining pedestrians by thresholding against a set confidence threshold;
wherein,
the pedestrian recognition network comprises a visual feature module, a human pose module and a classification module; the visual feature module is constructed based on a feature extraction network and is used for acquiring the visual description; the human pose module is constructed based on a convolutional neural network and is used for acquiring the pose description f_p; the classification module is a binary classification network and is used for acquiring the second confidence score score_p based on the comprehensive description.
In order to more clearly explain the pedestrian detection method based on the attitude information, the following will describe each step in an embodiment of the method in detail with reference to the accompanying drawings.
The detection method in an embodiment of the present invention relies on a trained network, obtained by first constructing the corresponding detection network and then training it; the technical solution is therefore described below starting from the construction of the detection network to be trained.
The detection network on which the method of the invention is implemented comprises a region extraction network, a pedestrian recognition network and a detection output network, as shown in FIG. 2.
For convenience of description, the training samples are described as follows: for the picture I corresponding to a training sample, all n pedestrians present in picture I are determined and located with rectangular boxes T* = {t_1*, t_2*, …, t_n*}, where the ground-truth box coordinates are t_i* = [x_i*, y_i*, w_i*, h_i*], (x_i*, y_i*) being the coordinates of the center point of the rectangular box and w_i*, h_i* its width and height.
1. Region extraction network
Any existing target detector may be used as the region extraction network for global modeling, generating a series of pedestrian candidate boxes and corresponding confidence scores.
The network is optimized by a multi-task loss function L_rpn:
$$L_{rpn} = \sum_i L_{cls}(p_i, p_i^*) + \gamma \sum_i p_i^* L_{reg}(t_i, t_i^*)$$
where L_cls is a binary cross-entropy loss, L_reg is the regression loss, γ is a preset coordination parameter, p_i is the predicted probability of the i-th pedestrian candidate box, p_i* is the ground-truth classification label of the i-th pedestrian candidate box, t_i is the vector of coordinates of the i-th pedestrian candidate box, and t_i* is the vector of coordinates of the ground-truth pedestrian annotation box corresponding to the i-th candidate box.
In this embodiment, p_i* = 1 when the ratio of the intersection to the union (IoU) between the target box i and any ground-truth box is greater than 0.5, and p_i* = 0 otherwise.
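As a non-limiting illustration, this labelling rule can be written in a few lines; the following PyTorch fragment is a sketch (tensor layout and function names are assumptions, not part of the patent), using torchvision's box_iou:

    import torch
    from torchvision.ops import box_iou

    def assign_labels(candidate_boxes, gt_boxes, iou_thresh=0.5):
        # candidate_boxes: (N, 4), gt_boxes: (M, 4), both as (x1, y1, x2, y2)
        ious = box_iou(candidate_boxes, gt_boxes)   # (N, M) pairwise IoU
        best_iou, _ = ious.max(dim=1)               # best overlap per candidate
        return (best_iou > iou_thresh).float()      # p*_i = 1 if IoU > 0.5, else 0
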
The classification loss L_cls is:
$$L_{cls}(p_i, p_i^*) = -\log\left[p_i p_i^* + (1 - p_i)(1 - p_i^*)\right]$$
and the regression loss L_reg is:
$$L_{reg}(t_i, t_i^*) = \sum_{j \in \{x, y, w, h\}} \mathrm{smooth}_{L_1}\left(t_i^j - t_i^{*j}\right)$$
$$\mathrm{smooth}_{L_1}(x) = \begin{cases} 0.5 x^2 & \text{if } |x| < 1 \\ |x| - 0.5 & \text{otherwise} \end{cases}$$
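Taken together, the two terms can be combined as in the sketch below, which operates on the parameterised coordinates t_i, t_i* defined next. This is a minimal PyTorch rendering of the L_rpn formula under the assumption that each loss is averaged over the candidates; it is illustrative, not a definitive implementation:

    import torch
    import torch.nn.functional as F

    def rpn_loss(p, p_star, t, t_star, gamma=1.0):
        # For p* in {0, 1}, binary cross-entropy equals -log[p·p* + (1-p)(1-p*)]
        l_cls = F.binary_cross_entropy(p, p_star)
        # smooth-L1 regression, counted only for positive candidates (p* = 1)
        pos = p_star > 0
        l_reg = F.smooth_l1_loss(t[pos], t_star[pos]) if pos.any() else p.sum() * 0
        return l_cls + gamma * l_reg                # γ balances the two terms
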
wherein, ti=[tx,ty,tw,th]Is a vector representing the predicted candidate box coordinates,
Figure BDA0002579788070000079
is tiCorresponding real frame coordinates.
Figure BDA00025797880700000710
Figure BDA00025797880700000711
Figure BDA00025797880700000712
Figure BDA00025797880700000713
Wherein x, y, w, h respectively represent the center coordinates and width and height of the candidate frame, xa、ya、wa、haRepresenting the coordinates of the center point of the anchor box and the width and height, x, respectively*、y*、w*、h*Representing the coordinates of the center point of the real box and the width and height, respectively.
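This parameterisation can be written directly; the following sketch assumes boxes and anchors are given as (x_center, y_center, w, h) tensors, which is an assumption for illustration:

    import torch

    def encode_boxes(boxes, anchors):
        # t_x = (x - x_a)/w_a, t_y = (y - y_a)/h_a, t_w = log(w/w_a), t_h = log(h/h_a)
        x, y, w, h = boxes.unbind(dim=1)
        xa, ya, wa, ha = anchors.unbind(dim=1)
        # the same formulas yield t* when ground-truth boxes are passed in
        return torch.stack([(x - xa) / wa, (y - ya) / ha,
                            torch.log(w / wa), torch.log(h / ha)], dim=1)
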
To eliminate redundant detection results generated for the same pedestrian, all candidate boxes may be fused using non-maximum suppression (NMS) with an IoU threshold of 0.5.
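As an illustration, this fusion step can use torchvision's ready-made non-maximum suppression; the boxes and scores below are made-up example values:

    import torch
    from torchvision.ops import nms

    # Illustrative candidate boxes (x1, y1, x2, y2) and confidence scores
    boxes = torch.tensor([[10., 10., 110., 210.],
                          [12., 14., 112., 208.],   # near-duplicate of the first
                          [300., 40., 360., 180.]])
    scores = torch.tensor([0.92, 0.85, 0.70])

    keep = nms(boxes, scores, iou_threshold=0.5)    # IoU threshold 0.5 as in the text
    boxes, scores = boxes[keep], scores[keep]       # redundant box suppressed
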
2. Pedestrian recognition network
After candidate boxes possibly containing pedestrians are generated using the region extraction network, the local candidate regions are modeled using the pedestrian recognition network, which optimizes the confidence scores of the candidate regions by obtaining visual feature descriptions and human pose descriptions, and removes false-detection boxes. The pedestrian recognition network is composed of three modules, namely a visual feature module, a human pose module and a classification module, as shown in FIG. 3.
(1) Visual feature module
For a pedestrian candidate box output by the region extraction network, its pixels are first resized to 256 × 256; the candidate box is then fed into the visual feature module to obtain a 128-dimensional visual description f_v, and the visual description is binary-classified by a fully connected layer to obtain a confidence score_1. During training this module is constrained by a cross-entropy loss L_v, which compares the predicted probability of the background and the predicted probability of the pedestrian against the ground-truth label, whose value is 0 or 1.
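A minimal sketch of such a visual feature module is given below, assuming that "first 10 layers" means the first 10 convolutional layers of torchvision's VGG-19 and that the convolution block ends in global pooling; the layer sizes are assumptions, not values fixed by the patent:

    import torch
    import torch.nn as nn
    from torchvision.models import vgg19

    class VisualFeatureModule(nn.Module):
        def __init__(self):
            super().__init__()
            # First 10 convolutional layers of VGG-19 (through conv4_2 + ReLU)
            self.backbone = vgg19(weights="IMAGENET1K_V1").features[:23]
            self.conv_block = nn.Sequential(          # extra convolution block
                nn.Conv2d(512, 128, kernel_size=3, padding=1),
                nn.ReLU(inplace=True),
                nn.AdaptiveAvgPool2d(1),              # (B, 128, 1, 1)
            )
            self.classifier = nn.Linear(128, 2)       # background vs pedestrian

        def forward(self, x):                         # x: (B, 3, 256, 256)
            f = self.backbone(x)                      # (B, 512, 32, 32)
            f_v = self.conv_block(f).flatten(1)       # 128-dimensional f_v
            score1 = self.classifier(f_v).softmax(dim=1)[:, 1]  # pedestrian prob.
            return f_v, score1
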
(2) Human pose module
The human pose module comprises a feature extraction network, a first sub-network, a second sub-network and a fully connected layer. For each pedestrian candidate box resized to 256 × 256 pixels, a feature map F of the candidate box is first extracted by the feature extraction network, constructed from the convolutional layers of VGG-19; then the first sub-network and the second sub-network, each constructed based on a convolutional neural network, respectively predict the confidence map S and the association field L of the corresponding candidate box from the feature map F (the confidence map and the association field respectively represent the key points of the human pose information and the connection relations between the points); finally, the pose description f_p is obtained through the fully connected layer based on the confidence map S and the association field L, together with a confidence score_2.
The acquisition of the pose description f_p can be divided into the following stages:
In the first stage, the human pose module generates a confidence map and an association field
$$S^1 = \rho^1(F), \qquad L^1 = \phi^1(F)$$
where ρ¹ and φ¹ are convolutional neural networks each formed by three 3 × 3 convolutional layers and two 1 × 1 convolutional layers;
In each subsequent stage, the predictions of the two sub-networks from the previous stage are combined with the feature map F of the original image to generate new predictions, detailed as follows:
$$S^t = \rho^t(F, S^{t-1}, L^{t-1})$$
$$L^t = \phi^t(F, S^{t-1}, L^{t-1})$$
where ρ^t and φ^t (t being the stage index, t ≥ 2) are convolutional neural networks each formed by five 7 × 7 convolutional layers and two 1 × 1 convolutional layers;
In the last stage, the confidence map S^6 and the association field L^6 are combined to obtain the human pose description f_p.
The human pose module can be parameter-initialized with a trained OpenPose model, and its parameters are fixed and not updated when the whole pedestrian recognition network is trained. The pose information is then input into the fully connected layer to obtain the 128-dimensional pose description f_p, and a fully connected layer is used to binary-classify the pose description to obtain the confidence score_2. During training this module is constrained by a cross-entropy loss L_p, which compares the predicted probability of the background and the predicted probability of the pedestrian against the ground-truth label, whose value is 0 or 1.
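The stage recursion above matches the OpenPose architecture; the sketch below is an assumed PyTorch rendering in which the channel widths (128 intermediate channels, 19 confidence maps, 38 association-field channels, 6 stages) are illustrative defaults rather than values fixed by the patent:

    import torch
    import torch.nn as nn

    def conv_stack(in_ch, mid_ch, out_ch, k, n):
        # n k×k convolutional layers followed by two 1×1 convolutional layers
        layers, ch = [], in_ch
        for _ in range(n):
            layers += [nn.Conv2d(ch, mid_ch, k, padding=k // 2), nn.ReLU(inplace=True)]
            ch = mid_ch
        layers += [nn.Conv2d(ch, mid_ch, 1), nn.ReLU(inplace=True),
                   nn.Conv2d(mid_ch, out_ch, 1)]
        return nn.Sequential(*layers)

    class PoseModule(nn.Module):
        def __init__(self, feat_ch=128, s_ch=19, l_ch=38, stages=6):
            super().__init__()
            self.rho1 = conv_stack(feat_ch, 128, s_ch, k=3, n=3)   # S^1 = ρ¹(F)
            self.phi1 = conv_stack(feat_ch, 128, l_ch, k=3, n=3)   # L^1 = φ¹(F)
            in_ch = feat_ch + s_ch + l_ch                          # F, S, L stacked
            self.rho_t = nn.ModuleList(
                conv_stack(in_ch, 128, s_ch, k=7, n=5) for _ in range(stages - 1))
            self.phi_t = nn.ModuleList(
                conv_stack(in_ch, 128, l_ch, k=7, n=5) for _ in range(stages - 1))

        def forward(self, F):                                      # F: feature map
            S, L = self.rho1(F), self.phi1(F)
            for rho, phi in zip(self.rho_t, self.phi_t):           # stages t = 2 … 6
                x = torch.cat([F, S, L], dim=1)
                S, L = rho(x), phi(x)
            return S, L                                            # S^6, L^6 → f_p
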
(3) Classification module
After the visual description f_v and the pose description f_p are obtained, they are combined into a 256-dimensional description, which is then binary-classified through several fully connected layers; this classification is constrained during training by a cross-entropy loss L, which compares the predicted probability of the background and the predicted probability of the pedestrian against the ground-truth label, whose value is 0 or 1. In this module, a binary classification confidence score_3 is obtained from the visual description f_v and the pose description f_p through the fully connected layers.
A weighted summation of score_1, score_2 and score_3 with preset weighting coefficients yields the second confidence score score_p. For example, weighting coefficients e_1, e_2, e_3 may be set; the second confidence score score_p is then
score_p = score_1·e_1 + score_2·e_2 + score_3·e_3
where e_1 + e_2 + e_3 = 1.
In this embodiment, the detailed structure of the pedestrian recognition network is shown in FIG. 3, and the network is constrained by the loss function L_prn, expressed as follows:
$$L_{prn} = L + \lambda_2 L_v + \lambda_3 L_p$$
where L, L_v and L_p are the loss functions of the classification module, the visual feature module and the human pose module respectively, and the two hyperparameters are set to λ_2 = λ_3 = 0.5.
During training, the pedestrian recognition network is trained as a whole based on the loss function L_prn.
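This joint constraint can be written as a one-line combination of the three module losses; the sketch below assumes each module exposes binary logits and that all three share the same pedestrian/background labels:

    import torch.nn.functional as F

    def prn_loss(logits_cls, logits_v, logits_p, labels, lam2=0.5, lam3=0.5):
        # L_prn = L + λ2·L_v + λ3·L_p with λ2 = λ3 = 0.5 as stated above
        L   = F.cross_entropy(logits_cls, labels)   # classification module
        L_v = F.cross_entropy(logits_v, labels)     # visual feature module
        L_p = F.cross_entropy(logits_p, labels)     # human pose module
        return L + lam2 * L_v + lam3 * L_p
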
3. Detection output network
The confidence score_r output by the region extraction network and the confidence score_p output by the pedestrian recognition network are fused as the final confidence score of the generated candidate region:
$$score = \alpha \cdot score_r + \beta \cdot score_p$$
where score_r and score_p each correspond to the predicted probability of the pedestrian (as opposed to the predicted probability of the background) from the respective network, and α and β are weighting parameters. When the fused score is low, the candidate region is determined to be background.
The detection network is trained based on pre-constructed training samples to obtain the optimal parameters of each sub-network, yielding the optimized network.
Based on the optimized network, the pedestrian detection method based on the attitude information comprises the following steps:
step S100, acquiring a pedestrian candidate frame and a corresponding first confidence score based on a pre-trained region extraction networkr
Step S200, acquiring comprehensive description of the pedestrian candidate frame based on the pre-trained pedestrian recognition network, performing secondary classification based on the description, and taking a classification result as a second confidence scorep(ii) a The integrated description comprises a visual description fvAnd attitude description fp
Step S300, based on scorerAnd scorepAnd acquiring a third confidence score, and executing the range on the set confidence threshold value to determine the pedestrian.
A pedestrian detection system based on attitude information according to a second embodiment of the present invention includes a first unit, a second unit, and a third unit:
the first unit is configured to acquire a pedestrian candidate frame and a corresponding first confidence score based on a pre-trained region extraction networkr
The second unit is configured to acquire a comprehensive description of the pedestrian candidate frame based on a pre-trained pedestrian recognition network, perform secondary classification based on the description, and use a classification result as a second confidence scorep(ii) a The integrated description comprises a visual description fvAnd attitude description fp
The third unit is configured to calculate score based on a preset weightrAnd scorepTaking the sum as a third confidence score, and then, executing the range on the set confidence threshold value to determine the pedestrian;
wherein the content of the first and second substances,
the pedestrian recognition network comprises a visual feature module, a human body posture module and a classification module(ii) a The visual feature module is constructed based on a feature extraction network and is used for acquiring the visual description; the human body posture module is constructed based on a convolutional neural network and is used for acquiring the posture description fp(ii) a The classification module is a two-classification network and is used for acquiring a second confidence score based on the comprehensive descriptionp
It can be clearly understood by those skilled in the art that, for convenience and brevity of description, the specific working process and related description of the system described above may refer to the corresponding process in the foregoing method embodiments, and will not be described herein again.
It should be noted that the pedestrian detection system based on attitude information provided in the foregoing embodiment is only illustrated by the division of the above functional modules. In practical applications, the above functions may be assigned to different functional modules as needed; that is, the modules or steps in the embodiment of the present invention may be further decomposed or combined. For example, the modules in the foregoing embodiment may be combined into one module, or further split into multiple sub-modules, so as to complete all or part of the functions described above. The names of the modules and steps involved in the embodiments of the present invention are only for distinguishing the modules or steps and are not to be construed as unduly limiting the present invention.
A storage device of a third embodiment of the present invention stores therein a plurality of programs adapted to be loaded and executed by a processor to implement the above-described pedestrian detection method based on attitude information.
A processing apparatus according to a fourth embodiment of the present invention includes a processor, a storage device; a processor adapted to execute various programs; a storage device adapted to store a plurality of programs; the program is adapted to be loaded and executed by a processor to implement the above-described pedestrian detection method based on attitude information.
It can be clearly understood by those skilled in the art that, for convenience and brevity of description, the specific working processes and related descriptions of the storage device and the processing device described above may refer to the corresponding processes in the foregoing method embodiments, and are not described herein again.
In particular, according to an embodiment of the present disclosure, the processes described above with reference to the flowcharts may be implemented as computer software programs. For example, embodiments of the present disclosure include a computer program product comprising a computer program embodied on a computer readable medium, the computer program comprising program code for performing the method illustrated in the flow chart. In such an embodiment, the computer program may be downloaded and installed from a network via the communication section, and/or installed from a removable medium. The computer program, when executed by a Central Processing Unit (CPU), performs the above-described functions defined in the method of the present application. It should be noted that the computer readable medium mentioned above in the present application may be a computer readable signal medium or a computer readable storage medium or any combination of the two. A computer readable storage medium may be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the foregoing. More specific examples of the computer readable storage medium may include, but are not limited to: an electrical connection having one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In the present application, a computer readable storage medium may be any tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. In this application, however, a computer readable signal medium may include a propagated data signal with computer readable program code embodied therein, for example, in baseband or as part of a carrier wave. Such a propagated data signal may take many forms, including, but not limited to, electro-magnetic, optical, or any suitable combination thereof. A computer readable signal medium may also be any computer readable medium that is not a computer readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device. Program code embodied on a computer readable medium may be transmitted using any appropriate medium, including but not limited to: wireless, wire, fiber optic cable, RF, etc., or any suitable combination of the foregoing.
Computer program code for carrying out operations for aspects of the present application may be written in any combination of one or more programming languages, including an object oriented programming language such as Java, Smalltalk, C + + or the like and conventional procedural programming languages, such as the "C" programming language or similar programming languages. The program code may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer or entirely on the remote computer or server. In the case of a remote computer, the remote computer may be connected to the user's computer through any type of network, including a Local Area Network (LAN) or a Wide Area Network (WAN), or the connection may be made to an external computer (for example, through the Internet using an Internet service provider).
The flowchart and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various embodiments of the present application. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams and/or flowchart illustration, and combinations of blocks in the block diagrams and/or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.
The terms "first," "second," and the like are used for distinguishing between similar elements and not necessarily for describing or implying a particular order or sequence.
The terms "comprises," "comprising," or any other similar term are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus.
So far, the technical solutions of the present invention have been described in connection with the preferred embodiments shown in the drawings, but it is easily understood by those skilled in the art that the scope of the present invention is obviously not limited to these specific embodiments. Equivalent changes or substitutions of related technical features can be made by those skilled in the art without departing from the principle of the invention, and the technical scheme after the changes or substitutions can fall into the protection scope of the invention.

Claims (11)

1. A pedestrian detection method based on attitude information is characterized by comprising the following steps:
Step S100, acquiring pedestrian candidate boxes and corresponding first confidence scores score_r based on a pre-trained region extraction network;
Step S200, acquiring a comprehensive description of each pedestrian candidate box based on a pre-trained pedestrian recognition network, performing binary classification based on the description, and taking the classification result as a second confidence score score_p; the comprehensive description comprises a visual description f_v and a pose description f_p;
Step S300, calculating the weighted sum of score_r and score_p with preset weights as a third confidence score, and then determining pedestrians by thresholding against a set confidence threshold;
wherein,
the pedestrian recognition network comprises a visual feature module, a human pose module and a classification module; the visual feature module is constructed based on a feature extraction network and is used for acquiring the visual description; the human pose module is constructed based on a convolutional neural network and is used for acquiring the pose description f_p; the classification module is a binary classification network and is used for acquiring the second confidence score score_p based on the comprehensive description.
2. The pedestrian detection method based on attitude information according to claim 1, wherein the region extraction network is constructed based on an object detection network, and its loss function L_rpn is
$$L_{rpn} = \sum_i L_{cls}(p_i, p_i^*) + \gamma \sum_i p_i^* L_{reg}(t_i, t_i^*)$$
where L_cls is a binary cross-entropy loss, L_reg is the regression loss, γ is a preset coordination parameter, p_i is the predicted probability of the i-th pedestrian candidate box, p_i* is the ground-truth classification label of the i-th pedestrian candidate box, t_i is the vector of coordinates of the i-th pedestrian candidate box, and t_i* is the vector of coordinates of the ground-truth pedestrian annotation box corresponding to the i-th candidate box.
3. The pedestrian detection method based on attitude information according to claim 2, wherein the classification loss L_cls is:
$$L_{cls}(p_i, p_i^*) = -\log\left[p_i p_i^* + (1 - p_i)(1 - p_i^*)\right]$$
and the regression loss L_reg is:
$$L_{reg}(t_i, t_i^*) = \sum_{j \in \{x, y, w, h\}} \mathrm{smooth}_{L_1}\left(t_i^j - t_i^{*j}\right)$$
$$\mathrm{smooth}_{L_1}(x) = \begin{cases} 0.5 x^2 & \text{if } |x| < 1 \\ |x| - 0.5 & \text{otherwise} \end{cases}$$
4. The pedestrian detection method based on attitude information according to claim 1, wherein the visual feature module is composed of the first 10 layers of VGG-19 and a convolution block, and obtains the visual description f_v based on the pedestrian candidate box; the visual description f_v is binary-classified by a fully connected layer to obtain a confidence score_1.
5. The pedestrian detection method based on attitude information according to claim 4, wherein the human pose module comprises a feature extraction network, a first sub-network, a second sub-network and a fully connected layer;
the feature extraction network is constructed based on the convolutional layers of VGG-19 and is used for extracting a feature map F of the pedestrian candidate box;
the first sub-network and the second sub-network are each constructed based on a convolutional neural network, and predict the confidence map S and association field L of the corresponding pedestrian candidate box based on the feature map F;
the fully connected layer is used for obtaining the pose description f_p based on the confidence map S and the association field L, and obtaining a confidence score_2.
6. The pedestrian detection method based on attitude information according to claim 5, wherein the classification module is configured to obtain a confidence score_3 based on the visual description f_v and the pose description f_p, and to obtain the second confidence score score_p by weighted summation of score_1, score_2 and score_3 with preset weighting coefficients.
7. The pedestrian detection method based on the attitude information of claim 6, wherein the third confidence score is calculated by:
score = α·score_r + β·score_p
wherein α and β are preset weighting parameters.
8. The pedestrian detection method based on attitude information according to any one of claims 1 to 7, wherein one or more of the visual feature module, the human pose module and the classification module are each constrained by a corresponding cross-entropy loss function during training.
9. A pedestrian detection system based on attitude information is characterized by comprising a first unit, a second unit and a third unit:
the first unit is configured to acquire a pedestrian candidate frame and a corresponding first confidence score based on a pre-trained region extraction networkr
The second unit is configured to acquire a comprehensive description of the pedestrian candidate frame based on a pre-trained pedestrian recognition network, perform secondary classification based on the description, and use a classification result as a second confidence scorep(ii) a The integrated description comprises a visual description fvAnd attitude description fp
The third unit is configured to calculate score based on a preset weightrAnd scorepTaking the sum as a third confidence score, and then, executing the range on the set confidence threshold value to determine the pedestrian;
wherein the content of the first and second substances,
the pedestrian recognition network comprises a visual feature module, a human body posture module and a classification module; the visual feature module is constructed based on a feature extraction network and is used for acquiring the visual description; the human body posture module is constructed based on a convolutional neural network and is used for acquiring the posture description fp(ii) a The classification module is a two-classification network and is used for acquiring a second confidence score based on the comprehensive descriptionp
10. A storage device having stored therein a plurality of programs, characterized in that the programs are adapted to be loaded and executed by a processor to implement the pedestrian detection method based on attitude information according to any one of claims 1 to 8.
11. A processing device comprising a processor, a storage device; a processor adapted to execute various programs; a storage device adapted to store a plurality of programs; characterized in that the program is adapted to be loaded and executed by a processor to implement the pedestrian detection method based on attitude information according to any one of claims 1 to 8.
CN202010664330.6A 2020-07-10 2020-07-10 Pedestrian detection method, system and device based on attitude information Pending CN111783716A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010664330.6A CN111783716A (en) 2020-07-10 2020-07-10 Pedestrian detection method, system and device based on attitude information


Publications (1)

Publication Number Publication Date
CN111783716A true CN111783716A (en) 2020-10-16

Family

ID=72767368

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010664330.6A Pending CN111783716A (en) 2020-07-10 2020-07-10 Pedestrian detection method, system and device based on attitude information

Country Status (1)

Country Link
CN (1) CN111783716A (en)

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112560649A (en) * 2020-12-09 2021-03-26 广州云从鼎望科技有限公司 Behavior action detection method, system, equipment and medium
CN114821818A (en) * 2022-06-29 2022-07-29 广东信聚丰科技股份有限公司 Motion data analysis method and system based on intelligent sports
CN114863556A (en) * 2022-04-13 2022-08-05 上海大学 Multi-neural-network fusion continuous action recognition method based on skeleton posture


Patent Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103279742A (en) * 2013-05-24 2013-09-04 中国科学院自动化研究所 Multi-resolution pedestrian detection method and device based on multi-task model
CN108537136A (en) * 2018-03-19 2018-09-14 复旦大学 The pedestrian's recognition methods again generated based on posture normalized image

Non-Patent Citations (4)

* Cited by examiner, † Cited by third party
Title
ROSS GIRSHICK: "Fast R-CNN", arXiv:1504.08083v2 (https://arxiv.org/abs/1504.08083v2) *
Y. JIAO ET AL.: "PEN: Pose-Embedding Network for Pedestrian Detection", IEEE Transactions on Circuits and Systems for Video Technology *
Z. CAO ET AL.: "Realtime Multi-person 2D Pose Estimation Using Part Affinity Fields", 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR) *
杨露菁 et al.: "Intelligent Image Processing and Applications" (智能图像处理及应用), China Railway Publishing House, 31 March 2019 *


Similar Documents

Publication Publication Date Title
Arietta et al. City forensics: Using visual elements to predict non-visual city attributes
CN111178183B (en) Face detection method and related device
CN112734775B (en) Image labeling, image semantic segmentation and model training methods and devices
CN110287960A (en) The detection recognition method of curve text in natural scene image
CN111126258A (en) Image recognition method and related device
CN111783716A (en) Pedestrian detection method, system and device based on attitude information
US11308714B1 (en) Artificial intelligence system for identifying and assessing attributes of a property shown in aerial imagery
CN108805016B (en) Head and shoulder area detection method and device
CN110674685B (en) Human body analysis segmentation model and method based on edge information enhancement
CN113537070B (en) Detection method, detection device, electronic equipment and storage medium
CN109671055B (en) Pulmonary nodule detection method and device
CN111931764A (en) Target detection method, target detection framework and related equipment
CN113919497A (en) Attack and defense method based on feature manipulation for continuous learning ability system
CN115187786A (en) Rotation-based CenterNet2 target detection method
CN115359366A (en) Remote sensing image target detection method based on parameter optimization
CN113781519A (en) Target tracking method and target tracking device
CN113469099A (en) Training method, detection method, device, equipment and medium of target detection model
CN113673505A (en) Example segmentation model training method, device and system and storage medium
Sun et al. Automatic building age prediction from street view images
CN112926487B (en) Pedestrian re-identification method and device
CN115331162A (en) Cross-scale infrared pedestrian detection method, system, medium, equipment and terminal
CN114387496A (en) Target detection method and electronic equipment
CN113569600A (en) Method and device for identifying weight of object, electronic equipment and storage medium
CN117593890B (en) Detection method and device for road spilled objects, electronic equipment and storage medium
CN110659384A (en) Video structured analysis method and device

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20201016