CN111783716A - Pedestrian detection method, system and device based on attitude information - Google Patents
Pedestrian detection method, system and device based on attitude information
- Publication number
- CN111783716A (application CN202010664330.6A)
- Authority
- CN
- China
- Prior art keywords
- pedestrian
- description
- network
- confidence score
- module
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V40/00—Recognition of biometric, human-related or animal-related patterns in image or video data
- G06V40/10—Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
- G06V40/103—Static body considered as a whole, e.g. static pedestrian or occupant recognition
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/21—Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
- G06F18/214—Generating training patterns; Bootstrap methods, e.g. bagging or boosting
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/24—Classification techniques
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/045—Combinations of networks
Abstract
The invention belongs to the field of pedestrian detection, and particularly relates to a pedestrian detection method, system and device based on attitude information, aiming at solving the problem that the accuracy of existing pedestrian detection methods cannot meet the requirement in a multi-person environment. The method comprises the following steps: obtaining a pedestrian candidate frame and a corresponding first confidence score score_r based on a pre-trained region extraction network; obtaining a comprehensive description of the pedestrian candidate frame based on a pre-trained pedestrian recognition network, performing binary classification based on the description, and taking the classification result as a second confidence score score_p, the comprehensive description comprising a visual description f_v and an attitude description f_p; obtaining a third confidence score based on score_r and score_p, and comparing it with a set confidence threshold to determine pedestrians. The invention can solve the occlusion and false-detection problems commonly existing in pedestrian detection tasks and improve the accuracy of pedestrian detection.
Description
Technical Field
The invention belongs to the field of pedestrian detection, and particularly relates to a pedestrian detection method, system and device based on attitude information.
Background
As a special branch of object detection, pedestrian detection has received great attention in both academia and industry; its goal is to predict where pedestrians are located in a given image, represented by a series of bounding boxes. Beyond early studies based on hand-crafted features, pedestrian detection using convolutional neural networks has made tremendous progress over the past few years.
Recently, researchers have demonstrated that models based on convolutional neural networks help improve the performance of pedestrian detection. These convolutional neural network-based models can be divided into two categories: pedestrian detection with anchor points and pedestrian detection without anchor points. Generally, a detection model with anchor points generates a large number of target candidate frames and then judges, through a classifier, whether each candidate frame contains a pedestrian. The disadvantage of this approach is that most candidate frames are redundant, so a lot of time is wasted in learning the feature representation. To avoid this problem, researchers have designed anchor-free detectors that predict pedestrians directly from pictures. While existing methods can locate pedestrians in a given picture, they are not robust for detecting occluded pedestrians.
Because real-world scenes such as streets are often crowded with pedestrians and various objects, occlusion is a key problem in pedestrian detection. To address this challenge, researchers have attempted to model pedestrians using visual descriptions. However, when the background is similar to a pedestrian, visual descriptions alone are not sufficient to distinguish occluded pedestrians from the background. Since a detection model with anchor points can generate candidate frames for occluded pedestrians, the core problem of occlusion detection is how to generate a robust description to filter occluded pedestrians.
Disclosure of Invention
In order to solve the above problems in the prior art, that is, to solve the problem that the accuracy of the existing pedestrian detection method cannot meet the requirement in a multi-person environment, a first aspect of the present invention provides a pedestrian detection method based on attitude information, the method including the following steps:
Step S100, acquiring a pedestrian candidate frame and a corresponding first confidence score score_r based on a pre-trained region extraction network;
Step S200, acquiring a comprehensive description of the pedestrian candidate frame based on a pre-trained pedestrian recognition network, performing binary classification based on the description, and taking the classification result as a second confidence score score_p; the comprehensive description comprises a visual description f_v and an attitude description f_p;
Step S300, acquiring a third confidence score based on score_r and score_p, and comparing it with a set confidence threshold to determine pedestrians;
wherein,
the pedestrian recognition network comprises a visual feature module, a human body posture module and a classification module; the visual feature module is constructed based on a feature extraction network and is used for acquiring the visual description; the human body posture module is constructed based on a convolutional neural network and is used for acquiring the attitude description f_p; the classification module is a binary classification network and is used for acquiring the second confidence score score_p based on the comprehensive description.
In some preferred embodiments, the region extraction network is constructed based on an object detection network, and its loss function L_rpn is

L_rpn = Σ_i L_cls(p_i, p_i*) + γ·Σ_i p_i*·L_reg(t_i, t_i*)

where L_cls is a two-class cross-entropy loss, L_reg is the regression loss, γ is a preset coordination parameter, p_i is the predicted probability of the i-th pedestrian candidate frame, p_i* indicates whether the i-th pedestrian candidate frame is correctly classified, t_i is the coordinate vector of the i-th pedestrian candidate frame, and t_i* is the coordinate vector of the real pedestrian frame annotated for the i-th pedestrian candidate frame.
In some preferred embodiments, the classification loss L_cls is the two-class cross-entropy loss between the predicted probability p_i and the label p_i*:

L_cls(p_i, p_i*) = -[p_i*·log p_i + (1 - p_i*)·log(1 - p_i)]

and the regression loss L_reg is a smooth-L1 loss between the predicted candidate frame offsets t_i and the corresponding real frame offsets t_i*.
in some preferred embodiments, the visual feature module is composed of a top 10 layer network of VGG-19 and a convolution block, and obtains the visual description f based on the pedestrian candidate boxvDescription of the vision f by a full link layervCarry out two classifications to obtain confidence score1。
In some preferred embodiments, the human body posture module comprises a feature extraction network, a first sub-network, a second sub-network and a fully connected layer;
the feature extraction network is constructed on the basis of a convolution network of VGG-19 and is used for extracting a feature map F of the pedestrian candidate frame;
the first sub-network and the second sub-network are respectively constructed based on a convolutional neural network, and a confidence map S and an associated domain L of a corresponding pedestrian candidate frame are predicted based on a feature map F;
the fully connected layer is used for obtaining the attitude description f_p based on the confidence map S and the association domain L, and for obtaining a confidence score_2.
In some preferred embodiments, the classification module is configured to obtain a confidence score_3 based on the visual description f_v and the attitude description f_p, and to carry out a weighted summation of the confidences score_1, score_2 and score_3 with preset weighting coefficients to obtain the second confidence score score_p.
In some preferred embodiments, the third confidence score is calculated by:
score = α·score_r + β·score_p
wherein α and β are preset weighting parameters.
In some preferred embodiments, one or more of the visual feature module, the human body posture module and the classification module are respectively constrained by corresponding cross-entropy loss functions during training.
In a second aspect of the present invention, a pedestrian detection system based on attitude information is provided, the system comprising a first unit, a second unit, and a third unit:
the first unit is configured to acquire a pedestrian candidate frame and a corresponding first confidence score based on a pre-trained region extraction networkr;
The second unit is configured to acquire a comprehensive description of the pedestrian candidate frame based on a pre-trained pedestrian recognition network, perform binary classification based on the description, and take the classification result as a second confidence score score_p; the comprehensive description comprises a visual description f_v and an attitude description f_p;
The third unit is configured to take the weighted sum of score_r and score_p, computed with preset weights, as a third confidence score, and then compare it with a set confidence threshold to determine pedestrians;
wherein,
the pedestrian recognition network comprises a visual feature module, a human body posture module and a classification module; the visual feature module is constructed based on a feature extraction network and is used for acquiring the visual description; the human body posture module is constructed based on a convolutional neural network and is used for acquiring the attitude description f_p; the classification module is a binary classification network and is used for acquiring the second confidence score score_p based on the comprehensive description.
In a third aspect of the present invention, a storage device is provided, in which a plurality of programs are stored, the programs being adapted to be loaded and executed by a processor to implement the above-described pedestrian detection method based on attitude information.
In a fourth aspect of the present invention, a processing apparatus is provided, which includes a processor, a storage device; a processor adapted to execute various programs; a storage device adapted to store a plurality of programs; the program is adapted to be loaded and executed by a processor to implement the above-described pedestrian detection method based on attitude information.
The invention has the beneficial effects that:
the pedestrian detection method and the pedestrian detection device can well solve the problems of shielding and false detection commonly existing in the pedestrian detection task, and improve the accuracy of pedestrian detection. The invention can be well embedded into any existing detector (with or without anchor points), thereby greatly improving the detection efficiency and the generalization.
Drawings
Other features, objects and advantages of the present application will become more apparent upon reading of the following detailed description of non-limiting embodiments thereof, made with reference to the accompanying drawings in which:
FIG. 1 is a flow chart of a pedestrian detection method based on attitude information according to an embodiment of the present invention;
FIG. 2 is a block diagram of a pedestrian detection network based on attitude information in one embodiment of the present invention;
fig. 3 is a detailed structural diagram of a pedestrian identification network according to an embodiment of the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention clearer, the technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the accompanying drawings, and it is apparent that the described embodiments are some, but not all embodiments of the present invention. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
The present application will be described in further detail with reference to the following drawings and examples. It is to be understood that the specific embodiments described herein are merely illustrative of the relevant invention and not restrictive of the invention. It should be noted that, for convenience of description, only the portions related to the related invention are shown in the drawings.
It should be noted that the embodiments and features of the embodiments in the present application may be combined with each other without conflict.
The invention discloses a pedestrian detection method based on attitude information, which comprises the following steps as shown in figure 1:
Step S100, acquiring a pedestrian candidate frame and a corresponding first confidence score score_r based on a pre-trained region extraction network;
Step S200, acquiring a comprehensive description of the pedestrian candidate frame based on a pre-trained pedestrian recognition network, performing binary classification based on the description, and taking the classification result as a second confidence score score_p; the comprehensive description comprises a visual description f_v and an attitude description f_p;
Step S300, acquiring a third confidence score based on score_r and score_p, and comparing it with a set confidence threshold to determine pedestrians;
wherein,
the pedestrian recognition network comprises a visual feature module, a human body posture module and a classification module; the visual feature module is constructed based on a feature extraction network and is used for acquiring the visual description; the human body posture module is constructed based on a convolutional neural network and is used for acquiring the attitude description f_p; the classification module is a binary classification network and is used for acquiring the second confidence score score_p based on the comprehensive description.
In order to more clearly explain the pedestrian detection method based on the attitude information, the following will describe each step in an embodiment of the method in detail with reference to the accompanying drawings.
The implementation of the detection method in an embodiment of the present invention relies on a trained network, obtained by constructing the corresponding detection network and training it in advance; therefore the technical solution is described below starting from the construction of the detection network to be trained.
The detection network on which the method of the invention is implemented comprises an area extraction network, a pedestrian recognition network and a detection output network as shown in figure 2.
For convenience of description, the training samples are described as follows: for the picture I corresponding to a training sample, all n pedestrians present in picture I are determined and their positions are marked with rectangular frames T* = {t_1*, t_2*, …, t_n*}, where each real frame coordinate vector t_i* = [x_i*, y_i*, w_i*, h_i*], (x_i*, y_i*) are the coordinates of the center point of the rectangular frame, and (w_i*, h_i*) are the width and height of the rectangular frame.
1. Region extraction network
Any existing target detector may be used as the region extraction network for global modeling, generating a series of pedestrian candidate frames and corresponding confidence scores.
The network is optimized through a multi-task loss function L_rpn:

L_rpn = Σ_i L_cls(p_i, p_i*) + γ·Σ_i p_i*·L_reg(t_i, t_i*)

where L_cls is a two-class cross-entropy loss, L_reg is the regression loss, γ is a preset coordination parameter, p_i is the predicted probability of the i-th pedestrian candidate frame, p_i* indicates whether the i-th pedestrian candidate frame is correctly classified, t_i is the coordinate vector of the i-th pedestrian candidate frame, and t_i* is the coordinate vector of the real pedestrian frame annotated for the i-th pedestrian candidate frame.
In this embodiment, when the ratio of the intersection to the union (IoU) between the target frame i and any one of the real frames is greater than 0.5, p_i* = 1; otherwise p_i* = 0.
The classification loss L_cls is the two-class cross-entropy

L_cls(p_i, p_i*) = -[p_i*·log p_i + (1 - p_i*)·log(1 - p_i)]

and the regression loss L_reg is a smooth-L1 loss over the coordinate offsets,

L_reg(t_i, t_i*) = Σ_{j∈{x,y,w,h}} smooth_L1(t_i,j - t_i,j*)

where t_i = [t_x, t_y, t_w, t_h] is the vector representing the predicted candidate frame coordinates and t_i* = [t_x*, t_y*, t_w*, t_h*] is the corresponding real frame coordinate vector, with

t_x = (x - x_a)/w_a,  t_y = (y - y_a)/h_a,  t_w = log(w/w_a),  t_h = log(h/h_a),
t_x* = (x* - x_a)/w_a,  t_y* = (y* - y_a)/h_a,  t_w* = log(w*/w_a),  t_h* = log(h*/h_a),

where x, y, w, h respectively denote the center coordinates, width and height of the candidate frame, x_a, y_a, w_a, h_a respectively denote the center coordinates, width and height of the anchor box, and x*, y*, w*, h* respectively denote the center coordinates, width and height of the real box.
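By way of illustration only, the following minimal sketch (assuming PyTorch; the tensor layout, helper names and the use of a smooth-L1 loss are our own assumptions rather than the patent's reference implementation) shows how candidate-frame coordinates can be encoded relative to anchor boxes as the offsets t_i and compared with the targets t_i*:

```python
import torch
import torch.nn.functional as F

def encode_boxes(boxes, anchors):
    """Encode (x, y, w, h) boxes as offsets (tx, ty, tw, th) relative to anchor boxes.

    boxes, anchors: tensors of shape (N, 4) holding center-x, center-y, width, height.
    """
    tx = (boxes[:, 0] - anchors[:, 0]) / anchors[:, 2]
    ty = (boxes[:, 1] - anchors[:, 1]) / anchors[:, 3]
    tw = torch.log(boxes[:, 2] / anchors[:, 2])
    th = torch.log(boxes[:, 3] / anchors[:, 3])
    return torch.stack([tx, ty, tw, th], dim=1)

def regression_loss(pred_offsets, gt_boxes, anchors, positive_mask):
    """Smooth-L1 loss between predicted offsets t_i and targets t_i*, positives only."""
    t_star = encode_boxes(gt_boxes, anchors)
    if positive_mask.sum() == 0:          # no candidate frame matched a real pedestrian
        return pred_offsets.new_zeros(())
    return F.smooth_l1_loss(pred_offsets[positive_mask], t_star[positive_mask])
```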
To eliminate redundant detection results generated for the same pedestrian, all candidate frames may be fused using non-maximum suppression (NMS) with an IoU threshold of 0.5.
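A simple reference sketch of this fusion step is given below (plain PyTorch with boxes in x1, y1, x2, y2 form; the function names are our own, and in practice a library routine such as torchvision.ops.nms can be used instead):

```python
import torch

def iou(box, boxes):
    """IoU between one box and a set of boxes; all boxes are (x1, y1, x2, y2)."""
    x1 = torch.maximum(box[0], boxes[:, 0])
    y1 = torch.maximum(box[1], boxes[:, 1])
    x2 = torch.minimum(box[2], boxes[:, 2])
    y2 = torch.minimum(box[3], boxes[:, 3])
    inter = (x2 - x1).clamp(min=0) * (y2 - y1).clamp(min=0)
    area_a = (box[2] - box[0]) * (box[3] - box[1])
    area_b = (boxes[:, 2] - boxes[:, 0]) * (boxes[:, 3] - boxes[:, 1])
    return inter / (area_a + area_b - inter)

def nms(boxes, scores, iou_threshold=0.5):
    """Greedy non-maximum suppression; returns the indices of the kept boxes."""
    order = scores.argsort(descending=True)
    keep = []
    while order.numel() > 0:
        i = order[0]
        keep.append(i.item())
        if order.numel() == 1:
            break
        rest = order[1:]
        order = rest[iou(boxes[i], boxes[rest]) <= iou_threshold]
    return keep
```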
2. Pedestrian identification network
After candidate frames possibly containing pedestrians are generated by the region extraction network, the local candidate regions are modeled by the pedestrian recognition network: visual feature descriptions and human body posture descriptions are obtained to optimize the confidence scores of the candidate regions and to remove false detection frames. The pedestrian recognition network is composed of three modules, namely a visual feature module, a human body posture module and a classification module, as shown in fig. 3.
(1) Visual feature module
For a pedestrian candidate frame output by the region extraction network, the candidate frame is resized to 256 × 256 pixels and then fed into the visual feature module to obtain a 128-dimensional visual description f_v; the visual description is then binary-classified by a fully connected layer to obtain the confidence score_1. During training this module is constrained by a cross-entropy loss L_v, in which the two classification outputs respectively predict the probability of background and the probability of pedestrian, and the ground-truth label takes the value 0 or 1.
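A minimal sketch of such a visual feature module follows (assuming PyTorch/torchvision; the slicing of the VGG-19 backbone, the channel widths and the class name are illustrative assumptions, not the patent's reference implementation):

```python
import torch.nn as nn
from torchvision import models

class VisualFeatureModule(nn.Module):
    """Early VGG-19 layers + a convolution block -> 128-d visual description f_v,
    followed by a fully connected binary classifier producing score_1."""

    def __init__(self):
        super().__init__()
        vgg = models.vgg19()                       # pretrained weights optional
        self.backbone = vgg.features[:10]          # "first 10 layers" (assumed slicing)
        self.conv_block = nn.Sequential(
            nn.Conv2d(128, 128, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
            nn.AdaptiveAvgPool2d(1),               # collapse spatial dimensions
        )
        self.classifier = nn.Linear(128, 2)        # background vs pedestrian

    def forward(self, crop):                       # crop: (B, 3, 256, 256) candidate frame
        feat = self.backbone(crop)
        f_v = self.conv_block(feat).flatten(1)     # (B, 128) visual description
        score_1 = self.classifier(f_v).softmax(dim=1)[:, 1]   # pedestrian probability
        return f_v, score_1
```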
(2) Human body posture module
The human body posture module comprises a feature extraction network, a first sub-network, a second sub-network and a fully connected layer. For each pedestrian candidate frame resized to 256 × 256 pixels, a feature map F of the candidate frame is first extracted through a feature extraction network constructed based on the convolutional network of VGG-19; then a first sub-network and a second sub-network, each constructed based on a convolutional neural network, respectively predict a confidence map S and an association domain L of the corresponding pedestrian candidate frame based on the feature map F (the confidence map and the association domain respectively represent the key points and the connection relations between points in the human body posture information); finally, the attitude description f_p is obtained through a fully connected layer based on the confidence map S and the association domain L, and a confidence score_2 is obtained.
The acquisition of the attitude description f_p can be divided into the following stages:
In the first stage, the human body posture module generates a confidence map S^1 = ρ^1(F) and an association domain L^1 = φ^1(F), where ρ^1 and φ^1 are convolutional neural networks each formed by three 3 × 3 convolutional layers and two 1 × 1 convolutional layers;
in each of the following stages, the predictions of the two sub-networks in the previous stage are combined with the feature map F of the original image to generate new predictions:

S^t = ρ^t(F, S^{t-1}, L^{t-1}),  L^t = φ^t(F, S^{t-1}, L^{t-1})

where ρ^t and φ^t (t denotes the stage, t ≥ 2) are convolutional neural networks each formed by five 7 × 7 convolutional layers and two 1 × 1 convolutional layers;
in the last stage, the confidence map S^6 and the association domain L^6 are combined to obtain the human body posture description f_p.
The human body posture module can be initialized with the parameters of a trained OpenPose model, and these parameters are kept fixed and are not updated when the whole pedestrian recognition network is trained. The posture information is then input into the fully connected layer to obtain the 128-dimensional attitude description f_p, and another fully connected layer performs binary classification on the attitude description to obtain the confidence score_2. During training this module is constrained by a cross-entropy loss L_p, in which the two classification outputs respectively predict the probability of background and the probability of pedestrian, and the ground-truth label takes the value 0 or 1.
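The multi-stage structure described above can be sketched roughly as follows (PyTorch; the channel counts, the number of keypoint/limb channels, the 32 × 32 feature-map size and the class name are illustrative assumptions — a real implementation would load the trained OpenPose weights and freeze them, as stated above):

```python
import torch
import torch.nn as nn

def conv_stack(in_ch, out_ch, n_convs, kernel):
    """n_convs convolutions of size `kernel` followed by two 1x1 convolutions."""
    layers, ch = [], in_ch
    for _ in range(n_convs):
        layers += [nn.Conv2d(ch, 128, kernel, padding=kernel // 2), nn.ReLU(inplace=True)]
        ch = 128
    layers += [nn.Conv2d(128, 128, 1), nn.ReLU(inplace=True), nn.Conv2d(128, out_ch, 1)]
    return nn.Sequential(*layers)

class PoseModule(nn.Module):
    """Multi-stage prediction of confidence maps S and association fields L,
    followed by fully connected layers producing the 128-d description f_p and score_2."""

    def __init__(self, feat_ch=128, n_keypoints=19, n_limbs=38, n_stages=6):
        super().__init__()
        self.stage1_S = conv_stack(feat_ch, n_keypoints, n_convs=3, kernel=3)
        self.stage1_L = conv_stack(feat_ch, n_limbs, n_convs=3, kernel=3)
        in_ch = feat_ch + n_keypoints + n_limbs
        self.stages_S = nn.ModuleList(
            [conv_stack(in_ch, n_keypoints, n_convs=5, kernel=7) for _ in range(n_stages - 1)])
        self.stages_L = nn.ModuleList(
            [conv_stack(in_ch, n_limbs, n_convs=5, kernel=7) for _ in range(n_stages - 1)])
        self.fc_pose = nn.Linear((n_keypoints + n_limbs) * 32 * 32, 128)
        self.classifier = nn.Linear(128, 2)        # background vs pedestrian

    def forward(self, feat_map):                   # feat_map: (B, feat_ch, 32, 32) from VGG-19
        S, L = self.stage1_S(feat_map), self.stage1_L(feat_map)
        for rho_t, phi_t in zip(self.stages_S, self.stages_L):
            x = torch.cat([feat_map, S, L], dim=1)          # combine previous predictions with F
            S, L = rho_t(x), phi_t(x)
        f_p = self.fc_pose(torch.cat([S, L], dim=1).flatten(1))   # attitude description
        score_2 = self.classifier(f_p).softmax(dim=1)[:, 1]
        return f_p, score_2
```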
(3) Classification module
After the visual description f_v and the attitude description f_p are obtained, they are combined into a 256-dimensional description and then binary-classified through several fully connected layers; during training this module is constrained by a cross-entropy loss L.
In this module, a two-class confidence score_3 is obtained from the visual description f_v and the attitude description f_p through several fully connected layers, where the two classification outputs respectively predict the probability of background and the probability of pedestrian, and the ground-truth label takes the value 0 or 1.
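For illustration, a sketch of this classification module follows (PyTorch; the number and width of the fully connected layers are assumptions — the patent only states that several fully connected layers are used):

```python
import torch
import torch.nn as nn

class ClassificationModule(nn.Module):
    """Combine f_v and f_p into a 256-d comprehensive description and binary-classify it."""

    def __init__(self):
        super().__init__()
        self.fc = nn.Sequential(
            nn.Linear(256, 64), nn.ReLU(inplace=True),
            nn.Linear(64, 2),                      # background vs pedestrian
        )

    def forward(self, f_v, f_p):                   # each of shape (B, 128)
        joint = torch.cat([f_v, f_p], dim=1)       # 256-d comprehensive description
        logits = self.fc(joint)
        score_3 = logits.softmax(dim=1)[:, 1]      # pedestrian probability
        return logits, score_3
```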
Based on the confidences score_1, score_2 and score_3, a weighted summation with preset weighting coefficients is carried out to obtain the second confidence score_p. For example, weighting coefficients e_1, e_2 and e_3 may be set, and the second confidence score_p is then
score_p = e_1·score_1 + e_2·score_2 + e_3·score_3
where the sum of e_1, e_2 and e_3 is 1.
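As a concrete illustration (plain Python; the weight values are placeholders, not values from the patent):

```python
def fuse_recognition_scores(score_1, score_2, score_3, e=(0.4, 0.3, 0.3)):
    """Weighted fusion of the visual, posture and joint-classification confidences.

    The weighting coefficients e_1, e_2, e_3 must sum to 1; the values here are placeholders.
    """
    assert abs(sum(e) - 1.0) < 1e-6
    return e[0] * score_1 + e[1] * score_2 + e[2] * score_3
```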
In this embodiment, the detailed structure of the pedestrian recognition network is shown in fig. 3, and the network is constrained by the loss function L_prn, which is specifically expressed as follows:
L_prn = L + λ_2·L_v + λ_3·L_p
where L, L_v and L_p are respectively the loss functions of the classification module, the visual feature module and the human body posture module, and the two hyper-parameters λ_2 = λ_3 = 0.5.
In the training process, the pedestrian recognition network is trained as a whole based on the loss function L_prn.
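For illustration, the joint constraint can be written as in the sketch below (PyTorch; the variable names and the use of F.cross_entropy for each two-class term are assumptions about how the three module losses would be combined):

```python
import torch.nn.functional as F

def recognition_loss(logits_cls, logits_v, logits_p, labels, lam2=0.5, lam3=0.5):
    """L_prn = L + lambda_2 * L_v + lambda_3 * L_p, each a two-class cross-entropy.

    labels are 0 (background) or 1 (pedestrian); each logits tensor has shape (B, 2).
    """
    L = F.cross_entropy(logits_cls, labels)      # classification-module loss
    L_v = F.cross_entropy(logits_v, labels)      # visual-feature-module loss
    L_p = F.cross_entropy(logits_p, labels)      # human-body-posture-module loss
    return L + lam2 * L_v + lam3 * L_p
```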
3. Detecting an output network
The confidence score_r output by the region extraction network and the confidence score_p output by the pedestrian recognition network are fused as the final confidence score of the generated candidate region:
score = α·score_r + β·score_p
where score_r and score_p are taken as the components that represent the probability of predicting a pedestrian (their complements represent the probability of predicting the background), and α and β are weighting parameters. When score is low, the candidate region is determined to be background.
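The detection output step can then be sketched as follows (plain Python; the values of α, β and the confidence threshold are placeholder assumptions):

```python
def detect_pedestrians(candidates, alpha=0.5, beta=0.5, threshold=0.5):
    """Fuse region-extraction and recognition confidences and keep pedestrian frames.

    candidates: iterable of (box, score_r, score_p) tuples; candidate frames whose fused
    score falls below the confidence threshold are treated as background.
    """
    detections = []
    for box, score_r, score_p in candidates:
        score = alpha * score_r + beta * score_p
        if score >= threshold:
            detections.append((box, score))
    return detections
```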
The detection network is trained based on pre-constructed training samples to obtain the optimal parameters of each sub-network, yielding the optimized network.
Based on the optimized network, the pedestrian detection method based on the attitude information comprises the following steps:
Step S100, acquiring a pedestrian candidate frame and a corresponding first confidence score score_r based on a pre-trained region extraction network;
Step S200, acquiring a comprehensive description of the pedestrian candidate frame based on a pre-trained pedestrian recognition network, performing binary classification based on the description, and taking the classification result as a second confidence score score_p; the comprehensive description comprises a visual description f_v and an attitude description f_p;
Step S300, acquiring a third confidence score based on score_r and score_p, and comparing it with a set confidence threshold to determine pedestrians.
A pedestrian detection system based on attitude information according to a second embodiment of the present invention includes a first unit, a second unit, and a third unit:
the first unit is configured to acquire a pedestrian candidate frame and a corresponding first confidence score score_r based on a pre-trained region extraction network;
The second unit is configured to acquire a comprehensive description of the pedestrian candidate frame based on a pre-trained pedestrian recognition network, perform binary classification based on the description, and take the classification result as a second confidence score score_p; the comprehensive description comprises a visual description f_v and an attitude description f_p;
The third unit is configured to take the weighted sum of score_r and score_p, computed with preset weights, as a third confidence score, and then compare it with a set confidence threshold to determine pedestrians;
wherein,
the pedestrian recognition network comprises a visual feature module, a human body posture module and a classification module; the visual feature module is constructed based on a feature extraction network and is used for acquiring the visual description; the human body posture module is constructed based on a convolutional neural network and is used for acquiring the attitude description f_p; the classification module is a binary classification network and is used for acquiring the second confidence score score_p based on the comprehensive description.
It can be clearly understood by those skilled in the art that, for convenience and brevity of description, the specific working process and related description of the system described above may refer to the corresponding process in the foregoing method embodiments, and will not be described herein again.
It should be noted that, the pedestrian detection system based on the posture information provided in the foregoing embodiment is only exemplified by the division of the above functional modules, and in practical applications, the above functions may be distributed by different functional modules according to needs, that is, the modules or steps in the embodiment of the present invention are further decomposed or combined, for example, the modules in the foregoing embodiment may be combined into one module, or may be further split into a plurality of sub-modules, so as to complete all or part of the above described functions. The names of the modules and steps involved in the embodiments of the present invention are only for distinguishing the modules or steps, and are not to be construed as unduly limiting the present invention.
A storage device of a third embodiment of the present invention stores therein a plurality of programs adapted to be loaded and executed by a processor to implement the above-described pedestrian detection method based on attitude information.
A processing apparatus according to a fourth embodiment of the present invention includes a processor, a storage device; a processor adapted to execute various programs; a storage device adapted to store a plurality of programs; the program is adapted to be loaded and executed by a processor to implement the above-described pedestrian detection method based on attitude information.
It can be clearly understood by those skilled in the art that, for convenience and brevity of description, the specific working processes and related descriptions of the storage device and the processing device described above may refer to the corresponding processes in the foregoing method embodiments, and are not described herein again.
In particular, according to an embodiment of the present disclosure, the processes described above with reference to the flowcharts may be implemented as computer software programs. For example, embodiments of the present disclosure include a computer program product comprising a computer program embodied on a computer readable medium, the computer program comprising program code for performing the method illustrated in the flow chart. In such an embodiment, the computer program may be downloaded and installed from a network via the communication section, and/or installed from a removable medium. The computer program, when executed by a Central Processing Unit (CPU), performs the above-described functions defined in the method of the present application. It should be noted that the computer readable medium mentioned above in the present application may be a computer readable signal medium or a computer readable storage medium or any combination of the two. A computer readable storage medium may be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the foregoing. More specific examples of the computer readable storage medium may include, but are not limited to: an electrical connection having one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In the present application, a computer readable storage medium may be any tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. In this application, however, a computer readable signal medium may include a propagated data signal with computer readable program code embodied therein, for example, in baseband or as part of a carrier wave. Such a propagated data signal may take many forms, including, but not limited to, electro-magnetic, optical, or any suitable combination thereof. A computer readable signal medium may also be any computer readable medium that is not a computer readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device. Program code embodied on a computer readable medium may be transmitted using any appropriate medium, including but not limited to: wireless, wire, fiber optic cable, RF, etc., or any suitable combination of the foregoing.
Computer program code for carrying out operations for aspects of the present application may be written in any combination of one or more programming languages, including an object oriented programming language such as Java, Smalltalk, C++ or the like and conventional procedural programming languages, such as the "C" programming language or similar programming languages. The program code may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer or entirely on the remote computer or server. In the case of a remote computer, the remote computer may be connected to the user's computer through any type of network, including a Local Area Network (LAN) or a Wide Area Network (WAN), or the connection may be made to an external computer (for example, through the Internet using an Internet service provider).
The flowchart and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various embodiments of the present application. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams and/or flowchart illustration, and combinations of blocks in the block diagrams and/or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.
The terms "first," "second," and the like are used for distinguishing between similar elements and not necessarily for describing or implying a particular order or sequence.
The terms "comprises," "comprising," or any other similar term are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus.
So far, the technical solutions of the present invention have been described in connection with the preferred embodiments shown in the drawings, but it is easily understood by those skilled in the art that the scope of the present invention is obviously not limited to these specific embodiments. Equivalent changes or substitutions of related technical features can be made by those skilled in the art without departing from the principle of the invention, and the technical scheme after the changes or substitutions can fall into the protection scope of the invention.
Claims (11)
1. A pedestrian detection method based on attitude information is characterized by comprising the following steps:
step S100, acquiring a pedestrian candidate frame and a corresponding first confidence score score_r based on a pre-trained region extraction network;
Step S200, acquiring a comprehensive description of the pedestrian candidate frame based on a pre-trained pedestrian recognition network, performing binary classification based on the description, and taking the classification result as a second confidence score score_p; the comprehensive description comprises a visual description f_v and an attitude description f_p;
Step S300, taking the weighted sum of score_r and score_p, computed with preset weights, as a third confidence score, and then comparing it with a set confidence threshold to determine pedestrians;
wherein,
the pedestrian recognition network comprises a visual feature module, a human body posture module and a classification module; the visual feature module is constructed based on a feature extraction network and is used for acquiring the visual description; the human body posture module is constructed based on a convolutional neural network and is used for acquiring the attitude description f_p; the classification module is a binary classification network and is used for acquiring the second confidence score score_p based on the comprehensive description.
2. The pedestrian detection method based on attitude information of claim 1, wherein the region extraction network is constructed based on an object detection network, and its loss function L_rpn is

L_rpn = Σ_i L_cls(p_i, p_i*) + γ·Σ_i p_i*·L_reg(t_i, t_i*)

where L_cls is a two-class cross-entropy loss, L_reg is the regression loss, γ is a preset coordination parameter, p_i is the predicted probability of the i-th pedestrian candidate frame, p_i* indicates whether the i-th pedestrian candidate frame is correctly classified, t_i is the coordinate vector of the i-th pedestrian candidate frame, and t_i* is the coordinate vector of the real pedestrian frame annotated for the i-th pedestrian candidate frame.
4. The pedestrian detection method based on the attitude information of claim 1, wherein the visual feature module is composed of the first 10 layers of VGG-19 and a convolution block, and obtains the visual description f_v based on the pedestrian candidate frame; the visual description f_v is then binary-classified through a fully connected layer to obtain a confidence score_1.
5. The pedestrian detection method based on the attitude information of claim 4, wherein the human body attitude module comprises a feature extraction network, a first sub-network, a second sub-network, a full connection layer;
the feature extraction network is constructed on the basis of a convolution network of VGG-19 and is used for extracting a feature map F of the pedestrian candidate frame;
the first sub-network and the second sub-network are respectively constructed based on a convolutional neural network, and a confidence map S and an associated domain L of a corresponding pedestrian candidate frame are predicted based on a feature map F;
the fully connected layer is used for obtaining the attitude description f_p based on the confidence map S and the association domain L, and for obtaining a confidence score_2.
6. The pedestrian detection method based on attitude information of claim 5, wherein the classification module is configured to obtain a confidence score_3 based on the visual description f_v and the attitude description f_p, and to carry out a weighted summation of the confidences score_1, score_2 and score_3 with preset weighting coefficients to obtain the second confidence score score_p.
7. The pedestrian detection method based on the attitude information of claim 6, wherein the third confidence score is calculated by:
score = α·score_r + β·score_p
wherein α and β are preset weighting parameters.
8. The pedestrian detection method based on the posture information as claimed in any one of claims 1 to 7, wherein one or more of the visual feature module, the human body posture module and the classification module are respectively constrained by corresponding cross-entropy loss functions during training.
9. A pedestrian detection system based on attitude information is characterized by comprising a first unit, a second unit and a third unit:
the first unit is configured to acquire a pedestrian candidate frame and a corresponding first confidence score based on a pre-trained region extraction networkr;
The second unit is configured to acquire a comprehensive description of the pedestrian candidate frame based on a pre-trained pedestrian recognition network, perform binary classification based on the description, and take the classification result as a second confidence score score_p; the comprehensive description comprises a visual description f_v and an attitude description f_p;
The third unit is configured to take the weighted sum of score_r and score_p, computed with preset weights, as a third confidence score, and then compare it with a set confidence threshold to determine pedestrians;
wherein,
the pedestrian recognition network comprises a visual feature module, a human body posture module and a classification module; the visual feature module is constructed based on a feature extraction network and is used for acquiring the visual description; the human body posture module is constructed based on a convolutional neural network and is used for acquiring the attitude description f_p; the classification module is a binary classification network and is used for acquiring the second confidence score score_p based on the comprehensive description.
10. A storage device having stored therein a plurality of programs, characterized in that the programs are adapted to be loaded and executed by a processor to implement the pedestrian detection method based on attitude information according to any one of claims 1 to 8.
11. A processing device comprising a processor, a storage device; a processor adapted to execute various programs; a storage device adapted to store a plurality of programs; characterized in that the program is adapted to be loaded and executed by a processor to implement the pedestrian detection method based on attitude information according to any one of claims 1 to 8.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202010664330.6A CN111783716A (en) | 2020-07-10 | 2020-07-10 | Pedestrian detection method, system and device based on attitude information |
Publications (1)
Publication Number | Publication Date |
---|---|
CN111783716A true CN111783716A (en) | 2020-10-16 |
Family
ID=72767368
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202010664330.6A Pending CN111783716A (en) | 2020-07-10 | 2020-07-10 | Pedestrian detection method, system and device based on attitude information |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN111783716A (en) |
Patent Citations (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN103279742A (en) * | 2013-05-24 | 2013-09-04 | 中国科学院自动化研究所 | Multi-resolution pedestrian detection method and device based on multi-task model |
CN108537136A (en) * | 2018-03-19 | 2018-09-14 | 复旦大学 | The pedestrian's recognition methods again generated based on posture normalized image |
Non-Patent Citations (4)
Title |
---|
Ross Girshick: "Fast R-CNN", https://arxiv.org/abs/1504.08083v2 *
Y. Jiao et al.: "PEN: Pose-Embedding Network for Pedestrian Detection", IEEE Transactions on Circuits and Systems for Video Technology *
Z. Cao et al.: "Realtime Multi-person 2D Pose Estimation Using Part Affinity Fields", 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR) *
杨露菁 et al.: "Intelligent Image Processing and Applications" (《智能图像处理及应用》), China Railway Publishing House, 31 March 2019 *
Cited By (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN112560649A (en) * | 2020-12-09 | 2021-03-26 | 广州云从鼎望科技有限公司 | Behavior action detection method, system, equipment and medium |
CN114863556A (en) * | 2022-04-13 | 2022-08-05 | 上海大学 | Multi-neural-network fusion continuous action recognition method based on skeleton posture |
CN114821818A (en) * | 2022-06-29 | 2022-07-29 | 广东信聚丰科技股份有限公司 | Motion data analysis method and system based on intelligent sports |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
SE01 | Entry into force of request for substantive examination | ||
RJ01 | Rejection of invention patent application after publication | ||
RJ01 | Rejection of invention patent application after publication |
Application publication date: 20201016 |