CN108009466B - Pedestrian detection method and device - Google Patents

Pedestrian detection method and device

Info

Publication number
CN108009466B
Authority
CN
China
Prior art keywords
pedestrian
frames
frame
pedestrians
image
Prior art date
Legal status
Active
Application number
CN201610971349.9A
Other languages
Chinese (zh)
Other versions
CN108009466A (en)
Inventor
俞刚
彭雨翔
Current Assignee
Beijing Kuangshi Technology Co Ltd
Beijing Megvii Technology Co Ltd
Original Assignee
Beijing Kuangshi Technology Co Ltd
Beijing Megvii Technology Co Ltd
Priority date
Filing date
Publication date
Application filed by Beijing Kuangshi Technology Co Ltd, Beijing Megvii Technology Co Ltd filed Critical Beijing Kuangshi Technology Co Ltd
Priority to CN201610971349.9A
Publication of CN108009466A
Application granted
Publication of CN108009466B

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/103Static body considered as a whole, e.g. static pedestrian or occupant recognition

Abstract

An embodiment of the invention provides a pedestrian detection method and device. The pedestrian detection method includes: acquiring an image to be processed; detecting pedestrians in the image to be processed to obtain a preliminary pedestrian detection result, wherein the preliminary result comprises one or more pedestrian frames, each indicating a region of the image to be processed in which a pedestrian may be present; performing skeleton analysis on the pedestrian contained in each of the one or more pedestrian frames to obtain skeleton information associated with each of the one or more pedestrian frames; and screening the one or more pedestrian frames according to the skeleton information associated with each of them, to obtain at least one pedestrian frame in one-to-one correspondence with at least part of the pedestrians in the image to be processed. Because the pedestrian frames are screened using the skeleton information of the pedestrians they contain, redundant pedestrian frames of the same pedestrian can be filtered out while the pedestrian frames of different pedestrians are retained.

Description

Pedestrian detection method and device
Technical Field
The invention relates to the field of computers, in particular to a pedestrian detection method and device.
Background
In the field of surveillance, pedestrian detection plays a very important role. Current pedestrian detection algorithms usually extract a number of windows of different scales from an image to be processed using a sliding-window method (each window is a rectangular frame and may also be called a pedestrian frame) and judge whether a pedestrian exists in each window. Each window may be assigned a score representing the probability that a pedestrian is present in it. The sliding-window method tends to produce multiple high-scoring windows for the same pedestrian, so its output is usually post-processed with non-maximum suppression (NMS) to filter the redundant windows corresponding to one pedestrian. Those skilled in the art will appreciate that NMS is based primarily on the intersection-over-union (IoU) of two windows: a high-scoring window is used to suppress other windows that overlap with it significantly. In pedestrian-dense scenes (e.g., when multiple people stand close together), filtering windows with NMS may also suppress windows that contain pedestrians different from the one in the high-scoring window, which can greatly reduce the recall of the pedestrian detection algorithm.
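For reference, the conventional NMS procedure described above can be sketched as follows; the box coordinates, scores, and the 0.5 IoU threshold in the example are illustrative assumptions, not values from the patent:

```python
def iou(box_a, box_b):
    """Intersection-over-union of two boxes given as (x1, y1, x2, y2)."""
    x1 = max(box_a[0], box_b[0]); y1 = max(box_a[1], box_b[1])
    x2 = min(box_a[2], box_b[2]); y2 = min(box_a[3], box_b[3])
    inter = max(0, x2 - x1) * max(0, y2 - y1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    return inter / float(area_a + area_b - inter)

def nms(boxes, scores, iou_threshold=0.5):
    """Standard NMS: repeatedly keep the highest-scoring remaining box
    and discard every other box whose IoU with it exceeds the threshold."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    while order:
        best = order.pop(0)
        keep.append(best)
        order = [i for i in order if iou(boxes[best], boxes[i]) < iou_threshold]
    return keep
```

Note that a window overlapping the top-scoring window heavily is always discarded, even if it actually frames a different, nearby pedestrian — exactly the dense-scene failure mode the invention addresses.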
Disclosure of Invention
The present invention has been made in view of the above problems. The invention provides a pedestrian detection method and device.
According to an aspect of the present invention, there is provided a pedestrian detection method. The method comprises the following steps: acquiring an image to be processed; detecting pedestrians in the image to be processed to obtain a pedestrian detection preliminary result, wherein the pedestrian detection preliminary result comprises one or more pedestrian frames, and each pedestrian frame is used for indicating an area in which a pedestrian is likely to exist in the image to be processed; performing skeleton analysis on the pedestrians contained in each of the one or more pedestrian frames to obtain skeleton information respectively related to each of the one or more pedestrian frames; and screening the one or more pedestrian frames according to the skeleton information respectively related to each of the one or more pedestrian frames to obtain at least one pedestrian frame corresponding to at least part of pedestrians in the image to be processed one by one.
Illustratively, after the screening the one or more pedestrian frames according to the skeleton information respectively related to each of the one or more pedestrian frames, the pedestrian detection method further comprises: and for each of the at least one pedestrian frame, judging whether the pedestrian contained in the pedestrian frame is a real pedestrian according to the skeleton information related to the pedestrian frame, and if not, filtering the pedestrian frame.
For example, the skeleton information related to any one of the one or more pedestrian frames includes keypoint confidences in one-to-one correspondence with a specific number of keypoints of the pedestrian contained in that pedestrian frame, and determining, for each of the at least one pedestrian frame, whether the pedestrian contained in the pedestrian frame is a real pedestrian according to the skeleton information related to the pedestrian frame includes: for each of the at least one pedestrian frame, summing or averaging the keypoint confidences in one-to-one correspondence with the specific number of keypoints of the pedestrian contained in the pedestrian frame, to obtain a first total confidence; and for each of the at least one pedestrian frame, comparing the first total confidence with a corresponding confidence threshold, determining that the pedestrian contained in the pedestrian frame is a real pedestrian if the first total confidence is greater than the corresponding confidence threshold, and determining that it is not a real pedestrian otherwise.
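The first-total-confidence check above reduces to a few lines. A minimal sketch, using averaging (the text also permits summation); `is_real_pedestrian` is a hypothetical helper name, and the threshold value in the usage example is an assumption:

```python
def is_real_pedestrian(keypoint_confidences, confidence_threshold):
    """Average the per-keypoint confidences of one pedestrian frame to get
    the first total confidence, then compare it against the threshold."""
    first_total = sum(keypoint_confidences) / len(keypoint_confidences)
    return first_total > confidence_threshold
```

For example, a frame whose keypoints are detected with confidences `[0.9, 0.8, 0.85, 0.95]` passes a 0.5 threshold, while one with `[0.1, 0.05, 0.2, 0.1]` is filtered out as not containing a real pedestrian.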
Illustratively, the screening the one or more pedestrian frames according to the skeleton information respectively related to each of the one or more pedestrian frames to obtain at least one pedestrian frame in one-to-one correspondence with at least part of the pedestrians in the image to be processed comprises: in a case where the preliminary pedestrian detection result includes a plurality of pedestrian frames, determining, for any two of the plurality of pedestrian frames, whether the pedestrians contained in the two pedestrian frames are the same pedestrian according to the skeleton information respectively associated with each of the two pedestrian frames; and for two or more pedestrian frames containing the same pedestrian, selecting a unique pedestrian frame from the two or more pedestrian frames as one of the at least one pedestrian frame.
For example, the skeleton information related to any one of the one or more pedestrian frames includes a key point feature map in one-to-one correspondence with a specific number of key points of pedestrians included in the pedestrian frame, and the determining, for any two of the multiple pedestrian frames, whether the pedestrians included in the two pedestrian frames are the same pedestrian according to the skeleton information respectively related to each of the two pedestrian frames includes: for any two pedestrian frames in the plurality of pedestrian frames, calculating the similarity between the skeletons of the pedestrians contained in the two pedestrian frames by using the key point feature maps which are in one-to-one correspondence with the specific number of key points of the pedestrians contained in each of the two pedestrian frames; and comparing the calculated similarity with a similarity threshold value for any two pedestrian frames in the plurality of pedestrian frames, if the calculated similarity is greater than the similarity threshold value, determining that the pedestrians contained in the two pedestrian frames are the same pedestrian, otherwise determining that the pedestrians contained in the two pedestrian frames are not the same pedestrian.
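The summary does not pin down the similarity measure computed between two frames' keypoint feature maps; as one illustrative choice (an assumption, not the patent's prescribed metric), cosine similarity between corresponding flattened feature maps, averaged over the specific number of keypoints, could look like this:

```python
import math

def skeleton_similarity(maps_a, maps_b):
    """maps_a / maps_b: one flattened keypoint feature map per keypoint,
    in the same keypoint order for both pedestrian frames. Returns the
    cosine similarity averaged over all keypoints."""
    sims = []
    for fa, fb in zip(maps_a, maps_b):
        dot = sum(x * y for x, y in zip(fa, fb))
        norm_a = math.sqrt(sum(x * x for x in fa))
        norm_b = math.sqrt(sum(x * x for x in fb))
        sims.append(dot / (norm_a * norm_b))
    return sum(sims) / len(sims)

def same_pedestrian(maps_a, maps_b, similarity_threshold=0.8):
    """Two frames contain the same pedestrian when the skeleton
    similarity exceeds the threshold (threshold value is illustrative)."""
    return skeleton_similarity(maps_a, maps_b) > similarity_threshold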
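```

Identical feature maps yield a similarity of 1 (same pedestrian); feature maps with no overlapping response yield 0 (different pedestrians).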
Illustratively, the preliminary pedestrian detection result further includes pedestrian frame confidences in one-to-one correspondence with the one or more pedestrian frames, the skeleton information related to any one of the one or more pedestrian frames includes keypoint confidences in one-to-one correspondence with the specific number of keypoints of the pedestrian contained in the pedestrian frame, and, for two or more pedestrian frames containing the same pedestrian, selecting a unique pedestrian frame from the two or more pedestrian frames as one of the at least one pedestrian frame includes: selecting the unique pedestrian frame from the two or more pedestrian frames according to the pedestrian frame confidences in one-to-one correspondence with the two or more pedestrian frames and the keypoint confidences in one-to-one correspondence with the specific number of keypoints of the pedestrian contained in each of the two or more pedestrian frames.
Illustratively, for two or more pedestrian frames containing the same pedestrian, the selecting the unique pedestrian frame from the two or more pedestrian frames according to the pedestrian frame confidence levels corresponding to the two or more pedestrian frames one-to-one and the keypoint confidence levels corresponding to the specific number of keypoints of the pedestrian contained in each of the two or more pedestrian frames comprises: for two or more pedestrian frames containing the same pedestrian, for each of the two or more pedestrian frames, performing summation or averaging on the confidence coefficient of the pedestrian frame corresponding to the pedestrian frame and the confidence coefficients of the key points, corresponding to the specific number of the key points, of the pedestrian contained in the pedestrian frame to obtain a second total confidence coefficient; and for two or more pedestrian frames containing the same pedestrian, selecting the pedestrian frame with the highest second total confidence as the unique pedestrian frame.
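The second-total-confidence selection above can be sketched minimally as follows, with summation chosen over averaging; the data layout (a list of `(frame_confidence, keypoint_confidences)` tuples) and the helper name are illustrative assumptions:

```python
def select_unique_frame(frames):
    """frames: list of (frame_confidence, keypoint_confidences) tuples for
    two or more pedestrian frames judged to contain the same pedestrian.
    The second total confidence sums the pedestrian frame confidence with
    its keypoint confidences; the index of the frame with the highest
    second total confidence is returned as the unique frame to keep."""
    def second_total(frame):
        frame_conf, keypoint_confs = frame
        return frame_conf + sum(keypoint_confs)
    return max(range(len(frames)), key=lambda i: second_total(frames[i]))
```

Note that a frame with a slightly lower detection score can still win if its skeleton keypoints are detected with high confidence, which is the point of combining the two kinds of confidence.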
Illustratively, the detecting the pedestrian in the image to be processed to obtain a pedestrian detection preliminary result comprises: inputting the image to be processed into a first convolution neural network so as to extract the characteristics of the image to be processed; and inputting the characteristics of the image to be processed into a second convolutional neural network to obtain the preliminary result of pedestrian detection.
Illustratively, the skeleton analyzing the pedestrians contained in each of the one or more pedestrian frames to obtain skeleton information respectively related to each of the one or more pedestrian frames comprises: inputting the features of the image to be processed and the preliminary result of pedestrian detection into a full convolution network to obtain the skeleton information respectively associated with each of the one or more pedestrian frames.
Illustratively, the pedestrian detection method further includes: acquiring a training image, wherein a target pedestrian frame corresponding to each pedestrian in the training image and target positions of a specific number of key points of each pedestrian are marked; constructing a first loss function by taking target pedestrian frames corresponding to pedestrians in the training image in a one-to-one mode as target values of one or more pedestrian frames in a pedestrian detection preliminary result obtained by the second convolutional neural network aiming at the training image, and constructing a second loss function by taking target positions of the pedestrians in the training image and the key points of the specific number as target values of skeleton information obtained by the full convolutional network aiming at the training image; and training parameters in the first convolutional neural network, the second convolutional neural network, and the full convolutional network using the first loss function and the second loss function.
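The joint training described above builds one loss over the pedestrian frames and another over the keypoint target positions, then optimizes all three networks with both. In the sketch below, squared error stands in for the loss forms, which the summary does not specify, and equal weighting of the two losses is likewise an assumption:

```python
def combined_loss(pred_frames, target_frames, pred_keypoints, target_keypoints):
    """First loss: predicted pedestrian frames vs. annotated target frames.
    Second loss: predicted keypoint positions vs. annotated target
    positions. Squared error and unit weights are illustrative choices."""
    first_loss = sum((p - t) ** 2
                     for pf, tf in zip(pred_frames, target_frames)
                     for p, t in zip(pf, tf))
    second_loss = sum((p - t) ** 2
                      for pk, tk in zip(pred_keypoints, target_keypoints)
                      for p, t in zip(pk, tk))
    return first_loss + second_loss
```

Gradients of this combined objective would flow back through the full convolutional network, the second convolutional neural network, and the shared first convolutional neural network, so the feature extractor is trained for both tasks at once.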
According to another aspect of the present invention, a pedestrian detection apparatus is provided. The device includes: the image to be processed acquisition module is used for acquiring an image to be processed; the detection module is used for detecting pedestrians in the image to be processed to obtain a pedestrian detection preliminary result, wherein the pedestrian detection preliminary result comprises one or more pedestrian frames, and each pedestrian frame is used for indicating an area where pedestrians are likely to exist in the image to be processed; the skeleton analysis module is used for carrying out skeleton analysis on the pedestrians contained in each of the one or more pedestrian frames so as to obtain skeleton information respectively related to each of the one or more pedestrian frames; and the screening module is used for screening the one or more pedestrian frames according to the skeleton information respectively related to each of the one or more pedestrian frames so as to obtain at least one pedestrian frame in one-to-one correspondence with at least part of pedestrians in the image to be processed.
Exemplarily, the pedestrian detection device further includes: and the real pedestrian judgment module is used for judging whether the pedestrian contained in the pedestrian frame is a real pedestrian or not according to the skeleton information related to the pedestrian frame for each of the at least one pedestrian frame, and filtering the pedestrian frame if the pedestrian contained in the pedestrian frame is not the real pedestrian.
Illustratively, the skeleton information related to any one of the one or more pedestrian frames includes a keypoint confidence degree in one-to-one correspondence with a specific number of keypoints of the pedestrians included in the pedestrian frame, and the real pedestrian determination module includes: a first total confidence obtaining submodule, configured to sum or average, for each of the at least one pedestrian frame, the confidence levels of the keypoints that correspond to the pedestrians included in the pedestrian frame in the specific number one to one, so as to obtain a first total confidence level; and the confidence degree comparison submodule is used for comparing the first total confidence degree with a corresponding confidence degree threshold value for each of the at least one pedestrian frame, if the first total confidence degree is greater than the corresponding confidence degree threshold value, the pedestrian contained in the pedestrian frame is determined to be a real pedestrian, otherwise, the pedestrian contained in the pedestrian frame is determined not to be a real pedestrian.
Illustratively, the screening module includes: the same pedestrian determination submodule is used for determining whether pedestrians contained in any two pedestrian frames in the pedestrian frames are the same pedestrian or not according to skeleton information respectively relevant to each of the two pedestrian frames under the condition that the preliminary pedestrian detection result comprises the multiple pedestrian frames; and a pedestrian frame selection sub-module for selecting, for two or more pedestrian frames containing the same pedestrian, a unique pedestrian frame from among the two or more pedestrian frames as one of the at least one pedestrian frame, in a case where the preliminary result of pedestrian detection includes a plurality of pedestrian frames.
Illustratively, the skeleton information related to any one of the one or more pedestrian frames includes a key point feature map in one-to-one correspondence with a specific number of key points of pedestrians included in the pedestrian frame, and the same pedestrian determination submodule includes: a similarity calculation unit configured to calculate, for any two of the plurality of pedestrian frames, a similarity between skeletons of pedestrians included in the two pedestrian frames using a keypoint feature map that is one-to-one corresponding to the specific number of keypoints of the pedestrians included in each of the two pedestrian frames; and the similarity comparison unit is used for comparing the calculated similarity with a similarity threshold value for any two pedestrian frames in the plurality of pedestrian frames, if the calculated similarity is greater than the similarity threshold value, the pedestrians contained in the two pedestrian frames are determined to be the same pedestrian, otherwise, the pedestrians contained in the two pedestrian frames are determined not to be the same pedestrian.
Illustratively, the preliminary pedestrian detection result further includes pedestrian frame confidences in one-to-one correspondence with the one or more pedestrian frames, the skeleton information related to any one of the one or more pedestrian frames includes keypoint confidences in one-to-one correspondence with a specific number of keypoints of the pedestrian contained in the pedestrian frame, and the pedestrian frame selection submodule includes: a pedestrian frame selection unit configured to, for two or more pedestrian frames containing the same pedestrian, select the unique pedestrian frame from the two or more pedestrian frames according to the pedestrian frame confidences in one-to-one correspondence with the two or more pedestrian frames and the keypoint confidences in one-to-one correspondence with the specific number of keypoints of the pedestrian contained in each of the two or more pedestrian frames.
Illustratively, the pedestrian frame selection unit includes: a second total confidence obtaining subunit, configured to, for each of two or more pedestrian frames that include the same pedestrian, sum or average a pedestrian frame confidence corresponding to the pedestrian frame and a keypoint confidence corresponding to the pedestrian included in the pedestrian frame and corresponding to the specific number of keypoints one by one, to obtain a second total confidence; and a pedestrian frame selection subunit, configured to select, as the unique pedestrian frame, the pedestrian frame with the highest second total confidence, for two or more pedestrian frames that include the same pedestrian.
Illustratively, the detection module includes: the first input submodule is used for inputting the image to be processed into a first convolution neural network so as to extract the characteristics of the image to be processed; and the second input submodule is used for inputting the characteristics of the image to be processed into a second convolutional neural network so as to obtain the pedestrian detection preliminary result.
Illustratively, the skeletal analysis module comprises: and the third input submodule is used for inputting the characteristics of the image to be processed and the preliminary pedestrian detection result into a full convolution network so as to obtain the skeleton information respectively related to each of the one or more pedestrian frames.
Exemplarily, the pedestrian detection device further includes: the training image acquisition module is used for acquiring a training image, wherein a target pedestrian frame corresponding to each pedestrian in the training image and target positions of a specific number of key points of each pedestrian are marked; a loss function constructing module, configured to construct a first loss function by using a target pedestrian frame corresponding to a pedestrian in the training image in a one-to-one manner as a target value of one or more pedestrian frames in a pedestrian detection preliminary result obtained by the second convolutional neural network for the training image, and construct a second loss function by using a target position of the pedestrian in the training image and the key points of the specific number as a target value of skeleton information obtained by the full convolutional network for the training image; and a training module for training parameters in the first convolutional neural network, the second convolutional neural network and the full convolutional network by using the first loss function and the second loss function.
According to the pedestrian detection method and device provided by the embodiment of the invention, the pedestrian frames are screened by using the skeleton information of the pedestrians contained in the pedestrian frames, so that the aims of filtering redundant pedestrian frames of the same pedestrian and retaining pedestrian frames of different pedestrians can be achieved.
Drawings
The above and other objects, features and advantages of the present invention will become more apparent by describing in more detail embodiments of the present invention with reference to the attached drawings. The accompanying drawings are included to provide a further understanding of the embodiments of the invention and are incorporated in and constitute a part of this specification, illustrate embodiments of the invention and together with the description serve to explain the principles of the invention and not to limit the invention. In the drawings, like reference numbers generally represent like parts or steps.
FIG. 1 shows a schematic block diagram of an example electronic device for implementing a pedestrian detection method and apparatus in accordance with embodiments of the invention;
FIG. 2 shows a schematic flow diagram of a pedestrian detection method according to one embodiment of the invention;
FIG. 3 shows a schematic diagram of a data processing flow of a pedestrian detection method according to one embodiment of the invention;
FIG. 4 shows a schematic flow diagram of a pedestrian detection method according to another embodiment of the invention;
FIG. 5 shows a schematic block diagram of a pedestrian detection arrangement, according to one embodiment of the present invention; and
FIG. 6 shows a schematic block diagram of a pedestrian detection system in accordance with one embodiment of the invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention more apparent, exemplary embodiments according to the present invention will be described in detail below with reference to the accompanying drawings. It is to be understood that the described embodiments are merely a subset of the embodiments of the invention, not all of them, and that the invention is not limited to the example embodiments described herein. All other embodiments obtainable by a person skilled in the art from the embodiments of the invention described herein without inventive effort shall fall within the scope of protection of the invention.
In order to solve the above-mentioned problems, embodiments of the present invention provide a pedestrian detection method and apparatus, which utilize skeleton information of pedestrians contained in a pedestrian frame (i.e., a window) to screen the pedestrian frame, so as to filter redundant pedestrian frames of the same pedestrian and retain pedestrian frames of different pedestrians. The pedestrian detection method provided by the embodiment of the invention can solve the pedestrian detection problem in a crowded (dense crowd) scene to a great extent, and therefore, the pedestrian detection method can be well applied to the monitoring field.
First, an example electronic device 100 for implementing a pedestrian detection method and apparatus according to an embodiment of the present invention is described with reference to fig. 1.
As shown in FIG. 1, electronic device 100 includes one or more processors 102, one or more memory devices 104, an input device 106, an output device 108, and an image capture device 110, which are interconnected via a bus system 112 and/or other form of connection mechanism (not shown). It should be noted that the components and structure of the electronic device 100 shown in fig. 1 are exemplary only, and not limiting, and the electronic device may have other components and structures as desired.
The processor 102 may be a Central Processing Unit (CPU) or other form of processing unit having data processing capabilities and/or instruction execution capabilities, and may control other components in the electronic device 100 to perform desired functions.
The storage 104 may include one or more computer program products, which may include various forms of computer-readable storage media, such as volatile memory and/or non-volatile memory. The volatile memory may include, for example, random access memory (RAM) and/or cache memory. The non-volatile memory may include, for example, read-only memory (ROM), a hard disk, flash memory, etc. One or more computer program instructions may be stored on the computer-readable storage medium and executed by the processor 102 to implement client-side functionality and/or other desired functionality in the embodiments of the invention described below. Various applications and various data, such as data used and/or generated by the applications, may also be stored in the computer-readable storage medium.
The input device 106 may be a device used by a user to input instructions and may include one or more of a keyboard, a mouse, a microphone, a touch screen, and the like.
The output device 108 may output various information (e.g., images and/or sounds) to an external (e.g., user), and may include one or more of a display, a speaker, etc.
The image capture device 110 may capture images (including video frames) and store the captured images in the storage device 104 for use by other components. The image capture device 110 may be a surveillance camera. It should be understood that the image capture device 110 is merely an example, and the electronic device 100 may not include the image capture device 110. In this case, an image for pedestrian detection may be captured using another image capturing device and the captured image may be transmitted to the electronic apparatus 100.
Illustratively, an example electronic device for implementing the pedestrian detection method and apparatus in accordance with embodiments of the present invention may be implemented on a device such as a personal computer or a remote server.
Next, a pedestrian detection method according to an embodiment of the invention will be described with reference to fig. 2. FIG. 2 shows a schematic flow diagram of a pedestrian detection method 200 according to one embodiment of the invention. As shown in fig. 2, the pedestrian detection method 200 includes the following steps.
In step S210, an image to be processed is acquired.
The image to be processed may be any suitable image that requires pedestrian detection, such as an image captured for a monitored area. The image to be processed may be an original image acquired by an image acquisition device such as a camera, or may be an image obtained after preprocessing the original image.
The image to be processed may be sent to the electronic device 100 by a client device (such as a security device including a monitoring camera) to be processed by the processor 102 of the electronic device 100, or may be collected by an image collecting device 110 (e.g., a camera) included in the electronic device 100 and transmitted to the processor 102 for processing.
In step S220, a pedestrian in the image to be processed is detected to obtain a pedestrian detection preliminary result, wherein the pedestrian detection preliminary result includes one or more pedestrian frames, and each pedestrian frame is used for indicating an area in the image to be processed where the pedestrian may exist.
Step S220 may be implemented using any conventional pedestrian detection algorithm, such as one combining HOG (histogram of oriented gradients) features with an SVM (support vector machine) classifier. Running the pedestrian detection algorithm on the image to be processed yields a preliminary pedestrian detection result, which may include several pedestrian frames. A pedestrian frame is a rectangular frame indicating a region of the image to be processed in which a pedestrian may be present. In addition, the preliminary pedestrian detection result may further include a pedestrian frame confidence corresponding to each pedestrian frame, indicating the probability that a pedestrian exists in that frame.
Note that the pedestrian frames obtained in step S220 have not been processed with NMS. That is, among the pedestrian frames obtained in step S220, the same pedestrian may correspond to a plurality of different pedestrian frames. In addition, the image to be processed may contain two pedestrians standing close to each other, in which case the pedestrian frames of the two pedestrians may overlap substantially.
In one example, after detecting pedestrians in the image to be processed using an existing (or future) pedestrian detection algorithm, all of the resulting pedestrian frames may be taken directly as the pedestrian frames of the preliminary pedestrian detection result for subsequent skeleton analysis. In another example, only a subset of the resulting pedestrian frames may be selected as the pedestrian frames of the preliminary result. For example, pedestrian frames whose pedestrian frame confidence is greater than a preset threshold may be kept for subsequent skeleton analysis, while frames whose confidence is not greater than the preset threshold are discarded.
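The confidence-based pre-filtering in the second example above reduces to a one-line filter; the threshold value and the helper name are illustrative assumptions:

```python
def prefilter_frames(frames, frame_confidences, preset_threshold=0.3):
    """Keep only pedestrian frames whose pedestrian frame confidence is
    greater than the preset threshold; the rest are discarded before
    skeleton analysis."""
    return [frame for frame, conf in zip(frames, frame_confidences)
            if conf > preset_threshold]
```

This step only trims obviously weak detections; the skeleton-based screening of step S240 still handles redundant frames among those that survive.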
In step S230, skeleton analysis is performed on the pedestrians contained in each of the one or more pedestrian frames to obtain skeleton information respectively associated with each of the one or more pedestrian frames.
The skeleton of the pedestrian can be represented by some key points (or joint points) on the pedestrian, and the key points can be, for example, the head, the neck, the left shoulder, the right shoulder, the left hand, the right hand, the left foot, the right foot, and the like. The location of the keypoints representing the skeleton of the pedestrian on the pedestrian body and the number of keypoints (i.e., the specific number described herein) can be set as desired, and the invention is not limited thereto. In one example, the skeletal information described herein may include the locations of a particular number of keypoints of a pedestrian. In another example, the skeletal information described herein may include a keypoint feature map that corresponds one-to-one to a particular number of keypoints for pedestrians. The location of each keypoint may be determined from the keypoint signature corresponding thereto. The key point feature map will be described in detail below, and will not be described herein.
The skeleton analysis algorithm is a pedestrian posture estimation algorithm, and the positions of certain key points of the pedestrian can be determined by using the algorithm, so that the skeleton information of the pedestrian is obtained.
In step S240, one or more pedestrian frames are filtered according to the skeleton information respectively associated with each of the one or more pedestrian frames, so as to obtain at least one pedestrian frame corresponding to at least a part of pedestrians in the image to be processed.
It is understood that if only one pedestrian frame is obtained in step S220, that pedestrian frame may be directly retained in step S240, i.e., only one pedestrian in the image to be processed is detected. If more than one pedestrian frame is obtained in step S220, the obtained pedestrian frames may be screened. In the latter case, it may first be determined whether different pedestrian frames contain the same pedestrian, and two cases may then be handled according to the determination result. First, where more than one pedestrian frame is obtained in step S220 and the pedestrians contained in the obtained pedestrian frames are all different from one another, all the pedestrian frames obtained in step S220 may be retained. Second, where more than one pedestrian frame is obtained in step S220 and at least two of the obtained pedestrian frames contain the same pedestrian, the plurality of pedestrian frames containing the same pedestrian may be filtered so that only one pedestrian frame is reserved for each pedestrian, finally yielding at least one pedestrian frame in one-to-one correspondence with at least part of the pedestrians in the image to be processed.
It can be understood that if the pedestrians contained in two pedestrian frames are the same pedestrian, the skeleton information related to the two pedestrian frames should be consistent, that is, the positions of the key points of the pedestrians contained in the two pedestrian frames are substantially the same. Therefore, the pedestrian frames can be screened according to the skeleton information related to them: one pedestrian frame is selected from the plurality of pedestrian frames containing the same pedestrian, and the remaining pedestrian frames containing that pedestrian are discarded. Through this operation, only one pedestrian frame is reserved for each pedestrian, so that at least one pedestrian frame in one-to-one correspondence with at least one pedestrian in the image to be processed is obtained. In contrast, if the pedestrians contained in two pedestrian frames are not the same pedestrian, the skeleton information related to the two pedestrian frames differs, and even if the two pedestrian frames have a large overlapping area, neither pedestrian frame will filter out the other. Thus, screening pedestrian frames using skeleton information rather than NMS avoids erroneously filtering out pedestrian frames containing different pedestrians.
For example, at least one pedestrian frame obtained in step S240 may be regarded as a final result of pedestrian detection. Of course, some subsequent processing may be performed on at least one pedestrian frame obtained in step S240, for example, as described below, it may be determined whether a pedestrian included in the pedestrian frame is a real pedestrian and the pedestrian frame may be filtered according to the determination result. Subsequently, the pedestrian frame obtained through the subsequent processing is taken as a final result of the pedestrian detection.
According to the pedestrian detection method provided by the embodiment of the invention, because the pedestrian frames are screened using the skeleton information of the pedestrians contained in them, redundant pedestrian frames of the same pedestrian can be filtered out while pedestrian frames of different pedestrians are retained. The method thereby avoids the problem, introduced by NMS, that the pedestrian frame of one pedestrian filters out the pedestrian frames of other pedestrians, so the accuracy of the pedestrian detection result can be improved. This is of significant value for pedestrian monitoring, particularly in pedestrian-dense scenes.
Illustratively, the pedestrian detection method according to the embodiment of the invention may be implemented in a device, apparatus or system having a memory and a processor.
The pedestrian detection method can be deployed at an image acquisition end, for example, the pedestrian detection method can be deployed at the image acquisition end of a community access control system or the image acquisition end of a security monitoring system in public places such as stations, shopping malls, banks and the like. Alternatively, the pedestrian detection method according to the embodiment of the present invention may also be distributively deployed at the server side (or cloud side) and the client side. For example, an image may be collected at a client, and the client transmits the collected image to a server (or a cloud), so that the server (or the cloud) performs pedestrian detection.
According to an embodiment of the present invention, step S220 may include: inputting an image to be processed into a first convolution neural network so as to extract the characteristics of the image to be processed; and inputting the characteristics of the image to be processed into a second convolutional neural network to obtain a pedestrian detection preliminary result.
The first convolutional neural network (CNN) and the second convolutional neural network may be pre-trained with a large number of training images.
Referring to fig. 3, a schematic diagram illustrating a data processing flow of a pedestrian detection method according to an embodiment of the present invention is shown. As shown in fig. 3, after the image to be processed is acquired, it is input into the first convolutional neural network for feature extraction. The image to be processed may be a static image or any video frame in a video. At the output of the first convolutional neural network, a plurality of feature maps may be obtained. The feature maps output by the first convolutional neural network are the features of the image to be processed. Illustratively, the first convolutional neural network may be implemented using a VGG model or a residual network (ResNet) model obtained by pre-training on the ImageNet data set. Using the first convolutional neural network, valuable information in the image to be processed can be extracted, and window prediction can then be performed based on this information, as described below.
All feature maps output by the first convolutional neural network can be input into the second convolutional neural network for processing. The second convolutional neural network is a classifier that can implement the sliding window method described above. That is, the processing of the second convolutional neural network may be understood as extracting windows of various scales from the image to be processed and determining the probability of a pedestrian existing in each window (i.e., the pedestrian frame confidence). For example, windows with a probability greater than a preset threshold may be selected; the selected windows are the one or more pedestrian frames in the preliminary result of pedestrian detection described herein. As described above, the preliminary result may include one or more pedestrian frames and pedestrian frame confidences in one-to-one correspondence with the one or more pedestrian frames.
A convolutional neural network is capable of learning autonomously from data, and the pedestrians in the image to be processed can be detected accurately and efficiently using the first convolutional neural network and the second convolutional neural network.
According to the embodiment of the present invention, step S230 may include: features of the image to be processed and the preliminary pedestrian detection result are input into a full convolution network to obtain skeletal information respectively associated with each of the one or more pedestrian frames.
The fully convolutional network (FCN) described herein may be similar to a fully convolutional network used for semantic segmentation. With continued reference to fig. 3, the features of the image to be processed output by the first convolutional neural network and the preliminary pedestrian detection result output by the second convolutional neural network may be input to the full convolution network for skeleton analysis. The pedestrian frames in the preliminary result may be input into the full convolution network one by one. After the features of the image to be processed and a certain pedestrian frame are input into the full convolution network, skeleton information related to that pedestrian frame is obtained at its output. For example, assuming that the skeleton of a pedestrian is represented by 15 key points (including head, neck, left hand, left foot, etc.), for each pedestrian frame, 15 keypoint feature maps (including a head feature map, a neck feature map, a left-hand feature map, a left-foot feature map, etc.) can be obtained at the output of the full convolution network. Each keypoint feature map represents the location of the corresponding key point. Illustratively, each keypoint feature map has the same size as the image to be processed, and the pixel value of each pixel on a keypoint feature map represents the probability that the corresponding key point is located at the same position in the image to be processed.
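As a sketch of how a keypoint position might be read off a keypoint feature map of the kind described above, consider the following toy NumPy heatmap; the array size and peak location are invented for illustration, and a real feature map would come from the full convolution network.

```python
import numpy as np

# Toy keypoint feature map (heatmap) with the same size as the image; each
# pixel holds the probability that the key point lies at that location.
heatmap = np.zeros((8, 8), dtype=float)
heatmap[3, 5] = 0.9  # peak: most likely keypoint position

def keypoint_from_feature_map(fmap):
    """Return (row, col, confidence) at the most probable keypoint location."""
    idx = np.unravel_index(np.argmax(fmap), fmap.shape)
    return idx[0], idx[1], float(fmap[idx])

row, col, conf = keypoint_from_feature_map(heatmap)  # → (3, 5, 0.9)
```

The peak value itself can serve as a per-keypoint confidence, which is how the keypoint confidences discussed later in the text could be obtained.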
Similar to the first convolutional neural network and the second convolutional neural network, the full convolutional network may be trained in advance with a large number of training images. The training modes of the first convolutional neural network, the second convolutional neural network and the full convolutional network will be described below, and are not described herein again.
According to the embodiment of the present invention, step S240 may include: in a case where the preliminary pedestrian detection result includes multiple pedestrian frames, determining, for any two of the multiple pedestrian frames, whether the pedestrians contained in the two pedestrian frames are the same pedestrian according to the skeleton information respectively related to each of the two pedestrian frames; and for two or more pedestrian frames containing the same pedestrian, selecting a unique pedestrian frame from the two or more pedestrian frames as one of the at least one pedestrian frame.
In step S240, it may be determined whether two pedestrians respectively included in the two pedestrian frames are the same pedestrian according to the skeleton information related to the two pedestrian frames. For example, the similarity between the skeletons of two pedestrians contained in two pedestrian frames can be determined by using the keypoint feature maps respectively related to the two pedestrian frames, two pedestrians with sufficient similarity are regarded as the same pedestrian, and two pedestrians with insufficient similarity are regarded as different pedestrians.
If two or more pedestrian frames containing the same pedestrian exist among the pedestrian frames obtained in step S220, a unique pedestrian frame is selected from the pedestrian frames as the pedestrian frame corresponding to the pedestrian, and the remaining pedestrian frames containing the pedestrian can be discarded.
In this way, a plurality of pedestrian frames including the same pedestrian may not exist in the pedestrian frames obtained by screening.
According to an embodiment of the present invention, the skeleton information associated with any one of the one or more pedestrian frames includes a key point feature map in one-to-one correspondence with a specific number of key points of pedestrians included in the pedestrian frame, and for any two of the plurality of pedestrian frames, determining whether the pedestrians included in the two pedestrian frames are the same pedestrian according to the skeleton information associated with each of the two pedestrian frames, respectively, includes: for any two pedestrian frames in the multiple pedestrian frames, calculating the similarity between skeletons of pedestrians contained in the two pedestrian frames by using a key point feature map which is in one-to-one correspondence with the specific number of key points of the pedestrians contained in each of the two pedestrian frames; and comparing the calculated similarity with a similarity threshold value for any two pedestrian frames in the plurality of pedestrian frames, if the calculated similarity is greater than the similarity threshold value, determining that the pedestrians contained in the two pedestrian frames are the same pedestrian, otherwise determining that the pedestrians contained in the two pedestrian frames are not the same pedestrian.
The number of key points, i.e. the specific number, used to represent the skeleton of the pedestrian may be any suitable number, e.g. 5, 10, 15, etc. For example, assuming that the skeleton of a pedestrian is represented by 15 key points, each pedestrian frame corresponds to 15 key point feature maps, which are respectively used to represent the positions of the corresponding key points.
In one example, for each of the specific number of key points, the distance between the positions of that key point for the two pedestrians can be calculated using the keypoint feature maps corresponding to that key point for the pedestrians respectively contained in the two pedestrian frames. For example, the distance between the head of pedestrian a and the head of pedestrian b may be calculated using the head feature map of pedestrian a contained in pedestrian frame A and the head feature map of pedestrian b contained in pedestrian frame B, the distance between the left hand of pedestrian a and the left hand of pedestrian b may be calculated using the corresponding left-hand feature maps, and so on, finally obtaining the specific number (e.g., 15) of distances. These distances reflect the difference between the skeleton of pedestrian a and the skeleton of pedestrian b, and the similarity between the two can therefore be calculated based on these distances. The calculated similarity is then compared with a similarity threshold: if the similarity is greater than the threshold, pedestrian a and pedestrian b are determined to be the same pedestrian; otherwise, they are determined not to be the same pedestrian. The similarity threshold may be any suitable value, which may be set as desired.
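The distance-and-threshold comparison above might look like the following sketch. The keypoint coordinates, the mapping from mean distance to a similarity score, and the threshold are all illustrative assumptions, since the patent does not fix a particular similarity formula.

```python
import math

# Hypothetical (x, y) keypoint positions for the pedestrians in two frames,
# already extracted from their keypoint feature maps; 5 key points for brevity.
skeleton_a = [(10, 5), (10, 15), (4, 20), (16, 20), (10, 40)]
skeleton_b = [(10, 5), (10, 15), (4, 21), (16, 20), (10, 40)]  # near-identical
skeleton_c = [(60, 5), (60, 15), (54, 20), (66, 20), (60, 40)]  # far away

def skeleton_similarity(s1, s2, scale=10.0):
    """Map the mean per-keypoint distance into a (0, 1] similarity score."""
    dists = [math.dist(p, q) for p, q in zip(s1, s2)]
    return 1.0 / (1.0 + sum(dists) / (len(dists) * scale))

SIM_THRESHOLD = 0.7  # illustrative value

same_ab = skeleton_similarity(skeleton_a, skeleton_b) > SIM_THRESHOLD  # True
same_ac = skeleton_similarity(skeleton_a, skeleton_c) > SIM_THRESHOLD  # False
```

Frames whose skeletons compare as "same" would then be grouped, and one frame per group retained as described in step S240.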
In another example, differences between other skeleton features of the pedestrians may be calculated from the keypoint feature maps. For example, a head-neck connecting line from the head to the neck of a pedestrian can be calculated from the head feature map and the neck feature map of the pedestrian contained in each pedestrian frame, a neck-back connecting line from the neck to the center of the back can be calculated from the neck feature map and the waist feature map, and so on. These connecting lines between parts of the pedestrian can also be regarded as skeleton features. Subsequently, the differences between the skeleton features of the two pedestrians respectively contained in the two pedestrian frames can be calculated to determine the similarity between their skeletons. For example, the distance between the head-neck connecting line of pedestrian a contained in pedestrian frame A and the head-neck connecting line of pedestrian b contained in pedestrian frame B may be calculated, the distance between the corresponding neck-back connecting lines may be calculated, and so on, finally obtaining the distances of a plurality of skeleton features. Similar to the above example, the similarity between the skeleton of pedestrian a and the skeleton of pedestrian b may be calculated based on these distances, thereby determining whether they are the same pedestrian.
Whether the pedestrians contained in the two pedestrian frames are the same pedestrian can be simply and accurately judged through the similarity between the skeletons.
According to an embodiment of the present invention, the preliminary result of pedestrian detection may further include pedestrian frame confidences in one-to-one correspondence with the one or more pedestrian frames, and the skeleton information related to any one of the one or more pedestrian frames may include keypoint confidences in one-to-one correspondence with the specific number of key points of the pedestrian contained in that pedestrian frame. In this case, for two or more pedestrian frames containing the same pedestrian, selecting a unique pedestrian frame from the two or more pedestrian frames as one of the at least one pedestrian frame may include: selecting the unique pedestrian frame according to the pedestrian frame confidences in one-to-one correspondence with the two or more pedestrian frames and the keypoint confidences in one-to-one correspondence with the specific number of key points of the pedestrian contained in each of the two or more pedestrian frames.
Illustratively, in addition to outputting the keypoint feature map, the full convolution network may also output a keypoint confidence corresponding to each keypoint for representing the probability that the keypoint is a true keypoint.
Following the example above, if the skeleton of the pedestrian is represented by 15 key points, then each pedestrian frame has 1 pedestrian frame confidence and 15 keypoint confidences. Some arithmetic operations may be performed on the 1 pedestrian frame confidence and the 15 keypoint confidences to take these confidences into account in combination. For example, for each pedestrian frame, the 1 pedestrian frame confidence and the 15 keypoint confidences may simply be added to obtain a total confidence (i.e., the second total confidence described herein). As another example, they may be arithmetically averaged to obtain the second total confidence. As yet another example, they may be weighted-averaged to obtain the second total confidence, where the weight of each confidence may be set as needed. Of course, the above calculation manners of the second total confidence are only examples and not limitations; the invention may also adopt other suitable manners to calculate the second total confidence for measuring whether a pedestrian frame can be selected as the unique pedestrian frame corresponding to the pedestrian contained therein.
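The three combination schemes just listed (sum, arithmetic mean, weighted mean) can be sketched in one helper; the confidence values below are invented for illustration.

```python
def total_confidence(frame_conf, keypoint_confs, weights=None):
    """Combine one pedestrian frame confidence with its keypoint confidences.

    With weights=None an arithmetic mean is used; passing weights gives the
    weighted-average variant instead (a plain sum would drop the division).
    """
    values = [frame_conf] + list(keypoint_confs)
    if weights is None:
        return sum(values) / len(values)
    return sum(v * w for v, w in zip(values, weights)) / sum(weights)

# 1 frame confidence and 15 keypoint confidences, as in the 15-keypoint example.
frame_conf = 0.9
kp_confs = [0.8] * 15
second_total = total_confidence(frame_conf, kp_confs)  # (0.9 + 15*0.8) / 16
```

Whatever scheme is chosen, the same scheme must be applied to every candidate frame so that the resulting second total confidences are comparable.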
As described above, the pedestrian frame confidence may represent a probability that a pedestrian exists in the pedestrian frame, and the keypoint confidence may represent a probability that the corresponding keypoint is a true keypoint, and thus, when selecting a unique pedestrian frame corresponding to each pedestrian, the pedestrian frame confidence and the keypoint confidence may be considered in combination, so that the reliability of the selected unique pedestrian frame is high.
According to an embodiment of the present invention, for two or more pedestrian frames containing the same pedestrian, selecting a unique pedestrian frame from the two or more pedestrian frames according to the pedestrian frame confidences and the keypoint confidences includes: for each of the two or more pedestrian frames, summing or averaging the pedestrian frame confidence corresponding to that pedestrian frame and the keypoint confidences in one-to-one correspondence with the specific number of key points of the pedestrian contained in that pedestrian frame, so as to obtain a second total confidence; and selecting, from the two or more pedestrian frames, the pedestrian frame with the highest second total confidence as the unique pedestrian frame.
The calculation manner of the second total confidence has already been described above, and is not described herein again. The selection of the unique pedestrian frame corresponding to each pedestrian is described below. For example, it is assumed that 5 pedestrian frames in the preliminary result of pedestrian detection contain the same pedestrian, which is denoted by pedestrian X, i.e., 5 pedestrian frames correspond to pedestrian X. Further, assuming that the second total confidence degrees of the 5 pedestrian frames are 0.8, 0.65, 0.9, 0.75, and 0.7, respectively, the pedestrian frame having the second total confidence degree of 0.9 is selected as the unique pedestrian frame corresponding to the pedestrian X, and the remaining 4 pedestrian frames are discarded.
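Using the numbers from the pedestrian X example above, the selection of the unique pedestrian frame reduces to an argmax over the second total confidences (sketch only):

```python
# Second total confidences of the 5 candidate frames for pedestrian X,
# taken from the example in the text.
candidates = [0.8, 0.65, 0.9, 0.75, 0.7]

def select_unique_frame(total_confidences):
    """Return the index of the frame with the highest second total confidence."""
    return max(range(len(total_confidences)), key=total_confidences.__getitem__)

best = select_unique_frame(candidates)  # index 2, i.e. the 0.9 frame
```

The remaining four frames would then be discarded, exactly as described in the example.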
The only pedestrian frame corresponding to each pedestrian is selected through the confidence coefficient, and the most accurate and reasonable pedestrian frame corresponding to each pedestrian can be obtained.
FIG. 4 shows a schematic flow diagram of a pedestrian detection method 400 according to another embodiment of the invention. In fig. 4, steps S410 to S440 correspond to steps S210 to S240 of the pedestrian detection method 200 shown in fig. 2, respectively. The embodiments of steps S410 to S440 shown in fig. 4 can be understood by referring to the above description about fig. 2, and are not repeated. According to the present embodiment, the pedestrian detection method 400 further includes step S450.
In step S450, for each of at least one pedestrian frame, it is determined whether the pedestrian included in the pedestrian frame is a real pedestrian according to the skeleton information related to the pedestrian frame, and if not, the pedestrian frame is filtered.
Some false positives in pedestrian detection can be filtered using the results of the skeleton analysis. For example, if a certain pedestrian frame is falsely reported to contain a pedestrian, the skeleton of the "pedestrian" contained in it will be unreasonable, indicating that it is not a real pedestrian. Therefore, the skeleton information can be used to judge whether the skeleton of the pedestrian contained in a pedestrian frame is reasonable, and if not, the pedestrian frame is filtered out. Filtering pedestrian frames with skeleton information in this manner can improve the accuracy of the pedestrian detection result.
According to the embodiment of the present invention, the skeleton information related to any one of the one or more pedestrian frames includes keypoint confidences in one-to-one correspondence with the specific number of key points of the pedestrian contained in that pedestrian frame, and step S450 may include: for each of the at least one pedestrian frame, summing or averaging the keypoint confidences in one-to-one correspondence with the specific number of key points of the pedestrian contained in that pedestrian frame, so as to obtain a first total confidence; and for each of the at least one pedestrian frame, comparing the first total confidence with the corresponding confidence threshold; if the first total confidence is greater than the corresponding confidence threshold, determining that the pedestrian contained in the pedestrian frame is a real pedestrian, and otherwise determining that it is not a real pedestrian.
The first overall confidence is calculated in a similar manner as the second overall confidence, except that the keypoint confidence is primarily considered in calculating the first overall confidence. For example, assuming that the skeleton of a pedestrian is represented by 15 keypoints, some arithmetic operations, such as simple addition, arithmetic average, weighted average, or the like, may be performed on the confidence degrees of the 15 keypoints related to a certain pedestrian frame to obtain a first total confidence degree of the pedestrian frame. Those skilled in the art can refer to the above calculation method of the second total confidence level to understand the calculation method of the first total confidence level, which is not described herein again.
The confidence threshold may be any suitable value, which may be determined by experimental testing or theoretical calculation. It will be appreciated that the confidence threshold may differ depending on the manner in which the first total confidence is calculated. Thus, the "corresponding confidence threshold" described herein is the confidence threshold corresponding to the calculation manner of the first total confidence, and when the first total confidence is compared with a confidence threshold, the corresponding threshold is selected according to that calculation manner. A pedestrian frame whose first total confidence is greater than the confidence threshold is regarded as containing a real pedestrian; otherwise, the pedestrian it contains is regarded as not being a real pedestrian.
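The false-positive check of step S450 can be sketched as follows; the averaging scheme, the threshold, and the confidence values are all illustrative assumptions.

```python
def is_real_pedestrian(keypoint_confs, threshold):
    """First total confidence (here: mean keypoint confidence) vs threshold."""
    first_total = sum(keypoint_confs) / len(keypoint_confs)
    return first_total > threshold

CONF_THRESHOLD = 0.5  # illustrative; must match the averaging scheme used

plausible = [0.9, 0.85, 0.8] * 5    # 15 confident key points
implausible = [0.2, 0.1, 0.15] * 5  # unreasonable skeleton, low confidences

keep = is_real_pedestrian(plausible, CONF_THRESHOLD)    # True
drop = is_real_pedestrian(implausible, CONF_THRESHOLD)  # False
```

A frame for which the check fails would be filtered out, leaving only frames that contain real pedestrians.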
It is understood that the pedestrian frames obtained in step S440 are in one-to-one correspondence with pedestrians, and therefore, after filtering in step S450, the pedestrian frames that include pedestrians that are not real pedestrians are excluded, and the remainder is at least one pedestrian frame in one-to-one correspondence with at least one real pedestrian. Of course, it is also possible that all pedestrian frames are discarded after the filtering in step S450.
Illustratively, before the step S240 (or S440), the method 200 (or 400) may further include: and for each of one or more pedestrian frames, judging whether the pedestrian contained in the pedestrian frame is a real pedestrian according to the skeleton information related to the pedestrian frame, and if not, filtering the pedestrian frame.
That is, the step of determining whether the pedestrian contained in a pedestrian frame is a real pedestrian and filtering the pedestrian frame according to the determination result may also be performed before the pedestrian frames are screened, in a manner similar to step S450, which is not described again. Compared with filtering misjudged pedestrian frames before screening, filtering them after screening involves a smaller amount of data, which avoids meaningless computation and improves pedestrian detection efficiency.
As described above, the first convolutional neural network, the second convolutional neural network, and the full convolutional network may be obtained by training in advance using a large number of training images, and exemplary training steps are described below.
According to an embodiment of the invention, the method 200 (or 400) may further comprise: acquiring a training image, wherein a target pedestrian frame corresponding to each pedestrian in the training image and target positions of a specific number of key points of each pedestrian are marked; constructing a first loss function by taking target pedestrian frames which are in one-to-one correspondence with pedestrians in the training image as target values of one or more pedestrian frames in a pedestrian detection preliminary result obtained by a second convolutional neural network aiming at the training image, and constructing a second loss function by taking target positions of a specific number of key points of the pedestrians in the training image as target values of skeleton information obtained by a full convolutional network aiming at the training image; and training parameters in the first convolutional neural network, the second convolutional neural network and the full convolutional network by using the first loss function and the second loss function.
Illustratively, in the process of training the first convolutional neural network, the second convolutional neural network and the full convolution network, pre-training may be performed on the ImageNet data set first, and fine-tuning may then be performed on a pedestrian-specific data set. This can increase the convergence speed of the networks, and some low-level network information learned from general images is also effective for pedestrian images.
A loss function (i.e., the first loss function) may be added at the output of the second convolutional neural network to help it learn valuable information. In addition, a loss function (i.e., the second loss function) can be added at the output of the full convolution network to train the skeleton analysis model. Illustratively, the second loss function may be a cross-entropy loss function. Compared with the conventional Euclidean distance loss function, the penalty term of the cross-entropy loss function is more reasonably designed; for example, the penalty can decrease as the confidence of a pedestrian frame increases, so the network can be trained better.
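The patent names cross-entropy as the second loss for the skeleton branch; a minimal pixel-wise binary cross-entropy over a keypoint feature map might look like the NumPy sketch below. The maps and shapes are invented, and a real implementation would operate on network output tensors during backpropagation.

```python
import numpy as np

def heatmap_cross_entropy(pred, target, eps=1e-7):
    """Pixel-wise binary cross-entropy between a predicted keypoint feature
    map and its target map (1 at the labeled keypoint, 0 elsewhere)."""
    pred = np.clip(pred, eps, 1 - eps)  # avoid log(0)
    return float(-np.mean(target * np.log(pred)
                          + (1 - target) * np.log(1 - pred)))

target = np.zeros((4, 4)); target[1, 2] = 1.0   # labeled keypoint position
good = np.full((4, 4), 0.05); good[1, 2] = 0.95  # sharp, correct prediction
bad = np.full((4, 4), 0.5)                       # uncommitted prediction

loss_good = heatmap_cross_entropy(good, target)
loss_bad = heatmap_cross_entropy(bad, target)  # should exceed loss_good
```

The sharper, correct prediction yields the lower loss, which is the gradient signal that trains the skeleton analysis model.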
In the process of training the first convolutional neural network, the second convolutional neural network and the full convolutional network, a conventional back propagation algorithm may be used for training, and those skilled in the art can understand the implementation manner of the back propagation algorithm, which is not described herein in detail.
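Joint training with the two losses can be illustrated schematically. In the toy sketch below each "network" is reduced to a single scalar weight and the backward pass is written out by hand; real training would of course use a deep learning framework's autograd, and every name and value here is purely illustrative.

```python
# Purely schematic: the shared feature network, the detection head and the
# skeleton head are each reduced to one scalar weight, and the two losses
# are summed before the (manual) backward pass.
w_shared, w_det, w_skel = 1.0, 1.0, 1.0   # "network parameters"
x, t_det, t_skel = 0.5, 0.2, 0.8          # input and the two target values
lr = 0.1
for _ in range(300):
    feat = w_shared * x                   # shared features (first network)
    det, skel = w_det * feat, w_skel * feat
    loss = (det - t_det) ** 2 + (skel - t_skel) ** 2   # loss1 + loss2
    # manual backpropagation of the summed loss
    g_det, g_skel = 2 * (det - t_det), 2 * (skel - t_skel)
    g_shared = (g_det * w_det + g_skel * w_skel) * x
    w_det -= lr * g_det * feat
    w_skel -= lr * g_skel * feat
    w_shared -= lr * g_shared
```

The point of the sketch is only that the shared parameters receive gradient contributions from both loss functions, which is how the first convolutional neural network is trained jointly with the two heads.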
According to another aspect of the present invention, a pedestrian detection apparatus is provided. Fig. 5 shows a schematic block diagram of a pedestrian detection apparatus 500 according to one embodiment of the invention.
As shown in fig. 5, the pedestrian detection apparatus 500 according to the embodiment of the present invention includes a to-be-processed image acquisition module 510, a detection module 520, a skeleton analysis module 530, and a screening module 540. The various modules may perform the various steps/functions of the pedestrian detection method described above in connection with fig. 2-4, respectively. Only the main functions of the respective components of the pedestrian detection apparatus 500 will be described below, and details that have been described above will be omitted.
The to-be-processed image obtaining module 510 is configured to obtain an image to be processed. The pending image acquisition module 510 may be implemented by the processor 102 in the electronic device shown in fig. 1 executing program instructions stored in the storage 104.
The detection module 520 is configured to detect a pedestrian in the image to be processed to obtain a preliminary result of pedestrian detection, where the preliminary result of pedestrian detection includes one or more pedestrian frames, and each pedestrian frame is used to indicate an area in the image to be processed where a pedestrian may exist. The detection module 520 may be implemented by the processor 102 in the electronic device shown in fig. 1 executing program instructions stored in the storage 104.
The skeleton analysis module 530 is configured to perform skeleton analysis on the pedestrians included in each of the one or more pedestrian frames to obtain skeleton information respectively related to each of the one or more pedestrian frames. Skeletal analysis module 530 may be implemented by processor 102 in the electronic device shown in fig. 1 executing program instructions stored in storage 104.
The screening module 540 is configured to screen the one or more pedestrian frames according to the skeleton information respectively associated with each of the one or more pedestrian frames, so as to obtain at least one pedestrian frame corresponding to at least some pedestrians in the image to be processed one to one. The screening module 540 may be implemented by the processor 102 in the electronic device shown in fig. 1 executing program instructions stored in the storage 104.
According to an embodiment of the present invention, the pedestrian detection apparatus 500 further includes: and a real pedestrian judging module (not shown) for judging, for each of the at least one pedestrian frame, whether a pedestrian contained in the pedestrian frame is a real pedestrian according to the skeleton information related to the pedestrian frame, and if not, filtering the pedestrian frame.
According to an embodiment of the present invention, the skeleton information related to any one of the one or more pedestrian frames includes keypoint confidences in one-to-one correspondence with a specific number of keypoints of the pedestrian contained in the pedestrian frame, and the real pedestrian determination module includes: a first total confidence obtaining submodule for, for each of the at least one pedestrian frame, summing or averaging the keypoint confidences corresponding one-to-one to the specific number of keypoints of the pedestrian contained in the pedestrian frame to obtain a first total confidence; and a confidence comparison submodule for, for each of the at least one pedestrian frame, comparing the first total confidence with a corresponding confidence threshold, determining that the pedestrian contained in the pedestrian frame is a real pedestrian if the first total confidence is greater than the corresponding confidence threshold, and otherwise determining that the pedestrian contained in the pedestrian frame is not a real pedestrian.
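The first-total-confidence check can be sketched as follows. The function name, the aggregation choice and the threshold value are illustrative only; the patent leaves both the summation/averaging choice and the threshold open.

```python
import numpy as np

def is_real_pedestrian(keypoint_confidences, threshold, reduce="mean"):
    """Aggregate the per-keypoint confidences by summation or averaging
    into a first total confidence and compare it with a threshold."""
    conf = np.asarray(keypoint_confidences, dtype=float)
    total = conf.mean() if reduce == "mean" else conf.sum()
    return bool(total > threshold)

# e.g. 14 body keypoints: a frame whose keypoints are mostly confident
# is kept; a frame with uniformly weak keypoints is filtered out.
print(is_real_pedestrian([0.9, 0.8, 0.85] + [0.7] * 11, threshold=0.5))  # True
print(is_real_pedestrian([0.1] * 14, threshold=0.5))                     # False
```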
According to an embodiment of the present invention, the screening module 540 includes: the same pedestrian determination submodule is used for determining whether pedestrians contained in any two pedestrian frames in the pedestrian frames are the same pedestrian or not according to skeleton information respectively relevant to each of the two pedestrian frames under the condition that the preliminary pedestrian detection result comprises the multiple pedestrian frames; and a pedestrian frame selection sub-module for selecting, for two or more pedestrian frames containing the same pedestrian, a unique pedestrian frame from among the two or more pedestrian frames as one of the at least one pedestrian frame, in a case where the preliminary result of pedestrian detection includes a plurality of pedestrian frames.
According to an embodiment of the present invention, the skeleton information related to any one of the one or more pedestrian frames includes a key point feature map in one-to-one correspondence with a specific number of key points of pedestrians included in the pedestrian frame, and the same pedestrian determination submodule includes: a similarity calculation unit configured to calculate, for any two of the plurality of pedestrian frames, a similarity between skeletons of pedestrians included in the two pedestrian frames using a keypoint feature map that is one-to-one corresponding to the specific number of keypoints of the pedestrians included in each of the two pedestrian frames; and the similarity comparison unit is used for comparing the calculated similarity with a similarity threshold value for any two pedestrian frames in the plurality of pedestrian frames, if the calculated similarity is greater than the similarity threshold value, the pedestrians contained in the two pedestrian frames are determined to be the same pedestrian, otherwise, the pedestrians contained in the two pedestrian frames are determined not to be the same pedestrian.
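One way the similarity calculation unit's computation could look is sketched below. Cosine similarity averaged over the keypoint feature maps is an assumption on our part; the patent does not prescribe a specific similarity formula, and the threshold value is illustrative.

```python
import numpy as np

def skeleton_similarity(maps_a, maps_b):
    """Cosine similarity of corresponding keypoint feature maps,
    averaged over the keypoints (one possible similarity measure)."""
    sims = []
    for a, b in zip(maps_a, maps_b):
        a, b = a.ravel(), b.ravel()
        denom = np.linalg.norm(a) * np.linalg.norm(b)
        sims.append(float(a @ b) / denom if denom > 0 else 0.0)
    return float(np.mean(sims))

def contain_same_pedestrian(maps_a, maps_b, sim_threshold=0.9):
    """Two frames contain the same pedestrian if the skeleton
    similarity exceeds the threshold (threshold is illustrative)."""
    return skeleton_similarity(maps_a, maps_b) > sim_threshold

rng = np.random.default_rng(0)
maps = [rng.random((8, 8)) for _ in range(14)]  # e.g. 14 keypoint feature maps
print(contain_same_pedestrian(maps, maps))      # identical skeletons → True
```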
According to an embodiment of the present invention, the preliminary pedestrian detection result further includes pedestrian frame confidences in one-to-one correspondence with the one or more pedestrian frames, the skeleton information related to any one of the one or more pedestrian frames includes keypoint confidences in one-to-one correspondence with a specific number of keypoints of the pedestrian contained in the pedestrian frame, and the pedestrian frame selection submodule includes: a pedestrian frame selection unit for, for two or more pedestrian frames containing the same pedestrian, selecting the unique pedestrian frame from the two or more pedestrian frames according to the pedestrian frame confidences corresponding one-to-one to the two or more pedestrian frames and the keypoint confidences corresponding one-to-one to the specific number of keypoints of the pedestrian contained in each of the two or more pedestrian frames.
According to an embodiment of the present invention, the pedestrian frame selecting unit includes: a second total confidence obtaining subunit for, for each of two or more pedestrian frames containing the same pedestrian, summing or averaging the pedestrian frame confidence corresponding to the pedestrian frame and the keypoint confidences corresponding one-to-one to the specific number of keypoints of the pedestrian contained in the pedestrian frame, to obtain a second total confidence; and a pedestrian frame selection subunit for selecting, from the two or more pedestrian frames containing the same pedestrian, the pedestrian frame with the highest second total confidence as the unique pedestrian frame.
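The second-total-confidence selection can be sketched as follows. The tuple layout, function names and sample values are illustrative; the summation variant is shown, but averaging works the same way via the `reduce` parameter.

```python
def select_unique_frame(frames, reduce=sum):
    """frames: list of (frame_id, frame_confidence, keypoint_confidences)
    tuples for frames judged to contain the same pedestrian. The second
    total confidence combines the frame confidence with the keypoint
    confidences; the frame with the highest value is kept."""
    def second_total(frame):
        _, frame_conf, kp_confs = frame
        return reduce([frame_conf] + list(kp_confs))
    return max(frames, key=second_total)[0]

duplicates = [
    ("frame_a", 0.95, [0.9, 0.9, 0.8]),   # total 3.55 → kept
    ("frame_b", 0.80, [0.6, 0.5, 0.4]),   # total 2.30 → discarded
]
print(select_unique_frame(duplicates))  # frame_a
```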
According to an embodiment of the present invention, the detecting module 520 includes: the first input submodule is used for inputting the image to be processed into a first convolution neural network so as to extract the characteristics of the image to be processed; and the second input submodule is used for inputting the characteristics of the image to be processed into a second convolutional neural network so as to obtain the pedestrian detection preliminary result.
According to an embodiment of the present invention, the skeleton analysis module 530 includes: and the third input submodule is used for inputting the characteristics of the image to be processed and the preliminary pedestrian detection result into a full convolution network so as to obtain the skeleton information respectively related to each of the one or more pedestrian frames.
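The dataflow through the three networks can be illustrated with toy stand-ins. None of the functions below is a real model; they only show how the shared features feed both the second convolutional neural network and the full convolutional network, and all shapes and values are made up.

```python
import numpy as np

def first_cnn(image):
    """Stand-in for the first convolutional neural network: features."""
    return image.mean(axis=2)

def second_cnn(features):
    """Stand-in for the second network: candidate pedestrian frames."""
    h, w = features.shape
    return [{"box": (0, 0, w // 2, h), "conf": 0.9},
            {"box": (w // 4, 0, 3 * w // 4, h), "conf": 0.7}]

def skeleton_fcn(features, frames, num_keypoints=14):
    """Stand-in for the full convolutional network: per-frame skeleton
    information (here, dummy keypoint confidences)."""
    rng = np.random.default_rng(0)
    return [rng.random(num_keypoints) for _ in frames]

image = np.zeros((64, 32, 3))                 # H x W x RGB
features = first_cnn(image)                   # shared features
frames = second_cnn(features)                 # preliminary detection result
skeleton_info = skeleton_fcn(features, frames)  # one entry per frame
print(len(frames), len(skeleton_info))        # 2 2
```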
According to an embodiment of the present invention, the pedestrian detection apparatus 500 further includes: a training image acquisition module for acquiring a training image, wherein a target pedestrian frame corresponding to each pedestrian in the training image and target positions of a specific number of key points of each pedestrian are marked; a loss function constructing module for constructing a first loss function by using target pedestrian frames corresponding one-to-one to the pedestrians in the training image as target values of the one or more pedestrian frames in the preliminary pedestrian detection result obtained by the second convolutional neural network for the training image, and constructing a second loss function by using the target positions of the specific number of key points of the pedestrians in the training image as target values of the skeleton information obtained by the full convolutional network for the training image; and a training module for training parameters in the first convolutional neural network, the second convolutional neural network and the full convolutional network by using the first loss function and the second loss function.
Those of ordinary skill in the art will appreciate that the various illustrative elements and algorithm steps described in connection with the embodiments disclosed herein may be implemented as electronic hardware or combinations of computer software and electronic hardware. Whether such functionality is implemented as hardware or software depends upon the particular application and design constraints imposed on the implementation. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present invention.
FIG. 6 shows a schematic block diagram of a pedestrian detection system 600 according to one embodiment of the invention. The pedestrian detection system 600 includes an image capture device 610, a storage device 620, and a processor 630.
The image capturing device 610 is used for capturing an image to be processed. The image capture device 610 is optional and the pedestrian detection system 600 may not include the image capture device 610. In this case, an image for pedestrian detection may be acquired using another image acquisition device and the acquired image may be transmitted to the pedestrian detection system 600.
The storage device 620 stores program codes for implementing respective steps in the pedestrian detection method according to the embodiment of the invention.
The processor 630 is configured to run the program codes stored in the storage device 620 to execute the corresponding steps of the pedestrian detection method according to the embodiment of the invention, and to implement the corresponding modules of the pedestrian detection apparatus according to the embodiment of the invention.
In one embodiment, the program code, when executed by the processor 630, causes the pedestrian detection system 600 to perform the steps of: acquiring an image to be processed; detecting pedestrians in the image to be processed to obtain a pedestrian detection preliminary result, wherein the pedestrian detection preliminary result comprises one or more pedestrian frames, and each pedestrian frame is used for indicating an area in which a pedestrian is likely to exist in the image to be processed; performing skeleton analysis on the pedestrians contained in each of the one or more pedestrian frames to obtain skeleton information respectively related to each of the one or more pedestrian frames; and screening the one or more pedestrian frames according to the skeleton information respectively related to each of the one or more pedestrian frames to obtain at least one pedestrian frame corresponding to at least part of pedestrians in the image to be processed one by one.
In one embodiment, after the program code, when executed by the processor 630, causes the pedestrian detection system 600 to perform the step of screening the one or more pedestrian frames according to the skeleton information respectively associated with each of the one or more pedestrian frames, the program code further causes the pedestrian detection system 600 to perform: for each of the at least one pedestrian frame, determining whether the pedestrian contained in the pedestrian frame is a real pedestrian according to the skeleton information related to the pedestrian frame, and if not, filtering out the pedestrian frame.
In one embodiment, the skeleton information related to any one of the one or more pedestrian frames includes keypoint confidences in one-to-one correspondence with a specific number of keypoints of the pedestrian contained in the pedestrian frame, and the step, executed by the pedestrian detection system 600 when the program code is executed by the processor 630, of determining, for each of the at least one pedestrian frame, whether the pedestrian contained in the pedestrian frame is a real pedestrian according to the skeleton information related to the pedestrian frame includes: for each of the at least one pedestrian frame, summing or averaging the keypoint confidences corresponding one-to-one to the specific number of keypoints of the pedestrian contained in the pedestrian frame to obtain a first total confidence; and for each of the at least one pedestrian frame, comparing the first total confidence with a corresponding confidence threshold, determining that the pedestrian contained in the pedestrian frame is a real pedestrian if the first total confidence is greater than the corresponding confidence threshold, and otherwise determining that the pedestrian contained in the pedestrian frame is not a real pedestrian.
In one embodiment, the program code, when executed by the processor 630, causes the pedestrian detection system 600 to perform the step of filtering the one or more pedestrian frames according to the skeleton information respectively associated with each of the one or more pedestrian frames to obtain at least one pedestrian frame corresponding to at least a part of pedestrians in the image to be processed, including: determining, for any two pedestrian frames of the plurality of pedestrian frames, whether pedestrians contained in the two pedestrian frames are the same pedestrian according to skeleton information respectively associated with each of the two pedestrian frames, if the preliminary result of pedestrian detection includes the plurality of pedestrian frames; and for two or more pedestrian frames containing the same pedestrian, selecting a unique pedestrian frame from the two or more pedestrian frames as one of the at least one pedestrian frame.
In one embodiment, the skeleton information associated with any one of the one or more pedestrian frames includes a key point feature map corresponding to a specific number of key points of pedestrians included in the pedestrian frame, and the step of determining, by the processor 630, whether the pedestrians included in the two pedestrian frames are the same pedestrian according to the skeleton information associated with each of the two pedestrian frames, for any two of the pedestrian frames, executed by the pedestrian detection system 600, includes: for any two pedestrian frames in the plurality of pedestrian frames, calculating the similarity between the skeletons of the pedestrians contained in the two pedestrian frames by using the key point feature maps which are in one-to-one correspondence with the specific number of key points of the pedestrians contained in each of the two pedestrian frames; and comparing the calculated similarity with a similarity threshold value for any two pedestrian frames in the plurality of pedestrian frames, if the calculated similarity is greater than the similarity threshold value, determining that the pedestrians contained in the two pedestrian frames are the same pedestrian, otherwise determining that the pedestrians contained in the two pedestrian frames are not the same pedestrian.
In one embodiment, the preliminary pedestrian detection result further includes pedestrian frame confidences corresponding one-to-one to the one or more pedestrian frames, the skeleton information associated with any one of the one or more pedestrian frames includes keypoint confidences corresponding one-to-one to a specific number of keypoints of the pedestrian contained in the pedestrian frame, and the step, executed by the pedestrian detection system 600 when the program code is executed by the processor 630, of selecting a unique pedestrian frame from the two or more pedestrian frames as one of the at least one pedestrian frame includes: for two or more pedestrian frames containing the same pedestrian, selecting the unique pedestrian frame from the two or more pedestrian frames according to the pedestrian frame confidences corresponding one-to-one to the two or more pedestrian frames and the keypoint confidences corresponding one-to-one to the specific number of keypoints of the pedestrian contained in each of the two or more pedestrian frames.
In one embodiment, the step, executed by the pedestrian detection system 600 when the program code is executed by the processor 630, of selecting the unique pedestrian frame from the two or more pedestrian frames containing the same pedestrian according to the pedestrian frame confidences corresponding one-to-one to the two or more pedestrian frames and the keypoint confidences corresponding one-to-one to the specific number of keypoints of the pedestrian contained in each of the two or more pedestrian frames includes: for each of the two or more pedestrian frames containing the same pedestrian, summing or averaging the pedestrian frame confidence corresponding to the pedestrian frame and the keypoint confidences corresponding one-to-one to the specific number of keypoints of the pedestrian contained in the pedestrian frame to obtain a second total confidence; and for the two or more pedestrian frames containing the same pedestrian, selecting the pedestrian frame with the highest second total confidence as the unique pedestrian frame.
In one embodiment, the program code, when executed by the processor 630, causes the pedestrian detection system 600 to perform the step of detecting a pedestrian in the image to be processed to obtain a pedestrian detection preliminary result comprising: inputting the image to be processed into a first convolution neural network so as to extract the characteristics of the image to be processed; and inputting the characteristics of the image to be processed into a second convolutional neural network to obtain the preliminary result of pedestrian detection.
In one embodiment, the step of performing skeleton analysis on the pedestrians contained in each of the one or more pedestrian frames to obtain skeleton information respectively related to each of the one or more pedestrian frames, which is executed by the pedestrian detection system 600 by the processor 630, includes: inputting the features of the image to be processed and the preliminary result of pedestrian detection into a full convolution network to obtain the skeleton information respectively associated with each of the one or more pedestrian frames.
In one embodiment, the program code, when executed by the processor 630, further causes the pedestrian detection system 600 to perform: acquiring a training image, wherein a target pedestrian frame corresponding to each pedestrian in the training image and target positions of a specific number of key points of each pedestrian are marked; constructing a first loss function by taking target pedestrian frames corresponding one-to-one to the pedestrians in the training image as target values of the one or more pedestrian frames in the preliminary pedestrian detection result obtained by the second convolutional neural network for the training image, and constructing a second loss function by taking the target positions of the specific number of key points of the pedestrians in the training image as target values of the skeleton information obtained by the full convolutional network for the training image; and training parameters in the first convolutional neural network, the second convolutional neural network and the full convolutional network using the first loss function and the second loss function.
Furthermore, according to an embodiment of the present invention, there is also provided a storage medium on which program instructions are stored, which when executed by a computer or a processor are used for executing the respective steps of the pedestrian detection method according to an embodiment of the present invention and for implementing the respective modules in the pedestrian detection apparatus according to an embodiment of the present invention. The storage medium may include, for example, a memory card of a smart phone, a storage component of a tablet computer, a hard disk of a personal computer, a Read Only Memory (ROM), an Erasable Programmable Read Only Memory (EPROM), a portable compact disc read only memory (CD-ROM), a USB memory, or any combination of the above storage media.
In one embodiment, the computer program instructions, when executed by a computer or processor, may cause the computer or processor to implement the various functional modules of the pedestrian detection apparatus according to the embodiment of the invention, and/or may perform the pedestrian detection method according to the embodiment of the invention.
In one embodiment, the computer program instructions, when executed by a computer, cause the computer to perform the steps of: acquiring an image to be processed; detecting pedestrians in the image to be processed to obtain a pedestrian detection preliminary result, wherein the pedestrian detection preliminary result comprises one or more pedestrian frames, and each pedestrian frame is used for indicating an area in which a pedestrian is likely to exist in the image to be processed; performing skeleton analysis on the pedestrians contained in each of the one or more pedestrian frames to obtain skeleton information respectively related to each of the one or more pedestrian frames; and screening the one or more pedestrian frames according to the skeleton information respectively related to each of the one or more pedestrian frames to obtain at least one pedestrian frame corresponding to at least part of pedestrians in the image to be processed one by one.
In one embodiment, after the computer program instructions, when executed by a computer, cause the computer to perform the step of screening the one or more pedestrian frames according to the skeleton information respectively associated with each of the one or more pedestrian frames, the computer program instructions further cause the computer to perform: for each of the at least one pedestrian frame, determining whether the pedestrian contained in the pedestrian frame is a real pedestrian according to the skeleton information related to the pedestrian frame, and if not, filtering out the pedestrian frame.
In one embodiment, the skeleton information associated with any one of the one or more pedestrian frames includes keypoint confidences in one-to-one correspondence with a specific number of keypoints of the pedestrian contained in the pedestrian frame, and the step, executed by the computer when the computer program instructions are executed, of determining, for each of the at least one pedestrian frame, whether the pedestrian contained in the pedestrian frame is a real pedestrian according to the skeleton information associated with the pedestrian frame includes: for each of the at least one pedestrian frame, summing or averaging the keypoint confidences corresponding one-to-one to the specific number of keypoints of the pedestrian contained in the pedestrian frame to obtain a first total confidence; and for each of the at least one pedestrian frame, comparing the first total confidence with a corresponding confidence threshold, determining that the pedestrian contained in the pedestrian frame is a real pedestrian if the first total confidence is greater than the corresponding confidence threshold, and otherwise determining that the pedestrian contained in the pedestrian frame is not a real pedestrian.
In one embodiment, the computer program instructions, when executed by a computer, cause the computer to perform the step of filtering the one or more pedestrian frames according to the skeleton information respectively associated with each of the one or more pedestrian frames to obtain at least one pedestrian frame corresponding to at least part of pedestrians in the image to be processed one by one, including: determining, for any two pedestrian frames of the plurality of pedestrian frames, whether pedestrians contained in the two pedestrian frames are the same pedestrian according to skeleton information respectively associated with each of the two pedestrian frames, if the preliminary result of pedestrian detection includes the plurality of pedestrian frames; and for two or more pedestrian frames containing the same pedestrian, selecting a unique pedestrian frame from the two or more pedestrian frames as one of the at least one pedestrian frame.
In one embodiment, the skeleton information associated with any one of the one or more pedestrian frames includes a key point feature map in one-to-one correspondence with a specific number of key points of pedestrians contained in the pedestrian frame, and the step of determining, for any two of the plurality of pedestrian frames, whether the pedestrians contained in the two pedestrian frames are the same pedestrian according to the skeleton information associated with each of the two pedestrian frames, which is executed by the computer, includes: for any two pedestrian frames in the plurality of pedestrian frames, calculating the similarity between the skeletons of the pedestrians contained in the two pedestrian frames by using the key point feature maps which are in one-to-one correspondence with the specific number of key points of the pedestrians contained in each of the two pedestrian frames; and comparing the calculated similarity with a similarity threshold value for any two pedestrian frames in the plurality of pedestrian frames, if the calculated similarity is greater than the similarity threshold value, determining that the pedestrians contained in the two pedestrian frames are the same pedestrian, otherwise determining that the pedestrians contained in the two pedestrian frames are not the same pedestrian.
In one embodiment, the preliminary pedestrian detection result further includes pedestrian frame confidences corresponding one-to-one to the one or more pedestrian frames, the skeleton information associated with any one of the one or more pedestrian frames includes keypoint confidences corresponding one-to-one to a specific number of keypoints of the pedestrian contained in the pedestrian frame, and the step, executed by the computer when the computer program instructions are executed, of selecting a unique pedestrian frame from the two or more pedestrian frames containing the same pedestrian as one of the at least one pedestrian frame includes: for the two or more pedestrian frames containing the same pedestrian, selecting the unique pedestrian frame from the two or more pedestrian frames according to the pedestrian frame confidences corresponding one-to-one to the two or more pedestrian frames and the keypoint confidences corresponding one-to-one to the specific number of keypoints of the pedestrian contained in each of the two or more pedestrian frames.
In one embodiment, the step, executed by the computer when the computer program instructions are executed, of selecting the unique pedestrian frame from the two or more pedestrian frames containing the same pedestrian according to the pedestrian frame confidences corresponding one-to-one to the two or more pedestrian frames and the keypoint confidences corresponding one-to-one to the specific number of keypoints of the pedestrian contained in each of the two or more pedestrian frames includes: for each of the two or more pedestrian frames containing the same pedestrian, summing or averaging the pedestrian frame confidence corresponding to the pedestrian frame and the keypoint confidences corresponding one-to-one to the specific number of keypoints of the pedestrian contained in the pedestrian frame to obtain a second total confidence; and for the two or more pedestrian frames containing the same pedestrian, selecting the pedestrian frame with the highest second total confidence as the unique pedestrian frame.
In one embodiment, the computer program instructions, when executed by a computer, cause the computer to perform the step of detecting a pedestrian in the image to be processed to obtain a pedestrian detection preliminary result comprising: inputting the image to be processed into a first convolution neural network so as to extract the characteristics of the image to be processed; and inputting the characteristics of the image to be processed into a second convolutional neural network to obtain the preliminary result of pedestrian detection.
In one embodiment, the computer program instructions, when executed by a computer, cause the computer to perform the step of performing a skeleton analysis on the pedestrians contained in each of the one or more pedestrian frames to obtain skeleton information respectively associated with each of the one or more pedestrian frames, including: inputting the features of the image to be processed and the preliminary result of pedestrian detection into a full convolution network to obtain the skeleton information respectively associated with each of the one or more pedestrian frames.
In one embodiment, the computer program instructions, when executed by a computer, further cause the computer to perform: acquiring a training image, wherein a target pedestrian frame corresponding to each pedestrian in the training image and target positions of a specific number of key points of each pedestrian are marked; constructing a first loss function by taking target pedestrian frames corresponding one-to-one to the pedestrians in the training image as target values of the one or more pedestrian frames in the preliminary pedestrian detection result obtained by the second convolutional neural network for the training image, and constructing a second loss function by taking the target positions of the specific number of key points of the pedestrians in the training image as target values of the skeleton information obtained by the full convolutional network for the training image; and training parameters in the first convolutional neural network, the second convolutional neural network and the full convolutional network using the first loss function and the second loss function.
The modules in the pedestrian detection system according to the embodiment of the invention may be implemented by the processor of the electronic device for pedestrian detection according to the embodiment of the invention running computer program instructions stored in the memory, or may be implemented when computer instructions stored in the computer-readable storage medium of the computer program product according to the embodiment of the invention are run by a computer.
According to the pedestrian detection method and device provided by the embodiments of the invention, the pedestrian frames are screened using the skeleton information of the pedestrians contained in the pedestrian frames, so that redundant pedestrian frames of the same pedestrian can be filtered out while pedestrian frames of different pedestrians are retained. The pedestrian detection method and device can thereby avoid the problem, introduced by non-maximum suppression (NMS), that the pedestrian frame of one pedestrian is used to filter out the pedestrian frames of other pedestrians, so that the accuracy of the pedestrian detection result can be improved. This is of great value for pedestrian monitoring, particularly in pedestrian-dense scenes.
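The NMS failure mode mentioned above can be made concrete with a small intersection-over-union (IoU) computation; the box coordinates and the 0.5 suppression threshold below are hypothetical values chosen for illustration:

```python
def iou(a, b):
    """Intersection over union of two (x1, y1, x2, y2) boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area = lambda r: (r[2] - r[0]) * (r[3] - r[1])
    union = area(a) + area(b) - inter
    return inter / union if union else 0.0


# Two heavily overlapping frames that nonetheless contain two different
# pedestrians walking side by side: IoU-based NMS with a 0.5 threshold
# would discard one of them, whereas skeleton-based screening can keep
# both because their skeletons differ.
box_a, box_b = (10, 10, 60, 110), (20, 10, 70, 110)
print(iou(box_a, box_b))  # about 0.667, above a typical NMS threshold
```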
Although the illustrative embodiments have been described herein with reference to the accompanying drawings, it is to be understood that the foregoing illustrative embodiments are merely exemplary and are not intended to limit the scope of the invention thereto. Various changes and modifications may be effected therein by one of ordinary skill in the pertinent art without departing from the scope or spirit of the present invention. All such changes and modifications are intended to be included within the scope of the present invention as set forth in the appended claims.
Those of ordinary skill in the art will appreciate that the various illustrative elements and algorithm steps described in connection with the embodiments disclosed herein may be implemented as electronic hardware or combinations of computer software and electronic hardware. Whether such functionality is implemented as hardware or software depends upon the particular application and design constraints imposed on the implementation. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present invention.
In the several embodiments provided in the present application, it should be understood that the disclosed apparatus and method may be implemented in other ways. For example, the above-described device embodiments are merely illustrative, and for example, the division of the units is only one logical functional division, and other divisions may be realized in practice, for example, a plurality of units or components may be combined or integrated into another device, or some features may be omitted, or not executed.
In the description provided herein, numerous specific details are set forth. It is understood, however, that embodiments of the invention may be practiced without these specific details. In some instances, well-known methods, structures and techniques have not been shown in detail in order not to obscure an understanding of this description.
Similarly, it should be appreciated that in the description of exemplary embodiments of the invention, various features of the invention are sometimes grouped together in a single embodiment, figure, or description thereof for the purpose of streamlining the invention and aiding in the understanding of one or more of the various inventive aspects. However, the method of the present invention should not be construed to reflect the intent: that the invention as claimed requires more features than are expressly recited in each claim. Rather, as the following claims reflect, inventive aspects lie in less than all features of a single disclosed embodiment. Thus, the claims following the detailed description are hereby expressly incorporated into this detailed description, with each claim standing on its own as a separate embodiment of this invention.
It will be understood by those skilled in the art that all of the features disclosed in this specification (including any accompanying claims, abstract and drawings), and all of the processes or elements of any method or apparatus so disclosed, may be combined in any combination, except combinations where such features are mutually exclusive. Each feature disclosed in this specification (including any accompanying claims, abstract and drawings) may be replaced by alternative features serving the same, equivalent or similar purpose, unless expressly stated otherwise.
Furthermore, those skilled in the art will appreciate that while some embodiments described herein include some features included in other embodiments, rather than other features, combinations of features of different embodiments are meant to be within the scope of the invention and form different embodiments. For example, in the claims, any of the claimed embodiments may be used in any combination.
The various component embodiments of the invention may be implemented in hardware, in software modules running on one or more processors, or in a combination thereof. It will be appreciated by those skilled in the art that a microprocessor or digital signal processor (DSP) may be used in practice to implement some or all of the functionality of some of the modules in a pedestrian detection apparatus according to embodiments of the present invention. The present invention may also be embodied as device programs (e.g., computer programs and computer program products) for performing part or all of the methods described herein. Such programs implementing the present invention may be stored on computer-readable media or may be in the form of one or more signals. Such a signal may be downloaded from an internet website, provided on a carrier signal, or provided in any other form.
It should be noted that the above-mentioned embodiments illustrate rather than limit the invention, and that those skilled in the art will be able to design alternative embodiments without departing from the scope of the appended claims. In the claims, any reference signs placed between parentheses shall not be construed as limiting the claim. The word "comprising" does not exclude the presence of elements or steps not listed in a claim. The word "a" or "an" preceding an element does not exclude the presence of a plurality of such elements. The invention may be implemented by means of hardware comprising several distinct elements, and by means of a suitably programmed computer. In the unit claims enumerating several means, several of these means may be embodied by one and the same item of hardware. The use of the words first, second, third, etc. does not indicate any ordering; these words may be interpreted as names.
The above description is only of specific embodiments of the present invention, and the protection scope of the present invention is not limited thereto. Any person skilled in the art can easily conceive of changes or substitutions within the technical scope disclosed by the present invention, and such changes or substitutions shall be covered within the protection scope of the present invention. The protection scope of the present invention shall be subject to the protection scope of the claims.

Claims (20)

1. A pedestrian detection method, comprising:
acquiring an image to be processed;
detecting pedestrians in the image to be processed by a sliding window method to obtain a pedestrian detection preliminary result, wherein the pedestrian detection preliminary result comprises one or more pedestrian frames, and each pedestrian frame is used for indicating an area in which a pedestrian is likely to exist in the image to be processed;
performing skeleton analysis on the pedestrians contained in each of the one or more pedestrian frames to obtain skeleton information respectively related to each of the one or more pedestrian frames; and
screening the one or more pedestrian frames according to the skeleton information respectively related to each of the one or more pedestrian frames to obtain at least one pedestrian frame corresponding one-to-one to at least part of pedestrians in the image to be processed.
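The four steps of claim 1 can be summarized by the following non-limiting sketch, where `detector`, `skeleton_analyzer`, and `screen` are hypothetical callables standing in for the detection, skeleton-analysis, and screening stages (their names and signatures are assumptions, not part of the claim):

```python
def detect_pedestrians(image, detector, skeleton_analyzer, screen):
    """Claim 1 as a pipeline: detect candidate pedestrian frames, analyze
    the skeleton of the pedestrian in each frame, then screen the frames
    using that skeleton information."""
    frames = detector(image)                                  # preliminary result
    skeletons = [skeleton_analyzer(image, frame) for frame in frames]
    return screen(frames, skeletons)                          # one frame per pedestrian
```

For example, with stub callables the pipeline simply threads the candidate frames through the screening stage.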
2. The pedestrian detection method of claim 1, wherein, after the screening the one or more pedestrian frames according to the skeletal information respectively associated with each of the one or more pedestrian frames, the pedestrian detection method further comprises:
for each of the at least one pedestrian frame, judging whether the pedestrian contained in the pedestrian frame is a real pedestrian according to the skeleton information related to the pedestrian frame, and if not, filtering out the pedestrian frame.
3. The pedestrian detection method according to claim 2, wherein the skeleton information relating to any one of the one or more pedestrian frames includes a keypoint confidence degree in one-to-one correspondence with a specific number of keypoints of pedestrians included in the pedestrian frame,
for each of the at least one pedestrian frame, judging whether the pedestrian contained in the pedestrian frame is a real pedestrian according to the skeleton information related to the pedestrian frame includes:
for each of the at least one pedestrian frame,
summing or averaging the keypoint confidence degrees in one-to-one correspondence with the specific number of keypoints of the pedestrian contained in the pedestrian frame, to obtain a first total confidence degree; and
comparing the first total confidence with a corresponding confidence threshold, and if the first total confidence is greater than the corresponding confidence threshold, determining that the pedestrian contained in the pedestrian frame is a real pedestrian, otherwise determining that the pedestrian contained in the pedestrian frame is not a real pedestrian.
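A minimal sketch of the criterion of claim 3 (the function name, the choice between summation and averaging, and the threshold value are illustrative assumptions; the claim allows either combining form):

```python
def is_real_pedestrian(keypoint_confidences, confidence_threshold, average=True):
    """Combine the key-point confidences of one pedestrian frame into a
    first total confidence, then compare it with the corresponding
    confidence threshold."""
    total = sum(keypoint_confidences)
    if average:
        total /= len(keypoint_confidences)
    return total > confidence_threshold
```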
4. The pedestrian detection method according to claim 1, wherein the screening the one or more pedestrian frames according to the skeleton information respectively associated with each of the one or more pedestrian frames to obtain at least one pedestrian frame in one-to-one correspondence with at least some pedestrians in the image to be processed comprises:
in the case where the preliminary result of pedestrian detection includes a plurality of pedestrian frames,
for any two pedestrian frames in the plurality of pedestrian frames, determining whether pedestrians contained in the two pedestrian frames are the same pedestrian according to skeleton information respectively related to each of the two pedestrian frames; and
for two or more pedestrian frames containing the same pedestrian, selecting a unique pedestrian frame from the two or more pedestrian frames as one of the at least one pedestrian frame.
5. The pedestrian detection method according to claim 4, wherein the skeleton information relating to any one of the one or more pedestrian frames includes a key point feature map in one-to-one correspondence with a specific number of key points of pedestrians included in the pedestrian frame,
the determining, for any two of the plurality of pedestrian frames, whether the pedestrians contained in the two pedestrian frames are the same pedestrian according to the skeleton information respectively associated with each of the two pedestrian frames includes:
for any two pedestrian frames of the plurality of pedestrian frames,
calculating the similarity between skeletons of the pedestrians contained in the two pedestrian frames by using the key point feature maps which are in one-to-one correspondence with the specific number of key points of the pedestrians contained in each of the two pedestrian frames; and
comparing the calculated similarity with a similarity threshold, and if the calculated similarity is greater than the similarity threshold, determining that the pedestrians contained in the two pedestrian frames are the same pedestrian, otherwise determining that the pedestrians contained in the two pedestrian frames are not the same pedestrian.
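A hedged sketch of the comparison of claim 5: the claim compares key-point feature maps, whereas this simplified illustration reduces each skeleton to a list of (x, y) key-point positions and uses a distance-based similarity, which is an assumed stand-in metric rather than the claimed feature-map computation:

```python
import math


def skeleton_similarity(keypoints_a, keypoints_b):
    """Similarity between two skeletons, each given as a list of (x, y)
    key-point positions; decays with the mean key-point distance."""
    dists = [math.dist(a, b) for a, b in zip(keypoints_a, keypoints_b)]
    return 1.0 / (1.0 + sum(dists) / len(dists))


def same_pedestrian(keypoints_a, keypoints_b, similarity_threshold=0.5):
    """Claim 5's decision rule: same pedestrian if and only if the
    similarity exceeds the similarity threshold."""
    return skeleton_similarity(keypoints_a, keypoints_b) > similarity_threshold
```

Two frames covering the same pedestrian yield nearly coincident skeletons (similarity near 1), while frames of different pedestrians yield well-separated skeletons.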
6. The pedestrian detection method according to claim 4 or 5, wherein the pedestrian detection preliminary result further includes pedestrian frame confidence degrees that correspond one-to-one to the one or more pedestrian frames, the skeleton information relating to any one of the one or more pedestrian frames includes keypoint confidence degrees that correspond one-to-one to a specific number of keypoints of pedestrians included in the pedestrian frame,
the selecting, for two or more pedestrian frames containing the same pedestrian, a unique pedestrian frame from the two or more pedestrian frames as one of the at least one pedestrian frame includes:
for two or more pedestrian frames containing the same pedestrian, selecting the unique pedestrian frame from the two or more pedestrian frames according to a pedestrian frame confidence corresponding to the two or more pedestrian frames one to one and a keypoint confidence corresponding to the specific number of keypoints of the pedestrian contained in each of the two or more pedestrian frames.
7. The pedestrian detection method according to claim 6, wherein the selecting, for two or more pedestrian frames containing the same pedestrian, the unique pedestrian frame from the two or more pedestrian frames according to a pedestrian frame confidence corresponding to the two or more pedestrian frames one to one and a keypoint confidence corresponding to the certain number of keypoints for the pedestrian contained in each of the two or more pedestrian frames, comprises:
for two or more pedestrian frames containing the same pedestrian,
for each of the two or more pedestrian frames, summing or averaging the pedestrian frame confidence corresponding to the pedestrian frame and the keypoint confidences in one-to-one correspondence with the specific number of keypoints of the pedestrian contained in the pedestrian frame, to obtain a second total confidence; and
selecting the pedestrian frame with the highest second total confidence as the unique pedestrian frame.
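The selection of claim 7 can be sketched as follows (the tuple representation of a candidate frame and the summation variant of the second total confidence are illustrative assumptions; the claim also permits averaging):

```python
def select_unique_frame(candidate_frames):
    """For frames judged to contain the same pedestrian, each entry is
    (frame_confidence, keypoint_confidences); the frame with the highest
    second total confidence is kept as the unique pedestrian frame."""
    def second_total_confidence(entry):
        frame_conf, keypoint_confs = entry
        return frame_conf + sum(keypoint_confs)  # summation variant
    return max(candidate_frames, key=second_total_confidence)
```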
8. The pedestrian detection method according to claim 1, wherein the detecting a pedestrian in the image to be processed to obtain a pedestrian detection preliminary result comprises:
inputting the image to be processed into a first convolution neural network so as to extract the characteristics of the image to be processed; and
inputting the features of the image to be processed into a second convolutional neural network to obtain the preliminary result of pedestrian detection, wherein the second convolutional neural network is used for realizing the sliding window method.
9. The pedestrian detection method according to claim 8, wherein the performing skeleton analysis on the pedestrians included in each of the one or more pedestrian frames to obtain skeleton information respectively associated with each of the one or more pedestrian frames comprises:
inputting the features of the image to be processed and the preliminary result of pedestrian detection into a full convolution network to obtain the skeleton information respectively associated with each of the one or more pedestrian frames.
10. The pedestrian detection method according to claim 9, wherein the pedestrian detection method further comprises:
acquiring a training image, wherein a target pedestrian frame corresponding to each pedestrian in the training image and target positions of a specific number of key points of each pedestrian are marked;
constructing a first loss function by taking target pedestrian frames corresponding to pedestrians in the training image in a one-to-one mode as target values of one or more pedestrian frames in a pedestrian detection preliminary result obtained by the second convolutional neural network aiming at the training image, and constructing a second loss function by taking target positions of the pedestrians in the training image and the key points of the specific number as target values of skeleton information obtained by the full convolutional network aiming at the training image; and
training parameters in the first convolutional neural network, the second convolutional neural network, and the full convolutional network using the first loss function and the second loss function.
11. A pedestrian detection apparatus comprising:
the image to be processed acquisition module is used for acquiring an image to be processed;
the detection module is used for detecting pedestrians in the image to be processed through a sliding window method to obtain a pedestrian detection preliminary result, wherein the pedestrian detection preliminary result comprises one or more pedestrian frames, and each pedestrian frame is used for indicating an area where pedestrians are possibly present in the image to be processed;
the skeleton analysis module is used for carrying out skeleton analysis on the pedestrians contained in each of the one or more pedestrian frames so as to obtain skeleton information respectively related to each of the one or more pedestrian frames; and
the screening module is used for screening the one or more pedestrian frames according to the skeleton information respectively related to each of the one or more pedestrian frames so as to obtain at least one pedestrian frame in one-to-one correspondence with at least part of pedestrians in the image to be processed.
12. The pedestrian detection device according to claim 11, wherein the pedestrian detection device further comprises:
the real pedestrian judgment module is used for judging, for each of the at least one pedestrian frame, whether the pedestrian contained in the pedestrian frame is a real pedestrian according to the skeleton information related to the pedestrian frame, and filtering out the pedestrian frame if the pedestrian contained in the pedestrian frame is not a real pedestrian.
13. The pedestrian detection apparatus of claim 12, wherein the skeletal information associated with any of the one or more pedestrian frames includes keypoint confidences that are one-to-one corresponding to a particular number of keypoints for the pedestrian contained by that pedestrian frame,
the real pedestrian judgment module includes:
a first total confidence obtaining submodule, configured to sum or average, for each of the at least one pedestrian frame, the confidence levels of the keypoints that correspond to the pedestrians included in the pedestrian frame in the specific number one to one, so as to obtain a first total confidence level; and
a confidence comparison submodule, configured to compare, for each of the at least one pedestrian frame, the first total confidence with a corresponding confidence threshold, and if the first total confidence is greater than the corresponding confidence threshold, determine that the pedestrian contained in the pedestrian frame is a real pedestrian, otherwise determine that the pedestrian contained in the pedestrian frame is not a real pedestrian.
14. The pedestrian detection apparatus of claim 11, wherein the screening module comprises:
the same pedestrian determination submodule is used for determining whether pedestrians contained in any two pedestrian frames in the pedestrian frames are the same pedestrian or not according to skeleton information respectively relevant to each of the two pedestrian frames under the condition that the preliminary pedestrian detection result comprises the multiple pedestrian frames; and
a pedestrian frame selection sub-module configured to, in a case where the preliminary result of pedestrian detection includes a plurality of pedestrian frames, select, as one of the at least one pedestrian frame, a unique pedestrian frame from among two or more pedestrian frames for two or more pedestrian frames containing the same pedestrian.
15. The pedestrian detection device according to claim 14, wherein the skeleton information relating to any one of the one or more pedestrian frames includes a keypoint feature map in one-to-one correspondence with a specific number of keypoints of pedestrians included in the pedestrian frame,
the same pedestrian determination submodule includes:
a similarity calculation unit configured to calculate, for any two of the plurality of pedestrian frames, a similarity between skeletons of pedestrians included in the two pedestrian frames using a keypoint feature map that is one-to-one corresponding to the specific number of keypoints of the pedestrians included in each of the two pedestrian frames; and
a similarity comparison unit, configured to compare, for any two pedestrian frames of the plurality of pedestrian frames, the calculated similarity with a similarity threshold, and if the calculated similarity is greater than the similarity threshold, determine that the pedestrians contained in the two pedestrian frames are the same pedestrian, otherwise determine that the pedestrians contained in the two pedestrian frames are not the same pedestrian.
16. The pedestrian detection device according to claim 14 or 15, wherein the preliminary pedestrian detection result further includes pedestrian frame confidence degrees that correspond one-to-one to the one or more pedestrian frames, the skeleton information relating to any one of the one or more pedestrian frames includes keypoint confidence degrees that correspond one-to-one to a specific number of keypoints of pedestrians included in that pedestrian frame,
the pedestrian frame selection submodule comprises:
a pedestrian frame selection unit configured to, for two or more pedestrian frames including the same pedestrian, select the unique pedestrian frame from the two or more pedestrian frames according to a pedestrian frame confidence corresponding to the two or more pedestrian frames one to one and a keypoint confidence corresponding to the specific number of keypoints of the pedestrian included in each of the two or more pedestrian frames one to one.
17. The pedestrian detection apparatus according to claim 16, wherein the pedestrian frame selection unit includes:
a second total confidence obtaining subunit, configured to, for each of two or more pedestrian frames that include the same pedestrian, sum or average a pedestrian frame confidence corresponding to the pedestrian frame and a keypoint confidence corresponding to the pedestrian included in the pedestrian frame and corresponding to the specific number of keypoints one by one, to obtain a second total confidence; and
a pedestrian frame selection subunit, configured to select, for two or more pedestrian frames containing the same pedestrian, the pedestrian frame with the highest second total confidence as the unique pedestrian frame.
18. The pedestrian detection apparatus of claim 11, wherein the detection module comprises:
the first input submodule is used for inputting the image to be processed into a first convolution neural network so as to extract the characteristics of the image to be processed; and
a second input submodule, configured to input the features of the image to be processed into a second convolutional neural network to obtain the pedestrian detection preliminary result, wherein the second convolutional neural network is used for realizing the sliding window method.
19. The pedestrian detection apparatus of claim 18, wherein the skeletal analysis module comprises:
a third input submodule, configured to input the features of the image to be processed and the pedestrian detection preliminary result into a full convolution network to obtain the skeleton information respectively related to each of the one or more pedestrian frames.
20. The pedestrian detection device according to claim 19, wherein the pedestrian detection device further comprises:
the training image acquisition module is used for acquiring a training image, wherein a target pedestrian frame corresponding to each pedestrian in the training image and target positions of a specific number of key points of each pedestrian are marked;
a loss function constructing module, configured to construct a first loss function by using a target pedestrian frame corresponding to a pedestrian in the training image in a one-to-one manner as a target value of one or more pedestrian frames in a pedestrian detection preliminary result obtained by the second convolutional neural network for the training image, and construct a second loss function by using a target position of the pedestrian in the training image and the key points of the specific number as a target value of skeleton information obtained by the full convolutional network for the training image; and
a training module, configured to train parameters in the first convolutional neural network, the second convolutional neural network, and the full convolutional network using the first loss function and the second loss function.
CN201610971349.9A 2016-10-28 2016-10-28 Pedestrian detection method and device Active CN108009466B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201610971349.9A CN108009466B (en) 2016-10-28 2016-10-28 Pedestrian detection method and device


Publications (2)

Publication Number Publication Date
CN108009466A CN108009466A (en) 2018-05-08
CN108009466B true CN108009466B (en) 2022-03-15

Family

ID=62047524

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201610971349.9A Active CN108009466B (en) 2016-10-28 2016-10-28 Pedestrian detection method and device

Country Status (1)

Country Link
CN (1) CN108009466B (en)

Families Citing this family (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108960081B (en) * 2018-06-15 2021-07-30 熵基科技股份有限公司 Palm image recognition method and device and computer readable storage medium
CN109034124A (en) * 2018-08-30 2018-12-18 成都考拉悠然科技有限公司 A kind of intelligent control method and system
CN109657545B (en) * 2018-11-10 2022-12-20 天津大学 Pedestrian detection method based on multi-task learning
CN110046600B (en) * 2019-04-24 2021-02-26 北京京东尚科信息技术有限公司 Method and apparatus for human detection
CN110349184B (en) * 2019-06-06 2022-08-09 南京工程学院 Multi-pedestrian tracking method based on iterative filtering and observation discrimination
CN111160103B (en) * 2019-11-29 2024-04-23 中科曙光(南京)计算技术有限公司 Unmanned middle pedestrian detection method and device
CN111127520B (en) * 2019-12-26 2022-06-14 华中科技大学 Vehicle tracking method and system based on video analysis
CN111914704B (en) * 2020-07-20 2024-03-19 北京格灵深瞳信息技术有限公司 Tricycle manned identification method and device, electronic equipment and storage medium
CN115294515B (en) * 2022-07-05 2023-06-13 南京邮电大学 Comprehensive anti-theft management method and system based on artificial intelligence

Citations (8)

Publication number Priority date Publication date Assignee Title
CN102184541A (en) * 2011-05-04 2011-09-14 西安电子科技大学 Multi-objective optimized human body motion tracking method
CN102306290A (en) * 2011-10-14 2012-01-04 刘伟华 Face tracking recognition technique based on video
CN104239865A (en) * 2014-09-16 2014-12-24 宁波熵联信息技术有限公司 Pedestrian detecting and tracking method based on multi-stage detection
CN105138983A (en) * 2015-08-21 2015-12-09 燕山大学 Pedestrian detection method based on weighted part model and selective search segmentation
CN105518744A (en) * 2015-06-29 2016-04-20 北京旷视科技有限公司 Pedestrian re-identification method and equipment
CN105574506A (en) * 2015-12-16 2016-05-11 深圳市商汤科技有限公司 Intelligent face tracking system and method based on depth learning and large-scale clustering
CN105787439A (en) * 2016-02-04 2016-07-20 广州新节奏智能科技有限公司 Depth image human body joint positioning method based on convolution nerve network
CN105913003A (en) * 2016-04-07 2016-08-31 国家电网公司 Multi-characteristic multi-model pedestrian detection method

Family Cites Families (1)

Publication number Priority date Publication date Assignee Title
US9152856B2 (en) * 2013-12-19 2015-10-06 Institute For Information Industry Pedestrian detection system and method


Non-Patent Citations (4)

Title
Joint Deep Learning for Pedestrian Detection; Wanli Ouyang et al.; 2013 IEEE International Conference on Computer Vision; 20131231; pp. 2056-2063 *
Pedestrian Detection in Low-resolution Imagery by Learning Multi-scale Intrinsic Motion Structures (MIMS); Jiejie Zhu et al.; 2014 IEEE Conference on Computer Vision and Pattern Recognition; 20141231; pp. 3510-3517 *
CNN-based event detection in surveillance video (基于CNN的监控视频事件检测); Wang Menglai et al.; Acta Automatica Sinica (自动化学报); 20160630; Vol. 42, No. 6; pp. 892-903 *
Research on pedestrian detection and re-identification based on the fusion of depth and visual information (基于深度与视觉信息融合的行人检测与再识别研究); Zhu Bohui (祝博荟); China Doctoral Dissertations Full-text Database, Information Science and Technology (中国博士学位论文全文数据库 信息科技辑); 20140515; Vol. 2014, No. 05; pp. I138-65 *


Similar Documents

Publication Publication Date Title
CN108009466B (en) Pedestrian detection method and device
CN108256404B (en) Pedestrian detection method and device
CN109255352B (en) Target detection method, device and system
CN108629791B (en) Pedestrian tracking method and device and cross-camera pedestrian tracking method and device
CN107808111B (en) Method and apparatus for pedestrian detection and attitude estimation
CN107358149B (en) Human body posture detection method and device
CN108875732B (en) Model training and instance segmentation method, device and system and storage medium
CN106845352B (en) Pedestrian detection method and device
CN106952303B (en) Vehicle distance detection method, device and system
CN108875481B (en) Method, device, system and storage medium for pedestrian detection
CN109918987B (en) Video subtitle keyword identification method and device
CN109299646B (en) Crowd abnormal event detection method, device, system and storage medium
CN107844794B (en) Image recognition method and device
CN106650662B (en) Target object shielding detection method and device
US8744125B2 (en) Clustering-based object classification
CN109815843B (en) Image processing method and related product
US20170213080A1 (en) Methods and systems for automatically and accurately detecting human bodies in videos and/or images
US9183431B2 (en) Apparatus and method for providing activity recognition based application service
CN108875537B (en) Object detection method, device and system and storage medium
CN109492577B (en) Gesture recognition method and device and electronic equipment
CN109727275B (en) Object detection method, device, system and computer readable storage medium
CN109426785B (en) Human body target identity recognition method and device
CN108875750B (en) Object detection method, device and system and storage medium
CN109241888B (en) Neural network training and object recognition method, device and system and storage medium
CN110263680B (en) Image processing method, device and system and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant