CN108256404B

CN108256404B - Pedestrian detection method and device

Info

Publication number: CN108256404B
Application number: CN201611249204.4A
Authority: CN
Inventors: 俞刚; 彭雨翔
Original assignee: Beijing Kuangshi Technology Co Ltd; Beijing Megvii Technology Co Ltd
Current assignee: Beijing Kuangshi Technology Co Ltd; Beijing Megvii Technology Co Ltd
Priority date: 2016-12-29
Filing date: 2016-12-29
Publication date: 2021-12-10
Anticipated expiration: 2036-12-29
Also published as: CN108256404A

Abstract

The embodiments of the invention provide a pedestrian detection method and device. The pedestrian detection method includes: acquiring an image to be processed; detecting pedestrians in the image to be processed to obtain pedestrian information, wherein the pedestrian information includes a plurality of pedestrian frames; detecting the heads of pedestrians in the image to be processed to obtain head information, wherein the head information includes head positions in one-to-one correspondence with at least some of the plurality of pedestrian frames; detecting the shoulders of pedestrians in the image to be processed to obtain shoulder information, wherein the shoulder information includes left shoulder positions and right shoulder positions in one-to-one correspondence with the at least some pedestrian frames; calculating, from the head position, left shoulder position, and right shoulder position corresponding to each of the at least some pedestrian frames, the overlap between the head-shoulder parts corresponding to those pedestrian frames; and filtering the pedestrian frames according to the overlap to obtain a pedestrian detection result. The method and device prevent the pedestrian frame of one pedestrian from causing the pedestrian frames of other pedestrians to be filtered out.

Description

Pedestrian detection method and device

Technical Field

The invention relates to the field of computers, in particular to a pedestrian detection method and device.

Background

In the field of video surveillance, pedestrian detection plays a very important role, but conventional pedestrian detection methods tend to have the following problems. A conventional method usually uses a sliding window to extract a plurality of windows of different scales from the image to be processed (each window is a rectangular frame and may also be referred to as a pedestrian frame), and determines whether a pedestrian exists in each window. Conventional methods then typically use non-maximum suppression (NMS) to remove redundant windows on the same pedestrian. In crowded scenes, however, NMS often also removes windows that contain other, overlapping pedestrians, resulting in severe false negatives (i.e., pedestrians that are present in the image but are not detected).
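For reference, the conventional box-level NMS described above can be sketched in Python as follows. This is a generic greedy NMS, not code from this patent; the box coordinates, scores, and the IoU threshold of 0.4 used below are hypothetical values chosen only to reproduce the crowded-scene failure.

```python
from typing import List, Sequence, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2) with x2 > x1, y2 > y1


def box_iou(a: Box, b: Box) -> float:
    """Intersection-over-union of two axis-aligned boxes."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def box_nms(boxes: Sequence[Box], scores: Sequence[float], iou_threshold: float) -> List[int]:
    """Greedy NMS: visit boxes by descending score and keep a box only if it
    does not overlap any already-kept box by more than `iou_threshold`."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep: List[int] = []
    for i in order:
        if all(box_iou(boxes[i], boxes[k]) <= iou_threshold for k in keep):
            keep.append(i)
    return keep
```

For two side-by-side pedestrians whose frames have an IoU of about 0.45, `box_nms([(0, 0, 40, 100), (15, 0, 55, 100)], [0.9, 0.8], 0.4)` keeps only index 0, so the second pedestrian becomes a false negative even though it is a different person.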

Disclosure of Invention

The present invention has been made in view of the above problems. The invention provides a pedestrian detection method and device.

According to one aspect of the present invention, a pedestrian detection method is provided. The method includes the following steps: acquiring an image to be processed; detecting pedestrians in the image to be processed to obtain pedestrian information, wherein the pedestrian information includes a plurality of pedestrian frames; detecting the heads of pedestrians in the image to be processed to obtain head information, wherein the head information includes head positions in one-to-one correspondence with at least some of the plurality of pedestrian frames; detecting the shoulders of pedestrians in the image to be processed to obtain shoulder information, wherein the shoulder information includes left shoulder positions and right shoulder positions in one-to-one correspondence with the at least some pedestrian frames; calculating, from the head position, left shoulder position, and right shoulder position corresponding to each of the at least some pedestrian frames, the overlap between the head-shoulder parts corresponding to those pedestrian frames; and filtering the pedestrian frames according to the overlap to obtain a pedestrian detection result.

Illustratively, detecting the heads of pedestrians in the image to be processed to obtain the head information includes: detecting the heads of pedestrians in the image to be processed to obtain at least one head frame; and, for each of the at least some pedestrian frames, selecting from the at least one head frame the head frame whose center point is closest to the center point of the pedestrian frame, among those whose center-to-center distance is smaller than a preset head threshold, as the head frame corresponding to the pedestrian frame, and determining the center point of that head frame as the head position corresponding to the pedestrian frame. Detecting the shoulders of pedestrians in the image to be processed to obtain the shoulder information includes: detecting the shoulders of pedestrians in the image to be processed to determine at least one left shoulder position and at least one right shoulder position; for each of the at least some pedestrian frames, selecting from the at least one left shoulder position the left shoulder position that is closest to the center point of the pedestrian frame, among those whose distance from that center point is smaller than a preset left shoulder threshold, as the left shoulder position corresponding to the pedestrian frame; and selecting from the at least one right shoulder position the right shoulder position that is closest to the center point of the pedestrian frame, among those whose distance from that center point is smaller than a preset right shoulder threshold, as the right shoulder position corresponding to the pedestrian frame.
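The nearest-within-threshold association described above can be sketched as follows. This is a minimal illustration, not the patent's code: the function names are hypothetical, distances are assumed to be Euclidean, and a pedestrian frame is assumed to be left out (i.e., not among the "at least some" frames) whenever any of its three parts cannot be found.

```python
import math
from typing import Dict, Optional, Sequence, Tuple

Point = Tuple[float, float]
Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2)


def center(box: Box) -> Point:
    """Center point of an axis-aligned box."""
    return ((box[0] + box[2]) / 2.0, (box[1] + box[3]) / 2.0)


def nearest_within(anchor: Point, candidates: Sequence[Point], max_dist: float) -> Optional[int]:
    """Index of the candidate closest to `anchor` among those closer than `max_dist`, else None."""
    best, best_d = None, max_dist
    for j, p in enumerate(candidates):
        d = math.dist(anchor, p)  # Euclidean distance (assumption)
        if d < best_d:  # strictly below the threshold and closer than any earlier match
            best, best_d = j, d
    return best


def associate_parts(pedestrian_boxes: Sequence[Box], head_boxes: Sequence[Box],
                    left_shoulders: Sequence[Point], right_shoulders: Sequence[Point],
                    head_thr: float, left_thr: float, right_thr: float
                    ) -> Dict[int, Tuple[Point, Point, Point]]:
    """Map pedestrian-frame index -> (head position, left shoulder, right shoulder)."""
    head_centers = [center(b) for b in head_boxes]  # head position = head-frame center
    parts: Dict[int, Tuple[Point, Point, Point]] = {}
    for i, box in enumerate(pedestrian_boxes):
        c = center(box)
        hi = nearest_within(c, head_centers, head_thr)
        li = nearest_within(c, left_shoulders, left_thr)
        ri = nearest_within(c, right_shoulders, right_thr)
        if hi is not None and li is not None and ri is not None:
            parts[i] = (head_centers[hi], left_shoulders[li], right_shoulders[ri])
    return parts
```

As written, two pedestrian frames may pick the same head or shoulder candidate; the text does not say how such conflicts are resolved, so no exclusivity is enforced here.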

Illustratively, for each of the at least some pedestrian frames, the head-shoulder part corresponding to the pedestrian frame is a head-shoulder triangle whose vertices are the head position, the left shoulder position, and the right shoulder position corresponding to that pedestrian frame.
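The source does not define the "overlapping proportion" of two head-shoulder triangles. A minimal sketch, assuming it means intersection-over-union (IoU) of the triangle areas, computes the intersection polygon with Sutherland-Hodgman clipping, which is valid here because triangles are convex. All function names are illustrative.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, float]


def signed_area(poly: Sequence[Point]) -> float:
    """Shoelace formula; positive when the vertex order is counter-clockwise (y-up convention)."""
    return 0.5 * sum(poly[i - 1][0] * poly[i][1] - poly[i][0] * poly[i - 1][1]
                     for i in range(len(poly)))


def _cross(a: Point, b: Point, p: Point) -> float:
    """Positive when p lies to the left of the directed edge a -> b."""
    return (b[0] - a[0]) * (p[1] - a[1]) - (b[1] - a[1]) * (p[0] - a[0])


def _ccw(poly: Sequence[Point]) -> List[Point]:
    """Return the vertices in counter-clockwise order so 'inside' means 'left of each edge'."""
    pts = list(poly)
    return pts if signed_area(pts) > 0 else pts[::-1]


def clip_convex(subject: Sequence[Point], clipper: Sequence[Point]) -> List[Point]:
    """Sutherland-Hodgman: intersection polygon of two convex polygons."""
    out = _ccw(subject)
    clip = _ccw(clipper)
    for i in range(len(clip)):
        a, b = clip[i], clip[(i + 1) % len(clip)]
        pts, out = out, []
        for j in range(len(pts)):
            prev, cur = pts[j - 1], pts[j]
            cp, cc = _cross(a, b, prev), _cross(a, b, cur)
            if (cp >= 0) != (cc >= 0):  # edge prev -> cur crosses the clipping line
                t = cp / (cp - cc)
                out.append((prev[0] + t * (cur[0] - prev[0]), prev[1] + t * (cur[1] - prev[1])))
            if cc >= 0:  # cur is inside (or on) this clipping edge
                out.append(cur)
        if not out:
            break
    return out


def triangle_iou(t1: Sequence[Point], t2: Sequence[Point]) -> float:
    """IoU of two triangles given as (head, left shoulder, right shoulder) vertices."""
    inter_poly = clip_convex(t1, t2)
    inter = abs(signed_area(inter_poly)) if len(inter_poly) >= 3 else 0.0
    union = abs(signed_area(t1)) + abs(signed_area(t2)) - inter
    return inter / union if union > 0 else 0.0
```

With hypothetical triangles `[(20, 10), (8, 25), (32, 25)]` and the same triangle shifted 15 pixels to the right, `triangle_iou` returns 9/119 (about 0.076), far below the IoU of about 0.45 between the corresponding full-body frames `(0, 0, 40, 100)` and `(15, 0, 55, 100)`.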

Illustratively, the pedestrian information further includes a pedestrian confidence for each of the plurality of pedestrian frames; the head information further includes a head confidence for each head position corresponding to the at least some pedestrian frames; and the shoulder information further includes a left shoulder confidence for each corresponding left shoulder position and a right shoulder confidence for each corresponding right shoulder position. Before the pedestrian frames are filtered according to the overlap to obtain the pedestrian detection result, the pedestrian detection method further includes: for each of the at least some pedestrian frames, calculating a triangle confidence of the head-shoulder triangle corresponding to the pedestrian frame from the pedestrian confidence of the pedestrian frame and the head, left shoulder, and right shoulder confidences of the head, left shoulder, and right shoulder positions corresponding to it. Filtering the pedestrian frames according to the overlap to obtain the pedestrian detection result includes: if, among the head-shoulder triangles corresponding to the at least some pedestrian frames, there is at least one group of head-shoulder triangles, where each group consists of a plurality of head-shoulder triangles whose overlap ratio exceeds a preset overlap threshold, then for each such group selecting all head-shoulder triangles except the one with the highest triangle confidence; and filtering out, from the plurality of pedestrian frames, the pedestrian frames corresponding to all the selected head-shoulder triangles to obtain the pedestrian detection result.
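The triangle confidence and group-wise filtering above can be sketched as a greedy suppression over head-shoulder triangles. Two assumptions are made because the source leaves them open: the triangle confidence is taken to be the product of the four confidences, and the "groups" are formed greedily in descending order of triangle confidence, as in standard NMS. Overlap ratios are passed in as a precomputed matrix so that any overlap measure can be plugged in; the function names are hypothetical.

```python
from typing import List, Sequence


def triangle_confidence(ped: float, head: float, left: float, right: float) -> float:
    """ASSUMPTION: the source gives no formula, so the four confidences are multiplied."""
    return ped * head * left * right


def filter_by_head_shoulder(num_frames: int,
                            tri_frame: Sequence[int],
                            tri_conf: Sequence[float],
                            overlap: Sequence[Sequence[float]],
                            threshold: float) -> List[int]:
    """Return the indices of pedestrian frames that survive filtering.

    tri_frame[k]  -- index of the pedestrian frame that triangle k belongs to
    tri_conf[k]   -- triangle confidence of triangle k
    overlap[k][m] -- overlap ratio between triangles k and m
    """
    order = sorted(range(len(tri_frame)), key=lambda k: tri_conf[k], reverse=True)
    kept: List[int] = []
    removed_frames = set()
    for k in order:
        if any(overlap[k][m] > threshold for m in kept):
            # Overlaps a more confident triangle: not the best in its group, so drop its frame.
            removed_frames.add(tri_frame[k])
        else:
            kept.append(k)
    # Frames without a head-shoulder triangle are never removed by this step.
    return [f for f in range(num_frames) if f not in removed_frames]
```

For example, with two triangles whose overlap is 0.8 and a threshold of 0.5, only the frame of the less confident triangle is removed; with an overlap of 9/119, both frames are kept.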

Illustratively, before pedestrians in the image to be processed are detected to obtain the pedestrian information, the pedestrian detection method further includes: extracting features of the image to be processed using a first convolutional neural network. Detecting pedestrians in the image to be processed to obtain the pedestrian information includes: inputting the features of the image to be processed into a second convolutional neural network to obtain the pedestrian information. Detecting the heads of pedestrians in the image to be processed to obtain the head information includes: inputting the features of the image to be processed into a third convolutional neural network to obtain at least one head frame and a head confidence of each head frame; and selecting, from the at least one head frame, head frames in one-to-one correspondence with the at least some pedestrian frames to obtain the head information. Detecting the shoulders of pedestrians in the image to be processed to obtain the shoulder information includes: inputting the features of the image to be processed into a fourth convolutional neural network to obtain a left shoulder feature map and a right shoulder feature map, where each feature map has the same size as the image to be processed, the value of each pixel in the left shoulder feature map represents the left shoulder confidence that the pixel at the same coordinates in the image to be processed belongs to a left shoulder, and the value of each pixel in the right shoulder feature map represents the right shoulder confidence that the pixel at the same coordinates belongs to a right shoulder; and selecting, from the at least one left shoulder position indicated by the left shoulder feature map, left shoulder positions in one-to-one correspondence with the at least some pedestrian frames, and selecting, from the at least one right shoulder position indicated by the right shoulder feature map, right shoulder positions in one-to-one correspondence with the at least some pedestrian frames, to obtain the shoulder information.
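The source does not say how candidate shoulder positions are read off a shoulder feature map. One common approach, assumed here purely for illustration, is to take pixels that exceed a confidence threshold and are local maxima of their 3x3 neighborhood; the function name and the `min_conf` parameter are hypothetical.

```python
from typing import List, Sequence, Tuple


def heatmap_peaks(heatmap: Sequence[Sequence[float]], min_conf: float) -> List[Tuple[int, int, float]]:
    """Return (x, y, confidence) for pixels with confidence >= min_conf that are
    not smaller than any of their (up to 8) neighbors, in raster order."""
    h = len(heatmap)
    w = len(heatmap[0]) if h else 0
    peaks = []
    for y in range(h):
        for x in range(w):
            v = heatmap[y][x]
            if v < min_conf:
                continue
            neighbors = (heatmap[ny][nx]
                         for ny in range(max(0, y - 1), min(h, y + 2))
                         for nx in range(max(0, x - 1), min(w, x + 2))
                         if (ny, nx) != (y, x))
            if all(v >= n for n in neighbors):
                peaks.append((x, y, v))
    return peaks
```

Applied separately to the left and right shoulder maps, the `(x, y)` of each peak would serve as a candidate shoulder position and the peak value as its shoulder confidence. Note that a plateau of equal values yields several adjacent peaks.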

Illustratively, the pedestrian detection method further includes: acquiring a training image and annotation data, the annotation data including a pedestrian frame, a head frame, a left shoulder position, and a right shoulder position for each pedestrian in the training image; constructing a first loss function that uses the annotated pedestrian frame of each pedestrian as the target value for the pedestrian frames obtained by processing the training image with the first and second convolutional neural networks; constructing a second loss function that uses the annotated head frame of each pedestrian as the target value for the head frames obtained by processing the training image with the first and third convolutional neural networks; constructing a third loss function that uses the annotated left and right shoulder positions of each pedestrian as the target values for the left and right shoulder positions obtained by processing the training image with the first and fourth convolutional neural networks; and training the parameters of the first, second, third, and fourth convolutional neural networks using at least the first, second, and third loss functions.

Illustratively, the annotation data further includes a face frame for each pedestrian in the training image, and the pedestrian detection method further includes: constructing a fourth loss function that uses the annotated face frame of each pedestrian as the target value for the face frames obtained by processing the training image with the first convolutional neural network and a fifth convolutional neural network, where the output layer of the first convolutional neural network is connected to the input layer of the fifth convolutional neural network. In this case, training the parameters of the first, second, third, and fourth convolutional neural networks using at least the first, second, and third loss functions includes: training the parameters of the first, second, third, fourth, and fifth convolutional neural networks using the first, second, third, and fourth loss functions.
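The joint training described in the two preceding paragraphs combines several loss terms. The source does not name the loss types or their weights, so the following is only a sketch under stated assumptions: smooth-L1 loss for the pedestrian, head, and face frames, squared Euclidean distance for the shoulder positions, equal default weights, and predictions already matched to their annotated targets. In practice these terms would be computed inside a deep learning framework with automatic differentiation; plain Python is used here only to show how the terms combine.

```python
from typing import Optional, Sequence, Tuple

Box = Tuple[float, float, float, float]
Point = Tuple[float, float]


def smooth_l1(pred: Sequence[float], target: Sequence[float], beta: float = 1.0) -> float:
    """Sum of smooth-L1 (Huber-style) penalties over coordinates."""
    total = 0.0
    for p, t in zip(pred, target):
        d = abs(p - t)
        total += 0.5 * d * d / beta if d < beta else d - 0.5 * beta
    return total


def box_loss(preds: Sequence[Box], targets: Sequence[Box]) -> float:
    """Mean smooth-L1 loss over matched (prediction, target) frame pairs."""
    if not preds:
        return 0.0
    return sum(smooth_l1(p, t) for p, t in zip(preds, targets)) / len(preds)


def shoulder_loss(preds: Sequence[Tuple[Point, Point]],
                  targets: Sequence[Tuple[Point, Point]]) -> float:
    """Mean squared distance over matched (left, right) shoulder pairs."""
    if not preds:
        return 0.0
    total = 0.0
    for (pl, pr), (tl, tr) in zip(preds, targets):
        total += (pl[0] - tl[0]) ** 2 + (pl[1] - tl[1]) ** 2
        total += (pr[0] - tr[0]) ** 2 + (pr[1] - tr[1]) ** 2
    return total / len(preds)


def total_loss(ped: Sequence[Box], ped_gt: Sequence[Box],
               head: Sequence[Box], head_gt: Sequence[Box],
               sh: Sequence[Tuple[Point, Point]], sh_gt: Sequence[Tuple[Point, Point]],
               face: Optional[Sequence[Box]] = None, face_gt: Optional[Sequence[Box]] = None,
               weights: Tuple[float, float, float, float] = (1.0, 1.0, 1.0, 1.0)) -> float:
    """Weighted sum of the first, second, third and (optional) fourth loss terms."""
    loss = (weights[0] * box_loss(ped, ped_gt)
            + weights[1] * box_loss(head, head_gt)
            + weights[2] * shoulder_loss(sh, sh_gt))
    if face is not None and face_gt is not None:
        loss += weights[3] * box_loss(face, face_gt)
    return loss
```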

According to another aspect of the present invention, a pedestrian detection apparatus is provided. The apparatus includes: an image acquisition module configured to acquire an image to be processed; a first detection module configured to detect pedestrians in the image to be processed to obtain pedestrian information, the pedestrian information including a plurality of pedestrian frames; a second detection module configured to detect the heads of pedestrians in the image to be processed to obtain head information, the head information including head positions in one-to-one correspondence with at least some of the plurality of pedestrian frames; a third detection module configured to detect the shoulders of pedestrians in the image to be processed to obtain shoulder information, the shoulder information including left shoulder positions and right shoulder positions in one-to-one correspondence with the at least some pedestrian frames; an overlap calculation module configured to calculate, from the head, left shoulder, and right shoulder positions corresponding to each of the at least some pedestrian frames, the overlap between the head-shoulder parts corresponding to those pedestrian frames; and a filtering module configured to filter the pedestrian frames according to the overlap to obtain a pedestrian detection result.
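The module structure above can be pictured as a pipeline of interchangeable components. The following hypothetical Python sketch, which is not the patent's implementation, models each module as a plain callable so that the data flow between modules is explicit; the class and field names are illustrative only.

```python
from dataclasses import dataclass
from typing import Any, Callable, List


@dataclass
class PedestrianDetectionApparatus:
    """Hypothetical composition of the modules; each field is a pluggable callable."""
    acquire_image: Callable[[], Any]                   # image acquisition module
    detect_pedestrians: Callable[[Any], Any]           # first detection module
    detect_heads: Callable[[Any, Any], Any]            # second detection module
    detect_shoulders: Callable[[Any, Any], Any]        # third detection module
    compute_overlap: Callable[[Any, Any, Any], Any]    # overlap calculation module
    filter_frames: Callable[[Any, Any], List[Any]]     # filtering module

    def run(self) -> List[Any]:
        image = self.acquire_image()
        pedestrians = self.detect_pedestrians(image)
        # Head and shoulder modules receive the pedestrian frames so they can
        # associate parts with frames.
        heads = self.detect_heads(image, pedestrians)
        shoulders = self.detect_shoulders(image, pedestrians)
        overlap = self.compute_overlap(pedestrians, heads, shoulders)
        return self.filter_frames(pedestrians, overlap)
```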

Illustratively, the second detection module includes: a head detection submodule configured to detect the heads of pedestrians in the image to be processed to obtain at least one head frame; and a head frame selection submodule configured to, for each of the at least some pedestrian frames, select from the at least one head frame the head frame whose center point is closest to the center point of the pedestrian frame, among those whose center-to-center distance is smaller than a preset head threshold, as the head frame corresponding to the pedestrian frame, and to determine the center point of that head frame as the head position corresponding to the pedestrian frame. The third detection module includes: a shoulder detection submodule configured to detect the shoulders of pedestrians in the image to be processed to determine at least one left shoulder position and at least one right shoulder position; a left shoulder selection submodule configured to select, from the at least one left shoulder position, the left shoulder position that is closest to the center point of the pedestrian frame, among those whose distance from that center point is smaller than a preset left shoulder threshold, as the left shoulder position corresponding to the pedestrian frame; and a right shoulder selection submodule configured to select, from the at least one right shoulder position, the right shoulder position that is closest to the center point of the pedestrian frame, among those whose distance from that center point is smaller than a preset right shoulder threshold, as the right shoulder position corresponding to the pedestrian frame.

Illustratively, the pedestrian information further includes a pedestrian confidence for each of the plurality of pedestrian frames; the head information further includes a head confidence for each head position corresponding to the at least some pedestrian frames; the shoulder information further includes a left shoulder confidence for each corresponding left shoulder position and a right shoulder confidence for each corresponding right shoulder position; and the pedestrian detection apparatus further includes: a confidence calculation module configured to calculate, for each of the at least some pedestrian frames, a triangle confidence of the head-shoulder triangle corresponding to the pedestrian frame from the pedestrian confidence of the pedestrian frame and the head, left shoulder, and right shoulder confidences of the corresponding head, left shoulder, and right shoulder positions. The filtering module includes: a triangle selection submodule configured to, if there is at least one group of head-shoulder triangles among the head-shoulder triangles corresponding to the at least some pedestrian frames, where each group consists of a plurality of head-shoulder triangles whose overlap ratio exceeds a preset overlap threshold, select for each such group all head-shoulder triangles except the one with the highest triangle confidence; and a filtering submodule configured to filter out, from the plurality of pedestrian frames, the pedestrian frames corresponding to all the selected head-shoulder triangles to obtain the pedestrian detection result.

Illustratively, the pedestrian detection apparatus further includes: a feature extraction module configured to extract features of the image to be processed using a first convolutional neural network before the first detection module detects pedestrians in the image to be processed to obtain the pedestrian information. The first detection module includes: a first input submodule configured to input the features of the image to be processed into a second convolutional neural network to obtain the pedestrian information. The second detection module includes: a second input submodule configured to input the features of the image to be processed into a third convolutional neural network to obtain at least one head frame and a head confidence of each head frame; and a first selection submodule configured to select, from the at least one head frame, head frames in one-to-one correspondence with the at least some pedestrian frames to obtain the head information. The third detection module includes: a third input submodule configured to input the features of the image to be processed into a fourth convolutional neural network to obtain a left shoulder feature map and a right shoulder feature map, where each feature map has the same size as the image to be processed, the value of each pixel in the left shoulder feature map represents the left shoulder confidence that the pixel at the same coordinates in the image to be processed belongs to a left shoulder, and the value of each pixel in the right shoulder feature map represents the right shoulder confidence that the pixel at the same coordinates belongs to a right shoulder; and a second selection submodule configured to select, from the at least one left shoulder position indicated by the left shoulder feature map, left shoulder positions in one-to-one correspondence with the at least some pedestrian frames, and to select, from the at least one right shoulder position indicated by the right shoulder feature map, right shoulder positions in one-to-one correspondence with the at least some pedestrian frames, to obtain the shoulder information.

Illustratively, the pedestrian detection apparatus further includes: a training image acquisition module configured to acquire training images and annotation data, the annotation data including a pedestrian frame, a head frame, a left shoulder position, and a right shoulder position for each pedestrian in the training images; a first construction module configured to construct a first loss function that uses the annotated pedestrian frame of each pedestrian as the target value for the pedestrian frames obtained by processing the training image with the first and second convolutional neural networks, a second loss function that uses the annotated head frame of each pedestrian as the target value for the head frames obtained with the first and third convolutional neural networks, and a third loss function that uses the annotated left and right shoulder positions of each pedestrian as the target values for the left and right shoulder positions obtained with the first and fourth convolutional neural networks; and a training module configured to train the parameters of the first, second, third, and fourth convolutional neural networks using at least the first, second, and third loss functions.

Illustratively, the annotation data further includes a face frame for each pedestrian in the training image, and the pedestrian detection apparatus further includes: a second construction module configured to construct a fourth loss function that uses the annotated face frame of each pedestrian as the target value for the face frames obtained by processing the training image with the first convolutional neural network and a fifth convolutional neural network, where the output layer of the first convolutional neural network is connected to the input layer of the fifth convolutional neural network. The training module includes: a training submodule configured to train the parameters of the first, second, third, fourth, and fifth convolutional neural networks using the first, second, third, and fourth loss functions.

In the pedestrian detection method and device provided by the embodiments of the invention, pedestrian frames are filtered based on the overlap of the head-shoulder parts rather than the overlap of the pedestrian frames themselves. This prevents the pedestrian frame of one pedestrian from causing the pedestrian frames of other pedestrians to be filtered out, which reduces false negatives in pedestrian detection and improves detection accuracy.

Drawings

The above and other objects, features, and advantages of the present invention will become more apparent from the following more detailed description of embodiments of the invention with reference to the accompanying drawings. The drawings provide a further understanding of the embodiments of the invention, constitute a part of this specification, and, together with the description, serve to explain the principles of the invention without limiting it. In the drawings, like reference numbers generally denote like parts or steps.

FIG. 1 shows a schematic block diagram of an example electronic device for implementing a pedestrian detection method and apparatus in accordance with embodiments of the invention;

FIG. 2 shows a schematic flow diagram of a pedestrian detection method according to one embodiment of the invention;

FIG. 3 shows a schematic diagram of a data processing flow of a pedestrian detection method according to one embodiment of the invention;

FIG. 4 shows a schematic block diagram of a pedestrian detection apparatus according to one embodiment of the present invention; and

FIG. 5 shows a schematic block diagram of a pedestrian detection system according to one embodiment of the invention.

Detailed Description

To make the objects, technical solutions, and advantages of the present invention more apparent, exemplary embodiments of the invention are described in detail below with reference to the accompanying drawings. The described embodiments are only some, not all, of the embodiments of the invention, and the invention is not limited to the example embodiments described herein. All other embodiments that a person skilled in the art can derive from the embodiments described herein without inventive effort fall within the scope of protection of the invention.

To solve the above problem, embodiments of the present invention provide a pedestrian detection method and apparatus that filter pedestrian frames based on the overlap of pedestrians' head-shoulder parts rather than the overlap of the pedestrian frames. Because the head-shoulder parts of different pedestrians are usually offset from one another, their overlap ratio is usually small even when the full pedestrian frames overlap considerably; the method therefore avoids having the pedestrian frame of one pedestrian filter out the pedestrian frames of other pedestrians. As a result, the method can obtain better pedestrian detection results in a variety of complex scenes, such as crowds, and is well suited to surveillance applications.
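A worked example with hypothetical coordinates (in pixels, image coordinates) makes the argument concrete. Two pedestrians stand side by side with frames offset horizontally by 15 pixels, their head and shoulder points are assumed to sit at the same relative positions inside each frame, and overlap is assumed to be measured as intersection-over-union (IoU), which the source does not specify:

```latex
\begin{aligned}
&\text{Frames: } A=[0,40]\times[0,100],\qquad B=[15,55]\times[0,100] \\
&\mathrm{IoU}(A,B)=\frac{25\cdot 100}{4000+4000-2500}=\frac{2500}{5500}\approx 0.45 \\
&\text{Triangles: } T_A=\{(20,10),(8,25),(32,25)\},\qquad T_B=T_A+(15,0) \\
&|T_A|=|T_B|=\tfrac12\cdot 24\cdot 15=180,\qquad T_A\cap T_B=\triangle\{(27.5,19.375),(23,25),(32,25)\} \\
&|T_A\cap T_B|=\tfrac12\cdot 9\cdot 5.625=25.3125 \\
&\mathrm{IoU}(T_A,T_B)=\frac{25.3125}{180+180-25.3125}=\frac{9}{119}\approx 0.076
\end{aligned}
```

Under any suppression threshold between roughly 0.08 and 0.45, box-level filtering would remove one of the two frames, while head-shoulder filtering would keep both.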

First, an example electronic device 100 for implementing a pedestrian detection method and apparatus according to an embodiment of the present invention is described with reference to fig. 1.

As shown in FIG. 1, electronic device 100 includes one or more processors 102, one or more memory devices 104, an input device 106, an output device 108, and an image capture device 110, which are interconnected via a bus system 112 and/or other form of connection mechanism (not shown). It should be noted that the components and structure of the electronic device 100 shown in fig. 1 are exemplary only, and not limiting, and the electronic device may have other components and structures as desired.

The processor 102 may be a Central Processing Unit (CPU) or other form of processing unit having data processing capabilities and/or instruction execution capabilities, and may control other components in the electronic device 100 to perform desired functions.

The storage device 104 may include one or more computer program products, which may include various forms of computer-readable storage media, such as volatile memory and/or non-volatile memory. The volatile memory may include, for example, random access memory (RAM) and/or cache memory. The non-volatile memory may include, for example, read-only memory (ROM), a hard disk, flash memory, etc. One or more computer program instructions may be stored on the computer-readable storage medium, and the processor 102 may execute these instructions to implement the client functions (implemented by the processor) in the embodiments of the invention described below and/or other desired functions. Various applications and various data, such as data used and/or generated by the applications, may also be stored in the computer-readable storage medium.

The input device 106 may be a device used by a user to input instructions and may include one or more of a keyboard, a mouse, a microphone, a touch screen, and the like.

The output device 108 may output various information (e.g., images and/or sounds) to the outside (e.g., a user), and may include one or more of a display, a speaker, and the like.

The image capture device 110 may capture images (including video frames) and store the captured images in the storage device 104 for use by other components. The image capture device 110 may be, for example, a surveillance camera. It should be understood that the image capture device 110 is merely an example, and the electronic device 100 may not include it. In that case, images for pedestrian detection may be captured by another image capture device and transmitted to the electronic device 100.

Illustratively, an example electronic device for implementing the pedestrian detection method and apparatus in accordance with embodiments of the present invention may be implemented on a device such as a personal computer or a remote server.

Next, a pedestrian detection method according to an embodiment of the invention will be described with reference to fig. 2. FIG. 2 shows a schematic flow diagram of a pedestrian detection method 200 according to one embodiment of the invention. As shown in fig. 2, the pedestrian detection method 200 includes the following steps.

In step S210, an image to be processed is acquired.

The image to be processed may be any suitable image that requires pedestrian detection, such as an image captured for a monitored area. The image to be processed may be an original image acquired by an image acquisition device such as a camera, or may be an image obtained after preprocessing the original image.

The image to be processed may be sent to the electronic device 100 by a client device (such as a security device including a surveillance camera) and processed by the processor 102 of the electronic device 100, or it may be captured by the image capture device 110 (e.g., a camera) included in the electronic device 100 and transmitted to the processor 102 for processing.

In step S220, a pedestrian in the image to be processed is detected to obtain pedestrian information, wherein the pedestrian information includes a plurality of pedestrian frames.

By way of example and not limitation, step S220 may be implemented using any suitable existing or future pedestrian detection algorithm. In one example, a trained convolutional neural network may be used to detect pedestrians in the image to be processed to obtain the pedestrian information. The pedestrian information obtained in step S220 may include several pedestrian frames. A pedestrian frame is a rectangular frame indicating an area of the image to be processed in which a pedestrian may be present. Illustratively, the pedestrian information may further include a pedestrian confidence for each pedestrian frame, indicating the probability that a pedestrian exists in that frame.

In step S230, the head of a pedestrian in the image to be processed is detected to obtain head information, wherein the head information includes head positions in one-to-one correspondence with at least some of the plurality of pedestrian frames.

By way of example and not limitation, step S230 may be implemented using any suitable existing or future head detection algorithm, which may yield several head frames. In one example, a trained convolutional neural network may be used to detect the heads of pedestrians in the image to be processed to obtain head frames.

Like a pedestrian frame, a head frame may be a rectangular frame indicating a region of the image to be processed in which a head may be present. The head detection algorithm may also output a head confidence for each head frame, representing the probability that a head is present in that frame. The head position may be derived from the head frame, for example as its center point; other points within the head frame may also be used, and the present invention is not limited in this respect. The plurality of pedestrian frames may be associated with all detected head frames to determine head frames, and hence head positions, in one-to-one correspondence with at least some of the pedestrian frames. Alternatively, the plurality of pedestrian frames may be associated directly with all determined head positions to determine the head positions in one-to-one correspondence with at least some of the pedestrian frames.

In step S240, the shoulders of pedestrians in the image to be processed are detected to obtain shoulder information, wherein the shoulder information includes left and right shoulder positions in one-to-one correspondence with at least some of the pedestrian frames.

In one example, a trained convolutional neural network may be utilized to detect the shoulders of a pedestrian in the image to be processed.

After a number of left and right shoulder positions have been determined by shoulder detection, the plurality of pedestrian frames may be associated with them to obtain left and right shoulder positions in one-to-one correspondence with at least some of the pedestrian frames.

In step S250, the overlap of the head-shoulder portions corresponding to at least some of the pedestrian frames is calculated from the head position, left shoulder position and right shoulder position corresponding to each of those pedestrian frames.

For example, for each of at least some of the pedestrian frames, the head-shoulder portion corresponding to the pedestrian frame may be represented by a head-shoulder triangle whose vertices are the head position, the left shoulder position and the right shoulder position corresponding to that frame. When head frames are obtained by head detection, the center point of a head frame may be used as the head position, and the left and right shoulder positions are each single points. The triangle with these three points as vertices is called the head-shoulder triangle and represents the head-shoulder portion. Illustratively, the overlap between two head-shoulder triangles may be measured by their IoU (Intersection over Union) value, i.e., the ratio of the area of their intersection (the overlapping region) to the area of their union. The larger the IoU value, the larger the overlap of the two head-shoulder triangles. Because the heads and shoulders of different pedestrians are usually offset from one another, the IoU value of the head-shoulder triangles of two different pedestrians is usually not very high, so head-shoulder triangles distinguish two pedestrians more easily than their rectangular pedestrian frames do. Of course, the head-shoulder triangle is merely an example; the head-shoulder portion may also be represented by a region of another shape.
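The description does not say how the IoU of two head-shoulder triangles is computed. The following is a minimal Python sketch under that caveat: it clips one triangle against the other with the Sutherland-Hodgman algorithm (valid here because triangles are convex) and measures areas with the shoelace formula. The function names `signed_area`, `clip_convex` and `triangle_iou` are hypothetical, and vertices are assumed to be `(x, y)` pixel coordinates.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, float]


def signed_area(poly: Sequence[Point]) -> float:
    """Shoelace formula; positive for counter-clockwise vertex order."""
    total = 0.0
    for i in range(len(poly)):
        x1, y1 = poly[i]
        x2, y2 = poly[(i + 1) % len(poly)]
        total += x1 * y2 - x2 * y1
    return total / 2.0


def _side(a: Point, b: Point, p: Point) -> float:
    """> 0 if p lies to the left of the directed edge a -> b."""
    return (b[0] - a[0]) * (p[1] - a[1]) - (b[1] - a[1]) * (p[0] - a[0])


def _lerp(p: Point, q: Point, t: float) -> Point:
    """Point at parameter t on the segment p -> q."""
    return (p[0] + t * (q[0] - p[0]), p[1] + t * (q[1] - p[1]))


def clip_convex(subject: Sequence[Point], clipper: Sequence[Point]) -> List[Point]:
    """Sutherland-Hodgman: keep the part of `subject` inside the convex,
    counter-clockwise polygon `clipper`."""
    output = list(subject)
    for i in range(len(clipper)):
        a, b = clipper[i], clipper[(i + 1) % len(clipper)]
        inp, output = output, []
        for j in range(len(inp)):
            prev, cur = inp[j - 1], inp[j]
            s_prev, s_cur = _side(a, b, prev), _side(a, b, cur)
            if s_cur >= 0:                  # cur is inside this edge's half-plane
                if s_prev < 0:              # entering: add the crossing point
                    output.append(_lerp(prev, cur, s_prev / (s_prev - s_cur)))
                output.append(cur)
            elif s_prev >= 0:               # leaving: add the crossing point
                output.append(_lerp(prev, cur, s_prev / (s_prev - s_cur)))
    return output


def triangle_iou(t1: Sequence[Point], t2: Sequence[Point]) -> float:
    """IoU of two triangles, each given as three (x, y) vertices."""
    a1, a2 = signed_area(t1), signed_area(t2)
    if a1 == 0.0 or a2 == 0.0:              # degenerate (collinear) triangle
        return 0.0
    # Orient both triangles counter-clockwise (image y-axes often point down).
    c1 = list(t1) if a1 > 0 else list(t1)[::-1]
    c2 = list(t2) if a2 > 0 else list(t2)[::-1]
    inter = abs(signed_area(clip_convex(c1, c2)))
    return inter / (abs(a1) + abs(a2) - inter)
```

For example, the triangles `(0, 0), (2, 0), (0, 2)` and `(0, 0), (2, 0), (2, 2)` intersect in a triangle of area 1, so their IoU is 1 / (2 + 2 - 1) = 1/3. A general geometry library could be used instead; the hand-written clipping keeps the sketch dependency-free.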

In step S260, the pedestrian frame is filtered according to the overlapping condition to obtain the pedestrian detection result.

Steps S250 and S260 may be regarded as non-maximum suppression (NMS) based on head-shoulder portions. For example, if the overlap ratio of two head-shoulder portions exceeds a predetermined threshold, the two may be considered to belong to the same pedestrian; the head-shoulder portion with the higher confidence and its corresponding pedestrian frame are retained, while the head-shoulder portion with the lower confidence and its corresponding pedestrian frame are discarded. The confidence of a head-shoulder portion may be calculated from the confidence of the head position, the confidence of the left shoulder position, the confidence of the right shoulder position and the confidence of the pedestrian frame, as described later. Pedestrian frames among the plurality of pedestrian frames that do not have a corresponding head position, left shoulder position and right shoulder position may either be retained or be filtered using conventional NMS based on pedestrian frames.
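The head-shoulder NMS described above can be sketched as standard greedy NMS. This minimal Python sketch assumes that the pairwise head-shoulder overlaps (for example, triangle IoU values) and head-shoulder confidences have already been computed for the pedestrian frames that have a complete head-shoulder triangle; the function name `head_shoulder_nms` and the matrix input format are illustrative assumptions, and frames without a triangle are left to the caller.

```python
from typing import List, Sequence


def head_shoulder_nms(confidences: Sequence[float],
                      overlaps: Sequence[Sequence[float]],
                      overlap_threshold: float) -> List[int]:
    """Greedy NMS over head-shoulder regions.

    confidences[i]: head-shoulder confidence of pedestrian frame i.
    overlaps[i][j]: overlap (e.g. triangle IoU) of the head-shoulder regions
    of frames i and j, assumed symmetric.
    Returns indices of the pedestrian frames to keep, highest confidence first.
    """
    order = sorted(range(len(confidences)), key=lambda i: confidences[i], reverse=True)
    suppressed = [False] * len(confidences)
    kept: List[int] = []
    for pos, i in enumerate(order):
        if suppressed[i]:
            continue
        kept.append(i)  # the most confident surviving region is retained
        for j in order[pos + 1:]:
            # A lower-confidence region overlapping it too much is treated as
            # the same pedestrian, so its pedestrian frame is discarded.
            if overlaps[i][j] > overlap_threshold:
                suppressed[j] = True
    return kept
```

With confidences `[0.9, 0.8, 0.7]`, overlaps IoU(0, 1) = 0.6, IoU(0, 2) = 0.1, IoU(1, 2) = 0.7 and a threshold of 0.5, frame 1 is suppressed by frame 0, and frames 0 and 2 are kept.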

Those skilled in the art will appreciate that the execution order of the steps of the pedestrian detection method 200 shown in fig. 2 is merely an example and not a limitation; for example, steps S220, S230 and S240 may be executed in any order.

Because the pedestrian detection method according to the embodiment of the invention filters pedestrian frames based on the overlap of head-shoulder portions rather than the overlap of the pedestrian frames themselves, it avoids the problem of one pedestrian's frame suppressing the frames of other pedestrians. This reduces false negatives in pedestrian detection and improves detection accuracy.

Illustratively, the pedestrian detection method according to the embodiment of the invention may be implemented in a device, apparatus or system having a memory and a processor.

The pedestrian detection method can be deployed at an image acquisition end, for example at the image acquisition end of a residential access control system or of a security monitoring system in public places such as stations, shopping malls and banks. Alternatively, the pedestrian detection method according to the embodiment of the present invention may be deployed in a distributed manner across a server side (or cloud side) and a client side. For example, images may be collected at the client, which transmits them to the server (or cloud), and the server (or cloud) then performs pedestrian detection.

According to an embodiment of the present invention, step S230 may include: detecting the heads of pedestrians in the image to be processed to obtain at least one head frame; and, for each of at least some of the pedestrian frames, selecting from the at least one head frame, as the head frame corresponding to that pedestrian frame, the head frame whose center point is within a preset head threshold distance of the center point of the pedestrian frame and is closest to it, and determining the center point of that head frame as the head position corresponding to the pedestrian frame. Step S240 may include: detecting the shoulders of pedestrians in the image to be processed to determine at least one left shoulder position and at least one right shoulder position; and, for each of the at least some pedestrian frames, selecting from the at least one left shoulder position, as the left shoulder position corresponding to the pedestrian frame, the left shoulder position whose distance from the center point of the pedestrian frame is smaller than a preset left shoulder threshold and is the smallest, and selecting from the at least one right shoulder position, as the right shoulder position corresponding to the pedestrian frame, the right shoulder position whose distance from the center point of the pedestrian frame is smaller than a preset right shoulder threshold and is the smallest.

As described above, several head frames may be obtained through head detection, and the pedestrian frames may be associated with them to determine head frames, and hence head positions, in one-to-one correspondence with at least some of the pedestrian frames. During association, the head frame corresponding to each pedestrian frame may be determined from the distance between the center point of the head frame and the center point of the pedestrian frame. For example, for a given pedestrian frame, the head frames whose center points lie within the preset head threshold distance of the pedestrian frame's center point are found first. If there are none, the pedestrian frame has no corresponding head frame. Otherwise, the head frame whose center point is closest to the center point of the pedestrian frame is selected from among them as the corresponding head frame, and its center point is determined as the head position corresponding to the pedestrian frame.
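The distance-based association just described can be sketched as follows. Boxes are assumed to be `(x1, y1, x2, y2)` tuples, and all names (`center`, `nearest_within`, `associate_heads`) are hypothetical. Each pedestrian frame independently picks the closest candidate whose distance to the frame's center point is below the threshold; as in the description, nothing here prevents two pedestrian frames from selecting the same head frame.

```python
import math
from typing import List, Optional, Sequence, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2): assumed box format
Point = Tuple[float, float]


def center(box: Box) -> Point:
    """Center point of an axis-aligned box."""
    x1, y1, x2, y2 = box
    return ((x1 + x2) / 2.0, (y1 + y2) / 2.0)


def nearest_within(ref: Point, candidates: Sequence[Point],
                   threshold: float) -> Optional[int]:
    """Index of the candidate closest to `ref` among those strictly closer
    than `threshold`, or None if no candidate qualifies."""
    best_idx, best_dist = None, threshold
    for idx, point in enumerate(candidates):
        dist = math.dist(ref, point)
        if dist < best_dist:
            best_idx, best_dist = idx, dist
    return best_idx


def associate_heads(pedestrian_boxes: Sequence[Box], head_boxes: Sequence[Box],
                    head_threshold: float) -> List[Optional[Point]]:
    """For each pedestrian frame, return the center of its matched head frame
    (used as the head position), or None if no head frame is close enough."""
    head_centers = [center(b) for b in head_boxes]
    positions: List[Optional[Point]] = []
    for box in pedestrian_boxes:
        idx = nearest_within(center(box), head_centers, head_threshold)
        positions.append(head_centers[idx] if idx is not None else None)
    return positions
```

The same `nearest_within` helper applies unchanged to left and right shoulder positions, with the preset left or right shoulder threshold in place of the head threshold. Since a head lies well above the center of a pedestrian frame, the head threshold has to be large relative to the frame height.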

Several left and right shoulder positions may be obtained by shoulder detection and a pedestrian frame may be associated with the left and right shoulder positions, respectively. The association method is similar to that of the head frame and the pedestrian frame, and is not described in detail.

The preset head threshold, the preset left shoulder threshold and the preset right shoulder threshold can be set according to needs, and the invention does not limit the above.

This distance-based association of head frames, left shoulder positions and right shoulder positions with pedestrian frames is simple and easy to implement.

According to an embodiment of the present invention, the pedestrian information may further include a pedestrian confidence for each of the plurality of pedestrian frames; the head information may further include a head confidence for each head position corresponding to the at least some pedestrian frames; and the shoulder information may further include a left shoulder confidence for each corresponding left shoulder position and a right shoulder confidence for each corresponding right shoulder position.

Before step S260, the pedestrian detection method 200 may further include: for each of the at least some pedestrian frames, calculating the triangle confidence of the corresponding head-shoulder triangle from the pedestrian confidence of the pedestrian frame, the head confidence of the corresponding head position, the left shoulder confidence of the corresponding left shoulder position and the right shoulder confidence of the corresponding right shoulder position. In one embodiment, these four confidences may be summed, averaged, weighted-summed or weighted-averaged to obtain the triangle confidence. The present invention is not limited thereto; other methods of combining the four confidences into the triangle confidence may be adopted according to the application and its requirements.
Step S260 may include: if, among the head-shoulder triangles corresponding to the at least some pedestrian frames, there is at least one group of head-shoulder triangles, each group consisting of a plurality of head-shoulder triangles whose overlap ratio is greater than a preset overlap threshold, then, for each such group, selecting all head-shoulder triangles in the group other than the one with the highest triangle confidence; and filtering out, from the plurality of pedestrian frames, the pedestrian frames corresponding to all selected head-shoulder triangles to obtain the pedestrian detection result.

Illustratively, when the whole pedestrian, the head and the shoulders are each detected using convolutional neural networks, a confidence is obtained along with each pedestrian frame, head frame (and head position), left shoulder position and right shoulder position. For each pedestrian frame, an overall confidence may be calculated by combining its pedestrian confidence with the associated head, left shoulder and right shoulder confidences, for example by summing, averaging, weighted summing or weighted averaging. The overall confidence represents the confidence of the head-shoulder portion of the corresponding pedestrian; when the head-shoulder portion is a head-shoulder triangle, the overall confidence is the triangle confidence of that triangle.
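The four ways of combining confidences named above (sum, average, weighted sum, weighted average) fit in a few lines of Python. The function name, the mode strings and the weight tuple are hypothetical; the description leaves the weights unspecified.

```python
from typing import Optional, Sequence


def triangle_confidence(ped_conf: float, head_conf: float,
                        left_conf: float, right_conf: float,
                        mode: str = "mean",
                        weights: Optional[Sequence[float]] = None) -> float:
    """Combine the four confidences of one pedestrian frame into the
    confidence of its head-shoulder triangle."""
    scores = (ped_conf, head_conf, left_conf, right_conf)
    if mode == "sum":
        return sum(scores)
    if mode == "mean":
        return sum(scores) / len(scores)
    # Weighted modes: weights are hypothetical tuning parameters.
    if weights is None or len(weights) != len(scores):
        raise ValueError("weighted modes need exactly four weights")
    weighted = sum(w * s for w, s in zip(weights, scores))
    if mode == "weighted_sum":
        return weighted
    if mode == "weighted_mean":
        return weighted / sum(weights)
    raise ValueError(f"unknown mode: {mode}")
```

A plain average keeps the triangle confidence in the same [0, 1] range as its inputs, which makes a single overlap or confidence threshold easier to choose; weights could, for instance, reduce the influence of a less reliable shoulder detector.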

If head-shoulder triangles overlap with an overlap ratio (IoU value) greater than the preset overlap threshold, they may be considered to belong to the same pedestrian, so only one of them and its corresponding pedestrian frame needs to be retained. Among the head-shoulder triangles corresponding to the at least some pedestrian frames, there may be one or more groups of such overlapping triangles. In each group, the head-shoulder triangle with the highest triangle confidence is identified and the remaining triangles are selected. All pedestrian frames corresponding to the selected remaining triangles are then found among the plurality of pedestrian frames and discarded; the pedestrian frames that remain constitute the final pedestrian detection result.

For example, the overlap ratio of a plurality of head-shoulder triangles may refer to the overlap ratio between any two triangles in the group, or to the overlap ratio between a specific triangle in the group and each of the remaining triangles.
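One possible reading of the group-based filtering in step S260 is sketched below as an assumption, not as the prescribed procedure. Triangles are linked whenever their pairwise overlap exceeds the threshold, each connected set of linked triangles forms a group, and all but the highest-confidence triangle in each group are discarded. The function name and matrix input format are hypothetical. Unlike standard greedy NMS, this chained grouping can discard a triangle that does not directly overlap the best one in its group.

```python
from typing import Dict, List, Sequence


def group_filter(confidences: Sequence[float],
                 overlaps: Sequence[Sequence[float]],
                 overlap_threshold: float) -> List[int]:
    """Return indices of the pedestrian frames kept after discarding all but
    the most confident head-shoulder triangle in each overlapping group."""
    n = len(confidences)
    parent = list(range(n))

    def find(x: int) -> int:
        # Union-find root lookup with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    # Link every pair whose overlap exceeds the threshold.
    for i in range(n):
        for j in range(i + 1, n):
            if overlaps[i][j] > overlap_threshold:
                parent[find(i)] = find(j)

    groups: Dict[int, List[int]] = {}
    for i in range(n):
        groups.setdefault(find(i), []).append(i)

    discarded = set()
    for members in groups.values():
        if len(members) > 1:
            best = max(members, key=lambda k: confidences[k])
            discarded.update(k for k in members if k != best)
    return [i for i in range(n) if i not in discarded]
```

For confidences `[0.9, 0.8, 0.7]` with IoU(0, 1) = 0.6, IoU(1, 2) = 0.7, IoU(0, 2) = 0.1 and a threshold of 0.5, all three triangles form one group and only frame 0 is kept, whereas greedy NMS would also keep frame 2.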

The preset overlap threshold may be set as needed, which is not limited by the present invention.

In this way, pedestrian frames corresponding to high-confidence head-shoulder portions are retained, while redundant pedestrian frames whose head-shoulder portions overlap those high-confidence head-shoulder portions are discarded, so that pedestrian frames are filtered based on the overlap of head-shoulder portions rather than the overlap of pedestrian frames. This method may be understood as NMS based on head-shoulder portions.

According to an embodiment of the present invention, before step S220, the pedestrian detection method 200 may further include: extracting features of the image to be processed using a first convolutional neural network. Step S220 may include: inputting the features of the image to be processed into a second convolutional neural network to obtain the pedestrian information. Step S230 may include: inputting the features of the image to be processed into a third convolutional neural network to obtain at least one head frame and a head confidence for each head frame; and selecting, from the at least one head frame, head frames in one-to-one correspondence with at least some of the pedestrian frames to obtain the head information. Step S240 may include: inputting the features of the image to be processed into a fourth convolutional neural network to obtain a left shoulder feature map and a right shoulder feature map, both the same size as the image to be processed, wherein the value of each pixel of the left shoulder feature map represents the left shoulder confidence that the pixel at the same coordinates in the image to be processed belongs to a left shoulder, and the value of each pixel of the right shoulder feature map represents the right shoulder confidence that the pixel at the same coordinates belongs to a right shoulder; and selecting, from the at least one left shoulder position indicated by the left shoulder feature map, left shoulder positions in one-to-one correspondence with at least some of the pedestrian frames, and selecting, from the at least one right shoulder position indicated by the right shoulder feature map, right shoulder positions in one-to-one correspondence with at least some of the pedestrian frames, to obtain the shoulder information.

Fig. 3 is a schematic diagram of the data processing flow of a pedestrian detection method according to an embodiment of the present invention. As shown in fig. 3, after the image to be processed is acquired, it may be input into a first convolutional neural network (CNN) for feature extraction. The image to be processed may be a still image or any frame of a video. At least one image feature map may be obtained at the output of the first convolutional neural network; this feature map represents the features of the image to be processed. Illustratively, the first convolutional neural network may be implemented using a VGG model or a residual network (ResNet) model pre-trained on the ImageNet dataset. The first convolutional neural network extracts valuable information from the image to be processed, on the basis of which the whole body, head, shoulders and other parts of pedestrians in the image can be detected, as described below. The first convolutional neural network may be pre-trained using a large number of training images.

Then, pedestrians in the image to be processed are detected based on its features, by inputting the image feature map output by the first convolutional neural network into a second convolutional neural network. The second convolutional neural network may output a number of pedestrian frames and a pedestrian confidence for each pedestrian frame. Illustratively, the second convolutional neural network may be implemented using an accelerated version of the region-based convolutional neural network (fast-RCNN).

Subsequently, the head of the pedestrian in the image to be processed is detected based on the features of the image to be processed. Similarly to the manner of detecting a pedestrian, the head of the pedestrian in the image to be processed may be detected by inputting the image feature map output by the first convolutional neural network into the third convolutional neural network. The third convolutional neural network may output a number of head boxes and a head confidence for each head box. Illustratively, the third convolutional neural network may be implemented using fast-RCNN. The manner of selecting the head frame corresponding to each pedestrian frame from the detected head frames and determining the corresponding head position may refer to the above description, which is not repeated herein.

Subsequently, the shoulders of pedestrians in the image to be processed are detected based on its features. With continued reference to fig. 3, this may be done by inputting the image feature map output by the first convolutional neural network into a fourth convolutional neural network, e.g., a fully convolutional network (FCN) similar to those used for semantic segmentation. Two feature maps, a left shoulder feature map and a right shoulder feature map, may be obtained at the output of the fourth convolutional neural network. Taking the left shoulder feature map as an example, pixels whose values are greater than a preset confidence threshold may be selected, and the positions of the pixels of the image to be processed that have the same coordinates as the selected pixels are determined as left shoulder positions. At least one left shoulder position may thereby be determined, and the left shoulder position corresponding to each pedestrian frame may then be found from among them in the manner described above, which is not repeated here. The right shoulder feature map is processed in the same way as the left shoulder feature map.
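Turning a shoulder feature map into candidate shoulder positions, as described above, amounts to thresholding it. A minimal Python sketch, assuming the map is a nested list of confidences indexed `[row][column]` with the same size as the image; the function name `shoulder_candidates` is hypothetical.

```python
from typing import List, Sequence, Tuple

Candidate = Tuple[Tuple[int, int], float]  # ((x, y), confidence)


def shoulder_candidates(feature_map: Sequence[Sequence[float]],
                        conf_threshold: float) -> List[Candidate]:
    """Return every pixel of a shoulder feature map whose confidence exceeds
    `conf_threshold`, as ((x, y), confidence) with x = column, y = row.
    Because the map has the same size as the input image, (x, y) is also
    the shoulder position in image coordinates."""
    candidates: List[Candidate] = []
    for y, row in enumerate(feature_map):
        for x, value in enumerate(row):
            if value > conf_threshold:
                candidates.append(((x, y), value))
    return candidates
```

Adjacent pixels on the same shoulder will usually all pass the threshold; in the described method, the subsequent nearest-to-center association then picks one of them per pedestrian frame. Keeping only local maxima (peak picking) is a common refinement but is not part of the described method.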

Like the first convolutional neural network, the second, third and fourth convolutional neural networks may be trained in advance using a large number of training images. The training of these networks is described below.

Convolutional neural networks are network models capable of autonomous learning, with which valuable information in an image can be extracted and very accurate detection results (e.g., the above-described pedestrian frame, head frame, left shoulder position, right shoulder position, etc.) can be obtained.

According to an embodiment of the present invention, the pedestrian detection method 200 may further include: acquiring a training image and labeling data, wherein the labeling data include the pedestrian frame, head frame, left shoulder position and right shoulder position corresponding to each pedestrian in the training image; constructing a first loss function using the pedestrian frame of each pedestrian in the training image as the target value for the pedestrian frames obtained by processing the training image with the first and second convolutional neural networks; constructing a second loss function using the head frame of each pedestrian in the training image as the target value for the head frames obtained by processing the training image with the first and third convolutional neural networks; constructing a third loss function using the left and right shoulder positions of each pedestrian in the training image as the target values for the left and right shoulder positions obtained by processing the training image with the first and fourth convolutional neural networks; and training the parameters of the first, second, third and fourth convolutional neural networks using at least the first, second and third loss functions.

Using the labeled pedestrian frame of each pedestrian in the training image, the first loss function, i.e., the loss of the preliminary pedestrian detection results (the plurality of pedestrian frames described herein), can be calculated. Using the labeled head frame of each pedestrian, the loss of the head detection results, i.e., the second loss function, can be calculated. Using the labeled left and right shoulder positions of each pedestrian, the loss of the shoulder detection results, i.e., the third loss function, can be calculated. Fig. 3 shows where the first, second and third loss functions are located in the processing flow.

Through multiple rounds of training with these loss functions, the parameters of the first, second, third and fourth convolutional neural networks gradually converge to reasonable values. The trained network model can then be used for pedestrian detection on the image to be processed. The parameters of the convolutional neural networks may be trained with a conventional back-propagation algorithm, whose implementation is well known to those skilled in the art and is not described here.

Traditional pedestrian detection methods rely mainly on global features, but pedestrians have a certain degree of freedom; for example, their hands and feet can take different postures (positions), so global features often cannot cover all situations. As a result, traditional methods miss some pedestrians and may produce some false positives (regions that are not pedestrians being mistakenly output as pedestrians). According to the embodiment of the invention, the network model for pedestrian detection can be trained by multitask learning that considers features of the head, shoulders and other parts in addition to the global features of the pedestrian, making better use of the correlation between information from multiple body parts to optimize the detection result for each part. Since whole-pedestrian detection, head detection and shoulder detection are complementary and influence one another, training and predicting pedestrians, heads and shoulders together with one set of network models has a clear advantage and can produce a comparatively strong network model. Using the trained network model for pedestrian detection can reduce false positives and improve detection accuracy.

According to an embodiment of the invention, the labeling data may further include a face frame corresponding to each pedestrian in the training image, and the pedestrian detection method may further include: constructing a fourth loss function using the face frame of each pedestrian in the training image as the target value for the face frames obtained by processing the training image with the first convolutional neural network and a fifth convolutional neural network, wherein the output layer of the first convolutional neural network is connected to the input layer of the fifth convolutional neural network. In this case, training the parameters of the first, second, third and fourth convolutional neural networks using at least the first, second and third loss functions may include: training the parameters of the first, second, third, fourth and fifth convolutional neural networks using the first, second, third and fourth loss functions.

Face detection may be incorporated in the process of training a network model for pedestrian detection. That is, the image feature map output by the first convolutional neural network after processing the training image may be input to a fifth convolutional neural network, which may output several face boxes and a face confidence for each face box. Similarly to the pedestrian frame and the head frame, the face frame may also be a rectangular frame for indicating a region in which a face may exist in the image to be processed. Face confidence is used to represent the probability that a face is present in the face box.

Using the face position of each pedestrian in the previously labeled training image, a loss function of the face detection result, i.e., a fourth loss function (not shown in fig. 3) can be calculated. In the training process, the first convolutional neural network, the second convolutional neural network, the third convolutional neural network, the fourth convolutional neural network and the fifth convolutional neural network are trained together by utilizing the first loss function, the second loss function, the third loss function and the fourth loss function.
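The description states only that the networks are trained together using the first to fourth loss functions. One common way to do so, given here as an assumption rather than the stated method, is to minimize a weighted sum, where the weights $\lambda_1, \dots, \lambda_4$ are hypothetical hyperparameters and $\lambda_4 = 0$ (or the face term omitted) when face detection is not used:

```latex
\mathcal{L}_{\text{total}}
  = \lambda_1 \mathcal{L}_1 + \lambda_2 \mathcal{L}_2
  + \lambda_3 \mathcal{L}_3 + \lambda_4 \mathcal{L}_4,
\qquad \lambda_k \ge 0
```

Because the first convolutional neural network feeds every detection branch, back-propagating such a sum updates its shared parameters with gradients from all tasks at once, which is one concrete form of the multitask training described above.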

Adding face features may further help training produce a better network model. The first, second, third and fourth convolutional neural networks obtained by joint training on whole-pedestrian detection, head detection, shoulder detection and face detection can extract more accurate pedestrian frames, head frames, left shoulder positions and right shoulder positions, yielding more accurate pedestrian detection results.

In an actual pedestrian detection process, when the image to be processed is processed, a face detection result related to the image to be processed may be obtained at an output end of the fifth convolutional neural network, and the result may be used for other processing.

According to another aspect of the present invention, a pedestrian detection apparatus is provided. Fig. 4 shows a schematic block diagram of a pedestrian detection apparatus 400 according to one embodiment of the invention.

As shown in fig. 4, the pedestrian detection apparatus 400 according to the embodiment of the invention includes a to-be-processed image acquisition module 410, a first detection module 420, a second detection module 430, a third detection module 440, an overlap calculation module 450, and a filtering module 460. The modules may respectively perform the steps/functions of the pedestrian detection method described above in connection with figs. 2-3. Only the main functions of the components of the pedestrian detection apparatus 400 are described below; details already described above are omitted.

The to-be-processed image acquisition module 410 is configured to acquire an image to be processed. The to-be-processed image acquisition module 410 may be implemented by the processor 102 in the electronic device shown in fig. 1 executing program instructions stored in the storage 104.

The first detection module 420 is configured to detect a pedestrian in the image to be processed to obtain pedestrian information, where the pedestrian information includes a plurality of pedestrian frames. The first detection module 420 may be implemented by the processor 102 in the electronic device shown in fig. 1 executing program instructions stored in the storage 104.

The second detection module 430 is configured to detect the heads of pedestrians in the image to be processed to obtain head information, where the head information includes head positions corresponding one-to-one to at least some of the plurality of pedestrian frames. The second detection module 430 may be implemented by the processor 102 in the electronic device shown in fig. 1 executing program instructions stored in the storage 104.

The third detection module 440 is configured to detect the shoulders of pedestrians in the image to be processed to obtain shoulder information, where the shoulder information includes left shoulder positions and right shoulder positions corresponding one-to-one to at least some of the pedestrian frames. The third detection module 440 may be implemented by the processor 102 in the electronic device shown in fig. 1 executing program instructions stored in the storage 104.

The overlap calculation module 450 is configured to calculate the overlap of the head-shoulder parts corresponding one-to-one to at least some of the pedestrian frames, according to the head position, left shoulder position, and right shoulder position corresponding to each of those pedestrian frames. The overlap calculation module 450 may be implemented by the processor 102 in the electronic device shown in fig. 1 executing program instructions stored in the storage 104.

The filtering module 460 is configured to filter the pedestrian frames according to the overlapping condition to obtain a pedestrian detection result. The filtering module 460 may be implemented by the processor 102 in the electronic device shown in fig. 1 executing program instructions stored in the storage 104.
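The data flow between modules 410 to 460 can be sketched as a small Python class in which each module is an injected callable. The class name, field names, and call signatures are hypothetical; the sketch only illustrates the order in which the modules consume each other's outputs, not how the processor 102 actually executes them.

```python
from dataclasses import dataclass
from typing import Any, Callable, List


@dataclass
class PedestrianDetectionApparatus:
    """Hypothetical wiring of modules 410-460; each field plays one module."""

    acquire_image: Callable[[], Any]                       # module 410: image to be processed
    detect_pedestrians: Callable[[Any], List[Any]]         # module 420: pedestrian frames
    detect_heads: Callable[[Any, List[Any]], Any]          # module 430: head position per frame
    detect_shoulders: Callable[[Any, List[Any]], Any]      # module 440: left/right shoulders per frame
    compute_overlap: Callable[[List[Any], Any, Any], Any]  # module 450: head-shoulder overlap
    filter_frames: Callable[[List[Any], Any], List[Any]]   # module 460: filtered detection result

    def run(self) -> List[Any]:
        image = self.acquire_image()
        frames = self.detect_pedestrians(image)
        heads = self.detect_heads(image, frames)
        shoulders = self.detect_shoulders(image, frames)
        overlap = self.compute_overlap(frames, heads, shoulders)
        return self.filter_frames(frames, overlap)
```

Injecting the modules as callables keeps the orchestration independent of whether each module is backed by a neural network, a rule, or a test stub.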

According to an embodiment of the present invention, the second detection module 430 includes: a head detection submodule, configured to detect the heads of pedestrians in the image to be processed to obtain at least one head frame; and a head frame selection submodule, configured to, for each of at least some of the pedestrian frames, select from the at least one head frame the head frame whose center point is closest to the center point of the pedestrian frame, provided that the distance between the two center points is smaller than a preset head threshold, as the head frame corresponding to the pedestrian frame, and to determine the center point of that head frame as the head position corresponding to the pedestrian frame. The third detection module 440 includes: a shoulder detection submodule, configured to detect the shoulders of pedestrians in the image to be processed to determine at least one left shoulder position and at least one right shoulder position; a left shoulder selection submodule, configured to select, from the at least one left shoulder position, the one whose distance to the center point of the pedestrian frame is smaller than a preset left shoulder threshold and is the smallest, as the left shoulder position corresponding to the pedestrian frame; and a right shoulder selection submodule, configured to select, from the at least one right shoulder position, the one whose distance to the center point of the pedestrian frame is smaller than a preset right shoulder threshold and is the smallest, as the right shoulder position corresponding to the pedestrian frame.
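The nearest-center matching performed by the selection submodules can be sketched in Python as follows. It assumes frames are given as `(x1, y1, x2, y2)` corner tuples and that distances are Euclidean; both are assumptions, as are the helper names `center`, `nearest_within`, and `match_parts`.

```python
import math
from typing import List, Optional, Sequence, Tuple

Point = Tuple[float, float]
Box = Tuple[float, float, float, float]  # assumed (x1, y1, x2, y2) corner format


def center(box: Box) -> Point:
    """Center point of an axis-aligned rectangular frame."""
    x1, y1, x2, y2 = box
    return ((x1 + x2) / 2.0, (y1 + y2) / 2.0)


def nearest_within(anchor: Point, candidates: Sequence[Point], threshold: float) -> Optional[int]:
    """Index of the candidate closest to `anchor` at distance < threshold, or None."""
    best_idx: Optional[int] = None
    best_dist = threshold
    for i, point in enumerate(candidates):
        dist = math.dist(anchor, point)
        if dist < best_dist:  # strictly below threshold and closer than any earlier match
            best_idx, best_dist = i, dist
    return best_idx


def match_parts(
    frames: Sequence[Box],
    head_frames: Sequence[Box],
    left_shoulders: Sequence[Point],
    right_shoulders: Sequence[Point],
    head_thr: float,
    left_thr: float,
    right_thr: float,
) -> List[Tuple[Optional[Point], Optional[Point], Optional[Point]]]:
    """For each pedestrian frame: (head position, left shoulder, right shoulder), None if unmatched."""
    head_centers = [center(h) for h in head_frames]
    result = []
    for frame in frames:
        c = center(frame)
        h = nearest_within(c, head_centers, head_thr)
        l = nearest_within(c, left_shoulders, left_thr)
        r = nearest_within(c, right_shoulders, right_thr)
        result.append((
            head_centers[h] if h is not None else None,  # head position = head-frame center
            left_shoulders[l] if l is not None else None,
            right_shoulders[r] if r is not None else None,
        ))
    return result
```

Each pedestrian frame is matched independently, so one head frame may be assigned to two overlapping pedestrian frames; that is exactly the case in which their head-shoulder triangles coincide and one frame can later be filtered out. A `None` entry marks a frame that presumably falls outside the "at least some" frames that receive a head-shoulder triangle.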

According to an embodiment of the present invention, for each of at least some of the pedestrian frames, the head-shoulder part corresponding to the pedestrian frame is a head-shoulder triangle whose vertices are the head position, left shoulder position, and right shoulder position corresponding to that pedestrian frame.
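This passage does not define the overlap ratio of two head-shoulder triangles. One plausible reading, assumed here, is the intersection-over-union (IoU) of the triangle areas. The sketch below computes it with Sutherland-Hodgman clipping and the shoelace formula; the function names are hypothetical.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, float]


def signed_area(poly: Sequence[Point]) -> float:
    """Shoelace formula; positive for counter-clockwise vertex order."""
    s = 0.0
    for i in range(len(poly)):
        x1, y1 = poly[i]
        x2, y2 = poly[(i + 1) % len(poly)]
        s += x1 * y2 - x2 * y1
    return s / 2.0


def _side(a: Point, b: Point, p: Point) -> float:
    """Positive if p lies to the left of the directed line a -> b."""
    return (b[0] - a[0]) * (p[1] - a[1]) - (b[1] - a[1]) * (p[0] - a[0])


def clip_convex(subject: Sequence[Point], clipper: Sequence[Point]) -> List[Point]:
    """Sutherland-Hodgman clipping of `subject` by a counter-clockwise convex `clipper`."""
    output = list(subject)
    for i in range(len(clipper)):
        a, b = clipper[i], clipper[(i + 1) % len(clipper)]
        inp, output = output, []
        for j in range(len(inp)):
            prev, cur = inp[j - 1], inp[j]
            sp, sc = _side(a, b, prev), _side(a, b, cur)
            if sc >= 0:
                if sp < 0:  # edge enters the clip half-plane: add the crossing point first
                    t = sp / (sp - sc)
                    output.append((prev[0] + t * (cur[0] - prev[0]), prev[1] + t * (cur[1] - prev[1])))
                output.append(cur)
            elif sp >= 0:  # edge leaves the clip half-plane: add only the crossing point
                t = sp / (sp - sc)
                output.append((prev[0] + t * (cur[0] - prev[0]), prev[1] + t * (cur[1] - prev[1])))
        if not output:
            break
    return output


def triangle_overlap(t1: Sequence[Point], t2: Sequence[Point]) -> float:
    """IoU of two head-shoulder triangles (head, left shoulder, right shoulder); 0.0 if degenerate."""
    # Image coordinates may give either winding order, so normalise both to counter-clockwise.
    t1 = list(t1) if signed_area(t1) >= 0 else list(reversed(t1))
    t2 = list(t2) if signed_area(t2) >= 0 else list(reversed(t2))
    inter_poly = clip_convex(t1, t2)
    inter = abs(signed_area(inter_poly)) if len(inter_poly) >= 3 else 0.0
    union = abs(signed_area(t1)) + abs(signed_area(t2)) - inter
    return inter / union if union > 0 else 0.0
```

Because both triangles are convex, clipping one by the other yields their exact intersection polygon, so no rasterization is needed.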

According to an embodiment of the present invention, the pedestrian information further includes a pedestrian confidence for each of the plurality of pedestrian frames; the head information further includes a head confidence for each head position corresponding one-to-one to at least some of the pedestrian frames; and the shoulder information further includes a left shoulder confidence for each left shoulder position and a right shoulder confidence for each right shoulder position corresponding one-to-one to at least some of the pedestrian frames. The pedestrian detection apparatus 400 further includes a confidence calculation module, configured to, for each of at least some of the pedestrian frames, calculate the triangle confidence of the head-shoulder triangle corresponding to the pedestrian frame according to the pedestrian confidence of the pedestrian frame and the head confidence, left shoulder confidence, and right shoulder confidence of the head position, left shoulder position, and right shoulder position corresponding to the pedestrian frame. The filtering module 460 includes: a triangle selection submodule, configured to, if at least one group of head-shoulder triangles exists among the head-shoulder triangles corresponding one-to-one to at least some of the pedestrian frames, where each group consists of a plurality of head-shoulder triangles whose overlap ratio is greater than a preset overlap threshold, select, for each group, all head-shoulder triangles except the one with the highest triangle confidence; and a filtering submodule, configured to filter out, from the plurality of pedestrian frames, the pedestrian frames corresponding to all the selected head-shoulder triangles to obtain the pedestrian detection result.
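The triangle-confidence and filtering steps can be sketched as greedy non-maximum suppression (NMS) over head-shoulder triangles. Two assumptions are made: the triangle confidence is taken to be the product of the four confidences (the text does not state the fusion rule here), and greedy NMS stands in for the group-based selection, since the text describes groups of overlapping triangles but not how the groups are formed. The sketch also assumes `triangles[i]` belongs to `frames[i]`, and the overlap measure is passed in as a parameter; all function names are hypothetical.

```python
from typing import Callable, List, Sequence, Tuple

Point = Tuple[float, float]
Triangle = Tuple[Point, Point, Point]


def triangle_confidence(ped: float, head: float, left: float, right: float) -> float:
    """Hypothetical fusion rule: product of pedestrian, head, and shoulder confidences."""
    return ped * head * left * right


def select_suppressed(
    triangles: Sequence[Triangle],
    confidences: Sequence[float],
    overlap: Callable[[Triangle, Triangle], float],
    threshold: float,
) -> List[int]:
    """Greedy NMS: indices of triangles overlapping a kept, higher-confidence one above `threshold`."""
    order = sorted(range(len(triangles)), key=lambda i: confidences[i], reverse=True)
    kept: List[int] = []
    removed: List[int] = []
    for i in order:
        if any(overlap(triangles[i], triangles[k]) > threshold for k in kept):
            removed.append(i)  # a better-scoring triangle already covers this pedestrian
        else:
            kept.append(i)
    return sorted(removed)


def filter_frames(frames: Sequence, removed: Sequence[int]) -> list:
    """Drop the pedestrian frames whose head-shoulder triangles were selected for removal."""
    drop = set(removed)
    return [f for i, f in enumerate(frames) if i not in drop]
```

Suppressing on head-shoulder triangles rather than on whole pedestrian frames is what lets two heavily overlapping frames survive when they belong to different people with distinct heads and shoulders.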

According to an embodiment of the present invention, the pedestrian detection apparatus 400 further includes a feature extraction module, configured to extract features of the image to be processed using a first convolutional neural network before the first detection module 420 detects pedestrians in the image to be processed to obtain the pedestrian information. The first detection module 420 includes a first input submodule, configured to input the features of the image to be processed into a second convolutional neural network to obtain the pedestrian information. The second detection module 430 includes: a second input submodule, configured to input the features of the image to be processed into a third convolutional neural network to obtain at least one head frame and a head confidence for each head frame; and a first selection submodule, configured to select, from the at least one head frame, head frames corresponding one-to-one to at least some of the pedestrian frames to obtain the head information. The third detection module 440 includes a third input submodule, configured to input the features of the image to be processed into a fourth convolutional neural network, for example a fully convolutional network, to obtain a left shoulder feature map and a right shoulder feature map. The left shoulder feature map has the same size as the image to be processed, and the value of each of its pixels indicates the left shoulder confidence, i.e., the confidence that the pixel at the same coordinates in the image to be processed belongs to a left shoulder; likewise, the right shoulder feature map has the same size as the image to be processed, and the value of each of its pixels indicates the right shoulder confidence that the pixel at the same coordinates belongs to a right shoulder. The third detection module 440 further includes a second selection submodule, configured to select, from the at least one left shoulder position indicated by the left shoulder feature map, left shoulder positions corresponding one-to-one to at least some of the pedestrian frames, and to select, from the at least one right shoulder position indicated by the right shoulder feature map, right shoulder positions corresponding one-to-one to at least some of the pedestrian frames, so as to obtain the shoulder information.
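One way to turn a shoulder feature map into the candidate shoulder positions it indicates is to take local maxima above a confidence floor. The text does not specify this step, so the 8-neighbour peak rule, the `min_conf` parameter, and the name `heatmap_peaks` below are assumptions.

```python
from typing import List, Sequence, Tuple


def heatmap_peaks(heatmap: Sequence[Sequence[float]], min_conf: float) -> List[Tuple[int, int, float]]:
    """Return (x, y, confidence) for pixels that are 8-neighbour local maxima with value >= min_conf."""
    height = len(heatmap)
    width = len(heatmap[0]) if height else 0
    peaks = []
    for y in range(height):
        for x in range(width):
            value = heatmap[y][x]
            if value < min_conf:
                continue  # too unlikely to be a shoulder
            is_peak = True
            for dy in (-1, 0, 1):
                for dx in (-1, 0, 1):
                    ny, nx = y + dy, x + dx
                    if (dy or dx) and 0 <= ny < height and 0 <= nx < width and heatmap[ny][nx] > value:
                        is_peak = False  # a neighbour is strictly more confident
            if is_peak:
                peaks.append((x, y, value))
    return peaks
```

Pixels on a flat plateau of equal values are all reported as peaks; a production version might merge them before the nearest-center selection.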

According to an embodiment of the present invention, the pedestrian detection apparatus 400 further includes: a training image acquisition module, configured to acquire training images and annotation data, where the annotation data include the pedestrian frame, head frame, left shoulder position, and right shoulder position of each pedestrian in the training images; a first construction module, configured to construct a first loss function using the pedestrian frame of each pedestrian in the training image as the target value for the pedestrian frames obtained by processing the training image with the first and second convolutional neural networks, to construct a second loss function using the head frame of each pedestrian as the target value for the head frames obtained with the first and third convolutional neural networks, and to construct a third loss function using the left and right shoulder positions of each pedestrian as the target values for the left and right shoulder positions obtained with the first and fourth convolutional neural networks; and a training module, configured to train the parameters of the first, second, third, and fourth convolutional neural networks using at least the first, second, and third loss functions.
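The third loss needs per-pixel targets for the shoulder feature maps. A common choice, assumed here rather than stated in the text, is a Gaussian bump centered on each annotated shoulder position combined with a pixel-wise mean squared error; `gaussian_target`, `shoulder_loss`, and the default `sigma` are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Grid = List[List[float]]


def gaussian_target(width: int, height: int, points: Sequence[Tuple[float, float]], sigma: float) -> Grid:
    """Per-pixel target: max over annotated points of exp(-d^2 / (2 * sigma^2))."""
    target = [[0.0] * width for _ in range(height)]
    for px, py in points:
        for y in range(height):
            for x in range(width):
                g = math.exp(-((x - px) ** 2 + (y - py) ** 2) / (2.0 * sigma * sigma))
                target[y][x] = max(target[y][x], g)  # overlapping pedestrians keep the stronger peak
    return target


def mse(pred: Sequence[Sequence[float]], target: Sequence[Sequence[float]]) -> float:
    """Mean squared error between two equally sized maps."""
    total, count = 0.0, 0
    for prow, trow in zip(pred, target):
        for p, t in zip(prow, trow):
            total += (p - t) ** 2
            count += 1
    return total / count if count else 0.0


def shoulder_loss(pred_left: Grid, pred_right: Grid, left_points, right_points, sigma: float = 2.0) -> float:
    """Hypothetical third loss: pixel-wise MSE on the left and right shoulder feature maps."""
    height, width = len(pred_left), len(pred_left[0])
    return (mse(pred_left, gaussian_target(width, height, left_points, sigma))
            + mse(pred_right, gaussian_target(width, height, right_points, sigma)))
```

A binary mask with per-pixel cross-entropy would be an equally plausible alternative; the Gaussian target simply tolerates small annotation offsets.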

According to an embodiment of the present invention, the annotation data further include the face frame of each pedestrian in the training image, and the pedestrian detection apparatus 400 further includes a second construction module, configured to construct a fourth loss function using the face frame of each pedestrian in the training image as the target value for the face frames obtained by processing the training image with the first and fifth convolutional neural networks, where the output layer of the first convolutional neural network is connected to the input layer of the fifth convolutional neural network. The training module includes a training submodule, configured to train the parameters of the first, second, third, fourth, and fifth convolutional neural networks using the first, second, third, and fourth loss functions.

Those of ordinary skill in the art will appreciate that the units and algorithm steps of the examples described in connection with the embodiments disclosed herein may be implemented in electronic hardware or in a combination of computer software and electronic hardware. Whether these functions are performed in hardware or software depends on the particular application and the design constraints of the technical solution. Skilled artisans may implement the described functions in different ways for each particular application, but such implementations should not be considered to depart from the scope of the present invention.

FIG. 5 shows a schematic block diagram of a pedestrian detection system 500 according to one embodiment of the invention. The pedestrian detection system 500 includes an image capture device 510, a storage device 520, and a processor 530.

The image capture device 510 is used to capture the image to be processed. The image capture device 510 is optional, and the pedestrian detection system 500 need not include it; in that case, the image for pedestrian detection may be captured by another image acquisition device and transmitted to the pedestrian detection system 500.

The storage device 520 stores program code for implementing the corresponding steps of the pedestrian detection method according to the embodiment of the invention.

The processor 530 is configured to run the program code stored in the storage device 520 to perform the corresponding steps of the pedestrian detection method according to the embodiment of the present invention, and to implement the to-be-processed image acquisition module 410, the first detection module 420, the second detection module 430, the third detection module 440, the overlap calculation module 450, and the filtering module 460 of the pedestrian detection apparatus 400 according to the embodiment of the present invention.

In one embodiment, the program code, when executed by the processor 530, causes the pedestrian detection system 500 to perform the following steps: acquiring an image to be processed; detecting pedestrians in the image to be processed to obtain pedestrian information, where the pedestrian information includes a plurality of pedestrian frames; detecting the heads of pedestrians in the image to be processed to obtain head information, where the head information includes head positions corresponding one-to-one to at least some of the plurality of pedestrian frames; detecting the shoulders of pedestrians in the image to be processed to obtain shoulder information, where the shoulder information includes left shoulder positions and right shoulder positions corresponding one-to-one to the at least some pedestrian frames; calculating, according to the head positions, left shoulder positions, and right shoulder positions corresponding one-to-one to the at least some pedestrian frames, the overlap of the head-shoulder parts corresponding one-to-one to those pedestrian frames; and filtering the pedestrian frames according to the overlap to obtain a pedestrian detection result.

In one embodiment, the step, performed by the pedestrian detection system 500 when the program code is executed by the processor 530, of detecting the heads of pedestrians in the image to be processed to obtain head information includes: detecting the heads of pedestrians in the image to be processed to obtain at least one head frame; and, for each of at least some of the pedestrian frames, selecting from the at least one head frame the head frame whose center point is closest to the center point of the pedestrian frame, provided that the distance between the two center points is smaller than a preset head threshold, as the head frame corresponding to the pedestrian frame, and determining the center point of that head frame as the head position corresponding to the pedestrian frame. The step, performed by the pedestrian detection system 500 when the program code is executed by the processor 530, of detecting the shoulders of pedestrians in the image to be processed to obtain shoulder information includes: detecting the shoulders of pedestrians in the image to be processed to determine at least one left shoulder position and at least one right shoulder position; selecting, from the at least one left shoulder position, the one whose distance to the center point of the pedestrian frame is smaller than a preset left shoulder threshold and is the smallest, as the left shoulder position corresponding to the pedestrian frame; and selecting, from the at least one right shoulder position, the one whose distance to the center point of the pedestrian frame is smaller than a preset right shoulder threshold and is the smallest, as the right shoulder position corresponding to the pedestrian frame.

In one embodiment, for each of at least some of the pedestrian frames, the head-shoulder part corresponding to the pedestrian frame is a head-shoulder triangle whose vertices are the head position, left shoulder position, and right shoulder position corresponding to that pedestrian frame.

In one embodiment, the pedestrian information further includes a pedestrian confidence for each of the plurality of pedestrian frames; the head information further includes a head confidence for each head position corresponding one-to-one to at least some of the pedestrian frames; and the shoulder information further includes a left shoulder confidence for each left shoulder position and a right shoulder confidence for each right shoulder position corresponding one-to-one to at least some of the pedestrian frames. Before the step of filtering the pedestrian frames according to the overlap to obtain the pedestrian detection result, the program code, when executed by the processor 530, further causes the pedestrian detection system 500 to perform: for each of at least some of the pedestrian frames, calculating the triangle confidence of the head-shoulder triangle corresponding to the pedestrian frame according to the pedestrian confidence of the pedestrian frame and the head confidence, left shoulder confidence, and right shoulder confidence of the head position, left shoulder position, and right shoulder position corresponding to the pedestrian frame. The step, performed by the pedestrian detection system 500 when the program code is executed by the processor 530, of filtering the pedestrian frames according to the overlap to obtain the pedestrian detection result includes: if at least one group of head-shoulder triangles exists among the head-shoulder triangles corresponding one-to-one to at least some of the pedestrian frames, where each group consists of a plurality of head-shoulder triangles whose overlap ratio is greater than a preset overlap threshold, selecting, for each group, all head-shoulder triangles except the one with the highest triangle confidence; and filtering out, from the plurality of pedestrian frames, the pedestrian frames corresponding to all the selected head-shoulder triangles to obtain the pedestrian detection result.

In one embodiment, before the step, performed by the pedestrian detection system 500, of detecting pedestrians in the image to be processed to obtain pedestrian information, the program code, when executed by the processor 530, further causes the pedestrian detection system 500 to perform: extracting features of the image to be processed using a first convolutional neural network. The step, performed by the pedestrian detection system 500 when the program code is executed by the processor 530, of detecting pedestrians in the image to be processed to obtain pedestrian information includes: inputting the features of the image to be processed into a second convolutional neural network to obtain the pedestrian information. The step of detecting the heads of pedestrians in the image to be processed to obtain head information includes: inputting the features of the image to be processed into a third convolutional neural network to obtain at least one head frame and a head confidence for each head frame; and selecting, from the at least one head frame, head frames corresponding one-to-one to at least some of the pedestrian frames to obtain the head information. The step of detecting the shoulders of pedestrians in the image to be processed to obtain shoulder information includes: inputting the features of the image to be processed into a fourth convolutional neural network to obtain a left shoulder feature map and a right shoulder feature map, where the left shoulder feature map has the same size as the image to be processed and the value of each of its pixels represents the left shoulder confidence that the pixel at the same coordinates in the image to be processed belongs to a left shoulder, and the right shoulder feature map has the same size as the image to be processed and the value of each of its pixels represents the right shoulder confidence that the pixel at the same coordinates belongs to a right shoulder; and selecting, from the at least one left shoulder position indicated by the left shoulder feature map, left shoulder positions corresponding one-to-one to at least some of the pedestrian frames, and selecting, from the at least one right shoulder position indicated by the right shoulder feature map, right shoulder positions corresponding one-to-one to at least some of the pedestrian frames, so as to obtain the shoulder information.

In one embodiment, the program code, when executed by the processor 530, further causes the pedestrian detection system 500 to perform: acquiring a training image and annotation data, where the annotation data include the pedestrian frame, head frame, left shoulder position, and right shoulder position of each pedestrian in the training image; constructing a first loss function using the pedestrian frame of each pedestrian in the training image as the target value for the pedestrian frames obtained by processing the training image with a first convolutional neural network and a second convolutional neural network, constructing a second loss function using the head frame of each pedestrian as the target value for the head frames obtained with the first convolutional neural network and a third convolutional neural network, and constructing a third loss function using the left and right shoulder positions of each pedestrian as the target values for the left and right shoulder positions obtained with the first convolutional neural network and a fourth convolutional neural network; and training the parameters of the first, second, third, and fourth convolutional neural networks using at least the first, second, and third loss functions.

In one embodiment, the annotation data further include the face frame of each pedestrian in the training image. The program code, when executed by the processor 530, further causes the pedestrian detection system 500 to perform: constructing a fourth loss function using the face frame of each pedestrian in the training image as the target value for the face frames obtained by processing the training image with the first convolutional neural network and a fifth convolutional neural network, where the output layer of the first convolutional neural network is connected to the input layer of the fifth convolutional neural network. The step, performed by the pedestrian detection system 500 when the program code is executed by the processor 530, of training the parameters of the first, second, third, and fourth convolutional neural networks using at least the first, second, and third loss functions includes: training the parameters of the first, second, third, fourth, and fifth convolutional neural networks using the first, second, third, and fourth loss functions.

Furthermore, according to an embodiment of the present invention, there is also provided a storage medium on which program instructions are stored, which when executed by a computer or a processor are used for executing the respective steps of the pedestrian detection method according to an embodiment of the present invention and for implementing the respective modules in the pedestrian detection apparatus according to an embodiment of the present invention. The storage medium may include, for example, a memory card of a smart phone, a storage component of a tablet computer, a hard disk of a personal computer, a Read Only Memory (ROM), an Erasable Programmable Read Only Memory (EPROM), a portable compact disc read only memory (CD-ROM), a USB memory, or any combination of the above storage media.

In one embodiment, the computer program instructions, when executed by a computer or processor, may cause the computer or processor to implement the various functional modules of the pedestrian detection apparatus according to the embodiment of the invention, and/or may perform the pedestrian detection method according to the embodiment of the invention.

In one embodiment, the computer program instructions, when executed by a computer, cause the computer to perform the following steps: acquiring an image to be processed; detecting pedestrians in the image to be processed to obtain pedestrian information, where the pedestrian information includes a plurality of pedestrian frames; detecting the heads of pedestrians in the image to be processed to obtain head information, where the head information includes head positions corresponding one-to-one to at least some of the plurality of pedestrian frames; detecting the shoulders of pedestrians in the image to be processed to obtain shoulder information, where the shoulder information includes left shoulder positions and right shoulder positions corresponding one-to-one to the at least some pedestrian frames; calculating, according to the head positions, left shoulder positions, and right shoulder positions corresponding one-to-one to the at least some pedestrian frames, the overlap of the head-shoulder parts corresponding one-to-one to those pedestrian frames; and filtering the pedestrian frames according to the overlap to obtain a pedestrian detection result.

In one embodiment, the step, performed by the computer when the computer program instructions are executed, of detecting the heads of pedestrians in the image to be processed to obtain head information includes: detecting the heads of pedestrians in the image to be processed to obtain at least one head frame; and, for each of at least some of the pedestrian frames, selecting from the at least one head frame the head frame whose center point is closest to the center point of the pedestrian frame, provided that the distance between the two center points is smaller than a preset head threshold, as the head frame corresponding to the pedestrian frame, and determining the center point of that head frame as the head position corresponding to the pedestrian frame. The step, performed by the computer when the computer program instructions are executed, of detecting the shoulders of pedestrians in the image to be processed to obtain shoulder information includes: detecting the shoulders of pedestrians in the image to be processed to determine at least one left shoulder position and at least one right shoulder position; selecting, from the at least one left shoulder position, the one whose distance to the center point of the pedestrian frame is smaller than a preset left shoulder threshold and is the smallest, as the left shoulder position corresponding to the pedestrian frame; and selecting, from the at least one right shoulder position, the one whose distance to the center point of the pedestrian frame is smaller than a preset right shoulder threshold and is the smallest, as the right shoulder position corresponding to the pedestrian frame.

In one embodiment, the pedestrian information further comprises a pedestrian confidence for each of the plurality of pedestrian frames; the head information further comprises a head confidence for each head position corresponding one-to-one to the at least some pedestrian frames; and the shoulder information further comprises a left shoulder confidence for each left shoulder position and a right shoulder confidence for each right shoulder position corresponding one-to-one to the at least some pedestrian frames. Before the step of filtering the pedestrian frames according to the overlap to obtain the pedestrian detection result, the computer program instructions, when executed by the computer, further cause the computer to perform: for each of the at least some pedestrian frames, calculating a triangle confidence of the head-shoulder triangle corresponding to the pedestrian frame according to the pedestrian confidence of the pedestrian frame, the head confidence of the corresponding head position, the left shoulder confidence of the corresponding left shoulder position, and the right shoulder confidence of the corresponding right shoulder position. When the computer program instructions are executed by the computer, the step of filtering the pedestrian frames according to the overlap to obtain the pedestrian detection result comprises: if, among the head-shoulder triangles corresponding one-to-one to the at least some pedestrian frames, there exists at least one group of head-shoulder triangles, where each group comprises a plurality of head-shoulder triangles whose overlap ratio is greater than a preset overlap threshold, then selecting, for each such group, all head-shoulder triangles other than the one with the highest triangle confidence; and filtering out, from the plurality of pedestrian frames, the pedestrian frames corresponding to all the selected head-shoulder triangles to obtain the pedestrian detection result.
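The triangle-based filtering described above can be illustrated with a short Python sketch. It is not the embodiment's implementation and rests on assumptions the text leaves open: the triangle confidence is taken as the product of the four confidences, the overlap ratio is taken as the intersection-over-union of two head-shoulder triangles, and the grouping is realized as greedy non-maximum suppression, which keeps the highest-confidence triangle and drops any other triangle whose overlap with an already kept triangle exceeds the threshold. All function names are hypothetical.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, float]
Triangle = Tuple[Point, Point, Point]  # (head, left shoulder, right shoulder)


def signed_area(poly: Sequence[Point]) -> float:
    """Shoelace formula; positive for counter-clockwise vertex order."""
    s = 0.0
    for i in range(len(poly)):
        x1, y1 = poly[i]
        x2, y2 = poly[(i + 1) % len(poly)]
        s += x1 * y2 - x2 * y1
    return s / 2.0


def _ccw(poly: Sequence[Point]) -> List[Point]:
    # Normalize vertex order so that "inside" means "left of every edge".
    return list(poly) if signed_area(poly) >= 0 else list(reversed(poly))


def _inside(p: Point, a: Point, b: Point) -> bool:
    # True if p lies left of (or on) the directed edge a->b.
    return (b[0] - a[0]) * (p[1] - a[1]) - (b[1] - a[1]) * (p[0] - a[0]) >= 0.0


def _line_intersection(p: Point, q: Point, a: Point, b: Point) -> Point:
    # Point where segment p-q crosses the infinite line through a and b.
    (x1, y1), (x2, y2), (x3, y3), (x4, y4) = p, q, a, b
    den = (x1 - x2) * (y3 - y4) - (y1 - y2) * (x3 - x4)
    t = ((x1 - x3) * (y3 - y4) - (y1 - y3) * (x3 - x4)) / den
    return (x1 + t * (x2 - x1), y1 + t * (y2 - y1))


def convex_intersection(subject: Sequence[Point], clip: Sequence[Point]) -> List[Point]:
    """Sutherland-Hodgman clipping of one convex polygon by another."""
    output = _ccw(subject)
    edges = _ccw(clip)
    for i in range(len(edges)):
        a, b = edges[i], edges[(i + 1) % len(edges)]
        candidates, output = output, []
        if not candidates:
            break  # intersection is already empty
        prev = candidates[-1]
        for cur in candidates:
            if _inside(cur, a, b):
                if not _inside(prev, a, b):
                    output.append(_line_intersection(prev, cur, a, b))
                output.append(cur)
            elif _inside(prev, a, b):
                output.append(_line_intersection(prev, cur, a, b))
            prev = cur
    return output


def triangle_iou(t1: Triangle, t2: Triangle) -> float:
    """Overlap ratio, assumed here to be intersection area over union area."""
    inter = abs(signed_area(convex_intersection(t1, t2)))
    union = abs(signed_area(t1)) + abs(signed_area(t2)) - inter
    return inter / union if union > 0 else 0.0


def triangle_confidence(ped: float, head: float, left: float, right: float) -> float:
    """Hypothetical fusion rule: product of the four confidences."""
    return ped * head * left * right


def filter_head_shoulder_triangles(
    triangles: Sequence[Triangle],
    confidences: Sequence[float],
    overlap_threshold: float = 0.5,
) -> List[int]:
    """Greedy NMS over head-shoulder triangles; returns indices of frames to keep."""
    order = sorted(range(len(triangles)), key=lambda i: confidences[i], reverse=True)
    keep: List[int] = []
    for i in order:
        # Keep triangle i only if it does not overlap any kept, higher-confidence one too much.
        if all(triangle_iou(triangles[i], triangles[k]) <= overlap_threshold for k in keep):
            keep.append(i)
    return sorted(keep)
```

The returned indices identify the pedestrian frames to keep among those that have a head-shoulder triangle; pedestrian frames without a matched head and both shoulders are not involved in this step and remain in the result. Clipping one triangle by the other is sufficient here because both triangles are convex.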

In one embodiment, before the step of detecting pedestrians in the image to be processed to obtain the pedestrian information, the computer program instructions, when executed by the computer, further cause the computer to perform: extracting features of the image to be processed using a first convolutional neural network. When the computer program instructions are executed by the computer, the step of detecting pedestrians in the image to be processed to obtain the pedestrian information comprises: inputting the features of the image to be processed into a second convolutional neural network to obtain the pedestrian information. The step of detecting the heads of pedestrians in the image to be processed to obtain the head information comprises: inputting the features of the image to be processed into a third convolutional neural network to obtain at least one head frame and a head confidence of each head frame; and selecting, from the at least one head frame, head frames corresponding one-to-one to the at least some pedestrian frames to obtain the head information. The step of detecting the shoulders of pedestrians in the image to be processed to obtain the shoulder information comprises: inputting the features of the image to be processed into a fourth convolutional neural network to obtain a left shoulder feature map and a right shoulder feature map, where each feature map has the same size as the image to be processed, the value of each pixel in the left shoulder feature map represents the left shoulder confidence that the pixel at the same coordinates in the image to be processed belongs to a left shoulder, and the value of each pixel in the right shoulder feature map represents the right shoulder confidence that the pixel at the same coordinates belongs to a right shoulder; and selecting, from the at least one left shoulder position indicated by the left shoulder feature map, left shoulder positions corresponding one-to-one to the at least some pedestrian frames, and selecting, from the at least one right shoulder position indicated by the right shoulder feature map, right shoulder positions corresponding one-to-one to the at least some pedestrian frames, to obtain the shoulder information.
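The selection steps above can be sketched in Python, assuming the matching rule of claim 2: for each pedestrian frame, take the candidate closest to the center point of the frame, provided its distance is below a preset threshold. How shoulder positions are read from a feature map is not specified; the sketch assumes that each pixel whose confidence is at least a minimum value and not below any pixel in its 3x3 neighborhood marks one shoulder position. Because each feature map has the same size as the image, a peak at column x and row y is used directly as image coordinates. The function names and the `min_conf` parameter are hypothetical.

```python
import math
from typing import List, Optional, Sequence, Tuple

Point = Tuple[float, float]
Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2) in pixel coordinates
Peak = Tuple[float, float, float]  # (x, y, confidence)


def box_center(box: Box) -> Point:
    """Center point of an axis-aligned frame."""
    x1, y1, x2, y2 = box
    return ((x1 + x2) / 2.0, (y1 + y2) / 2.0)


def heatmap_peaks(heatmap: Sequence[Sequence[float]], min_conf: float = 0.5) -> List[Peak]:
    """Assumed rule: a pixel is a shoulder position if its confidence is at least
    min_conf and not smaller than any pixel in its 3x3 neighborhood."""
    h = len(heatmap)
    w = len(heatmap[0]) if h else 0
    peaks: List[Peak] = []
    for y in range(h):
        for x in range(w):
            v = heatmap[y][x]
            if v < min_conf:
                continue
            neighborhood_max = max(
                heatmap[ny][nx]
                for ny in range(max(0, y - 1), min(h, y + 2))
                for nx in range(max(0, x - 1), min(w, x + 2))
            )
            if v >= neighborhood_max:
                peaks.append((float(x), float(y), v))
    return peaks


def nearest_within(center: Point, candidates: Sequence[Point], threshold: float) -> Optional[int]:
    """Index of the candidate closest to center with distance below threshold, or None."""
    best, best_dist = None, threshold
    for i, (x, y) in enumerate(candidates):
        dist = math.hypot(x - center[0], y - center[1])
        if dist < best_dist:
            best, best_dist = i, dist
    return best


def match_head_shoulders(
    ped_box: Box,
    head_boxes: Sequence[Box],
    left_peaks: Sequence[Peak],
    right_peaks: Sequence[Peak],
    head_thr: float,
    left_thr: float,
    right_thr: float,
) -> Optional[Tuple[Point, Point, Point]]:
    """(head, left shoulder, right shoulder) for one pedestrian frame, or None
    if any part has no candidate within its threshold."""
    c = box_center(ped_box)
    hi = nearest_within(c, [box_center(b) for b in head_boxes], head_thr)
    li = nearest_within(c, [(x, y) for x, y, _ in left_peaks], left_thr)
    ri = nearest_within(c, [(x, y) for x, y, _ in right_peaks], right_thr)
    if hi is None or li is None or ri is None:
        return None
    head = box_center(head_boxes[hi])  # head position = center of the matched head frame
    return head, (left_peaks[li][0], left_peaks[li][1]), (right_peaks[ri][0], right_peaks[ri][1])
```

When `match_head_shoulders` returns None, the frame is treated as outside the "at least some pedestrian frames" that receive a head-shoulder triangle. Each frame is matched independently, so two frames of the same person tend to receive the same head and shoulders, while frames of different people receive different ones.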

In one embodiment, the computer program instructions, when executed by the computer, further cause the computer to perform: acquiring a training image and annotation data, wherein the annotation data comprises a pedestrian frame, a head frame, a left shoulder position and a right shoulder position corresponding to each pedestrian in the training image; constructing a first loss function by taking the pedestrian frame corresponding to each pedestrian in the training image as the target value of the pedestrian frame obtained by processing the training image using the first convolutional neural network and the second convolutional neural network; constructing a second loss function by taking the head frame corresponding to each pedestrian in the training image as the target value of the head frame obtained by processing the training image using the first convolutional neural network and the third convolutional neural network; constructing a third loss function by taking the left shoulder position and the right shoulder position corresponding to each pedestrian in the training image as the target values of the left shoulder position and the right shoulder position obtained by processing the training image using the first convolutional neural network and the fourth convolutional neural network; and training parameters in the first, second, third, and fourth convolutional neural networks using at least the first, second, and third loss functions.

In one embodiment, the annotation data further includes a face frame corresponding to each pedestrian in the training image, and the computer program instructions, when executed by the computer, further cause the computer to perform: constructing a fourth loss function by taking the face frame corresponding to each pedestrian in the training image as the target value of the face frame obtained by processing the training image using the first convolutional neural network and a fifth convolutional neural network, wherein an output layer of the first convolutional neural network is connected to an input layer of the fifth convolutional neural network. In this case, the step of training parameters in the first, second, third, and fourth convolutional neural networks using at least the first, second, and third loss functions comprises: training parameters in the first, second, third, fourth, and fifth convolutional neural networks using the first, second, third, and fourth loss functions.
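One way to write the joint training objective is shown below. The notation is introduced here for illustration only: theta_1 to theta_5 stand for the parameters of the first to fifth convolutional neural networks, L_1 to L_4 for the first to fourth loss functions, and the weights lambda_k are assumptions, since the embodiment only requires training with at least these loss functions and does not say how they are combined. Setting lambda_4 = 0 gives the three-loss case. Because each L_k depends only on the shared first network and its own branch, the shared network accumulates gradients from every branch while each branch is updated by its own loss alone.

```latex
\mathcal{L}(\theta_1,\dots,\theta_5)
  = \lambda_1 \mathcal{L}_1(\theta_1,\theta_2)
  + \lambda_2 \mathcal{L}_2(\theta_1,\theta_3)
  + \lambda_3 \mathcal{L}_3(\theta_1,\theta_4)
  + \lambda_4 \mathcal{L}_4(\theta_1,\theta_5)

\frac{\partial \mathcal{L}}{\partial \theta_1}
  = \sum_{k=1}^{4} \lambda_k \frac{\partial \mathcal{L}_k}{\partial \theta_1},
\qquad
\frac{\partial \mathcal{L}}{\partial \theta_{k+1}}
  = \lambda_k \frac{\partial \mathcal{L}_k}{\partial \theta_{k+1}},
\quad k = 1,\dots,4
```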

The modules of the pedestrian detection system according to embodiments of the invention may be implemented by a processor of the electronic device for pedestrian detection according to embodiments of the invention running computer program instructions stored in a memory, or by a computer running computer instructions stored in the computer-readable storage medium of the computer program product according to embodiments of the invention.

Although the illustrative embodiments have been described herein with reference to the accompanying drawings, it is to be understood that the foregoing illustrative embodiments are merely exemplary and are not intended to limit the scope of the invention thereto. Various changes and modifications may be effected therein by one of ordinary skill in the pertinent art without departing from the scope or spirit of the present invention. All such changes and modifications are intended to be included within the scope of the present invention as set forth in the appended claims.

In the several embodiments provided in the present application, it should be understood that the disclosed apparatus and method may be implemented in other ways. For example, the above-described device embodiments are merely illustrative, and for example, the division of the units is only one logical functional division, and other divisions may be realized in practice, for example, a plurality of units or components may be combined or integrated into another device, or some features may be omitted, or not executed.

In the description provided herein, numerous specific details are set forth. It is understood, however, that embodiments of the invention may be practiced without these specific details. In some instances, well-known methods, structures and techniques have not been shown in detail in order not to obscure an understanding of this description.

Similarly, it should be appreciated that in the description of exemplary embodiments of the invention, various features of the invention are sometimes grouped together in a single embodiment, figure, or description thereof for the purpose of streamlining the disclosure and aiding in the understanding of one or more of the various inventive aspects. However, this method of disclosure should not be construed as reflecting an intention that the claimed invention requires more features than are expressly recited in each claim. Rather, as the following claims reflect, inventive aspects lie in fewer than all features of a single disclosed embodiment. Thus, the claims following the detailed description are hereby expressly incorporated into this detailed description, with each claim standing on its own as a separate embodiment of this invention.

It will be understood by those skilled in the art that all of the features disclosed in this specification (including any accompanying claims, abstract and drawings), and all of the processes or elements of any method or apparatus so disclosed, may be combined in any combination, except combinations where such features are mutually exclusive. Each feature disclosed in this specification (including any accompanying claims, abstract and drawings) may be replaced by alternative features serving the same, equivalent or similar purpose, unless expressly stated otherwise.

Furthermore, those skilled in the art will appreciate that while some embodiments described herein include some features included in other embodiments, rather than other features, combinations of features of different embodiments are meant to be within the scope of the invention and form different embodiments. For example, in the claims, any of the claimed embodiments may be used in any combination.

The various component embodiments of the invention may be implemented in hardware, or in software modules running on one or more processors, or in a combination thereof. It will be appreciated by those skilled in the art that a microprocessor or Digital Signal Processor (DSP) may be used in practice to implement some or all of the functionality of some of the modules in a pedestrian detection arrangement according to embodiments of the present invention. The present invention may also be embodied as apparatus programs (e.g., computer programs and computer program products) for performing a portion or all of the methods described herein. Such programs implementing the present invention may be stored on computer-readable media or may be in the form of one or more signals. Such a signal may be downloaded from an internet website or provided on a carrier signal or in any other form.

It should be noted that the above-mentioned embodiments illustrate rather than limit the invention, and that those skilled in the art will be able to design alternative embodiments without departing from the scope of the appended claims. In the claims, any reference signs placed between parentheses shall not be construed as limiting the claim. The word "comprising" does not exclude the presence of elements or steps not listed in a claim. The word "a" or "an" preceding an element does not exclude the presence of a plurality of such elements. The invention may be implemented by means of hardware comprising several distinct elements, and by means of a suitably programmed computer. In a unit claim enumerating several means, several of these means may be embodied by one and the same item of hardware. The use of the words first, second, third, and so on does not indicate any ordering; these words may be interpreted as names.

The above description covers only specific embodiments of the present invention, and the protection scope of the present invention is not limited thereto. Any change or substitution that a person skilled in the art could readily conceive of within the technical scope disclosed by the present invention shall be covered within the protection scope of the present invention. Therefore, the protection scope of the present invention shall be subject to the protection scope of the claims.

Claims

1. A pedestrian detection method, comprising:

acquiring an image to be processed;

detecting pedestrians in the image to be processed to obtain pedestrian information, wherein the pedestrian information comprises a plurality of pedestrian frames;

detecting the heads of pedestrians in the image to be processed to obtain head information, wherein the head information comprises head positions corresponding to at least part of the pedestrian frames in the plurality of pedestrian frames in a one-to-one mode;

detecting the shoulders of the pedestrians in the image to be processed to obtain shoulder information, wherein the shoulder information comprises left shoulder positions and right shoulder positions which are in one-to-one correspondence with the at least part of the pedestrian frames;

calculating the overlapping condition of the head and shoulder parts corresponding to the at least part of the pedestrian frames one by one according to the head position, the left shoulder position and the right shoulder position corresponding to the at least part of the pedestrian frames one by one; and

filtering the pedestrian frames according to the overlapping condition to obtain a pedestrian detection result.

2. The pedestrian detection method according to claim 1,

the detecting the head of the pedestrian in the image to be processed to obtain the head information comprises:

detecting the head of a pedestrian in the image to be processed to obtain at least one head frame; and

for each of the at least part of the pedestrian frames, selecting a head frame, of which the distance between the central point and the central point of the pedestrian frame is smaller than a preset head threshold value and the central point is closest to the central point of the pedestrian frame, from the at least one head frame as the head frame corresponding to the pedestrian frame, and determining the central point of the head frame corresponding to the pedestrian frame as the head position corresponding to the pedestrian frame;

the detecting the shoulder of the pedestrian in the image to be processed to obtain the shoulder information comprises:

detecting shoulders of a pedestrian in the image to be processed to determine at least one left shoulder position and at least one right shoulder position;

for each of the at least part of the pedestrian frames, selecting, from the at least one left shoulder position, a left shoulder position whose distance from the center point of the pedestrian frame is less than a preset left shoulder threshold value and which is closest to that center point, as the left shoulder position corresponding to the pedestrian frame; and

selecting, from the at least one right shoulder position, a right shoulder position whose distance from the center point of the pedestrian frame is less than a preset right shoulder threshold value and which is closest to that center point, as the right shoulder position corresponding to the pedestrian frame.

3. The pedestrian detection method according to claim 1, wherein, for each of the at least part of the pedestrian frames, the head-shoulder part corresponding to the pedestrian frame is a head-shoulder triangle whose vertices are the head position, the left shoulder position, and the right shoulder position corresponding to the pedestrian frame.

4. The pedestrian detection method according to claim 3, wherein the pedestrian information further includes a pedestrian confidence for each of the plurality of pedestrian frames, the head information further includes a head confidence for each head position in one-to-one correspondence with the at least part of the pedestrian frames, and the shoulder information further includes a left shoulder confidence for each left shoulder position and a right shoulder confidence for each right shoulder position in one-to-one correspondence with the at least part of the pedestrian frames,

before the filtering pedestrian frames according to the overlapping condition to obtain the pedestrian detection result, the pedestrian detection method further comprises the following steps:

for each of the at least part of the pedestrian frames, calculating a triangle confidence of the head-shoulder triangle corresponding to the pedestrian frame according to the pedestrian confidence of the pedestrian frame, the head confidence of the head position corresponding to the pedestrian frame, the left shoulder confidence of the left shoulder position corresponding to the pedestrian frame, and the right shoulder confidence of the right shoulder position corresponding to the pedestrian frame;

the filtering the pedestrian frames according to the overlapping condition to obtain the pedestrian detection result comprises:

if, among the head-shoulder triangles corresponding one-to-one to the at least part of the pedestrian frames, there exists at least one group of head-shoulder triangles, wherein each group comprises a plurality of head-shoulder triangles whose overlap ratio is greater than a preset overlap threshold, selecting, for each group of the at least one group of head-shoulder triangles, all head-shoulder triangles other than the head-shoulder triangle with the highest triangle confidence; and

filtering out, from the plurality of pedestrian frames, the pedestrian frames corresponding to all the selected head-shoulder triangles to obtain the pedestrian detection result.

5. The pedestrian detection method according to claim 1,

before the detecting the pedestrian in the image to be processed to obtain the pedestrian information, the pedestrian detection method further comprises:

extracting features of the image to be processed using a first convolutional neural network;

the detecting the pedestrian in the image to be processed to obtain the pedestrian information comprises:

inputting the characteristics of the image to be processed into a second convolutional neural network to obtain the pedestrian information;

the detecting the head of the pedestrian in the image to be processed to obtain the head information comprises:

inputting the features of the image to be processed into a third convolutional neural network to obtain at least one head frame and a head confidence of each head frame; and

selecting, from the at least one head frame, head frames corresponding one-to-one to the at least part of the pedestrian frames to obtain the head information;

the detecting the shoulder of the pedestrian in the image to be processed to obtain the shoulder information comprises:

inputting the features of the image to be processed into a fourth convolutional neural network to obtain a left shoulder feature map and a right shoulder feature map, wherein the left shoulder feature map and the right shoulder feature map each have the same size as the image to be processed, the pixel value of each pixel in the left shoulder feature map represents the left shoulder confidence that the pixel at the same coordinates in the image to be processed belongs to the left shoulder, and the pixel value of each pixel in the right shoulder feature map represents the right shoulder confidence that the pixel at the same coordinates in the image to be processed belongs to the right shoulder; and

selecting, from the at least one left shoulder position indicated by the left shoulder feature map, left shoulder positions corresponding one-to-one to the at least part of the pedestrian frames, and selecting, from the at least one right shoulder position indicated by the right shoulder feature map, right shoulder positions corresponding one-to-one to the at least part of the pedestrian frames, to obtain the shoulder information.

6. The pedestrian detection method according to claim 5, wherein the pedestrian detection method further comprises:

acquiring a training image and annotation data, wherein the annotation data comprises a pedestrian frame, a head frame, a left shoulder position and a right shoulder position corresponding to each pedestrian in the training image;

constructing a first loss function by taking a pedestrian frame corresponding to each pedestrian in the training image as a target value of a pedestrian frame obtained by processing the training image by using the first convolutional neural network and the second convolutional neural network, constructing a second loss function by taking a head frame corresponding to each pedestrian in the training image as a target value of a head frame obtained by processing the training image by using the first convolutional neural network and the third convolutional neural network, and constructing a third loss function by taking a left shoulder position and a right shoulder position corresponding to each pedestrian in the training image as target values of a left shoulder position and a right shoulder position obtained by processing the training image by using the first convolutional neural network and the fourth convolutional neural network; and

training parameters in the first, second, third, and fourth convolutional neural networks with at least the first, second, and third loss functions.

7. The pedestrian detection method according to claim 6, wherein the annotation data further includes a face frame corresponding to each pedestrian in the training image;

the pedestrian detection method further includes:

constructing a fourth loss function by taking a face frame corresponding to each pedestrian in the training image as a target value of the face frame obtained by processing the training image by using the first convolutional neural network and a fifth convolutional neural network, wherein an output layer of the first convolutional neural network is connected with an input layer of the fifth convolutional neural network;

the training the parameters in the first, second, third, and fourth convolutional neural networks with at least the first, second, and third loss functions comprises:

training parameters in the first, second, third, fourth, and fifth convolutional neural networks using the first, second, third, and fourth loss functions.

8. A pedestrian detection apparatus comprising:

the image acquisition module is used for acquiring an image to be processed;

the first detection module is used for detecting pedestrians in the image to be processed to obtain pedestrian information, wherein the pedestrian information comprises a plurality of pedestrian frames;

the second detection module is used for detecting the head of the pedestrian in the image to be processed to obtain head information, wherein the head information comprises head positions in one-to-one correspondence with at least part of pedestrian frames in the plurality of pedestrian frames;

the third detection module is used for detecting the shoulders of the pedestrians in the image to be processed to obtain shoulder information, wherein the shoulder information comprises left shoulder positions and right shoulder positions which are in one-to-one correspondence with at least part of the pedestrian frames;

the overlapping calculation module is used for calculating the overlapping condition of the head and shoulder parts which correspond to the at least part of the pedestrian frames one by one according to the head position, the left shoulder position and the right shoulder position which correspond to the at least part of the pedestrian frames one by one; and

the filtering module is used for filtering the pedestrian frames according to the overlapping condition to obtain a pedestrian detection result.

9. The pedestrian detection device according to claim 8,

the second detection module includes:

the head detection submodule is used for detecting the head of a pedestrian in the image to be processed so as to obtain at least one head frame; and

the head frame selection submodule is used for selecting a head frame, of which the distance between a central point and the central point of the pedestrian frame is smaller than a preset head threshold value and the central point is closest to the central point of the pedestrian frame, from the at least one head frame as the head frame corresponding to the pedestrian frame for each of the at least part of pedestrian frames, and determining the central point of the head frame corresponding to the pedestrian frame as the head position corresponding to the pedestrian frame;

the third detection module includes:

the shoulder detection submodule is used for detecting the shoulders of the pedestrians in the image to be processed so as to determine at least one left shoulder position and at least one right shoulder position;

the left shoulder selection submodule is used for selecting, for each of the at least part of the pedestrian frames, from the at least one left shoulder position, a left shoulder position whose distance from the center point of the pedestrian frame is less than a preset left shoulder threshold value and which is closest to that center point, as the left shoulder position corresponding to the pedestrian frame; and

the right shoulder selection submodule is used for selecting, for each of the at least part of the pedestrian frames, from the at least one right shoulder position, a right shoulder position whose distance from the center point of the pedestrian frame is less than a preset right shoulder threshold value and which is closest to that center point, as the right shoulder position corresponding to the pedestrian frame.

10. The pedestrian detection device according to claim 8, wherein, for each of the at least part of the pedestrian frames, the head-shoulder part corresponding to the pedestrian frame is a head-shoulder triangle whose vertices are the head position, the left shoulder position, and the right shoulder position corresponding to the pedestrian frame.

11. The pedestrian detection apparatus according to claim 10, wherein the pedestrian information further includes a pedestrian confidence for each of the plurality of pedestrian frames, the head information further includes a head confidence for each head position in one-to-one correspondence with the at least part of the pedestrian frames, and the shoulder information further includes a left shoulder confidence for each left shoulder position and a right shoulder confidence for each right shoulder position in one-to-one correspondence with the at least part of the pedestrian frames,

the pedestrian detection device further includes:

the confidence calculation module is used for calculating, for each of the at least part of the pedestrian frames, a triangle confidence of the head-shoulder triangle corresponding to the pedestrian frame according to the pedestrian confidence of the pedestrian frame, the head confidence of the head position corresponding to the pedestrian frame, the left shoulder confidence of the left shoulder position corresponding to the pedestrian frame, and the right shoulder confidence of the right shoulder position corresponding to the pedestrian frame;

the filtering module includes:

a triangle selection submodule, configured to, if at least one group of head-shoulder triangles exists among the head-shoulder triangles corresponding one-to-one to the at least part of the pedestrian frames, select, for each group of the at least one group of head-shoulder triangles, all head-shoulder triangles other than the one with the highest triangle confidence, where each group includes a plurality of head-shoulder triangles whose overlap ratios are greater than a preset overlap threshold; and

a filtering submodule, configured to filter out, from the plurality of pedestrian frames, the pedestrian frames corresponding to all the selected head-shoulder triangles to obtain the pedestrian detection result.

12. The pedestrian detection device according to claim 8,

the pedestrian detection device further includes:

the feature extraction module is used for extracting features of the image to be processed using a first convolutional neural network before the first detection module detects pedestrians in the image to be processed to obtain the pedestrian information;

the first detection module includes:

the first input submodule is used for inputting the characteristics of the image to be processed into a second convolutional neural network so as to obtain the pedestrian information;

the second detection module includes:

a second input submodule, configured to input the features of the image to be processed into a third convolutional neural network to obtain at least one head frame and a head confidence of each head frame; and

a first selection submodule, configured to select, from the at least one head frame, head frames corresponding one-to-one to the at least part of the pedestrian frames to obtain the head information;

the third detection module includes:

a third input submodule, configured to input the features of the image to be processed into a fourth convolutional neural network to obtain a left shoulder feature map and a right shoulder feature map, where the left shoulder feature map and the right shoulder feature map each have the same size as the image to be processed, the pixel value of each pixel in the left shoulder feature map indicates the left shoulder confidence that the pixel at the same coordinates in the image to be processed belongs to a left shoulder, and the pixel value of each pixel in the right shoulder feature map indicates the right shoulder confidence that the pixel at the same coordinates in the image to be processed belongs to a right shoulder; and

a second selection submodule, configured to select, from the at least one left shoulder position indicated by the left shoulder feature map, left shoulder positions corresponding one-to-one to the at least part of the pedestrian frames, and to select, from the at least one right shoulder position indicated by the right shoulder feature map, right shoulder positions corresponding one-to-one to the at least part of the pedestrian frames, so as to obtain the shoulder information.

13. The pedestrian detection device according to claim 12, wherein the pedestrian detection device further comprises:

a training image acquisition module, configured to acquire a training image and annotation data, wherein the annotation data comprises a pedestrian frame, a head frame, a left shoulder position and a right shoulder position corresponding to each pedestrian in the training image;

a first constructing module, configured to construct a first loss function with a pedestrian frame corresponding to each pedestrian in the training image as a target value of a pedestrian frame obtained by processing the training image using the first convolutional neural network and the second convolutional neural network, construct a second loss function with a head frame corresponding to each pedestrian in the training image as a target value of a head frame obtained by processing the training image using the first convolutional neural network and the third convolutional neural network, and construct a third loss function with a left shoulder position and a right shoulder position corresponding to each pedestrian in the training image as target values of a left shoulder position and a right shoulder position obtained by processing the training image using the first convolutional neural network and the fourth convolutional neural network; and

a training module, configured to train parameters in the first convolutional neural network, the second convolutional neural network, the third convolutional neural network, and the fourth convolutional neural network with at least the first loss function, the second loss function, and the third loss function.

14. The pedestrian detection device according to claim 13, wherein the annotation data further includes a face frame corresponding to each pedestrian in the training image;

the pedestrian detection device further includes:

a second constructing module, configured to construct a fourth loss function with a face frame corresponding to each pedestrian in the training image as a target value of the face frame obtained by processing the training image using the first convolutional neural network and a fifth convolutional neural network, where an output layer of the first convolutional neural network is connected to an input layer of the fifth convolutional neural network;

the training module comprises:

a training sub-module, configured to train parameters in the first convolutional neural network, the second convolutional neural network, the third convolutional neural network, the fourth convolutional neural network, and the fifth convolutional neural network using the first loss function, the second loss function, the third loss function, and the fourth loss function.