CN111753625B - Pedestrian detection method, device, equipment and medium

Info

Publication number: CN111753625B
Application number: CN202010192213.4A
Authority: CN
Inventors: 马事伟; 吴江旭; 胡淼枫; 王璟璟; 聂铭君; 刘永文; 戚龙雨; 石金玉; 徐达炜; 张然; 赵旭民
Original assignee: Beijing Jingdong Century Trading Co Ltd; Beijing Wodong Tianjun Information Technology Co Ltd
Current assignee: Beijing Jingdong Century Trading Co Ltd; Beijing Wodong Tianjun Information Technology Co Ltd
Priority date: 2020-03-18
Filing date: 2020-03-18
Publication date: 2024-04-09
Anticipated expiration: 2040-03-18
Also published as: CN111753625A

Abstract

The embodiment of the invention discloses a pedestrian detection method, device, equipment and medium. The method comprises the following steps: acquiring an image to be detected; inputting the image to be detected into a trained single-shot object detector, and obtaining output information of the single-shot object detector; and determining a pedestrian detection result of the image to be detected according to the output information of the single-shot object detector. The single-shot object detector is obtained in advance by training an original detection model comprising an initially constructed single-shot object detector and a head detection network. Because pedestrian detection is performed with a single-shot object detector that was trained as part of an original detection model containing the detector and a head detection network, the method improves pedestrian detection precision while preserving the pedestrian detection speed of the single-shot object detector.

Description

Pedestrian detection method, device, equipment and medium

Technical Field

The embodiment of the invention relates to the field of target detection, in particular to a pedestrian detection method, a pedestrian detection device, a pedestrian detection equipment and a pedestrian detection medium.

Background

Pedestrian detection has many application scenarios in the field of computer vision, such as security monitoring, autonomous driving, and robotics. Current mainstream pedestrian detection methods are mostly based on deep learning, for example candidate-region-based object detectors such as Faster R-CNN, or single-shot object detectors such as SSD and YOLO. A candidate-region-based object detector consists of two parts: a Region Proposal Network (RPN) and a region-based convolutional network (Fast R-CNN). In use, the RPN coarsely extracts candidate regions of foreground frames, Fast R-CNN then fine-tunes the candidate regions, and the final object coordinates and object classification results are regressed. A single-shot object detector has no RPN and regresses the object coordinates and object classification results directly.

In the process of implementing the present invention, the inventors found at least the following technical problem in the prior art: the above methods achieve good results on standard pedestrian detection data, but do not achieve satisfactory results in occlusion scenes, such as intra-class occlusion (person-to-person occlusion) and inter-class occlusion (person-to-object occlusion). To improve pedestrian recognition precision in occlusion scenes, some optimization methods have been proposed for candidate-region-based object detectors. However, candidate-region-based object detectors are accurate but slow, whereas single-shot object detectors are fast. How to improve pedestrian detection precision while preserving the pedestrian detection speed of the single-shot object detector is therefore a technical problem to be solved urgently.

Disclosure of Invention

The embodiment of the invention provides a pedestrian detection method, device, equipment and medium, which improve pedestrian detection precision while preserving the pedestrian detection speed of a single-shot object detector.

In a first aspect, an embodiment of the present invention provides a pedestrian detection method, including:

acquiring an image to be detected;

inputting the image to be detected into a trained single-shot object detector, and obtaining output information of the single-shot object detector;

determining a pedestrian detection result of the image to be detected according to the output information of the single-shot object detector;

wherein the single-shot object detector is obtained in advance by training an original detection model comprising an initially constructed single-shot object detector and a head detection network.

In a second aspect, an embodiment of the present invention further provides a pedestrian detection apparatus, including:

the image acquisition module to be detected is used for acquiring an image to be detected;

the image pedestrian detection module is used for inputting the image to be detected into a trained single-shot object detector to acquire output information of the single-shot object detector, wherein the single-shot object detector is obtained in advance by training an original detection model comprising an initially constructed single-shot object detector and a head detection network;

and the detection result determining module is used for determining the pedestrian detection result of the image to be detected according to the output information of the single-shot object detector.

In a third aspect, an embodiment of the present invention further provides a computer apparatus, the apparatus including:

one or more processors;

a storage means for storing one or more programs;

the one or more programs, when executed by the one or more processors, cause the one or more processors to implement the pedestrian detection method as provided by any embodiment of the present invention.

In a fourth aspect, embodiments of the present invention also provide a computer-readable storage medium having stored thereon a computer program which, when executed by a processor, implements a pedestrian detection method as provided by any of the embodiments of the present invention.

In the embodiment of the invention, an image to be detected is acquired; the image to be detected is input into a trained single-shot object detector to obtain output information of the single-shot object detector; and a pedestrian detection result of the image to be detected is determined according to the output information of the single-shot object detector. The single-shot object detector is obtained in advance by training an original detection model comprising an initially constructed single-shot object detector and a head detection network. Because pedestrian detection uses a single-shot object detector trained in this way, pedestrian detection precision is improved while the pedestrian detection speed of the single-shot object detector is preserved.

Drawings

Fig. 1 is a flowchart of a pedestrian detection method according to a first embodiment of the present invention;

fig. 2 is a flowchart of a pedestrian detection method according to a second embodiment of the present invention;

fig. 3a is a flowchart of a pedestrian detection method according to a third embodiment of the present invention;

fig. 3b is a schematic diagram of a network architecture of an original detection model according to a third embodiment of the present invention;

fig. 4 is a schematic structural view of a pedestrian detection device according to a fourth embodiment of the present invention;

fig. 5 is a schematic structural diagram of a computer device according to a fifth embodiment of the present invention.

Detailed Description

The invention is described in further detail below with reference to the drawings and examples. It is to be understood that the specific embodiments described herein are merely illustrative of the invention and are not limiting thereof. It should be further noted that, for convenience of description, only some, but not all of the structures related to the present invention are shown in the drawings.

Example 1

Fig. 1 is a flowchart of a pedestrian detection method according to an embodiment of the invention. The present embodiment is applicable to the case when pedestrian detection is performed. The method may be performed by a pedestrian detection device, which may be implemented in software and/or hardware, e.g. which may be configured in a computer apparatus. As shown in fig. 1, the method includes:

S110, acquiring an image to be detected.

In this embodiment, the image to be detected may be an image in which pedestrian detection is required. The method for acquiring the image to be detected is not limited herein. Optionally, the video frame shot by the camera can be directly obtained as the image to be detected, and the existing video can be processed to obtain the image to be detected for pedestrian detection.

S120, inputting the image to be detected into a trained single-shot object detector to acquire output information of the single-shot object detector, wherein the single-shot object detector is obtained in advance by training an original detection model comprising an initially constructed single-shot object detector and a head detection network.

After the image to be detected is obtained, the pre-trained single-shot object detector is used to detect the image to be detected, and the output information of the single-shot object detector is obtained, so that the pedestrian detection result can be determined according to that output information. The single-shot object detector may be a Single Shot MultiBox Detector (SSD), YOLO (You Only Look Once), RetinaNet, or the like. Optionally, the output information of the single-shot object detector may be the pedestrian frames identified in the image to be detected and the scores of the pedestrian frames.

The single-shot object detector is obtained in advance by training an original detection model including an initially constructed single-shot object detector and a head detection network. To solve the technical problem in the prior art that the detection results of a single-shot object detector are inaccurate in occlusion scenes, the head detection network and the single-shot object detector are trained jointly. The detection result of the head detection network and the detection result of the single-shot object detector are combined to adjust the parameters of the single-shot object detector, so that these parameters are optimized and the recognition results of the trained single-shot object detector are more accurate.

S130, determining a pedestrian detection result of the image to be detected according to the output information of the single-shot object detector.

In this embodiment, after the single-shot object detector has processed the image to be detected and its output information has been obtained, the pedestrian detection result of the image to be detected is determined according to that output information. Optionally, the form of the pedestrian detection result may be set according to the detection requirement. If the detection requirement is the number of pedestrians contained in the image to be detected, the number of pedestrian frames output by the single-shot object detector is counted and used as the pedestrian detection result. If the detection requirement is the pedestrian positions in the image to be detected, the pedestrian positions are determined according to the positions of the pedestrian frames output by the single-shot object detector.
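The two detection requirements described above can be sketched as follows. This is a minimal illustration rather than the patented implementation: the function name `pedestrian_result`, the requirement strings `"count"` and `"positions"`, and the `(x_min, y_min, x_max, y_max)` frame layout are all assumptions introduced for the example.

```python
from typing import List, Tuple, Union

# Assumed frame layout: (x_min, y_min, x_max, y_max) in image pixels.
Box = Tuple[float, float, float, float]


def pedestrian_result(boxes: List[Box], requirement: str) -> Union[int, List[Box]]:
    """Turn the detector's pedestrian frames into the requested result."""
    if requirement == "count":
        # The number of pedestrians is the number of pedestrian frames.
        return len(boxes)
    if requirement == "positions":
        # Pedestrian positions are read directly from the frame coordinates.
        return list(boxes)
    raise ValueError(f"unknown detection requirement: {requirement!r}")


# Hypothetical detector output for one image.
boxes = [(10.0, 20.0, 50.0, 120.0), (60.0, 25.0, 95.0, 130.0)]
print(pedestrian_result(boxes, "count"))      # 2
print(pedestrian_result(boxes, "positions"))
```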

In the embodiment of the invention, an image to be detected is acquired; the image to be detected is input into a trained single-shot object detector to obtain output information of the single-shot object detector; and a pedestrian detection result of the image to be detected is determined according to the output information of the single-shot object detector. The single-shot object detector is obtained in advance by training an original detection model comprising an initially constructed single-shot object detector and a head detection network. Because pedestrian detection uses the single-shot object detector obtained from this jointly trained original detection model, pedestrian detection precision is improved while the pedestrian detection speed of the single-shot object detector is preserved.

Example 2

Fig. 2 is a flowchart of a pedestrian detection method according to a second embodiment of the present invention. On the basis of the above embodiment, this embodiment details the training of the single-shot object detector. As shown in fig. 2, the method includes:

S210, acquiring a sample image, a pedestrian frame labeling result corresponding to the sample image and a pedestrian head labeling result corresponding to the sample image.

In this embodiment, the sample image may be an image containing pedestrians, and preferably an image containing occluded pedestrians. The sample image is labeled manually: the pedestrian frames and the pedestrian heads in the sample image are labeled to obtain the pedestrian frame labeling result and the pedestrian head labeling result corresponding to the sample image.

S220, generating a training sample pair based on the sample image, the pedestrian frame labeling result corresponding to the sample image and the pedestrian head labeling result corresponding to the sample image, and training a pre-constructed original detection model by using the training sample pair to obtain a trained original detection model.

After the sample image is labeled, a training sample pair is generated based on the sample image, the pedestrian frame labeling result corresponding to the sample image and the pedestrian head labeling result corresponding to the sample image, and the pre-constructed original detection model is trained using the training sample pair to obtain a trained original detection model. The pre-constructed original detection model may comprise a single-shot object detector and at least one head detection network. The output of each original feature network layer in the single-shot object detector is connected to the input of the head detection network.

In one embodiment of the present invention, training the pre-constructed original detection model using the training sample pair to obtain a trained original detection model includes: inputting the sample image into the initially constructed single-shot object detector to obtain the original feature maps, the pedestrian frame detection results and the detection scores corresponding to the pedestrian frame detection results output by the single-shot object detector; sorting the pedestrian frame detection results according to the detection scores, and acquiring a preset number of target pedestrian frame detection results according to the sorting result; inputting the original feature maps and the target pedestrian frame detection results into the head detection network to obtain the pedestrian head detection result output by the head detection network; determining a first loss value according to the pedestrian frame detection results and the pedestrian frame labeling result, determining a second loss value according to the pedestrian head detection result and the pedestrian head labeling result, and determining a target loss value according to the first loss value and the second loss value; and training the original detection model with the goal of the target loss value reaching the convergence condition.

Specifically, training the pre-constructed original detection model using the training sample pair may proceed as follows. The sample image is input into the initially constructed single-shot object detector, and the pedestrian frame detection results output by the single-shot object detector and the detection scores corresponding to the pedestrian frame detection results are obtained. The pedestrian frame detection results are sorted in descending order of detection score, and the top preset number of pedestrian frame detection results are taken as the target pedestrian frame detection results. The target pedestrian frame detection results and the original feature maps output by the original feature network layers are input into the head detection network to obtain the pedestrian head detection result output by the head detection network. A first loss value corresponding to the pedestrian frame detection results is then calculated based on the set pedestrian frame loss function, the pedestrian frame detection results and the pedestrian frame labeling result; a second loss value corresponding to the pedestrian head detection result is calculated based on the set pedestrian head loss function, the pedestrian head detection result and the pedestrian head labeling result; and a target loss value is calculated according to the first loss value and the second loss value. When the target loss value does not meet the convergence condition, the parameters of the head detection network and the parameters of the single-shot object detector are adjusted, and the sample image is predicted again based on the adjusted parameters, until the target loss value meets the convergence condition and the trained original detection model is obtained. The pedestrian frame loss function and the pedestrian head loss function may be the same or different.
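The selection of target pedestrian frame detection results described above (sort by detection score in descending order, keep the top preset number) can be sketched as follows. The function name `select_top_boxes` and the frame layout are assumptions made for illustration.

```python
from typing import List, Tuple

# Assumed frame layout: (x_min, y_min, x_max, y_max).
Box = Tuple[float, float, float, float]


def select_top_boxes(boxes: List[Box], scores: List[float], preset_number: int) -> List[Box]:
    """Return the preset_number frames with the highest detection scores."""
    if len(boxes) != len(scores):
        raise ValueError("each pedestrian frame needs exactly one detection score")
    # Indices of the frames, ordered from highest to lowest score.
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return [boxes[i] for i in order[:preset_number]]


# Hypothetical detector output: three frames with their scores.
boxes = [(0.0, 0.0, 10.0, 20.0), (5.0, 5.0, 15.0, 30.0), (8.0, 2.0, 18.0, 25.0)]
scores = [0.2, 0.9, 0.5]
target_boxes = select_top_boxes(boxes, scores, preset_number=2)
print(target_boxes)
```

If fewer frames than `preset_number` are available, the slice simply returns all of them in score order.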

Optionally, the target loss value may be calculated from the first loss value and the second loss value by taking the sum of the first loss value and the second loss value as the target loss value. The target loss value may be considered to satisfy the convergence condition when the number of iterations reaches a set number, or when the difference between the target loss values of two adjacent iterations is smaller than a set threshold.
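The optional loss combination and convergence condition above can be written down directly. The sketch below assumes scalar loss values; the names `target_loss`, `has_converged`, `max_iterations` and `tolerance` are illustrative and do not come from the source.

```python
from typing import Optional


def target_loss(first_loss: float, second_loss: float) -> float:
    """Target loss = pedestrian frame loss + pedestrian head loss."""
    return first_loss + second_loss


def has_converged(iteration: int, max_iterations: int, current_loss: float,
                  previous_loss: Optional[float], tolerance: float) -> bool:
    """Converged when the iteration budget is reached or the loss stops changing."""
    if iteration >= max_iterations:
        return True
    if previous_loss is not None and abs(current_loss - previous_loss) < tolerance:
        return True
    return False


# Hypothetical loss values from two adjacent iterations.
previous = target_loss(0.40, 0.12)
current = target_loss(0.40, 0.1195)
print(has_converged(iteration=3, max_iterations=100, current_loss=current,
                    previous_loss=previous, tolerance=1e-3))
```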

Optionally, the head detection network may include an area extraction module and a head marking module, where the area extraction module is configured to extract a head feature map corresponding to the target pedestrian frame detection result from the original feature map, and the head marking module is configured to mark the head feature map to obtain the pedestrian head detection result.

On the basis of the above scheme, the single-shot object detector comprises a plurality of original feature network layers, an upsampling module is further arranged between a target feature network layer and the head detection network in the original detection model, and the method further comprises: selecting at least one original feature network layer as the target feature network layer according to the image size of the original feature map output by each original feature network layer; and adding an upsampling module after the target feature network layer, and adding the head detection network after the upsampling module.

The original feature maps output by some of the original feature network layers of the single-shot object detector are small, and extracting head detection features from such maps can lose pedestrian information and degrade the pedestrian head detection result. In this embodiment, a target feature network layer is therefore selected from the original feature network layers according to the image size of the original feature maps, an upsampling module is added after the target feature network layer, the head detection network is added after the upsampling module, and head detection is performed on the upsampled feature map, which improves the pedestrian head detection result.

Optionally, selecting at least one original feature network layer as the target feature network layer according to the image size of the original feature map output by each original feature network layer includes: taking the original feature network layer corresponding to an original feature map whose image size is smaller than a set threshold as the target feature network layer. In one embodiment, an image size threshold may be preset, and each original feature network layer whose original feature map has an image size smaller than the image size threshold is used as a target feature network layer. In this embodiment, the upsampling module is not limited, as long as it can upsample the original feature map into an upsampled feature map whose image size is not smaller than the set image size threshold. The upsampling module may be, for example, a transposed convolution module.
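A minimal sketch of this threshold-based selection follows. The source does not say how a two-dimensional image size is compared with the threshold, so comparing the smaller side is an assumption, as are the layer names, the example sizes of the two larger maps, and the threshold value.

```python
from typing import Dict, List, Tuple


def select_target_layers(feature_map_sizes: Dict[str, Tuple[int, int]],
                         size_threshold: int) -> List[str]:
    """Return the layers whose feature map is smaller than the threshold.

    Assumption: a map counts as "smaller" when its shorter side is below
    the threshold.
    """
    return [name for name, (height, width) in feature_map_sizes.items()
            if min(height, width) < size_threshold]


# Hypothetical (height, width) of the map produced by each feature layer.
sizes = {
    "feature_layer_1": (64, 64),
    "feature_layer_2": (32, 32),
    "feature_layer_3": (8, 8),
}
print(select_target_layers(sizes, size_threshold=16))  # ['feature_layer_3']
```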

In one embodiment of the present invention, before inputting the original feature maps and the target pedestrian frame detection results into the head detection network to obtain the pedestrian head detection result output by the head detection network, the method further includes: acquiring the target original feature map output by the target feature network layer, and inputting the target original feature map into the upsampling module to obtain the upsampled feature map output by the upsampling module. Correspondingly, inputting the original feature maps and the target pedestrian frame detection results into the head detection network to obtain the pedestrian head detection result output by the head detection network includes: inputting the original feature maps output by the original feature network layers other than the target feature network layer, the upsampled feature map, and the target pedestrian frame detection results into the head detection network to obtain the pedestrian head detection result output by the head detection network.

After the upsampling module is added between the target feature network layer and the head detection network, pedestrian head prediction proceeds as follows. The upsampling module upsamples the target original feature map output by the target feature network layer to obtain an upsampled feature map. The upsampled feature map and the original feature maps output by the original feature network layers other than the target feature network layer are then used as the input of the head detection network, which detects the pedestrian heads and outputs the pedestrian head detection result. Because the target original feature map is upsampled before it is passed to the head detection network, the pedestrian information in the original feature map is retained, and the effect of pedestrian information loss on pedestrian head detection is avoided.
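The way the head detection network inputs are assembled can be sketched as follows. Feature maps are represented as plain nested lists, and nearest-neighbour repetition stands in for the upsampling module purely to keep the example dependency-free; the source names transposed convolution as one possible upsampling module. All function and layer names are hypothetical.

```python
from typing import Callable, Dict, List

FeatureMap = List[List[float]]  # a single-channel map as rows of values


def nearest_upsample(fmap: FeatureMap, factor: int) -> FeatureMap:
    """Stand-in upsampler: repeat every value factor times in both directions."""
    out: FeatureMap = []
    for row in fmap:
        wide = [value for value in row for _ in range(factor)]
        out.extend([list(wide) for _ in range(factor)])
    return out


def assemble_head_inputs(feature_maps: Dict[str, FeatureMap],
                         target_layers: List[str],
                         upsample: Callable[[FeatureMap], FeatureMap]) -> Dict[str, FeatureMap]:
    """Upsample the maps of the target layers and pass the other maps through."""
    return {name: upsample(fmap) if name in target_layers else fmap
            for name, fmap in feature_maps.items()}


# Hypothetical maps: one large enough already, one that needs upsampling.
maps = {"feature_layer_1": [[1.0, 2.0], [3.0, 4.0]], "feature_layer_3": [[5.0]]}
head_inputs = assemble_head_inputs(maps, ["feature_layer_3"],
                                   lambda m: nearest_upsample(m, 2))
print(head_inputs["feature_layer_3"])  # [[5.0, 5.0], [5.0, 5.0]]
```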

S230, taking the single-shot object detector in the trained original detection model as the trained single-shot object detector.

In this embodiment, after the trained original detection model is obtained, the single-shot object detector in the trained original detection model is taken as the trained single-shot object detector and used for pedestrian detection. Only the single-shot object detector is run at detection time, and the head detection network is not, so the detection precision of the single-shot object detector is improved while its detection speed is preserved.

S240, acquiring an image to be detected.

S250, inputting the image to be detected into the trained single-shot object detector, and obtaining output information of the single-shot object detector.

S260, determining a pedestrian detection result of the image to be detected according to the output information of the single-shot object detector.

This embodiment details the training of the single-shot object detector. A sample image, a pedestrian frame labeling result corresponding to the sample image and a pedestrian head labeling result corresponding to the sample image are acquired; a training sample pair is generated from them, and the pre-constructed original detection model is trained using the training sample pair to obtain a trained original detection model; and the single-shot object detector in the trained original detection model is taken as the trained single-shot object detector. Adding pedestrian head features as training features improves the training accuracy of the single-shot object detector, and thereby the accuracy of its pedestrian detection results.

Example 3

Fig. 3a is a flowchart of a pedestrian detection method according to a third embodiment of the present invention. This embodiment provides a preferred embodiment on the basis of the above-described embodiments. As shown in fig. 3a, the method comprises:

S310, constructing an original detection model to be trained based on the single-shot object detector.

In this embodiment, head prediction is added on top of a single-shot object detector to obtain the constructed original detection model. During training, the whole original detection model predicts both the pedestrian frames and the pedestrian head marks, and the head detection task is used to improve pedestrian detection accuracy. The single-shot object detector may be a detector such as SSD, YOLO, or RetinaNet.

Fig. 3b is a schematic diagram of the network architecture of an original detection model according to a third embodiment of the present invention, and schematically illustrates an original detection model based on an SSD network. As shown in fig. 3b, the original detection model includes an SSD network 310, an upsampling module 320, and a head detection module 330.

The SSD network 310 includes an image input layer, a base network layer, a feature network layer, and a detection layer, and the feature network layer includes feature layer 1, feature layer 2, and feature layer 3. The input of the SSD network 310 is the image to be detected, and the output is the detected pedestrian frames and three feature maps. The input image is processed by the base network layer and the feature network layer to obtain feature maps of different scales, and the obtained feature maps are processed by the detection layer to obtain the prediction result of the pedestrian frames, which comprises the pedestrian frame coordinates and the scores of the pedestrian frames.

The upsampling module 320 includes a target feature network layer and an upsampling layer, where the target feature network layer is the feature network layer whose feature map has an image size smaller than the set image size threshold. The input of the upsampling module 320 is the small-scale feature map in the SSD, and the output is the upsampled feature map. The small-scale feature map (image size smaller than the set threshold), namely the feature map output by feature layer 3 in fig. 3b, is upsampled into a larger feature map by transposed convolution, which avoids the information loss that would occur if the head feature map were extracted from a small original feature map. For example, assuming that the original feature map has a size of 8×8, the upsampled feature map obtained by transposed convolution has a size of 64×64.
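The 8×8 to 64×64 example can be checked with the standard output-size relation for a transposed convolution with dilation 1: `out = (in - 1) * stride - 2 * padding + kernel_size + output_padding`. The source does not give kernel sizes or strides, so the two configurations below are assumptions chosen only because they reproduce the stated sizes.

```python
def transposed_conv_output_size(input_size: int, kernel_size: int, stride: int,
                                padding: int = 0, output_padding: int = 0) -> int:
    """Output side length of a transposed convolution with dilation 1."""
    return (input_size - 1) * stride - 2 * padding + kernel_size + output_padding


# Assumed option A: one layer with kernel 8 and stride 8.
print(transposed_conv_output_size(8, kernel_size=8, stride=8))  # 64

# Assumed option B: three stacked layers, each doubling the size
# (kernel 4, stride 2, padding 1).
size = 8
sizes = []
for _ in range(3):
    size = transposed_conv_output_size(size, kernel_size=4, stride=2, padding=1)
    sizes.append(size)
print(sizes)  # [16, 32, 64]
```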

The head detection module 330 includes an ROIAlign layer and a marking layer. The inputs of the head detection module 330 are feature map 1 output by feature layer 1 and feature map 2 output by feature layer 2 in the SSD, the pedestrian detection result, and the upsampled feature map 3 output by the upsampling layer; the output is the pedestrian head detection result. The pedestrian frames are first sorted by score, and the top 100 pedestrian frames are extracted. ROIAlign then extracts the head feature maps at the positions corresponding to the top 100 pedestrian frames and, through resizing and scaling, outputs them uniformly as head feature maps of a set size (for example 28×28). Finally, pedestrian head detection is performed on the head feature maps.
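The fixed-size extraction step can be illustrated with a simplified crop-and-resize that takes one bilinear sample at the centre of each output bin. This is not a full ROIAlign (which averages several samples per bin and handles channels and batches); it is a dependency-free sketch of the idea. The single-channel map, the frame given in feature-map coordinates, and all names are assumptions.

```python
from typing import List, Tuple

FeatureMap = List[List[float]]           # a single-channel map as rows of values
Box = Tuple[float, float, float, float]  # (x_min, y_min, x_max, y_max), feature-map coordinates


def bilinear(fmap: FeatureMap, y: float, x: float) -> float:
    """Bilinearly interpolate fmap at the continuous location (y, x)."""
    height, width = len(fmap), len(fmap[0])
    # Clamp the sampling point to the valid index range.
    y = min(max(y, 0.0), height - 1.0)
    x = min(max(x, 0.0), width - 1.0)
    y0, x0 = int(y), int(x)
    y1, x1 = min(y0 + 1, height - 1), min(x0 + 1, width - 1)
    dy, dx = y - y0, x - x0
    top = fmap[y0][x0] * (1 - dx) + fmap[y0][x1] * dx
    bottom = fmap[y1][x0] * (1 - dx) + fmap[y1][x1] * dx
    return top * (1 - dy) + bottom * dy


def crop_and_resize(fmap: FeatureMap, box: Box, output_size: int) -> FeatureMap:
    """Resample the region under box to an output_size x output_size map."""
    x_min, y_min, x_max, y_max = box
    bin_h = (y_max - y_min) / output_size
    bin_w = (x_max - x_min) / output_size
    return [[bilinear(fmap, y_min + (i + 0.5) * bin_h, x_min + (j + 0.5) * bin_w)
             for j in range(output_size)]
            for i in range(output_size)]


# Hypothetical 4x4 map whose value equals the column index.
ramp = [[float(x) for x in range(4)] for _ in range(4)]
head_map = crop_and_resize(ramp, (0.0, 0.0, 3.0, 3.0), output_size=28)
print(len(head_map), len(head_map[0]))  # 28 28
```

Every frame, whatever its size, yields a map of the same set size, which is what lets the marking layer process all selected frames uniformly.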

S320, acquiring sample data, and training the original detection model based on the acquired sample data to obtain a trained original detection model.

Specifically, after the sample data (sample images) are obtained, they are labeled to obtain training sample pairs. Each training sample pair contains the sample image, the pedestrian frame marks and the pedestrian head marks; the pedestrian frame marks serve as ground truth for pedestrian detection, and the pedestrian head marks serve as ground truth for pedestrian head marking. When the original detection model is trained, a sample image is input into the single-shot object detector, the prediction result of the pedestrian frames output by the single-shot object detector is obtained, and the pedestrian frame detection loss value is calculated. The pedestrian frames are then sorted by score to obtain the top 100 pedestrian frame detection results and the feature maps corresponding to them. The upsampling module then enlarges the small-scale feature map to obtain the upsampled feature map. ROIAlign extracts the head feature maps corresponding to the top 100 pedestrian frames from the original feature maps whose scale meets the set size requirement and from the upsampled feature map, the extracted head feature maps are marked to obtain the pedestrian head detection result, and the pedestrian head detection result is compared with the pedestrian head marks to obtain the pedestrian head detection loss value. The pedestrian frame detection loss value and the pedestrian head detection loss value are then added to obtain the overall loss, and the original detection model is trained with convergence of the overall loss as the goal to obtain the trained original detection model. Introducing head prediction into the single-shot object detector improves pedestrian detection accuracy in occlusion scenes.

S330, taking the single-shot object detector in the trained original detection model as the single-shot object detector to be tested, and testing the single-shot object detector to be tested.

After the trained original detection model is obtained, the single-shot object detector is extracted from it and taken as the single-shot object detector to be tested, and this detector is tested using test data to obtain a test result.

S340, after the single-shot object detector to be tested passes the test, using the single-shot object detector that passed the test for pedestrian detection.

When the single-shot object detector to be tested passes the test, it can be used directly for pedestrian detection. When the test fails, training data are acquired again and the original detection model is retrained, until the single-shot object detector in the trained original detection model passes the test.
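The train, test and retrain cycle of S320 to S340 can be sketched as a loop. The callables `train_original_model` and `passes_test`, the dictionary key `"detector"`, and the `max_rounds` safety limit are all hypothetical; the source simply repeats training until the test passes.

```python
from typing import Any, Callable, Dict


def train_until_detector_passes(train_original_model: Callable[[], Dict[str, Any]],
                                passes_test: Callable[[Any], bool],
                                max_rounds: int) -> Any:
    """Retrain the original detection model until its detector passes the test."""
    for _ in range(max_rounds):
        trained_model = train_original_model()
        # Only the single-shot detector is extracted; the head branch is dropped.
        detector = trained_model["detector"]
        if passes_test(detector):
            return detector
    raise RuntimeError("detector did not pass the test within max_rounds")
```

A caller would supply a training routine that acquires fresh training data on each call and a test routine that evaluates the extracted detector on held-out test data.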

According to the embodiment of the invention, a pedestrian head detection branch is added to the single target detector together with an up-sampling module, so that the small-scale feature maps are enlarged and can also participate in pedestrian head detection, which improves the accuracy of pedestrian detection based on the single target detector in occlusion scenes.
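A short numeric sketch may help show why enlarging a small-scale feature map matters for head detection. The stride of 32, the 16x16-pixel head region and the 2x up-sampling factor below are illustrative assumptions only, as are the function names.

```python
from typing import Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2)


def box_on_feature_map(box: Box, stride: int, upsample_factor: int = 1) -> Box:
    """Scale an image-space box into feature-map cell coordinates."""
    scale = upsample_factor / stride
    x1, y1, x2, y2 = box
    return (x1 * scale, y1 * scale, x2 * scale, y2 * scale)


def box_area(box: Box) -> float:
    """Area of a box, in whatever units its coordinates use."""
    x1, y1, x2, y2 = box
    return max(0.0, x2 - x1) * max(0.0, y2 - y1)


# A 16x16-pixel head region on a feature map with an assumed stride of 32.
head = (64.0, 32.0, 80.0, 48.0)
cells_before = box_area(box_on_feature_map(head, stride=32))
cells_after = box_area(box_on_feature_map(head, stride=32, upsample_factor=2))
```

Under these assumptions the head covers a quarter of one cell before up-sampling and one full cell afterwards, so the enlarged map offers four times as many cells for the head region.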

Example four

Fig. 4 is a schematic structural diagram of a pedestrian detection device according to a fourth embodiment of the present invention. The pedestrian detection device may be implemented in software and/or hardware; for example, the pedestrian detection device may be configured in a computer device. As shown in Fig. 4, the device includes an image to be detected acquisition module 410, an image pedestrian detection module 420, and a detection result determination module 430, wherein:

the image to be detected acquisition module 410 is configured to acquire an image to be detected;

the image pedestrian detection module 420 is configured to input the image to be detected into a trained single-time target detector, and obtain output information of the single-time target detector, where the single-time target detector is obtained by training an original detection model including an initially constructed single-time target detector and a head detection network in advance;

the detection result determining module 430 is configured to determine a pedestrian detection result of the image to be detected according to the output information of the single target detector.

According to the embodiment of the invention, the image to be detected acquisition module acquires the image to be detected; the image pedestrian detection module inputs the image to be detected into a trained single target detector and obtains output information of the single target detector; the detection result determining module determines a pedestrian detection result of the image to be detected according to the output information of the single target detector; the single-time target detector is obtained by training, in advance, an original detection model comprising an initially constructed single-time target detector and a head detection network. Because the original detection model, which comprises the initially constructed single-shot target detector and the head detection network, is trained first and the resulting single-shot target detector is then used for pedestrian detection, the pedestrian detection precision is improved while the pedestrian detection speed of the single-shot target detector is maintained.
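The cooperation of modules 410, 420 and 430 can be sketched as three consecutive steps. This is a minimal sketch, not the device itself: the callables `acquire` and `detector`, the (box, score) output format and the score threshold used to determine the final result are assumptions introduced for illustration, since the embodiment does not fix how the output information is turned into a detection result.

```python
from typing import Callable, List, Tuple

Box = Tuple[float, float, float, float]
Detection = Tuple[Box, float]  # (pedestrian frame, detection score), assumed format


def run_detection(
    acquire: Callable[[], object],
    detector: Callable[[object], List[Detection]],
    score_threshold: float = 0.5,  # assumed post-processing rule
) -> List[Detection]:
    """Acquire an image, run the trained detector, and determine the result."""
    image = acquire()         # module 410: acquire the image to be detected
    output = detector(image)  # module 420: obtain the detector's output information
    # module 430: determine the pedestrian detection result from the output
    return [(box, score) for box, score in output if score >= score_threshold]
```

Only the trained single-shot target detector is called here, which reflects that the head detection network is used during training rather than during detection.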

Optionally, on the basis of the above solution, the apparatus further includes a single target detector determining module configured to:

acquiring a sample image, a pedestrian frame labeling result corresponding to the sample image and a pedestrian head labeling result corresponding to the sample image;

generating a training sample pair based on the sample image, a pedestrian frame labeling result corresponding to the sample image and a pedestrian head labeling result corresponding to the sample image, and training a pre-constructed original detection model by using the training sample pair to obtain a trained original detection model;

and taking the single-time target detector in the trained original detection model as the trained single-time target detector.

Optionally, on the basis of the above solution, the single-shot object detector determining module includes:

the pedestrian frame detection unit is used for inputting the sample image into an initially constructed single-time target detector to obtain an original feature map output by the single-time target detector, a pedestrian frame detection result and a detection score corresponding to the pedestrian frame detection result;

the target pedestrian frame determining unit is used for sequencing the pedestrian frame detection results according to the detection scores and acquiring a preset number of target pedestrian frame detection results according to the sequencing results;

the pedestrian head detection unit is used for inputting the original feature map and the target pedestrian frame detection result into the head detection network to obtain a pedestrian head detection result output by the head detection network;

the loss value determining unit is used for determining a first loss value according to the pedestrian frame detection result and the pedestrian frame marking result, determining a second loss value according to the pedestrian head detection result and the pedestrian head marking result, and determining a target loss value according to the first loss value and the second loss value;

and the original detection model training unit is used for training the original detection model by taking the target loss value as a target for reaching a convergence condition.
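The ranking, loss and training units above can be illustrated by the following sketch. It is offered under stated assumptions: the embodiment does not define the convergence condition, so the tolerance and patience rule shown here is hypothetical, and `head_weight` is an assumed extra parameter whose default of 1.0 reproduces a plain sum of the first and second loss values.

```python
from typing import List, Sequence


def top_k_indices(scores: Sequence[float], k: int) -> List[int]:
    """Indices of the k highest detection scores, best first."""
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return order[:k]


def target_loss(first_loss: float, second_loss: float, head_weight: float = 1.0) -> float:
    """Combine the pedestrian frame loss and the pedestrian head loss."""
    return first_loss + head_weight * second_loss


def has_converged(loss_history: Sequence[float], tolerance: float = 1e-3, patience: int = 3) -> bool:
    """Assumed convergence condition: the last `patience` consecutive changes
    of the target loss are all smaller than `tolerance`."""
    if len(loss_history) < patience + 1:
        return False
    recent = loss_history[-(patience + 1):]
    return all(abs(recent[i + 1] - recent[i]) < tolerance for i in range(patience))
```

In use, the training unit would append each new target loss value to a history and stop once `has_converged` returns true; other stopping rules, such as a fixed number of iterations, would serve equally well.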

Optionally, on the basis of the above solution, the single target detector includes a plurality of original feature network layers, an upsampling module is further included between the target feature network layer and the head detection network in the original detection model, and the apparatus further includes an original detection model building module, configured to:

selecting at least one original characteristic network layer as a target characteristic network layer according to the image size of the original characteristic image output by each original characteristic network layer;

and adding an up-sampling module after the target characteristic network layer, and adding the head detection network after the up-sampling module.

Optionally, on the basis of the above solution, the original detection model building module is specifically configured to:

and taking an original characteristic network layer corresponding to the original characteristic diagram with the image size smaller than a set threshold value as the target characteristic network layer.
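The threshold-based selection of target feature network layers can be sketched as follows. The layer names, the feature map sizes and the choice to compare the shorter side against the threshold are assumptions made for this sketch; the embodiment only states that the image size is compared with a set threshold.

```python
from typing import Dict, List, Tuple


def select_target_layers(layer_sizes: Dict[str, Tuple[int, int]], threshold: int) -> List[str]:
    """Return the layers whose output feature map is smaller than the threshold.

    `layer_sizes` maps a layer name to the (height, width) of its feature map.
    """
    return [
        name
        for name, (height, width) in layer_sizes.items()
        if min(height, width) < threshold
    ]


# Hypothetical feature pyramid of a single-shot detector.
sizes = {"layer_1": (38, 38), "layer_2": (19, 19), "layer_3": (10, 10), "layer_4": (5, 5)}
targets = select_target_layers(sizes, threshold=16)
```

With these illustrative values, `layer_3` and `layer_4` are selected, so an up-sampling module would be added after each of them.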

Optionally, on the basis of the above solution, the single-shot object detector determining module further includes an upsampling unit configured to:

acquiring a target original feature map output by the target feature network layer, and inputting the target original feature map into the up-sampling module to obtain an up-sampling feature map output by the up-sampling module;

correspondingly, the pedestrian head detection unit is specifically configured to:

and inputting the original feature maps output by the original feature network layers other than the target feature network layer, the up-sampling feature map and the target pedestrian frame detection result into the head detection network to obtain a pedestrian head detection result output by the head detection network.
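The routing of feature maps described above can be sketched as follows: maps from target feature network layers pass through the up-sampling module, while the remaining maps are forwarded unchanged. The function names and the nearest-neighbour stand-in used as the up-sampler are assumptions for illustration only.

```python
from typing import Callable, Dict, List, Set

FeatureMap = List[List[float]]


def nearest_upsample_2x(fm: FeatureMap) -> FeatureMap:
    """Stand-in up-sampler (an assumption): repeat every cell twice in both directions."""
    out: FeatureMap = []
    for row in fm:
        wide = [value for value in row for _ in range(2)]
        out.append(wide)
        out.append(list(wide))
    return out


def build_head_inputs(
    feature_maps: Dict[str, FeatureMap],
    target_layers: Set[str],
    upsample: Callable[[FeatureMap], FeatureMap],
) -> Dict[str, FeatureMap]:
    """Up-sample the maps of target layers and pass the other maps through as-is."""
    return {
        name: upsample(fm) if name in target_layers else fm
        for name, fm in feature_maps.items()
    }
```

The resulting dictionary, together with the target pedestrian frame detection results, is what would be handed to the head detection network in this sketch.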

Optionally, on the basis of the above scheme, the upsampling module is a transposed convolution module.
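As a non-limiting illustration of how a transposed convolution enlarges a feature map, the following sketch implements the single-channel case without padding: every input cell scatters a scaled copy of the kernel into the output at a position determined by the stride. The function name, the stride of 2 and the example kernels are assumptions; in a trained model the kernel values would be learned.

```python
from typing import List, Sequence


def transposed_conv2d(
    x: Sequence[Sequence[float]],
    kernel: Sequence[Sequence[float]],
    stride: int = 2,
) -> List[List[float]]:
    """Single-channel transposed convolution with no padding.

    Output size per dimension: (input - 1) * stride + kernel.
    """
    in_h, in_w = len(x), len(x[0])
    k_h, k_w = len(kernel), len(kernel[0])
    out_h = (in_h - 1) * stride + k_h
    out_w = (in_w - 1) * stride + k_w
    out = [[0.0] * out_w for _ in range(out_h)]
    for i in range(in_h):
        for j in range(in_w):
            for a in range(k_h):
                for b in range(k_w):
                    # Each input cell adds a scaled copy of the kernel to the output.
                    out[i * stride + a][j * stride + b] += x[i][j] * kernel[a][b]
    return out
```

With a 2x2 kernel and a stride of 2 the copies do not overlap and a 2x2 map becomes a 4x4 map; with a larger kernel the copies overlap and their contributions are summed.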

The pedestrian detection device provided by the embodiment of the invention can execute the pedestrian detection method provided by any embodiment of the invention, and has the corresponding functional modules and beneficial effects of the execution method.

Example five

Fig. 5 is a schematic structural diagram of a computer device according to a fifth embodiment of the present invention. Fig. 5 illustrates a block diagram of an exemplary computer device 512 suitable for use in implementing embodiments of the present invention. The computer device 512 shown in fig. 5 is merely an example, and should not be construed as limiting the functionality and scope of use of embodiments of the present invention.

As shown in Fig. 5, computer device 512 is in the form of a general purpose computing device. Components of computer device 512 may include, but are not limited to: one or more processors 516, a system memory 528, and a bus 518 that connects the various system components (including the system memory 528 and the processor 516).

Bus 518 represents one or more of several types of bus structures, including a memory bus or memory controller, a peripheral bus, an accelerated graphics port, and a processor or local bus using any of a variety of bus architectures. By way of example, and not limitation, such architectures include the Industry Standard Architecture (ISA) bus, the Micro Channel Architecture (MCA) bus, the Enhanced ISA (EISA) bus, the Video Electronics Standards Association (VESA) local bus, and the Peripheral Component Interconnect (PCI) bus.

Computer device 512 typically includes a variety of computer system readable media. Such media can be any available media that is accessible by computer device 512 and includes both volatile and nonvolatile media, removable and non-removable media.

The system memory 528 may include computer system readable media in the form of volatile memory, such as Random Access Memory (RAM) 530 and/or cache memory 532. The computer device 512 may further include other removable/non-removable, volatile/nonvolatile computer system storage media. By way of example only, storage 534 may be used to read from or write to a non-removable, non-volatile magnetic media (not shown in FIG. 5, commonly referred to as a "hard disk drive"). Although not shown in fig. 5, a magnetic disk drive for reading from and writing to a removable non-volatile magnetic disk (e.g., a "floppy disk"), and an optical disk drive for reading from or writing to a removable non-volatile optical disk (e.g., a CD-ROM, DVD-ROM, or other optical media) may be provided. In such cases, each drive may be coupled to bus 518 through one or more data media interfaces. Memory 528 may include at least one program product having a set (e.g., at least one) of program modules configured to carry out the functions of embodiments of the invention.

A program/utility 540 having a set (at least one) of program modules 542 may be stored in, for example, memory 528, such program modules 542 including, but not limited to, an operating system, one or more application programs, other program modules, and program data, each or some combination of which may include an implementation of a network environment. Program modules 542 generally perform the functions and/or methods in the described embodiments of the invention.

The computer device 512 may also communicate with one or more external devices 514 (e.g., keyboard, pointing device, display 524, etc.), one or more devices that enable a user to interact with the computer device 512, and/or any devices (e.g., network card, modem, etc.) that enable the computer device 512 to communicate with one or more other computing devices. Such communication may occur through an input/output (I/O) interface 522. Also, the computer device 512 may communicate with one or more networks such as a Local Area Network (LAN), a Wide Area Network (WAN) and/or a public network, such as the Internet, through a network adapter 520. As shown, network adapter 520 communicates with other modules of computer device 512 via bus 518. It should be appreciated that although not shown, other hardware and/or software modules may be used in connection with computer device 512, including, but not limited to: microcode, device drivers, redundant processing units, external disk drive arrays, RAID systems, tape drives, data backup storage systems, and the like.

The processor 516 executes various functional applications and data processing by running programs stored in the system memory 528, for example, to implement a pedestrian detection method provided by an embodiment of the present invention, the method including:

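acquiring an image to be detected;

inputting the image to be detected into a trained single-time target detector, and obtaining output information of the single-time target detector;

determining a pedestrian detection result of the image to be detected according to the output information of the single-time target detector;

the single-time target detector is obtained by training an original detection model comprising an initially constructed single-time target detector and a head detection network in advance.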

Of course, those skilled in the art will understand that the processor may also implement the technical solution of the pedestrian detection method provided in any embodiment of the present invention.

Example six

The sixth embodiment of the present invention also provides a computer readable storage medium having stored thereon a computer program which when executed by a processor implements the pedestrian detection method provided by the embodiment of the present invention, the method comprising:

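acquiring an image to be detected;

inputting the image to be detected into a trained single-time target detector, and obtaining output information of the single-time target detector;

determining a pedestrian detection result of the image to be detected according to the output information of the single-time target detector;

the single-time target detector is obtained by training an original detection model comprising an initially constructed single-time target detector and a head detection network in advance.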

Of course, the computer program stored on the computer-readable storage medium provided by the embodiments of the present invention is not limited to the method operations described above, but may also perform the related operations of the pedestrian detection method provided by any of the embodiments of the present invention.

The computer storage media of embodiments of the invention may take the form of any combination of one or more computer-readable media. The computer readable medium may be a computer readable signal medium or a computer readable storage medium. The computer readable storage medium can be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or a combination of any of the foregoing. More specific examples (a non-exhaustive list) of the computer-readable storage medium would include the following: an electrical connection having one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In this document, a computer readable storage medium may be any tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device.

The computer readable signal medium may include a propagated data signal with computer readable program code embodied therein, either in baseband or as part of a carrier wave. Such a propagated data signal may take any of a variety of forms, including, but not limited to, electro-magnetic, optical, or any suitable combination of the foregoing. A computer readable signal medium may also be any computer readable medium that is not a computer readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device.

Program code embodied on a computer readable medium may be transmitted using any appropriate medium, including but not limited to wireless, wireline, optical fiber cable, RF, etc., or any suitable combination of the foregoing.

Computer program code for carrying out operations of the present invention may be written in any combination of one or more programming languages, including an object oriented programming language such as Java, Smalltalk or C++, and conventional procedural programming languages, such as the "C" programming language or similar programming languages. The program code may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer, or entirely on the remote computer or server. In the case of a remote computer, the remote computer may be connected to the user's computer through any kind of network, including a Local Area Network (LAN) or a Wide Area Network (WAN), or may be connected to an external computer (for example, through the Internet using an Internet service provider).

Note that the above is only a preferred embodiment of the present invention and the technical principle applied. It will be understood by those skilled in the art that the present invention is not limited to the particular embodiments described herein, but is capable of various obvious changes, rearrangements and substitutions as will now become apparent to those skilled in the art without departing from the scope of the invention. Therefore, while the invention has been described in connection with the above embodiments, the invention is not limited to the embodiments, but may be embodied in many other equivalent forms without departing from the spirit or scope of the invention, which is set forth in the following claims.

Claims

1. A pedestrian detection method, characterized by comprising:

acquiring an image to be detected;

the single-time target detector is obtained by training an original detection model comprising an initially constructed single-time target detector and a head detection network in advance;

wherein before inputting the image to be detected into the trained single-shot target detector, the method further comprises:

2. The method of claim 1, wherein training the pre-constructed original detection model using training samples to obtain a trained original detection model comprises:

inputting the sample image into an initially constructed single-time target detector, and obtaining an original feature image output by the single-time target detector, a pedestrian frame detection result and a detection score corresponding to the pedestrian frame detection result;

sequencing the pedestrian frame detection results according to the detection scores, and acquiring target pedestrian frame detection results with preset quantity according to the sequencing results;

inputting the original feature map and the target pedestrian frame detection result into the head detection network to obtain a pedestrian head detection result output by the head detection network;

determining a first loss value according to the pedestrian frame detection result and the pedestrian frame marking result, determining a second loss value according to the pedestrian head detection result and the pedestrian head marking result, and determining a target loss value according to the first loss value and the second loss value;

and training the original detection model by taking the target loss value reaching a convergence condition as a target.

3. The method of claim 2, wherein the single-shot object detector includes a plurality of original feature network layers, and an upsampling module is further included between a target feature network layer in the original detection model and the head detection network, the method further comprising:

4. A method according to claim 3, wherein selecting at least one of the original feature network layers as the target feature network layer according to the image size of the original feature map output by each of the original feature network layers comprises:

5. The method according to claim 4, further comprising, before inputting the original feature map and the target pedestrian frame detection result into the head detection network to obtain the pedestrian head detection result output by the head detection network:

correspondingly, the step of inputting the original feature map and the target pedestrian frame detection result into the head detection network to obtain a pedestrian head detection result output by the head detection network includes:

6. A method according to claim 3, wherein the upsampling module is a transposed convolution module.

7. A pedestrian detection apparatus characterized by comprising:

the detection result determining module is used for determining a pedestrian detection result of the image to be detected according to the output information of the single target detector;

the apparatus further comprises a single-shot object detector determination module for:

before inputting the image to be detected into a trained single target detector, acquiring a sample image, a pedestrian frame marking result corresponding to the sample image and a pedestrian head marking result corresponding to the sample image;

8. A computer device, the device comprising:

one or more processors;

a storage means for storing one or more programs;

the one or more programs, when executed by the one or more processors, cause the one or more processors to implement the pedestrian detection method of any one of claims 1-6.

9. A computer-readable storage medium, on which a computer program is stored, characterized in that the program, when executed by a processor, implements the pedestrian detection method as claimed in any one of claims 1 to 6.