CN116363701A

CN116363701A - Pedestrian detection method, device, equipment and storage medium

Info

Publication number: CN116363701A
Application number: CN202310342106.9A
Authority: CN
Inventors: 刘伟华; 左勇; 肖恒玉; 林超超; 罗艳; 周敏
Original assignee: Athena Eyes Co Ltd
Current assignee: Athena Eyes Co Ltd
Priority date: 2023-03-31
Filing date: 2023-03-31
Publication date: 2023-06-30

Abstract

The application discloses a pedestrian detection method, device, equipment and storage medium in the field of computer vision. The method comprises the following steps. First, an original pedestrian image is acquired, and a backbone network in a preset pedestrian detection network performs feature detection on it to obtain an initial feature image. Next, a crowd density estimation network in the preset pedestrian detection network determines a corresponding density feature image from the initial feature image. A pedestrian feature detection network in the preset pedestrian detection network then determines a corresponding pedestrian feature image from the density feature image, and an initial detection image is obtained from the pedestrian feature image. Finally, a preset false detection reduction module removes the bounding boxes in the initial detection image that satisfy a preset false detection condition, which yields a target detection result image. The method can improve the efficiency and performance of pedestrian detection, enhance the robustness of the network and reduce the false detection rate.

Description

Pedestrian detection method, device, equipment and storage medium

Technical Field

The present invention relates to the field of computer vision, and in particular, to a pedestrian detection method, device, apparatus, and storage medium.

Background

Pedestrian detection is a very important task in computer vision and is widely applied in autonomous driving, smart cities and other fields. When large crowds gather, the number of pedestrians detected in the crowded scene reflects the crowd density level. The situation can then be controlled in time through appropriate management, and safety warnings can be issued to avoid dangerous events such as crushes and stampedes.

However, existing pedestrian detection techniques for crowded scenes often use only pedestrian bounding boxes and discard other valuable pedestrian attributes. As a result, their performance is unsatisfactory and their robustness is poor. In addition, occlusion between pedestrians and the varying distance between pedestrians and the camera often cause false detections in the pedestrian detection network. Improving pedestrian detection performance and reducing the false detection rate is therefore an urgent problem.

Disclosure of Invention

Accordingly, the present invention provides a pedestrian detection method, apparatus, device and storage medium. They can improve the efficiency and performance of pedestrian detection, enhance the robustness of the network and reduce the false detection rate, thereby improving the pedestrian detection effect. The specific scheme is as follows:

In a first aspect, the present application provides a pedestrian detection method, including:

acquiring an original pedestrian image, and performing feature detection on the original pedestrian image by using a backbone network in a preset pedestrian detection network to obtain an initial feature image;

determining a corresponding density feature image by using a crowd density estimation network in the preset pedestrian detection network and the initial feature image, and determining a corresponding pedestrian feature image by using a pedestrian feature detection network in the preset pedestrian detection network and the density feature image, so as to obtain an initial detection image based on the pedestrian feature image; and

removing, by using a preset false detection reduction module, the bounding boxes in the initial detection image that satisfy a preset false detection condition, so as to obtain a target detection result image.

Optionally, the feature detection of the original pedestrian image by using a backbone network in a preset pedestrian detection network to obtain an initial feature image includes:

performing feature detection on the original pedestrian image by using a backbone network and a feature pyramid network in the preset pedestrian detection network, and processing the detected feature data by bilinear interpolation to obtain the initial feature image.

Optionally, the point annotations used by the crowd density estimation network are determined from the midpoint of the upper boundary of each annotated pedestrian bounding box.

Optionally, the determining the corresponding density feature image by using the crowd density estimation network in the preset pedestrian detection network and the initial feature image includes:

performing a preset density feature processing operation on the initial feature image by using the crowd density estimation network in the preset pedestrian detection network, so as to determine the corresponding density feature image. The preset density feature processing operation comprises a dimension reduction operation and an upsampling operation based on bilinear interpolation.

Optionally, the determining, by using the pedestrian feature detection network in the preset pedestrian detection network and the density feature image, a corresponding pedestrian feature image includes:

determining the corresponding pedestrian feature image based on a preset feature fusion strategy, by using the pedestrian feature detection network in the preset pedestrian detection network and the density feature image.

Optionally, removing, by using the preset false detection reduction module, the bounding boxes in the initial detection image that satisfy the preset false detection condition includes:

determining a false detection score for each bounding box in the initial detection image by using the preset false detection reduction module;

judging whether the false detection score is greater than a preset false detection score threshold;

if so, determining that the corresponding bounding box satisfies the preset false detection condition, and removing that bounding box from the initial detection image.

Optionally, determining the false detection score of each bounding box in the initial detection image by using the preset false detection reduction module includes:

determining a first false detection value for each bounding box in the initial detection image by using the preset false detection reduction module;

determining the false detection score of each bounding box from its first false detection value and a second false detection value, where the second false detection value is determined from the detection result output by the pedestrian feature detection network.

In a second aspect, the present application provides a pedestrian detection apparatus comprising:

an initial feature image acquisition module, configured to acquire an original pedestrian image and to perform feature detection on it by using a backbone network in a preset pedestrian detection network, so as to obtain an initial feature image;

an initial detection image determining module, configured to determine a corresponding density feature image by using a crowd density estimation network in the preset pedestrian detection network and the initial feature image, and to determine a corresponding pedestrian feature image by using a pedestrian feature detection network in the preset pedestrian detection network and the density feature image, so as to obtain an initial detection image based on the pedestrian feature image;

a false detection processing module, configured to remove, by using a preset false detection reduction module, the bounding boxes in the initial detection image that satisfy a preset false detection condition, so as to obtain a target detection result image.

In a third aspect, the present application provides an electronic device, including:

a memory for storing a computer program;

and a processor for executing the computer program to implement the pedestrian detection method described above.

In a fourth aspect, the present application provides a computer readable storage medium storing a computer program which when executed by a processor implements the aforementioned pedestrian detection method.

In the present application, an original pedestrian image is acquired, and the backbone network of the preset pedestrian detection network performs feature detection on it to obtain an initial feature image. The crowd density estimation network then determines a corresponding density feature image from the initial feature image. The pedestrian feature detection network determines a corresponding pedestrian feature image from the density feature image, and an initial detection image is obtained from the pedestrian feature image. Finally, the preset false detection reduction module removes the bounding boxes in the initial detection image that satisfy the preset false detection condition, which yields a target detection result image. In short, the scheme derives a density feature image from the original pedestrian image, uses it to determine the initial detection image, and then removes the bounding boxes that satisfy the preset false detection condition with the preset false detection reduction module.
Determining the initial detection image from the density feature image together with the pedestrian feature detection network increases the attention that the preset pedestrian detection network pays to head information in crowds. This lowers the miss rate of pedestrian detection in crowded scenes, improves the performance of the network and enhances its robustness. Removing the bounding boxes that satisfy the preset false detection condition reduces the number of false detection boxes and hence the false detection rate, which improves the overall pedestrian detection effect.

Drawings

In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings that are required to be used in the embodiments or the description of the prior art will be briefly described below, and it is obvious that the drawings in the following description are only embodiments of the present invention, and that other drawings can be obtained according to the provided drawings without inventive effort for a person skilled in the art.

Fig. 1 is a flowchart of a pedestrian detection method provided in the present application;

Fig. 2 is a schematic diagram of the framework design of a pedestrian detection system provided in the present application;

Fig. 3 is a schematic structural diagram of a pedestrian detection apparatus provided in the present application;

Fig. 4 is a block diagram of an electronic device provided in the present application.

Detailed Description

The following description of the embodiments of the present invention will be made clearly and completely with reference to the accompanying drawings, in which it is apparent that the embodiments described are only some embodiments of the present invention, but not all embodiments. All other embodiments, which can be made by those skilled in the art based on the embodiments of the invention without making any inventive effort, are intended to be within the scope of the invention.

At present, pedestrian detection techniques for crowded scenes perform poorly and lack robustness. Occlusion among pedestrians and the distance between pedestrians and the camera also often cause false detections in the pedestrian detection network. The application therefore discloses a pedestrian detection method that can improve the efficiency and performance of pedestrian detection, enhance the robustness of the network and reduce the false detection rate, thereby improving the pedestrian detection effect.

Referring to fig. 1, an embodiment of the present invention discloses a pedestrian detection method, including:

Step S11: acquire an original pedestrian image, and perform feature detection on it by using a backbone network in a preset pedestrian detection network to obtain an initial feature image.

In this embodiment, performing feature detection on the original pedestrian image with the backbone network of the preset pedestrian detection network to obtain the initial feature image may specifically include the following. An HRNetV2 backbone network and a feature pyramid network in the preset pedestrian detection network perform feature detection on the original pedestrian image, and the detected feature data are processed by bilinear interpolation to obtain the initial feature image.

Step S12: determine a corresponding density feature image by using the crowd density estimation network in the preset pedestrian detection network and the initial feature image, and determine a corresponding pedestrian feature image by using the pedestrian feature detection network in the preset pedestrian detection network and the density feature image, so as to obtain the initial detection image based on the pedestrian feature image.

In this embodiment, determining the corresponding density feature image with the crowd density estimation network and the initial feature image may specifically include the following. The crowd density estimation network in the preset pedestrian detection network performs a preset density feature processing operation on the initial feature image to determine the corresponding density feature image. The preset density feature processing operation comprises a dimension reduction operation and an upsampling operation based on bilinear interpolation.

It should be noted that determining the corresponding pedestrian feature image with the pedestrian feature detection network and the density feature image may specifically include determining it based on a preset feature fusion strategy, using the pedestrian feature detection network in the preset pedestrian detection network and the density feature image. Determining the pedestrian feature image through this fusion strategy can improve the performance of the pedestrian detection network. In addition, in the training stage the annotations required by the crowd density estimation network can be derived from the midpoint of the upper boundary of each annotated pedestrian bounding box. This avoids manual point annotation and reduces labor cost.
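The label-generation idea above can be sketched as follows. This is a minimal NumPy illustration under stated assumptions, not the applicant's implementation. Boxes are taken as (x1, y1, x2, y2) pixel coordinates with y pointing down, and each head point is the midpoint of the top edge. The resulting point map is smoothed with a Gaussian, in the spirit of the density map F(x) = H(x) * G_σ(x) defined later in this description. Each Gaussian is renormalised inside the image, which is an assumed design choice. The function names and the default σ are hypothetical.

```python
import numpy as np


def head_points_from_boxes(boxes):
    """Midpoint of the upper boundary of each box.

    boxes: (N, 4) array of (x1, y1, x2, y2) pixel coordinates with y pointing
    down, so y1 is the top edge (assumed box convention).
    """
    boxes = np.asarray(boxes, dtype=np.float64).reshape(-1, 4)
    xs = (boxes[:, 0] + boxes[:, 2]) / 2.0  # horizontal centre of the box
    ys = boxes[:, 1]                        # top edge of the box
    return np.stack([xs, ys], axis=1)


def density_map(points, height, width, sigma=4.0):
    """Approximate F(x) = H(x) * G_sigma(x) on a height x width grid.

    Each head point becomes a delta at its nearest pixel, which is spread into
    a Gaussian of standard deviation sigma. The Gaussian is renormalised inside
    the image so that every head contributes exactly 1 to the total count.
    """
    ys, xs = np.mgrid[0:height, 0:width]  # row and column index grids
    density = np.zeros((height, width), dtype=np.float64)
    for px, py in points:
        cx = int(np.clip(round(float(px)), 0, width - 1))
        cy = int(np.clip(round(float(py)), 0, height - 1))
        g = np.exp(-((xs - cx) ** 2 + (ys - cy) ** 2) / (2.0 * sigma ** 2))
        density += g / g.sum()
    return density


if __name__ == "__main__":
    pts = head_points_from_boxes([[10, 5, 30, 60], [40, 20, 50, 44]])
    print(pts.tolist())
    print(density_map(pts, 64, 64).sum())  # total mass equals the head count
```

Summing the density map recovers the number of annotated heads. This property is what makes such a map usable as a counting target.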

It should be noted that in the inference stage the crowd density estimation network may be omitted, while the pedestrian feature detection network is retained. In this way, the inference speed stays the same as that of the baseline pedestrian detector, and no additional overhead is introduced.

Step S13: remove, by using a preset false detection reduction module, the bounding boxes in the initial detection image that satisfy a preset false detection condition, so as to obtain a target detection result image.

In this embodiment, removing the bounding boxes that satisfy the preset false detection condition with the preset false detection reduction module may specifically include three steps. First, the preset false detection reduction module determines a false detection score for each bounding box in the initial detection image. Second, it is judged whether the false detection score is greater than a preset false detection score threshold. Third, if it is, the corresponding bounding box is deemed to satisfy the preset false detection condition and is removed from the initial detection image.

It can be understood that determining the false detection score of each bounding box with the preset false detection reduction module may specifically include two steps. First, the preset false detection reduction module determines a first false detection value for each bounding box. Second, the false detection score of each bounding box is determined from the first false detection value and a second false detection value, where the second false detection value is determined from the detection result output by the pedestrian feature detection network. In other words, the score combines the value produced by the false detection reduction module with the value derived from the detector's own output. This score is then used to remove the bounding boxes in the initial detection image that satisfy the preset false detection condition.
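The description does not state how the first and second false detection values are combined, nor the threshold value. The sketch below is purely illustrative and rests on three assumptions. The first value is taken as a false-positive probability produced by the reduction module. The second value is taken as one minus the detector's confidence. The two are combined by a weighted average. The class `Box`, the functions `false_detection_score` and `reject_false_detections`, and the default `weight` and `threshold` are all hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Box:
    """A detected bounding box; all field names are hypothetical."""
    x1: float
    y1: float
    x2: float
    y2: float
    det_conf: float  # confidence output by the pedestrian feature detection network
    fp_prob: float   # first false detection value, from the reduction module


def false_detection_score(box: Box, weight: float = 0.5) -> float:
    """Combine the two values; the weighted average is an assumption."""
    first = box.fp_prob
    second = 1.0 - box.det_conf  # assumed: low detector confidence -> likely false
    return weight * first + (1.0 - weight) * second


def reject_false_detections(boxes: List[Box], threshold: float = 0.5,
                            weight: float = 0.5) -> List[Box]:
    """Remove boxes whose false detection score exceeds the threshold."""
    return [b for b in boxes if false_detection_score(b, weight) <= threshold]
```

Keeping the combination rule in a single function makes it easy to swap in another rule, for example a product or a learned fusion, without touching the thresholding step.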

In this embodiment, an original pedestrian image is acquired, and the backbone network of the preset pedestrian detection network performs feature detection on it to obtain an initial feature image. The crowd density estimation network then determines a corresponding density feature image from the initial feature image. The pedestrian feature detection network determines a corresponding pedestrian feature image from the density feature image, and an initial detection image is obtained from the pedestrian feature image. Finally, the preset false detection reduction module removes the bounding boxes in the initial detection image that satisfy the preset false detection condition, which yields a target detection result image. In short, the scheme derives a density feature image from the original pedestrian image, uses it to determine the initial detection image, and then removes the bounding boxes that satisfy the preset false detection condition with the preset false detection reduction module.
Determining the initial detection image from the density feature image together with the pedestrian feature detection network increases the attention that the preset pedestrian detection network pays to head information in crowds. This lowers the miss rate of pedestrian detection in crowded scenes, improves the performance of the network and enhances its robustness. Removing the bounding boxes that satisfy the preset false detection condition reduces the number of false detection boxes and hence the false detection rate, which improves the overall pedestrian detection effect.

A specific embodiment of a pedestrian detection method disclosed in the present application will be described below with reference to a schematic frame design of a pedestrian detection system disclosed in fig. 2.

As shown in Fig. 2, an original pedestrian image containing multiple pedestrians is first acquired. To predict pedestrian positions and sizes accurately, high-resolution features carrying both semantic and position information are needed. To keep the network model lightweight, feature subgraphs can be taken from different stages of the preset pedestrian detection network and resampled to a common resolution by bilinear interpolation. A convolution module is then applied to obtain a single-scale feature map. This reduces memory cost and provides learnable parameters for the convolution operation.
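A minimal NumPy sketch of this single-scale aggregation follows. It resamples every stage to 1/4 of the input resolution with half-pixel bilinear interpolation, concatenates the stages along the channel axis, and applies a 1×1 convolution. Several details are illustrative assumptions: the four stages are placed at strides 4, 8, 16 and 32, and the channel counts, the output width and the random weights are chosen only to keep the example small. The helpers `bilinear_resize` and `fuse_stages` are hypothetical names.

```python
import numpy as np


def bilinear_resize(x, out_h, out_w):
    """Bilinear resize of a (C, H, W) array using half-pixel centres."""
    _, h, w = x.shape
    # Source coordinate of every output pixel centre, clamped to the input grid.
    ys = np.clip((np.arange(out_h) + 0.5) * h / out_h - 0.5, 0, h - 1)
    xs = np.clip((np.arange(out_w) + 0.5) * w / out_w - 0.5, 0, w - 1)
    y0 = np.floor(ys).astype(int)
    x0 = np.floor(xs).astype(int)
    y1 = np.minimum(y0 + 1, h - 1)
    x1 = np.minimum(x0 + 1, w - 1)
    wy = (ys - y0)[:, None]  # vertical weights, shape (out_h, 1)
    wx = (xs - x0)[None, :]  # horizontal weights, shape (1, out_w)
    top = x[:, y0][:, :, x0] * (1 - wx) + x[:, y0][:, :, x1] * wx
    bottom = x[:, y1][:, :, x0] * (1 - wx) + x[:, y1][:, :, x1] * wx
    return top * (1 - wy) + bottom * wy


def fuse_stages(stage_maps, out_hw, weight):
    """Resample all stages to out_hw, concatenate channels, apply a 1x1 conv.

    weight: (C_out, total stage channels). A 1x1 convolution is a per-pixel
    linear map over channels, written here with einsum.
    """
    out_h, out_w = out_hw
    upsampled = [bilinear_resize(f, out_h, out_w) for f in stage_maps]
    stacked = np.concatenate(upsampled, axis=0)
    return np.einsum("oc,chw->ohw", weight, stacked)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Four stages of a 128x128 input at strides 4, 8, 16 and 32 (illustrative).
    stages = [rng.standard_normal((c, 128 // s, 128 // s))
              for c, s in [(18, 4), (36, 8), (72, 16), (144, 32)]]
    weight = rng.standard_normal((64, 18 + 36 + 72 + 144)) * 0.01
    print(fuse_stages(stages, (32, 32), weight).shape)  # (64, 32, 32)
```

A deep-learning framework would normally supply the resize and the convolution. NumPy is used here only so that the sketch has no framework dependency.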

It should be noted that the preset pedestrian detection network of the present application includes a pedestrian feature detection network and a crowd density estimation network. Once the initial feature image has been acquired, the crowd density estimation network can analyze the crowd density of the original pedestrian image. In this embodiment, the initial feature image includes four initial feature subgraphs. The crowd density estimation network determines the density feature of the current stage from the initial feature subgraphs of several stages, thereby obtaining the density feature image. For example, aggregating from the fourth stage downwards, the output density feature is defined as:

$$\phi_{\mathrm{density}} = f_2\big(\phi_2 + f_3(\phi_3 + f_4(\phi_4))\big)$$

where φ₂, φ₃ and φ₄ denote the initial feature subgraphs of the 2nd, 3rd and 4th stages, respectively. f denotes the dimension reduction and bilinear-interpolation upsampling operation of the corresponding stage. Specifically, taking the original pedestrian image as input, the preset pedestrian detection network generates several initial feature subgraphs with different resolutions and channel dimensions, which together form the initial feature image. In this embodiment, a 1×1 convolution layer reduces the channel dimension of the 4th-stage initial feature subgraph from 1024 to 256. The result is upsampled by bilinear interpolation and added pixel-wise to the shallower stage. This repeats until the resolution reaches 1/4 of that of the original pedestrian image. If a head point is annotated at pixel x_i, it can be expressed as a delta function δ(x − x_i), so an image with N annotated points can be represented as:

$$H(x) = \sum_{i=1}^{N} \delta(x - x_i)$$

Because H(x) is a discrete function, it is converted into a continuous density function, which is easier for a neural network to approximate. H(x) is therefore convolved with a Gaussian kernel G_σ(x), and the crowd density map is defined as follows:

$$F(x) = H(x) * G_\sigma(x)$$
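The top-down aggregation φ_density = f₂(φ₂ + f₃(φ₃ + f₄(φ₄))) described in this section can be illustrated with the NumPy sketch below. The sketch makes three assumptions. First, each f_k is taken as a 1×1 convolution (channel reduction) followed by 2× bilinear upsampling. Second, φ₂ and φ₃ are assumed to already have 256 channels, so that the pixel-wise additions are well defined. Third, stages 2, 3 and 4 are assumed to sit at 1/8, 1/16 and 1/32 of the input resolution, so the result lands at 1/4. The weights are placeholders, and `f_stage` and `aggregate_density` are hypothetical names.

```python
import numpy as np


def bilinear_resize(x, out_h, out_w):
    """Bilinear resize of a (C, H, W) array using half-pixel centres."""
    _, h, w = x.shape
    ys = np.clip((np.arange(out_h) + 0.5) * h / out_h - 0.5, 0, h - 1)
    xs = np.clip((np.arange(out_w) + 0.5) * w / out_w - 0.5, 0, w - 1)
    y0, x0 = np.floor(ys).astype(int), np.floor(xs).astype(int)
    y1, x1 = np.minimum(y0 + 1, h - 1), np.minimum(x0 + 1, w - 1)
    wy, wx = (ys - y0)[:, None], (xs - x0)[None, :]
    top = x[:, y0][:, :, x0] * (1 - wx) + x[:, y0][:, :, x1] * wx
    bottom = x[:, y1][:, :, x0] * (1 - wx) + x[:, y1][:, :, x1] * wx
    return top * (1 - wy) + bottom * wy


def f_stage(x, kernel):
    """Assumed f_k: 1x1 convolution (channel reduction), then 2x bilinear upsampling."""
    reduced = np.einsum("oc,chw->ohw", kernel, x)  # kernel: (C_out, C_in)
    _, h, w = reduced.shape
    return bilinear_resize(reduced, 2 * h, 2 * w)


def aggregate_density(phi2, phi3, phi4, k2, k3, k4):
    """phi_density = f2(phi2 + f3(phi3 + f4(phi4))), evaluated deepest stage first."""
    x = f_stage(phi4, k4)         # stage 4: reduce channels, upsample to stage-3 size
    x = f_stage(phi3 + x, k3)     # add stage 3, upsample to stage-2 size
    return f_stage(phi2 + x, k2)  # add stage 2, upsample to 1/4 resolution
```

For a 128×128 input under these assumptions, φ₄ of shape (1024, 4, 4), φ₃ of shape (256, 8, 8) and φ₂ of shape (256, 16, 16) produce a (256, 32, 32) density feature.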

Correspondingly, the pedestrian feature detection network in the preset pedestrian detection network determines the pedestrian feature image from the initial feature image and the corresponding density feature image. The initial detection image, which contains multiple bounding boxes, is then determined from the resulting pedestrian feature image. The pedestrian feature detection network comprises a 3×3 convolution layer and two prediction layers: one predicts the center position and the other predicts the corresponding scale. Predicting center and scale maps removes the need for predefined bounding box sizes. The architecture can also be fine-tuned with different loss settings so that it converges better and localizes pedestrians accurately. The center loss of the pedestrian feature detection network can be expressed as:

where K is the number of annotated boxes, and p_ij and y_ij are the predicted center probability and the ground-truth label at location (i, j), respectively. CE(p_ij, y_ij) denotes the cross-entropy loss, and α_ij is the weight of location (i, j). Because the exact center is difficult to specify, M_ij is a Gaussian-based penalty for pixels around a center. The focal weights p_ij^γ and (1 − p_ij)^γ, which are based on prediction confidence, reduce the contribution of easy samples to the loss and help the optimizer focus on hard samples. The term (1 − M_ij)^β reduces the loss of negative locations (false detections) lying close to a true center. The pedestrian feature subgraph of each stage is then determined from the data processed in this way.
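Only the terms of the center loss are defined above. The sketch below therefore assumes the focal-style form used by center-and-scale pedestrian detectors such as CSP, which is consistent with those terms. For positives it uses α_ij = 1 with the weight (1 − p_ij)^γ. For negatives it uses α_ij = (1 − M_ij)^β with the weight p_ij^γ. The sum is normalized by K. The defaults γ = 2 and β = 4 are assumed values commonly paired with this loss, and `center_loss` is a hypothetical name.

```python
import numpy as np


def center_loss(p, y, m, gamma=2.0, beta=4.0, eps=1e-12):
    """Assumed focal-style center loss over an (H, W) map.

    p: predicted center probabilities p_ij
    y: ground-truth labels y_ij (1 at annotated centers, else 0)
    m: Gaussian penalty map M_ij (peaks at 1 on each center)
    """
    p = np.clip(p, eps, 1.0 - eps)                   # avoid log(0)
    pos = y == 1
    k = max(int(pos.sum()), 1)                       # K: number of annotated boxes
    ce = -np.where(pos, np.log(p), np.log(1.0 - p))  # CE(p_ij, y_ij)
    focal = np.where(pos, (1.0 - p) ** gamma, p ** gamma)
    alpha = np.where(pos, 1.0, (1.0 - m) ** beta)    # alpha_ij
    return float((alpha * focal * ce).sum() / k)
```

Negatives near a center have M_ij close to 1. Their weight (1 − M_ij)^β therefore almost vanishes, which is exactly why such false positives are penalized only weakly.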

In this embodiment, the pedestrian feature image output by the pedestrian feature detection network includes pedestrian feature subgraphs from several stages. Correspondingly, the density feature image output by the crowd density estimation network includes density feature subgraphs from several stages. The pedestrian feature subgraph and the density feature subgraph of the same stage can be fused according to the preset feature fusion strategy, and the fused features are used for pedestrian detection to obtain a more accurate result. The preset feature fusion strategy can be expressed as:

where D_i and P_i denote the feature subgraphs output at the i-th stage by the crowd density estimation network and the pedestrian feature detection network, respectively. f_i^θ is a 3×3 convolution that transforms the crowd density estimation features into features usable for pedestrian detection. The concatenation operator joins the different features, and f_i^ω is a 1×1 convolution.
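Only the components of the fusion strategy are defined above. The NumPy sketch below assumes one plausible arrangement of them. The density feature D_i is transformed by the 3×3 convolution f_i^θ, concatenated with P_i along the channel axis, and then mixed by the 1×1 convolution f_i^ω. The ordering, channel counts and zero padding are assumptions for illustration. `conv3x3`, `conv1x1` and `fuse_stage_features` are hypothetical names.

```python
import numpy as np


def conv3x3(x, kernel):
    """3x3 convolution (cross-correlation, as in CNN layers), stride 1, zero padding 1.

    x: (C_in, H, W); kernel: (C_out, C_in, 3, 3).
    """
    _, h, w = x.shape
    xp = np.pad(x, ((0, 0), (1, 1), (1, 1)))
    out = np.zeros((kernel.shape[0], h, w))
    for dy in range(3):
        for dx in range(3):
            # Accumulate the contribution of one kernel tap over all pixels.
            out += np.einsum("oc,chw->ohw", kernel[:, :, dy, dx],
                             xp[:, dy:dy + h, dx:dx + w])
    return out


def conv1x1(x, kernel):
    """1x1 convolution: a per-pixel linear map over channels, kernel (C_out, C_in)."""
    return np.einsum("oc,chw->ohw", kernel, x)


def fuse_stage_features(p_i, d_i, theta, omega):
    """Assumed fusion: f_i^omega(concat(P_i, f_i^theta(D_i)))."""
    transformed = conv3x3(d_i, theta)                    # f_i^theta(D_i)
    stacked = np.concatenate([p_i, transformed], axis=0)  # concatenation
    return conv1x1(stacked, omega)                        # f_i^omega(...)
```

With P_i of 8 channels and θ mapping 4 density channels to 8, the concatenation has 16 channels, so ω has shape (8, 16) and the fused map keeps P_i's shape.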

Because the pedestrian feature detection network uses the focal loss as its center loss, false positives (non-pedestrians identified as pedestrians) near a positive center are not penalized sufficiently. Non-maximum suppression (NMS) removes most of these false positives. An additional suppression step is still needed for the remaining ones, i.e., predictions whose IoU (Intersection over Union) with any positive is below 0.5, in order to reduce the false detection rate. In this embodiment, the preset false detection reduction module may therefore consist of an ROI Align layer, convolution layers and a dense layer. It is trained in a separate setting so that its gradients do not flow back into the feature maps or the detection head, yielding a simple, lightweight and effective false detection reduction module. After the initial detection image is determined, the module handles its false positives by removing the bounding boxes that satisfy the preset false detection condition. This further refines the detection result and yields the target detection result image.
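To make the notion of "remaining false positives" concrete, the sketch below first applies greedy NMS. It then marks each surviving box as a false positive when its best IoU with any ground-truth box is below 0.5, which is one way the training targets for such a module could be formed. The sketch covers only this box bookkeeping. The ROI Align, convolution and dense layers are not shown, and neither is the separate training that keeps gradients out of the backbone (in a framework this would typically be done by stopping gradients on the input features). The NMS threshold and the helper names `iou_matrix`, `nms` and `false_positive_labels` are assumptions.

```python
import numpy as np


def iou_matrix(a, b):
    """Pairwise IoU between (N, 4) and (M, 4) boxes in (x1, y1, x2, y2) form."""
    a = np.asarray(a, dtype=np.float64).reshape(-1, 4)
    b = np.asarray(b, dtype=np.float64).reshape(-1, 4)
    x1 = np.maximum(a[:, None, 0], b[None, :, 0])
    y1 = np.maximum(a[:, None, 1], b[None, :, 1])
    x2 = np.minimum(a[:, None, 2], b[None, :, 2])
    y2 = np.minimum(a[:, None, 3], b[None, :, 3])
    inter = np.clip(x2 - x1, 0, None) * np.clip(y2 - y1, 0, None)
    area_a = (a[:, 2] - a[:, 0]) * (a[:, 3] - a[:, 1])
    area_b = (b[:, 2] - b[:, 0]) * (b[:, 3] - b[:, 1])
    union = area_a[:, None] + area_b[None, :] - inter
    return inter / np.maximum(union, 1e-12)


def nms(boxes, scores, iou_thr=0.5):
    """Greedy NMS: keep the best box, drop boxes overlapping it above iou_thr."""
    boxes = np.asarray(boxes, dtype=np.float64)
    order = np.argsort(-np.asarray(scores, dtype=np.float64))
    keep = []
    while order.size > 0:
        i = order[0]
        keep.append(int(i))
        if order.size == 1:
            break
        ious = iou_matrix(boxes[i], boxes[order[1:]])[0]
        order = order[1:][ious <= iou_thr]
    return keep


def false_positive_labels(pred, gt, thr=0.5):
    """True where a prediction's best IoU with the ground truth is below thr."""
    if len(gt) == 0:
        return np.ones(len(pred), dtype=bool)
    return iou_matrix(pred, gt).max(axis=1) < thr
```

Two heavily overlapping predictions collapse to one under NMS. A surviving box far from every annotated pedestrian is labeled a false positive, which is the kind of case the reduction module is meant to reject.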

In this way, during pedestrian detection the embodiment can exploit the head distribution information contained in the density feature image output by the crowd density estimation network to count heads efficiently in dense scenes. It also strengthens the ability of the preset pedestrian detection network to recognize both large-scale and small-scale pedestrians when the original pedestrian image has a cluttered background and large variation in pedestrian scale, which improves crowd counting accuracy. Using the crowd density attribute to assist pedestrian detection in real scenes, such as crowded scenes and scenes with small-scale pedestrians, makes the network pay more attention to human heads and small-scale pedestrians. This improves the feature representation and detection of pedestrians that are heavily occluded or far from the camera and significantly improves the performance of the pedestrian detection method. It also avoids extra annotation in the training stage and extra computation in the inference stage. The density feature image output by the crowd density estimation network can act as noise for the pedestrian feature detection network. This forces the shallow pedestrian detection layers to pay more attention to the upper boundary of the detection box, which reduces the many false detection boxes near the upper boundary and enhances the robustness of the network. Meanwhile, processing the initial detection image with the preset false detection reduction module improves the pedestrian detection capability of the preset pedestrian detection network, lowers the false detection incidence, increases the number of true positives and reduces the number of false detection boxes.
The pedestrian feature detection network and the crowd density estimation network share the same backbone network, so informative features can be extracted effectively and the two networks reinforce each other. A better pedestrian feature detection network yields better shallow feature representations, which improves the crowd density estimation network, while feature fusion in turn improves the pedestrian feature detection network.

Referring to fig. 3, the present application discloses a pedestrian detection apparatus including:

the initial feature image acquisition module 11 is configured to acquire an original pedestrian image, and perform feature detection on the original pedestrian image by using a backbone network in a preset pedestrian detection network to obtain an initial feature image;

an initial detection image determining module 12, configured to determine a corresponding density feature image using the crowd density estimation network in the preset pedestrian detection network and the initial feature image, and to determine a corresponding pedestrian feature image using the pedestrian feature detection network in the preset pedestrian detection network and the density feature image, so as to obtain an initial detection image based on the pedestrian feature image;

and a false detection processing module 13, configured to remove, by using the preset false detection reduction module, the bounding boxes in the initial detection image that satisfy the preset false detection condition, so as to obtain a target detection result image.

In some specific embodiments, the initial feature image acquiring module 11 may specifically include:

the initial feature image acquisition unit, configured to perform feature detection on the original pedestrian image using a backbone network and a feature pyramid network in the preset pedestrian detection network, and to process the detected feature data by bilinear interpolation to obtain an initial feature image.

In some specific embodiments, the initial detection image determining module 12 may specifically include:

a density feature image determining unit, configured to perform a preset density feature processing operation on the initial feature image using the crowd density estimation network in the preset pedestrian detection network, so as to determine a corresponding density feature image, where the preset density feature processing operation includes a dimension reduction operation and an up-sampling operation based on bilinear interpolation;

a pedestrian feature image determining unit, configured to determine a corresponding pedestrian feature image based on a preset feature fusion strategy, using the pedestrian feature detection network in the preset pedestrian detection network and the density feature image.
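The two units of module 12 can be sketched as follows, under stated assumptions. The dimension reduction is modelled as a 1x1 convolution down to a single density channel (a per-pixel weighted sum over channels), the up-sampling as separable bilinear interpolation, and, because the application does not disclose the preset feature fusion strategy, the fusion is a hypothetical rule that scales each pedestrian feature channel by `1 + alpha * density`. All function names, weights and `alpha` are illustrative.

```python
from typing import List, Sequence

Channel = List[List[float]]  # one H x W channel as nested lists


def reduce_channels(features: Sequence[Channel], weights: Sequence[float],
                    bias: float = 0.0) -> Channel:
    """Dimension reduction as a 1x1 convolution to one channel: per-pixel weighted sum."""
    if len(weights) != len(features):
        raise ValueError("need one weight per input channel")
    h, w = len(features[0]), len(features[0][0])
    return [[bias + sum(wc * ch[i][j] for wc, ch in zip(weights, features)) for j in range(w)]
            for i in range(h)]


def _lerp_1d(values: Sequence[float], out_len: int) -> List[float]:
    """1-D linear interpolation with half-pixel centres (align_corners=False convention)."""
    n = len(values)
    out = []
    for k in range(out_len):
        pos = min(max((k + 0.5) * n / out_len - 0.5, 0.0), n - 1)
        lo = int(pos)
        hi = min(lo + 1, n - 1)
        t = pos - lo
        out.append(values[lo] * (1.0 - t) + values[hi] * t)
    return out


def upsample_bilinear(channel: Channel, factor: int) -> Channel:
    """Bilinear up-sampling done separably: interpolate each row, then each column."""
    rows = [_lerp_1d(row, len(row) * factor) for row in channel]
    cols = [_lerp_1d(col, len(channel) * factor) for col in zip(*rows)]
    return [list(r) for r in zip(*cols)]


def fuse_with_density(pedestrian_features: Sequence[Channel], density: Channel,
                      alpha: float = 1.0) -> List[Channel]:
    """Hypothetical fusion: scale each feature channel by (1 + alpha * density) per pixel."""
    h, w = len(density), len(density[0])
    if any(len(ch) != h or len(ch[0]) != w for ch in pedestrian_features):
        raise ValueError("density map must match the feature resolution")
    return [[[ch[i][j] * (1.0 + alpha * density[i][j]) for j in range(w)] for i in range(h)]
            for ch in pedestrian_features]


initial = [[[1.0, 2.0], [3.0, 4.0]], [[0.5, 0.5], [0.5, 0.5]]]  # toy 2-channel, 2 x 2
density = upsample_bilinear(reduce_channels(initial, [0.1, 0.2]), factor=2)  # 4 x 4 map
pedestrian = [[[1.0] * 4 for _ in range(4)]]  # toy 1-channel, 4 x 4 features
fused = fuse_with_density(pedestrian, density)
```

Bilinear interpolation is separable, so interpolating rows and then columns gives the same result as the two-dimensional formula while keeping the helper short. With `alpha = 0` the fusion leaves the pedestrian features unchanged.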

In some specific embodiments, the false detection processing module 13 may specifically include:

a false detection score determining sub-module, configured to determine, using the preset false detection reduction module, a false detection score for each labeling frame in the initial detection image;

a score threshold judging unit, configured to judge whether the false detection score is greater than a preset false detection score threshold;

and a labeling frame eliminating unit, configured to, if the false detection score is greater than the preset false detection score threshold, determine that the corresponding labeling frame satisfies the preset false detection condition, and to eliminate the labeling frames in the initial detection image that satisfy the preset false detection condition.
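The judging and eliminating units amount to a threshold filter over the labeling frames. A minimal sketch, assuming each frame already carries its false detection score; the `LabeledBox` record, the box coordinates and the threshold of 0.5 are hypothetical values chosen for illustration.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class LabeledBox:
    box: Tuple[float, float, float, float]  # (x1, y1, x2, y2) in pixels
    false_detection_score: float


def remove_false_detections(boxes: List[LabeledBox], threshold: float) -> List[LabeledBox]:
    """Keep only frames whose false detection score does not exceed the threshold.

    A frame with score > threshold meets the false detection condition and is
    eliminated; a score exactly equal to the threshold is kept.
    """
    return [b for b in boxes if b.false_detection_score <= threshold]


initial_boxes = [
    LabeledBox((10, 20, 50, 120), 0.2),
    LabeledBox((60, 25, 95, 110), 0.5),
    LabeledBox((200, 40, 210, 60), 0.8),
]
kept = remove_false_detections(initial_boxes, threshold=0.5)
```

Using a strict "greater than" comparison matches the wording of the judging unit, so a frame scoring exactly at the threshold survives.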

In some specific embodiments, the false detection score determining sub-module may specifically include:

a false detection value determining unit, configured to determine a first false detection value for each labeling frame in the initial detection image using the preset false detection reduction module;

and a false detection score determining unit, configured to determine the false detection score of the labeling frame according to the first false detection value and a second false detection value, where the second false detection value is determined based on the detection result output by the pedestrian feature detection network.
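The application does not say how the first and second false detection values are combined, nor exactly how the second value is derived from the detection result. The sketch below assumes a convex weighted average, with the second value taken as one minus the pedestrian network's confidence for the frame; the function name, the `weight` parameter and the [0, 1] value ranges are all assumptions.

```python
def combine_false_detection_values(first_value: float, detection_confidence: float,
                                   weight: float = 0.5) -> float:
    """Hypothetical false detection score for one labeling frame.

    first_value: output of the false detection reduction module, assumed in [0, 1].
    detection_confidence: pedestrian network confidence for the frame, assumed in [0, 1];
    the second false detection value is assumed to be 1 - detection_confidence.
    """
    if not 0.0 <= weight <= 1.0:
        raise ValueError("weight must lie in [0, 1]")
    second_value = 1.0 - detection_confidence
    # Convex combination keeps the score in [0, 1] whenever both inputs are in [0, 1].
    return weight * first_value + (1.0 - weight) * second_value


score = combine_false_detection_values(0.8, detection_confidence=0.4)
```

A convex combination keeps the score on the same scale as the preset false detection score threshold, so the threshold filter above can consume it directly.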

Further, an embodiment of the present application discloses an electronic device. Fig. 4 is a block diagram of an electronic device 20 according to an exemplary embodiment, and its content should not be regarded as any limitation on the scope of use of the present application.

Fig. 4 is a schematic structural diagram of an electronic device 20 according to an embodiment of the present application. The electronic device 20 may specifically include at least one processor 21, at least one memory 22, a power supply 23, a communication interface 24, an input/output interface 25, and a communication bus 26. The memory 22 is configured to store a computer program that is loaded and executed by the processor 21 to implement the relevant steps of the pedestrian detection method disclosed in any of the foregoing embodiments. In addition, the electronic device 20 in this embodiment may specifically be an electronic computer.

In this embodiment, the power supply 23 is configured to provide an operating voltage for each hardware device on the electronic device 20; the communication interface 24 can establish a data transmission channel between the electronic device 20 and an external device, following any communication protocol applicable to the technical solution of the present application, which is not specifically limited herein; the input/output interface 25 is used to acquire external input data or to output data externally, and its specific interface type may be selected according to the specific application requirements, which is not limited herein.

The memory 22 may be a carrier for storing resources, such as a read-only memory, a random access memory, a magnetic disk, or an optical disk, and the resources stored thereon may include an operating system 221, a computer program 222, and the like, and the storage may be temporary storage or permanent storage.

The operating system 221 is used to manage and control the hardware devices on the electronic device 20 and the computer program 222, and may be Windows Server, NetWare, Unix, Linux, or the like. In addition to the computer program for performing the pedestrian detection method executed by the electronic device 20 as disclosed in any of the foregoing embodiments, the computer program 222 may further include computer programs for performing other specific tasks.

Further, the application also discloses a computer readable storage medium for storing a computer program; wherein the computer program, when executed by a processor, implements the pedestrian detection method disclosed previously. For specific steps of the method, reference may be made to the corresponding contents disclosed in the foregoing embodiments, and no further description is given here.

In this specification, the embodiments are described in a progressive manner: each embodiment focuses on its differences from the other embodiments, and for the same or similar parts the embodiments may be referred to one another. Since the apparatus disclosed in the embodiments corresponds to the method disclosed in the embodiments, its description is relatively brief; for the relevant details, refer to the description of the method.

Those of skill would further appreciate that the various illustrative elements and algorithm steps described in connection with the embodiments disclosed herein may be implemented as electronic hardware, computer software, or combinations of both, and that the various illustrative elements and steps are described above generally in terms of functionality in order to clearly illustrate the interchangeability of hardware and software. Whether such functionality is implemented as hardware or software depends upon the particular application and design constraints imposed on the solution. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present application.

The steps of a method or algorithm described in connection with the embodiments disclosed herein may be implemented directly in hardware, in a software module executed by a processor, or in a combination of the two. The software module may reside in random access memory (RAM), internal memory, read-only memory (ROM), electrically programmable ROM, electrically erasable programmable ROM, registers, a hard disk, a removable disk, a CD-ROM, or any other form of storage medium known in the art.

Finally, it should also be noted that relational terms such as first and second are used herein only to distinguish one entity or operation from another, and do not necessarily require or imply any actual relationship or order between such entities or operations. Moreover, the terms "comprises," "comprising," or any other variation thereof are intended to cover a non-exclusive inclusion, so that a process, method, article, or apparatus that comprises a list of elements includes not only those elements but also other elements not expressly listed, or elements inherent to such a process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising one … …" does not exclude the presence of other identical elements in the process, method, article, or apparatus that comprises the element.

The preferred embodiments of the present application have been described in detail above. Specific examples are used herein to explain the principles and implementations of the present application, and the description of the above embodiments is intended only to help understand the method and core idea of the present application. Meanwhile, those skilled in the art may make changes to the specific embodiments and the scope of application in accordance with the idea of the present application. In summary, the content of this specification should not be construed as limiting the present application.

Claims

1. A pedestrian detection method, characterized by comprising:

2. The pedestrian detection method according to claim 1, wherein the feature detection of the original pedestrian image by using a backbone network in a preset pedestrian detection network to obtain an initial feature image includes:

3. The pedestrian detection method according to claim 1, wherein the point annotation data of the crowd density estimation network are determined using the midpoint of the upper boundary of a pedestrian detection labeling frame.

4. The pedestrian detection method according to claim 1, wherein the determining a corresponding density feature image using the initial feature image and a crowd density estimation network in the preset pedestrian detection network includes:

5. The pedestrian detection method according to claim 1, wherein the determining a corresponding pedestrian feature image using the pedestrian feature detection network in the preset pedestrian detection network and the density feature image includes:

6. The pedestrian detection method according to any one of claims 1 to 5, wherein the eliminating, by using the preset false detection reduction module, of the labeling frames in the initial detection image that satisfy the preset false detection condition includes:

7. The pedestrian detection method according to claim 6, wherein the determining, by using the preset false detection reduction module, of a false detection score for each labeling frame in the initial detection image includes:

8. A pedestrian detection apparatus characterized by comprising:

9. An electronic device, comprising:

a memory for storing a computer program;

a processor for executing the computer program to implement the pedestrian detection method as claimed in any one of claims 1 to 7.

10. A computer-readable storage medium storing a computer program which, when executed by a processor, implements the pedestrian detection method according to any one of claims 1 to 7.