CN116363701A - Pedestrian detection method, device, equipment and storage medium - Google Patents

Publication number
CN116363701A
Authority
CN
China
Prior art keywords
detection
pedestrian
image
preset
network
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202310342106.9A
Other languages
Chinese (zh)
Inventor
刘伟华
左勇
肖恒玉
林超超
罗艳
周敏
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Athena Eyes Co Ltd
Original Assignee
Athena Eyes Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Athena Eyes Co Ltd
Priority to CN202310342106.9A
Publication of CN116363701A
Legal status: Pending

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06V: IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00: Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10: Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/103: Static body considered as a whole, e.g. static pedestrian or occupant recognition
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06V: IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00: Arrangements for image or video recognition or understanding
    • G06V10/70: Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/82: Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
    • Y: GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02: TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02T: CLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO TRANSPORTATION
    • Y02T10/00: Road transport of goods or passengers
    • Y02T10/10: Internal combustion engine [ICE] based vehicles
    • Y02T10/40: Engine management systems

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Evolutionary Computation (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Multimedia (AREA)
  • Artificial Intelligence (AREA)
  • Health & Medical Sciences (AREA)
  • Human Computer Interaction (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Computing Systems (AREA)
  • Databases & Information Systems (AREA)
  • General Health & Medical Sciences (AREA)
  • Medical Informatics (AREA)
  • Software Systems (AREA)
  • Image Analysis (AREA)

Abstract

The application discloses a pedestrian detection method, device, equipment and storage medium in the field of computer vision. The method comprises: acquiring an original pedestrian image, and performing feature detection on the original pedestrian image by using a backbone network in a preset pedestrian detection network to obtain an initial feature image; determining a corresponding density feature image by using a crowd density estimation network in the preset pedestrian detection network and the initial feature image, and determining a corresponding pedestrian feature image by using a pedestrian feature detection network in the preset pedestrian detection network and the density feature image, so as to obtain an initial detection image based on the pedestrian feature image; and removing, by using a preset false detection reduction module, the annotation frames in the initial detection image that meet a preset false detection condition, so as to obtain a target detection result image. The method can thereby improve the efficiency and performance of pedestrian detection, enhance the robustness of the network, and reduce the false detection rate of pedestrian detection.

Description

Pedestrian detection method, device, equipment and storage medium
Technical Field
The present invention relates to the field of computer vision, and in particular, to a pedestrian detection method, device, apparatus, and storage medium.
Background
Pedestrian detection is an important task in computer vision and is widely applied in fields such as autonomous driving and smart cities. When a large crowd gathers, the number of pedestrians detected in the crowded scene reflects the density level of the crowd. With that information, the situation can be brought under control in time through appropriate management and safety warnings can be issued, so that dangerous events such as crowd crushes and trampling are avoided.
However, existing pedestrian detection techniques for crowded scenes often use only the pedestrian frame and discard other valuable pedestrian attributes, which leaves their performance unsatisfactory and their robustness poor. In addition, occlusion between pedestrians and the varying distance between pedestrians and the camera often cause false detections in the pedestrian detection network. How to improve detection performance and reduce the false detection rate is therefore a problem that urgently needs to be solved.
Disclosure of Invention
Accordingly, the present invention is directed to a pedestrian detection method, apparatus, device, and storage medium, which can improve the efficiency and performance of pedestrian detection, enhance the robustness of the network, and reduce the false detection rate of pedestrian detection, thereby improving the pedestrian detection effect. The specific scheme is as follows:
In a first aspect, the present application provides a pedestrian detection method, including:
acquiring an original pedestrian image, and performing feature detection on the original pedestrian image by using a backbone network in a preset pedestrian detection network to obtain an initial feature image;
determining a corresponding density feature image by using a crowd density estimation network in the preset pedestrian detection network and the initial feature image, and determining a corresponding pedestrian feature image by using a pedestrian feature detection network in the preset pedestrian detection network and the density feature image, so as to obtain an initial detection image based on the pedestrian feature image;
and removing, by using a preset false detection reduction module, the annotation frames in the initial detection image that meet a preset false detection condition, so as to obtain a target detection result image.
Optionally, the feature detection of the original pedestrian image by using a backbone network in a preset pedestrian detection network to obtain an initial feature image includes:
and performing feature detection on the original pedestrian image by using a backbone network and a feature pyramid network in a preset pedestrian detection network, and processing the detected feature data by using a bilinear interpolation method to obtain an initial feature image.
Optionally, the point annotation data of the crowd density estimation network is data determined by using the midpoint of the upper boundary of the pedestrian detection annotation frame.
Optionally, the determining the corresponding density feature image by using the crowd density estimation network in the preset pedestrian detection network and the initial feature image includes:
performing preset density characteristic processing operation on the initial characteristic image by using a crowd density estimation network in the preset pedestrian detection network so as to determine a corresponding density characteristic image; the preset density characteristic processing operation comprises a dimension reduction operation and an up-sampling operation based on a bilinear interpolation method.
Optionally, the determining, by using the pedestrian feature detection network in the preset pedestrian detection network and the density feature image, a corresponding pedestrian feature image includes:
and determining corresponding pedestrian characteristic images based on a preset characteristic fusion strategy and by utilizing a pedestrian characteristic detection network in the preset pedestrian detection network and the density characteristic images.
Optionally, the removing, by using a preset false detection reduction module, the annotation frames in the initial detection image that meet a preset false detection condition includes:
determining a false detection score for each annotation frame in the initial detection image by using the preset false detection reduction module;
judging whether the false detection score is greater than a preset false detection score threshold;
if so, the corresponding annotation frame is deemed to meet the preset false detection condition, and the annotation frames meeting the preset false detection condition are removed from the initial detection image.
Optionally, the determining, by using the preset false detection reduction module, a false detection score for each annotation frame in the initial detection image includes:
determining a first false detection value for each annotation frame in the initial detection image by using the preset false detection reduction module;
determining the false detection score of each annotation frame according to the first false detection value and a second false detection value; the second false detection value is a value determined based on a detection result output by the pedestrian feature detection network.
In a second aspect, the present application provides a pedestrian detection apparatus comprising:
an initial feature image acquisition module, configured to acquire an original pedestrian image and perform feature detection on the original pedestrian image by using a backbone network in a preset pedestrian detection network to obtain an initial feature image;
an initial detection image determining module, configured to determine a corresponding density feature image by using a crowd density estimation network in the preset pedestrian detection network and the initial feature image, and determine a corresponding pedestrian feature image by using a pedestrian feature detection network in the preset pedestrian detection network and the density feature image, so as to obtain an initial detection image based on the pedestrian feature image;
a false detection processing module, configured to remove, by using a preset false detection reduction module, the annotation frames in the initial detection image that meet a preset false detection condition, so as to obtain a target detection result image.
In a third aspect, the present application provides an electronic device, including:
a memory for storing a computer program;
and a processor for executing the computer program to implement the pedestrian detection method described above.
In a fourth aspect, the present application provides a computer readable storage medium storing a computer program which when executed by a processor implements the aforementioned pedestrian detection method.
In the present application, an original pedestrian image is acquired, and feature detection is performed on it by using a backbone network in a preset pedestrian detection network to obtain an initial feature image; a corresponding density feature image is determined by using the crowd density estimation network in the preset pedestrian detection network and the initial feature image, and a corresponding pedestrian feature image is determined by using the pedestrian feature detection network in the preset pedestrian detection network and the density feature image, so as to obtain an initial detection image based on the pedestrian feature image; and the annotation frames in the initial detection image that meet a preset false detection condition are removed by using a preset false detection reduction module to obtain a target detection result image. With this scheme, the density feature image is determined from the original pedestrian image, the initial detection image is determined by using the density feature image, and the annotation frames that meet the preset false detection condition are then removed by the preset false detection reduction module.
Because the initial detection image is determined by using the density feature image and the pedestrian feature detection network, the preset pedestrian detection network pays more attention to crowd head information, which lowers the miss rate of pedestrian detection in crowded scenes, improves the performance of the preset pedestrian detection network, and enhances its robustness. Removing the annotation frames that meet the preset false detection condition with the preset false detection reduction module reduces the number of falsely detected annotation frames, lowers the false detection rate, and improves the pedestrian detection effect.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings that are required to be used in the embodiments or the description of the prior art will be briefly described below, and it is obvious that the drawings in the following description are only embodiments of the present invention, and that other drawings can be obtained according to the provided drawings without inventive effort for a person skilled in the art.
Fig. 1 is a flowchart of a pedestrian detection method provided in the present application;
Fig. 2 is a schematic diagram of a frame design of a pedestrian detection system provided in the present application;
Fig. 3 is a schematic structural diagram of a pedestrian detection device provided in the present application;
Fig. 4 is a block diagram of an electronic device provided in the present application.
Detailed Description
The following description of the embodiments of the present invention will be made clearly and completely with reference to the accompanying drawings, in which it is apparent that the embodiments described are only some embodiments of the present invention, but not all embodiments. All other embodiments, which can be made by those skilled in the art based on the embodiments of the invention without making any inventive effort, are intended to be within the scope of the invention.
At present, the pedestrian detection technology for crowded scenes is not ideal in performance and poor in robustness, and due to shielding among pedestrians and the distance relation between pedestrians and cameras, false detection problems often occur in a pedestrian detection network. Therefore, the application discloses a pedestrian detection method, which can improve the efficiency and performance of pedestrian detection, enhance the robustness of a network, and reduce the false detection rate of pedestrian detection, thereby improving the effect of pedestrian detection.
Referring to fig. 1, an embodiment of the present invention discloses a pedestrian detection method, including:
Step S11: acquiring an original pedestrian image, and performing feature detection on the original pedestrian image by using a backbone network in a preset pedestrian detection network to obtain an initial feature image.
In this embodiment, it should be noted that, the feature detection of the original pedestrian image by using the backbone network in the preset pedestrian detection network to obtain an initial feature image may specifically include: and performing feature detection on the original pedestrian image by using an HRNetV2 backbone network and a feature pyramid network in a preset pedestrian detection network, and processing the detected feature data by using a bilinear interpolation method to obtain an initial feature image.
Step S12: determining a corresponding density feature image by using the crowd density estimation network in the preset pedestrian detection network and the initial feature image, and determining a corresponding pedestrian feature image by using the pedestrian feature detection network in the preset pedestrian detection network and the density feature image, so as to obtain an initial detection image based on the pedestrian feature image.
In this embodiment, it may be understood that the determining, by using the crowd density estimation network in the preset pedestrian detection network and the initial feature image, a corresponding density feature image may specifically include: performing preset density characteristic processing operation on the initial characteristic image by using a crowd density estimation network in the preset pedestrian detection network so as to determine a corresponding density characteristic image; the preset density characteristic processing operation comprises a dimension reduction operation and an up-sampling operation based on a bilinear interpolation method.
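The dimension reduction and bilinear up-sampling named above can be sketched in plain Python as follows. This is a minimal illustration, not the patent's implementation: the function names (`conv1x1`, `bilinear_resize`, `merge_top_down`), the corner-aligned sampling grid, and the toy tensor sizes are all assumptions made for the example.

```python
from typing import List

Map2D = List[List[float]]   # one channel: height x width
Feature = List[Map2D]       # channels x height x width


def conv1x1(x: Feature, weights: List[List[float]]) -> Feature:
    """1x1 convolution (dimension reduction): weights[o][c] mixes input channel c into output o."""
    h, w = len(x[0]), len(x[0][0])
    return [
        [[sum(w_out[c] * x[c][i][j] for c in range(len(x))) for j in range(w)]
         for i in range(h)]
        for w_out in weights
    ]


def bilinear_resize(m: Map2D, out_h: int, out_w: int) -> Map2D:
    """Bilinear resize of one channel on a corner-aligned grid (an assumed convention)."""
    in_h, in_w = len(m), len(m[0])
    out = []
    for i in range(out_h):
        y = i * (in_h - 1) / (out_h - 1) if out_h > 1 else 0.0
        y0 = int(y)
        y1 = min(y0 + 1, in_h - 1)
        dy = y - y0
        row = []
        for j in range(out_w):
            x = j * (in_w - 1) / (out_w - 1) if out_w > 1 else 0.0
            x0 = int(x)
            x1 = min(x0 + 1, in_w - 1)
            dx = x - x0
            top = m[y0][x0] * (1 - dx) + m[y0][x1] * dx
            bottom = m[y1][x0] * (1 - dx) + m[y1][x1] * dx
            row.append(top * (1 - dy) + bottom * dy)
        out.append(row)
    return out


def merge_top_down(deep: Feature, shallow: Feature, weights: List[List[float]]) -> Feature:
    """Reduce the deep feature's channels, up-sample it to the shallow size, add pixel-wise."""
    reduced = conv1x1(deep, weights)
    h, w = len(shallow[0]), len(shallow[0][0])
    up = [bilinear_resize(ch, h, w) for ch in reduced]
    return [
        [[up[c][i][j] + shallow[c][i][j] for j in range(w)] for i in range(h)]
        for c in range(len(shallow))
    ]


# Toy example: a 2-channel 2x2 deep feature merged into a 1-channel 3x3 shallow feature.
deep = [[[0.0, 2.0], [4.0, 6.0]], [[1.0, 1.0], [1.0, 1.0]]]
shallow = [[[0.0] * 3 for _ in range(3)]]
merged = merge_top_down(deep, shallow, weights=[[1.0, 1.0]])
```

Applying `merge_top_down` repeatedly from the deepest stage toward shallower ones gives the kind of nested accumulation the embodiment describes; a real network would use learned convolution weights and a tensor library rather than nested lists.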
It should be noted that the determining, by using the pedestrian feature detection network in the preset pedestrian detection network and the density feature image, a corresponding pedestrian feature image may specifically include: and determining corresponding pedestrian characteristic images based on a preset characteristic fusion strategy and by utilizing a pedestrian characteristic detection network in the preset pedestrian detection network and the density characteristic images. In this way, based on the preset feature fusion strategy and by utilizing the pedestrian feature detection network and the density feature image in the preset pedestrian detection network, the corresponding pedestrian feature image is determined, so that the performance of the pedestrian detection network can be improved. In addition, in the training stage, the annotation required by the crowd density estimation network can be determined by utilizing the midpoint of the upper boundary of the pedestrian detection annotation frame, so that manual annotation is avoided, and the labor cost is reduced.
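Deriving the point annotations from existing detection frames is simple enough to show directly. The sketch below assumes frames are given as `(x1, y1, x2, y2)` in image coordinates with the origin at the top-left corner, so the upper boundary is the edge at `y1`; the function name `head_point` is hypothetical.

```python
from typing import List, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2), origin at top-left (assumed)


def head_point(box: Box) -> Tuple[float, float]:
    """Return the midpoint of the frame's upper boundary as an (x, y) point annotation."""
    x1, y1, x2, _ = box
    return ((x1 + x2) / 2.0, y1)


def point_annotations(boxes: List[Box]) -> List[Tuple[float, float]]:
    """Convert every pedestrian annotation frame into one density-estimation point."""
    return [head_point(b) for b in boxes]


points = point_annotations([(10, 20, 30, 80), (100, 40, 140, 160)])
```

Because the points are computed from frames that already exist, no separate head annotation pass is needed, which is the labor saving the paragraph above refers to.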
It should be noted that, in the inference stage, the present application may omit the crowd density estimation network and retain only the pedestrian feature detection network. In this way, the inference speed stays consistent with that of a baseline pedestrian detector and no additional overhead is created.
Step S13: removing, by using a preset false detection reduction module, the annotation frames in the initial detection image that meet the preset false detection condition, so as to obtain a target detection result image.
In this embodiment, it may be understood that the removing, by using the preset false detection reduction module, the annotation frames in the initial detection image that meet the preset false detection condition may specifically include: determining a false detection score for each annotation frame in the initial detection image by using the preset false detection reduction module; judging whether the false detection score is greater than a preset false detection score threshold; if so, the corresponding annotation frame is deemed to meet the preset false detection condition, and the annotation frames meeting the preset false detection condition are removed from the initial detection image.
It may be appreciated that the determining, by using the preset false detection reduction module, a false detection score for each annotation frame in the initial detection image may specifically include: determining a first false detection value for each annotation frame in the initial detection image by using the preset false detection reduction module; and determining the false detection score of each annotation frame according to the first false detection value and a second false detection value, where the second false detection value is a value determined based on a detection result output by the pedestrian feature detection network. That is, the false detection score of each annotation frame in the initial detection image is determined by combining the first false detection value from the preset false detection reduction module with the second false detection value derived from the detection result of the pedestrian feature detection network, so that the annotation frames meeting the preset false detection condition can be removed from the initial detection image.
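The scoring and thresholding logic can be sketched as below. The text does not say how the first and second false detection values are combined, so the weighted average used here is purely an assumption, as are the names `Detection`, `false_detection_score` and `filter_detections` and the default threshold of 0.5.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Detection:
    box: Tuple[float, float, float, float]
    first_value: float   # from the false detection reduction module
    second_value: float  # derived from the pedestrian feature detection network's output


def false_detection_score(det: Detection, weight: float = 0.5) -> float:
    """Combine the two values; a weighted average is an assumed combination rule."""
    return weight * det.first_value + (1.0 - weight) * det.second_value


def filter_detections(dets: List[Detection], threshold: float = 0.5,
                      weight: float = 0.5) -> List[Detection]:
    """Keep only frames whose score does not exceed the threshold."""
    return [d for d in dets if false_detection_score(d, weight) <= threshold]


candidates = [
    Detection((0, 0, 10, 20), first_value=0.9, second_value=0.7),  # likely false detection
    Detection((50, 5, 60, 30), first_value=0.1, second_value=0.2),  # likely true pedestrian
]
kept = filter_detections(candidates)
```

A frame is removed exactly when its score is greater than the threshold, matching the condition stated in the embodiment; any other monotone combination of the two values would slot into `false_detection_score` without changing the filter.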
In this embodiment, an original pedestrian image is acquired, and feature detection is performed on it by using a backbone network in a preset pedestrian detection network to obtain an initial feature image; a corresponding density feature image is determined by using the crowd density estimation network in the preset pedestrian detection network and the initial feature image, and a corresponding pedestrian feature image is determined by using the pedestrian feature detection network in the preset pedestrian detection network and the density feature image, so as to obtain an initial detection image based on the pedestrian feature image; and the annotation frames in the initial detection image that meet a preset false detection condition are removed by using a preset false detection reduction module to obtain a target detection result image. With this scheme, the density feature image is determined from the original pedestrian image, the initial detection image is determined by using the density feature image, and the annotation frames that meet the preset false detection condition are then removed by the preset false detection reduction module.
Because the initial detection image is determined by using the density feature image and the pedestrian feature detection network, the preset pedestrian detection network pays more attention to crowd head information, which lowers the miss rate of pedestrian detection in crowded scenes, improves the performance of the preset pedestrian detection network, and enhances its robustness. Removing the annotation frames that meet the preset false detection condition with the preset false detection reduction module reduces the number of falsely detected annotation frames, lowers the false detection rate, and improves the pedestrian detection effect.
A specific embodiment of a pedestrian detection method disclosed in the present application will be described below with reference to a schematic frame design of a pedestrian detection system disclosed in fig. 2.
As shown in fig. 2, after an original pedestrian image including a plurality of pedestrians is acquired, in order to accurately predict the positions and sizes of the pedestrians, high-resolution features including semantics and position information are required. In consideration of keeping the network model lightweight, feature subgraphs can be acquired from different stages of the preset pedestrian detection network, the acquired feature subgraphs are processed by using a bilinear interpolation method, and a convolution module is applied, so that a single-scale feature map is obtained. In this way, memory costs may be reduced and learnable parameters provided for convolution operations.
It should be noted that the preset pedestrian detection network of the present application includes a pedestrian feature detection network and a crowd density estimation network. After the initial feature image is acquired, the crowd density of the original pedestrian image can be analyzed by using the crowd density estimation network. In this embodiment, the initial feature image includes four initial feature subgraphs, and the crowd density estimation network in the preset pedestrian detection network determines the density feature subgraph of the current stage from the initial feature subgraphs of several stages, so as to obtain the density feature image. For example, the output density feature subgraph of the fourth stage is defined as:
$$\phi_{\text{density}} = f_2(\phi_2 + f_3(\phi_3 + f_4(\phi_4)))$$
where $\phi_2$, $\phi_3$ and $\phi_4$ denote the initial feature subgraphs of stages 2, 3 and 4 respectively, and $f_i$ denotes the dimension-reduction and up-sampling operation of the corresponding stage, based on bilinear interpolation. Specifically, with the original pedestrian image as input, the preset pedestrian detection network generates several initial feature subgraphs of different resolutions and dimensions, which together form the initial feature image. In this embodiment, a 1×1 convolution layer can reduce the dimension of the stage-4 initial feature subgraph from 1024 to 256; the result is then up-sampled by bilinear interpolation and added pixel-wise to the shallower stage, and so on until the resolution of the feature subgraph is 1/4 of that of the original pedestrian image. If there is an annotated point at pixel $x_i$, it can be expressed as a delta function $\delta(x - x_i)$, and an image with $N$ annotated points can then be represented as:
$$H(x) = \sum_{i=1}^{N} \delta(x - x_i)$$
Since $H(x)$ is a discrete function, converting it into a continuous density function makes it easier for a convolutional neural network to approximate. $H(x)$ can therefore be convolved with a Gaussian kernel $G_\sigma(x)$, and the ground truth of the crowd density is defined as follows:
$$F(x) = H(x) * G_\sigma(x)$$
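Convolving a sum of delta functions with a Gaussian kernel amounts to stamping one Gaussian at each annotated point. The following sketch builds such a ground-truth density map in plain Python; the kernel truncation radius, the renormalization of the truncated kernel, the default `sigma`, and the function names are assumptions chosen for illustration.

```python
import math
from typing import List, Tuple


def gaussian_kernel(sigma: float, radius: int) -> List[List[float]]:
    """Truncated 2D Gaussian, renormalized so its entries sum to 1."""
    k = [
        [math.exp(-(dx * dx + dy * dy) / (2.0 * sigma * sigma))
         for dx in range(-radius, radius + 1)]
        for dy in range(-radius, radius + 1)
    ]
    total = sum(map(sum, k))
    return [[v / total for v in row] for row in k]


def density_map(points: List[Tuple[float, float]], height: int, width: int,
                sigma: float = 1.0, radius: int = 3) -> List[List[float]]:
    """F = H * G_sigma: add one Gaussian per (x, y) point; parts outside the image are clipped."""
    kernel = gaussian_kernel(sigma, radius)
    out = [[0.0] * width for _ in range(height)]
    for px, py in points:
        cx, cy = int(round(px)), int(round(py))
        for dy in range(-radius, radius + 1):
            for dx in range(-radius, radius + 1):
                y, x = cy + dy, cx + dx
                if 0 <= y < height and 0 <= x < width:
                    out[y][x] += kernel[dy + radius][dx + radius]
    return out


# Two annotated heads on a 16x16 map; the map integrates to the head count.
dm = density_map([(5, 5), (10, 8)], height=16, width=16)
count = sum(map(sum, dm))
```

Because each stamped kernel sums to 1, summing the density map recovers the number of annotated points whenever the kernels lie fully inside the image, which is what makes the map usable for crowd counting.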
Correspondingly, the pedestrian feature detection network in the preset pedestrian detection network determines the pedestrian feature image from the initial feature image and the corresponding density feature image, and then determines, based on the pedestrian feature image, the initial detection image containing a number of annotation frames. The pedestrian feature detection network comprises a 3x3 convolution layer and two prediction layers, one predicting the center position and the other predicting the corresponding scale. Predicting center and scale maps removes the need for predefined bounding box sizes, and the network architecture and bounding box sizes can be fine-tuned with different loss settings so that training converges better and pedestrian positions are located more accurately. The center loss of the pedestrian feature detection network can be expressed as:
[Equation images BDA0004158375340000081 to BDA0004158375340000083: center loss of the pedestrian feature detection network and its weighting terms]
where $K$ is the number of annotation frames, and $p_{ij}$ and $y_{ij}$ are the predicted center probability and the ground-truth label respectively. $CE(p_{ij}, y_{ij})$ denotes the cross-entropy loss, and $\alpha_{ij}$ is the weight of each location $(i, j)$. Because an exact center is difficult to specify, $M_{ij}$ denotes a Gaussian-based penalty for the pixels around the center. $p_{ij}^{\gamma}$ and $(1 - p_{ij})^{\gamma}$ are focal weights based on the prediction confidence; they reduce the contribution of easy samples to the loss and help the optimizer focus on hard samples. $(1 - M_{ij})^{\beta}$ reduces the loss for false detections that lie closer to the true center. The pedestrian feature subgraph of the corresponding stage is determined according to this processing.
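The loss equations themselves are not reproduced in this text, so the following is only a sketch of one loss that is consistent with the symbol descriptions above: a cross-entropy term weighted by $(1 - p_{ij})^{\gamma}$ at positive locations and by $p_{ij}^{\gamma}(1 - M_{ij})^{\beta}$ at negative locations, averaged over $K$. The exact form, the defaults `gamma=2.0` and `beta=4.0`, and the function name are assumptions, not values stated in the patent.

```python
import math
from typing import List

Grid = List[List[float]]


def center_loss(p: Grid, y: List[List[int]], m: Grid, k: int,
                gamma: float = 2.0, beta: float = 4.0, eps: float = 1e-7) -> float:
    """Focal-style center loss (assumed form): (1/K) * sum_ij alpha_ij * CE(p_ij, y_ij)."""
    total = 0.0
    for i in range(len(p)):
        for j in range(len(p[0])):
            pij = min(max(p[i][j], eps), 1.0 - eps)  # clamp for numerical safety
            if y[i][j] == 1:
                alpha = (1.0 - pij) ** gamma          # down-weights easy positives
                ce = -math.log(pij)
            else:
                # Down-weights easy negatives and negatives close to a true center.
                alpha = (pij ** gamma) * (1.0 - m[i][j]) ** beta
                ce = -math.log(1.0 - pij)
            total += alpha * ce
    return total / max(k, 1)


pred = [[0.9, 0.6]]
label = [[1, 0]]
# Same false activation at the negative pixel, far from vs. near a true center.
loss_far = center_loss(pred, label, m=[[1.0, 0.0]], k=1)
loss_near = center_loss(pred, label, m=[[1.0, 0.8]], k=1)
```

Comparing `loss_far` and `loss_near` shows the effect the text describes: the same false activation costs less when the Gaussian mask says it is close to a true center, which is also why such near-center false positives can survive training.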
In this embodiment, the pedestrian feature image output by the pedestrian feature detection network includes a plurality of stage pedestrian feature subgraphs, and correspondingly, the density feature image output by the crowd density estimation network includes a plurality of stage density feature subgraphs, where the pedestrian feature subgraphs and the density feature subgraphs in the corresponding stages can be fused based on a preset feature fusion policy, and the fused features can be used for pedestrian detection so as to obtain a more accurate detection result. The preset feature fusion strategy can be expressed as:
[Equation image BDA0004158375340000084: preset feature fusion strategy]
where $D_i$ and $P_i$ denote the feature subgraphs output at the $i$-th stage by the crowd density estimation network and the pedestrian feature detection network respectively, $f_i^{\theta}$ is a 3x3 convolution that transforms the crowd density estimation features for use with the pedestrian detection features, the concatenation operator joins the different features, and $f_i^{\omega}$ is a 1×1 convolution operator.
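Since the fusion equation is not reproduced in this text, the sketch below assumes the composition that the symbol descriptions suggest: transform $D_i$ with the 3x3 convolution, concatenate the result with $P_i$ along the channel axis, then apply the 1×1 convolution. That ordering, the zero padding, and every function name are assumptions for illustration only.

```python
from typing import List

Feature = List[List[List[float]]]  # channels x height x width


def conv3x3(x: Feature, weights) -> Feature:
    """3x3 convolution with zero padding; weights[o][c] is a 3x3 kernel."""
    h, w = len(x[0]), len(x[0][0])
    out = []
    for w_out in weights:
        channel = [[0.0] * w for _ in range(h)]
        for c, kernel in enumerate(w_out):
            for i in range(h):
                for j in range(w):
                    for ky in range(3):
                        for kx in range(3):
                            yy, xx = i + ky - 1, j + kx - 1
                            if 0 <= yy < h and 0 <= xx < w:
                                channel[i][j] += kernel[ky][kx] * x[c][yy][xx]
        out.append(channel)
    return out


def conv1x1(x: Feature, weights: List[List[float]]) -> Feature:
    """1x1 convolution: weights[o][c] mixes input channel c into output channel o."""
    h, w = len(x[0]), len(x[0][0])
    return [
        [[sum(w_out[c] * x[c][i][j] for c in range(len(x))) for j in range(w)]
         for i in range(h)]
        for w_out in weights
    ]


def fuse(density: Feature, pedestrian: Feature, theta, omega) -> Feature:
    """Assumed fusion: conv1x1(concat(conv3x3(D_i), P_i))."""
    transformed = conv3x3(density, theta)
    stacked = transformed + pedestrian  # list concatenation = channel-wise concat
    return conv1x1(stacked, omega)


# Toy stage: one density channel of ones, one pedestrian channel of twos.
D = [[[1.0] * 3 for _ in range(3)]]
P = [[[2.0] * 3 for _ in range(3)]]
theta = [[[[1.0] * 3 for _ in range(3)]]]  # 1 output channel, 1 input channel
omega = [[1.0, 1.0]]                       # mixes the 2 concatenated channels into 1
fused = fuse(D, P, theta, omega)
```

In a trained network `theta` and `omega` would be learned per stage; the toy weights here only make the arithmetic easy to follow.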
Since the pedestrian feature detection network uses focal loss as the center loss, false positives (non-pedestrians identified as pedestrians) near a positive center are not sufficiently penalized. Although most of these false positives are suppressed by non-maximum suppression (NMS), a further suppression step is needed for the remaining false positives, that is, positive predictions whose IoU (Intersection over Union) is below 0.5, in order to reduce the false detection rate. In this embodiment, the preset false detection reduction module may therefore consist of an RoI Align layer, a convolution layer and a dense layer, and it is trained in a detached setting so that gradients from the module do not flow back into the feature map or the detection head, which yields a simple, lightweight and effective false detection reduction module. After the initial detection image is determined, the preset false detection reduction module processes the false positives in it, that is, removes the annotation frames that meet the preset false detection condition, so that the detection result is further refined and the target detection result image is obtained.
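For reference, IoU and greedy NMS as commonly defined can be sketched as follows, together with a helper that labels a prediction as a false positive when its best IoU with any ground-truth frame is below 0.5. The `(x1, y1, x2, y2)` frame format and the helper names are assumptions; the patent does not prescribe a particular NMS variant.

```python
from typing import List, Sequence, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2), assumed format


def iou(a: Box, b: Box) -> float:
    """Intersection over Union of two axis-aligned frames."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def nms(boxes: Sequence[Box], scores: Sequence[float], iou_threshold: float = 0.5) -> List[int]:
    """Greedy NMS: keep the highest-scoring frame, drop frames overlapping it too much, repeat."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep: List[int] = []
    for i in order:
        if all(iou(boxes[i], boxes[k]) <= iou_threshold for k in keep):
            keep.append(i)
    return keep


def is_false_positive(pred: Box, ground_truth: Sequence[Box], threshold: float = 0.5) -> bool:
    """A prediction counts as a false positive if no ground-truth frame reaches the IoU threshold."""
    return all(iou(pred, gt) < threshold for gt in ground_truth)


boxes = [(0, 0, 10, 10), (1, 1, 11, 11), (20, 20, 30, 30)]
kept_indices = nms(boxes, scores=[0.9, 0.8, 0.7])
```

Labels produced by a rule like `is_false_positive` are the kind of supervision a separate false detection reduction head could be trained on, since NMS alone only removes duplicates and cannot tell whether a surviving frame matches a real pedestrian.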
In this way, during pedestrian detection this embodiment can use the head distribution information contained in the density feature image output by the crowd density estimation network to count heads efficiently in dense scenes, and can strengthen the ability of the preset pedestrian detection network to recognize both large-scale and small-scale pedestrians when the background of the original pedestrian image is cluttered and pedestrian scales differ greatly, thereby improving crowd counting accuracy. Using the crowd density attribute to assist pedestrian detection in real scenes (such as crowded scenes and small-scale pedestrian scenes) makes the network pay more attention to heads and to small-scale pedestrians, which improves the feature representation and detection of pedestrians that are heavily occluded or far from the camera, significantly improves the performance of the pedestrian detection method, and avoids extra annotation in the training stage and extra computation in the inference stage. The density feature image output by the crowd density estimation network can act as noise for the pedestrian feature detection network, forcing the shallow layers of the pedestrian detection network to pay more attention to the upper boundary of the detection frame, which reduces the large number of false detection frames at the upper boundary and enhances the robustness of the network. Meanwhile, processing the initial detection image with the preset false detection reduction module improves the ability of the preset pedestrian detection network to detect pedestrians, lowers the incidence of false detections, increases the number of true positives, and reduces the number of falsely detected annotation frames.
The pedestrian feature detection network and the crowd density estimation network share the same backbone network, so informative features can be extracted effectively and the two networks reinforce each other: a better pedestrian feature detection network yields a better shallow feature representation, which improves the performance of the crowd density estimation network, while the feature fusion in turn improves the performance of the pedestrian feature detection network.
Referring to fig. 3, the present application discloses a pedestrian detection apparatus including:
the initial feature image acquisition module 11 is configured to acquire an original pedestrian image, and perform feature detection on the original pedestrian image by using a backbone network in a preset pedestrian detection network to obtain an initial feature image;
an initial detection image determining module 12, configured to determine a corresponding density feature image using the crowd density estimation network in the preset pedestrian detection network and the initial feature image, and determine a corresponding pedestrian feature image using the pedestrian feature detection network in the preset pedestrian detection network and the density feature image, so as to obtain an initial detection image based on the pedestrian feature image;
and the false detection processing module 13, configured to remove the labeling frames meeting the preset false detection condition in the initial detection image by using the preset false detection reduction module, so as to obtain a target detection result image.
In the present application, an original pedestrian image is acquired, and feature detection is performed on the original pedestrian image by using a backbone network in a preset pedestrian detection network to obtain an initial feature image; a corresponding density feature image is determined by using a crowd density estimation network in the preset pedestrian detection network and the initial feature image, and a corresponding pedestrian feature image is determined by using a pedestrian feature detection network in the preset pedestrian detection network and the density feature image, so as to obtain an initial detection image based on the pedestrian feature image; and the labeling frames meeting the preset false detection condition in the initial detection image are removed by using a preset false detection reduction module to obtain a target detection result image. With this scheme, the corresponding density feature image can be determined from the original pedestrian image, the initial detection image is determined by using the density feature image, and the labeling frames meeting the preset false detection condition in the initial detection image are then removed by the preset false detection reduction module.
In this way, determining the initial detection image by using the density feature image and the pedestrian feature detection network increases the attention that the preset pedestrian detection network pays to crowd head information, which reduces the miss rate of pedestrian detection in crowded scenes, improves the performance of the preset pedestrian detection network, and enhances the robustness of the network. Removing the labeling frames that meet the preset false detection condition with the preset false detection reduction module reduces the number of falsely detected labeling frames, lowers the false detection rate of pedestrian detection, and improves the pedestrian detection effect.
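The data flow through the three modules can be summarized in a short sketch. The four callables stand in for the backbone network, the crowd density estimation network, the pedestrian feature detection network and the false detection reduction module; their names and signatures are hypothetical. In particular, passing the initial features to the detection network and to the false detection reduction step is an assumption made for illustration and is not confirmed by the text above.

```python
from typing import Any, Callable


def detect_pedestrians(
    image: Any,
    backbone: Callable[[Any], Any],
    density_net: Callable[[Any], Any],
    detection_net: Callable[[Any, Any], Any],
    reduce_false_detections: Callable[[Any, Any], Any],
) -> Any:
    """Chain the three stages described above (hypothetical interfaces)."""
    # Module 11: backbone features from the original pedestrian image.
    initial_features = backbone(image)
    # Module 12: density features, then the initial detections that use them.
    density_features = density_net(initial_features)
    initial_detections = detection_net(initial_features, density_features)
    # Module 13: remove labeling frames that meet the false detection condition.
    return reduce_false_detections(initial_features, initial_detections)
```

Keeping the stages as separate callables mirrors the module split of the apparatus and makes it straightforward to swap in a trained network for any one of them.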
In some specific embodiments, the initial feature image acquisition module 11 may specifically include:
the initial feature image acquisition unit is used for carrying out feature detection on the original pedestrian image by utilizing a backbone network and a feature pyramid network in a preset pedestrian detection network, and processing the detected feature data by utilizing a bilinear interpolation method so as to obtain an initial feature image.
In some specific embodiments, the initial detection image determining module 12 may specifically include:
the density feature image determining unit is used for performing a preset density feature processing operation on the initial feature image by utilizing the crowd density estimation network in the preset pedestrian detection network so as to determine a corresponding density feature image; the preset density feature processing operation comprises a dimension reduction operation and an up-sampling operation based on a bilinear interpolation method.
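A minimal pure-Python sketch of the two operations named above is given below. It assumes that the dimension reduction is a 1x1 convolution with a single output channel (a per-pixel weighted sum over channels) and that the bilinear up-sampling uses the corner-aligned convention; the application does not specify either detail, and the function names are hypothetical.

```python
from typing import List, Sequence

Grid = List[List[float]]


def reduce_channels(feature: Sequence[Sequence[Sequence[float]]],
                    weights: Sequence[float]) -> Grid:
    """Assumed dimension reduction: per-pixel weighted sum over the channels
    of a [C][H][W] feature, i.e. a 1x1 convolution with one output channel."""
    height, width = len(feature[0]), len(feature[0][0])
    return [[sum(w * ch[r][c] for w, ch in zip(weights, feature))
             for c in range(width)] for r in range(height)]


def bilinear_upsample(grid: Sequence[Sequence[float]], out_h: int, out_w: int) -> Grid:
    """Bilinear up-sampling of a 2-D grid (corner-aligned convention)."""
    in_h, in_w = len(grid), len(grid[0])
    out: Grid = []
    for i in range(out_h):
        # Fractional source row for output row i.
        y = i * (in_h - 1) / (out_h - 1) if out_h > 1 else 0.0
        y0 = int(y)
        y1 = min(y0 + 1, in_h - 1)
        wy = y - y0
        row = []
        for j in range(out_w):
            # Fractional source column for output column j.
            x = j * (in_w - 1) / (out_w - 1) if out_w > 1 else 0.0
            x0 = int(x)
            x1 = min(x0 + 1, in_w - 1)
            wx = x - x0
            # Interpolate along x on the two neighboring rows, then along y.
            top = grid[y0][x0] * (1 - wx) + grid[y0][x1] * wx
            bottom = grid[y1][x0] * (1 - wx) + grid[y1][x1] * wx
            row.append(top * (1 - wy) + bottom * wy)
        out.append(row)
    return out
```

For instance, reducing a two-channel 2x2 feature to one channel and up-sampling it to 3x3 keeps the four corner values and fills the remaining positions with weighted averages of their neighbors.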
In some specific embodiments, the initial detection image determining module 12 may specifically include:
the pedestrian feature image determining unit is used for determining a corresponding pedestrian feature image based on a preset feature fusion strategy by utilizing the pedestrian feature detection network in the preset pedestrian detection network and the density feature image.
In some specific embodiments, the false detection processing module 13 may specifically include:
the false detection score determining sub-module is used for determining the false detection score of each labeling frame in the initial detection image by utilizing the preset false detection reduction module;
the score threshold judging unit is used for judging whether the false detection score is greater than a preset false detection score threshold;
and the labeling frame removing unit is used for determining, if the false detection score is greater than the preset false detection score threshold, that the corresponding labeling frame meets the preset false detection condition, so as to remove the labeling frames meeting the preset false detection condition from the initial detection image.
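The thresholding step performed by these units can be sketched as follows. The function name, the list-based representation of labeling frames and the example threshold are assumptions for illustration; only the rule "remove a frame when its false detection score is greater than the threshold" comes from the text above.

```python
from typing import List, Sequence, TypeVar

Frame = TypeVar("Frame")


def remove_false_detections(frames: Sequence[Frame],
                            false_detection_scores: Sequence[float],
                            score_threshold: float) -> List[Frame]:
    """Keep only the labeling frames whose false detection score does not
    exceed the threshold; frames with a greater score are removed."""
    if len(frames) != len(false_detection_scores):
        raise ValueError("one false detection score is required per labeling frame")
    return [frame for frame, score in zip(frames, false_detection_scores)
            if not score > score_threshold]
```

Note that a score exactly equal to the threshold is kept, because the condition stated above is "greater than" rather than "greater than or equal to".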
In some specific embodiments, the false detection score determination submodule may specifically include:
the false detection value determining unit is used for determining a first false detection value of each labeling frame in the initial detection image by utilizing the preset false detection reduction module;
the false detection score determining unit is used for determining the false detection score of the labeling frame according to the first false detection value and a second false detection value; the second false detection value is a value determined based on a detection result output by the pedestrian feature detection network.
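This passage does not state how the two values are combined, nor how the second value is derived from the detection result. Purely as an illustration, the sketch below assumes that the second value is one minus the detection confidence and that the two values are blended with a hypothetical weight `alpha`; both choices, and the function names, are assumptions and should not be read as the method of the application.

```python
def second_value_from_confidence(detection_confidence: float) -> float:
    """Assumed mapping: a low detection confidence gives a high second value."""
    return 1.0 - detection_confidence


def false_detection_score(first_value: float, second_value: float,
                          alpha: float = 0.5) -> float:
    """Assumed combination: weighted average of the two false detection values.

    alpha is a hypothetical weight in [0, 1] applied to the first value.
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    return alpha * first_value + (1.0 - alpha) * second_value
```

With this particular choice, a frame that the false detection reduction module rates as suspicious but that the detector is very confident about receives a moderate score, so the threshold decides which of the two signals prevails.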
Further, the embodiment of the present application further discloses an electronic device, and fig. 4 is a block diagram of an electronic device 20 according to an exemplary embodiment, where the content of the figure is not to be considered as any limitation on the scope of use of the present application.
Fig. 4 is a schematic structural diagram of an electronic device 20 according to an embodiment of the present application. The electronic device 20 may specifically include: at least one processor 21, at least one memory 22, a power supply 23, a communication interface 24, an input output interface 25, and a communication bus 26. Wherein the memory 22 is configured to store a computer program that is loaded and executed by the processor 21 to implement the relevant steps in the pedestrian detection method disclosed in any one of the foregoing embodiments. In addition, the electronic device 20 in the present embodiment may be specifically an electronic computer.
In this embodiment, the power supply 23 is configured to provide an operating voltage for each hardware device on the electronic device 20; the communication interface 24 can create a data transmission channel between the electronic device 20 and an external device, and the communication protocol to be followed is any communication protocol applicable to the technical solution of the present application, which is not specifically limited herein; the input/output interface 25 is used for acquiring external input data or outputting external output data, and the specific interface type thereof may be selected according to the specific application requirement, which is not limited herein.
The memory 22 may be a carrier for storing resources, such as a read-only memory, a random access memory, a magnetic disk, or an optical disk, and the resources stored thereon may include an operating system 221, a computer program 222, and the like, and the storage may be temporary storage or permanent storage.
The operating system 221, which may be Windows Server, NetWare, Unix, Linux, or the like, is used for managing and controlling the hardware devices on the electronic device 20 and the computer program 222. In addition to the computer program that can be used to perform the pedestrian detection method executed by the electronic device 20 as disclosed in any of the foregoing embodiments, the computer program 222 may further include computer programs that can be used to perform other specific tasks.
Further, the application also discloses a computer readable storage medium for storing a computer program; wherein the computer program, when executed by a processor, implements the pedestrian detection method disclosed previously. For specific steps of the method, reference may be made to the corresponding contents disclosed in the foregoing embodiments, and no further description is given here.
In this specification, the embodiments are described in a progressive manner; each embodiment focuses on its differences from the other embodiments, and for the same or similar parts the embodiments may be referred to one another. Since the apparatus disclosed in the embodiments corresponds to the method disclosed in the embodiments, its description is relatively brief, and the relevant details can be found in the description of the method.
Those of skill would further appreciate that the various illustrative elements and algorithm steps described in connection with the embodiments disclosed herein may be implemented as electronic hardware, computer software, or combinations of both, and that the various illustrative elements and steps are described above generally in terms of functionality in order to clearly illustrate the interchangeability of hardware and software. Whether such functionality is implemented as hardware or software depends upon the particular application and design constraints imposed on the solution. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present application.
The steps of a method or algorithm described in connection with the embodiments disclosed herein may be embodied directly in hardware, in a software module executed by a processor, or in a combination of the two. The software module may reside in random access memory (RAM), internal memory, read-only memory (ROM), electrically programmable ROM, electrically erasable programmable ROM, registers, a hard disk, a removable disk, a CD-ROM, or any other form of storage medium known in the art.
Finally, it is further noted that relational terms such as first and second, and the like are used solely to distinguish one entity or action from another entity or action without necessarily requiring or implying any actual such relationship or order between such entities or actions. Moreover, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising one … …" does not exclude the presence of other like elements in a process, method, article, or apparatus that comprises the element.
The present application has been described in detail above. Specific examples are used herein to explain the principles and embodiments of the present application, and the description of the above embodiments is only intended to help in understanding the method and core ideas of the present application. Meanwhile, those skilled in the art may, in accordance with the ideas of the present application, make changes to the specific embodiments and the scope of application. In summary, the content of this description should not be construed as limiting the present application.

Claims (10)

1. A pedestrian detection method, characterized by comprising:
acquiring an original pedestrian image, and performing feature detection on the original pedestrian image by utilizing a backbone network in a preset pedestrian detection network to obtain an initial feature image;
determining a corresponding density feature image by using a crowd density estimation network in the preset pedestrian detection network and the initial feature image, and determining a corresponding pedestrian feature image by using a pedestrian feature detection network in the preset pedestrian detection network and the density feature image, so as to obtain an initial detection image based on the pedestrian feature image;
and eliminating the labeling frame meeting the preset false detection condition in the initial detection image by utilizing a preset false detection reduction module so as to obtain a target detection result image.
2. The pedestrian detection method according to claim 1, wherein the feature detection of the original pedestrian image by using a backbone network in a preset pedestrian detection network to obtain an initial feature image includes:
performing feature detection on the original pedestrian image by using a backbone network and a feature pyramid network in the preset pedestrian detection network, and processing the detected feature data by using a bilinear interpolation method to obtain the initial feature image.
3. The pedestrian detection method according to claim 1, wherein the point annotation data of the crowd density estimation network is data determined by using a midpoint of an upper boundary of a pedestrian detection annotation frame.
4. The pedestrian detection method according to claim 1, wherein the determining a corresponding density feature image using the initial feature image and a crowd density estimation network in the preset pedestrian detection network includes:
performing a preset density feature processing operation on the initial feature image by using the crowd density estimation network in the preset pedestrian detection network, so as to determine the corresponding density feature image; wherein the preset density feature processing operation comprises a dimension reduction operation and an up-sampling operation based on a bilinear interpolation method.
5. The pedestrian detection method according to claim 1, wherein the determining a corresponding pedestrian feature image by using a pedestrian feature detection network in the preset pedestrian detection network and the density feature image includes:
determining the corresponding pedestrian feature image based on a preset feature fusion strategy by using the pedestrian feature detection network in the preset pedestrian detection network and the density feature image.
6. The pedestrian detection method according to any one of claims 1 to 5, wherein the removing, by using a preset false detection reduction module, the labeling frames meeting a preset false detection condition in the initial detection image includes:
determining a false detection score of each labeling frame in the initial detection image by using the preset false detection reduction module;
judging whether the false detection score is greater than a preset false detection score threshold;
if so, determining that the corresponding labeling frame meets the preset false detection condition, and removing the labeling frames meeting the preset false detection condition from the initial detection image.
7. The pedestrian detection method according to claim 6, wherein the determining a false detection score of each labeling frame in the initial detection image by using the preset false detection reduction module includes:
determining a first false detection value of each labeling frame in the initial detection image by using the preset false detection reduction module;
determining the false detection score of the labeling frame according to the first false detection value and a second false detection value; wherein the second false detection value is a value determined based on a detection result output by the pedestrian feature detection network.
8. A pedestrian detection apparatus characterized by comprising:
an initial feature image acquisition module, configured to acquire an original pedestrian image and perform feature detection on the original pedestrian image by using a backbone network in a preset pedestrian detection network to obtain an initial feature image;
an initial detection image determining module, configured to determine a corresponding density feature image by using a crowd density estimation network in the preset pedestrian detection network and the initial feature image, and determine a corresponding pedestrian feature image by using a pedestrian feature detection network in the preset pedestrian detection network and the density feature image, so as to obtain an initial detection image based on the pedestrian feature image;
a false detection processing module, configured to remove the labeling frames meeting a preset false detection condition in the initial detection image by using a preset false detection reduction module to obtain a target detection result image.
9. An electronic device, comprising:
a memory for storing a computer program;
a processor for executing the computer program to implement the pedestrian detection method as claimed in any one of claims 1 to 7.
10. A computer-readable storage medium storing a computer program which, when executed by a processor, implements the pedestrian detection method according to any one of claims 1 to 7.
CN202310342106.9A 2023-03-31 2023-03-31 Pedestrian detection method, device, equipment and storage medium Pending CN116363701A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310342106.9A CN116363701A (en) 2023-03-31 2023-03-31 Pedestrian detection method, device, equipment and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202310342106.9A CN116363701A (en) 2023-03-31 2023-03-31 Pedestrian detection method, device, equipment and storage medium

Publications (1)

Publication Number Publication Date
CN116363701A true CN116363701A (en) 2023-06-30

Family

ID=86915624

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310342106.9A Pending CN116363701A (en) 2023-03-31 2023-03-31 Pedestrian detection method, device, equipment and storage medium

Country Status (1)

Country Link
CN (1) CN116363701A (en)

Similar Documents

Publication Publication Date Title
WO2019228211A1 (en) Lane-line-based intelligent driving control method and apparatus, and electronic device
JP7016943B2 (en) Methods, devices and equipment for object detection
EP3806064B1 (en) Method and apparatus for detecting parking space usage condition, electronic device, and storage medium
US9990546B2 (en) Method and apparatus for determining target region in video frame for target acquisition
CN106951830B (en) Image scene multi-object marking method based on prior condition constraint
CN110163188B (en) Video processing and method, device and equipment for embedding target object in video
CN110622177A (en) Instance partitioning
CN113610087B (en) Priori super-resolution-based image small target detection method and storage medium
CN115205636B (en) Image target detection method, system, equipment and storage medium
CN113903028A (en) Target detection method and electronic equipment
CN112784750A (en) Fast video object segmentation method and device based on pixel and region feature matching
CN113033715B (en) Target detection model training method and target vehicle detection information generation method
CN117392638A (en) Open object class sensing method and device for serving robot scene
CN116091781B (en) Data processing method and device for image recognition
CN116363701A (en) Pedestrian detection method, device, equipment and storage medium
EP4332910A1 (en) Behavior detection method, electronic device, and computer readable storage medium
CN116363628A (en) Mark detection method and device, nonvolatile storage medium and computer equipment
CN115240133A (en) Bus congestion degree analysis method, device and equipment
CN114219759A (en) Detection method, detection device and computer readable storage medium
CN114627400A (en) Lane congestion detection method and device, electronic equipment and storage medium
CN112597825A (en) Driving scene segmentation method and device, electronic equipment and storage medium
CN114170267A (en) Target tracking method, device, equipment and computer readable storage medium
Garg et al. Low complexity techniques for robust real-time traffic incident detection
CN112149463A (en) Image processing method and device
CN112699711B (en) Lane line detection method and device, storage medium and electronic equipment

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination