CN111582032A - Pedestrian detection method and device, terminal equipment and storage medium - Google Patents

Pedestrian detection method and device, terminal equipment and storage medium

Info

Publication number
CN111582032A
Authority
CN
China
Prior art keywords
image
detected
convolution layer
neural network
mask
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Withdrawn
Application number
CN202010260160.5A
Other languages
Chinese (zh)
Inventor
肖传利
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
TP Link Technologies Co Ltd
Original Assignee
TP Link Technologies Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by TP Link Technologies Co Ltd filed Critical TP Link Technologies Co Ltd
Priority to CN202010260160.5A priority Critical patent/CN111582032A/en
Publication of CN111582032A publication Critical patent/CN111582032A/en
Withdrawn legal-status Critical Current

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06V - IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00 - Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/20 - Movements or behaviour, e.g. gesture recognition
    • G06V40/23 - Recognition of whole body movements, e.g. for sport training
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 - Computing arrangements based on biological models
    • G06N3/02 - Neural networks
    • G06N3/04 - Architecture, e.g. interconnection topology
    • G06N3/045 - Combinations of networks
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06V - IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 - Arrangements for image or video recognition or understanding
    • G06V10/20 - Image preprocessing
    • G06V10/25 - Determination of region of interest [ROI] or a volume of interest [VOI]
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06V - IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 - Arrangements for image or video recognition or understanding
    • G06V10/40 - Extraction of image or video features
    • G06V10/44 - Local feature extraction by analysis of parts of the pattern, e.g. by detecting edges, contours, loops, corners, strokes or intersections; Connectivity analysis, e.g. of connected components

Abstract

The invention discloses a pedestrian detection method and device, a terminal device, and a storage medium. The method includes: acquiring an image to be detected; performing motion detection on the image to be detected to obtain a motion region mask corresponding to it; and inputting the image to be detected into a pre-trained convolutional neural network, which performs pedestrian detection on the image according to the motion region mask to obtain a pedestrian target frame of the image to be detected. With the embodiments of the invention, pedestrians can be identified accurately and efficiently.

Description

Pedestrian detection method and device, terminal equipment and storage medium
Technical Field
The present invention relates to the field of image recognition technologies, and in particular, to a pedestrian detection method, apparatus, terminal device, and storage medium.
Background
With the development of information technology, video surveillance systems have become ubiquitous in daily life. In such widely deployed systems, a pedestrian detection technology with a long detection range can detect human bodies in different postures and provide users with effective personnel alerts.
At present, feature extraction is generally performed on the whole image by using a convolutional neural network, and then a pedestrian detection result is obtained according to the extracted feature map.
However, in the course of implementing the present invention, the inventor found that because the convolutional neural network is relatively complex and feature extraction is performed over the entire image, the computation is heavy and time-consuming, particularly for larger images; pedestrian detection is therefore inefficient.
Disclosure of Invention
The embodiment of the invention provides a pedestrian detection method, a pedestrian detection device, terminal equipment and a storage medium, which can accurately and efficiently identify pedestrians.
In order to achieve the above object, an embodiment of the present invention provides a pedestrian detection method, including:
acquiring an image to be detected;
carrying out motion detection on the image to be detected to obtain a motion area mask corresponding to the image to be detected;
inputting the image to be detected into a pre-trained convolutional neural network, and carrying out pedestrian detection on the image to be detected through the convolutional neural network according to the motion area mask to obtain a pedestrian target frame of the image to be detected.
As an improvement of the above scheme, after the image to be detected is input into a pre-trained convolutional neural network, and the pedestrian detection is performed on the image to be detected through the convolutional neural network according to the motion region mask, so as to obtain a pedestrian target frame of the image to be detected, the method further includes the following steps:
and screening the pedestrian target frame of the image to be detected to obtain a final target frame.
As an improvement of the above scheme, the inputting the image to be detected into a pre-trained convolutional neural network, and performing pedestrian detection on the image to be detected through the convolutional neural network according to the motion region mask to obtain a pedestrian target frame of the image to be detected specifically includes:
inputting the image to be detected into a pre-trained convolutional neural network;
extracting features of the image to be detected according to the motion region mask through each convolutional layer of the convolutional neural network to obtain an output feature map of the image to be detected extracted at each convolutional layer;
and predicting the classes and coordinates of pedestrian targets according to the output feature maps extracted at the convolutional layers through the fully connected layer of the convolutional neural network to obtain a pedestrian target frame of the image to be detected.
As an improvement of the above scheme, the extracting features of the image to be detected according to the motion region mask by using each convolution layer of the convolutional neural network to obtain an output feature map of the image to be detected extracted at each convolution layer specifically includes:
determining a region to be convolved of the input image of each convolution layer in the convolutional neural network according to the motion region mask;
performing convolution processing on the input image of each convolution layer according to the to-be-convolved area of the input image of each convolution layer through each convolution layer of the convolutional neural network to obtain an output characteristic diagram of each convolution layer; wherein the image to be detected is an input image of a first convolutional layer of the convolutional neural network.
As an improvement of the above scheme, the determining, according to the motion region mask, a region to be convolved of the input image of each convolution layer in the convolutional neural network specifically includes:
scaling the size of the motion region mask to be equal to the size of the input image of each convolutional layer of the convolutional neural network to obtain a first mask of each convolutional layer;
and determining the area to be convolved of the input image of each convolution layer according to the first mask of each convolution layer and the receptive field of each convolution layer.
As an improvement of the above scheme, the determining a region to be convolved of the input image of each convolution layer according to the first mask of each convolution layer and the receptive field of each convolution layer specifically includes:
according to the receptive field of each convolution layer, performing expansion processing on the first mask of each convolution layer to obtain a second mask of each convolution layer;
and determining the area to be convolved of the input image of each convolution layer according to the second mask of each convolution layer.
As an improvement of the above scheme, the obtaining an output feature map of each convolutional layer by performing convolution processing on the input image of each convolutional layer through each convolutional layer of the convolutional neural network according to a region to be convolved of the input image of each convolutional layer specifically includes:
performing convolution processing on the region to be convolved of the input image of each convolutional layer through each convolutional layer of the convolutional neural network to obtain a feature value corresponding to that region, and assigning the feature value corresponding to the region to be convolved of the input image of each convolutional layer to the corresponding node of the output feature map of each convolutional layer;
and assigning the unassigned nodes in the output feature map of each convolutional layer according to the pre-obtained reference feature map of each convolutional layer to obtain the output feature map of each convolutional layer.
Another embodiment of the present invention correspondingly provides a pedestrian detection device, including:
the to-be-detected image acquisition module, used for acquiring an image to be detected;
the motion region mask acquisition module, used for performing motion detection on the image to be detected to obtain a motion region mask corresponding to the image to be detected;
and the pedestrian detection result acquisition module, used for inputting the image to be detected into a pre-trained convolutional neural network and performing pedestrian detection on the image through the convolutional neural network according to the motion region mask to obtain a pedestrian target frame of the image to be detected.
Another embodiment of the present invention provides a terminal device, including a processor, a memory, and a computer program stored in the memory and configured to be executed by the processor, wherein the processor implements the pedestrian detection method according to any one of the above items when executing the computer program.
Another embodiment of the present invention provides a computer-readable storage medium, which includes a stored computer program, wherein when the computer program runs, the apparatus where the computer-readable storage medium is located is controlled to execute the pedestrian detection method according to any one of the above items.
Compared with the prior art, the pedestrian detection method, device, terminal device, and storage medium provided by the embodiments of the invention perform motion detection on the acquired image to be detected to obtain a corresponding motion region mask, then input the image into a pre-trained convolutional neural network, which performs pedestrian detection according to the motion region mask to obtain a pedestrian target frame of the image to be detected. During detection, feature extraction is guided by the motion region mask instead of being applied to the entire image, so the embodiments reduce both the amount of computation and the time required for feature extraction, solving the prior-art problems of heavy computation and long processing time caused by extracting features from the whole image. Moreover, because the motion region mask reflects the motion-region information in the image to be detected, detection accuracy is preserved.
Drawings
Fig. 1 is a flowchart illustrating a pedestrian detection method according to an embodiment of the present invention.
Fig. 2 is a flowchart illustrating a pedestrian detection method according to another embodiment of the present invention.
Fig. 3 is a schematic structural diagram of a YOLO network according to an embodiment of the present invention.
Fig. 4(a) is a schematic diagram of an image to be detected after dividing a grid and marking a motion region according to an embodiment of the present invention.
Fig. 4(b) is a schematic diagram of a first mask of an image to be detected according to an embodiment of the present invention.
Fig. 5 is a schematic structural diagram of a pedestrian detection device according to an embodiment of the present invention.
Fig. 6 is a schematic structural diagram of a terminal device according to an embodiment of the present invention.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
Fig. 1 is a schematic flow chart of a pedestrian detection method according to an embodiment of the present invention.
The pedestrian detection method provided by the embodiment of the invention comprises the following steps S11 to S13:
and S11, acquiring an image to be detected.
And S12, carrying out motion detection on the image to be detected to obtain a motion area mask corresponding to the image to be detected.
S13, inputting the image to be detected into a pre-trained convolutional neural network, and carrying out pedestrian detection on the image to be detected through the convolutional neural network according to the motion area mask to obtain a pedestrian target frame of the image to be detected.
The pedestrian detection principle of the embodiment of the invention is as follows:
generally, besides moving objects (such as people, animals, and vehicles), each image to be detected contains a large number of background regions unrelated to those objects. A region where a moving object moves is called a motion region, and a region without moving objects is called a non-motion region. Since pedestrians move, a pedestrian target falls within a motion region. In the feature-extraction process of a convolutional neural network, the more information an image contains, the greater the computation. Therefore, motion detection is first performed on the image to be detected to obtain a corresponding motion region mask. When the convolutional neural network performs pedestrian detection, the nodes that need to participate in computation in the input image of each convolutional layer are determined according to this mask; that is, when computing a given layer's feature map, only the nodes in the motion region of the input image are computed, while nodes in non-motion regions, which contain no pedestrian targets, do not participate. This greatly reduces computation and time consumption.
To facilitate understanding of the present embodiment, a specific example of the process of pedestrian detection may be as follows:
first, a convolutional neural network is trained in advance using an existing training method; the network may be a YOLO network, an SSD network, or the like. Then, during detection, an image to be detected is obtained; it may be a frame image from a video stream. Next, motion detection is performed on the image to obtain the corresponding motion region mask, for example by extracting the motion region with a motion detection method such as the inter-frame difference method. The motion region mask may be used to control the processed region or the processing of an image; it may be a binary image composed of 0s and 1s, where the 1-valued region is the region of interest and the 0-valued region is shielded. Finally, the image is input into the pre-trained convolutional neural network, which performs pedestrian detection according to the motion region mask to obtain the pedestrian target frame of the image to be detected. During detection, the non-motion regions of the image are shielded by the mask so that they do not participate in computation, and features are extracted only from the motion regions where pedestrians may appear. This reduces the amount of computation while preserving detection accuracy, improving data-processing speed and efficiency.
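As a minimal sketch of the inter-frame difference step mentioned above, the following numpy code builds a binary motion-region mask from two consecutive grayscale frames. The function name `motion_mask` and the threshold value are illustrative assumptions, not part of the patent.

```python
import numpy as np

def motion_mask(prev_frame, curr_frame, diff_thresh=25):
    """Binary motion-region mask via inter-frame differencing."""
    # Absolute per-pixel grayscale change; int16 avoids uint8 wrap-around.
    diff = np.abs(curr_frame.astype(np.int16) - prev_frame.astype(np.int16))
    # 1 = region of interest (motion), 0 = shielded region, as in the text.
    return (diff > diff_thresh).astype(np.uint8)

# Toy 4x4 frames where a 2x2 "object" appears between frames.
prev = np.zeros((4, 4), dtype=np.uint8)
curr = prev.copy()
curr[1:3, 1:3] = 200
mask = motion_mask(prev, curr)
```

In practice the raw difference mask would typically be denoised (e.g. by morphological filtering) before use, but the thresholded difference already yields the 0/1 interest and shielded regions the text describes.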
It should be noted that the execution subject of the pedestrian detection method according to the embodiment of the present invention may be a local computer, a server, or even a mobile terminal connected to the monitoring camera system (wirelessly connected or wired), and may specifically be a processor of these devices.
As can be seen from the above analysis, because the pedestrian detection method provided by the embodiments of the invention extracts features from the image to be detected through the convolutional neural network according to the motion region mask, rather than from the entire image, it reduces both the amount of computation and the time required for feature extraction, solving the prior-art problems of heavy computation and long processing time caused by extracting features from the whole image.
As an alternative embodiment, referring to fig. 2, after the step S13, the pedestrian detection method further includes the steps of:
and S14, screening the pedestrian target frame of the image to be detected to obtain a final target frame.
The pedestrian target frames can be screened by post-processing methods such as non-maximum suppression: duplicate pedestrian target frames are filtered out to obtain the final target frame, making the position of the pedestrian target frame more accurate.
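The non-maximum suppression screening named above can be sketched as follows. This is a minimal illustrative implementation; the helper names `iou` and `nms` and the 0.5 overlap threshold are assumptions, not the patent's exact post-processing.

```python
def iou(a, b):
    """Intersection over union of two (x1, y1, x2, y2) boxes."""
    x1, y1 = max(a[0], b[0]), max(a[1], b[1])
    x2, y2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, x2 - x1) * max(0, y2 - y1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / float(area_a + area_b - inter)

def nms(boxes, scores, iou_thresh=0.5):
    """Greedy non-maximum suppression: keep the best box, drop near-duplicates."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    while order:
        best = order.pop(0)
        keep.append(best)
        order = [j for j in order if iou(boxes[best], boxes[j]) < iou_thresh]
    return keep

# Two heavily overlapping detections of one pedestrian plus a distinct one.
boxes = [(10, 10, 50, 90), (12, 12, 52, 92), (100, 10, 140, 90)]
scores = [0.9, 0.8, 0.7]
kept = nms(boxes, scores)  # indices of the surviving target frames
```

Here the two overlapping boxes collapse to the higher-scoring one, while the distinct detection survives, which is exactly the "filter out repeated target frames" behaviour described.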
As one optional embodiment, the step S13 specifically includes:
s131, inputting the image to be detected into a pre-trained convolutional neural network.
S132, extracting features of the image to be detected according to the motion region mask through each convolutional layer of the convolutional neural network to obtain an output feature map of the image to be detected extracted at each convolutional layer.
And S133, predicting the classes and coordinates of pedestrian targets according to the output feature maps extracted at the convolutional layers through the fully connected layer of the convolutional neural network to obtain a pedestrian target frame of the image to be detected.
Here, the image to be detected is input into the pre-trained convolutional neural network. When features are extracted through each convolutional layer, the motion region mask shields the non-motion regions of the image so that they do not participate in computation; features are extracted only from the motion regions where pedestrians may appear, which reduces the amount of data processed and increases processing speed. After the output feature maps extracted at the convolutional layers are obtained, the fully connected layer of the convolutional neural network predicts the classes and coordinates of pedestrian targets from those feature maps, yielding the pedestrian target frame of the image to be detected.
Further, the step S132 specifically includes:
s1321, determining a region to be convolved of the input image of each convolution layer in the convolution neural network according to the motion region mask;
s1322, performing convolution processing on the input image of each convolution layer according to the to-be-convolved area of the input image of each convolution layer through each convolution layer of the convolutional neural network to obtain an output characteristic diagram of each convolution layer; wherein the image to be detected is an input image of a first convolutional layer of the convolutional neural network.
Optionally, step S1321 specifically includes:
s13211, scaling the size of the motion region mask to be equal to the size of the input image of each convolution layer of the convolutional neural network to obtain a first mask of each convolution layer;
s13212, determining the area to be convolved of the input image of each convolution layer according to the first mask of each convolution layer and the receptive field of each convolution layer.
The motion region mask can be scaled to the same size as the input image of each convolutional layer of the convolutional neural network, yielding the first mask of each layer. Considering that, for practical discrimination, the background within the receptive field of the pedestrian target at each layer should also participate in computation, the region to be convolved of each layer's input image is determined from both the first mask and the receptive field of that layer, which guarantees the accuracy of feature extraction.
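The mask-scaling step can be illustrated with a nearest-neighbour rescale. `scale_mask` is a hypothetical helper written for this sketch, assuming the mask is a 2-D binary array; any standard image-resize routine would serve the same purpose.

```python
import numpy as np

def scale_mask(mask, out_h, out_w):
    """Nearest-neighbour rescale of a binary mask to a layer's input size."""
    in_h, in_w = mask.shape
    rows = np.arange(out_h) * in_h // out_h  # source row for each output row
    cols = np.arange(out_w) * in_w // out_w  # source column per output column
    return mask[rows[:, None], cols]

# A coarse 2x2 motion mask scaled up to a hypothetical 4x4 layer input,
# producing the "first mask" of that layer.
m = np.array([[0, 1],
              [0, 0]], dtype=np.uint8)
m4 = scale_mask(m, 4, 4)
```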
Specifically, the step S13212 specifically includes:
s132121, according to the receptive field of each convolution layer, performing expansion processing on the first mask of each convolution layer to obtain a second mask of each convolution layer;
s132122, determining a region to be convolved of the input image of each convolution layer according to the second mask of each convolution layer.
The second mask CalMask_n of each convolutional layer is CalMask_n = Dilate(MotionMask_n, FieldMask_n), where Dilate denotes the dilation operation, MotionMask_n is the first mask of the layer, and FieldMask_n is the receptive-field structuring element of the layer. It is to be noted that the dilation of structure A by structure B is defined as
A ⊕ B = { z | (B̂)_z ∩ A ≠ ∅ }
where B̂ denotes the reflection of B about its origin and (B̂)_z the translation of B̂ by z.
It can be understood that when structure B is slid over structure A, every position at which the moving B overlaps A is recorded; the set of all such positions is the dilation of A by B.
It can be understood that the region of each convolutional layer's second mask with pixel value 1 is the region to be convolved of that layer's input image.
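The dilation of the first mask into the second mask can be sketched as follows, with a square structuring element standing in for the receptive-field region. The function name `dilate` and the element size are illustrative assumptions.

```python
import numpy as np

def dilate(mask, k):
    """Dilate a binary mask with a k x k square structuring element.

    An output pixel is 1 whenever the element, centred on it, overlaps
    at least one 1-pixel of the input mask (the definition given above).
    """
    h, w = mask.shape
    r = k // 2
    padded = np.pad(mask, r)  # zero-pad so border pixels can be centred on
    out = np.zeros_like(mask)
    for y in range(h):
        for x in range(w):
            out[y, x] = padded[y:y + k, x:x + k].max()
    return out

first_mask = np.zeros((5, 5), dtype=np.uint8)
first_mask[2, 2] = 1                 # a single moving pixel
second_mask = dilate(first_mask, 3)  # expanded by the receptive-field margin
```

The single set pixel grows into a 3 × 3 block, so the region to be convolved includes the background immediately around the motion region, as the text requires.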
As an example, in the YOLO network structure shown in fig. 3, the feature map extracted to the left of the dotted line from a 448 × 448 color image has size 7 × 7. As shown in fig. 4(a), the 448 × 448 image to be detected is divided into a 7 × 7 grid, and each grid cell corresponds to one point in the 7 × 7 feature map. The region with lower gray level in fig. 4(a) is the motion region obtained from motion detection; if a grid cell contains part of the motion region, the value of the corresponding point in MotionMask_n is 1, yielding the MotionMask_n of the convolutional layer shown in fig. 4(b).
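The grid mapping just described can be sketched as follows. `grid_motion_mask` is a hypothetical helper that marks a 7 × 7 cell as moving whenever any pixel of the full-resolution motion mask inside it is set.

```python
import numpy as np

def grid_motion_mask(pixel_mask, grid=7):
    """Mark a grid cell 1 if any pixel of the motion mask inside it is set."""
    h, w = pixel_mask.shape
    ch, cw = h // grid, w // grid  # cell size, e.g. 64 for a 448x448 image
    out = np.zeros((grid, grid), dtype=np.uint8)
    for gy in range(grid):
        for gx in range(grid):
            cell = pixel_mask[gy * ch:(gy + 1) * ch, gx * cw:(gx + 1) * cw]
            out[gy, gx] = 1 if cell.any() else 0
    return out

# One moving blob in a 448x448 motion mask spans four 64x64 grid cells.
px = np.zeros((448, 448), dtype=np.uint8)
px[100:160, 300:360] = 1
cell_mask = grid_motion_mask(px)
```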
Optionally, the step S1322 specifically includes:
s13221, performing convolution processing on the to-be-convolved area of the input image of each convolution layer through each convolution layer of the convolutional neural network to obtain a characteristic value corresponding to the to-be-convolved area of the input image of each convolution layer, and endowing the characteristic value corresponding to the to-be-convolved area of the input image of each convolution layer to a corresponding node of an output characteristic diagram of each convolution layer;
s13222, assigning the nodes which are not assigned in the output feature map of each convolutional layer according to the reference feature map of each convolutional layer obtained in advance to obtain the output feature map of each convolutional layer.
When features are extracted through each convolutional layer of the convolutional neural network, only the feature values of the region to be convolved of the input image are computed. The values on the output feature map for the regions of the input image that do not participate in computation can be assigned from the pre-obtained reference feature map of each layer, which ensures the accuracy of feature extraction while reducing computation and increasing speed. It should be noted that the reference feature map may be obtained in advance by inputting an original image of an unrelated scene, for example an image with all pixel values equal to 0, into the trained convolutional neural network.
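The masked convolution with reference-feature-map fill-in can be sketched as below. This single-channel, stride-1 toy (`masked_conv`, and the constant reference map) is an illustrative assumption, not the patent's implementation.

```python
import numpy as np

def masked_conv(img, kernel, region_mask, ref_fmap):
    """Convolve only where the mask is 1; fill the rest from a reference map.

    Feature values are computed only for output nodes inside the region to
    be convolved; every other node keeps the value of the pre-computed
    reference feature map, as described above.
    """
    kh, kw = kernel.shape
    oh, ow = img.shape[0] - kh + 1, img.shape[1] - kw + 1
    out = ref_fmap.copy()
    for y in range(oh):
        for x in range(ow):
            if region_mask[y, x]:  # node participates in the computation
                out[y, x] = (img[y:y + kh, x:x + kw] * kernel).sum()
    return out

img = np.arange(16, dtype=float).reshape(4, 4)
kernel = np.ones((2, 2))
mask = np.zeros((3, 3), dtype=np.uint8)
mask[0, 0] = 1                   # only one output node is computed
ref = np.full((3, 3), -1.0)      # stand-in for the pre-obtained reference map
fmap = masked_conv(img, kernel, mask, ref)
```

Only one window multiplication is performed; all other output nodes are filled from the reference map, which is the source of the claimed computation savings.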
The embodiment of the invention also provides a pedestrian detection device which can implement all the processes of the pedestrian detection method.
Fig. 5 is a schematic structural diagram of a pedestrian detection device according to an embodiment of the present invention.
The pedestrian detection device provided by the embodiment of the invention comprises:
the image acquisition module 21 is used for acquiring an image to be detected;
a moving area mask obtaining module 22, configured to perform motion detection on the image to be detected, so as to obtain a moving area mask corresponding to the image to be detected;
and the pedestrian detection result acquisition module 23 is configured to input the image to be detected into a pre-trained convolutional neural network, and perform pedestrian detection on the image to be detected through the convolutional neural network according to the motion region mask to obtain a pedestrian target frame of the image to be detected.
As one of the optional embodiments, the pedestrian detection apparatus further includes:
and the target frame screening module is used for screening the pedestrian target frame of the image to be detected to obtain a final target frame.
As one optional embodiment, the pedestrian detection result obtaining module 23 specifically includes:
the image input submodule is used for inputting the image to be detected into a pre-trained convolutional neural network;
the feature extraction submodule, used for extracting features of the image to be detected according to the motion region mask through each convolutional layer of the convolutional neural network to obtain an output feature map of the image to be detected extracted at each convolutional layer;
and the target frame prediction submodule, used for predicting the classes and coordinates of pedestrian targets according to the output feature maps extracted at the convolutional layers through the fully connected layer of the convolutional neural network, so as to obtain the pedestrian target frame of the image to be detected.
As one optional embodiment, the feature extraction sub-module specifically includes:
a region-to-be-convolved obtaining unit, configured to determine the region to be convolved of the input image of each convolutional layer in the convolutional neural network according to the motion region mask;
and an output feature map obtaining unit, configured to perform convolution processing on the input image of each convolutional layer according to the region to be convolved of that input image, through each convolutional layer of the convolutional neural network, to obtain an output feature map of each convolutional layer; wherein the image to be detected is the input image of the first convolutional layer of the convolutional neural network.
Further, the to-be-convolved region acquiring unit specifically includes:
a first mask obtaining subunit, configured to scale the size of the motion region mask to be equal to the size of the input image of each convolutional layer of the convolutional neural network, so as to obtain a first mask of each convolutional layer;
and the area to be convolved determining subunit is used for determining the area to be convolved of the input image of each convolution layer according to the first mask of each convolution layer and the receptive field of each convolution layer.
Specifically, the to-be-convolved region determining subunit is specifically configured to:
according to the receptive field of each convolution layer, performing expansion processing on the first mask of each convolution layer to obtain a second mask of each convolution layer;
and determining the area to be convolved of the input image of each convolution layer according to the second mask of each convolution layer.
As an optional embodiment, the output feature map obtaining unit specifically includes:
the feature value computing subunit, used for performing convolution processing on the region to be convolved of the input image of each convolutional layer through each convolutional layer of the convolutional neural network to obtain the feature value corresponding to that region, and assigning this feature value to the corresponding node of the output feature map of each convolutional layer;
and the assignment subunit, used for assigning the unassigned nodes in the output feature map of each convolutional layer according to the pre-obtained reference feature map of each convolutional layer, to obtain the output feature map of each convolutional layer.
The principle by which the pedestrian detection device realizes pedestrian detection is the same as that of the method embodiments above, and is not repeated here.
In the pedestrian detection device provided by the embodiment of the invention, the convolutional neural network extracts features from the image to be detected according to the motion region mask, rather than from the whole image. This reduces the amount of computation and the time required for feature extraction, solving the problems of heavy computation and long processing time caused by extracting features from the whole image in the prior art. Moreover, because the motion region mask reflects the motion region information in the image to be detected, combining the motion region mask with the convolutional neural network preserves detection accuracy, so pedestrians can be identified accurately and efficiently.
Fig. 6 is a schematic structural diagram of a terminal device according to an embodiment of the present invention.
The terminal device provided by the embodiment of the present invention includes a processor 31, a memory 32, and a computer program stored in the memory 32 and configured to be executed by the processor 31; when the processor 31 executes the computer program, the pedestrian detection method according to any one of the above embodiments is implemented.
In addition, an embodiment of the present invention further provides a computer-readable storage medium storing a computer program, wherein, when the computer program runs, the apparatus in which the computer-readable storage medium is located is controlled to execute the pedestrian detection method according to any one of the above embodiments.
The processor 31, when executing the computer program, implements the steps of the above-described embodiment of the pedestrian detection method, such as all the steps of the pedestrian detection method shown in fig. 1. Alternatively, the processor 31, when executing the computer program, implements the functions of the modules/units in the above-described embodiment of the pedestrian detection apparatus, such as the functions of the modules of the pedestrian detection apparatus shown in fig. 5.
Illustratively, the computer program may be divided into one or more modules, which are stored in the memory 32 and executed by the processor 31 to implement the present invention. The one or more modules may be a series of computer program instruction segments capable of performing specific functions, the instruction segments describing the execution process of the computer program in the terminal device. For example, the computer program may be divided into an image acquisition module, a motion region mask acquisition module, and a pedestrian detection result acquisition module, whose functions are as follows: the image acquisition module is used for acquiring an image to be detected; the motion region mask acquisition module is used for performing motion detection on the image to be detected to obtain a motion region mask corresponding to the image to be detected; and the pedestrian detection result acquisition module is used for inputting the image to be detected into a pre-trained convolutional neural network and performing pedestrian detection on the image to be detected through the convolutional neural network according to the motion region mask, to obtain a pedestrian target frame of the image to be detected.
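The motion region mask acquisition module can be realized by any classical motion detection scheme; a simple frame-differencing sketch on grayscale frames is shown below (the function name and threshold value are illustrative assumptions — the patent does not prescribe a particular motion detector):

```python
def motion_mask(prev_frame, cur_frame, thresh=25):
    """Frame differencing: mark a pixel as moving (1) when its grayscale
    intensity changed by more than `thresh` between frames, else static (0)."""
    return [[1 if abs(c - p) > thresh else 0
             for p, c in zip(prev_row, cur_row)]
            for prev_row, cur_row in zip(prev_frame, cur_frame)]
```

The resulting binary mask is what the later units scale and dilate per convolution layer.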
The terminal device can be a desktop computer, a notebook computer, a palmtop computer, a cloud server, or another computing device. The terminal device may include, but is not limited to, the processor 31 and the memory 32. Those skilled in the art will appreciate that the schematic diagram is merely an example of a terminal device and does not constitute a limitation; the terminal device may include more or fewer components than those shown, combine certain components, or use different components; for example, it may also include input/output devices, network access devices, buses, and the like.
The processor 31 may be a Central Processing Unit (CPU), another general-purpose processor, a Digital Signal Processor (DSP), an Application Specific Integrated Circuit (ASIC), a Field-Programmable Gate Array (FPGA) or other programmable logic device, discrete gate or transistor logic, discrete hardware components, etc. The general-purpose processor may be a microprocessor, or any conventional processor. The processor 31 is the control center of the terminal device and connects the various parts of the whole terminal device through various interfaces and lines.
The memory 32 can be used to store the computer programs and/or modules, and the processor 31 implements the various functions of the terminal device by running or executing the computer programs and/or modules stored in the memory 32 and calling the data stored in the memory 32. The memory 32 may mainly include a program storage area and a data storage area, wherein the program storage area may store an operating system, an application program required for at least one function, and the like, and the data storage area may store data created according to the use of the terminal device, and the like. In addition, the memory may include high-speed random access memory, and may also include non-volatile memory, such as a hard disk, a plug-in hard disk, a Smart Media Card (SMC), a Secure Digital (SD) card, a flash memory card (Flash Card), at least one magnetic disk storage device, a flash memory device, or another non-volatile solid-state storage device.
If the integrated modules/units of the terminal device are implemented in the form of software functional units and sold or used as stand-alone products, they can be stored in a computer-readable storage medium. Based on this understanding, all or part of the flow of the methods of the embodiments of the present invention may also be implemented by a computer program, which may be stored in a computer-readable storage medium; when the computer program is executed by a processor, the steps of the method embodiments are implemented. The computer program comprises computer program code, which may be in the form of source code, object code, an executable file, some intermediate form, etc. The computer-readable medium may include: any entity or device capable of carrying the computer program code, a recording medium, a USB flash drive, a removable hard disk, a magnetic disk, an optical disk, a computer memory, a Read-Only Memory (ROM), a Random Access Memory (RAM), an electrical carrier signal, a telecommunications signal, a software distribution medium, and the like.
It should be noted that the above-described device embodiments are merely illustrative, where the units described as separate parts may or may not be physically separate, and the parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on multiple network units. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of the present embodiment. In addition, in the drawings of the embodiment of the apparatus provided by the present invention, the connection relationship between the modules indicates that there is a communication connection between them, and may be specifically implemented as one or more communication buses or signal lines. One of ordinary skill in the art can understand and implement it without inventive effort.
The above description is only a preferred embodiment of the present invention. It should be noted that, for those skilled in the art, several modifications and variations can be made without departing from the technical principle of the present invention, and these modifications and variations should also be regarded as falling within the protection scope of the present invention.

Claims (10)

1. A pedestrian detection method, characterized by comprising:
acquiring an image to be detected;
performing motion detection on the image to be detected to obtain a motion region mask corresponding to the image to be detected;
and inputting the image to be detected into a pre-trained convolutional neural network, and performing pedestrian detection on the image to be detected through the convolutional neural network according to the motion region mask, to obtain a pedestrian target frame of the image to be detected.
2. The pedestrian detection method according to claim 1, wherein, after inputting the image to be detected into the pre-trained convolutional neural network and performing pedestrian detection on the image to be detected through the convolutional neural network according to the motion region mask to obtain the pedestrian target frame of the image to be detected, the method further comprises:
screening the pedestrian target frame of the image to be detected to obtain a final target frame.
3. The pedestrian detection method according to claim 1 or 2, wherein the inputting the image to be detected into a pre-trained convolutional neural network and performing pedestrian detection on the image to be detected through the convolutional neural network according to the motion region mask to obtain a pedestrian target frame of the image to be detected specifically comprises:
inputting the image to be detected into the pre-trained convolutional neural network;
extracting features from the image to be detected according to the motion region mask through each convolution layer of the convolutional neural network, to obtain an output feature map of the image to be detected extracted at each convolution layer;
and predicting the classes and coordinates of pedestrian targets through the fully-connected layer of the convolutional neural network according to the output feature maps extracted at the convolution layers, to obtain the pedestrian target frame of the image to be detected.
4. The pedestrian detection method according to claim 3, wherein the extracting features from the image to be detected according to the motion region mask through each convolution layer of the convolutional neural network to obtain an output feature map of the image to be detected extracted at each convolution layer specifically comprises:
determining a region to be convolved of the input image of each convolution layer in the convolutional neural network according to the motion region mask;
performing convolution processing on the input image of each convolution layer through each convolution layer of the convolutional neural network according to the region to be convolved of the input image of that layer, to obtain the output feature map of each convolution layer; wherein the image to be detected serves as the input image of the first convolution layer of the convolutional neural network.
5. The pedestrian detection method according to claim 4, wherein the determining, according to the motion region mask, a region to be convolved of the input image of each convolution layer in the convolutional neural network specifically comprises:
scaling the motion region mask to the size of the input image of each convolution layer of the convolutional neural network, to obtain a first mask for each convolution layer;
and determining the region to be convolved of the input image of each convolution layer according to the first mask of that layer and the receptive field of that layer.
6. The pedestrian detection method according to claim 5, wherein determining the region to be convolved of the input image of each convolution layer according to the first mask of each convolution layer and the receptive field of each convolution layer comprises:
dilating the first mask of each convolution layer according to the receptive field of that layer, to obtain a second mask for each convolution layer;
and determining the region to be convolved of the input image of each convolution layer according to the second mask of that layer.
7. The pedestrian detection method according to claim 4, wherein the performing convolution processing on the input image of each convolution layer through each convolution layer of the convolutional neural network according to the region to be convolved of the input image of that layer to obtain the output feature map of each convolution layer specifically comprises:
performing convolution processing on the region to be convolved of the input image of each convolution layer through that convolution layer, to obtain the feature values corresponding to that region, and assigning those feature values to the corresponding nodes of the output feature map of that layer;
and filling the unassigned nodes in the output feature map of each convolution layer from a pre-obtained reference feature map of that layer, to obtain the output feature map of each convolution layer.
8. A pedestrian detection device, characterized by comprising:
an image acquisition module, configured to acquire an image to be detected;
a motion region mask acquisition module, configured to perform motion detection on the image to be detected to obtain a motion region mask corresponding to the image to be detected;
and a pedestrian detection result acquisition module, configured to input the image to be detected into a pre-trained convolutional neural network and perform pedestrian detection on the image to be detected through the convolutional neural network according to the motion region mask, to obtain a pedestrian target frame of the image to be detected.
9. A terminal device comprising a processor, a memory, and a computer program stored in the memory and configured to be executed by the processor, the processor implementing the pedestrian detection method according to any one of claims 1 to 7 when executing the computer program.
10. A computer-readable storage medium, comprising a stored computer program, wherein the computer program, when executed, controls an apparatus in which the computer-readable storage medium is located to perform the pedestrian detection method according to any one of claims 1 to 7.
CN202010260160.5A 2020-04-03 2020-04-03 Pedestrian detection method and device, terminal equipment and storage medium Withdrawn CN111582032A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010260160.5A CN111582032A (en) 2020-04-03 2020-04-03 Pedestrian detection method and device, terminal equipment and storage medium

Publications (1)

Publication Number Publication Date
CN111582032A true CN111582032A (en) 2020-08-25

Family

ID=72121092

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010260160.5A Withdrawn CN111582032A (en) 2020-04-03 2020-04-03 Pedestrian detection method and device, terminal equipment and storage medium

Country Status (1)

Country Link
CN (1) CN111582032A (en)

Cited By (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112418243A (en) * 2020-10-28 2021-02-26 北京迈格威科技有限公司 Feature extraction method and device and electronic equipment
CN112767444A (en) * 2021-01-19 2021-05-07 杭州萤石软件有限公司 Moving object detection method, readable storage medium and electronic device
CN112949486A (en) * 2021-03-01 2021-06-11 八维通科技有限公司 Intelligent traffic data processing method and device based on neural network
CN114943909A (en) * 2021-03-31 2022-08-26 华为技术有限公司 Method, device, equipment and system for identifying motion area
CN114943909B (en) * 2021-03-31 2023-04-18 华为技术有限公司 Method, device, equipment and system for identifying motion area
CN115183763A (en) * 2022-09-13 2022-10-14 南京北新智能科技有限公司 Personnel map positioning method based on face recognition and grid method


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
WW01 Invention patent application withdrawn after publication

Application publication date: 20200825