CN111582032A - Pedestrian detection method and device, terminal equipment and storage medium - Google Patents

Pedestrian detection method and device, terminal equipment and storage medium

Info

Publication number
CN111582032A
Authority
CN
China
Prior art keywords
image
detected
convolution layer
neural network
mask
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Withdrawn
Application number
CN202010260160.5A
Other languages
Chinese (zh)
Inventor
肖传利
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
TP Link Technologies Co Ltd
Original Assignee
TP Link Technologies Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by TP Link Technologies Co Ltd filed Critical TP Link Technologies Co Ltd
Priority to CN202010260160.5A priority Critical patent/CN111582032A/en
Publication of CN111582032A publication Critical patent/CN111582032A/en
Withdrawn legal-status Critical Current

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06V - IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00 - Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/20 - Movements or behaviour, e.g. gesture recognition
    • G06V40/23 - Recognition of whole body movements, e.g. for sport training
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 - Computing arrangements based on biological models
    • G06N3/02 - Neural networks
    • G06N3/04 - Architecture, e.g. interconnection topology
    • G06N3/045 - Combinations of networks
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06V - IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 - Arrangements for image or video recognition or understanding
    • G06V10/20 - Image preprocessing
    • G06V10/25 - Determination of region of interest [ROI] or a volume of interest [VOI]
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06V - IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 - Arrangements for image or video recognition or understanding
    • G06V10/40 - Extraction of image or video features
    • G06V10/44 - Local feature extraction by analysis of parts of the pattern, e.g. by detecting edges, contours, loops, corners, strokes or intersections; Connectivity analysis, e.g. of connected components

Abstract

The invention discloses a pedestrian detection method and device, a terminal device, and a storage medium. The method includes: acquiring an image to be detected; performing motion detection on the image to be detected to obtain a motion region mask corresponding to it; and inputting the image to be detected into a pre-trained convolutional neural network, which performs pedestrian detection on the image according to the motion region mask to obtain a pedestrian target frame of the image to be detected. With the embodiments of the invention, pedestrians can be identified accurately and efficiently.

Description

Pedestrian detection method and device, terminal equipment and storage medium
Technical Field
The present invention relates to the field of image recognition technologies, and in particular, to a pedestrian detection method, apparatus, terminal device, and storage medium.
Background
With the development of information technology, video surveillance systems have become ubiquitous in daily life. In such widely deployed systems, a pedestrian detection technology with a long detection range can detect human bodies in different postures and provide users with effective personnel alerts.
At present, feature extraction is generally performed on the whole image by using a convolutional neural network, and then a pedestrian detection result is obtained according to the extracted feature map.
However, in the course of implementing the present invention, the inventor found that because the convolutional neural network is relatively complex and feature extraction is performed over the entire image, the computation is heavy and time-consuming, particularly for larger images; pedestrian detection is therefore inefficient.
Disclosure of Invention
The embodiment of the invention provides a pedestrian detection method, a pedestrian detection device, terminal equipment and a storage medium, which can accurately and efficiently identify pedestrians.
In order to achieve the above object, an embodiment of the present invention provides a pedestrian detection method, including:
acquiring an image to be detected;
carrying out motion detection on the image to be detected to obtain a motion area mask corresponding to the image to be detected;
inputting the image to be detected into a pre-trained convolutional neural network, and carrying out pedestrian detection on the image to be detected through the convolutional neural network according to the motion area mask to obtain a pedestrian target frame of the image to be detected.
As an improvement of the above scheme, after the image to be detected is input into a pre-trained convolutional neural network, and the pedestrian detection is performed on the image to be detected through the convolutional neural network according to the motion region mask, so as to obtain a pedestrian target frame of the image to be detected, the method further includes the following steps:
and screening the pedestrian target frame of the image to be detected to obtain a final target frame.
As an improvement of the above scheme, the inputting the image to be detected into a pre-trained convolutional neural network, and performing pedestrian detection on the image to be detected through the convolutional neural network according to the motion region mask to obtain a pedestrian target frame of the image to be detected specifically includes:
inputting the image to be detected into a pre-trained convolutional neural network;
extracting features of the image to be detected according to the motion region mask through each convolutional layer of the convolutional neural network to obtain an output feature map of the image to be detected extracted at each convolutional layer;
and predicting the classes and coordinates of pedestrian targets according to the output feature maps extracted at the convolutional layers through the fully connected layer of the convolutional neural network to obtain a pedestrian target frame of the image to be detected.
As an improvement of the above scheme, the extracting features of the image to be detected according to the motion region mask by using each convolution layer of the convolutional neural network to obtain an output feature map of the image to be detected extracted at each convolution layer specifically includes:
determining a region to be convolved of the input image of each convolution layer in the convolutional neural network according to the motion region mask;
performing convolution processing on the input image of each convolution layer according to the to-be-convolved area of the input image of each convolution layer through each convolution layer of the convolutional neural network to obtain an output characteristic diagram of each convolution layer; wherein the image to be detected is an input image of a first convolutional layer of the convolutional neural network.
As an improvement of the above scheme, the determining, according to the motion region mask, a region to be convolved of the input image of each convolution layer in the convolutional neural network specifically includes:
scaling the size of the motion region mask to be equal to the size of the input image of each convolutional layer of the convolutional neural network to obtain a first mask of each convolutional layer;
and determining the area to be convolved of the input image of each convolution layer according to the first mask of each convolution layer and the receptive field of each convolution layer.
As an improvement of the above scheme, the determining a region to be convolved of the input image of each convolution layer according to the first mask of each convolution layer and the receptive field of each convolution layer specifically includes:
according to the receptive field of each convolution layer, performing expansion processing on the first mask of each convolution layer to obtain a second mask of each convolution layer;
and determining the area to be convolved of the input image of each convolution layer according to the second mask of each convolution layer.
As an improvement of the above scheme, the obtaining an output feature map of each convolutional layer by performing convolution processing on the input image of each convolutional layer through each convolutional layer of the convolutional neural network according to a region to be convolved of the input image of each convolutional layer specifically includes:
performing convolution processing on the region to be convolved of the input image of each convolutional layer through each convolutional layer of the convolutional neural network to obtain a feature value corresponding to that region, and assigning the feature value corresponding to the region to be convolved of the input image of each convolutional layer to the corresponding node of the output feature map of each convolutional layer;
and assigning the unassigned nodes in the output feature map of each convolutional layer according to the pre-obtained reference feature map of each convolutional layer to obtain the output feature map of each convolutional layer.
Another embodiment of the present invention correspondingly provides a pedestrian detection device, including:
the to-be-detected image acquisition module, used for acquiring an image to be detected;
the motion region mask acquisition module, used for performing motion detection on the image to be detected to obtain a motion region mask corresponding to the image to be detected;
and the pedestrian detection result acquisition module, used for inputting the image to be detected into a pre-trained convolutional neural network and performing pedestrian detection on the image through the convolutional neural network according to the motion region mask to obtain a pedestrian target frame of the image to be detected.
Another embodiment of the present invention provides a terminal device, including a processor, a memory, and a computer program stored in the memory and configured to be executed by the processor, wherein the processor implements the pedestrian detection method according to any one of the above items when executing the computer program.
Another embodiment of the present invention provides a computer-readable storage medium, which includes a stored computer program, wherein when the computer program runs, the apparatus where the computer-readable storage medium is located is controlled to execute the pedestrian detection method according to any one of the above items.
Compared with the prior art, the pedestrian detection method, device, terminal device, and storage medium provided by the embodiments of the invention perform motion detection on the acquired image to be detected to obtain a corresponding motion region mask, then input the image into a pre-trained convolutional neural network, which performs pedestrian detection according to the motion region mask to obtain a pedestrian target frame of the image to be detected. During detection, feature extraction is guided by the motion region mask instead of being applied to the entire image, so the embodiments reduce both the amount of computation and the time required for feature extraction, solving the prior-art problems of heavy computation and long processing time caused by extracting features from the whole image. Moreover, because the motion region mask reflects the motion-region information in the image to be detected, detection accuracy is preserved.
Drawings
Fig. 1 is a flowchart illustrating a pedestrian detection method according to an embodiment of the present invention.
Fig. 2 is a flowchart illustrating a pedestrian detection method according to another embodiment of the present invention.
Fig. 3 is a schematic structural diagram of a YOLO network according to an embodiment of the present invention.
Fig. 4(a) is a schematic diagram of an image to be detected after dividing a grid and marking a motion region according to an embodiment of the present invention.
Fig. 4(b) is a schematic diagram of a first mask of an image to be detected according to an embodiment of the present invention.
Fig. 5 is a schematic structural diagram of a pedestrian detection device according to an embodiment of the present invention.
Fig. 6 is a schematic structural diagram of a terminal device according to an embodiment of the present invention.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
Fig. 1 is a schematic flow chart of a pedestrian detection method according to an embodiment of the present invention.
The pedestrian detection method provided by the embodiment of the invention comprises the following steps S11 to S13:
and S11, acquiring an image to be detected.
And S12, carrying out motion detection on the image to be detected to obtain a motion area mask corresponding to the image to be detected.
S13, inputting the image to be detected into a pre-trained convolutional neural network, and carrying out pedestrian detection on the image to be detected through the convolutional neural network according to the motion area mask to obtain a pedestrian target frame of the image to be detected.
The pedestrian detection principle of the embodiment of the invention is as follows:
generally, besides moving objects (such as people, animals, and vehicles), each image to be detected contains a large number of background regions unrelated to those objects. A region where a moving object moves is called a motion region, and a region without moving objects is called a non-motion region. Since pedestrians move, a pedestrian target falls within a motion region. In the feature-extraction process of a convolutional neural network, the more information an image contains, the greater the computation. Therefore, motion detection is first performed on the image to be detected to obtain a corresponding motion region mask. When the convolutional neural network performs pedestrian detection, the nodes that need to participate in computation in the input image of each convolutional layer are determined according to this mask; that is, when computing a given layer's feature map, only the nodes in the motion region of the input image are computed, while nodes in non-motion regions, which contain no pedestrian targets, do not participate. This greatly reduces computation and time consumption.
To facilitate understanding of the present embodiment, a specific example of the process of pedestrian detection may be as follows:
first, a convolutional neural network is trained in advance using an existing training method; the network may be a YOLO network, an SSD network, or the like. Then, during detection, an image to be detected is obtained; it may be a frame image from a video stream. Next, motion detection is performed on the image to obtain the corresponding motion region mask, for example by extracting the motion region with a motion detection method such as the inter-frame difference method. The motion region mask may be used to control the processed region or the processing of an image; it may be a binary image composed of 0s and 1s, where the 1-valued region is the region of interest and the 0-valued region is shielded. Finally, the image is input into the pre-trained convolutional neural network, which performs pedestrian detection according to the motion region mask to obtain the pedestrian target frame of the image to be detected. During detection, the non-motion regions of the image are shielded by the mask so that they do not participate in computation, and features are extracted only from the motion regions where pedestrians may appear. This reduces the amount of computation while preserving detection accuracy, improving data-processing speed and efficiency.
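As a minimal sketch of the inter-frame difference step mentioned above, the following numpy code builds a binary motion-region mask from two consecutive grayscale frames. The function name `motion_mask` and the threshold value are illustrative assumptions, not part of the patent.

```python
import numpy as np

def motion_mask(prev_frame, curr_frame, diff_thresh=25):
    """Binary motion-region mask via inter-frame differencing."""
    # Absolute per-pixel grayscale change; int16 avoids uint8 wrap-around.
    diff = np.abs(curr_frame.astype(np.int16) - prev_frame.astype(np.int16))
    # 1 = region of interest (motion), 0 = shielded region, as in the text.
    return (diff > diff_thresh).astype(np.uint8)

# Toy 4x4 frames where a 2x2 "object" appears between frames.
prev = np.zeros((4, 4), dtype=np.uint8)
curr = prev.copy()
curr[1:3, 1:3] = 200
mask = motion_mask(prev, curr)
```

In practice the raw difference mask would typically be denoised (e.g. by morphological filtering) before use, but the thresholded difference already yields the 0/1 interest and shielded regions the text describes.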
It should be noted that the execution subject of the pedestrian detection method according to the embodiment of the present invention may be a local computer, a server, or even a mobile terminal connected to the monitoring camera system (wirelessly connected or wired), and may specifically be a processor of these devices.
As can be seen from the above analysis, because the pedestrian detection method provided by the embodiments of the invention extracts features from the image to be detected through the convolutional neural network according to the motion region mask, rather than from the entire image, it reduces both the amount of computation and the time required for feature extraction, solving the prior-art problems of heavy computation and long processing time caused by extracting features from the whole image.
As an alternative embodiment, referring to fig. 2, after the step S13, the pedestrian detection method further includes the steps of:
and S14, screening the pedestrian target frame of the image to be detected to obtain a final target frame.
The pedestrian target frames can be screened by post-processing methods such as non-maximum suppression: duplicate pedestrian target frames are filtered out to obtain the final target frame, making the position of the pedestrian target frame more accurate.
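The non-maximum suppression screening named above can be sketched as follows. This is a minimal illustrative implementation; the helper names `iou` and `nms` and the 0.5 overlap threshold are assumptions, not the patent's exact post-processing.

```python
def iou(a, b):
    """Intersection over union of two (x1, y1, x2, y2) boxes."""
    x1, y1 = max(a[0], b[0]), max(a[1], b[1])
    x2, y2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, x2 - x1) * max(0, y2 - y1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / float(area_a + area_b - inter)

def nms(boxes, scores, iou_thresh=0.5):
    """Greedy non-maximum suppression: keep the best box, drop near-duplicates."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    while order:
        best = order.pop(0)
        keep.append(best)
        order = [j for j in order if iou(boxes[best], boxes[j]) < iou_thresh]
    return keep

# Two heavily overlapping detections of one pedestrian plus a distinct one.
boxes = [(10, 10, 50, 90), (12, 12, 52, 92), (100, 10, 140, 90)]
scores = [0.9, 0.8, 0.7]
kept = nms(boxes, scores)  # indices of the surviving target frames
```

Here the two overlapping boxes collapse to the higher-scoring one, while the distinct detection survives, which is exactly the "filter out repeated target frames" behaviour described.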
As one optional embodiment, the step S13 specifically includes:
s131, inputting the image to be detected into a pre-trained convolutional neural network.
S132, extracting features of the image to be detected according to the motion region mask through each convolutional layer of the convolutional neural network to obtain an output feature map of the image to be detected extracted at each convolutional layer.
And S133, predicting the classes and coordinates of pedestrian targets according to the output feature maps extracted at the convolutional layers through the fully connected layer of the convolutional neural network to obtain a pedestrian target frame of the image to be detected.
Here, the image to be detected is input into the pre-trained convolutional neural network. When features are extracted through each convolutional layer, the motion region mask shields the non-motion regions of the image so that they do not participate in computation; features are extracted only from the motion regions where pedestrians may appear, which reduces the amount of data processed and increases processing speed. After the output feature maps extracted at the convolutional layers are obtained, the fully connected layer of the convolutional neural network predicts the classes and coordinates of pedestrian targets from those feature maps, yielding the pedestrian target frame of the image to be detected.
Further, the step S132 specifically includes:
s1321, determining a region to be convolved of the input image of each convolution layer in the convolution neural network according to the motion region mask;
s1322, performing convolution processing on the input image of each convolution layer according to the to-be-convolved area of the input image of each convolution layer through each convolution layer of the convolutional neural network to obtain an output characteristic diagram of each convolution layer; wherein the image to be detected is an input image of a first convolutional layer of the convolutional neural network.
Optionally, step S1321 specifically includes:
s13211, scaling the size of the motion region mask to be equal to the size of the input image of each convolution layer of the convolutional neural network to obtain a first mask of each convolution layer;
s13212, determining the area to be convolved of the input image of each convolution layer according to the first mask of each convolution layer and the receptive field of each convolution layer.
The motion region mask can be scaled to the same size as the input image of each convolutional layer of the convolutional neural network, yielding the first mask of each layer. Considering that, for practical discrimination, the background within the receptive field of the pedestrian target at each layer should also participate in computation, the region to be convolved of each layer's input image is determined from both the first mask and the receptive field of that layer, which guarantees the accuracy of feature extraction.
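The mask-scaling step can be illustrated with a nearest-neighbour rescale. `scale_mask` is a hypothetical helper written for this sketch, assuming the mask is a 2-D binary array; any standard image-resize routine would serve the same purpose.

```python
import numpy as np

def scale_mask(mask, out_h, out_w):
    """Nearest-neighbour rescale of a binary mask to a layer's input size."""
    in_h, in_w = mask.shape
    rows = np.arange(out_h) * in_h // out_h  # source row for each output row
    cols = np.arange(out_w) * in_w // out_w  # source column per output column
    return mask[rows[:, None], cols]

# A coarse 2x2 motion mask scaled up to a hypothetical 4x4 layer input,
# producing the "first mask" of that layer.
m = np.array([[0, 1],
              [0, 0]], dtype=np.uint8)
m4 = scale_mask(m, 4, 4)
```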
Specifically, the step S13212 specifically includes:
s132121, according to the receptive field of each convolution layer, performing expansion processing on the first mask of each convolution layer to obtain a second mask of each convolution layer;
s132122, determining a region to be convolved of the input image of each convolution layer according to the second mask of each convolution layer.
The second mask CalMask_n of each convolutional layer is CalMask_n = Dilate(MotionMask_n, FieldMask_n), where Dilate denotes the dilation operation, MotionMask_n is the first mask of the layer, and FieldMask_n is the receptive-field structuring element of the layer. It is to be noted that the dilation of structure A by structure B is defined as
A ⊕ B = { z | (B̂)_z ∩ A ≠ ∅ }
where B̂ denotes the reflection of B about its origin and (B̂)_z the translation of B̂ by z.
It can be understood that when structure B is slid over structure A, every position at which the moving B overlaps A is recorded; the set of all such positions is the dilation of A by B.
It can be understood that the region of each convolutional layer's second mask with pixel value 1 is the region to be convolved of that layer's input image.
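The dilation of the first mask into the second mask can be sketched as follows, with a square structuring element standing in for the receptive-field region. The function name `dilate` and the element size are illustrative assumptions.

```python
import numpy as np

def dilate(mask, k):
    """Dilate a binary mask with a k x k square structuring element.

    An output pixel is 1 whenever the element, centred on it, overlaps
    at least one 1-pixel of the input mask (the definition given above).
    """
    h, w = mask.shape
    r = k // 2
    padded = np.pad(mask, r)  # zero-pad so border pixels can be centred on
    out = np.zeros_like(mask)
    for y in range(h):
        for x in range(w):
            out[y, x] = padded[y:y + k, x:x + k].max()
    return out

first_mask = np.zeros((5, 5), dtype=np.uint8)
first_mask[2, 2] = 1                 # a single moving pixel
second_mask = dilate(first_mask, 3)  # expanded by the receptive-field margin
```

The single set pixel grows into a 3 × 3 block, so the region to be convolved includes the background immediately around the motion region, as the text requires.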
As an example, in the YOLO network structure shown in fig. 3, the feature map extracted to the left of the dotted line from a 448 × 448 color image has size 7 × 7. As shown in fig. 4(a), the 448 × 448 image to be detected is divided into a 7 × 7 grid, and each grid cell corresponds to one point in the 7 × 7 feature map. The region with lower gray level in fig. 4(a) is the motion region obtained from motion detection; if a grid cell contains part of the motion region, the value of the corresponding point in MotionMask_n is 1, yielding the MotionMask_n of the convolutional layer shown in fig. 4(b).
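The grid mapping just described can be sketched as follows. `grid_motion_mask` is a hypothetical helper that marks a 7 × 7 cell as moving whenever any pixel of the full-resolution motion mask inside it is set.

```python
import numpy as np

def grid_motion_mask(pixel_mask, grid=7):
    """Mark a grid cell 1 if any pixel of the motion mask inside it is set."""
    h, w = pixel_mask.shape
    ch, cw = h // grid, w // grid  # cell size, e.g. 64 for a 448x448 image
    out = np.zeros((grid, grid), dtype=np.uint8)
    for gy in range(grid):
        for gx in range(grid):
            cell = pixel_mask[gy * ch:(gy + 1) * ch, gx * cw:(gx + 1) * cw]
            out[gy, gx] = 1 if cell.any() else 0
    return out

# One moving blob in a 448x448 motion mask spans four 64x64 grid cells.
px = np.zeros((448, 448), dtype=np.uint8)
px[100:160, 300:360] = 1
cell_mask = grid_motion_mask(px)
```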
Optionally, the step S1322 specifically includes:
s13221, performing convolution processing on the to-be-convolved area of the input image of each convolution layer through each convolution layer of the convolutional neural network to obtain a characteristic value corresponding to the to-be-convolved area of the input image of each convolution layer, and endowing the characteristic value corresponding to the to-be-convolved area of the input image of each convolution layer to a corresponding node of an output characteristic diagram of each convolution layer;
s13222, assigning the nodes which are not assigned in the output feature map of each convolutional layer according to the reference feature map of each convolutional layer obtained in advance to obtain the output feature map of each convolutional layer.
When features are extracted through each convolutional layer of the convolutional neural network, only the feature values of the region to be convolved of the input image are computed. The values on the output feature map for the regions of the input image that do not participate in computation can be assigned from the pre-obtained reference feature map of each layer, which ensures the accuracy of feature extraction while reducing computation and increasing speed. It should be noted that the reference feature map may be obtained in advance by inputting an original image of an unrelated scene, for example an image with all pixel values equal to 0, into the trained convolutional neural network.
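The masked convolution with reference-feature-map fill-in can be sketched as below. This single-channel, stride-1 toy (`masked_conv`, and the constant reference map) is an illustrative assumption, not the patent's implementation.

```python
import numpy as np

def masked_conv(img, kernel, region_mask, ref_fmap):
    """Convolve only where the mask is 1; fill the rest from a reference map.

    Feature values are computed only for output nodes inside the region to
    be convolved; every other node keeps the value of the pre-computed
    reference feature map, as described above.
    """
    kh, kw = kernel.shape
    oh, ow = img.shape[0] - kh + 1, img.shape[1] - kw + 1
    out = ref_fmap.copy()
    for y in range(oh):
        for x in range(ow):
            if region_mask[y, x]:  # node participates in the computation
                out[y, x] = (img[y:y + kh, x:x + kw] * kernel).sum()
    return out

img = np.arange(16, dtype=float).reshape(4, 4)
kernel = np.ones((2, 2))
mask = np.zeros((3, 3), dtype=np.uint8)
mask[0, 0] = 1                   # only one output node is computed
ref = np.full((3, 3), -1.0)      # stand-in for the pre-obtained reference map
fmap = masked_conv(img, kernel, mask, ref)
```

Only one window multiplication is performed; all other output nodes are filled from the reference map, which is the source of the claimed computation savings.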
The embodiment of the invention also provides a pedestrian detection device which can implement all the processes of the pedestrian detection method.
Fig. 5 is a schematic structural diagram of a pedestrian detection device according to an embodiment of the present invention.
The pedestrian detection device provided by the embodiment of the invention comprises:
the image acquisition module 21 is used for acquiring an image to be detected;
a moving area mask obtaining module 22, configured to perform motion detection on the image to be detected, so as to obtain a moving area mask corresponding to the image to be detected;
and the pedestrian detection result acquisition module 23 is configured to input the image to be detected into a pre-trained convolutional neural network, and perform pedestrian detection on the image to be detected through the convolutional neural network according to the motion region mask to obtain a pedestrian target frame of the image to be detected.
As one of the optional embodiments, the pedestrian detection apparatus further includes:
and the target frame screening module is used for screening the pedestrian target frame of the image to be detected to obtain a final target frame.
As one optional embodiment, the pedestrian detection result obtaining module 23 specifically includes:
the image input submodule is used for inputting the image to be detected into a pre-trained convolutional neural network;
the feature extraction submodule, used for extracting features of the image to be detected according to the motion region mask through each convolutional layer of the convolutional neural network to obtain an output feature map of the image to be detected extracted at each convolutional layer;
and the target frame prediction submodule, used for predicting the classes and coordinates of pedestrian targets according to the output feature maps extracted at the convolutional layers through the fully connected layer of the convolutional neural network, so as to obtain the pedestrian target frame of the image to be detected.
As one optional embodiment, the feature extraction sub-module specifically includes:
a region-to-be-convolved obtaining unit, configured to determine the region to be convolved of the input image of each convolutional layer in the convolutional neural network according to the motion region mask;
and an output feature map obtaining unit, configured to perform convolution processing on the input image of each convolutional layer according to the region to be convolved of that input image, through each convolutional layer of the convolutional neural network, to obtain an output feature map of each convolutional layer; wherein the image to be detected is the input image of the first convolutional layer of the convolutional neural network.
Further, the to-be-convolved region acquiring unit specifically includes:
a first mask obtaining subunit, configured to scale the size of the motion region mask to be equal to the size of the input image of each convolutional layer of the convolutional neural network, so as to obtain a first mask of each convolutional layer;
and the area to be convolved determining subunit is used for determining the area to be convolved of the input image of each convolution layer according to the first mask of each convolution layer and the receptive field of each convolution layer.
Specifically, the to-be-convolved region determining subunit is specifically configured to:
according to the receptive field of each convolution layer, performing expansion processing on the first mask of each convolution layer to obtain a second mask of each convolution layer;
and determining the area to be convolved of the input image of each convolution layer according to the second mask of each convolution layer.
As an optional embodiment, the output feature map obtaining unit specifically includes:
the feature value computing subunit, used for performing convolution processing on the region to be convolved of the input image of each convolutional layer through each convolutional layer of the convolutional neural network to obtain the feature value corresponding to that region, and assigning this feature value to the corresponding node of the output feature map of each convolutional layer;
and the assignment subunit, used for assigning the unassigned nodes in the output feature map of each convolutional layer according to the pre-obtained reference feature map of each convolutional layer, to obtain the output feature map of each convolutional layer.
The principle by which the pedestrian detection device realizes pedestrian detection is the same as that of the method embodiments above, and is not repeated here.
In the pedestrian detection device provided by the embodiment of the invention, the convolutional neural network extracts features from the image to be detected according to the motion region mask, rather than from the whole image. This reduces the amount of computation and the time required for feature extraction, solving the problems of heavy computation and long processing time caused by extracting features from the whole image in the prior art. Moreover, because the motion region mask reflects the motion region information in the image to be detected, combining the motion region mask with the convolutional neural network preserves detection accuracy, so pedestrians can be identified accurately and efficiently.
Fig. 6 is a schematic structural diagram of a terminal device according to an embodiment of the present invention.
The terminal device provided by the embodiment of the present invention includes a processor 31, a memory 32, and a computer program stored in the memory 32 and configured to be executed by the processor 31; when the processor 31 executes the computer program, the pedestrian detection method according to any one of the above embodiments is implemented.
In addition, an embodiment of the present invention further provides a computer-readable storage medium storing a computer program, wherein, when the computer program runs, the apparatus in which the computer-readable storage medium is located is controlled to execute the pedestrian detection method according to any one of the above embodiments.
The processor 31, when executing the computer program, implements the steps of the above-described embodiment of the pedestrian detection method, such as all the steps of the pedestrian detection method shown in fig. 1. Alternatively, the processor 31, when executing the computer program, implements the functions of the modules/units in the above-described embodiment of the pedestrian detection apparatus, such as the functions of the modules of the pedestrian detection apparatus shown in fig. 5.
Illustratively, the computer program may be divided into one or more modules, which are stored in the memory 32 and executed by the processor 31 to implement the present invention. The one or more modules may be a series of computer program instruction segments capable of performing specific functions, the instruction segments describing the execution process of the computer program in the terminal device. For example, the computer program may be divided into an image acquisition module, a motion region mask acquisition module, and a pedestrian detection result acquisition module, whose functions are as follows: the image acquisition module is used for acquiring an image to be detected; the motion region mask acquisition module is used for performing motion detection on the image to be detected to obtain a motion region mask corresponding to the image to be detected; and the pedestrian detection result acquisition module is used for inputting the image to be detected into a pre-trained convolutional neural network and performing pedestrian detection on the image to be detected through the convolutional neural network according to the motion region mask, to obtain a pedestrian target frame of the image to be detected.
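The motion region mask acquisition module can be realized by any classical motion detection scheme; a simple frame-differencing sketch on grayscale frames is shown below (the function name and threshold value are illustrative assumptions — the patent does not prescribe a particular motion detector):

```python
def motion_mask(prev_frame, cur_frame, thresh=25):
    """Frame differencing: mark a pixel as moving (1) when its grayscale
    intensity changed by more than `thresh` between frames, else static (0)."""
    return [[1 if abs(c - p) > thresh else 0
             for p, c in zip(prev_row, cur_row)]
            for prev_row, cur_row in zip(prev_frame, cur_frame)]
```

The resulting binary mask is what the later units scale and dilate per convolution layer.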
The terminal device can be a desktop computer, a notebook computer, a palmtop computer, a cloud server, or another computing device. The terminal device may include, but is not limited to, the processor 31 and the memory 32. Those skilled in the art will appreciate that the schematic diagram is merely an example of a terminal device and does not constitute a limitation; the terminal device may include more or fewer components than those shown, combine certain components, or use different components; for example, it may also include input/output devices, network access devices, buses, and the like.
The processor 31 may be a Central Processing Unit (CPU), another general-purpose processor, a Digital Signal Processor (DSP), an Application Specific Integrated Circuit (ASIC), a Field-Programmable Gate Array (FPGA) or other programmable logic device, discrete gate or transistor logic, discrete hardware components, etc. The general-purpose processor may be a microprocessor, or any conventional processor. The processor 31 is the control center of the terminal device and connects the various parts of the whole terminal device through various interfaces and lines.
The memory 32 can be used to store the computer programs and/or modules, and the processor 31 implements the various functions of the terminal device by running or executing the computer programs and/or modules stored in the memory 32 and calling the data stored in the memory 32. The memory 32 may mainly include a program storage area and a data storage area, wherein the program storage area may store an operating system, an application program required for at least one function, and the like, and the data storage area may store data created according to the use of the terminal device, and the like. In addition, the memory may include high-speed random access memory, and may also include non-volatile memory, such as a hard disk, a plug-in hard disk, a Smart Media Card (SMC), a Secure Digital (SD) card, a flash memory card (Flash Card), at least one magnetic disk storage device, a flash memory device, or another non-volatile solid-state storage device.
If the integrated modules/units of the terminal device are implemented in the form of software functional units and sold or used as stand-alone products, they can be stored in a computer-readable storage medium. Based on this understanding, all or part of the flow of the methods of the embodiments of the present invention may also be implemented by a computer program, which may be stored in a computer-readable storage medium; when the computer program is executed by a processor, the steps of the method embodiments are implemented. The computer program comprises computer program code, which may be in the form of source code, object code, an executable file, some intermediate form, etc. The computer-readable medium may include: any entity or device capable of carrying the computer program code, a recording medium, a USB flash drive, a removable hard disk, a magnetic disk, an optical disk, a computer memory, a Read-Only Memory (ROM), a Random Access Memory (RAM), an electrical carrier signal, a telecommunications signal, a software distribution medium, and the like.
It should be noted that the above-described device embodiments are merely illustrative, where the units described as separate parts may or may not be physically separate, and the parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on multiple network units. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of the present embodiment. In addition, in the drawings of the embodiment of the apparatus provided by the present invention, the connection relationship between the modules indicates that there is a communication connection between them, and may be specifically implemented as one or more communication buses or signal lines. One of ordinary skill in the art can understand and implement it without inventive effort.
The above description is only a preferred embodiment of the present invention. It should be noted that, for those skilled in the art, several modifications and variations can be made without departing from the technical principle of the present invention, and these modifications and variations should also be regarded as falling within the protection scope of the present invention.

Claims (10)

1. A pedestrian detection method, characterized by comprising:
acquiring an image to be detected;
performing motion detection on the image to be detected to obtain a motion region mask corresponding to the image to be detected;
and inputting the image to be detected into a pre-trained convolutional neural network, and performing pedestrian detection on the image to be detected through the convolutional neural network according to the motion region mask, to obtain a pedestrian target frame of the image to be detected.
2. The pedestrian detection method according to claim 1, wherein, after inputting the image to be detected into the pre-trained convolutional neural network and performing pedestrian detection on the image to be detected through the convolutional neural network according to the motion region mask to obtain the pedestrian target frame of the image to be detected, the method further comprises:
screening the pedestrian target frame of the image to be detected to obtain a final target frame.
3. The pedestrian detection method according to claim 1 or 2, wherein the inputting the image to be detected into a pre-trained convolutional neural network and performing pedestrian detection on the image to be detected through the convolutional neural network according to the motion region mask to obtain a pedestrian target frame of the image to be detected specifically comprises:
inputting the image to be detected into the pre-trained convolutional neural network;
extracting features from the image to be detected according to the motion region mask through each convolution layer of the convolutional neural network, to obtain an output feature map of the image to be detected extracted at each convolution layer;
and predicting the classes and coordinates of pedestrian targets through the fully-connected layer of the convolutional neural network according to the output feature maps extracted at the convolution layers, to obtain the pedestrian target frame of the image to be detected.
4. The pedestrian detection method according to claim 3, wherein the extracting features from the image to be detected according to the motion region mask through each convolution layer of the convolutional neural network to obtain an output feature map of the image to be detected extracted at each convolution layer specifically comprises:
determining a region to be convolved of the input image of each convolution layer in the convolutional neural network according to the motion region mask;
performing convolution processing on the input image of each convolution layer through each convolution layer of the convolutional neural network according to the region to be convolved of the input image of that layer, to obtain the output feature map of each convolution layer; wherein the image to be detected serves as the input image of the first convolution layer of the convolutional neural network.
5. The pedestrian detection method according to claim 4, wherein the determining, according to the motion region mask, a region to be convolved of the input image of each convolution layer in the convolutional neural network specifically comprises:
scaling the motion region mask to the size of the input image of each convolution layer of the convolutional neural network, to obtain a first mask for each convolution layer;
and determining the region to be convolved of the input image of each convolution layer according to the first mask of that layer and the receptive field of that layer.
6. The pedestrian detection method according to claim 5, wherein determining the region to be convolved of the input image of each convolution layer according to the first mask of each convolution layer and the receptive field of each convolution layer comprises:
dilating the first mask of each convolution layer according to the receptive field of that layer, to obtain a second mask for each convolution layer;
and determining the region to be convolved of the input image of each convolution layer according to the second mask of that layer.
7. The pedestrian detection method according to claim 4, wherein the performing convolution processing on the input image of each convolution layer through each convolution layer of the convolutional neural network according to the region to be convolved of the input image of that layer to obtain the output feature map of each convolution layer specifically comprises:
performing convolution processing on the region to be convolved of the input image of each convolution layer through that convolution layer, to obtain the feature values corresponding to that region, and assigning those feature values to the corresponding nodes of the output feature map of that layer;
and filling the unassigned nodes in the output feature map of each convolution layer from a pre-obtained reference feature map of that layer, to obtain the output feature map of each convolution layer.
8. A pedestrian detection device, characterized by comprising:
an image acquisition module, configured to acquire an image to be detected;
a motion region mask acquisition module, configured to perform motion detection on the image to be detected to obtain a motion region mask corresponding to the image to be detected;
and a pedestrian detection result acquisition module, configured to input the image to be detected into a pre-trained convolutional neural network and perform pedestrian detection on the image to be detected through the convolutional neural network according to the motion region mask, to obtain a pedestrian target frame of the image to be detected.
9. A terminal device comprising a processor, a memory, and a computer program stored in the memory and configured to be executed by the processor, the processor implementing the pedestrian detection method according to any one of claims 1 to 7 when executing the computer program.
10. A computer-readable storage medium, comprising a stored computer program, wherein the computer program, when executed, controls an apparatus in which the computer-readable storage medium is located to perform the pedestrian detection method according to any one of claims 1 to 7.
CN202010260160.5A 2020-04-03 2020-04-03 Pedestrian detection method and device, terminal equipment and storage medium Withdrawn CN111582032A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010260160.5A CN111582032A (en) 2020-04-03 2020-04-03 Pedestrian detection method and device, terminal equipment and storage medium

Publications (1)

Publication Number Publication Date
CN111582032A true CN111582032A (en) 2020-08-25

Family

ID=72121092

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010260160.5A Withdrawn CN111582032A (en) 2020-04-03 2020-04-03 Pedestrian detection method and device, terminal equipment and storage medium

Country Status (1)

Country Link
CN (1) CN111582032A (en)

Cited By (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112418243A (en) * 2020-10-28 2021-02-26 北京迈格威科技有限公司 Feature extraction method and device and electronic equipment
CN112767444A (en) * 2021-01-19 2021-05-07 杭州萤石软件有限公司 Moving object detection method, readable storage medium and electronic device
CN112949486A (en) * 2021-03-01 2021-06-11 八维通科技有限公司 Intelligent traffic data processing method and device based on neural network
CN114943909A (en) * 2021-03-31 2022-08-26 华为技术有限公司 Method, device, equipment and system for identifying motion area
CN114943909B (en) * 2021-03-31 2023-04-18 华为技术有限公司 Method, device, equipment and system for identifying motion area
CN115183763A (en) * 2022-09-13 2022-10-14 南京北新智能科技有限公司 Personnel map positioning method based on face recognition and grid method


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
WW01 Invention patent application withdrawn after publication

Application publication date: 20200825