
CN110688978A - Pedestrian detection method, device, system and equipment

Info

Publication number: CN110688978A
Application number: CN201910959000.7A
Authority: CN
Inventors: 罗径庭
Original assignee: Guangdong University of Technology
Current assignee: Guangdong University of Technology
Priority date: 2019-10-10
Filing date: 2019-10-10
Publication date: 2020-01-14

Abstract

The application discloses a pedestrian detection method, device, system and equipment. The method comprises the following steps: establishing a pyramid-depth residual error network model; inputting a pedestrian image to be detected into the pyramid-depth residual error network model, and outputting a pedestrian detection result. The pyramid-depth residual error network model is a multi-scale pedestrian detection network model obtained by adding a convolution layer on the basis of a depth residual error network, constructing a pyramid network by up-sampling the convolution layer, and fusing the output of a residual error block of the depth residual error network with the output of the pyramid network. The application thereby solves the technical problem that existing pedestrian detection methods have low detection accuracy when the scale of the human body varies widely.

Description

Pedestrian detection method, device, system and equipment

Technical Field

The present application relates to the field of computer vision technologies, and in particular, to a pedestrian detection method, apparatus, system, and device.

Background

Pedestrian detection is an important component of computer vision and can be widely applied in fields such as automatic driving and video monitoring. With the publication of a large number of large pedestrian detection data sets, pedestrian detection algorithms have improved significantly. In practical applications, however, pedestrians often have to be detected in highly crowded and cluttered scenes. In such scenes the scale and the form of the human body change in complex ways, so detection is more difficult than in a simple scene. Existing pedestrian detection methods have low detection accuracy when the scale of the human body varies widely.

Disclosure of Invention

The application provides a pedestrian detection method, device, system and equipment, which are used for solving the technical problem that existing pedestrian detection methods have low detection accuracy when the scale of the human body varies widely.

In view of the above, a first aspect of the present application provides a pedestrian detection method, including:

establishing a pyramid-depth residual error network model;

inputting a pedestrian image to be detected into the pyramid-depth residual error network model, and outputting a pedestrian detection result;

the pyramid-depth residual error network model is a multi-scale pedestrian detection network model obtained by adding a convolution layer on the basis of a depth residual error network, constructing a pyramid network by up-sampling the convolution layer, and fusing the output of a residual error block of the depth residual error network with the output of the pyramid network.

Optionally, the establishing a pyramid-depth residual error network model includes:

acquiring a pedestrian detection image to be trained;

inputting the pedestrian detection image to be trained into a pyramid-depth residual error network model, and training the pyramid-depth residual error network model;

de-duplicating the detection frame extracted by the pyramid-depth residual error network model;

and when the iteration number of the training reaches a threshold value, finishing the training to obtain a trained pyramid-depth residual error network model.

Optionally, the de-duplicating the detection frame extracted by the pyramid-depth residual error network model includes:

and de-duplicating the detection frame extracted by the pyramid-depth residual error network model based on non-maximum suppression.

Optionally, after the acquiring of the pedestrian detection image to be trained, and before the inputting of the pedestrian detection image to be trained into the pyramid-depth residual error network model and the training of the pyramid-depth residual error network model, the method further includes:

and preprocessing the pedestrian detection image to be trained.

Optionally, the pyramid network includes 6 convolutional layers for extracting feature maps with different resolutions.

A second aspect of the present application provides a pedestrian detection apparatus, comprising: a model building module and a pedestrian detection module;

the model establishing module is used for establishing a pyramid-depth residual error network model;

the pyramid-depth residual error network model is a multi-scale pedestrian detection network model obtained by adding a convolution layer on the basis of a depth residual error network, constructing a pyramid network by up-sampling the convolution layer, and fusing the output of a residual error block of the depth residual error network with the output of the pyramid network;

the pedestrian detection module is used for inputting the image of the pedestrian to be detected into the pyramid-depth residual error network model and outputting a pedestrian detection result.

Optionally, the model building module is specifically configured to:

acquiring a pedestrian detection image to be trained;

A third aspect of the present application provides a pedestrian detection system, comprising: a case, an image collector and the pedestrian detection device of any one of the second aspect.

The image collector and the pedestrian detection device are arranged on the case;

the image collector is used for shooting a pedestrian image and sending the pedestrian image to the pedestrian detection device, so that the pedestrian detection device executes the pedestrian detection method in any one of the first aspect.

Optionally, the system further includes: a liquid crystal display screen, a memory and an image processor;

the liquid crystal display screen is used for displaying a pedestrian detection result;

the memory is used for storing the pedestrian image shot by the image collector or the pedestrian detection result;

the image processor is used for controlling the liquid crystal display screen to display the pedestrian detection result.

A fourth aspect of the present application provides pedestrian detection equipment, the equipment comprising a processor and a memory;

the memory is used for storing program codes and transmitting the program codes to the processor;

the processor is configured to execute the pedestrian detection method of any one of the first aspect according to instructions in the program code.

According to the technical scheme, the method has the following advantages:

the application provides a pedestrian detection method, which comprises the following steps: establishing a pyramid-depth residual error network model; inputting a pedestrian image to be detected into the pyramid-depth residual error network model, and outputting a pedestrian detection result; the pyramid-depth residual error network model is a multi-scale pedestrian detection network model obtained by adding a convolution layer on the basis of a depth residual error network, constructing a pyramid network by up-sampling the convolution layer, and fusing the output of a residual error block of the depth residual error network with the output of the pyramid network. In this pedestrian detection method, the pyramid-depth residual error network model is built by adding a convolutional layer on the basis of the depth residual error network and constructing the pyramid network by up-sampling the convolutional layer. The pyramid network extracts features of the pedestrian image to be detected at a plurality of scales, which solves the technical problem that existing pedestrian detection methods have low detection accuracy when the scale of the human body varies widely. Meanwhile, because the pyramid network extracts deep features, fusing the output of the pyramid network with the output of the residual error block of the depth residual error network fuses the deep features with the low-level features, which improves the richness of the features and thus the accuracy of pedestrian detection.

Drawings

FIG. 1 is a schematic flow chart diagram illustrating one embodiment of a pedestrian detection method provided herein;

FIG. 2 is a schematic flow chart diagram illustrating another embodiment of a pedestrian detection method provided herein;

FIG. 3 is a schematic structural diagram of an embodiment of a pedestrian detection device provided by the present application;

FIG. 4 is a schematic block diagram illustrating one embodiment of a pedestrian detection system according to the present application;

wherein the reference numerals are:

1. a case; 2. a memory; 3. an image collector; 4. a central processing unit; 5. a liquid crystal display screen; 6. an image processor.

Detailed Description

In order to make the technical solutions of the present application better understood, the technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application, and it is obvious that the described embodiments are only a part of the embodiments of the present application, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present application.

For ease of understanding, referring to fig. 1, an embodiment of a pedestrian detection method provided by the present application includes:

step 101, establishing a pyramid-depth residual error network model.

In practical applications, pedestrians often have to be detected in highly crowded and cluttered scenes. The scale and the form of the human body change in complex ways in such scenes, so detection is more difficult than in a simple scene, and existing pedestrian detection methods have low detection accuracy when the scale of the human body varies widely. Therefore, in the embodiment of the application, a pyramid-depth residual error network model is constructed: a convolutional layer is added on the basis of a depth residual error network, the pyramid network is constructed by up-sampling the convolutional layer, and the output of a residual error block of the depth residual error network is fused with the output of the pyramid network to obtain the multi-scale pedestrian detection network model. The constructed pyramid-depth residual error network model extracts features of the pedestrian image to be detected at a plurality of scales, which solves the technical problem described above.

The depth residual error network (that is, a deep residual network) is composed of a series of residual error blocks. It adopts skip connections, so that the output of a convolution layer can bypass several convolution layers and serve directly as the input of a later convolution layer. Multiple layers can therefore be stacked, a certain amount of calculation is saved while the network is deepened, the degradation problem of the convolutional neural network is alleviated, and the detection accuracy is improved.
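
The skip connection described above can be illustrated with a minimal sketch. It is not the network of the application: `scale_layer` is a hypothetical toy stand-in for a convolution layer, and `residual_block` is an illustrative name. The sketch only shows that the block input bypasses the stacked layers and is added to their output.

```python
from typing import Callable, List

Vector = List[float]


def relu(x: Vector) -> Vector:
    """Element-wise rectified linear unit."""
    return [max(0.0, v) for v in x]


def scale_layer(weight: float) -> Callable[[Vector], Vector]:
    """Toy stand-in for a convolution layer (assumption): scales every element."""
    return lambda x: [weight * v for v in x]


def residual_block(x: Vector, layers: List[Callable[[Vector], Vector]]) -> Vector:
    """Apply the stacked layers, then add the block input back (skip connection)."""
    out = x
    for i, layer in enumerate(layers):
        out = layer(out)
        if i < len(layers) - 1:
            out = relu(out)  # activation between the stacked layers
    # The input bypasses the stacked layers and is added to their output.
    return relu([a + b for a, b in zip(out, x)])


features = [1.0, -2.0, 3.0]
# The second toy layer outputs zeros, so only the skip path carries the signal.
block_out = residual_block(features, [scale_layer(0.5), scale_layer(0.0)])
```

Even when the stacked layers contribute nothing, the input still reaches the output through the skip path, which is why stacking many such blocks does not have to degrade the signal.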

Step 102, inputting the image of the pedestrian to be detected into the pyramid-depth residual error network model, and outputting a pedestrian detection result.

It should be noted that a video frame captured from surveillance video may be used as the image of the pedestrian to be detected, and an image taken by a camera may also be used. The images of pedestrians to be detected may also be screened, and any image that does not contain a pedestrian is removed.

The image of the pedestrian to be detected is input into the pyramid-depth residual error network model, and the position of the detection frame is output; the position of the detection frame is the position of the detected pedestrian. The detection result can be displayed in the image of the pedestrian to be detected: each detected pedestrian is allocated a detection frame, and the detection frames of pedestrians of different scales may differ in size.
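
As an illustration of displaying the detection result in the image, the following sketch draws the outline of each detection frame onto a grayscale image held as nested lists. The function name `draw_detection_frames`, the `(x1, y1, x2, y2)` box layout and the pixel value used for the outline are assumptions made for this example.

```python
from typing import List, Tuple

Image = List[List[int]]          # grayscale image, rows of pixel values
Box = Tuple[int, int, int, int]  # (x1, y1, x2, y2), inclusive pixel coordinates


def draw_detection_frames(image: Image, boxes: List[Box], value: int = 255) -> Image:
    """Return a copy of the image with the outline of every box drawn on it."""
    canvas = [row[:] for row in image]
    height, width = len(canvas), len(canvas[0])
    for x1, y1, x2, y2 in boxes:
        # Clip the box to the image so out-of-range detections cannot crash.
        x1, x2 = max(0, x1), min(width - 1, x2)
        y1, y2 = max(0, y1), min(height - 1, y2)
        for x in range(x1, x2 + 1):      # top and bottom edges
            canvas[y1][x] = value
            canvas[y2][x] = value
        for y in range(y1, y2 + 1):      # left and right edges
            canvas[y][x1] = value
            canvas[y][x2] = value
    return canvas


blank = [[0] * 10 for _ in range(8)]
# Two pedestrians of different scale receive frames of different size.
shown = draw_detection_frames(blank, [(1, 1, 3, 6), (5, 2, 9, 4)])
```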

The embodiment of the application provides a pedestrian detection method, which comprises the following steps: establishing a pyramid-depth residual error network model; inputting a pedestrian image to be detected into the pyramid-depth residual error network model, and outputting a pedestrian detection result; the pyramid-depth residual error network model is a multi-scale pedestrian detection network model obtained by adding a convolution layer on the basis of a depth residual error network, constructing a pyramid network by up-sampling the convolution layer, and fusing the output of a residual error block of the depth residual error network with the output of the pyramid network. In this pedestrian detection method, the pyramid-depth residual error network model is built by adding a convolutional layer on the basis of the depth residual error network and constructing the pyramid network by up-sampling the convolutional layer. The pyramid network extracts features of the pedestrian image to be detected at a plurality of scales, which solves the technical problem that existing pedestrian detection methods have low detection accuracy when the scale of the human body varies widely. Meanwhile, because the pyramid network extracts deep features, fusing the output of the pyramid network with the output of the residual error block of the depth residual error network fuses the deep features with the low-level features, which improves the richness of the features and thus the accuracy of pedestrian detection.

For easy understanding, referring to fig. 2, another embodiment of a pedestrian detection method provided by the present application includes:

step 201, acquiring a pedestrian detection image to be trained.

It should be noted that pedestrian detection images to be trained, with the pedestrian positions already marked, can be obtained from a public pedestrian detection database. Alternatively, a large number of video frames can be captured from the video monitoring of an intersection as pedestrian detection images to be trained, and the pedestrian positions in the video frames are marked, so that the pyramid-depth residual error network model can be conveniently trained.

In order to fully train the pyramid-depth residual error network model and improve the accuracy of pedestrian detection, a data enhancement method can be adopted to expand the number of the obtained pedestrian detection images to be trained. For example, appropriate noise, such as salt-and-pepper noise or Gaussian noise, can be added to the obtained images to expand their number, which can also improve the robustness of the pyramid-depth residual error network model to a certain extent.
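
The noise-based expansion described above might look like the following sketch. The noise amount (0.05), the standard deviation (10.0), the fixed seed and all function names are assumptions chosen for illustration; the application does not specify them.

```python
import random
from typing import List

Image = List[List[int]]  # grayscale image, pixel values 0..255


def add_salt_and_pepper(image: Image, amount: float, rng: random.Random) -> Image:
    """Return a copy in which roughly `amount` of the pixels become 0 or 255."""
    noisy = [row[:] for row in image]
    for row in noisy:
        for x in range(len(row)):
            if rng.random() < amount:
                row[x] = rng.choice((0, 255))
    return noisy


def add_gaussian(image: Image, sigma: float, rng: random.Random) -> Image:
    """Return a copy with zero-mean Gaussian noise added, clipped to 0..255."""
    return [
        [min(255, max(0, round(pixel + rng.gauss(0.0, sigma)))) for pixel in row]
        for row in image
    ]


def augment(images: List[Image], seed: int = 0) -> List[Image]:
    """Expand a training set: each image yields itself plus two noisy variants."""
    rng = random.Random(seed)
    expanded: List[Image] = []
    for image in images:
        expanded.append(image)
        expanded.append(add_salt_and_pepper(image, amount=0.05, rng=rng))
        expanded.append(add_gaussian(image, sigma=10.0, rng=rng))
    return expanded


training_images = [[[128] * 8 for _ in range(8)]]
augmented_images = augment(training_images)
```

Because noise does not move any pixel, the pedestrian positions marked on the original image remain valid for the noisy copies.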

Step 202, preprocessing the pedestrian detection image to be trained.

It should be noted that normalization processing may be performed on the to-be-trained pedestrian detection images, so that the to-be-trained pedestrian detection images are uniform in size, and thus, the pyramid-depth residual error network model can be trained conveniently.
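
A minimal sketch of such preprocessing is given below, assuming nearest-neighbour resizing to a fixed size followed by scaling of pixel values to the range 0 to 1. The target size of 64 × 64, the scaling step and the function names are assumptions; the application only states that the images are made uniform in size.

```python
from typing import List

Image = List[List[int]]  # grayscale image, pixel values 0..255


def resize_nearest(image: Image, out_h: int, out_w: int) -> Image:
    """Resize by nearest-neighbour sampling so every image has the same size."""
    in_h, in_w = len(image), len(image[0])
    return [
        [image[y * in_h // out_h][x * in_w // out_w] for x in range(out_w)]
        for y in range(out_h)
    ]


def normalize(image: Image, out_h: int = 64, out_w: int = 64) -> List[List[float]]:
    """Resize to a uniform size, then scale pixel values to the range 0..1."""
    resized = resize_nearest(image, out_h, out_w)
    return [[pixel / 255.0 for pixel in row] for row in resized]


small = [[0, 255], [255, 0]]
uniform = normalize(small, out_h=4, out_w=4)
```

If images are resized in this way, the marked pedestrian positions have to be scaled by the same horizontal and vertical factors.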

Step 203, inputting the pedestrian detection image to be trained into the pyramid-depth residual error network model, and training the pyramid-depth residual error network model.

The pyramid network comprises 6 convolutional layers for extracting feature maps with different resolutions. The resolutions of the feature maps extracted by these 6 convolutional layers may be, in sequence, 2 × 2, 4 × 4, 8 × 8, 16 × 16, 32 × 32 and 64 × 64. The 4 × 4 feature map is obtained by up-sampling the 2 × 2 feature map by a factor of 2, the 8 × 8 feature map is obtained by up-sampling the 4 × 4 feature map in the same way, and the 16 × 16, 32 × 32 and 64 × 64 feature maps are obtained likewise.

The residual block in the pyramid-depth residual error network model is also composed of 6 different convolutional layers. The resolutions of the feature maps extracted by these 6 convolutional layers may be, in sequence, 64 × 64, 32 × 32, 16 × 16, 8 × 8, 4 × 4 and 2 × 2. The 32 × 32 feature map is obtained by down-sampling the 64 × 64 feature map with a stride of 2, and the 16 × 16, 8 × 8, 4 × 4 and 2 × 2 feature maps are obtained in the same way.

The feature maps output by the 6 convolutional layers in the residual block are fused with the feature maps output by the 6 convolutional layers in the pyramid network. The fusion is a cascade: each feature map output by the residual block is cascaded with the feature map of the same size in the pyramid network. For example, the 64 × 64 feature map output by the residual block is cascaded with the 64 × 64 feature map output by the pyramid network, the 32 × 32 feature map output by the residual block is cascaded with the 32 × 32 feature map output by the pyramid network, the 4 × 4 feature map output by the residual block is cascaded with the 4 × 4 feature map output by the pyramid network, and so on. Finally 6 branches are formed, and a multi-scale pedestrian detection network model is obtained.

The pyramid-depth residual error network model can repeatedly stack a plurality of residual blocks, and finally the output of the residual blocks is fused with the output of the pyramid network. By constructing the pyramid network, features of the pedestrian image to be detected are extracted at a plurality of scales, which solves the technical problem that existing pedestrian detection methods have low detection accuracy when the scale of the human body varies widely. Meanwhile, fusing the output of the pyramid network with the output of the residual block improves the richness of the features and thus the accuracy of pedestrian detection.
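
The two resolution sequences and the per-size fusion can be sketched as follows. This is a shape-level illustration only: the convolutions are omitted, down-sampling is plain stride-2 subsampling, up-sampling is nearest-neighbour repetition, the pyramid path starts from the coarsest residual output, and cascading is interpreted as stacking the two maps as channels. All of these choices and all names are assumptions, not details given in the application.

```python
from typing import Dict, List

FeatureMap = List[List[float]]  # one square channel


def downsample2(fmap: FeatureMap) -> FeatureMap:
    """Halve the resolution by keeping every second row and column (stride 2)."""
    return [row[::2] for row in fmap[::2]]


def upsample2(fmap: FeatureMap) -> FeatureMap:
    """Double the resolution by nearest-neighbour repetition."""
    out: FeatureMap = []
    for row in fmap:
        wide = [v for v in row for _ in range(2)]
        out.append(wide)
        out.append(wide[:])
    return out


def build_branches(features: FeatureMap, levels: int = 6) -> Dict[int, List[FeatureMap]]:
    """Return one fused branch per resolution, keyed by feature-map size."""
    # Residual path: 64 -> 32 -> 16 -> 8 -> 4 -> 2.
    residual = [features]
    for _ in range(levels - 1):
        residual.append(downsample2(residual[-1]))
    # Pyramid path: 2 -> 4 -> 8 -> 16 -> 32 -> 64.
    pyramid = [residual[-1]]
    for _ in range(levels - 1):
        pyramid.append(upsample2(pyramid[-1]))
    # Cascade the residual map with the pyramid map of the same size.
    branches: Dict[int, List[FeatureMap]] = {}
    for res_map in residual:
        size = len(res_map)
        pyr_map = next(p for p in pyramid if len(p) == size)
        branches[size] = [res_map, pyr_map]
    return branches


features_64 = [[float(r * 64 + c) for c in range(64)] for r in range(64)]
branches = build_branches(features_64)
```

Each of the 6 entries of `branches` holds two channels of equal size, which corresponds to one detection branch per scale.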

Step 204, de-duplicating the detection frame extracted by the pyramid-depth residual error network model.

It should be noted that, when the pyramid-depth residual error network model is trained with the pedestrian detection images to be trained, the model detects pedestrian positions according to the extracted semantic information and allocates detection frames to all possible pedestrian targets in each image. Several different detection frames may be allocated to the same pedestrian target, so repeated detection frames will exist. If the detection accuracy were calculated for every detection frame, the amount of calculation of the model would increase greatly. Therefore, in the embodiment of the application, the extracted detection frames are de-duplicated: the detection frames extracted by the 6 branches are de-duplicated by non-maximum suppression, and the specific steps include:

sorting the detection frames extracted by the pyramid-depth residual error network model by confidence score from high to low, and selecting the detection frame with the highest confidence score as a suggestion frame;

calculating the area overlapping ratio between the suggestion frame and each of the other detection frames, namely the ratio of the area of the intersection of the detection frame and the suggestion frame to the area of their union;

removing the detection frames whose area overlapping ratio exceeds a preset area overlapping ratio threshold, and repeating the above steps on the remaining detection frames until the area overlapping ratio has been calculated among all the detection frames, so that all detection frames are screened against the preset threshold and the repeated detection frames are removed. The preset area overlapping ratio threshold can be set according to the actual situation, and may be, for example, 0.5 or 0.65.
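
The steps above correspond to the usual greedy form of non-maximum suppression, which can be sketched as follows. The `(x1, y1, x2, y2)` box layout and the function names `iou` and `non_max_suppression` are assumptions made for this example; the threshold of 0.5 is one of the values mentioned above.

```python
from typing import List, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2)


def iou(a: Box, b: Box) -> float:
    """Area overlapping ratio: intersection area divided by union area."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def non_max_suppression(boxes: List[Box], scores: List[float],
                        threshold: float = 0.5) -> List[int]:
    """Return the indices of the detection frames that survive de-duplication."""
    # Sort by confidence score, highest first.
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    kept: List[int] = []
    while order:
        best = order.pop(0)  # current suggestion frame
        kept.append(best)
        # Drop every remaining frame that overlaps the suggestion frame too much.
        order = [i for i in order if iou(boxes[best], boxes[i]) <= threshold]
    return kept


boxes = [(10.0, 10.0, 50.0, 90.0), (12.0, 12.0, 52.0, 92.0), (100.0, 20.0, 140.0, 100.0)]
scores = [0.9, 0.8, 0.7]
kept = non_max_suppression(boxes, scores, threshold=0.5)
```

In this usage example the first two frames cover the same pedestrian, so only the higher-scoring one and the separate third frame are kept.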

Step 205, when the iteration number of the training reaches a threshold value, finishing the training to obtain a trained pyramid-depth residual error network model.

It should be noted that the pyramid-depth residual error network model is trained through the pedestrian detection image to be trained, when the number of iterations of the training reaches a threshold value, the training can be stopped, the trained pyramid-depth residual error network model is obtained, and the threshold value can be preset according to the depth of the model and the number of the pedestrian detection images to be trained.
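
The stopping rule can be sketched as a loop that counts training iterations and stops at the preset threshold. The function `train_until_threshold` and the placeholder `toy_step` are hypothetical; a real training step would update the model and return its loss.

```python
from typing import Callable, List, Sequence


def train_until_threshold(step: Callable[[Sequence[str]], float],
                          batches: List[Sequence[str]],
                          iteration_threshold: int) -> List[float]:
    """Run training steps, cycling over the batches, until the threshold is reached."""
    if not batches:
        raise ValueError("at least one training batch is required")
    losses: List[float] = []
    while len(losses) < iteration_threshold:
        for batch in batches:
            if len(losses) >= iteration_threshold:
                break  # threshold reached in the middle of a pass over the data
            losses.append(step(batch))
    return losses


def toy_step(batch: Sequence[str]) -> float:
    """Placeholder for one training iteration (assumption): returns a fake loss."""
    return 1.0 / (1 + len(batch))


history = train_until_threshold(toy_step, [["img_a", "img_b"], ["img_c"]],
                                iteration_threshold=5)
```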

Step 206, inputting the pedestrian image to be detected into the pyramid-depth residual error network model, and outputting a detection result.

It should be noted that a video frame captured from surveillance video may be used as the image of the pedestrian to be detected, and an image captured by a camera may also be used. The images of pedestrians to be detected may also be screened, and any image that does not contain a pedestrian is removed.

The image of the pedestrian to be detected is input into the trained pyramid-depth residual error network model, and the position of the detection frame is output; the position of the detection frame is the position of the detected pedestrian. The detection result can be displayed in the image of the pedestrian to be detected: each detected pedestrian is allocated a detection frame, and the detection frames of pedestrians of different scales may differ in size.

For easy understanding, referring to fig. 3, the present application provides an embodiment of a pedestrian detection apparatus, including:

a model building module 301 and a pedestrian detection module 302.

A model establishing module 301, configured to establish a pyramid-depth residual error network model;

The pedestrian detection module 302 is configured to input a pedestrian image to be detected into the pyramid-depth residual error network model, and output a pedestrian detection result.

Further, the model building module 301 is specifically configured to:

acquiring a pedestrian detection image to be trained;

inputting a pedestrian detection image to be trained into the pyramid-depth residual error network model, and training the pyramid-depth residual error network model;
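
The division into a model building module and a pedestrian detection module can be sketched as two small classes. The class names mirror modules 301 and 302, but the interfaces, the `Model` callable type and the `dummy_trainer` placeholder are assumptions made for illustration only.

```python
from typing import Callable, List, Tuple

Image = List[List[int]]
Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2)
Model = Callable[[Image], List[Box]]


class ModelBuildingModule:
    """Counterpart of module 301: builds (trains) the detection model."""

    def __init__(self, trainer: Callable[[List[Image]], Model]) -> None:
        self._trainer = trainer

    def build(self, training_images: List[Image]) -> Model:
        return self._trainer(training_images)


class PedestrianDetectionModule:
    """Counterpart of module 302: runs the model on an image to be detected."""

    def __init__(self, model: Model) -> None:
        self._model = model

    def detect(self, image: Image) -> List[Box]:
        return self._model(image)


def dummy_trainer(images: List[Image]) -> Model:
    """Placeholder trainer (assumption): the model reports one full-image frame."""
    def model(image: Image) -> List[Box]:
        return [(0.0, 0.0, float(len(image[0])), float(len(image)))]
    return model


builder = ModelBuildingModule(dummy_trainer)
detector = PedestrianDetectionModule(builder.build([[[0] * 4 for _ in range(3)]]))
result = detector.detect([[0] * 4 for _ in range(3)])
```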

For ease of understanding, referring to fig. 4, the present application provides an embodiment of a pedestrian detection system, comprising:

the case 1, the image collector 3 and the pedestrian detection device in the embodiment of the pedestrian detection device.

The image collector 3 and the pedestrian detection device are arranged on the case 1;

the image collector 3 is configured to capture a pedestrian image and send the pedestrian image to the pedestrian detection device, so that the pedestrian detection device executes the pedestrian detection method in the embodiment of the pedestrian detection method.

It should be noted that the outer side of the case 1 may be made of an aluminum alloy, which facilitates heat dissipation. The density of the aluminum alloy is low, so the case is relatively light for a given volume and is convenient to move and use. The aluminum alloy also has relatively high hardness, which improves the collision and drop resistance of the outer side of the case 1.

The image collector 3 may be an industrial camera, which has high image stability, high transmission capability and strong anti-interference capability. One industrial camera may be installed directly in front of the case 1 and one on each of its left and right sides. The industrial camera sends the captured pedestrian image to the pedestrian detection device, and the pedestrian detection device processes the received pedestrian image to obtain the pedestrian detection result.

Further, the system also comprises: a liquid crystal display screen 5, a memory 2 and an image processor 6;

the liquid crystal display screen 5 is used for displaying a pedestrian detection result;

the memory 2 is used for storing the pedestrian images shot by the image collector 3 or the pedestrian detection results;

the image processor 6 is used for controlling the liquid crystal display screen 5 to display the pedestrian detection result.

It should be noted that the number of memories 2 may be 2 or more: one memory 2 may be used to store the pedestrian images shot by the image collector 3, and another memory 2 may be used to store the pedestrian detection results; alternatively, a single memory 2 may be used to store the pedestrian images shot by the image collector 3 or the pedestrian detection results. The pedestrian detection system further includes a central processing unit 4, which is the operation and control core of the pedestrian detection system.

The application also provides pedestrian detection equipment, which comprises a processor and a memory;

the memory is used for storing the program codes and transmitting the program codes to the processor;

the processor is configured to execute the pedestrian detection method in the embodiment of the pedestrian detection method described above according to instructions in the program code.

In the several embodiments provided in the present application, it should be understood that the disclosed apparatus and method may be implemented in other ways. For example, the above-described apparatus embodiments are merely illustrative: the division of the units is only one logical division, and other divisions are possible in practice; for instance, a plurality of units or components may be combined or integrated into another system, or some features may be omitted or not executed. In addition, the shown or discussed mutual coupling, direct coupling or communication connection may be an indirect coupling or communication connection through some interfaces, devices or units, and may be in an electrical, mechanical or other form.

The units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the units can be selected according to actual needs to achieve the purpose of the solution of the embodiment.

In addition, functional units in the embodiments of the present application may be integrated into one processing unit, or each unit may exist alone physically, or two or more units are integrated into one unit. The integrated unit can be realized in a form of hardware, and can also be realized in a form of a software functional unit.

The integrated unit, if implemented in the form of a software functional unit and sold or used as a stand-alone product, may be stored in a computer readable storage medium. Based on such understanding, the technical solution of the present application, in essence or in the part that contributes to the prior art, or all or part of the technical solution, may be embodied in the form of a software product. The software product is stored in a storage medium and includes instructions for causing a computer device (which may be a personal computer, a server, or a network device) to execute all or part of the steps of the method described in the embodiments of the present application. The aforementioned storage medium includes: a USB flash disk, a removable hard disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk, an optical disk, and other media capable of storing program codes.

The above embodiments are only used for illustrating the technical solutions of the present application, and not for limiting the same; although the present application has been described in detail with reference to the foregoing embodiments, it should be understood by those of ordinary skill in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some technical features may be equivalently replaced; and such modifications or substitutions do not depart from the spirit and scope of the corresponding technical solutions in the embodiments of the present application.

Claims

1. A pedestrian detection method, characterized by comprising:

establishing a pyramid-depth residual error network model;

2. The pedestrian detection method of claim 1, wherein the establishing a pyramid-depth residual network model comprises:

acquiring a pedestrian detection image to be trained;

3. The pedestrian detection method of claim 2, wherein the de-duplicating of the detection frame extracted by the pyramid-depth residual error network model comprises:

4. The pedestrian detection method according to claim 2, wherein after the acquiring of the pedestrian detection image to be trained, and before the inputting of the pedestrian detection image to be trained into the pyramid-depth residual error network model and the training of the pyramid-depth residual error network model, the method further comprises:

and preprocessing the pedestrian detection image to be trained.

5. The pedestrian detection method of claim 1, wherein the pyramid network includes 6 convolutional layers that extract different resolution feature maps.

6. A pedestrian detection device, characterized by comprising: a model building module and a pedestrian detection module;

7. The pedestrian detection apparatus of claim 6, wherein the model building module is specifically configured to:

acquiring a pedestrian detection image to be trained;

8. A pedestrian detection system, comprising: a case, an image collector and the pedestrian detection device according to any one of claims 6 to 7;

the image collector is used for shooting a pedestrian image and sending the pedestrian image to the pedestrian detection device, so that the pedestrian detection device executes the pedestrian detection method according to any one of claims 1 to 5.

9. The pedestrian detection system of claim 8, further comprising: a liquid crystal display screen, a memory and an image processor;

10. Pedestrian detection equipment, characterized in that the equipment comprises a processor and a memory;

the processor is configured to execute the pedestrian detection method of any one of claims 1-5 according to instructions in the program code.