CN110688883A

CN110688883A - Vehicle and pedestrian detection method and device

Info

Publication number: CN110688883A
Application number: CN201910033533.2A
Authority: CN
Inventors: 陈睿敏; 陈志超; 毛河; 石永禄
Original assignee: Chengdu Tongjia Youbo Technology Co Ltd
Current assignee: Chengdu Tongjia Youbo Technology Co Ltd
Priority date: 2019-01-14
Filing date: 2019-01-14
Publication date: 2020-01-14

Abstract

In the vehicle and pedestrian detection method and device, feature information of the image to be recognized is extracted at multiple scales by a feature pyramid network to obtain a first feature to be recognized; the first feature to be recognized is then processed by an attention layer to obtain a second feature to be recognized; and the second feature to be recognized is processed by a full connection layer to recognize the vehicles and pedestrians in the image to be recognized and to mark them. In this technical scheme, the feature pyramid network effectively reduces missed detection of distant small targets in the image to be recognized, and the attention layer reduces the influence of occluded or overlapping regions on the recognition result. In addition, weighting the training samples reduces the influence of class imbalance on detection precision.

Description

Vehicle and pedestrian detection method and device

Technical Field

The application relates to the field of image recognition, in particular to a method and a device for detecting vehicles and pedestrians.

Background

At present, vehicles and pedestrians on highways are detected from surveillance camera images. These cameras have low resolution and a narrow color gamut, so distant vehicles appear very blurred and offer few usable features. Existing common methods have the following defects: (1) The effective detection distance is short, whereas a highway has few occluding objects and a long field of view, which places high demands on detecting distant small targets. (2) When the effective detection distance is increased, the computational cost of the algorithm rises sharply and real-time requirements become difficult to meet. (3) With limited computing resources and demanding requirements, high false detection and missed detection rates are difficult to avoid. (4) Pedestrian detection tends to adversely affect vehicle detection.

Disclosure of Invention

In order to overcome the above-mentioned deficiencies in the prior art, an object of the present application is to provide a vehicle and pedestrian detection method, which is applied to an image processing device, wherein the image processing device is preset with a neural network model, and the neural network model includes a feature pyramid network, an attention layer and a full connection layer, and the method includes:

acquiring an image to be identified;

learning the image to be identified through the feature pyramid network to obtain a first feature to be identified;

learning the first feature to be recognized through the attention layer to obtain a second feature to be recognized, wherein the attention layer is used for paying attention to a specific area of an image to be recognized;

and learning the second feature to be recognized through the full connection layer to obtain the classification probability of the target in the image to be recognized and the position information of the target.

Optionally, the method further comprises:

obtaining a plurality of boxes according to the classification probability and the position information of the target, wherein the boxes are used for marking the target in the image and comprise corresponding confidence scores, and the confidence scores represent the classification probability of the target;

for each box among the plurality of boxes whose IOU (Intersection over Union) is greater than a first preset IOU threshold, adjusting the confidence score of that box according to its confidence score, wherein the adjusted confidence score is greater than zero;

selecting a plurality of candidate boxes of the target from the plurality of boxes according to a second preset IOU threshold, and obtaining a weight value corresponding to each candidate box according to the confidence score of the candidate box;

and obtaining the target candidate box of the target through a weighted average algorithm according to the confidence scores of the candidate boxes and the weight values corresponding to the candidate boxes.

Optionally, the feature pyramid network includes at least one initial convolutional layer and a feature pyramid; the feature pyramid includes at least one first feature pyramid and at least one second feature pyramid, the first feature pyramid includes a plurality of convolutional layers at different levels, and the second feature pyramid includes a plurality of deconvolution layers corresponding to the plurality of convolutional layers at different levels; the first pyramid in the feature pyramid network is a first feature pyramid, the output of the first feature pyramid is used as the input of the second feature pyramid, and the output of the second feature pyramid is used as the input of the next first feature pyramid; the step of learning the image to be recognized through the feature pyramid network to obtain a first feature to be recognized comprises:

learning the image to be identified through the initial convolutional layer to obtain an initial characteristic map;

learning the initial feature map sequentially through the first feature pyramid and the second feature pyramid, and taking the output of each layer of the last feature pyramid in the feature pyramid network as the first feature to be identified, wherein the output of each convolutional layer of a first feature pyramid is fused with the input of the corresponding deconvolution layer of the preceding second feature pyramid; and the output of each deconvolution layer of a second feature pyramid is fused with the input of the corresponding convolutional layer of the preceding first feature pyramid.

Optionally, the feature pyramid network comprises 2 first feature pyramids and 1 second feature pyramid.

Optionally, the attention layer includes a convolutional layer and an activation function, and the step of learning the first feature to be recognized through the attention layer to obtain a second feature to be recognized includes:

learning the first feature to be identified through the convolutional layer of the attention layer to obtain a first attention feature;

processing the first attention feature by the activation function to obtain a second attention feature;

and fusing the data of the second attention characteristic and the first feature to be recognized to obtain the second feature to be recognized.

Optionally, the method further comprises a training step of the neural network model:

acquiring a sample set, dividing the sample set into a training sample and a verification sample according to a preset proportion, and initializing the neural network model through preset parameters;

obtaining a preset number of mini-batches from the training samples by weighted sampling, wherein the ratio between the total number of pedestrians and the total number of vehicles appearing in each mini-batch is a preset ratio;

feeding the mini-batches into the neural network model in sequence, calculating the error on the training samples, and then modifying the weights in the neural network model through a back propagation algorithm until the error on the training samples is within a preset range;

inputting the verification sample into the neural network model trained by the training sample for learning, further correcting the weight of the neural network model trained by the training sample, and acquiring the error and accuracy for identifying the verification sample.

Optionally, the obtaining of the sample set further comprises:

for each sample image, obtaining a plurality of labeling information of the sample image, wherein each labeling information comprises a label corresponding to each target object in the sample image, and the label comprises a pedestrian or a vehicle;

counting the total marking times of the same target object in the marking information and the times marked as pedestrians or vehicles according to each sample image;

for each target object, if the proportion of the times marked as pedestrians to the total marking times is greater than a preset threshold value, determining the target object as a pedestrian; and if the proportion of the times marked as the vehicles to the total times marked is greater than a preset threshold value, determining the target object as the vehicle.

Another object of the present application is to provide a vehicle and pedestrian detection apparatus, which is applied to an image processing device, wherein the image processing device is preset with a neural network model, the neural network model includes a feature pyramid network, an attention layer and a full connection layer, and the vehicle and pedestrian detection apparatus includes an image acquisition module, a pyramid module, an attention module and an identification module;

the image acquisition module is used for acquiring an image to be identified;

the pyramid module is used for learning the image to be identified through the feature pyramid network to obtain a first feature to be identified;

the attention module is used for learning the first feature to be recognized through the attention layer to obtain a second feature to be recognized;

the identification module is used for learning the second feature to be identified through the full connection layer to obtain the classification probability of the target in the image to be identified and the position information of the target, and marking the pedestrian or the vehicle in the image to be identified according to the classification probability and the position information of the target.

Optionally, the vehicle and pedestrian detection device further comprises a box acquisition module, an adjustment module, a weight module and a determination module;

the box acquisition module is used for obtaining a plurality of boxes according to the classification probability and the position information of the target, wherein the boxes are used for marking the target in the image and comprise corresponding confidence scores, and the confidence scores represent the classification probability of the target;

the adjustment module is used for adjusting, for each box among the plurality of boxes whose IOU is greater than the first preset IOU threshold, the confidence score of that box according to its confidence score, wherein the adjusted confidence score is greater than zero;

the weight module is used for selecting a plurality of candidate boxes of the target from the plurality of boxes according to a second preset IOU threshold, and obtaining a weight value corresponding to each candidate box according to the confidence score of the candidate box;

the determination module is used for obtaining the target candidate box of the target through a weighted average algorithm according to the confidence scores of the candidate boxes and the weight values corresponding to the candidate boxes.

Optionally, the attention layer comprises a convolutional layer and an activation function; the attention module obtains a second feature to be identified by:

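learning the first feature to be recognized through the convolutional layer of the attention layer to obtain a first attention feature;

processing the first attention feature by the activation function to obtain a second attention feature;

and fusing the data of the second attention feature and the data of the first feature to be recognized to obtain the second feature to be recognized.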

Compared with the prior art, the method has the following beneficial effects:

Drawings

In order to more clearly illustrate the technical solutions of the embodiments of the present application, the drawings that are required to be used in the embodiments will be briefly described below, it should be understood that the following drawings only illustrate some embodiments of the present application and therefore should not be considered as limiting the scope, and for those skilled in the art, other related drawings can be obtained from the drawings without inventive effort.

Fig. 1 is a hardware configuration diagram of an image processing apparatus according to an embodiment of the present application;

Fig. 2 is a flow chart of steps of a method of vehicle and pedestrian detection provided by an embodiment of the present application;

Fig. 3 is a structural diagram of a feature pyramid network provided in an embodiment of the present application;

Fig. 4 is a structural diagram of a vehicle and pedestrian detection device provided in an embodiment of the present application.

Reference numerals: 100-image processing device; 130-processor; 120-memory; 110-vehicle and pedestrian detection device; 800-image to be recognized; 801-first feature map; 802-second feature map; 803-third feature map; 804-fourth feature map; 805-fifth feature map; 806-sixth feature map; 807-seventh feature map; 808-eighth feature map; 809-ninth feature map; 1101-image acquisition module; 1102-pyramid module; 1103-attention module; 1104-identification module; 1105-box acquisition module; 1106-adjustment module; 1107-weight module; 1108-determination module.

Detailed Description

In order to make the objects, technical solutions and advantages of the embodiments of the present application clearer, the technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application, and it is obvious that the described embodiments are some embodiments of the present application, but not all embodiments. The components of the embodiments of the present application, generally described and illustrated in the figures herein, can be arranged and designed in a wide variety of different configurations.

Thus, the following detailed description of the embodiments of the present application, presented in the accompanying drawings, is not intended to limit the scope of the claimed application, but is merely representative of selected embodiments of the application. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present application.

It should be noted that: like reference numbers and letters refer to like items in the following figures, and thus, once an item is defined in one figure, it need not be further defined and explained in subsequent figures.

In the description of the present application, it is further noted that, unless expressly stated or limited otherwise, the terms "disposed," "mounted," "coupled," and "connected" are to be construed broadly, e.g., as meaning a fixed connection, a removable connection, or an integral connection; the connection may be mechanical or electrical; and elements may be connected directly, connected indirectly through intervening media, or in internal communication with each other. The specific meaning of the above terms in the present application can be understood in a specific case by those of ordinary skill in the art.

Existing image recognition technology has difficulty detecting distant small targets, and suffers from missed and false detections for overlapping or occluded targets. In addition, when vehicles and pedestrians are detected simultaneously in the prior art, the two classes interfere with each other, which degrades detection precision.

In view of the above, the present embodiment provides a method for identifying a vehicle and a pedestrian simultaneously through a deep learning technique, and the solution provided by the present embodiment is explained in detail below.

First, referring to Fig. 1, the present embodiment provides a hardware structure diagram of an image processing apparatus 100, where the image processing apparatus 100 includes a vehicle and pedestrian detection device 110, a memory 120 and a processor 130. The memory 120 and the processor 130 are electrically connected to each other, directly or indirectly, to enable data transmission or interaction. For example, these components may be electrically connected to each other via one or more communication buses or signal lines.

The image processing apparatus 100 may be, but is not limited to, a smart phone, a Personal Computer (PC), a tablet PC, a Personal Digital Assistant (PDA), a Mobile Internet Device (MID), and the like.

The operating system of the image processing apparatus 100 may be, but is not limited to, an Android system, an iOS (iPhone Operating System) system, a Windows Phone system, a Windows system, and the like.

The memory 120 may be, but is not limited to, a Random Access Memory (RAM), a Read-Only Memory (ROM), a Programmable Read-Only Memory (PROM), an Erasable Programmable Read-Only Memory (EPROM), an Electrically Erasable Programmable Read-Only Memory (EEPROM), and the like. The memory 120 is used for storing a program, and the processor 130 executes the program after receiving an execution instruction.

The processor 130 may be an integrated circuit chip having signal processing capabilities. The processor may be a general-purpose processor, including a Central Processing Unit (CPU), a Network Processor (NP), and the like; it may also be a Digital Signal Processor (DSP), an Application Specific Integrated Circuit (ASIC), a Field Programmable Gate Array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, or a discrete hardware component. The processor may implement or perform the various methods, steps, and logic blocks disclosed in the embodiments of the present application. A general-purpose processor may be a microprocessor, or the processor may be any conventional processor or the like.

Fig. 2 is a flowchart of the steps of the vehicle and pedestrian detection method provided by the present embodiment, which is applied to the image processing apparatus 100 shown in Fig. 1; the image processing device 100 is preset with a neural network model, and the neural network model includes a feature pyramid network, an attention layer, and a full connection layer. The steps of the method are described in detail below.

Step S100, an image 800 to be recognized is acquired.

Optionally, the image to be recognized 800 may be an image stored in a local storage medium, or may be a real-time image captured by an image acquisition device and obtained by the image processing device 100 through a network.

Step S200, learning the image to be recognized 800 through the feature pyramid network to obtain a first feature to be recognized.

Optionally, in order to effectively reduce missed detection of small targets in the image to be recognized 800, in this embodiment the feature information in the image to be recognized 800 is extracted by a feature pyramid network to obtain the first feature to be recognized. The feature pyramid network comprises at least one initial convolutional layer and a feature pyramid; the feature pyramid comprises at least one first feature pyramid and at least one second feature pyramid, the first feature pyramid comprises a plurality of convolutional layers at different levels, and the second feature pyramid comprises a plurality of deconvolution layers corresponding to the convolutional layers at different levels; the first pyramid in the feature pyramid network is a first feature pyramid, the output of the first feature pyramid is used as the input of the second feature pyramid, and the output of the second feature pyramid is used as the input of the next first feature pyramid.

The image processing apparatus 100 learns the image to be recognized 800 through the initial convolutional layer to obtain an initial feature map. Further, the image processing apparatus 100 learns the initial feature map sequentially through the first feature pyramid and the second feature pyramid, and takes the output of each layer of the last feature pyramid in the feature pyramid network as the first feature to be identified, where the output of each convolutional layer of a first feature pyramid is fused with the input of the corresponding deconvolution layer of the preceding second feature pyramid, and the output of each deconvolution layer of a second feature pyramid is fused with the input of the corresponding convolutional layer of the preceding first feature pyramid.

For example, in one possible example, please refer to the block diagram of the feature pyramid network shown in fig. 3, which includes 2 first feature pyramids, 1 second feature pyramid, and one initial convolution layer. The first feature pyramid includes 2 convolutional layers, and the second feature pyramid includes 2 deconvolution layers.

The image processing apparatus 100 obtains a first feature map 801 by extracting features of the image 800 to be identified through the initial convolutional layer. Further, the image processing apparatus 100 sequentially performs feature extraction on the first feature map 801 by 2 convolution layers of the first feature pyramid to obtain a second feature map 802 and a third feature map 803.

The image processing apparatus 100 adjusts the number of channels of the third feature map 803 by a 1 × 1 convolution kernel to obtain a fourth feature map 804. The image processing apparatus 100 performs deconvolution processing on the fourth feature map 804 by using the first deconvolution layer of the second feature pyramid, and fuses the processed feature map and the second feature map 802 to obtain a fifth feature map 805, where the number of channels of the processed feature map and the second feature map 802 is equal, and the resolution is equal. The image processing apparatus 100 performs deconvolution processing on the fifth feature map 805 by using the second deconvolution layer of the second feature pyramid, and fuses the processed feature map and the first feature map 801 to obtain a sixth feature map 806, where the number of channels of the processed feature map and the first feature map 801 is equal, and the resolution is equal.

The image processing apparatus 100 adjusts the number of channels of the sixth feature map 806 by a 1 × 1 convolution kernel to obtain a seventh feature map 807. The image processing apparatus 100 performs convolution processing on the seventh feature map 807 through the first convolution layer of the second first feature pyramid, and fuses the processed feature map with the sixth feature map 806 to obtain an eighth feature map 808. Further, the image processing apparatus 100 performs convolution processing on the eighth feature map 808 through the second convolution layer of the second first feature pyramid, and fuses the processed feature map with the fifth feature map 805 to obtain a ninth feature map 809. The image processing apparatus 100 takes the seventh feature map 807, the eighth feature map 808, and the ninth feature map 809 as the output of the feature pyramid network.
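The wiring of the nine feature maps in this example can be sketched as follows. This is a minimal illustration of the data flow only, not the patented implementation: feature maps are stand-in flat lists, `make_layer` is a hypothetical placeholder for a learned convolution or deconvolution layer, fusion is assumed to be element-wise addition, and changes of resolution and channel count are deliberately omitted.

```python
from typing import Callable, List

FeatureMap = List[float]  # flattened stand-in for a real C x H x W tensor


def fuse(a: FeatureMap, b: FeatureMap) -> FeatureMap:
    """Fuse two equally shaped feature maps (element-wise addition is an assumption)."""
    if len(a) != len(b):
        raise ValueError("feature maps must have the same shape to be fused")
    return [x + y for x, y in zip(a, b)]


def make_layer(scale: float) -> Callable[[FeatureMap], FeatureMap]:
    """Hypothetical stand-in for a learned layer: multiplies every value by `scale`."""
    return lambda fm: [scale * v for v in fm]


def feature_pyramid_forward(image: FeatureMap) -> List[FeatureMap]:
    """Follow the wiring of feature maps 801-809 described in the example."""
    initial_conv = make_layer(1.0)
    conv_a1, conv_a2 = make_layer(2.0), make_layer(2.0)    # first "first feature pyramid"
    deconv_1, deconv_2 = make_layer(0.5), make_layer(0.5)  # "second feature pyramid"
    conv_b1, conv_b2 = make_layer(2.0), make_layer(2.0)    # second "first feature pyramid"
    adjust_1x1 = make_layer(1.0)                           # 1 x 1 channel adjustment

    f801 = initial_conv(image)
    f802 = conv_a1(f801)
    f803 = conv_a2(f802)
    f804 = adjust_1x1(f803)
    f805 = fuse(deconv_1(f804), f802)
    f806 = fuse(deconv_2(f805), f801)
    f807 = adjust_1x1(f806)
    f808 = fuse(conv_b1(f807), f806)
    f809 = fuse(conv_b2(f808), f805)
    # The last pyramid's per-level outputs form the first feature to be recognized.
    return [f807, f808, f809]
```

In a real network each stand-in layer would be a trained convolution or deconvolution, and the fused maps would have to agree in channel count and resolution, as the example requires.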

The image processing device 100 performs feature extraction on the image to be recognized 800 through the feature pyramid network, which effectively reduces missed detection of small targets. In particular, when recognizing images captured by highway surveillance cameras, distant small targets in those images can be recognized, so that distant vehicles or pedestrians can be monitored for violations and more time is available to respond.

Optionally, in order to make the features extracted by the image processing apparatus 100 focus on the non-occluded regions of the image 800 to be recognized, and to reduce the influence of occluded or overlapping regions on the features, an attention layer is introduced into the neural network. The image processing device 100 sends the first feature to be identified output by the feature pyramid network to the attention layer for processing. The attention layer includes a convolutional layer and an activation function; in the present embodiment, the activation function is a Sigmoid activation function, and the convolutional layer has a convolution kernel size of 1 × 1. The image processing apparatus 100 processes the first feature to be identified with the convolutional layer of the attention layer to obtain a first attention feature, and processes the first attention feature with the activation function to obtain a second attention feature. Finally, the image processing apparatus 100 fuses the second attention feature and the first feature to be recognized to obtain a second feature to be recognized. In the present embodiment, the image processing apparatus 100 performs the fusion by multiplying the second attention feature by the first feature to be recognized.
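A minimal sketch of such an attention layer, written in plain Python for readability, might look like the following. The tensor layout `[channel][row][col]`, the function names, and the assumption that the 1 × 1 convolution produces as many output channels as the input has (so the two can be multiplied element by element) are illustrative choices, not details stated in the application.

```python
import math
from typing import List

Tensor3D = List[List[List[float]]]  # [channel][row][col]


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def conv1x1(feature: Tensor3D, weights: List[List[float]], bias: List[float]) -> Tensor3D:
    """1 x 1 convolution: each output channel is a per-pixel linear mix of input channels."""
    height, width = len(feature[0]), len(feature[0][0])
    out = []
    for w_row, b in zip(weights, bias):
        channel = [[b + sum(w * feature[c][i][j] for c, w in enumerate(w_row))
                    for j in range(width)]
                   for i in range(height)]
        out.append(channel)
    return out


def attention_layer(feature: Tensor3D, weights: List[List[float]],
                    bias: List[float]) -> Tensor3D:
    """Return the second feature to be recognized: sigmoid(conv1x1(x)) * x."""
    first_attention = conv1x1(feature, weights, bias)
    second_attention = [[[sigmoid(v) for v in row] for row in ch]
                        for ch in first_attention]
    # Fusion by element-wise multiplication, as in the described embodiment.
    return [[[a * f for a, f in zip(a_row, f_row)]
             for a_row, f_row in zip(a_ch, f_ch)]
            for a_ch, f_ch in zip(second_attention, feature)]
```

Because the Sigmoid output lies between 0 and 1, each position of the input feature is scaled down by a learned amount, which is how regions judged less informative can be suppressed.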

Optionally, the image processing apparatus 100 obtains a plurality of boxes according to the classification probability and the position information of the object, the boxes are used for marking the object in the image, the boxes include corresponding confidence scores, and the confidence scores represent the classification probability of the object.

A conventional non-maximum suppression algorithm selects the box with the largest confidence score and sets to 0 the confidence scores of the other boxes whose IOU with it is greater than a preset threshold, which can filter out some correct targets. To avoid this, for each box whose IOU is greater than the first preset IOU threshold, the image processing apparatus 100 adjusts the confidence score of that box according to its confidence score, wherein the adjusted confidence score is greater than zero, so that correct targets are not filtered out. For example, the adjusted confidence score is obtained by applying a linear function to the confidence score, or by applying a nonlinear function to the confidence score.

The image processing device 100 selects a plurality of candidate boxes of the target from the plurality of boxes according to a second preset IOU threshold, and obtains a weight value corresponding to each candidate box according to the confidence score of the candidate box. Further, the image processing apparatus 100 obtains the target candidate box of the target through a weighted average algorithm according to the confidence scores of the plurality of candidate boxes and the weight values corresponding to the candidate boxes.
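The two post-processing stages described above can be sketched as follows, under several assumptions that the application does not fix: boxes are `(x1, y1, x2, y2)` tuples, IOU is measured against the highest-scoring box, the nonlinear adjustment is a Gaussian decay (which always stays above zero), each weight is the candidate's score divided by the sum of candidate scores, and the weighted average is taken over box coordinates. All function names and the `sigma` parameter are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2)


def iou(a: Box, b: Box) -> float:
    """Intersection over Union of two axis-aligned boxes."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def soften_scores(boxes: Sequence[Box], scores: Sequence[float],
                  iou_threshold: float, sigma: float = 0.5) -> List[float]:
    """Decay, rather than zero, the scores of boxes that overlap the top-scoring box."""
    top = max(range(len(boxes)), key=lambda i: scores[i])
    adjusted = list(scores)
    for i, box in enumerate(boxes):
        overlap = iou(boxes[top], box)
        if i != top and overlap > iou_threshold:
            # Gaussian decay keeps the adjusted score strictly greater than zero.
            adjusted[i] = scores[i] * math.exp(-(overlap ** 2) / sigma)
    return adjusted


def merge_candidates(boxes: Sequence[Box], scores: Sequence[float],
                     iou_threshold: float) -> Box:
    """Average the candidates around the top box, weighted by confidence score."""
    top = max(range(len(boxes)), key=lambda i: scores[i])
    picked = [i for i in range(len(boxes))
              if i == top or iou(boxes[top], boxes[i]) > iou_threshold]
    total = sum(scores[i] for i in picked)
    weights = [scores[i] / total for i in picked]
    return tuple(sum(w * boxes[i][k] for w, i in zip(weights, picked))
                 for k in range(4))
```

In a full pipeline the scores returned by `soften_scores` would typically be the ones passed to `merge_candidates`, and the procedure would be repeated for each remaining target.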

Optionally, the vehicle and pedestrian detection method further comprises a training step of the neural network model. The image processing device 100 acquires a sample set, divides the sample set into training samples and verification samples according to a preset proportion, and initializes the neural network model with preset parameters. It then obtains a preset number of mini-batches from the training samples by weighted sampling, wherein the ratio between the total number of pedestrians and the total number of vehicles appearing in each mini-batch is a preset ratio. The mini-batches are fed into the neural network model in sequence, the error on the training samples is calculated, and the weights in the neural network model are modified through a back propagation algorithm until the error on the training samples is within a preset range. Finally, the verification samples are input into the neural network model trained on the training samples for learning, the weights of the trained neural network model are further corrected, and the error and accuracy in identifying the verification samples are obtained.

Because the trained neural network model must recognize pedestrians and vehicles at the same time, the training samples contain both pedestrians and vehicles. To eliminate the destabilizing effect on training of an imbalanced proportion of pedestrians and vehicles, the training samples are drawn by weighted sampling to balance the proportion between the two classes. The inventors found through research that when the ratio of the total number of pedestrians to the total number of vehicles in each batch of training samples is 1:2, the adverse effect of pedestrian detection on vehicle detection is effectively reduced.
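One possible way to realize such weighted sampling is sketched below. The application does not specify how the sampling weights are computed, so the scheme here is an assumption: each image is described only by its pedestrian and vehicle counts, images are drawn with replacement, and an image's weight is higher the closer the mini-batch would be to the target ratio after adding it. The names `sampling_weights` and `draw_mini_batch` are hypothetical.

```python
import random
from typing import List, Optional, Sequence, Tuple

Sample = Tuple[int, int]  # (pedestrian_count, vehicle_count) for one image


def sampling_weights(pool: Sequence[Sample], batch_ped: int, batch_veh: int,
                     ratio: Tuple[int, int] = (1, 2)) -> List[float]:
    """Weight each image by how close the batch would be to the target ratio."""
    ped_part, veh_part = ratio
    weights = []
    for ped, veh in pool:
        # Zero when pedestrians : vehicles equals ped_part : veh_part exactly.
        deviation = abs((batch_ped + ped) * veh_part - (batch_veh + veh) * ped_part)
        weights.append(1.0 / (1.0 + deviation))
    return weights


def draw_mini_batch(pool: Sequence[Sample], batch_size: int,
                    ratio: Tuple[int, int] = (1, 2),
                    rng: Optional[random.Random] = None) -> List[int]:
    """Draw image indices (with replacement) while steering toward the target ratio."""
    rng = rng or random.Random()
    batch: List[int] = []
    ped_total = veh_total = 0
    for _ in range(batch_size):
        weights = sampling_weights(pool, ped_total, veh_total, ratio)
        index = rng.choices(range(len(pool)), weights=weights, k=1)[0]
        batch.append(index)
        ped_total += pool[index][0]
        veh_total += pool[index][1]
    return batch
```

This steers each mini-batch toward the 1:2 ratio without guaranteeing it exactly; a stricter implementation could reject or re-draw batches whose final ratio falls outside a tolerance.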

Optionally, for distant small targets in a sample, the human eye cannot always accurately distinguish whether the target is a pedestrian or a vehicle. To ensure the objectivity of the sample data, several annotators each label the targets appearing in the same sample image as pedestrians or vehicles. For each sample image, the image processing apparatus 100 counts the total number of times the same target object is labeled in the labeling information, and the number of times it is labeled as a pedestrian or as a vehicle; for each target object, if the proportion of the times it is labeled as a pedestrian to the total number of labels is greater than a preset threshold, the target object is determined to be a pedestrian; and if the proportion of the times it is labeled as a vehicle to the total number of labels is greater than a preset threshold, the target object is determined to be a vehicle.
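The label-resolution rule above amounts to a thresholded vote, which can be sketched as follows. The function name, the label strings, the default threshold of 0.5, and the choice to return `None` when neither class exceeds the threshold are assumptions made for illustration; the application does not say how undecided targets are handled.

```python
from collections import Counter
from typing import List, Optional


def resolve_label(votes: List[str], threshold: float = 0.5) -> Optional[str]:
    """Return the class whose share of annotator votes exceeds `threshold`, if any."""
    total = len(votes)
    if total == 0:
        return None
    counts = Counter(votes)
    # With a threshold of at least 0.5, at most one class can exceed it.
    for label in ("pedestrian", "vehicle"):
        if counts[label] / total > threshold:
            return label
    return None  # undecided: no class reached the required share
```

For example, three annotators voting "pedestrian", "pedestrian", "vehicle" give a pedestrian share of 2/3, which exceeds 0.5, so the target is recorded as a pedestrian.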

Referring to fig. 4, the present embodiment further provides a structural diagram of a vehicle and pedestrian detection apparatus 110, where the vehicle and pedestrian detection apparatus 110 includes at least one software functional module which can be stored in the memory 120 in the form of software or firmware (firmware) or solidified in an Operating System (OS) of the image processing device 100. The processor 130 is used for executing executable modules stored in the memory 120, such as software functional modules and computer programs included in the vehicle and pedestrian detection device 110.

The vehicle and pedestrian detection device 110 is applied to an image processing device 100, the image processing device 100 is preset with a neural network model, and the neural network model comprises a feature pyramid network, an attention layer and a full connection layer. Divided by function, the vehicle and pedestrian detection device 110 comprises an image acquisition module 1101, a pyramid module 1102, an attention module 1103 and an identification module 1104;

the image acquiring module 1101 is configured to acquire an image 800 to be recognized.

In the present embodiment, the image acquiring module 1101 is configured to execute step S100 in Fig. 2, and reference may be made to the detailed description of step S100 for a detailed description of the image acquiring module 1101.

The pyramid module 1102 is configured to learn the image to be recognized 800 through the feature pyramid network to obtain a first feature to be recognized.

In this embodiment, the pyramid module 1102 is configured to perform step S200 in fig. 2, and reference may be made to the detailed description of step S200 for a detailed description of the pyramid module 1102.

The attention module 1103 is configured to learn the first feature to be recognized through the attention layer to obtain a second feature to be recognized.

In the present embodiment, the attention module 1103 is configured to perform step S300 in fig. 2, and reference may be made to the detailed description of step S300 for a detailed description of the attention module 1103.

The identification module 1104 is configured to learn the second feature to be identified through the full connection layer to obtain a classification probability of an object in the image 800 to be identified and position information of the object, and mark a pedestrian or a vehicle in the image 800 to be identified according to the classification probability and the position information of the object.

In this embodiment, the identification module 1104 is configured to perform step S400 in fig. 2, and reference may be made to the detailed description of step S400 for a detailed description of the identification module 1104.
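The way the four modules chain together (steps S100 to S400) can be sketched as a simple composition of stages. This is only a structural illustration: the class name `VehiclePedestrianDetector`, the `Detection` tuple layout and the use of plain callables for each stage are hypothetical, and the neural network itself is not modelled here.

```python
from dataclasses import dataclass
from typing import Any, Callable, List, Tuple

# Hypothetical result format: (class label, classification probability, box)
Detection = Tuple[str, float, Tuple[float, float, float, float]]

@dataclass
class VehiclePedestrianDetector:
    acquire: Callable[[], Any]                  # image acquisition module 1101, step S100
    pyramid: Callable[[Any], Any]               # pyramid module 1102, step S200
    attention: Callable[[Any], Any]             # attention module 1103, step S300
    identify: Callable[[Any], List[Detection]]  # identification module 1104, step S400

    def run(self) -> List[Detection]:
        image = self.acquire()                          # image to be recognized
        first_feature = self.pyramid(image)             # first feature to be recognized
        second_feature = self.attention(first_feature)  # second feature to be recognized
        return self.identify(second_feature)            # class probabilities and positions
```

Each stage consumes only the output of the previous one, which mirrors the division of the apparatus into modules by function.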

Optionally, the vehicle and pedestrian detection apparatus 110 further includes a block obtaining module 1105, an adjusting module 1106, a weight module 1107, and a determining module 1108.

The block obtaining module 1105 is configured to obtain a plurality of boxes according to the classification probability and the position information of the object, where the boxes are used to mark the object in the image, each box has a corresponding confidence score, and the confidence score represents the classification probability of the object.

The adjusting module 1106 is configured to, for each box whose IOU is larger than a first preset IOU threshold, adjust the confidence score of that box according to its confidence score, where the adjusted confidence score is larger than zero.

The weight module 1107 is configured to select multiple candidate boxes of a target from the plurality of boxes according to a second preset IOU threshold, and to obtain the weight corresponding to each candidate box according to the confidence score of that candidate box.

The determining module 1108 is configured to obtain a target candidate box of the target through a weighted average algorithm according to the confidence scores of the multiple candidate boxes and the weights corresponding to the candidate boxes.
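The post-processing performed by modules 1105 to 1108 can be sketched as follows for a single target. Several details are assumptions, not taken from the text: IOU is measured against the highest-scoring box, scores are decayed with a Gaussian factor (chosen because it keeps the adjusted score above zero), weights are the normalized adjusted scores, and the names `iou`, `fuse_boxes`, `iou_decay`, `iou_merge` and `sigma` are hypothetical. Boxes are assumed to be `[x1, y1, x2, y2]` with positive area.

```python
import numpy as np

def iou(box, boxes):
    """IOU between one box and an array of boxes, all as [x1, y1, x2, y2]."""
    x1 = np.maximum(box[0], boxes[:, 0])
    y1 = np.maximum(box[1], boxes[:, 1])
    x2 = np.minimum(box[2], boxes[:, 2])
    y2 = np.minimum(box[3], boxes[:, 3])
    inter = np.clip(x2 - x1, 0, None) * np.clip(y2 - y1, 0, None)
    area = (box[2] - box[0]) * (box[3] - box[1])
    areas = (boxes[:, 2] - boxes[:, 0]) * (boxes[:, 3] - boxes[:, 1])
    return inter / (area + areas - inter)

def fuse_boxes(boxes, scores, iou_decay=0.5, iou_merge=0.5, sigma=0.5):
    """Return (fused box, adjusted scores) for the top-scoring target."""
    boxes = np.asarray(boxes, dtype=float)
    scores = np.asarray(scores, dtype=float).copy()
    top = int(np.argmax(scores))
    overlaps = iou(boxes[top], boxes)

    # Adjusting step (assumed Gaussian decay): lower, but never zero, the
    # score of every other box overlapping the top box above iou_decay.
    decay = overlaps > iou_decay
    decay[top] = False
    scores[decay] *= np.exp(-(overlaps[decay] ** 2) / sigma)

    # Weight step: candidates are boxes overlapping the top box by at least
    # iou_merge (the top box itself has IOU 1); weights are normalized scores.
    cand = overlaps >= iou_merge
    weights = scores[cand] / scores[cand].sum()

    # Determining step: confidence-weighted average of candidate coordinates.
    fused = (weights[:, None] * boxes[cand]).sum(axis=0)
    return fused, scores
```

Because overlapping boxes are down-weighted rather than discarded, a partially occluded detection still contributes to the final box position instead of being suppressed outright.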

Optionally, the attention layer comprises a convolutional layer and an activation function; the attention module 1103 obtains the second feature to be recognized by:

In summary, in the vehicle and pedestrian detection method and device provided by the present application, the feature information of the image 800 to be recognized is extracted at multiple scales through the feature pyramid network to obtain a first feature to be recognized; the first feature to be recognized is then processed through the attention layer to obtain a second feature to be recognized; and the second feature to be recognized is processed through the full connection layer to recognize and mark the vehicles and pedestrians in the image 800 to be recognized. With this technical scheme, the feature pyramid network effectively reduces missed detections of distant small targets in the image 800 to be recognized, and the attention layer reduces the influence of occluded or overlapping regions in the image 800 to be recognized on the recognition result. In addition, weighting the training samples reduces the influence of class imbalance on detection accuracy.

In the embodiments provided in the present application, it should be understood that the disclosed apparatus and method may be implemented in other ways. The apparatus embodiments described above are merely illustrative, and for example, the flowchart and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of apparatus, methods and computer program products according to various embodiments of the present application. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams and/or flowchart illustration, and combinations of blocks in the block diagrams and/or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.

In addition, functional modules in the embodiments of the present application may be integrated together to form an independent part, or each module may exist separately, or two or more modules may be integrated to form an independent part.

The functions, if implemented in the form of software functional modules and sold or used as a stand-alone product, may be stored in a computer readable storage medium. Based on such understanding, the technical solution of the present application, or the part of it that contributes over the prior art, may be embodied in the form of a software product stored in a storage medium and including instructions for causing a computer device (which may be a personal computer, a server, or a network device) to execute all or part of the steps of the method according to the embodiments of the present application. The aforementioned storage medium includes: a USB flash drive, a removable hard disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk or an optical disk, and other various media capable of storing program codes.

It is noted that, herein, relational terms such as first and second, and the like may be used solely to distinguish one entity or action from another entity or action without necessarily requiring or implying any actual such relationship or order between such entities or actions. Also, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising a ..." does not exclude the presence of other identical elements in a process, method, article, or apparatus that comprises the element.

The above description is only for various embodiments of the present application, but the scope of the present application is not limited thereto, and any person skilled in the art can easily conceive of changes or substitutions within the technical scope of the present application, and all such changes or substitutions are included in the scope of the present application. Therefore, the protection scope of the present application shall be subject to the protection scope of the claims.

Claims

1. A vehicle and pedestrian detection method, applied to an image processing device in which a neural network model is preset, the neural network model comprising a feature pyramid network, an attention layer and a full connection layer, the method comprising the following steps:

acquiring an image to be identified;

2. The vehicle and pedestrian detection method of claim 1, further comprising:

for each box, among the boxes, whose IOU is larger than a first preset IOU threshold, adjusting the confidence score of that box according to its confidence score, wherein the adjusted confidence score is larger than zero;

3. The vehicle and pedestrian detection method of claim 1, wherein the feature pyramid network includes at least one initial convolutional layer and a feature pyramid, the feature pyramid including at least one first feature pyramid and at least one second feature pyramid, the first feature pyramid including a plurality of convolutional layers at different levels, the second feature pyramid including a plurality of deconvolution layers corresponding to the plurality of convolutional layers at different levels; the first pyramid in the feature pyramid network is a first feature pyramid, an output of the first feature pyramid is used as an input of the second feature pyramid, and an output of the second feature pyramid is used as an input of the first feature pyramid; the step of learning the image to be recognized through the feature pyramid network to obtain a first feature to be recognized comprises:

4. The vehicle and pedestrian detection method of claim 3, wherein the feature pyramid network includes 2 first feature pyramids and 1 second feature pyramid.

5. The vehicle and pedestrian detection method according to claim 1, wherein the attention layer includes a convolutional layer and an activation function, and the step of obtaining a second feature to be identified by learning the first feature to be identified through the attention layer includes:

6. The vehicle and pedestrian detection method of claim 1, further comprising a training step of the neural network model:

sequentially feeding mini-batches containing a preset number of samples into the neural network model, calculating the error on the training samples, and then modifying the weights in the neural network model through a back propagation algorithm until the error on the training samples is within a preset range;

7. The vehicle and pedestrian detection method of claim 6, further comprising, before said obtaining a sample set, the steps of:

8. A vehicle and pedestrian detection apparatus, applied to an image processing device in which a neural network model is preset, the neural network model comprising a feature pyramid network, an attention layer and a full connection layer, the vehicle and pedestrian detection apparatus comprising an image acquisition module, a pyramid module, an attention module and an identification module;

the image acquisition module is used for acquiring an image to be identified;

9. The vehicle and pedestrian detection apparatus according to claim 8, further comprising a block acquisition module, an adjustment module, a weight module and a determination module;

10. The vehicle and pedestrian detection apparatus of claim 8, wherein the attention layer includes a convolutional layer and an activation function; the attention module obtains a second feature to be identified by: