CN117496475A - Target detection method and system applied to automatic driving - Google Patents
- Publication number: CN117496475A
- Application number: CN202311850776.8A
- Authority: CN (China)
- Prior art keywords: module; C3PRO; backbone network; target
- Legal status: Granted (the legal status is an assumption and is not a legal conclusion; Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed)
Classifications
- G06V20/588 — Recognition of the road, e.g. of lane markings; recognition of the vehicle driving pattern in relation to the road
- G06N3/0464 — Convolutional networks [CNN, ConvNet]
- G06N3/08 — Learning methods
- G06V10/82 — Arrangements for image or video recognition or understanding using neural networks
- G06V2201/07 — Target detection
Abstract
The invention discloses a target detection method and system applied to automatic driving. A C3PRO module, designed on the extract-and-split-flow idea, is applied to the shallow part of the network; through more parallel gradient-flow branches, more convolution operations and more learnable parameters, it enhances the network's ability to extract the shallow features of small targets. The invention also adds a feature extraction branch channel that is independent of the backbone network of the original algorithm and has fewer layers, which solves the loss of small-target shallow information caused by an overly deep network during forward propagation; a channel attention mechanism is added at the end of this channel to avoid feature redundancy. To address the incomplete penalty of the YOLOv5s localization loss function, the invention introduces the more advanced SIoU loss function so as to localize small targets more accurately. The invention enhances the extraction, retention and localization of small targets in automatic-driving road scenes.
Description
Technical Field
The invention belongs to the technical field of automatic driving, and particularly relates to a target detection method and system applied to automatic driving.
Background
With the rapid development of computer vision and hardware, deep-learning road target detection algorithms have become an important component of automatic driving technology. Although current road target detection algorithms have made significant progress on large targets, a series of challenges remain for small targets on the road: the information carried by a small target is limited, so traditional convolution operations struggle to extract sufficient feature information, and the shallow information of a small target is gradually lost, or even disappears completely, as network depth increases. This causes missed detections and false detections that are far more severe for small targets than for large ones.
Therefore, there is a need to develop a method for small target detection.
Disclosure of Invention
The invention aims to overcome the defects of the prior art, and provides a target detection method and a target detection system applied to automatic driving.
In order to achieve the expected effect, the invention adopts the following technical scheme:
the invention discloses a target detection method applied to automatic driving, which comprises the following steps:
collecting an original target image identified in an automatic driving process;
inputting the original target image into an improved YOLOv5s algorithm to detect the target to obtain a final detection target image;
the improved YOLOv5s algorithm comprises a first backbone network and a second backbone network;
the first backbone network comprises at least one C3PRO module; the C3PRO module is used for extracting shallow features of an original target image to obtain a first feature map, and sending the first feature map to a second backbone network or a neck network;
the second backbone network is an independent branch channel between the first backbone network and the neck network; the second backbone network comprises at least one C3PRO-SE module; the C3PRO-SE module is used for carrying out feature extraction on the feature map sent by the first backbone network to obtain a second feature map, and sending the second feature map to the neck network.
Further, the C3PRO module includes a plurality of 1×1 convolution layers, a plurality of 3×3 convolution layers, and a plurality of bottleneck layers.
Further, after a first target image is input into the C3PRO module, it passes through the plurality of 1×1 convolution layers, 3×3 convolution layers and bottleneck layers, whose outputs are feature-spliced and fed through a final 1×1 convolution layer to obtain a second target image, which is output by the C3PRO module.
Further, the C3PRO-SE module is formed by combining the C3PRO module and a SE channel attention model.
Further, after a third target image is input into the C3PRO-SE module, it first passes through the plurality of 1×1 convolution layers, 3×3 convolution layers and bottleneck layers in the C3PRO module, whose outputs are feature-spliced and fed through a final 1×1 convolution layer to obtain a fourth target image; the fourth target image is then sent to the SE channel attention model for channel attention vector mapping to obtain a fifth target image; finally, the fifth target image and the fourth target image are multiplied channel by channel to obtain a sixth target image, which is output by the C3PRO-SE module.
Further, the improved YOLOv5s algorithm further includes a SIoU loss function:

$$Loss_{SIoU} = 1 - IoU + \frac{\Delta + \Omega}{2}$$

where IoU is the intersection-over-union of the real frame and the predicted frame, $\Delta$ is the distance loss, and $\Omega$ is the shape loss.
Further, the distance loss $\Delta$ is:

$$\Delta = \sum_{t \in \{x,\,y\}} \left(1 - e^{-\gamma \rho_t}\right), \qquad \gamma = 2 - \Lambda$$

where $\Lambda$ is calculated from the angle loss function, and $\rho_x$ and $\rho_y$ are calculated from the center coordinates of the detection frames together with the width and height of their minimum circumscribed rectangle.
Further, the angle loss function is:

$$\Lambda = 1 - 2\sin^2\left(\arcsin\frac{c_h}{\sigma} - \frac{\pi}{4}\right)$$

where $c_h$ is the height difference between the center points of the predicted frame and the real frame, and $\sigma$ is the distance between those center points.
Further, the shape loss $\Omega$ is:

$$\Omega = \sum_{t \in \{w,\,h\}} \left(1 - e^{-\omega_t}\right)^{\theta}$$

where $\omega_w$ and $\omega_h$ are calculated from the widths and heights of the prediction frame and the real frame respectively, and $\theta$ is the degree of attention paid to the shape loss.
The invention also discloses a target detection system applied to automatic driving, the system can realize any one of the methods, and the system comprises:
the acquisition module is used for acquiring an original target image identified in the automatic driving process;
the detection module is used for inputting the original target image into the improved YOLOv5s algorithm to detect the target so as to obtain a final detection target image; the improved YOLOv5s algorithm comprises a first backbone network and a second backbone network; the first backbone network comprises at least one C3PRO module; the C3PRO module is used for extracting shallow features of an original target image to obtain a first feature map, and sending the first feature map to a second backbone network or a neck network; the second backbone network is an independent branch channel between the first backbone network and the neck network; the second backbone network comprises at least one C3PRO-SE module; the C3PRO-SE module is used for carrying out feature extraction on the feature map sent by the first backbone network to obtain a second feature map, and sending the second feature map to the neck network.
Compared with the prior art, the invention has the following beneficial effects. First, the invention proposes a C3PRO module designed on the extract-and-split-flow idea and applies it to the shallow part of the network; through more parallel gradient-flow branches, more convolution operations and more learnable parameters, the module enhances the network's ability to extract the shallow features of small targets. Second, a feature extraction branch channel is added; this channel is independent of the backbone network of the original algorithm and has fewer layers, which solves the loss of small-target shallow information caused by an overly deep network during forward propagation, and channel attention is added at the end of the channel to avoid feature redundancy. Finally, the invention introduces the more advanced SIoU loss function to address the incomplete penalty of the YOLOv5s localization loss function, so as to localize small targets more accurately. By enhancing the extraction, retention and localization of small-target shallow information, the invention alleviates the severe missed-detection and false-detection problems of small targets in automatic-driving road scenes.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings that are required in the embodiments or the description of the prior art will be briefly described, it being obvious that the drawings described below are only some embodiments of the present invention, and that other drawings may be obtained according to these drawings without inventive effort for a person skilled in the art.
Fig. 1 is a schematic diagram of a network structure of an improved YOLOv5s algorithm according to an embodiment of the present invention.
Fig. 2 is a schematic diagram of a C3 module according to an embodiment of the present invention.
Fig. 3 is a schematic diagram of a C3PRO module according to an embodiment of the present invention.
Fig. 4 is a schematic diagram of a C3PRO-SE module according to an embodiment of the present invention.
Fig. 5 is a schematic view of an angle loss according to an embodiment of the present invention.
Fig. 6 is a schematic diagram of PR curves for a small target according to an embodiment of the present invention.
Fig. 7 is a schematic diagram of PR curves for an entire dataset according to an embodiment of the present invention.
Detailed Description
The following description of the embodiments of the present invention will be made clearly and completely with reference to the accompanying drawings, in which it is apparent that the embodiments described are only some embodiments of the present invention, but not all embodiments. All other embodiments, which can be made by those skilled in the art based on the embodiments of the invention without making any inventive effort, are intended to be within the scope of the invention.
Referring to fig. 1 to 7, the present invention discloses a target detection method applied to automatic driving, the method comprising:
collecting an original target image identified in an automatic driving process;
inputting the original target image into an improved YOLOv5s algorithm to detect the target to obtain a final detection target image;
the improved YOLOv5s algorithm comprises a first backbone network and a second backbone network;
specifically, the existing YOLOv5s algorithm includes a backbone network, a neck network and a head network, wherein the backbone network is mainly used for feature extraction, the neck network is mainly used for feature fusion (fusion of shallow features and deep features), and the head network is mainly responsible for converting the extracted features into final detection results. Typically only one backbone network is included, whereas the present invention includes two backbone networks. In addition, the size k of the convolution kernel and the step size s are included to control the edge filling p of the size and shape of the output feature map and the number of output channels c.
The first backbone network comprises at least one C3PRO module; the C3PRO module is used for extracting shallow features of an original target image to obtain a first feature map, and sending the first feature map to a second backbone network or a neck network;
optionally, the C3PRO module includes a plurality of 1×1 convolution layers, a plurality of 3×3 convolution layers, and a plurality of bottleneck layers.
Specifically, the standard YOLOv5s algorithm uses the C3 module (shown in fig. 2) as its main feature extraction module. The C3 module is designed on the extract-and-split-flow idea; however, its structure is simple and its parameter count small, so its ability to capture the shallow information required by small targets is limited. The invention develops the split-flow concept further and proposes the C3PRO module (shown in fig. 3). The C3PRO module is still built from a combination of convolution modules and bottleneck (Bottleneck) modules, and the number of bottleneck layers can be controlled by the parameter n. Meanwhile, to counter vanishing gradients and network degradation, the C3PRO module retains the residual channel of the C3 module.
By way of example, four convolution layers (CBS) are present in the gradient channel: two use 1×1 convolution kernels and the other two use 3×3 kernels, so that feature maps are obtained under different receptive fields. Feature maps under a small receptive field concentrate on the feature information of the small target itself, while feature maps under a large receptive field capture the context of the target's surroundings; channel-splicing the feature maps from the different receptive fields at the end of the module therefore yields richer and more comprehensive feature information. In terms of module placement, the invention considers that small targets mainly lack shallow fine-grained information; at the same time, to avoid the large parameter increase that heavy use of C3PRO modules would cause, only the two C3 modules in the shallow part of the network are replaced by C3PRO modules, improving the network's shallow feature extraction capability.
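Why parallel branches of 1×1 and 3×3 convolutions see different receptive fields can be illustrated with the standard recurrence r_out = r_in + (k − 1)·j_in, j_out = j_in·s (a generic sketch, not code from the patent):

```python
def receptive_field(layers):
    """Receptive field of a stack of convolution layers.

    layers: list of (kernel_size, stride) tuples, input to output.
    Uses r_out = r_in + (k - 1) * j_in and j_out = j_in * s,
    starting from r = 1, j = 1 at the input.
    """
    r, j = 1, 1
    for k, s in layers:
        r += (k - 1) * j
        j *= s
    return r

# A 1x1 conv leaves the receptive field unchanged; each additional
# stride-1 3x3 conv widens it by 2, so parallel branches of different
# depth look at differently sized neighborhoods of the same pixel.
print(receptive_field([(1, 1), (3, 1)]))          # -> 3
print(receptive_field([(1, 1), (3, 1), (3, 1)]))  # -> 5
```

Concatenating such branches along the channel dimension is what gives the spliced feature map both the narrow (target-focused) and wide (context) views described above.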
Optionally, after the first target image is input into the C3PRO module, it passes through the plurality of 1×1 convolution layers, 3×3 convolution layers and bottleneck layers, whose outputs are feature-spliced and fed through a final 1×1 convolution layer to obtain the second target image, which is output by the C3PRO module.
The second backbone network is an independent branch channel between the first backbone network and the neck network; the second backbone network comprises at least one C3PRO-SE module; the C3PRO-SE module is used for carrying out feature extraction on the feature map sent by the first backbone network to obtain a second feature map, and sending the second feature map to the neck network.
Specifically, the Backbone network of the YOLOv5s algorithm has many layers; as the network deepens, more and more deep semantic information is obtained, but shallow fine-grained information continuously disappears, which is unfavorable for small-target detection. In view of this problem, the present invention proposes a second backbone network, Backbone2, a branch channel of the first backbone network that enhances the network's ability to retain shallow information. Optionally, the second backbone network includes two convolution modules and one C3PRO-SE module.
In an alternative embodiment, the C3PRO-SE module is formed by a combination of the C3PRO module and a SE channel attention model.
Further, after the third target image is input into the C3PRO-SE module, it first passes through the plurality of 1×1 convolution layers, 3×3 convolution layers and bottleneck layers in the C3PRO module, whose outputs are feature-spliced and fed through a final 1×1 convolution layer to obtain the fourth target image; the fourth target image is then sent to the SE channel attention model for channel attention vector mapping to obtain the fifth target image; finally, the fifth target image and the fourth target image are multiplied channel by channel to obtain the sixth target image, which is output by the C3PRO-SE module.
Illustratively, the input end of the second backbone network Backbone2 receives a 320×320×64 feature map from the shallow part of the first backbone network; after two CBS layers with stride 2, four-fold downsampling and the expansion of the channel dimension are completed, yielding an 80×80×256 feature map. To further strengthen shallow feature extraction, a C3PRO-SE module is added after the two CBS layers; its structure is shown in fig. 4, and it combines the C3PRO module with the SE (Squeeze-and-Excitation) channel attention mechanism. Because Backbone2 is independent of the first backbone network and performs its own downsampling and feature extraction, it avoids the disappearance of shallow information caused by excessive network depth; meanwhile, introducing channel attention avoids the feature redundancy that might otherwise arise, which benefits the detection of small road targets during automatic driving.
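The channel reweighting performed by an SE block can be sketched numerically. This toy version replaces the two learned fully-connected layers of a real SE block with a plain sigmoid gate, so only the squeeze/scale structure matches the mechanism named above; everything else is our simplification:

```python
import math

def se_reweight(channels):
    """Toy Squeeze-and-Excitation over a C x H x W tensor given as
    a list of 2D lists (one per channel).

    Squeeze: global average pooling reduces each channel to a scalar.
    Excitation: a sigmoid gate per channel (a real SE block learns
    this with two FC layers and a reduction ratio).
    Scale: each channel is multiplied by its gate, so weakly
    informative channels are suppressed.
    """
    gates = []
    for ch in channels:
        mean = sum(sum(row) for row in ch) / (len(ch) * len(ch[0]))
        gates.append(1.0 / (1.0 + math.exp(-mean)))
    return [[[v * g for v in row] for row in ch]
            for ch, g in zip(channels, gates)]

# A strongly activated channel keeps most of its response, while a
# near-zero channel is attenuated, reducing channel redundancy.
out = se_reweight([[[4.0, 4.0], [4.0, 4.0]],
                   [[0.0, 0.0], [0.0, 0.0]]])
```

In the C3PRO-SE module the same gate-then-scale pattern is applied to the spliced feature map, which is what "multiplied channel by channel" refers to.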
The CIoU loss function adopted by the existing YOLOv5s algorithm does not consider the angle between the prediction frame and the real frame during regression, which may cause the prediction frame of a small road target to regress along an unsuitable direction during automatic driving, lowering prediction accuracy. To address this, the present invention introduces the more advanced SIoU loss function into the network; as shown in fig. 5, B is the prediction frame and B^GT is the real frame. The SIoU loss function takes into account the angle loss, the distance loss and the shape loss, adding an angle penalty compared with the CIoU loss function.
In one aspect, the improved YOLOv5s algorithm further includes the SIoU loss function:

$$Loss_{SIoU} = 1 - IoU + \frac{\Delta + \Omega}{2}$$

where IoU is the intersection-over-union of the real frame and the predicted frame, $\Delta$ is the distance loss, and $\Omega$ is the shape loss.
Specifically, the SIoU loss function considers the vector angle problem required by regression, and angle loss calculation is newly added, so that a prediction frame can be regressed towards a more accurate direction, and the improvement on the small target positioning accuracy is realized.
On the other hand, the distance loss $\Delta$ is:

$$\Delta = \sum_{t \in \{x,\,y\}} \left(1 - e^{-\gamma \rho_t}\right)$$

where $\gamma$ is calculated from the angle loss function, and $\rho_x$ and $\rho_y$ are calculated from the center coordinates of the detection frames together with the width and height of their minimum circumscribed rectangle.
Optionally,

$$\rho_x = \left(\frac{b_{c_x}^{gt} - b_{c_x}}{c_w}\right)^2, \qquad \rho_y = \left(\frac{b_{c_y}^{gt} - b_{c_y}}{c_h}\right)^2, \qquad \gamma = 2 - \Lambda$$

where $b_{c_x}$ and $b_{c_y}$ are the center-point coordinates of the prediction frame, $b_{c_x}^{gt}$ and $b_{c_y}^{gt}$ are the center-point coordinates of the real frame, and $c_w$ and $c_h$ are respectively the width and height of the minimum circumscribed rectangle of the real frame and the prediction frame.
In yet another aspect, the angle loss function is:

$$\Lambda = 1 - 2\sin^2\left(\arcsin\frac{c_h}{\sigma} - \frac{\pi}{4}\right)$$

where $c_h$ is the height difference between the center points of the predicted frame and the real frame, and $\sigma$ is the distance between those center points. During training, if $\alpha$ (where $\alpha = \arcsin(c_h/\sigma)$) is less than or equal to 45 degrees, $\alpha$ is minimized; otherwise $\beta$ (where $\beta = 90° - \alpha$) is minimized.
Further, the shape loss $\Omega$ is:

$$\Omega = \sum_{t \in \{w,\,h\}} \left(1 - e^{-\omega_t}\right)^{\theta}$$

Optionally,

$$\omega_w = \frac{|w - w^{gt}|}{\max(w, w^{gt})}, \qquad \omega_h = \frac{|h - h^{gt}|}{\max(h, h^{gt})}$$

where $w$ and $h$ are the width and height of the prediction frame, $w^{gt}$ and $h^{gt}$ are the width and height of the real frame, and $\theta$ is the degree of attention paid to the shape loss.
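As a sketch, the penalty terms above can be combined into the full SIoU loss in plain Python for axis-aligned boxes given as (center-x, center-y, width, height). The function name, the box format and the default θ = 4 (a value commonly used with SIoU) are our assumptions, not part of the patent:

```python
import math

def siou_loss(pred, gt, theta=4.0):
    """SIoU loss for boxes (cx, cy, w, h); a sketch of the formulas above."""
    (px, py, pw, ph), (gx, gy, gw, gh) = pred, gt

    # IoU of the two boxes
    ix = max(0.0, min(px + pw / 2, gx + gw / 2) - max(px - pw / 2, gx - gw / 2))
    iy = max(0.0, min(py + ph / 2, gy + gh / 2) - max(py - ph / 2, gy - gh / 2))
    inter = ix * iy
    iou = inter / (pw * ph + gw * gh - inter)

    # Width and height of the minimum circumscribed rectangle
    cw = max(px + pw / 2, gx + gw / 2) - min(px - pw / 2, gx - gw / 2)
    ch = max(py + ph / 2, gy + gh / 2) - min(py - ph / 2, gy - gh / 2)

    # Angle loss: Lambda = 1 - 2 sin^2(arcsin(c_h / sigma) - pi/4)
    sigma = math.hypot(gx - px, gy - py) or 1e-9  # center-point distance
    c_h = abs(gy - py)                            # center height difference
    lam = 1 - 2 * math.sin(math.asin(min(c_h / sigma, 1.0)) - math.pi / 4) ** 2

    # Distance loss: sum over x, y of 1 - exp(-(2 - Lambda) * rho_t)
    rho_x = ((gx - px) / cw) ** 2
    rho_y = ((gy - py) / ch) ** 2
    delta = (1 - math.exp(-(2 - lam) * rho_x)) + (1 - math.exp(-(2 - lam) * rho_y))

    # Shape loss: sum over w, h of (1 - exp(-omega_t)) ** theta
    omega_w = abs(pw - gw) / max(pw, gw)
    omega_h = abs(ph - gh) / max(ph, gh)
    omega = (1 - math.exp(-omega_w)) ** theta + (1 - math.exp(-omega_h)) ** theta

    return 1 - iou + (delta + omega) / 2

# Identical boxes: IoU = 1 and every penalty term vanishes, so the loss is 0.
print(siou_loss((50, 50, 20, 10), (50, 50, 20, 10)))  # -> 0.0
```

Note how the angle loss only modulates the distance penalty through the exponent γ = 2 − Λ, which is what steers the regression direction without adding an independent term.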
It is worth noting that, to verify the superiority of the proposed automatic-driving small-target detection method, the invention selects as experimental data twenty thousand labeled images from the SODA10M dataset. SODA10M is a two-dimensional automatic-driving dataset published by Huawei in conjunction with Sun Yat-sen University; its images cover real road conditions under different illumination conditions and contain abundant small targets in six categories: Pedestrian, Cyclist, Car, Truck, Tram and Tricycle. The method first observes the detection effect of the trained model on the road images of the dataset, and then defines small targets using a definition based on absolute pixel count. The most commonly used standard comes from the MS COCO dataset, which defines a small target as one whose area is below 1024 pixels (32×32); the invention adopts this definition, screens out the qualifying small targets, and computes performance indices for them. The experiments use the PyTorch 1.7.1 deep learning framework with CUDA 10.1, an RTX 2080 Ti GPU, an i7-9700K CPU and four 16 GB memory modules, and the training period (epoch) is set to 200.
Experiments show that after the second backbone network Backbone2 is introduced, the recognition of distant small targets is effectively improved, and the detection of nearby trucks that appear with only partial information is also clearly improved. The SIoU loss function improves the detection of small targets that are small because they are far away, as well as of small targets on the other side of the road whose local information is occluded. The C3PRO module, the second backbone network Backbone2 and the SIoU loss function proposed by the invention all effectively improve the detection of small road targets during automatic driving.
Following the MS COCO definition, targets whose area is below 1024 pixels are treated as small targets, and the mean average precision (mAP) over the six road small-target classes and the average precision (AP) of each class are computed directly. First, the labeling information of all non-small targets is deleted by computing each real frame's area, so that non-small targets do not enter the performance indices. After this deletion, the remaining non-small targets are effectively reclassified as background; since the algorithm may still detect them and generate predictions, which would distort the indices, the predictions corresponding to these background objects are removed before the results enter the calculation. Only the prediction information of small targets is retained for the performance indices. Removing the real frames and prediction frames of non-small targets is equivalent to an algorithm that detects only small targets, so performance indices directly tied to small-target detection are obtained. As shown in fig. 6, taking mAP as an example, the invention improves small-target mAP by 3.7 percentage points; as shown in fig. 7, it improves mAP on the whole dataset by 3 percentage points.
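The small-target filtering step described above can be sketched as follows (the box format and the record structure are our assumptions; the 1024-pixel threshold is the MS COCO definition cited in the text):

```python
SMALL_TARGET_AREA = 32 * 32  # MS COCO: a small target has area below 1024 px

def keep_small_targets(boxes):
    """Drop ground-truth (or prediction) boxes of area >= 1024 pixels.

    boxes: list of (x, y, w, h) tuples in pixels. Everything removed
    here is treated as background, so the metric computed on the
    remaining boxes scores only small-target detection.
    """
    return [b for b in boxes if b[2] * b[3] < SMALL_TARGET_AREA]

boxes = [(10, 10, 20, 20), (0, 0, 64, 64), (5, 5, 31, 33)]
print(keep_small_targets(boxes))  # -> [(10, 10, 20, 20), (5, 5, 31, 33)]
```

Applying the same filter to both real frames and prediction frames before computing AP/mAP is what makes the resulting indices refer purely to small targets.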
In summary, the invention provides an improved YOLOv5s algorithm for the difficult detection of small targets in automatic-driving road scenes, which can significantly improve the detection precision of small road targets and has certain popularization value in the deep learning field of automatic driving.
Based on the same thought, the invention also discloses a target detection system applied to automatic driving, wherein the system can realize any one of the methods, and the system comprises the following steps:
the acquisition module is used for acquiring an original target image identified in the automatic driving process;
the detection module is used for inputting the original target image into the improved YOLOv5s algorithm to detect the target so as to obtain a final detection target image; the improved YOLOv5s algorithm comprises a first backbone network and a second backbone network; the first backbone network comprises at least one C3PRO module; the C3PRO module is used for extracting shallow features of an original target image to obtain a first feature map, and sending the first feature map to a second backbone network or a neck network; the second backbone network is an independent branch channel between the first backbone network and the neck network; the second backbone network comprises at least one C3PRO-SE module; the C3PRO-SE module is used for carrying out feature extraction on the feature map sent by the first backbone network to obtain a second feature map, and sending the second feature map to the neck network.
The system embodiments may be implemented in one-to-one correspondence with the foregoing method embodiments, and are not described herein.
The invention applies the C3PRO module, designed on the extract-and-split-flow idea, to the shallow part of the network, enhancing the network's ability to extract the shallow features of small targets through more parallel gradient-flow branches, more convolution operations and more learnable parameters. The invention also adds a feature extraction branch channel that is independent of the backbone network of the original algorithm and has fewer layers, solving the loss of small-target shallow information caused by an overly deep network during forward propagation; a channel attention mechanism at the end of the channel avoids feature redundancy. To address the incomplete penalty of the YOLOv5s localization loss function, the invention introduces the more advanced SIoU loss function so as to localize small targets more accurately. The invention enhances the extraction, retention and localization of small targets in automatic-driving road scenes.
Based on the same idea, the invention also discloses an electronic device, which may comprise: a processor, a communication interface, a memory, and a communication bus, wherein the processor, the communication interface, and the memory communicate with each other through the communication bus. The processor may invoke logic instructions in the memory to perform a target detection method applied to automatic driving, the method comprising:
collecting an original target image identified in an automatic driving process;
inputting the original target image into an improved YOLOv5s algorithm for target detection to obtain a final detected target image;
the improved YOLOv5s algorithm comprises a first backbone network and a second backbone network;
the first backbone network comprises at least one C3PRO module; the C3PRO module is used for extracting shallow features from the original target image to obtain a first feature map and sending the first feature map to the second backbone network or the neck network;
the second backbone network is an independent branch channel between the first backbone network and the neck network; the second backbone network comprises at least one C3PRO-SE module; the C3PRO-SE module is used for extracting features from the feature map sent by the first backbone network to obtain a second feature map and sending the second feature map to the neck network.
Further, the logic instructions in the memory described above may be implemented in the form of software functional units and, when sold or used as a stand-alone product, stored in a computer-readable storage medium. Based on this understanding, the technical solution of the present invention, in essence or in the part contributing to the prior art, may be embodied in the form of a software product stored in a storage medium and comprising several instructions for causing a computer device (which may be a personal computer, a server, a network device, etc.) to perform all or part of the steps of the method according to the embodiments of the present invention. The aforementioned storage medium includes various media capable of storing program code, such as a USB flash drive, a removable hard disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk, or an optical disk.
In another aspect, embodiments of the present invention further provide a computer program product comprising a computer program stored on a non-transitory computer-readable storage medium, the computer program comprising program instructions which, when executed by a computer, enable the computer to perform the target detection method applied to automatic driving provided in the above method embodiments, the method comprising:
collecting an original target image identified in an automatic driving process;
inputting the original target image into an improved YOLOv5s algorithm for target detection to obtain a final detected target image;
the improved YOLOv5s algorithm comprises a first backbone network and a second backbone network;
the first backbone network comprises at least one C3PRO module; the C3PRO module is used for extracting shallow features from the original target image to obtain a first feature map and sending the first feature map to the second backbone network or the neck network;
the second backbone network is an independent branch channel between the first backbone network and the neck network; the second backbone network comprises at least one C3PRO-SE module; the C3PRO-SE module is used for extracting features from the feature map sent by the first backbone network to obtain a second feature map and sending the second feature map to the neck network.
In still another aspect, an embodiment of the present invention further provides a non-transitory computer-readable storage medium having stored thereon a computer program which, when executed by a processor, implements the target detection method applied to automatic driving provided in the above embodiments, the method comprising:
collecting an original target image identified in an automatic driving process;
inputting the original target image into an improved YOLOv5s algorithm for target detection to obtain a final detected target image;
the improved YOLOv5s algorithm comprises a first backbone network and a second backbone network;
the first backbone network comprises at least one C3PRO module; the C3PRO module is used for extracting shallow features from the original target image to obtain a first feature map and sending the first feature map to the second backbone network or the neck network;
the second backbone network is an independent branch channel between the first backbone network and the neck network; the second backbone network comprises at least one C3PRO-SE module; the C3PRO-SE module is used for extracting features from the feature map sent by the first backbone network to obtain a second feature map and sending the second feature map to the neck network.
It should be understood that, although the steps in the flowcharts of the figures are shown in the order indicated by the arrows, they are not necessarily performed in that order. Unless explicitly stated herein, the order of the steps is not strictly limited, and they may be performed in other orders. Moreover, at least some of the steps in the flowcharts may include a plurality of sub-steps or stages that are not necessarily performed at the same time but may be performed at different times; their order of execution is not necessarily sequential, and they may be performed in turn or alternately with other steps, or with at least a portion of the sub-steps or stages of other steps.
The foregoing description of the preferred embodiments is not intended to limit the invention; any modifications, equivalent substitutions, alternatives, and improvements made within the spirit and scope of the invention are intended to be covered by it.
Claims (8)
1. A target detection method applied to automatic driving, the method comprising:
collecting an original target image identified in an automatic driving process;
inputting the original target image into an improved YOLOv5s algorithm for target detection to obtain a final detected target image;
the improved YOLOv5s algorithm comprises a first backbone network and a second backbone network;
the first backbone network comprises at least one C3PRO module; the C3PRO module is used for extracting shallow features from the original target image to obtain a first feature map and sending the first feature map to the second backbone network or the neck network; the C3PRO module comprises a plurality of 1×1 convolution layers, a plurality of 3×3 convolution layers, and a plurality of bottleneck layers;
the second backbone network is an independent branch channel between the first backbone network and the neck network; the second backbone network comprises at least one C3PRO-SE module; the C3PRO-SE module is used for extracting features from the feature map sent by the first backbone network to obtain a second feature map and sending the second feature map to the neck network; the C3PRO-SE module is formed by combining a C3PRO module with an SE channel attention model.
2. The target detection method applied to automatic driving according to claim 1, wherein after a first target image is input into the C3PRO module, it undergoes feature concatenation through the plurality of 1×1 convolution layers, the plurality of 3×3 convolution layers, and the plurality of bottleneck layers, and a second target image is obtained through a 1×1 convolution layer and output by the C3PRO module.
3. The target detection method applied to automatic driving according to any one of claims 1-2, wherein after a third target image is input into the C3PRO-SE module, it first undergoes feature concatenation through the plurality of 1×1 convolution layers, the plurality of 3×3 convolution layers, and the plurality of bottleneck layers in the C3PRO module to obtain a fourth target image through a 1×1 convolution layer; the fourth target image is then sent into the SE channel attention model for channel attention vector mapping to obtain a fifth target image; finally, the fifth target image and the fourth target image are multiplied channel by channel to obtain a sixth target image, which is output by the C3PRO-SE module.
4. The target detection method applied to automatic driving according to claim 1, wherein the improved YOLOv5s algorithm further includes an SIoU loss function, the SIoU loss function being:

$L_{SIoU} = 1 - IoU + \dfrac{\Delta + \Omega}{2}$

where $IoU$ is the intersection-over-union of the real frame and the predicted frame, $\Delta$ is the distance loss, and $\Omega$ is the shape loss.
5. The target detection method applied to automatic driving according to claim 4, wherein the distance loss $\Delta$ is:

$\Delta = \sum_{t=x,y}\left(1 - e^{-\gamma\rho_t}\right),\qquad \gamma = 2 - \Lambda,\qquad \rho_x = \left(\dfrac{b^{gt}_{cx} - b_{cx}}{c_w}\right)^2,\quad \rho_y = \left(\dfrac{b^{gt}_{cy} - b_{cy}}{c_h}\right)^2$

where $\Lambda$ is calculated from the angle loss function, and $\rho_x$ and $\rho_y$ are calculated from the center coordinates of the detection frames together with the width $c_w$ and height $c_h$ of the minimum circumscribed rectangle.
6. The target detection method applied to automatic driving according to claim 5, wherein the angle loss function is:

$\Lambda = 1 - 2\sin^2\left(\arcsin\dfrac{c_h}{\sigma} - \dfrac{\pi}{4}\right)$

where $c_h$ is the height difference between the center points of the predicted frame and the real frame, and $\sigma$ is the distance between the center points of the predicted frame and the real frame.
7. The target detection method applied to automatic driving according to claim 4, wherein the shape loss $\Omega$ is:

$\Omega = \sum_{t=w,h}\left(1 - e^{-\omega_t}\right)^{\theta},\qquad \omega_w = \dfrac{|w - w^{gt}|}{\max(w, w^{gt})},\quad \omega_h = \dfrac{|h - h^{gt}|}{\max(h, h^{gt})}$

where $\omega_w$ and $\omega_h$ are calculated from the widths and heights of the predicted frame and the real frame, respectively, and $\theta$ is the degree of attention paid to the shape.
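Taken together, claims 4-7 define the complete SIoU box loss. The sketch below, in plain Python, follows the published SIoU formulation; the `(cx, cy, w, h)` box layout, the function name `siou_loss`, and the default `theta = 4` are assumptions made for illustration, not the patent's implementation:

```python
import math

def siou_loss(pred, gt, theta=4.0):
    """Sketch of the SIoU bounding-box loss: 1 - IoU + (distance + shape) / 2.

    Boxes are (cx, cy, w, h); names and defaults are illustrative assumptions.
    """
    px, py, pw, ph = pred
    gx, gy, gw, gh = gt

    # IoU of the predicted frame and the real frame.
    inter_w = max(0.0, min(px + pw / 2, gx + gw / 2) - max(px - pw / 2, gx - gw / 2))
    inter_h = max(0.0, min(py + ph / 2, gy + gh / 2) - max(py - ph / 2, gy - gh / 2))
    inter = inter_w * inter_h
    union = pw * ph + gw * gh - inter
    iou = inter / union if union > 0 else 0.0

    # Angle loss: Lambda = 1 - 2 sin^2(arcsin(c_h / sigma) - pi/4).
    sigma = math.hypot(gx - px, gy - py)   # distance between centre points
    c_h = abs(gy - py)                     # height difference of centre points
    sin_alpha = c_h / sigma if sigma > 0 else 0.0
    angle = 1 - 2 * math.sin(math.asin(sin_alpha) - math.pi / 4) ** 2

    # Distance loss over the minimum circumscribed rectangle, gamma = 2 - Lambda.
    cw = max(px + pw / 2, gx + gw / 2) - min(px - pw / 2, gx - gw / 2)
    ch = max(py + ph / 2, gy + gh / 2) - min(py - ph / 2, gy - gh / 2)
    gamma = 2 - angle
    rho_x = ((gx - px) / cw) ** 2 if cw > 0 else 0.0
    rho_y = ((gy - py) / ch) ** 2 if ch > 0 else 0.0
    dist = (1 - math.exp(-gamma * rho_x)) + (1 - math.exp(-gamma * rho_y))

    # Shape loss with attention exponent theta.
    omega_w = abs(pw - gw) / max(pw, gw)
    omega_h = abs(ph - gh) / max(ph, gh)
    shape = (1 - math.exp(-omega_w)) ** theta + (1 - math.exp(-omega_h)) ** theta

    return 1 - iou + (dist + shape) / 2
```

For identical boxes the loss is zero; it grows as the frames drift apart, as their centers misalign diagonally, or as their shapes diverge, which is what gives SIoU its more complete localization penalty than plain IoU.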
8. A target detection system for use in autopilot, the system being capable of implementing the method of any one of claims 1-7, the system comprising:
the acquisition module is used for acquiring an original target image identified in the automatic driving process;
the detection module is used for inputting the original target image into the improved YOLOv5s algorithm for target detection to obtain a final detected target image; the improved YOLOv5s algorithm comprises a first backbone network and a second backbone network; the first backbone network comprises at least one C3PRO module; the C3PRO module is used for extracting shallow features from the original target image to obtain a first feature map and sending the first feature map to the second backbone network or the neck network; the C3PRO module comprises a plurality of 1×1 convolution layers, a plurality of 3×3 convolution layers, and a plurality of bottleneck layers; the second backbone network is an independent branch channel between the first backbone network and the neck network; the second backbone network comprises at least one C3PRO-SE module; the C3PRO-SE module is used for extracting features from the feature map sent by the first backbone network to obtain a second feature map and sending the second feature map to the neck network; the C3PRO-SE module is formed by combining a C3PRO module with an SE channel attention model.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202311850776.8A CN117496475B (en) | 2023-12-29 | 2023-12-29 | Target detection method and system applied to automatic driving |
Publications (2)
Publication Number | Publication Date |
---|---|
CN117496475A true CN117496475A (en) | 2024-02-02 |
CN117496475B CN117496475B (en) | 2024-04-02 |
Family
ID=89669348
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202311850776.8A Active CN117496475B (en) | 2023-12-29 | 2023-12-29 | Target detection method and system applied to automatic driving |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN117496475B (en) |
Citations (9)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20200293891A1 (en) * | 2019-04-24 | 2020-09-17 | Jiangnan University | Real-time target detection method deployed on platform with limited computing resources |
CN112819804A (en) * | 2021-02-23 | 2021-05-18 | Northwestern Polytechnical University | Insulator defect detection method based on improved YOLOv5 convolutional neural network |
CN114330503A (en) * | 2021-12-06 | 2022-04-12 | Beijing Institute of Radio Metrology and Measurement | Smoke flame identification method and device |
CN115205667A (en) * | 2022-08-02 | 2022-10-18 | Jiangsu University | Dense target detection method based on YOLOv5s |
CN115272987A (en) * | 2022-07-07 | 2022-11-01 | Huaiyin Institute of Technology | MSA-YOLOv5-based vehicle detection method and device in severe weather |
US20230122927A1 (en) * | 2021-10-18 | 2023-04-20 | Chengdu Information Technology of CAS Co., Ltd. | Small object detection method and apparatus, readable storage medium, and electronic device |
CN116311154A (en) * | 2023-02-20 | 2023-06-23 | South China University of Technology | Vehicle detection and identification method based on YOLOv5 model optimization |
CN116777842A (en) * | 2023-05-24 | 2023-09-19 | Beijing Jiaotong University | Light texture surface defect detection method and system based on deep learning |
CN117237900A (en) * | 2023-10-28 | 2023-12-15 | Harbin University of Science and Technology | SSC-YOLOv5-based automatic driving target detection algorithm |
Non-Patent Citations (3)
Title |
---|
WENHAO ZHOU: "SE-yolov5 inland waterway vessel target recognition based on CGAN image enhancement", IEEE, 6 August 2023 (2023-08-06), pages 1-8 *
LI JINLING: "Strip steel surface defect detection based on an improved YOLOv5 algorithm", Journal of Iron and Steel Research, vol. 35, no. 6, 30 June 2023 (2023-06-30), pages 767-777 *
ZHAO RUI: "Safety helmet detection algorithm based on improved YOLOv5s", Journal of Beijing University of Aeronautics and Astronautics, vol. 49, no. 8, 31 August 2023 (2023-08-31), pages 2050-2061 *
Also Published As
Publication number | Publication date |
---|---|
CN117496475B (en) | 2024-04-02 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
Johnson-Roberson et al. | Driving in the matrix: Can virtual worlds replace human-generated annotations for real world tasks? | |
Zhang et al. | Ripple-GAN: Lane line detection with ripple lane line detection network and Wasserstein GAN | |
CN110766098A (en) | Traffic scene small target detection method based on improved YOLOv3 | |
CN109472298A (en) | Depth binary feature pyramid for the detection of small scaled target enhances network | |
CN106971155B (en) | Unmanned vehicle lane scene segmentation method based on height information | |
CN112487862A (en) | Garage pedestrian detection method based on improved EfficientDet model | |
CN115457498A (en) | Urban road semantic segmentation method based on double attention and dense connection | |
CN111882581A (en) | Multi-target tracking method for depth feature association | |
CN117037119A (en) | Road target detection method and system based on improved YOLOv8 | |
Xiaomeng et al. | Vehicle detection in traffic monitoring scenes based on improved YOLOV5s | |
CN112132013A (en) | Vehicle key point detection method | |
CN114973199A (en) | Rail transit train obstacle detection method based on convolutional neural network | |
CN111881914B (en) | License plate character segmentation method and system based on self-learning threshold | |
CN117496475B (en) | Target detection method and system applied to automatic driving | |
CN116363072A (en) | Light aerial image detection method and system | |
Ogura et al. | Improving the visibility of nighttime images for pedestrian recognition using in‐vehicle camera | |
CN111369515A (en) | Tunnel water stain detection system and method based on computer vision | |
CN115131778A (en) | Scene text detection method based on deep learning | |
CN116229448A (en) | Three-dimensional target detection method, device, equipment and readable storage medium | |
CN115170989A (en) | Rail segmentation method based on enhanced feature extraction | |
CN112446292B (en) | 2D image salient object detection method and system | |
CN113989753A (en) | Multi-target detection processing method and device | |
CN116740679A (en) | Road pedestrian and vehicle detection algorithm based on improved YOLOv5s | |
CN117952977B | Pavement crack identification method, device and medium based on improved YOLOv5s | |
CN116311140B (en) | Method, apparatus and storage medium for detecting lane lines |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||