CN116912796A - Novel dynamic cascade YOLOv 8-based automatic driving target identification method and device - Google Patents
- Publication number
- CN116912796A (application CN202310899627.4A)
- Authority
- CN
- China
- Prior art keywords
- automatic driving
- image
- driving target
- network
- yolov8
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V20/00—Scenes; Scene-specific elements
- G06V20/50—Context or environment of the image
- G06V20/56—Context or environment of the image exterior to a vehicle by using sensors mounted on the vehicle
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/0464—Convolutional networks [CNN, ConvNet]
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
- G06N3/082—Learning methods modifying the architecture, e.g. adding, deleting or silencing nodes or connections
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/40—Extraction of image or video features
- G06V10/44—Local feature extraction by analysis of parts of the pattern, e.g. by detecting edges, contours, loops, corners, strokes or intersections; Connectivity analysis, e.g. of connected components
- G06V10/443—Local feature extraction by analysis of parts of the pattern, e.g. by detecting edges, contours, loops, corners, strokes or intersections; Connectivity analysis, e.g. of connected components by matching or filtering
- G06V10/449—Biologically inspired filters, e.g. difference of Gaussians [DoG] or Gabor filters
- G06V10/451—Biologically inspired filters, e.g. difference of Gaussians [DoG] or Gabor filters with interaction between the filter responses, e.g. cortical complex cells
- G06V10/454—Integrating the filters into a hierarchical structure, e.g. convolutional neural networks [CNN]
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/40—Extraction of image or video features
- G06V10/52—Scale-space analysis, e.g. wavelet analysis
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/70—Arrangements for image or video recognition or understanding using pattern recognition or machine learning
- G06V10/82—Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V2201/00—Indexing scheme relating to image or video recognition or understanding
- G06V2201/07—Target detection
-
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y02—TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
- Y02T—CLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO TRANSPORTATION
- Y02T10/00—Road transport of goods or passengers
- Y02T10/10—Internal combustion engine [ICE] based vehicles
- Y02T10/40—Engine management systems
Abstract
The invention discloses an automatic driving target recognition method and device based on a novel dynamic cascade YOLOv8. Pre-acquired original images of traffic vehicles are preprocessed and divided into a training set and a testing set; an automatic driving target recognition network based on the novel dynamic cascade YOLOv8 is constructed, in which the Backbone of the YOLOv8 network is replaced in its entirety by a novel dynamic cascade backbone network, and the detection head in the last part of the YOLOv8 network is replaced by a new ShareSepHead detection head with cross-scale shared convolution weights; an improved PolyLoss is adopted as the loss function of the automatic driving target recognition network; the network is trained with the training set; the test set is then input into the trained network to evaluate it. The invention can improve the accuracy and speed of target recognition in automatic driving and provide a guarantee for automatic driving safety.
Description
Technical Field
The invention belongs to the application of deep learning in the field of computer vision, and particularly relates to an automatic driving target recognition method and device based on a novel dynamic cascade YOLOv8.
Background
As one of the core problems of computer vision, target detection, which aims to find the category and position of a specific target in an image, is widely used in various fields such as automatic driving, remote sensing images, video monitoring, medical detection, and the like.
The YOLO family has evolved through successive version updates since 2016, reaching v8 to date. In 2016, the single-stage (One-Stage) target detection methods represented by YOLOv1 emerged. From the proposal of YOLOv1, the first single-stage target detection method, through 2023, the YOLO series developed alongside single-stage target detection as a whole and has remained a typical representative of One-Stage methods.
Although YOLOv8 can perform object detection quickly on simple images, it requires more time when facing complex real-world scenes, such as traffic jams with large numbers of vehicles and pedestrians. Real-time performance is critical for automatic driving decisions, so the processing speed still needs to be improved. Accuracy likewise needs improvement: automatic driving requires highly accurate detection results to respond correctly to various traffic conditions, and although YOLOv8 performs well in some situations, its detection accuracy still falls short in complex traffic scenes. The Backbone of the existing YOLOv8 is fast when processing simple images but requires more time on complex images with many targets, and the existing YOLOv8 detection head contains many parameters, making its computational complexity high. In autopilot systems, computational resources are limited, so a more efficient model design is needed to guarantee target detection in embedded or resource-constrained environments.
Disclosure of Invention
The invention aims to: provide an automatic driving target recognition method and device based on a novel dynamic cascade YOLOv8 that can accurately detect targets in automatic driving.
The technical scheme is as follows: the invention provides an automatic driving target identification method based on novel dynamic cascade YOLOv8, which specifically comprises the following steps:
(1) Preprocessing a pre-acquired original image of a traffic vehicle, and dividing the pre-acquired original image into a training set and a testing set;
(2) Constructing an automatic driving target recognition network based on the novel dynamic cascade YOLOv8; the network replaces the Backbone of the YOLOv8 network in its entirety with a novel dynamic cascade backbone network, and replaces the detection head in the last part of the YOLOv8 network with a new ShareSepHead detection head with cross-scale shared convolution weights;
(3) Adopting improved PolyLoss as a loss function of an automatic driving target recognition network;
(4) Training the automatic driving target recognition network by utilizing the training set;
(5) Inputting the test set into a trained automatic driving target recognition network, and evaluating the automatic driving target recognition network.
Further, the novel dynamic cascade backbone network in step (2) has two cascaded backbone networks, with a dynamic router inserted between them to automatically select an optimal route for each image to be detected; the image to be detected undergoes first-stage multi-scale feature extraction through the first backbone network, and the multi-scale features are sent to the dynamic router to judge the difficulty level of the image; the features are mapped to a difficulty score through two linear mapping layers; if the image is judged to be a "simple" image, the first-stage multi-scale features are sent to the head part of YOLOv8; if the image is judged to be a "difficult" image, the image to be detected and its first-stage multi-scale features are sent to a second backbone network, the second-stage multi-scale features are extracted, and these are sent to the head part of YOLOv8.
Further, the implementation process of the novel dynamic cascade backbone network in the step (2) is as follows:
for an input image x, the first backbone B1 first extracts its multi-scale features F1:
F1 = B1(x) = {F1^l}, l = 1, ..., L,
wherein L is the number of stages, namely the number of multi-scale features; the router R then uses these multi-scale features F1 to predict a difficulty score φ ∈ (0, 1) for the image:
φ = R(F1);
if the router classifies the input image as a "simple" image, the immediately following neck-head D1 outputs the detection result y:
y = D1(F1);
if the router classifies the input image as a "complex" image, the multi-scale features require further enhancement by the second backbone: the multi-scale features F1 are embedded into H through a composite connection module G:
H = G(F1),
wherein G is the DHLC connection of CBNet; the input image x is fed into the second backbone B2, whose features at each stage are enhanced in turn by element-wise summation with the corresponding embedding in H, recorded as the second-stage multi-scale features F2:
F2 = B2(x, H);
finally, the second neck-head D2 decodes the detection result:
y = D2(F2).
further, in the step (2), the shareseephead detection head shares convolution weights among different layers, and independently calculates statistics of BN; the ShareLepohead comprises a first convolution layer, a first depth separable convolution layer, a second convolution layer and a BN normalization layer which are connected in sequence.
Further, the first convolution layer is a 3×3 convolution layer that changes the number of channels of the input feature map from the input channel count to c2×2; the first depth-separable convolution layer first applies a convolution operation to each input channel separately and then combines the features among the channels; the second depth-separable convolution layer reduces the number of channels of the feature map from c2×2 to c2; the second convolution layer is a 1×1 convolution layer that changes the number of channels from c2 to 4×self.reg_max; each detection head improves gradient propagation and training speed through BN normalization, which normalizes each mini-batch of data.
Further, the improved PolyLoss of step (3) comprises a combined loss function and a weighted binary cross entropy loss; PolyLoss combines binary cross entropy loss and Focal Loss, improving the model's ability to balance difficult samples and positive and negative samples by adjusting the weight and shape of the loss function; the weighted binary cross entropy loss calculates the binary cross entropy between the prediction result and the real label, measuring how well the prediction matches the label; an alpha_factor is introduced to weight the loss, so that the losses of positive and negative samples are adjusted to different degrees in the calculation; and polynomial adjustment factors are incorporated to increase the loss on uncertain sample probability predictions.
Further, a "simple" image is an image containing a single target; a "difficult" image is an image containing two or more targets.
Based on the same inventive concept, the present invention proposes an apparatus device comprising a memory and a processor, wherein:
a memory for storing a computer program capable of running on the processor;
a processor for executing the steps of the novel dynamic cascade YOLOv 8-based automatic driving target recognition method as described above when running the computer program.
Based on the same inventive concept, the present invention proposes a storage medium having stored thereon a computer program which, when executed by at least one processor, implements the novel dynamic cascade YOLOv 8-based automatic driving target recognition method steps as described above.
The beneficial effects are that: compared with the prior art, the automatic driving target recognition network based on the novel dynamic cascade YOLOv8 constructed by the invention lets the YOLOv8 backbone adaptively select an inference route for input images of different difficulty, improving feature-extraction efficiency; to improve detection accuracy, a brand-new improved PolyLoss loss function is used, simplifying the hyper-parameter search space and adjusting the polynomial coefficients; to upgrade the YOLOv8 detection head, save parameters, raise efficiency and improve accuracy, a novel shared detection head is used to enhance model capacity and obtain higher performance; finally, target detection for automatic driving becomes more accurate.
Drawings
FIG. 1 is a schematic diagram of a dynamic cascade backbone network architecture;
FIG. 2 is a schematic diagram of a test head structure sharing convolution weights and separate batch normalization layers.
Detailed Description
The invention is described in further detail below with reference to the accompanying drawings.
The invention provides an automatic driving target identification method based on novel dynamic cascade YOLOv8, which specifically comprises the following steps:
step S1: the invention selects a KITTI data set, wherein the divided data set comprises a test set and a training set. Performance assessment is performed on the autopilot dataset.
Step S2: based on the YOLOv8 network foundation, the Backbone of the backhaul is entirely replaced by a novel Dynamic Cascade (Dynamic Cascade) Backbone.
As shown in fig. 1, the novel Dynamic Cascade backbone network has two cascaded backbone networks, with a dynamic router inserted between them to automatically select an optimal route for each image to be detected.
An adaptive router: to better judge the difficulty level of an image, the router makes its judgment from the input multi-scale feature information. Given the multi-scale features output by the first backbone network, the dynamic router first compresses this information (a global pooling operation followed by channel-dimension concatenation) to reduce its computational complexity, obtaining compressed features. The compressed features are then mapped to a difficulty score by two linear mapping layers.
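A minimal sketch of this router follows. The patent specifies only global pooling, channel-dimension concatenation, and two linear mapping layers; the ReLU between the layers, the hidden width, and the toy feature shapes are assumptions for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)

def global_pool(feature):
    # feature: (C, H, W) -> (C,) channel descriptor via global average pooling
    return feature.mean(axis=(1, 2))

def router_score(features, w1, b1, w2, b2):
    """Map multi-scale features to a difficulty score in (0, 1).

    features : list of (C_l, H_l, W_l) arrays from the first backbone
    w1, b1   : first linear mapping layer
    w2, b2   : second linear mapping layer
    """
    z = np.concatenate([global_pool(f) for f in features])  # channel-dim splice
    h = np.maximum(w1 @ z + b1, 0.0)                        # linear + ReLU (assumed)
    logit = float(w2 @ h + b2)
    return 1.0 / (1.0 + np.exp(-logit))                     # sigmoid -> (0, 1)

# toy multi-scale features from a 3-stage backbone (L = 3)
feats = [rng.standard_normal((c, s, s)) for c, s in [(16, 32), (32, 16), (64, 8)]]
dim = 16 + 32 + 64
w1, b1 = rng.standard_normal((8, dim)) * 0.1, np.zeros(8)
w2, b2 = rng.standard_normal(8) * 0.1, 0.0

phi = router_score(feats, w1, b1, w2, b2)
print(0.0 < phi < 1.0)  # True
```

Pooling before the linear layers keeps the router's cost negligible relative to a backbone pass, which is what makes the early-exit scheme pay off on "simple" images.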
Two cascaded networks: for an input image x, the first backbone B1 first extracts its multi-scale features F1:
F1 = B1(x) = {F1^l}, l = 1, ..., L,
where L is the number of stages, i.e., the number of multi-scale features. The router R then uses these multi-scale features F1 to predict a difficulty score φ ∈ (0, 1) for the image:
φ = R(F1).
A "simple" image exits at the first backbone, while a "complex" image requires further processing. A "simple" image contains a single target pedestrian or a single vehicle; a "complex" image contains two or more targets of multiple classes. Specifically, if the router classifies an input image as "simple", the immediately following neck-head D1 outputs the detection result y:
y = D1(F1).
Conversely, if the router classifies the input image as "complex", the multi-scale features require further enhancement by the second backbone rather than being decoded immediately by D1. Specifically, the multi-scale features F1 are embedded into H by a composite connection module G:
H = G(F1),
where G is the DHLC connection of CBNet. The input image x is then fed into the second backbone B2, whose features are enhanced stage by stage through element-wise summation with the corresponding embeddings in H, recorded as the second-stage features F2:
F2 = B2(x, H).
Finally, the second neck-head D2 decodes the detection result:
y = D2(F2).
through the above procedure, a "simple" image will only handle one backbone, while a "complex" image will handle two backbones. Obviously, with such an architecture, a tradeoff can be made between computation (i.e., speed) and accuracy.
Step S3: based on the YOLOv8 network, the default CIoU loss function is modified into a new PolyLoss classification loss function, and the detection precision is improved.
This loss function combines the ideas of binary cross entropy loss (BCEWithLogitsLoss) and Focal Loss (FL) for object classification in object detection tasks. It comprises the following parts:
Combining the loss functions: PolyLoss combines binary cross entropy loss with Focal Loss to improve the model's ability to balance difficult samples and positive and negative samples by adjusting the weight and shape of the loss function.
Weighted binary cross entropy loss: PolyLoss first calculates the binary cross entropy loss between the predicted outcome and the true label using nn.BCEWithLogitsLoss. This part of the loss measures how well the predictions match the real labels.
Focal Loss adjustment: to handle difficult samples, PolyLoss adopts the idea of Focal Loss. By re-weighting according to the predicted probability, samples with a lower prediction probability play a larger role in the loss calculation, increasing the attention paid to difficult samples.
Loss weight adjustment: by introducing an alpha_factor, PolyLoss weights the losses. This factor is determined by the value of the real label, so that the losses of positive and negative samples are adjusted to different degrees in the calculation.
Polynomial adjustment: in the last step, PolyLoss introduces polynomial adjustment factors to account for the uncertainty of the sample probability predictions. By adjusting the shape and coefficients of the polynomial, the loss is increased when the predicted probability of the true class is low, further strengthening the attention to difficult samples.
The PolyLoss loss function thus combines the ideas of binary cross entropy loss and Focal Loss in the target detection task, and through polynomial adjustment and weight adjustment provides a loss calculation that handles difficult samples and balances positive and negative samples. This helps the model learn and handle challenging target classification tasks.
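The pieces listed above can be combined in a small scalar sketch, written here in the Poly-1 style; the hyper-parameter defaults (alpha=0.25, gamma=2.0, eps1=1.0) are illustrative assumptions, not values taken from the patent:

```python
import math

def poly1_focal_bce(logit, target, alpha=0.25, gamma=2.0, eps1=1.0):
    """Poly-1 style focal loss for a single binary prediction (a sketch).

    Combines BCE-with-logits, the focal factor (1 - pt)^gamma, the
    alpha_factor weighting, and a leading polynomial correction term
    eps1 * (1 - pt)^(gamma + 1)."""
    p = 1.0 / (1.0 + math.exp(-logit))        # sigmoid probability
    pt = p if target == 1 else 1.0 - p        # probability of the true class
    bce = -math.log(max(pt, 1e-12))           # binary cross entropy
    alpha_factor = alpha if target == 1 else 1.0 - alpha
    focal = alpha_factor * (1.0 - pt) ** gamma * bce
    poly = eps1 * alpha_factor * (1.0 - pt) ** (gamma + 1)
    return focal + poly

easy = poly1_focal_bce(logit=4.0, target=1)    # confident, correct
hard = poly1_focal_bce(logit=-4.0, target=1)   # confident, wrong
print(easy < hard)  # True: hard samples dominate the loss
```

The polynomial term grows exactly when pt is small, which is what lets the loss keep pressure on difficult samples after the focal factor has already down-weighted the easy ones.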
Step S4: based on the YOLOv8 network, the detection head in the last part of the YOLOv8 network is modified to be a novel ShareLepohead detection head which is of a cross-scale and shared convolution weight, and an automatic driving target identification network based on novel dynamic cascading YOLOv8 is formed.
The original YOLOv8 detection head is the last layer of the network and is responsible for generating the prediction results of target detection. It maps feature maps to grids of different scales depending on the size of the input image and the design of the network. Each grid cell is responsible for detecting and locating one or more targets. At each scale, the detection head outputs a set of prediction boxes, each consisting of several attributes, typically the bounding-box coordinates (center coordinates, width and height), the target class probabilities, and a confidence score for the presence of a target. These prediction boxes are post-processed by non-maximum suppression (NMS) to filter overlapping boxes and preserve the most accurate detections. The detection head typically employs a combination of convolution and fully connected layers, with different convolution kernel sizes and strides to accommodate targets of different scales. Its output typically passes through appropriate activation functions and normalization operations to ensure that the predictions fall within a proper range and to provide good interpretability and robustness.
As shown in fig. 2, the novel ShareSepHead detection head shares convolution weights across scales: the convolution weights are shared between the different layers, but the BN (BatchNorm) statistics are calculated independently. Real-time target detectors typically use separate detection heads for different feature scales to enhance model capacity for higher performance, rather than sharing one detection head across multiple scales. Here, the detection-head parameters are shared across scales while separate batch normalization (BN) layers are adopted, reducing the head's parameter count while maintaining accuracy. BN is also more efficient than other normalization layers because at inference it directly uses the statistics computed during training.
After passing through the YOLOv8 neck, the image features enter the ShareSepHead detection head to produce the prediction results. Each head comprises a first convolution layer, two depth-separable convolution layers, a second convolution layer and a BN normalization layer connected in sequence.
The first part, a Conv 3×3 convolution layer, changes the number of channels of the input feature map from the input channel count to c2×2. It helps extract features and increases the channel count to better capture target information.
The second part, a DWConv 3×3 depth-separable convolution layer, first applies a convolution operation to each input channel separately and then combines the features across channels. This helps reduce computation and improve model efficiency.
The third part, a DWConv 3×3 depth-separable convolution layer, reduces the number of channels of the feature map from c2×2 to c2. Similar to the previous step, this layer continues to reduce the channel count and extract higher-level features.
The fourth part, a Conv 1×1 convolution layer, changes the number of channels from c2 to 4×self.reg_max and is responsible for predicting the coordinate information of the bounding box.
The parameter information of the detection head is shared among the heads at each scale.
The gradient propagation and training speed of each detection head are improved through BN normalization: by normalizing each mini-batch of data, BN keeps the activation values in the network within a relatively small range, alleviating the problems of gradient vanishing and gradient explosion, promoting gradient propagation, and accelerating the training of the network.
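The key property of the head, one set of convolution weights serving every scale while each scale keeps its own BN statistics, can be sketched as follows. The class name, the 1×1 convolution stand-in for the full layer stack, and the 0.9/0.1 running-stat momentum are assumptions for illustration:

```python
import numpy as np

class SharedHead:
    """Sketch: one set of conv weights shared across scales,
    with separate BN statistics per scale (names are illustrative)."""

    def __init__(self, c_in, c_out, n_scales, rng):
        self.w = rng.standard_normal((c_out, c_in)) * 0.1   # shared 1x1 conv
        # independent BN running statistics per feature scale
        self.bn_mean = [np.zeros(c_out) for _ in range(n_scales)]
        self.bn_var = [np.ones(c_out) for _ in range(n_scales)]

    def forward(self, feature, scale_idx, eps=1e-5):
        # feature: (C_in, H, W); shared conv, scale-specific normalization
        c, h, w = feature.shape
        y = (self.w @ feature.reshape(c, -1)).reshape(-1, h, w)
        m = y.mean(axis=(1, 2))
        v = y.var(axis=(1, 2))
        # update only this scale's running statistics (momentum assumed 0.9)
        self.bn_mean[scale_idx] = 0.9 * self.bn_mean[scale_idx] + 0.1 * m
        self.bn_var[scale_idx] = 0.9 * self.bn_var[scale_idx] + 0.1 * v
        return (y - m[:, None, None]) / np.sqrt(v[:, None, None] + eps)

rng = np.random.default_rng(0)
head = SharedHead(c_in=4, c_out=8, n_scales=3, rng=rng)
y0 = head.forward(rng.standard_normal((4, 8, 8)), scale_idx=0)
y1 = head.forward(rng.standard_normal((4, 16, 16)), scale_idx=1)
# the same self.w served both scales; BN stats stayed per-scale
print(head.bn_mean[2])  # the untouched scale keeps its initial zeros
```

Sharing `self.w` is where the parameter savings come from; keeping `bn_mean`/`bn_var` per scale is what preserves accuracy, since feature statistics differ across scales.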
Step S5: and (3) training the automatic driving target recognition network based on the novel dynamic cascade YOLOv8 constructed in the step (S4) by using the divided data set. And evaluating the performance of the trained automatic driving target recognition network based on the novel dynamic cascade YOLOv8, and finally realizing target recognition in automatic driving.
Based on the same inventive concept, the present invention proposes an apparatus device comprising a memory and a processor, wherein: a memory for storing a computer program capable of running on the processor; a processor for executing the steps of the novel dynamic cascade YOLOv 8-based automatic driving target recognition method as described above when running the computer program.
Based on the same inventive concept, the invention also proposes a storage medium having stored thereon a computer program which, when executed by at least one processor, implements the novel dynamic cascade YOLOv 8-based automatic driving target recognition method steps as described above.
Thus far, the technical solution of the present invention has been described with reference to the specific embodiments shown in the drawings, but the scope of protection of the present invention is not limited to these specific embodiments. Those skilled in the art may make equivalent modifications and substitutions to the related technical features without departing from the principles of the present invention, and such modifications and substitutions fall within the scope of protection of the present invention.
Claims (9)
1. An automatic driving target recognition method based on novel dynamic cascade YOLOv8 is characterized by comprising the following steps:
(1) Preprocessing a pre-acquired original image of a traffic vehicle, and dividing the pre-acquired original image into a training set and a testing set;
(2) Constructing an automatic driving target recognition network based on the novel dynamic cascade YOLOv8; the network replaces the Backbone of the YOLOv8 network in its entirety with a novel dynamic cascade backbone network, and replaces the detection head in the last part of the YOLOv8 network with a new ShareSepHead detection head with cross-scale shared convolution weights;
(3) Adopting improved PolyLoss as a loss function of an automatic driving target recognition network;
(4) Training the automatic driving target recognition network by utilizing the training set;
(5) Inputting the test set into a trained automatic driving target recognition network, and evaluating the automatic driving target recognition network.
2. The novel dynamic cascade YOLOv8-based automatic driving target recognition method according to claim 1, wherein the novel dynamic cascade backbone network in step (2) has two cascaded backbone networks, with a dynamic router inserted between them to automatically select an optimal route for each image to be detected; the image to be detected undergoes first-stage multi-scale feature extraction through the first backbone network, and the multi-scale features are sent to the dynamic router to judge the difficulty level of the image; the features are mapped to a difficulty score through two linear mapping layers; if the image is judged to be a "simple" image, the first-stage multi-scale features are sent to the head part of YOLOv8; if the image is judged to be a "difficult" image, the image to be detected and its first-stage multi-scale features are sent to a second backbone network, the second-stage multi-scale features are extracted, and these are sent to the head part of YOLOv8.
3. The method for automatically identifying a driving target based on novel dynamic cascade YOLOv8 of claim 1, wherein the implementation process of the novel dynamic cascade backbone network in step (2) is as follows:
for the input image x, the first backbone B1 first extracts its multi-scale features F1:

F1 = B1(x) = {F1^l | l = 1, ..., L}

wherein L is the number of stages, i.e., the number of multi-scale features; the router R then uses these multi-scale features F1 to predict a difficulty score φ ∈ (0, 1) for the image:

φ = R(F1)

if the router classifies the input image as a "simple" image, the immediately following neck-head D1 outputs the detection result y as:

y = D1(F1)

if the router classifies the input image as a "difficult" image, the multi-scale features require further enhancement by the second backbone; the multi-scale features F1 are first embedded into H through a composite connection module G:

H = G(F1)

wherein G is the DHLC connection implemented by CBNet; the input image x is fed into the second backbone B2, whose features at each stage are enhanced in turn by element-wise summation with the corresponding embedding in H, and the resulting second-stage multi-scale features are denoted as:

F2 = B2(x, H)

the second neck-head D2 decodes the detection result as:

y = D2(F2).
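The router R of claim 3 maps pooled multi-scale features to a difficulty score φ ∈ (0, 1) through two linear mapping layers. A minimal sketch, assuming a ReLU between the two layers and a sigmoid at the output (the activation choices and all function names here are illustrative assumptions, not specified by the patent):

```python
import math

def linear(x, w, b):
    """Plain linear layer: y = W x + b, with W given row-wise."""
    return [sum(wi * xi for wi, xi in zip(row, x)) + bi
            for row, bi in zip(w, b)]

def router_score(pooled, w1, b1, w2, b2):
    """Two linear mapping layers followed by a sigmoid, producing
    the difficulty score in (0, 1) used to pick the route."""
    hidden = [max(0.0, v) for v in linear(pooled, w1, b1)]  # ReLU
    logit = linear(hidden, w2, b2)[0]                       # scalar logit
    return 1.0 / (1.0 + math.exp(-logit))                   # sigmoid
```

Comparing this score against a threshold then decides between the D1 (early exit) and B2/D2 (refinement) branches.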
4. The automatic driving target recognition method based on the novel dynamic cascade YOLOv8 according to claim 1, wherein the ShareSepHead detection head in step (2) shares convolution weights among different layers while calculating BN statistics independently; the ShareSepHead comprises a first convolution layer, a first depth separable convolution layer, a second depth separable convolution layer, a second convolution layer and a BN normalization layer which are connected in sequence.
5. The automatic driving target identification method based on the novel dynamic cascade YOLOv8 according to claim 4, wherein the first convolution layer is a 3×3 convolution layer that changes the number of channels of the input feature map to c2 × 2; the first depth separable convolution layer first applies a convolution operation to each input channel separately, and then combines the features across channels; the second depth separable convolution layer reduces the number of channels of the input feature map from c2 × 2 to c2; the second convolution layer is a 1×1 convolution layer that changes the number of channels of the input feature map from c2 to 4 × self.reg_max; each detection head improves gradient propagation and training speed through BN normalization, which normalizes each mini-batch of data.
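The key idea of claims 4–5 — one set of convolution weights shared across pyramid levels, but BatchNorm statistics kept per level — can be illustrated with a toy one-dimensional stand-in. The class name, the scalar "convolution", and the stats layout below are all hypothetical simplifications, not the patent's code:

```python
class SharedHeadSketch:
    """Toy head: a single shared weight plays the role of the shared
    convolution kernels, while each pyramid level keeps its own
    BatchNorm statistics (mean/variance)."""

    def __init__(self, num_levels, eps=1e-5):
        self.shared_weight = 0.5                        # one weight, all levels
        self.bn_stats = [{"mean": 0.0, "var": 1.0}      # independent BN stats
                         for _ in range(num_levels)]
        self.eps = eps

    def forward(self, x, level):
        y = [self.shared_weight * v for v in x]         # shared "convolution"
        mean = sum(y) / len(y)
        var = sum((v - mean) ** 2 for v in y) / len(y)
        self.bn_stats[level] = {"mean": mean, "var": var}  # per-level stats
        return [(v - mean) / (var + self.eps) ** 0.5 for v in y]
```

Sharing the weights cuts the head's parameter count across levels, while per-level BN compensates for the different feature statistics at each scale.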
6. The automatic driving target recognition method based on the novel dynamic cascade YOLOv8 according to claim 1, wherein the modified PolyLoss in step (3) comprises a combined loss function and a weighted binary cross-entropy loss; PolyLoss combines the binary cross-entropy loss and the Focal Loss, and improves the handling of hard samples and the balance between positive and negative samples by adjusting the weight and shape of the loss function; the weighted binary cross-entropy loss calculates the binary cross-entropy between the prediction result and the ground-truth label, measuring the degree of match between them; an alpha_factor is introduced to weight the loss, so that the losses of positive and negative samples are adjusted to different degrees in the calculation; a polynomial adjustment factor is incorporated to modulate the loss according to the uncertainty of the sample probability predictions.
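A minimal per-sample sketch of a Poly-1-style focal BCE loss in the spirit of claim 6: alpha-weighted focal binary cross-entropy plus a leading polynomial correction term epsilon · (1 − pt)^(γ+1). The hyperparameter values and the function name are illustrative assumptions; the patent does not disclose its exact coefficients.

```python
import math

def poly1_focal_bce(p, target, alpha=0.25, gamma=2.0, epsilon=1.0):
    """Poly-1 style focal loss for one prediction p against a
    binary target (assumed form; values are illustrative)."""
    p = min(max(p, 1e-7), 1.0 - 1e-7)                  # numerical safety
    bce = -(target * math.log(p) + (1 - target) * math.log(1 - p))
    pt = p if target == 1 else 1.0 - p                 # prob. of true class
    alpha_factor = alpha if target == 1 else 1.0 - alpha  # pos/neg weighting
    focal = alpha_factor * (1.0 - pt) ** gamma * bce
    poly1 = epsilon * alpha_factor * (1.0 - pt) ** (gamma + 1)
    return focal + poly1
```

The (1 − pt) powers down-weight easy, confident samples, while the added polynomial term sharpens the penalty on uncertain predictions relative to plain Focal Loss.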
7. The automatic driving target identification method based on the novel dynamic cascade YOLOv8 according to claim 2, wherein a "simple" image is an image containing a single target, and a "difficult" image is an image containing two or more targets.
8. An apparatus comprising a memory and a processor, wherein:
the memory is configured to store a computer program capable of running on the processor;
the processor is configured to execute, when running the computer program, the steps of the automatic driving target recognition method based on the novel dynamic cascade YOLOv8 according to any one of claims 1-7.
9. A storage medium having stored thereon a computer program which, when executed by at least one processor, implements the steps of the automatic driving target recognition method based on the novel dynamic cascade YOLOv8 according to any one of claims 1-7.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202310899627.4A CN116912796A (en) | 2023-07-21 | 2023-07-21 | Novel dynamic cascade YOLOv 8-based automatic driving target identification method and device |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202310899627.4A CN116912796A (en) | 2023-07-21 | 2023-07-21 | Novel dynamic cascade YOLOv 8-based automatic driving target identification method and device |
Publications (1)
Publication Number | Publication Date |
---|---|
CN116912796A true CN116912796A (en) | 2023-10-20 |
Family
ID=88350767
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202310899627.4A Pending CN116912796A (en) | 2023-07-21 | 2023-07-21 | Novel dynamic cascade YOLOv 8-based automatic driving target identification method and device |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN116912796A (en) |
Cited By (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN117252904A (en) * | 2023-11-15 | 2023-12-19 | 南昌工程学院 | Target tracking method and system based on long-range space perception and channel enhancement |
CN117252904B (en) * | 2023-11-15 | 2024-02-09 | 南昌工程学院 | Target tracking method and system based on long-range space perception and channel enhancement |
CN117496362A (en) * | 2024-01-02 | 2024-02-02 | 环天智慧科技股份有限公司 | Land coverage change detection method based on self-adaptive convolution kernel and cascade detection head |
CN117496362B (en) * | 2024-01-02 | 2024-03-29 | 环天智慧科技股份有限公司 | Land coverage change detection method based on self-adaptive convolution kernel and cascade detection head |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN111401201A (en) | Aerial image multi-scale target detection method based on spatial pyramid attention drive | |
CN111126472A (en) | Improved target detection method based on SSD | |
CN111259930A (en) | General target detection method of self-adaptive attention guidance mechanism | |
CN116912796A (en) | Novel dynamic cascade YOLOv 8-based automatic driving target identification method and device | |
CN111428733B (en) | Zero sample target detection method and system based on semantic feature space conversion | |
CN111507370A (en) | Method and device for obtaining sample image of inspection label in automatic labeling image | |
CN112150493A (en) | Semantic guidance-based screen area detection method in natural scene | |
CN112949572A (en) | Slim-YOLOv 3-based mask wearing condition detection method | |
CN110942471A (en) | Long-term target tracking method based on space-time constraint | |
CN110008899B (en) | Method for extracting and classifying candidate targets of visible light remote sensing image | |
CN111753682A (en) | Hoisting area dynamic monitoring method based on target detection algorithm | |
CN110084284A (en) | Target detection and secondary classification algorithm and device based on region convolutional neural networks | |
CN114565842A (en) | Unmanned aerial vehicle real-time target detection method and system based on Nvidia Jetson embedded hardware | |
CN110197213B (en) | Image matching method, device and equipment based on neural network | |
Fan et al. | A novel sonar target detection and classification algorithm | |
US20220147748A1 (en) | Efficient object detection using deep learning techniques | |
CN114139564A (en) | Two-dimensional code detection method and device, terminal equipment and training method for detection network | |
CN116342894B (en) | GIS infrared feature recognition system and method based on improved YOLOv5 | |
CN113095072A (en) | Text processing method and device | |
CN115861595A (en) | Multi-scale domain self-adaptive heterogeneous image matching method based on deep learning | |
CN115984671A (en) | Model online updating method and device, electronic equipment and readable storage medium | |
CN114927236A (en) | Detection method and system for multiple target images | |
CN114998611A (en) | Target contour detection method based on structure fusion | |
CN115063831A (en) | High-performance pedestrian retrieval and re-identification method and device | |
CN111160219B (en) | Object integrity evaluation method and device, electronic equipment and storage medium |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||