CN116912796A - Novel dynamic cascade YOLOv8-based automatic driving target identification method and device - Google Patents

Novel dynamic cascade YOLOv8-based automatic driving target identification method and device

Info

Publication number
CN116912796A
Authority
CN
China
Prior art keywords
automatic driving
image
driving target
network
yolov8
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202310899627.4A
Other languages
Chinese (zh)
Inventor
洪远
姜明新
杜强
黄俊闻
项靖
王杰
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Huaiyin Institute of Technology
Original Assignee
Huaiyin Institute of Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Huaiyin Institute of Technology filed Critical Huaiyin Institute of Technology
Priority to CN202310899627.4A
Publication of CN116912796A
Pending legal-status Critical Current

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00Scenes; Scene-specific elements
    • G06V20/50Context or environment of the image
    • G06V20/56Context or environment of the image exterior to a vehicle by using sensors mounted on the vehicle
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/0464Convolutional networks [CNN, ConvNet]
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • G06N3/082Learning methods modifying the architecture, e.g. adding, deleting or silencing nodes or connections
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/40Extraction of image or video features
    • G06V10/44Local feature extraction by analysis of parts of the pattern, e.g. by detecting edges, contours, loops, corners, strokes or intersections; Connectivity analysis, e.g. of connected components
    • G06V10/443Local feature extraction by analysis of parts of the pattern, e.g. by detecting edges, contours, loops, corners, strokes or intersections; Connectivity analysis, e.g. of connected components by matching or filtering
    • G06V10/449Biologically inspired filters, e.g. difference of Gaussians [DoG] or Gabor filters
    • G06V10/451Biologically inspired filters, e.g. difference of Gaussians [DoG] or Gabor filters with interaction between the filter responses, e.g. cortical complex cells
    • G06V10/454Integrating the filters into a hierarchical structure, e.g. convolutional neural networks [CNN]
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/40Extraction of image or video features
    • G06V10/52Scale-space analysis, e.g. wavelet analysis
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/70Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/82Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V2201/00Indexing scheme relating to image or video recognition or understanding
    • G06V2201/07Target detection
    • YGENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02TCLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO TRANSPORTATION
    • Y02T10/00Road transport of goods or passengers
    • Y02T10/10Internal combustion engine [ICE] based vehicles
    • Y02T10/40Engine management systems

Abstract

The invention discloses an automatic driving target recognition method and device based on a novel dynamic cascade YOLOv8. Pre-acquired original images of traffic vehicles are preprocessed and divided into a training set and a test set; an automatic driving target recognition network based on the novel dynamic cascade YOLOv8 is constructed, in which the Backbone network of the YOLOv8 network is replaced as a whole with a novel dynamic cascade backbone network, and the detection head in the last part of the YOLOv8 network is replaced with a novel ShareSepHead detection head with cross-scale shared convolution weights; an improved PolyLoss is adopted as the loss function of the automatic driving target recognition network; the network model is trained with the training set; and the test set is input into the trained network to evaluate it. The invention can improve the accuracy and speed of target recognition in automatic driving and provides a guarantee for automatic driving safety.

Description

Novel dynamic cascade YOLOv8-based automatic driving target identification method and device
Technical Field
The invention belongs to the application of deep learning in the field of computer vision, and particularly relates to an automatic driving target recognition method and device based on a novel dynamic cascade YOLOv8.
Background
As one of the core problems of computer vision, target detection, which aims to find the category and position of a specific target in an image, is widely used in various fields such as automatic driving, remote sensing images, video monitoring, medical detection, and the like.
The YOLO family has undergone continuous version updates since 2016 and has now reached v8. In 2016, YOLOv1 introduced the single-stage (One-Stage) target detection paradigm. From the proposal of YOLOv1 in 2016 through to 2023, the YOLO series has developed alongside single-stage target detection and has remained a typical representative of One-Stage methods.
Although YOLOv8 can perform object detection quickly on simple images, it requires more time when facing complex real-world scenes such as traffic jams with large numbers of vehicles and pedestrians. Real-time performance is critical for autonomous driving decisions, so processing speed still needs improvement. Accuracy likewise needs improvement: automatic driving requires highly accurate detection results to respond correctly to various traffic conditions. Although YOLOv8 performs well in many situations, its detection accuracy still falls short in complex traffic scenes. The Backbone of the existing YOLOv8 is fast on simple images but requires more time on complex images with many targets, and the existing YOLOv8 detection head contains many parameters, leading to high computational complexity. In autonomous driving systems, computational resources are limited, so a more efficient model design is needed to ensure target detection in embedded or resource-constrained environments.
Disclosure of Invention
The invention aims to: the invention provides an automatic driving target recognition method and device based on a novel dynamic cascade YOLOv8 that can accurately detect targets in automatic driving.
The technical scheme is as follows: the invention provides an automatic driving target identification method based on novel dynamic cascade YOLOv8, which specifically comprises the following steps:
(1) Preprocessing pre-acquired original images of traffic vehicles, and dividing them into a training set and a test set;
(2) Constructing an automatic driving target recognition network based on novel dynamic cascading YOLOv8; in the automatic driving target recognition network, the Backbone network of the YOLOv8 network is replaced as a whole with a novel dynamic cascade backbone network, and the detection head in the last part of the YOLOv8 network is replaced with a novel ShareSepHead detection head with cross-scale shared convolution weights;
(3) Adopting improved PolyLoss as the loss function of the automatic driving target recognition network;
(4) Training the automatic driving target recognition network model by utilizing the training set;
(5) Inputting the test set into a trained automatic driving target recognition network, and evaluating the automatic driving target recognition network.
Further, the novel dynamic cascade backbone network in step (2) has two cascaded backbone networks, with a dynamic router inserted between them to automatically select an optimal route for each image to be detected; the image to be detected undergoes first-stage multi-scale feature extraction through the first backbone network, and the multi-scale features are sent to the dynamic router to judge the difficulty level of the image; the features are mapped to a difficulty score through two linear mapping layers; if the image is judged to be a "simple" image, the first-stage multi-scale features are sent to the head part of YOLOv8; if the image is judged to be a "difficult" image, the image to be detected and its first-stage multi-scale features are sent to the second backbone network, second-stage multi-scale features are extracted, and the second-stage multi-scale features are sent to the head part of YOLOv8.
Further, the implementation process of the novel dynamic cascade backbone network in the step (2) is as follows:
for the input image x, the first backbone B1 first extracts its multi-scale features F1:
F1 = B1(x) = {F1^1, F1^2, ..., F1^L}
wherein L is the number of stages, namely the number of multi-scale features; the router R then uses these multi-scale features F1 to predict the difficulty score φ ∈ (0, 1) of the image as:
φ = R(F1)
if the router classifies the input image as a "simple" image, the immediately following neck-head D1 outputs the detection result y as:
y = D1(F1)
if the router classifies the input image as a "complex" image, the multi-scale features require further enhancement by the second backbone; the multi-scale features F1 are embedded into H through a composite connection module G:
H = G(F1)
wherein G is the DHLC implementation of CBNet; the input image x is fed into the second backbone B2, whose features at each stage are enhanced in turn by element-wise summation with the corresponding embedding in H, denoted as:
F2 = B2(x; H)
the second neck-head D2 then decodes the detection result as:
y = D2(F2).
further, in the step (2), the shareseephead detection head shares convolution weights among different layers, and independently calculates statistics of BN; the ShareLepohead comprises a first convolution layer, a first depth separable convolution layer, a second convolution layer and a BN normalization layer which are connected in sequence.
Further, the first convolution layer is a 3x3 convolution layer, and the channel number of the input feature map is changed from x to c2 x 2; the first depth separable convolution layer firstly applies convolution operation to each input channel respectively, and then combines the characteristics among the channels; the second depth separable convolution layer reduces the number of channels of the input feature map from c2 to c2; the second convolution layer is a 1*1 convolution layer, changing the channel number of the input feature map from c2 to 4 x self reg_max; each detection head improves gradient propagation and training speed through BN normalization, which is performed by normalizing each small batch of data.
Further, the improved PolyLoss of step (3) comprises a combined loss function and a weighted binary cross entropy loss; PolyLoss combines the binary cross entropy loss and the Focal Loss, and improves the capacity to balance difficult samples and positive and negative samples by adjusting the weight and shape of the loss function; the weighted binary cross entropy loss calculates the binary cross entropy between the prediction result and the real label, measuring the degree to which the prediction matches the real label; an alpha_factor is introduced to weight the loss, so that the losses of positive and negative samples are adjusted to different degrees in the calculation; and polynomial adjustment factors are incorporated to account for the uncertainty of the sample probability predictions.
Further, a "simple" image is an image containing a single target; a "difficult" image is an image containing two or more targets.
Based on the same inventive concept, the present invention proposes a device comprising a memory and a processor, wherein:
a memory for storing a computer program capable of running on the processor;
a processor for executing the steps of the novel dynamic cascade YOLOv8-based automatic driving target recognition method as described above when running the computer program.
Based on the same inventive concept, the present invention proposes a storage medium having stored thereon a computer program which, when executed by at least one processor, implements the steps of the novel dynamic cascade YOLOv8-based automatic driving target recognition method as described above.
The beneficial effects are that: compared with the prior art, the invention has the following beneficial effects: the automatic driving target recognition network based on the novel dynamic cascade YOLOv8 constructed by the invention enables the YOLOv8 backbone network to adaptively select inference routes for input images of different difficulties, improving feature extraction efficiency; to improve the detection accuracy of YOLOv8, a brand-new improved PolyLoss loss function is used, which simplifies the hyper-parameter search space and adjusts the polynomial coefficients; to upgrade the YOLOv8 detection head, a novel shared detection head is used, which saves parameters, is more efficient, improves accuracy, and enhances model capacity for higher performance; finally, target detection for automatic driving becomes more accurate.
Drawings
FIG. 1 is a schematic diagram of a dynamic cascade backbone network architecture;
FIG. 2 is a schematic diagram of the detection head structure with shared convolution weights and separate batch normalization layers.
Detailed Description
The invention is described in further detail below with reference to the accompanying drawings.
The invention provides an automatic driving target identification method based on novel dynamic cascade YOLOv8, which specifically comprises the following steps:
step S1: the invention selects a KITTI data set, wherein the divided data set comprises a test set and a training set. Performance assessment is performed on the autopilot dataset.
Step S2: based on the YOLOv8 network foundation, the Backbone of the backhaul is entirely replaced by a novel Dynamic Cascade (Dynamic Cascade) Backbone.
As shown in fig. 1, the novel Dynamic Cascade backbone network has two cascaded backbone networks, with a dynamic router inserted between them to automatically select an optimal route for each image to be detected.
An adaptive router: to better judge the difficulty level of an image, the router makes its difficulty judgment based on the input multi-scale feature information. Given the multi-scale features output by the first backbone network, information compression is first performed to reduce the computational complexity of the dynamic router: the compressed features are obtained through a global pooling operation followed by a channel-dimension concatenation operation. The features are then mapped to a difficulty score by two linear mapping layers, as sketched below.
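For illustration, the following is a minimal PyTorch sketch of such a router; the class name DynamicRouter, the hidden width, and the sigmoid output are assumptions for exposition, not details fixed by the patent text:

```python
import torch
import torch.nn as nn

class DynamicRouter(nn.Module):
    """Difficulty router: compress each multi-scale feature map by global
    average pooling, concatenate along the channel dimension, and map the
    result to a difficulty score in (0, 1) with two linear layers."""

    def __init__(self, channels_per_stage, hidden_dim=256):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(sum(channels_per_stage), hidden_dim),
            nn.ReLU(inplace=True),
            nn.Linear(hidden_dim, 1),
        )

    def forward(self, multi_scale_feats):
        # Global average pooling: (B, C, H, W) -> (B, C) per stage.
        pooled = [f.mean(dim=(2, 3)) for f in multi_scale_feats]
        x = torch.cat(pooled, dim=1)       # channel-dimension concatenation
        return torch.sigmoid(self.mlp(x))  # difficulty score phi in (0, 1)
```

For a P3/P4/P5-style feature pyramid, `DynamicRouter([256, 512, 1024])` would be a typical instantiation.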
Two cascaded networks: for the input image x, the first backbone B1 first extracts its multi-scale features F1:
F1 = B1(x) = {F1^1, F1^2, ..., F1^L}
wherein L is the number of stages, i.e., the number of multi-scale features. The router R then uses these multi-scale features F1 to predict the difficulty score φ ∈ (0, 1) of the image as:
φ = R(F1)
A "simple" image exits at the first backbone, while a "complex" image requires further processing. A "simple" image is an image of a single target pedestrian or a single vehicle; a "complex" image contains two or more classes of targets. Specifically, if the router classifies the input image as a "simple" image, the immediately following neck-head D1 outputs the detection result y as:
y = D1(F1)
Conversely, if the router classifies the input image as a "complex" image, the multi-scale features require further enhancement by the second backbone rather than being immediately decoded by the neck-head D1. In particular, the multi-scale features F1 are embedded into H by a composite connection module G:
H = G(F1)
where G is the DHLC implementation of CBNet. The input image x is then fed into the second backbone B2, whose features at each stage are enhanced in turn by element-wise summation with the corresponding embedding in H, denoted as:
F2 = B2(x; H)
The second neck-head D2 then decodes the detection result as:
y = D2(F2).
through the above procedure, a "simple" image will only handle one backbone, while a "complex" image will handle two backbones. Obviously, with such an architecture, a tradeoff can be made between computation (i.e., speed) and accuracy.
Step S3: based on the YOLOv8 network, the default CIoU loss function is modified into a new PolyLoss classification loss function, and the detection precision is improved.
The loss function combines the ideas of the binary cross entropy loss (BCEWithLogitsLoss) and the Focal Loss (FL) for object classification in object detection tasks. It comprises the following parts:
combining the loss functions: polyLoss combines binary cross entropy Loss with Focal Loss to improve the model's ability to handle balance between difficult and positive and negative samples by adjusting the weight and shape of the Loss function.
Weighted binary cross entropy loss: PolyLoss first calculates the binary cross entropy loss between the predicted result and the true label using nn.BCEWithLogitsLoss. This partial loss measures how well the predictions match the real labels.
Focal Loss adjustment: to handle difficult samples, PolyLoss introduces the idea of Focal Loss. By reweighting according to the prediction probability value, samples with lower prediction probability play a larger role in the loss calculation, improving attention to difficult samples.
Loss weight adjustment: by introducing an alpha_factor, PolyLoss weights the losses. This factor is determined according to the value of the real label, so that the losses of positive and negative samples are adjusted to different degrees in the calculation.
Polynomial adjustment: as the last step, PolyLoss introduces polynomial adjustment factors to account for the uncertainty of the sample probability predictions. By adjusting the shape and coefficients of the polynomials, the loss can be increased when the sample probability is low or high, thereby further enhancing the attention to difficult samples.
The PolyLoss loss function thus combines the ideas of binary cross entropy loss and Focal Loss in the target detection task, and provides a loss calculation method that can handle difficult samples and balance positive and negative samples through polynomial adjustment and weight adjustment. This helps the model better learn and handle challenging target classification tasks. A sketch of such a loss follows.
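The following PyTorch sketch shows one plausible PolyLoss-style classification loss in the Poly-1 form, combining BCEWithLogitsLoss, the Focal Loss adjustment, the alpha_factor weighting, and a leading polynomial term; the default values of alpha, gamma and eps1 are illustrative assumptions, not values taken from the patent:

```python
import torch
import torch.nn as nn

class PolyFocalLoss(nn.Module):
    """Poly-1 style classification loss: focal loss plus a leading
    polynomial term eps1 * (1 - pt)^(gamma + 1)."""

    def __init__(self, alpha=0.25, gamma=2.0, eps1=1.0):
        super().__init__()
        self.alpha, self.gamma, self.eps1 = alpha, gamma, eps1
        self.bce = nn.BCEWithLogitsLoss(reduction="none")

    def forward(self, logits, targets):
        bce = self.bce(logits, targets)              # per-element BCE
        p = torch.sigmoid(logits)
        pt = targets * p + (1 - targets) * (1 - p)   # prob. of the true class
        # alpha_factor weights positive and negative samples differently.
        alpha_factor = targets * self.alpha + (1 - targets) * (1 - self.alpha)
        focal = alpha_factor * (1 - pt) ** self.gamma * bce
        # Polynomial adjustment: extra loss when pt is far from 1.
        poly = self.eps1 * alpha_factor * (1 - pt) ** (self.gamma + 1)
        return (focal + poly).mean()
```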
Step S4: based on the YOLOv8 network, the detection head in the last part of the YOLOv8 network is modified to be a novel ShareLepohead detection head which is of a cross-scale and shared convolution weight, and an automatic driving target identification network based on novel dynamic cascading YOLOv8 is formed.
The original YOLOv8 detection head is the last layer of the network and is responsible for generating the prediction results of target detection. It maps feature maps to grids of different scales depending on the size of the input image and the design of the network. Each grid cell is responsible for detecting and locating one or more targets. At each scale, the detection head outputs a set of prediction boxes, each consisting of several attributes, typically including the coordinates of the bounding box (center coordinates and width/height), the probability of the target class, and the confidence score for the presence of a target. These prediction boxes are post-processed by non-maximum suppression (NMS) to filter overlapping boxes and preserve the most accurate detection results. The detection head typically employs a combination of convolutional and fully connected layers, with different convolution kernel sizes and strides to accommodate detection of targets at different scales. The output of the detection head typically employs appropriate activation functions and normalization operations to ensure that the predicted results fall within a proper range and to provide good interpretability and robustness. The NMS post-processing step is illustrated below.
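Purely to illustrate this post-processing step, NMS can be applied with torchvision; the box coordinates, scores, and the 0.45 IoU threshold below are made-up example values, not parameters of the invention:

```python
import torch
from torchvision.ops import nms

# boxes in (x1, y1, x2, y2) format, one confidence score per box.
boxes = torch.tensor([[10., 10., 60., 60.],
                      [12., 12., 62., 62.],     # heavily overlaps the first box
                      [100., 100., 150., 150.]])
scores = torch.tensor([0.9, 0.8, 0.7])
keep = nms(boxes, scores, iou_threshold=0.45)   # indices of retained boxes
print(keep)  # tensor([0, 2]): the overlapping lower-score box is suppressed
```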
As shown in fig. 2, the novel ShareSepHead detection head shares convolution weights across scales: convolution weights are shared between the different layers, but the statistics of BN (BatchNorm) are calculated independently. This is a shared detection head; real-time target detectors typically use separate detection heads for different feature scales to enhance model capacity for higher performance, rather than sharing one detection head across multiple scales. Here, cross-scale shared detection-head parameters are chosen, but different batch normalization (BN) layers are adopted, which reduces the detection-head parameters while maintaining accuracy. BN is also more efficient than other normalization layers because the statistics computed during training are used directly in inference.
After passing through the YOLOv8 neck, the feature maps enter the ShareSepHead detection head to produce prediction results. Each head comprises a first convolution layer, a first depth-separable convolution layer, a second depth-separable convolution layer, a second convolution layer and a BN normalization layer which are connected in sequence.
The first part, a Conv 3×3 convolution layer, changes the number of channels of the input feature map to c2×2. It helps extract features and increases the number of channels to better capture information about the target.
The second part, a DWConv 3×3 depth-separable convolution layer, first applies a convolution operation to each input channel separately and then combines the features between the channels. This helps reduce the amount of computation and improve the efficiency of the model.
The third part, another DWConv 3×3 depth-separable convolution layer, reduces the number of channels of the input feature map from c2×2 to c2. Similar to the previous step, this layer continues to reduce the number of channels and extracts higher-level features.
The fourth part, a Conv 1×1 convolution layer, changes the number of channels of the input feature map from c2 to 4×self.reg_max. It is responsible for predicting the coordinate information of the bounding box.
The parameter information of the detection head is shared among all heads.
The gradient propagation and training speed of each detection head are improved through BN normalization. By normalizing the data of each mini-batch, BN keeps the activation values in the network within a relatively small range, which alleviates the problems of gradient vanishing and gradient explosion, promotes gradient propagation, and accelerates the training process of the network. A sketch of such a shared head follows.
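A minimal PyTorch sketch of such a head branch follows. It shows only the box-regression branch, assumes all input scales share the same channel count c1, and places the scale-specific BN after the first shared convolution; these layout details are assumptions for exposition, not the patent's reference implementation:

```python
import torch
import torch.nn as nn

def dw_separable(c_in, c_out):
    """Depth-separable 3x3 conv: per-channel 3x3 followed by 1x1 mixing."""
    return nn.Sequential(
        nn.Conv2d(c_in, c_in, 3, padding=1, groups=c_in, bias=False),
        nn.Conv2d(c_in, c_out, 1, bias=False),
    )

class ShareSepHead(nn.Module):
    """Regression branch whose convolution weights are shared across all
    feature scales, while each scale keeps its own BatchNorm statistics."""

    def __init__(self, c1, c2, reg_max=16, num_scales=3):
        super().__init__()
        self.conv1 = nn.Conv2d(c1, c2 * 2, 3, padding=1, bias=False)  # -> c2*2 channels
        self.dw1 = dw_separable(c2 * 2, c2 * 2)
        self.dw2 = dw_separable(c2 * 2, c2)                           # c2*2 -> c2
        self.conv2 = nn.Conv2d(c2, 4 * reg_max, 1)                    # c2 -> 4*reg_max
        # Separate statistics: one BatchNorm per feature scale.
        self.bns = nn.ModuleList(nn.BatchNorm2d(c2 * 2) for _ in range(num_scales))

    def forward(self, feats):
        outs = []
        for i, f in enumerate(feats):        # one feature map per scale
            x = self.bns[i](self.conv1(f))   # shared conv, scale-specific BN
            x = self.dw2(self.dw1(x))
            outs.append(self.conv2(x))       # bounding-box regression channels
        return outs
```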
Step S5: and (3) training the automatic driving target recognition network based on the novel dynamic cascade YOLOv8 constructed in the step (S4) by using the divided data set. And evaluating the performance of the trained automatic driving target recognition network based on the novel dynamic cascade YOLOv8, and finally realizing target recognition in automatic driving.
Based on the same inventive concept, the present invention proposes a device comprising a memory and a processor, wherein: the memory is for storing a computer program capable of running on the processor; the processor is for executing the steps of the novel dynamic cascade YOLOv8-based automatic driving target recognition method as described above when running the computer program.
Based on the same inventive concept, the invention also proposes a storage medium having stored thereon a computer program which, when executed by at least one processor, implements the steps of the novel dynamic cascade YOLOv8-based automatic driving target recognition method as described above.
Thus far, the technical solution of the present invention has been described in connection with the specific experimental procedure shown in the drawings, but the scope of protection of the present invention is not limited to these specific embodiments. Equivalent modifications and substitutions for related technical features may be made by those skilled in the art without departing from the principles of the present invention, and such modifications and substitutions will be within the scope of the present invention.

Claims (9)

1. An automatic driving target recognition method based on novel dynamic cascade YOLOv8 is characterized by comprising the following steps:
(1) Preprocessing pre-acquired original images of traffic vehicles, and dividing them into a training set and a test set;
(2) Constructing an automatic driving target recognition network based on novel dynamic cascading YOLOv8; in the automatic driving target recognition network, the Backbone network of the YOLOv8 network is replaced as a whole with a novel dynamic cascade backbone network, and the detection head in the last part of the YOLOv8 network is replaced with a novel ShareSepHead detection head with cross-scale shared convolution weights;
(3) Adopting improved PolyLoss as the loss function of the automatic driving target recognition network;
(4) Training the automatic driving target recognition network model by utilizing the training set;
(5) Inputting the test set into a trained automatic driving target recognition network, and evaluating the automatic driving target recognition network.
2. The method for automatic driving target recognition based on novel dynamic cascade YOLOv8 according to claim 1, wherein the novel dynamic cascade backbone network in step (2) has two cascaded backbone networks, with a dynamic router inserted between them to automatically select an optimal route for each image to be detected; the image to be detected undergoes first-stage multi-scale feature extraction through the first backbone network, and the multi-scale features are sent to the dynamic router to judge the difficulty level of the image; the features are mapped to a difficulty score through two linear mapping layers; if the image is judged to be a "simple" image, the first-stage multi-scale features are sent to the head part of YOLOv8; if the image is judged to be a "difficult" image, the image to be detected and its first-stage multi-scale features are sent to the second backbone network, second-stage multi-scale features are extracted, and the second-stage multi-scale features are sent to the head part of YOLOv8.
3. The method for automatically identifying a driving target based on novel dynamic cascade YOLOv8 of claim 1, wherein the implementation process of the novel dynamic cascade backbone network in step (2) is as follows:
for the input image x, the first backbone B1 first extracts its multi-scale features F1:
F1 = B1(x) = {F1^1, F1^2, ..., F1^L}
wherein L is the number of stages, namely the number of multi-scale features; the router R then uses these multi-scale features F1 to predict the difficulty score φ ∈ (0, 1) of the image as:
φ = R(F1)
if the router classifies the input image as a "simple" image, the immediately following neck-head D1 outputs the detection result y as:
y = D1(F1)
if the router classifies the input image as a "complex" image, the multi-scale features require further enhancement by the second backbone; the multi-scale features F1 are embedded into H through a composite connection module G:
H = G(F1)
wherein G is the DHLC implementation of CBNet; the input image x is fed into the second backbone B2, whose features at each stage are enhanced in turn by element-wise summation with the corresponding embedding in H, denoted as:
F2 = B2(x; H)
the second neck-head D2 then decodes the detection result as:
y = D2(F2).
4. The automatic driving target recognition method based on novel dynamic cascade YOLOv8 of claim 1, wherein the ShareSepHead detection head in step (2) shares convolution weights among different layers and calculates the statistics of BN independently; the ShareSepHead comprises a first convolution layer, a first depth-separable convolution layer, a second depth-separable convolution layer, a second convolution layer and a BN normalization layer which are connected in sequence.
5. The method for automatic driving target recognition based on novel dynamic cascade YOLOv8 according to claim 4, wherein the first convolution layer is a 3×3 convolution layer, which changes the number of channels of the input feature map to c2×2; the first depth-separable convolution layer first applies a convolution operation to each input channel separately, and then combines the features among the channels; the second depth-separable convolution layer reduces the number of channels of the input feature map from c2×2 to c2; the second convolution layer is a 1×1 convolution layer, changing the number of channels of the input feature map from c2 to 4×self.reg_max; each detection head improves gradient propagation and training speed through BN normalization, which is performed by normalizing each small batch of data.
6. The method of automatic driving target recognition based on novel dynamic cascading YOLOv8 of claim 1, wherein the modified PolyLoss of step (3) comprises a combined loss function and a weighted binary cross entropy loss; PolyLoss combines the binary cross entropy loss and the Focal Loss, and improves the capacity to balance difficult samples and positive and negative samples by adjusting the weight and shape of the loss function; the weighted binary cross entropy loss calculates the binary cross entropy between the prediction result and the real label, measuring the degree to which the prediction matches the real label; an alpha_factor is introduced to weight the loss, so that the losses of positive and negative samples are adjusted to different degrees in the calculation; and polynomial adjustment factors are incorporated to account for the uncertainty of the sample probability predictions.
7. The method for automatic driving target recognition based on novel dynamic cascade YOLOv8 according to claim 2, wherein a "simple" image is an image containing a single target, and a "difficult" image is an image containing two or more targets.
8. A device comprising a memory and a processor, wherein:
a memory for storing a computer program capable of running on the processor;
processor for executing the method steps of the novel dynamic cascade YOLOv 8-based automatic driving target recognition method according to any of the claims 1-7 when running the computer program.
9. A storage medium, characterized in that it has stored thereon a computer program which, when executed by at least one processor, implements the steps of the novel dynamic cascade YOLOv8-based automatic driving target recognition method according to any one of claims 1-7.
CN202310899627.4A 2023-07-21 2023-07-21 Novel dynamic cascade YOLOv8-based automatic driving target identification method and device Pending CN116912796A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310899627.4A CN116912796A (en) 2023-07-21 2023-07-21 Novel dynamic cascade YOLOv8-based automatic driving target identification method and device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202310899627.4A CN116912796A (en) 2023-07-21 2023-07-21 Novel dynamic cascade YOLOv8-based automatic driving target identification method and device

Publications (1)

Publication Number Publication Date
CN116912796A true CN116912796A (en) 2023-10-20

Family

ID=88350767

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310899627.4A Pending CN116912796A (en) 2023-07-21 2023-07-21 Novel dynamic cascade YOLOv8-based automatic driving target identification method and device

Country Status (1)

Country Link
CN (1) CN116912796A (en)


Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117252904A (en) * 2023-11-15 2023-12-19 南昌工程学院 Target tracking method and system based on long-range space perception and channel enhancement
CN117252904B (en) * 2023-11-15 2024-02-09 南昌工程学院 Target tracking method and system based on long-range space perception and channel enhancement
CN117496362A (en) * 2024-01-02 2024-02-02 环天智慧科技股份有限公司 Land coverage change detection method based on self-adaptive convolution kernel and cascade detection head
CN117496362B (en) * 2024-01-02 2024-03-29 环天智慧科技股份有限公司 Land coverage change detection method based on self-adaptive convolution kernel and cascade detection head

Similar Documents

Publication Publication Date Title
CN111401201A (en) Aerial image multi-scale target detection method based on spatial pyramid attention drive
CN111126472A (en) Improved target detection method based on SSD
CN111259930A (en) General target detection method of self-adaptive attention guidance mechanism
CN116912796A (en) Novel dynamic cascade YOLOv8-based automatic driving target identification method and device
CN111428733B (en) Zero sample target detection method and system based on semantic feature space conversion
CN111507370A (en) Method and device for obtaining sample image of inspection label in automatic labeling image
CN112150493A (en) Semantic guidance-based screen area detection method in natural scene
CN112949572A (en) Slim-YOLOv 3-based mask wearing condition detection method
CN110942471A (en) Long-term target tracking method based on space-time constraint
CN110008899B (en) Method for extracting and classifying candidate targets of visible light remote sensing image
CN111753682A (en) Hoisting area dynamic monitoring method based on target detection algorithm
CN110084284A (en) Target detection and secondary classification algorithm and device based on region convolutional neural networks
CN114565842A (en) Unmanned aerial vehicle real-time target detection method and system based on Nvidia Jetson embedded hardware
CN110197213B (en) Image matching method, device and equipment based on neural network
Fan et al. A novel sonar target detection and classification algorithm
US20220147748A1 (en) Efficient object detection using deep learning techniques
CN114139564A (en) Two-dimensional code detection method and device, terminal equipment and training method for detection network
CN116342894B (en) GIS infrared feature recognition system and method based on improved YOLOv5
CN113095072A (en) Text processing method and device
CN115861595A (en) Multi-scale domain self-adaptive heterogeneous image matching method based on deep learning
CN115984671A (en) Model online updating method and device, electronic equipment and readable storage medium
CN114927236A (en) Detection method and system for multiple target images
CN114998611A (en) Target contour detection method based on structure fusion
CN115063831A (en) High-performance pedestrian retrieval and re-identification method and device
CN111160219B (en) Object integrity evaluation method and device, electronic equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination