CN114119965A - Road target detection method and system


Info

Publication number
CN114119965A
Authority
CN
China
Prior art keywords
convolution
target detection
road
dilation
branch
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202111447972.1A
Other languages
Chinese (zh)
Inventor
邓立霞
李洪泉
刘海英
陈奂宇
张肖轶群
毕凌云
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Qilu University of Technology
Original Assignee
Qilu University of Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Qilu University of Technology filed Critical Qilu University of Technology
Priority to CN202111447972.1A priority Critical patent/CN114119965A/en
Publication of CN114119965A publication Critical patent/CN114119965A/en
Pending legal-status Critical Current

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G06F18/25 Fusion techniques
    • G06F18/253 Fusion techniques of extracted features
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G06N3/08 Learning methods


Abstract

The invention provides a road target detection method and system, comprising: acquiring road-related image information; and obtaining a road target detection result from the acquired image information and a preset road target detection model. The road target detection model is obtained by a deep network learning method. A dilated receptive-field module combining dilated convolution and depthwise separable convolution is added to the model: when the feature map is downsampled, three branches perform convolution operations with different dilation rates, and the dilated convolution operation is performed only once in each branch. Because dilated convolution covers a larger receptive field, multi-scale information can be captured and the loss of feature information is effectively reduced. In addition, performing the dilated convolution only once per branch avoids the image-feature loss caused by stacking multiple dilated convolutions, and replacing the conventional convolution operation with depthwise separable convolution reduces both the computation and the parameter count.

Description

Road target detection method and system
Technical Field
The invention belongs to the technical field of road detection, and particularly relates to a road target detection method and system.
Background
Automobiles have become the main means of transport in modern society and bring great convenience to people; however, as the number of automobiles grows, the traffic problems they cause, such as urban congestion and traffic accidents, keep increasing. It is therefore important to detect targets such as vehicles and pedestrians on the road, and to set up traffic diversion, speed limits and red lights according to the detection results. Intelligent detection of road targets with deep learning methods is the main means and development direction of road target detection now and in the future.
The inventor finds that existing deep-learning road detection methods have the following problems: the residual blocks are connected by downsampling with conventional convolution operations, and this way of reducing the image size while enlarging the receptive field incurs some feature loss; moreover, conventional convolution operations have excessive computation and parameter counts, occupy more computing resources, and hurt computational efficiency and accuracy.
Disclosure of Invention
To solve the above problems, the invention provides a road target detection method and system in which, when the feature map is downsampled, three branches are added to perform convolution operations with different dilation rates. Because dilated convolution covers a larger receptive field, multi-scale information can be captured and the loss of feature information is effectively reduced. In addition, the dilated convolution operation is performed only once in each branch, avoiding the image-feature loss caused by stacking multiple dilated convolutions; and depthwise separable convolution replaces the conventional convolution operation, greatly reducing the computation and parameter counts, so that only a small amount of computing resources is occupied while computational accuracy improves.
In order to achieve the purpose, the invention is realized by the following technical scheme:
in a first aspect, the present invention provides a road target detection method, including:
acquiring road related image information;
obtaining a road target detection result according to the acquired image information and a preset road target detection model;
the road target detection model is obtained by a deep network learning method; a dilated receptive-field module combining dilated convolution and depthwise separable convolution is added to the model: when the feature map is downsampled, three branches perform convolution operations with different dilation rates, and the dilated convolution operation is performed only once in each branch.
Further, the dilated convolution inserts holes into the convolution kernel to expand the receptive field, so that the receptive field of the kernel increases for the same parameter count and computation; a dilation rate is introduced, which defines the spacing between the values in the convolution kernel.
Further, feature fusion is performed on the feature maps obtained from the three branches, specifically:

F = F2^(H×W×C) ⊕ F4^(H×W×C) ⊕ F6^(H×W×C)

where F denotes the fused feature map and ⊕ denotes the Concat connection operation.
Further, the fused feature map undergoes a conventional convolution operation to reduce the number of channels, and is then combined with the input feature map by an Add superposition operation.
Further, the convolution operations with different dilation rates in the three branches comprise:

the convolution operation of branch 1:

F1^(H×W×(C/2)) = F^(H×W×C) * A^(1×1×(C/2))
F2^(H×W×C) = F1^(H×W×(C/2)) * B^(3×3×C)

where F1^(H×W×(C/2)) denotes the output feature map of the conventional convolution, F2^(H×W×C) denotes the output feature map of branch 1, A^(1×1×(C/2)) denotes a conventional convolution operation with filter size 1x1 and C/2 filters, F^(H×W×C) denotes the input feature map with width, height and channel count H, W and C respectively, and * denotes the convolution operation;

the convolution operation of branch 2:

F3^(H×W×(C/2)) = F^(H×W×C) * A^(1×1×(C/2))
F4^(H×W×C) = F3^(H×W×(C/2)) * B^(3×3×C)

where F3^(H×W×(C/2)) denotes the output feature map of the conventional convolution, F4^(H×W×C) denotes the output feature map of branch 2, and B^(3×3×C) denotes the dilated depthwise separable convolution operation;

the convolution operation of branch 3:

F5^(H×W×(C/2)) = F^(H×W×C) * A^(1×1×(C/2))
F6^(H×W×C) = F5^(H×W×(C/2)) * B^(3×3×C)

where F5^(H×W×(C/2)) denotes the output feature map of the conventional convolution and F6^(H×W×C) denotes the output feature map of branch 3.
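The formulas above fix the branch shapes but leave some details open (the exact dilation rates, and the normalization and activation around each convolution). The following PyTorch sketch of such a three-branch module assumes dilation rates of 1, 2 and 3 and BatchNorm + ReLU after each convolution; class and variable names are illustrative, not taken from the patent.

```python
import torch
import torch.nn as nn

class DPModule(nn.Module):
    """Sketch of the three-branch dilated receptive-field module.

    Assumptions not fixed by the text: dilation rates 1/2/3 and
    BatchNorm + ReLU after every convolution; C must be even.
    """
    def __init__(self, channels: int):
        super().__init__()
        c_half = channels // 2

        def branch(dilation: int) -> nn.Sequential:
            return nn.Sequential(
                # A^(1x1x(C/2)): conventional 1x1 convolution, C/2 filters
                nn.Conv2d(channels, c_half, 1, bias=False),
                nn.BatchNorm2d(c_half),
                nn.ReLU(inplace=True),
                # B^(3x3xC): dilated depthwise separable convolution
                nn.Conv2d(c_half, c_half, 3, padding=dilation,
                          dilation=dilation, groups=c_half, bias=False),
                nn.Conv2d(c_half, channels, 1, bias=False),  # pointwise part
                nn.BatchNorm2d(channels),
                nn.ReLU(inplace=True),
            )

        self.branches = nn.ModuleList([branch(d) for d in (1, 2, 3)])
        # 1x1 convolution reducing the 3C concatenated channels back to C
        self.fuse = nn.Conv2d(3 * channels, channels, 1, bias=False)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        f = torch.cat([b(x) for b in self.branches], dim=1)  # Concat fusion
        return x + self.fuse(f)  # channel reduction, then Add with the input
```

Because each branch uses padding equal to its dilation rate, all three outputs keep the H×W resolution of the input, so the Concat and Add operations line up; the downsampling itself happens after the module, as described below.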
Further, for the feature extraction network in the model, the outputs of the 3rd, 4th and 5th residual blocks are taken as the inputs of the feature extraction network; upsampling and Concat operations are applied, and three feature maps of different scales are output.
Furthermore, in the model, a cross-stage dense connection structure is added on the basis of the original Darknet53 network structure.
Further, the image information includes vehicle information and pedestrian information; the image information used for training the road target detection model is preprocessed with image distortion and random rotation.
In a second aspect, the invention further provides a road target detection system, which comprises a data acquisition module and a target detection module;
the data acquisition module configured to: acquiring road related image information;
the object detection module configured to: obtaining a road target detection result according to the acquired image information and a preset road target detection model;
the road target detection model is obtained by a deep network learning method; a dilated receptive-field module combining dilated convolution and depthwise separable convolution is added to the model: when the feature map is downsampled, three branches perform convolution operations with different dilation rates, and the dilated convolution operation is performed only once in each branch.
In a third aspect, the present invention also provides a computer-readable storage medium, on which a computer program is stored, which when executed by a processor, performs the steps of the road object detection method of the first aspect.
In a fourth aspect, the present invention further provides an electronic device, which includes a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein the processor executes the computer program to implement the steps of the road object detecting method according to the first aspect.
Compared with the prior art, the invention has the beneficial effects that:
when the feature map is downsampled, three branches are added to perform convolution operations with different dilation rates; because dilated convolution covers a larger receptive field, multi-scale information can be captured and the loss of feature information is effectively reduced. In addition, the dilated convolution operation is performed only once in each branch, avoiding the image-feature loss caused by stacking multiple dilated convolutions. Depthwise separable convolution replaces the conventional convolution operation, greatly reducing the computation and parameter counts: only a small amount of computing resources is occupied, and computational accuracy improves.
Drawings
The accompanying drawings, which form a part of this application, are included to provide a further understanding of the embodiments and are incorporated in and constitute a part of this specification; they illustrate exemplary embodiments and, together with the description, serve to explain them without unduly limiting the invention.
Fig. 1 is a structure diagram of a ResNet module network according to embodiment 1 of the present invention;
FIG. 2 is a diagram showing a structure of a Darknet53 network according to embodiment 1 of the present invention;
FIG. 3 is a network structure diagram of the dilated receptive-field module (DP-module) according to embodiment 1 of the present invention;
fig. 4 is a network configuration diagram of an improvement to Darknet53 according to embodiment 1 of the present invention;
FIG. 5 is a graph showing the detection effect obtained with the PyTorch framework in embodiment 1 of the present invention.
Detailed description of embodiments:
the invention is further described with reference to the following figures and examples.
It should be noted that the following detailed description is exemplary and is intended to provide further explanation of the disclosure. Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this application belongs.
In recent years, with the appearance of computationally powerful graphics processors and large-scale data samples, deep learning has developed rapidly; introducing convolutional neural networks that learn target features by themselves replaces manual feature selection and extraction, effectively improving both the real-time performance and the accuracy of target detection.
Two-stage, candidate-region-based target detection algorithms such as R-CNN, SPP-Net, Fast R-CNN, Faster R-CNN, Mask R-CNN and TridentNet keep improving detection accuracy, but their complex network structures and staged target tasks make detection slow, so real-time target detection is difficult. Single-stage algorithms such as SSD, DSSD and the YOLO series convert target detection into a regression problem and greatly improve detection speed, though their accuracy is slightly lower. Both kinds of detector achieve remarkable results, mainly thanks to high-performance hardware and powerful GPU computing; but their complex structures, large parameter counts, heavy memory footprint and long training times make them hard to apply to mobile terminals such as smartphones, drones or other inexpensive devices, where the very high latency caused by hardware limitations greatly affects detection speed. The conventional convolution process also wastes resources while increasing computation; some researchers prune the network structure and cut redundant features to obtain a lightweight network, but this reduces detection accuracy, as in lightweight single-stage variants such as YOLO-tiny, whose accuracy drops sharply. Lightweight network structures that have appeared in recent years, such as SqueezeNet, MobileNet and ShuffleNet, maintain the detection frame rate while achieving better accuracy than those lightweight single-stage variants.
Considering that the application scenario is road target detection, high real-time performance is required; although two-stage detection achieves higher accuracy, it is too slow to detect fast-moving vehicles in real time. The invention therefore builds mainly on the YOLO series, selecting the YOLOv3 algorithm for improvement so as to raise the detection effect while keeping practicality.
Example 1:
The embodiment provides a road target detection method, which comprises the following steps:
S1: make a data set for model training and testing, and perform preprocessing and format conversion on the data set;
S2: build a virtual environment for model training and testing; in this embodiment, the PyCharm development tool can be used;
S3: construct the YOLOv3 backbone Darknet53 network structure and the multi-scale feature extraction network structure;
S4: add a cross-stage dense connection structure to the original Darknet53 network structure, improving the fusion of semantic and detail information across feature layers and reducing the loss of image features in the deep network; meanwhile, in YOLOv3 the residual blocks are connected by a conventional convolution with stride 2 for downsampling, and this way of reducing the image size while enlarging the receptive field incurs some feature loss; this embodiment therefore designs a dilated receptive-field module combining dilated convolution and depthwise separable convolution, which obtains a larger receptive field at the same resolution and reduces the feature loss caused by conventional convolution;
S5: train the improved YOLOv3 model, and save the final weight file once the loss function converges;
S6: test the improved model, detect the target classes and mark them with bounding boxes.
In this embodiment, the processing procedure of step S1 is as follows:
S11: this embodiment builds a vehicle-related data set of 17,900 images, comprising part of the COCO (Common Objects in Context) data set, part of the VOC2007 data set, and a self-made data set of real photographs; the selected COCO and VOC2007 images relate to vehicles and pedestrians and mainly cover five categories: car, bus, bicycle, motorcycle and person;
S12: apply image distortion and random rotation preprocessing to the data set, in order to expand it, strengthen the class features of the targets, and facilitate training and optimization of the model;
S13: the label format adopted in this embodiment is the VOC XML format, so the JSON annotations of the COCO data set need to be converted to XML; the self-made data set is annotated with a target detection labeling tool, and the label classes are likewise saved in XML format;
S14: the input feature size adopted in this embodiment is 416x416, so images after preprocessing and format conversion are uniformly resized to 416x416;
S15: the data set is divided 9:1 into a training set and a test set, and 10% of the training set is set aside as a validation set.
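As a concrete illustration of S15, a minimal split helper is sketched below; the 9:1 and 10% ratios come from the text, while the shuffling and the fixed seed are assumptions added for reproducibility.

```python
import random
from typing import List, Tuple

def split_dataset(names: List[str], seed: int = 0) -> Tuple[List[str], List[str], List[str]]:
    """9:1 train/test split, then 10% of the training set held out as validation."""
    rng = random.Random(seed)   # fixed seed: an added assumption
    names = names[:]            # do not mutate the caller's list
    rng.shuffle(names)
    n_train = int(len(names) * 0.9)
    train, test = names[:n_train], names[n_train:]
    n_val = int(len(train) * 0.1)
    val, train = train[:n_val], train[n_val:]
    return train, val, test
```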
In this embodiment, the processing procedure of step S2 is as follows:
S21: the model is trained on a Windows 10 system, with a virtual environment built through Anaconda; PyTorch and the other necessary toolkits are installed in the virtual environment;
S22: to make full use of the available hardware and accelerate model training, matching versions of CUDA and cuDNN need to be installed; this embodiment uses CUDA 10.2 and cuDNN 7.4.5.
In this embodiment, the processing procedure of step S3 is as follows:
S31: the Darknet53 backbone network contains a large number of residual network structures, which effectively alleviate the vanishing-gradient problem caused by deepening the network; the residual network adopted by Darknet53 can be expressed by the formulas:

X1 = σ{β(W1, X)}
X2 = σ{β(W2, X1)}
X3 = X + X2

where X denotes the input features of the residual structure; (W1, X) denotes the convolution of the input features with the weights W1, where W1 has a 1x1 convolution kernel; β denotes the batch normalization operation; σ denotes the nonlinear ReLU activation; (W2, X1) denotes the convolution of X1 with the weights W2, where W2 has a 3x3 convolution kernel; X2 denotes the output features of the trunk of the residual structure; and X3 denotes the final output features of the residual structure;
the residual structure adopted by YOLOv3 first performs a convolution with filter size 1x1 on the input features to reduce the channel count and compress the features, then applies batch normalization and ReLU activation to the convolved features; a second convolution with 3x3 kernels then increases the channel count and expands the features, again followed by batch normalization and ReLU activation; finally, the feature map output by the two convolution operations is Add-superposed with the input features to produce the final output features;
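A minimal PyTorch rendering of this residual unit may make the formulas concrete; it follows the 1x1-compress / 3x3-expand / Add structure described above, and uses plain ReLU as in the formulas (YOLOv3 implementations commonly use LeakyReLU, so treat the activation choice as an assumption).

```python
import torch
import torch.nn as nn

class DarknetResidual(nn.Module):
    """Sketch of the Darknet53 residual unit described in S31."""
    def __init__(self, channels: int):
        super().__init__()
        self.conv1 = nn.Conv2d(channels, channels // 2, 1, bias=False)  # W1: 1x1
        self.bn1 = nn.BatchNorm2d(channels // 2)
        self.conv2 = nn.Conv2d(channels // 2, channels, 3, padding=1, bias=False)  # W2: 3x3
        self.bn2 = nn.BatchNorm2d(channels)
        self.act = nn.ReLU(inplace=True)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        x1 = self.act(self.bn1(self.conv1(x)))   # X1 = sigma{beta(W1, X)}
        x2 = self.act(self.bn2(self.conv2(x1)))  # X2 = sigma{beta(W2, X1)}
        return x + x2                            # X3 = X + X2
```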
S32: the Darknet53 backbone network performs a conventional 3x3 convolution on the input features and then stacks five residual blocks, which contain 1, 2, 8, 8 and 4 of the residual network structures shown in S31, respectively; adjacent residual blocks are connected for downsampling by a convolution with filter size 3x3 and stride 2;
S33: the construction of the multi-scale feature extraction network mainly introduces the idea of the FPN, specifically: for the Darknet53 feature extraction network, the outputs of residual blocks 3, 4 and 5 are taken as the inputs of the multi-scale feature extraction network; the convolutions adopted in this network have sizes 1x1 and 3x3, and upsampling and Concat operations are applied; finally, feature maps of three different scales are output, 13x13, 26x26 and 52x52, to adapt to targets of different sizes and improve the detection of small objects.
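The core of the S33 fusion step, upsampling a deeper map and Concat-joining it with a shallower residual-block output, can be sketched as follows; the channel counts are illustrative assumptions, not values from the text.

```python
import torch
import torch.nn as nn

up = nn.Upsample(scale_factor=2, mode='nearest')  # 13x13 -> 26x26
deep = torch.randn(1, 256, 13, 13)     # branch from residual block 5
shallow = torch.randn(1, 512, 26, 26)  # output of residual block 4
fused = torch.cat([up(deep), shallow], dim=1)  # Concat -> (1, 768, 26, 26)
```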
In this embodiment, the processing procedure of step S4 is as follows:
S41: although the residual network adopted by Darknet53 retains most image features, as the network structure keeps deepening the image still loses some detail features in the deep network; to address this, this embodiment draws on the idea of the DenseNet residual network and designs a cross-stage dense connection structure, strengthening the transfer of image detail features and further improving target localization and recognition;
S42: to guarantee computation speed and model inference time, and considering the deployment of the trained model, this embodiment does not adopt DenseNet's very dense connection pattern, but a sparser dense connection structure whose shortcut residuals span whole residual blocks;
S43: specifically, the feature map after the first convolution of Darknet53 and the output feature maps of residual blocks 1, 2 and 3 are superposed, through convolution operations, onto the inputs of residual blocks 2, 3, 4 and 5 respectively, adding four side residual paths that span residual blocks in total (see the sketch after this list);
S44: in the DenseNet residual network the feature-map size does not change as the network deepens, whereas in Darknet53 it is progressively downsampled; the added side residual paths therefore use a convolution with stride 2 and 3x3 kernels, and the convolved feature maps likewise undergo batch normalization and nonlinear activation;
S45: the superposition differs from the DenseNet residual network: this embodiment adopts Add connections.
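One such side path can be sketched as below; the stride-2 3x3 convolution with batch normalization and nonlinear activation follows S44, while the channel counts and spatial sizes are illustrative assumptions.

```python
import torch
import torch.nn as nn

# Side residual path spanning one residual block (sketch).
side = nn.Sequential(
    nn.Conv2d(64, 128, kernel_size=3, stride=2, padding=1, bias=False),
    nn.BatchNorm2d(128),
    nn.ReLU(inplace=True),
)

x_early = torch.randn(1, 64, 208, 208)   # feature map entering residual block 1
x_main = torch.randn(1, 128, 104, 104)   # main-path input of residual block 2
fused = x_main + side(x_early)           # Add connection across the block
```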
Darknet53 is a network with a complex structure and a large parameter count; to enlarge the receptive field, its residual blocks are connected by downsampling. Although this direct downsampling enlarges the receptive field and helps feature extraction, it loses certain features and is unfavorable for detecting small targets.
Dilated convolution inserts holes into the convolution kernel to expand the receptive field, so that an original 3x3 kernel obtains a 5x5 or even larger receptive field for the same parameter count and computation. Dilated convolution introduces a new hyperparameter into conventional convolution, the dilation rate, which defines the spacing between the values in the convolution kernel; an ordinary convolution has a dilation rate of 1, and when the dilation rate is greater than 1 the receptive field of the dilated convolution becomes larger. The receptive field of a dilated convolution is given by:

n = k + (k - 1) × (d - 1)

where n denotes the receptive field of the dilated convolution, k denotes the size of the convolution kernel, and d denotes the dilation rate.
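The formula is easy to check numerically; the tiny helper below, an illustration rather than part of the patent, evaluates it for a 3x3 kernel at several dilation rates.

```python
def dilated_receptive_field(k: int, d: int) -> int:
    """n = k + (k - 1) * (d - 1): receptive field of one dilated convolution."""
    return k + (k - 1) * (d - 1)

for d in (1, 2, 3):
    n = dilated_receptive_field(3, d)
    print(f"3x3 kernel, dilation {d}: {n}x{n}")
# -> 3x3, 5x5, 7x7
```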
Although dilated convolution can effectively enlarge the receptive field, misused dilated convolution weakens the feature extraction ability of the network. First, replacing conventional convolutions with repeated dilated convolutions of the same specification causes part of the pixel information to be ignored and breaks the continuity of information, which is fatal for pixel-level dense prediction tasks. Second, a dilation rate set too large harms the detection of small targets, while one set too small barely enlarges the receptive field. Based on this, a three-branch dilated receptive-field module, the DP-module, is designed in this embodiment to increase the feature extraction ability of the network and improve the detection of small targets.
The feature output process of the dilated receptive-field module is given by the following formulas, where A^(1×1×(C/2)) denotes a conventional convolution operation with filter size 1x1 and C/2 filters; B^(3×3×C) denotes the dilated depthwise separable convolution operation with filter size 3x3 and C filters; * denotes the convolution operation; and F^(H×W×C) denotes the input feature map with width, height and channel count H, W and C respectively.

The convolution operation of branch 1 is:

F1^(H×W×(C/2)) = F^(H×W×C) * A^(1×1×(C/2))
F2^(H×W×C) = F1^(H×W×(C/2)) * B^(3×3×C)

where F1^(H×W×(C/2)) denotes the output feature map of the conventional convolution and F2^(H×W×C) denotes the output feature map of branch 1.

The convolution operation of branch 2 is:

F3^(H×W×(C/2)) = F^(H×W×C) * A^(1×1×(C/2))
F4^(H×W×C) = F3^(H×W×(C/2)) * B^(3×3×C)

where F3^(H×W×(C/2)) denotes the output feature map of the conventional convolution and F4^(H×W×C) denotes the output feature map of branch 2.

The convolution operation of branch 3 is:

F5^(H×W×(C/2)) = F^(H×W×C) * A^(1×1×(C/2))
F6^(H×W×C) = F5^(H×W×(C/2)) * B^(3×3×C)

where F5^(H×W×(C/2)) denotes the output feature map of the conventional convolution and F6^(H×W×C) denotes the output feature map of branch 3.
Feature fusion is then performed on the feature maps obtained from the three branches, specifically:

F = F2^(H×W×C) ⊕ F4^(H×W×C) ⊕ F6^(H×W×C)

where F denotes the fused feature map and ⊕ denotes the Concat connection operation.
In this embodiment, the downsampling operation is then performed on the fused feature map F. Compared with a direct downsampling convolution, the dilated receptive-field module first adds three branches that perform convolution operations with different dilation rates; because dilated convolution covers a larger receptive field, the module can capture multi-scale information and effectively reduce the loss of feature information. In addition, the dilated convolution operation is performed only once in each branch, avoiding the image-feature loss caused by stacking multiple dilated convolutions; the multi-branch network design and the increasing dilation rates also effectively improve feature extraction for targets of different scales. In this embodiment depthwise separable convolution replaces the conventional convolution operation, greatly reducing computation and parameters, so the dilated receptive-field module occupies only a small amount of computing resources while markedly improving computational accuracy.
In this embodiment, the processing procedure of step S5 is as follows:
S51: the data set is split 9:1 into a training set and a test set, whose picture names are saved in the train.txt and test.txt files respectively; 10% of the training set is selected as the validation set and its picture names are saved in val.txt;
S52: the target information in the VOC-format data set is converted to txt format, extracting the class and position information of each target; the training, validation and test sets are stored in separate txt files.
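A sketch of the S52 conversion is given below; the patent only says that class and position information are extracted, so the exact output line format ('xmin,ymin,xmax,ymax,class_index' per object) is an assumption.

```python
import xml.etree.ElementTree as ET
from typing import List

def voc_xml_to_txt_line(xml_path: str, classes: List[str]) -> str:
    """Extract 'xmin,ymin,xmax,ymax,class_index' entries from one VOC XML file."""
    root = ET.parse(xml_path).getroot()
    boxes = []
    for obj in root.iter('object'):
        name = obj.find('name').text
        if name not in classes:
            continue  # skip categories outside the five used here
        bb = obj.find('bndbox')
        coords = [int(float(bb.find(k).text))
                  for k in ('xmin', 'ymin', 'xmax', 'ymax')]
        boxes.append(','.join(map(str, coords + [classes.index(name)])))
    return ' '.join(boxes)

# Example with the five categories from S11:
# voc_xml_to_txt_line('000001.xml', ['car', 'bus', 'bicycle', 'motorcycle', 'person'])
```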
S53: a project named YOLO-master is created in PyCharm; the paths of the data set's configuration text files are referenced in the project, and the improved algorithm is written in Python within the YOLO-master project, including the forward and backward propagation of the algorithm and the executable files for training, testing and mAP evaluation;
S54: for the improved network structure, the official pre-trained YOLOv3 weights are loaded; weights whose keys match are loaded automatically, and the parts whose keys do not match are loaded manually;
S55: the initialization parameters are passed into the script file, including the image size, the number of target classes, the anchor sizes generated by k-means clustering, and the weight file path;
S56: the prepared project is trained with epochs set to 150 and batch size set to 6; the first 40 epochs freeze the backbone network, and epochs 40-150 train the entire network; training takes 36 hours, and when the loss value converges, training stops and the final weight file is saved.
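The freeze/unfreeze schedule in S56 amounts to toggling requires_grad on the backbone parameters; a minimal helper is sketched below, assuming the backbone is exposed as a module attribute named backbone (the actual attribute name depends on the implementation).

```python
import torch.nn as nn

def set_backbone_frozen(model: nn.Module, frozen: bool) -> None:
    """Freeze (epochs < 40) or unfreeze (epochs 40-150) the backbone."""
    for p in model.backbone.parameters():  # 'backbone' is an assumed attribute
        p.requires_grad = not frozen
```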
In this embodiment, the processing procedure of step S6 is as follows:
S61: the test set from S1 is adopted as the test sample files;
S62: the test set sample files are taken as the input of the improved YOLOv3 network model, the trained weight file is loaded, and the test executable file is run;
S63: the path of the specific picture to be tested is input to obtain the target detection result, with the class, class confidence and position of each target marked in the picture.
Example 2:
This embodiment further verifies a road target detection method, implemented by a YOLOv3-based road target detection algorithm, comprising: data set preprocessing, virtual environment deployment, implementation in Python, model training and model testing.
Specifically, in this embodiment, image distortion and random rotation are applied to the data set for feature enhancement. The training data set comprises part of the Pascal VOC2007 data set, part of the COCO data set and a self-made data set, 19,900 images in total. The data set is also converted into a format convenient for YOLO to process, mainly: the COCO label format is converted to XML, and the self-made data set is likewise annotated in XML; the VOC2007 data set needs no format processing.
In this embodiment, the virtual environment is built through Anaconda and named torch; the torch environment is activated with activate, and the required PyTorch version and other toolkits are installed; to speed up training and make full use of the existing hardware, GPU computing is invoked to accelerate training. PyCharm is used as the IDE: a project is created in PyCharm, and the newly created torch environment is selected in its settings.
In this embodiment, Python can be chosen as the programming language. A main folder named YOLO-trans-master is created in PyCharm, with subfolders named nets, utils, img, model_data, logs and input. nets stores the subprograms of the network structure and the loss function, including the files darknet.py, yolo-training.py and yolo.py: darknet.py implements the improved backbone network structure, yolo.py implements the multi-scale feature extraction network structure, and yolo-training.py implements the loss function calculation. utils stores the data set preprocessing file dataloader.py and the anchor regression utilities util.py; model_data stores the configuration file config.py with the anchor sizes and class count, together with the data and XML label files; logs stores the weight files generated during training; and input stores the detection results and ground-truth used during training and evaluation. In addition, the executable files for training, testing, mAP generation and data set format conversion are placed directly under the main folder and complete their work by calling the classes in the subfolders.
In this embodiment, the training platform can be configured as follows: the operating system is Windows 10, on a computer with an NVIDIA GeForce RTX 2060 (6 GB) GPU and an AMD Ryzen 7 4800H with Radeon Graphics CPU. 150 epochs are trained, the first 40 freezing the backbone and epochs 40-150 training the entire network; the batch size is set to 6 and total training time is 36 hours. The trained model weight file is applied to the mAP executable file to obtain the evaluation results, and to the test executable file to obtain the detection effect.
Example 3:
the embodiment provides a road target detection system, which comprises a data acquisition module and a target detection module;
the data acquisition module configured to: acquiring road related image information;
the object detection module configured to: obtaining a road target detection result according to the acquired image information and a preset road target detection model;
the road target detection model is obtained by a deep network learning method; a dilated receptive-field module combining dilated convolution and depthwise separable convolution is added to the model: when the feature map is downsampled, three branches perform convolution operations with different dilation rates, and the dilated convolution operation is performed only once in each branch.
Example 4:
the present embodiment provides a computer-readable storage medium on which a computer program is stored, which when executed by a processor, implements the steps of the road object detection method described in embodiment 1.
Example 5:
this embodiment provides an electronic device, which includes a memory, a processor, and a computer program stored in the memory and executable on the processor, and when the processor executes the program, the steps of the road object detection method described in embodiment 1 are implemented.
The above description is only a preferred embodiment of the present invention and is not intended to limit it; those skilled in the art can make various modifications and variations. Any modification, equivalent replacement or improvement made within the spirit and principle of the present invention should be included in its protection scope.

Claims (10)

1. A method of detecting a road target, comprising:
acquiring road related image information;
obtaining a road target detection result according to the acquired image information and a preset road target detection model;
the road target detection model is obtained by a deep network learning method; a dilated receptive-field module combining dilated convolution and depthwise separable convolution is added to the model: when the feature map is downsampled, three branches perform convolution operations with different dilation rates, and the dilated convolution operation is performed only once in each branch.
2. The road target detection method according to claim 1, wherein the dilated convolution inserts holes into the convolution kernel to expand the receptive field, increasing the receptive field of the kernel for the same parameter count and computation; and a dilation rate is introduced, which defines the spacing between the values in the convolution kernel.
3. The road target detection method according to claim 1, wherein feature fusion is performed on the feature maps obtained from the three branches, specifically:

F = F2^(H×W×C) ⊕ F4^(H×W×C) ⊕ F6^(H×W×C)

where F denotes the fused feature map and ⊕ denotes the Concat connection operation; and the fused feature map undergoes a conventional convolution operation and is then superposed with the input feature map.
4. The road target detection method according to claim 3, wherein the convolution operations with different dilation rates in the three branches comprise:

the convolution operation of branch 1:

F1^(H×W×(C/2)) = F^(H×W×C) * A^(1×1×(C/2))
F2^(H×W×C) = F1^(H×W×(C/2)) * B^(3×3×C)

where F1^(H×W×(C/2)) denotes the output feature map of the conventional convolution, F2^(H×W×C) denotes the output feature map of branch 1, A^(1×1×(C/2)) denotes a conventional convolution operation with filter size 1x1 and C/2 filters, F^(H×W×C) denotes the input feature map with width, height and channel count H, W and C respectively, and * denotes the convolution operation;

the convolution operation of branch 2:

F3^(H×W×(C/2)) = F^(H×W×C) * A^(1×1×(C/2))
F4^(H×W×C) = F3^(H×W×(C/2)) * B^(3×3×C)

where F3^(H×W×(C/2)) denotes the output feature map of the conventional convolution, F4^(H×W×C) denotes the output feature map of branch 2, and B^(3×3×C) denotes the dilated depthwise separable convolution operation;

the convolution operation of branch 3:

F5^(H×W×(C/2)) = F^(H×W×C) * A^(1×1×(C/2))
F6^(H×W×C) = F5^(H×W×(C/2)) * B^(3×3×C)

where F5^(H×W×(C/2)) denotes the output feature map of the conventional convolution and F6^(H×W×C) denotes the output feature map of branch 3.
5. The road target detection method according to claim 1, wherein for the feature extraction network in the model, the outputs of the 3rd, 4th and 5th residual blocks are taken as the inputs of the feature extraction network; upsampling and Concat operations are applied, and feature maps of three different scales are output.
6. The road target detection method according to claim 1, wherein in the model a cross-stage dense connection structure is added on the basis of the original Darknet53 network structure.
7. The road target detection method according to claim 1, wherein the image information includes vehicle information and pedestrian information; and the image information used for training the road target detection model is preprocessed with image distortion and random rotation.
8. A road target detection system is characterized by comprising a data acquisition module and a target detection module;
the data acquisition module configured to: acquiring road related image information;
the object detection module configured to: obtaining a road target detection result according to the acquired image information and a preset road target detection model;
the road target detection model is obtained by a deep network learning method; a dilated receptive-field module combining dilated convolution and depthwise separable convolution is added to the model: when the feature map is downsampled, three branches perform convolution operations with different dilation rates, and the dilated convolution operation is performed only once in each branch.
9. A computer-readable storage medium, on which a computer program is stored which, when being executed by a processor, carries out the steps of the road object detection method according to any one of claims 1 to 7.
10. An electronic device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, characterized in that the steps of the road object detection method according to any of claims 1-7 are implemented when the processor executes the program.
CN202111447972.1A 2021-11-30 2021-11-30 Road target detection method and system Pending CN114119965A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111447972.1A CN114119965A (en) 2021-11-30 2021-11-30 Road target detection method and system

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202111447972.1A CN114119965A (en) 2021-11-30 2021-11-30 Road target detection method and system

Publications (1)

Publication Number Publication Date
CN114119965A true CN114119965A (en) 2022-03-01

Family

ID=80369021

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111447972.1A Pending CN114119965A (en) 2021-11-30 2021-11-30 Road target detection method and system

Country Status (1)

Country Link
CN (1) CN114119965A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115223017A (en) * 2022-05-31 2022-10-21 昆明理工大学 Multi-scale feature fusion bridge detection method based on depth separable convolution
CN115223017B (en) * 2022-05-31 2023-12-19 昆明理工大学 Multi-scale feature fusion bridge detection method based on depth separable convolution


Legal Events

PB01 Publication
SE01 Entry into force of request for substantive examination