CN114119965A - Road target detection method and system


Info

Publication number
CN114119965A
Authority
CN
China
Prior art keywords
convolution
target detection
road
dilation
branch
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202111447972.1A
Other languages
Chinese (zh)
Inventor
邓立霞
李洪泉
刘海英
陈奂宇
张肖轶群
毕凌云
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Qilu University of Technology
Original Assignee
Qilu University of Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Qilu University of Technology filed Critical Qilu University of Technology
Priority to CN202111447972.1A priority Critical patent/CN114119965A/en
Publication of CN114119965A publication Critical patent/CN114119965A/en
Pending legal-status Critical Current

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G06F18/25 Fusion techniques
    • G06F18/253 Fusion techniques of extracted features
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G06N3/08 Learning methods


Abstract

The invention provides a road target detection method and system, comprising: acquiring road-related image information; and obtaining a road target detection result from the acquired image information and a preset road target detection model. The road target detection model is obtained by a deep network learning method. A dilated receptive-field module combining dilated convolution and depthwise separable convolution is added to the model: when the feature map is downsampled, three branches perform convolution operations with different dilation rates, and the dilated convolution operation is performed only once in each branch. Because dilated convolution covers a larger receptive field, multi-scale information can be captured and the loss of feature information is effectively reduced. In addition, performing the dilated convolution only once per branch avoids the image-feature loss caused by stacking multiple dilated convolutions, and replacing the conventional convolution operation with depthwise separable convolution reduces both the computation and the parameter count.

Description

Road target detection method and system
Technical Field
The invention belongs to the technical field of road detection, and particularly relates to a road target detection method and system.
Background
Automobiles have become the main means of transport in modern society and bring great convenience to people; however, as the number of automobiles grows, the traffic problems they cause, such as urban congestion and traffic accidents, keep increasing. It is therefore important to detect targets such as vehicles and pedestrians on the road, and to set up traffic diversion, speed limits and red lights according to the detection results. Intelligent detection of road targets with deep learning methods is the main means and development direction of road target detection now and in the future.
The inventor finds that existing deep-learning road detection methods have the following problems: the residual blocks are connected by downsampling with conventional convolution operations, and this way of reducing the image size while enlarging the receptive field incurs some feature loss; moreover, conventional convolution operations have excessive computation and parameter counts, occupy more computing resources, and hurt computational efficiency and accuracy.
Disclosure of Invention
To solve the above problems, the invention provides a road target detection method and system in which, when the feature map is downsampled, three branches are added to perform convolution operations with different dilation rates. Because dilated convolution covers a larger receptive field, multi-scale information can be captured and the loss of feature information is effectively reduced. In addition, the dilated convolution operation is performed only once in each branch, avoiding the image-feature loss caused by stacking multiple dilated convolutions; and depthwise separable convolution replaces the conventional convolution operation, greatly reducing the computation and parameter counts, so that only a small amount of computing resources is occupied while computational accuracy improves.
In order to achieve the purpose, the invention is realized by the following technical scheme:
in a first aspect, the present invention provides a road target detection method, including:
acquiring road related image information;
obtaining a road target detection result according to the acquired image information and a preset road target detection model;
the road target detection model is obtained by a deep network learning method; a dilated receptive-field module combining dilated convolution and depthwise separable convolution is added to the model: when the feature map is downsampled, three branches perform convolution operations with different dilation rates, and the dilated convolution operation is performed only once in each branch.
Further, the dilated convolution inserts holes into the convolution kernel to expand the receptive field, so that the receptive field of the kernel increases for the same parameter count and computation; a dilation rate is introduced, which defines the spacing between the values in the convolution kernel.
Further, feature fusion is performed on the feature maps obtained from the three branches, specifically:

F = F2^(H×W×C) ⊕ F4^(H×W×C) ⊕ F6^(H×W×C)

where F denotes the fused feature map and ⊕ denotes the Concat connection operation.
Further, the fused feature map undergoes a conventional convolution operation to reduce the number of channels, and is then combined with the input feature map by an Add superposition operation.
Further, the convolution operations with different dilation rates in the three branches comprise:

the convolution operation of branch 1:

F1^(H×W×(C/2)) = F^(H×W×C) * A^(1×1×(C/2))
F2^(H×W×C) = F1^(H×W×(C/2)) * B^(3×3×C)

where F1^(H×W×(C/2)) denotes the output feature map of the conventional convolution, F2^(H×W×C) denotes the output feature map of branch 1, A^(1×1×(C/2)) denotes a conventional convolution operation with filter size 1x1 and C/2 filters, F^(H×W×C) denotes the input feature map with width, height and channel count H, W and C respectively, and * denotes the convolution operation;

the convolution operation of branch 2:

F3^(H×W×(C/2)) = F^(H×W×C) * A^(1×1×(C/2))
F4^(H×W×C) = F3^(H×W×(C/2)) * B^(3×3×C)

where F3^(H×W×(C/2)) denotes the output feature map of the conventional convolution, F4^(H×W×C) denotes the output feature map of branch 2, and B^(3×3×C) denotes the dilated depthwise separable convolution operation;

the convolution operation of branch 3:

F5^(H×W×(C/2)) = F^(H×W×C) * A^(1×1×(C/2))
F6^(H×W×C) = F5^(H×W×(C/2)) * B^(3×3×C)

where F5^(H×W×(C/2)) denotes the output feature map of the conventional convolution and F6^(H×W×C) denotes the output feature map of branch 3.
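The formulas above fix the branch shapes but leave some details open (the exact dilation rates, and the normalization and activation around each convolution). The following PyTorch sketch of such a three-branch module assumes dilation rates of 1, 2 and 3 and BatchNorm + ReLU after each convolution; class and variable names are illustrative, not taken from the patent.

```python
import torch
import torch.nn as nn

class DPModule(nn.Module):
    """Sketch of the three-branch dilated receptive-field module.

    Assumptions not fixed by the text: dilation rates 1/2/3 and
    BatchNorm + ReLU after every convolution; C must be even.
    """
    def __init__(self, channels: int):
        super().__init__()
        c_half = channels // 2

        def branch(dilation: int) -> nn.Sequential:
            return nn.Sequential(
                # A^(1x1x(C/2)): conventional 1x1 convolution, C/2 filters
                nn.Conv2d(channels, c_half, 1, bias=False),
                nn.BatchNorm2d(c_half),
                nn.ReLU(inplace=True),
                # B^(3x3xC): dilated depthwise separable convolution
                nn.Conv2d(c_half, c_half, 3, padding=dilation,
                          dilation=dilation, groups=c_half, bias=False),
                nn.Conv2d(c_half, channels, 1, bias=False),  # pointwise part
                nn.BatchNorm2d(channels),
                nn.ReLU(inplace=True),
            )

        self.branches = nn.ModuleList([branch(d) for d in (1, 2, 3)])
        # 1x1 convolution reducing the 3C concatenated channels back to C
        self.fuse = nn.Conv2d(3 * channels, channels, 1, bias=False)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        f = torch.cat([b(x) for b in self.branches], dim=1)  # Concat fusion
        return x + self.fuse(f)  # channel reduction, then Add with the input
```

Because each branch uses padding equal to its dilation rate, all three outputs keep the H×W resolution of the input, so the Concat and Add operations line up; the downsampling itself happens after the module, as described below.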
Further, for the feature extraction network in the model, the outputs of the 3rd, 4th and 5th residual blocks are taken as the inputs of the feature extraction network; upsampling and Concat operations are applied, and three feature maps of different scales are output.
Furthermore, in the model, a cross-stage dense connection structure is added on the basis of the original Darknet53 network structure.
Further, the image information includes vehicle information and pedestrian information; the image information used for training the road target detection model is preprocessed with image distortion and random rotation.
In a second aspect, the invention further provides a road target detection system, which comprises a data acquisition module and a target detection module;
the data acquisition module configured to: acquiring road related image information;
the object detection module configured to: obtaining a road target detection result according to the acquired image information and a preset road target detection model;
the road target detection model is obtained by a deep network learning method; a dilated receptive-field module combining dilated convolution and depthwise separable convolution is added to the model: when the feature map is downsampled, three branches perform convolution operations with different dilation rates, and the dilated convolution operation is performed only once in each branch.
In a third aspect, the present invention also provides a computer-readable storage medium, on which a computer program is stored, which when executed by a processor, performs the steps of the road object detection method of the first aspect.
In a fourth aspect, the present invention further provides an electronic device, which includes a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein the processor executes the computer program to implement the steps of the road object detecting method according to the first aspect.
Compared with the prior art, the invention has the beneficial effects that:
when the feature map is downsampled, three branches are added to perform convolution operations with different dilation rates; because dilated convolution covers a larger receptive field, multi-scale information can be captured and the loss of feature information is effectively reduced. In addition, the dilated convolution operation is performed only once in each branch, avoiding the image-feature loss caused by stacking multiple dilated convolutions. Depthwise separable convolution replaces the conventional convolution operation, greatly reducing the computation and parameter counts: only a small amount of computing resources is occupied, and computational accuracy improves.
Drawings
The accompanying drawings, which form a part of this application, are included to provide a further understanding of the embodiments and are incorporated in and constitute a part of this specification; they illustrate exemplary embodiments and, together with the description, serve to explain them without unduly limiting the invention.
Fig. 1 is a structure diagram of a ResNet module network according to embodiment 1 of the present invention;
FIG. 2 is a diagram showing a structure of a Darknet53 network according to embodiment 1 of the present invention;
FIG. 3 is a network structure diagram of the dilated receptive-field module (DP-module) according to embodiment 1 of the present invention;
fig. 4 is a network configuration diagram of an improvement to Darknet53 according to embodiment 1 of the present invention;
FIG. 5 is a graph showing the detection effect obtained with the PyTorch framework in embodiment 1 of the present invention.
Detailed description of embodiments:
the invention is further described with reference to the following figures and examples.
It should be noted that the following detailed description is exemplary and is intended to provide further explanation of the disclosure. Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this application belongs.
In recent years, with the appearance of computationally powerful graphics processors and large-scale data samples, deep learning has developed rapidly; introducing convolutional neural networks that learn target features by themselves replaces manual feature selection and extraction, effectively improving both the real-time performance and the accuracy of target detection.
Two-stage, candidate-region-based target detection algorithms such as R-CNN, SPP-Net, Fast R-CNN, Faster R-CNN, Mask R-CNN and TridentNet keep improving detection accuracy, but their complex network structures and staged target tasks make detection slow, so real-time target detection is difficult. Single-stage algorithms such as SSD, DSSD and the YOLO series convert target detection into a regression problem and greatly improve detection speed, though their accuracy is slightly lower. Both kinds of detector achieve remarkable results, mainly thanks to high-performance hardware and powerful GPU computing; but their complex structures, large parameter counts, heavy memory footprint and long training times make them hard to apply to mobile terminals such as smartphones, drones or other inexpensive devices, where the very high latency caused by hardware limitations greatly affects detection speed. The conventional convolution process also wastes resources while increasing computation; some researchers prune the network structure and cut redundant features to obtain a lightweight network, but this reduces detection accuracy, as in lightweight single-stage variants such as YOLO-tiny, whose accuracy drops sharply. Lightweight network structures that have appeared in recent years, such as SqueezeNet, MobileNet and ShuffleNet, maintain the detection frame rate while achieving better accuracy than those lightweight single-stage variants.
Considering that the application scenario is road target detection, high real-time performance is required; although two-stage detection achieves higher accuracy, it is too slow to detect fast-moving vehicles in real time. The invention therefore builds mainly on the YOLO series, selecting the YOLOv3 algorithm for improvement so as to raise the detection effect while keeping practicality.
Example 1:
The embodiment provides a road target detection method, which comprises the following steps:
S1: make a data set for model training and testing, and perform preprocessing and format conversion on the data set;
S2: build a virtual environment for model training and testing; in this embodiment, the PyCharm development tool can be used;
S3: construct the YOLOv3 backbone Darknet53 network structure and the multi-scale feature extraction network structure;
S4: add a cross-stage dense connection structure to the original Darknet53 network structure, improving the fusion of semantic and detail information across feature layers and reducing the loss of image features in the deep network; meanwhile, in YOLOv3 the residual blocks are connected by a conventional convolution with stride 2 for downsampling, and this way of reducing the image size while enlarging the receptive field incurs some feature loss; this embodiment therefore designs a dilated receptive-field module combining dilated convolution and depthwise separable convolution, which obtains a larger receptive field at the same resolution and reduces the feature loss caused by conventional convolution;
S5: train the improved YOLOv3 model, and save the final weight file once the loss function converges;
S6: test the improved model, detect the target classes and mark them with bounding boxes.
In this embodiment, the processing procedure of step S1 is as follows:
S11: this embodiment builds a vehicle-related data set of 17,900 images, comprising part of the COCO (Common Objects in Context) data set, part of the VOC2007 data set, and a self-made data set of real photographs; the selected COCO and VOC2007 images relate to vehicles and pedestrians and mainly cover five categories: car, bus, bicycle, motorcycle and person;
S12: apply image distortion and random rotation preprocessing to the data set, in order to expand it, strengthen the class features of the targets, and facilitate training and optimization of the model;
S13: the label format adopted in this embodiment is the VOC XML format, so the JSON annotations of the COCO data set need to be converted to XML; the self-made data set is annotated with a target detection labeling tool, and the label classes are likewise saved in XML format;
S14: the input feature size adopted in this embodiment is 416x416, so images after preprocessing and format conversion are uniformly resized to 416x416;
S15: the data set is divided 9:1 into a training set and a test set, and 10% of the training set is set aside as a validation set.
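As a concrete illustration of S15, a minimal split helper is sketched below; the 9:1 and 10% ratios come from the text, while the shuffling and the fixed seed are assumptions added for reproducibility.

```python
import random
from typing import List, Tuple

def split_dataset(names: List[str], seed: int = 0) -> Tuple[List[str], List[str], List[str]]:
    """9:1 train/test split, then 10% of the training set held out as validation."""
    rng = random.Random(seed)   # fixed seed: an added assumption
    names = names[:]            # do not mutate the caller's list
    rng.shuffle(names)
    n_train = int(len(names) * 0.9)
    train, test = names[:n_train], names[n_train:]
    n_val = int(len(train) * 0.1)
    val, train = train[:n_val], train[n_val:]
    return train, val, test
```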
In this embodiment, the processing procedure of step S2 is as follows:
S21: the model is trained on a Windows 10 system, with a virtual environment built through Anaconda; PyTorch and the other necessary toolkits are installed in the virtual environment;
S22: to make full use of the available hardware and accelerate model training, matching versions of CUDA and cuDNN need to be installed; this embodiment uses CUDA 10.2 and cuDNN 7.4.5.
In this embodiment, the processing procedure of step S3 is as follows:
S31: the Darknet53 backbone network contains a large number of residual network structures, which effectively alleviate the vanishing-gradient problem caused by deepening the network; the residual network adopted by Darknet53 can be expressed by the formulas:

X1 = σ{β(W1, X)}
X2 = σ{β(W2, X1)}
X3 = X + X2

where X denotes the input features of the residual structure; (W1, X) denotes the convolution of the input features with the weights W1, where W1 has a 1x1 convolution kernel; β denotes the batch normalization operation; σ denotes the nonlinear ReLU activation; (W2, X1) denotes the convolution of X1 with the weights W2, where W2 has a 3x3 convolution kernel; X2 denotes the output features of the trunk of the residual structure; and X3 denotes the final output features of the residual structure;
the residual structure adopted by YOLOv3 first performs a convolution with filter size 1x1 on the input features to reduce the channel count and compress the features, then applies batch normalization and ReLU activation to the convolved features; a second convolution with 3x3 kernels then increases the channel count and expands the features, again followed by batch normalization and ReLU activation; finally, the feature map output by the two convolution operations is Add-superposed with the input features to produce the final output features;
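A minimal PyTorch rendering of this residual unit may make the formulas concrete; it follows the 1x1-compress / 3x3-expand / Add structure described above, and uses plain ReLU as in the formulas (YOLOv3 implementations commonly use LeakyReLU, so treat the activation choice as an assumption).

```python
import torch
import torch.nn as nn

class DarknetResidual(nn.Module):
    """Sketch of the Darknet53 residual unit described in S31."""
    def __init__(self, channels: int):
        super().__init__()
        self.conv1 = nn.Conv2d(channels, channels // 2, 1, bias=False)  # W1: 1x1
        self.bn1 = nn.BatchNorm2d(channels // 2)
        self.conv2 = nn.Conv2d(channels // 2, channels, 3, padding=1, bias=False)  # W2: 3x3
        self.bn2 = nn.BatchNorm2d(channels)
        self.act = nn.ReLU(inplace=True)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        x1 = self.act(self.bn1(self.conv1(x)))   # X1 = sigma{beta(W1, X)}
        x2 = self.act(self.bn2(self.conv2(x1)))  # X2 = sigma{beta(W2, X1)}
        return x + x2                            # X3 = X + X2
```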
S32: the Darknet53 backbone network performs a conventional 3x3 convolution on the input features and then stacks five residual blocks, which contain 1, 2, 8, 8 and 4 of the residual network structures shown in S31, respectively; adjacent residual blocks are connected for downsampling by a convolution with filter size 3x3 and stride 2;
S33: the construction of the multi-scale feature extraction network mainly introduces the idea of the FPN, specifically: for the Darknet53 feature extraction network, the outputs of residual blocks 3, 4 and 5 are taken as the inputs of the multi-scale feature extraction network; the convolutions adopted in this network have sizes 1x1 and 3x3, and upsampling and Concat operations are applied; finally, feature maps of three different scales are output, 13x13, 26x26 and 52x52, to adapt to targets of different sizes and improve the detection of small objects.
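The core of the S33 fusion step, upsampling a deeper map and Concat-joining it with a shallower residual-block output, can be sketched as follows; the channel counts are illustrative assumptions, not values from the text.

```python
import torch
import torch.nn as nn

up = nn.Upsample(scale_factor=2, mode='nearest')  # 13x13 -> 26x26
deep = torch.randn(1, 256, 13, 13)     # branch from residual block 5
shallow = torch.randn(1, 512, 26, 26)  # output of residual block 4
fused = torch.cat([up(deep), shallow], dim=1)  # Concat -> (1, 768, 26, 26)
```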
In this embodiment, the processing procedure of step S4 is as follows:
S41: although the residual network adopted by Darknet53 retains most image features, as the network structure keeps deepening the image still loses some detail features in the deep network; to address this, this embodiment draws on the idea of the DenseNet residual network and designs a cross-stage dense connection structure, strengthening the transfer of image detail features and further improving target localization and recognition;
S42: to guarantee computation speed and model inference time, and considering the deployment of the trained model, this embodiment does not adopt DenseNet's very dense connection pattern, but a sparser dense connection structure whose shortcut residuals span whole residual blocks;
S43: specifically, the feature map after the first convolution of Darknet53 and the output feature maps of residual blocks 1, 2 and 3 are superposed, through convolution operations, onto the inputs of residual blocks 2, 3, 4 and 5 respectively, adding four side residual paths that span residual blocks in total (see the sketch after this list);
S44: in the DenseNet residual network the feature-map size does not change as the network deepens, whereas in Darknet53 it is progressively downsampled; the added side residual paths therefore use a convolution with stride 2 and 3x3 kernels, and the convolved feature maps likewise undergo batch normalization and nonlinear activation;
S45: the superposition differs from the DenseNet residual network: this embodiment adopts Add connections.
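One such side path can be sketched as below; the stride-2 3x3 convolution with batch normalization and nonlinear activation follows S44, while the channel counts and spatial sizes are illustrative assumptions.

```python
import torch
import torch.nn as nn

# Side residual path spanning one residual block (sketch).
side = nn.Sequential(
    nn.Conv2d(64, 128, kernel_size=3, stride=2, padding=1, bias=False),
    nn.BatchNorm2d(128),
    nn.ReLU(inplace=True),
)

x_early = torch.randn(1, 64, 208, 208)   # feature map entering residual block 1
x_main = torch.randn(1, 128, 104, 104)   # main-path input of residual block 2
fused = x_main + side(x_early)           # Add connection across the block
```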
Darknet53 is a network with a complex structure and a large parameter count; to enlarge the receptive field, its residual blocks are connected by downsampling. Although this direct downsampling enlarges the receptive field and helps feature extraction, it loses certain features and is unfavorable for detecting small targets.
Dilated convolution inserts holes into the convolution kernel to expand the receptive field, so that an original 3x3 kernel obtains a 5x5 or even larger receptive field for the same parameter count and computation. Dilated convolution introduces a new hyperparameter into conventional convolution, the dilation rate, which defines the spacing between the values in the convolution kernel; an ordinary convolution has a dilation rate of 1, and when the dilation rate is greater than 1 the receptive field of the dilated convolution becomes larger. The receptive field of a dilated convolution is given by:

n = k + (k - 1) × (d - 1)

where n denotes the receptive field of the dilated convolution, k denotes the size of the convolution kernel, and d denotes the dilation rate.
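The formula is easy to check numerically; the tiny helper below, an illustration rather than part of the patent, evaluates it for a 3x3 kernel at several dilation rates.

```python
def dilated_receptive_field(k: int, d: int) -> int:
    """n = k + (k - 1) * (d - 1): receptive field of one dilated convolution."""
    return k + (k - 1) * (d - 1)

for d in (1, 2, 3):
    n = dilated_receptive_field(3, d)
    print(f"3x3 kernel, dilation {d}: {n}x{n}")
# -> 3x3, 5x5, 7x7
```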
Although dilated convolution can effectively enlarge the receptive field, misused dilated convolution weakens the feature extraction ability of the network. First, replacing conventional convolutions with repeated dilated convolutions of the same specification causes part of the pixel information to be ignored and breaks the continuity of information, which is fatal for pixel-level dense prediction tasks. Second, a dilation rate set too large harms the detection of small targets, while one set too small barely enlarges the receptive field. Based on this, a three-branch dilated receptive-field module, the DP-module, is designed in this embodiment to increase the feature extraction ability of the network and improve the detection of small targets.
The feature output process of the dilated receptive-field module is given by the following formulas, where A^(1×1×(C/2)) denotes a conventional convolution operation with filter size 1x1 and C/2 filters; B^(3×3×C) denotes the dilated depthwise separable convolution operation with filter size 3x3 and C filters; * denotes the convolution operation; and F^(H×W×C) denotes the input feature map with width, height and channel count H, W and C respectively.

The convolution operation of branch 1 is:

F1^(H×W×(C/2)) = F^(H×W×C) * A^(1×1×(C/2))
F2^(H×W×C) = F1^(H×W×(C/2)) * B^(3×3×C)

where F1^(H×W×(C/2)) denotes the output feature map of the conventional convolution and F2^(H×W×C) denotes the output feature map of branch 1.

The convolution operation of branch 2 is:

F3^(H×W×(C/2)) = F^(H×W×C) * A^(1×1×(C/2))
F4^(H×W×C) = F3^(H×W×(C/2)) * B^(3×3×C)

where F3^(H×W×(C/2)) denotes the output feature map of the conventional convolution and F4^(H×W×C) denotes the output feature map of branch 2.

The convolution operation of branch 3 is:

F5^(H×W×(C/2)) = F^(H×W×C) * A^(1×1×(C/2))
F6^(H×W×C) = F5^(H×W×(C/2)) * B^(3×3×C)

where F5^(H×W×(C/2)) denotes the output feature map of the conventional convolution and F6^(H×W×C) denotes the output feature map of branch 3.
Feature fusion is then performed on the feature maps obtained from the three branches, specifically:

F = F2^(H×W×C) ⊕ F4^(H×W×C) ⊕ F6^(H×W×C)

where F denotes the fused feature map and ⊕ denotes the Concat connection operation.
In this embodiment, the downsampling operation is then performed on the fused feature map F. Compared with a direct downsampling convolution, the dilated receptive-field module first adds three branches that perform convolution operations with different dilation rates; because dilated convolution covers a larger receptive field, the module can capture multi-scale information and effectively reduce the loss of feature information. In addition, the dilated convolution operation is performed only once in each branch, avoiding the image-feature loss caused by stacking multiple dilated convolutions; the multi-branch network design and the increasing dilation rates also effectively improve feature extraction for targets of different scales. In this embodiment depthwise separable convolution replaces the conventional convolution operation, greatly reducing computation and parameters, so the dilated receptive-field module occupies only a small amount of computing resources while markedly improving computational accuracy.
In this embodiment, the processing procedure of step S5 is as follows:
S51: the data set is split 9:1 into a training set and a test set, whose picture names are saved in the train.txt and test.txt files respectively; 10% of the training set is selected as the validation set and its picture names are saved in val.txt;
S52: the target information in the VOC-format data set is converted to txt format, extracting the class and position information of each target; the training, validation and test sets are stored in separate txt files.
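A sketch of the S52 conversion is given below; the patent only says that class and position information are extracted, so the exact output line format ('xmin,ymin,xmax,ymax,class_index' per object) is an assumption.

```python
import xml.etree.ElementTree as ET
from typing import List

def voc_xml_to_txt_line(xml_path: str, classes: List[str]) -> str:
    """Extract 'xmin,ymin,xmax,ymax,class_index' entries from one VOC XML file."""
    root = ET.parse(xml_path).getroot()
    boxes = []
    for obj in root.iter('object'):
        name = obj.find('name').text
        if name not in classes:
            continue  # skip categories outside the five used here
        bb = obj.find('bndbox')
        coords = [int(float(bb.find(k).text))
                  for k in ('xmin', 'ymin', 'xmax', 'ymax')]
        boxes.append(','.join(map(str, coords + [classes.index(name)])))
    return ' '.join(boxes)

# Example with the five categories from S11:
# voc_xml_to_txt_line('000001.xml', ['car', 'bus', 'bicycle', 'motorcycle', 'person'])
```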
S53: a project named YOLO-master is created in PyCharm; the paths of the data set's configuration text files are referenced in the project, and the improved algorithm is written in Python within the YOLO-master project, including the forward and backward propagation of the algorithm and the executable files for training, testing and mAP evaluation;
S54: for the improved network structure, the official pre-trained YOLOv3 weights are loaded; weights whose keys match are loaded automatically, and the parts whose keys do not match are loaded manually;
S55: the initialization parameters are passed into the script file, including the image size, the number of target classes, the anchor sizes generated by k-means clustering, and the weight file path;
S56: the prepared project is trained with epochs set to 150 and batch size set to 6; the first 40 epochs freeze the backbone network, and epochs 40-150 train the entire network; training takes 36 hours, and when the loss value converges, training stops and the final weight file is saved.
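The freeze/unfreeze schedule in S56 amounts to toggling requires_grad on the backbone parameters; a minimal helper is sketched below, assuming the backbone is exposed as a module attribute named backbone (the actual attribute name depends on the implementation).

```python
import torch.nn as nn

def set_backbone_frozen(model: nn.Module, frozen: bool) -> None:
    """Freeze (epochs < 40) or unfreeze (epochs 40-150) the backbone."""
    for p in model.backbone.parameters():  # 'backbone' is an assumed attribute
        p.requires_grad = not frozen
```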
In this embodiment, the processing procedure of step S6 is as follows:
S61: the test set from S1 is adopted as the test sample files;
S62: the test set sample files are taken as the input of the improved YOLOv3 network model, the trained weight file is loaded, and the test executable file is run;
S63: the path of the specific picture to be tested is input to obtain the target detection result, with the class, class confidence and position of each target marked in the picture.
Example 2:
This embodiment further verifies a road target detection method, implemented by a YOLOv3-based road target detection algorithm, comprising: data set preprocessing, virtual environment deployment, implementation in Python, model training and model testing.
Specifically, in this embodiment, image distortion and random rotation are applied to the data set for feature enhancement. The training data set comprises part of the Pascal VOC2007 data set, part of the COCO data set and a self-made data set, 19,900 images in total. The data set is also converted into a format convenient for YOLO to process, mainly: the COCO label format is converted to XML, and the self-made data set is likewise annotated in XML; the VOC2007 data set needs no format processing.
In this embodiment, the virtual environment is built through Anaconda and named torch; the torch environment is activated with activate, and the required PyTorch version and other toolkits are installed; to speed up training and make full use of the existing hardware, GPU computing is invoked to accelerate training. PyCharm is used as the IDE: a project is created in PyCharm, and the newly created torch environment is selected in its settings.
In this embodiment, Python can be chosen as the programming language. A main folder named YOLO-trans-master is created in PyCharm, with subfolders named nets, utils, img, model_data, logs and input. nets stores the subprograms of the network structure and the loss function, including the files darknet.py, yolo-training.py and yolo.py: darknet.py implements the improved backbone network structure, yolo.py implements the multi-scale feature extraction network structure, and yolo-training.py implements the loss function calculation. utils stores the data set preprocessing file dataloader.py and the anchor regression utilities util.py; model_data stores the configuration file config.py with the anchor sizes and class count, together with the data and XML label files; logs stores the weight files generated during training; and input stores the detection results and ground-truth used during training and evaluation. In addition, the executable files for training, testing, mAP generation and data set format conversion are placed directly under the main folder and complete their work by calling the classes in the subfolders.
In this embodiment, the training platform can be configured as follows: the operating system is Windows 10, on a computer with an NVIDIA GeForce RTX 2060 (6 GB) GPU and an AMD Ryzen 7 4800H with Radeon Graphics CPU. 150 epochs are trained, the first 40 freezing the backbone and epochs 40-150 training the entire network; the batch size is set to 6 and total training time is 36 hours. The trained model weight file is applied to the mAP executable file to obtain the evaluation results, and to the test executable file to obtain the detection effect.
Example 3:
the embodiment provides a road target detection system, which comprises a data acquisition module and a target detection module;
the data acquisition module configured to: acquiring road related image information;
the object detection module configured to: obtaining a road target detection result according to the acquired image information and a preset road target detection model;
the road target detection model is obtained by a deep network learning method; a dilated receptive-field module combining dilated convolution and depthwise separable convolution is added to the model: when the feature map is downsampled, three branches perform convolution operations with different dilation rates, and the dilated convolution operation is performed only once in each branch.
Example 4:
the present embodiment provides a computer-readable storage medium on which a computer program is stored, which when executed by a processor, implements the steps of the road object detection method described in embodiment 1.
Example 5:
this embodiment provides an electronic device, which includes a memory, a processor, and a computer program stored in the memory and executable on the processor, and when the processor executes the program, the steps of the road object detection method described in embodiment 1 are implemented.
The above description is only a preferred embodiment of the present invention and is not intended to limit it; those skilled in the art can make various modifications and variations. Any modification, equivalent replacement or improvement made within the spirit and principle of the present invention should be included in its protection scope.

Claims (10)

1. A method of detecting a road target, comprising:
acquiring road related image information;
obtaining a road target detection result according to the acquired image information and a preset road target detection model;
the road target detection model is obtained by a deep network learning method; a dilated receptive-field module combining dilated convolution and depthwise separable convolution is added to the model: when the feature map is downsampled, three branches perform convolution operations with different dilation rates, and the dilated convolution operation is performed only once in each branch.
2. The road target detection method according to claim 1, wherein the dilated convolution inserts holes into the convolution kernel to expand the receptive field, increasing the receptive field of the kernel for the same parameter count and computation; and a dilation rate is introduced, which defines the spacing between the values in the convolution kernel.
3. The road target detection method according to claim 1, wherein feature fusion is performed on the feature maps obtained from the three branches, specifically:

F = F2^(H×W×C) ⊕ F4^(H×W×C) ⊕ F6^(H×W×C)

where F denotes the fused feature map and ⊕ denotes the Concat connection operation; and the fused feature map undergoes a conventional convolution operation and is then superposed with the input feature map.
4. The road target detection method according to claim 3, wherein the convolution operations with different dilation rates in the three branches comprise:

the convolution operation of branch 1:

F1^(H×W×(C/2)) = F^(H×W×C) * A^(1×1×(C/2))
F2^(H×W×C) = F1^(H×W×(C/2)) * B^(3×3×C)

where F1^(H×W×(C/2)) denotes the output feature map of the conventional convolution, F2^(H×W×C) denotes the output feature map of branch 1, A^(1×1×(C/2)) denotes a conventional convolution operation with filter size 1x1 and C/2 filters, F^(H×W×C) denotes the input feature map with width, height and channel count H, W and C respectively, and * denotes the convolution operation;

the convolution operation of branch 2:

F3^(H×W×(C/2)) = F^(H×W×C) * A^(1×1×(C/2))
F4^(H×W×C) = F3^(H×W×(C/2)) * B^(3×3×C)

where F3^(H×W×(C/2)) denotes the output feature map of the conventional convolution, F4^(H×W×C) denotes the output feature map of branch 2, and B^(3×3×C) denotes the dilated depthwise separable convolution operation;

the convolution operation of branch 3:

F5^(H×W×(C/2)) = F^(H×W×C) * A^(1×1×(C/2))
F6^(H×W×C) = F5^(H×W×(C/2)) * B^(3×3×C)

where F5^(H×W×(C/2)) denotes the output feature map of the conventional convolution and F6^(H×W×C) denotes the output feature map of branch 3.
5. The road target detection method according to claim 1, wherein for the feature extraction network in the model, the outputs of the 3rd, 4th and 5th residual blocks are taken as the inputs of the feature extraction network; upsampling and Concat operations are applied, and feature maps of three different scales are output.
6. The road target detection method according to claim 1, wherein in the model a cross-stage dense connection structure is added on the basis of the original Darknet53 network structure.
7. The road target detection method according to claim 1, wherein the image information includes vehicle information and pedestrian information; and the image information used for training the road target detection model is preprocessed with image distortion and random rotation.
8. A road target detection system is characterized by comprising a data acquisition module and a target detection module;
the data acquisition module configured to: acquiring road related image information;
the object detection module configured to: obtaining a road target detection result according to the acquired image information and a preset road target detection model;
the road target detection model is obtained by a deep network learning method; a dilated receptive-field module combining dilated convolution and depthwise separable convolution is added to the model: when the feature map is downsampled, three branches perform convolution operations with different dilation rates, and the dilated convolution operation is performed only once in each branch.
9. A computer-readable storage medium, on which a computer program is stored which, when being executed by a processor, carries out the steps of the road object detection method according to any one of claims 1 to 7.
10. An electronic device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, characterized in that the steps of the road object detection method according to any of claims 1-7 are implemented when the processor executes the program.
CN202111447972.1A 2021-11-30 2021-11-30 Road target detection method and system Pending CN114119965A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111447972.1A CN114119965A (en) 2021-11-30 2021-11-30 Road target detection method and system

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202111447972.1A CN114119965A (en) 2021-11-30 2021-11-30 Road target detection method and system

Publications (1)

Publication Number Publication Date
CN114119965A true CN114119965A (en) 2022-03-01

Family

ID=80369021

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111447972.1A Pending CN114119965A (en) 2021-11-30 2021-11-30 Road target detection method and system

Country Status (1)

Country Link
CN (1) CN114119965A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115223017A (en) * 2022-05-31 2022-10-21 昆明理工大学 Multi-scale feature fusion bridge detection method based on depth separable convolution
CN115223017B (en) * 2022-05-31 2023-12-19 昆明理工大学 Multi-scale feature fusion bridge detection method based on depth separable convolution


Legal Events

PB01 Publication
SE01 Entry into force of request for substantive examination