WO2022083784A1 - A road detection method based on the Internet of Vehicles - Google Patents

A road detection method based on the Internet of Vehicles (一种基于车联网的道路检测方法)

Info

Publication number
WO2022083784A1
WO2022083784A1 PCT/CN2021/130684 CN2021130684W WO2022083784A1 WO 2022083784 A1 WO2022083784 A1 WO 2022083784A1 CN 2021130684 W CN2021130684 W CN 2021130684W WO 2022083784 A1 WO2022083784 A1 WO 2022083784A1
Authority
WO
WIPO (PCT)
Prior art keywords
network
yolov3
feature
module
target
Application number
PCT/CN2021/130684
Other languages
English (en)
French (fr)
Inventor
刘晨
左瑜
Original Assignee
西安科锐盛创新科技有限公司
Application filed by 西安科锐盛创新科技有限公司
Priority to US17/564,524 (published as US20230154202A1)
Publication of WO2022083784A1

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06V - IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00 - Scenes; Scene-specific elements
    • G06V20/50 - Context or environment of the image
    • G06V20/56 - Context or environment of the image exterior to a vehicle by using sensors mounted on the vehicle
    • G06V20/588 - Recognition of the road, e.g. of lane markings; Recognition of the vehicle driving pattern in relation to the road
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 - Pattern recognition
    • G06F18/20 - Analysing
    • G06F18/24 - Classification techniques
    • G06F18/241 - Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 - Pattern recognition
    • G06F18/20 - Analysing
    • G06F18/25 - Fusion techniques
    • G06F18/253 - Fusion techniques of extracted features
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 - Computing arrangements based on biological models
    • G06N3/02 - Neural networks
    • G06N3/04 - Architecture, e.g. interconnection topology
    • G06N3/045 - Combinations of networks
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 - Computing arrangements based on biological models
    • G06N3/02 - Neural networks
    • G06N3/08 - Learning methods
    • G06N3/082 - Learning methods modifying the architecture, e.g. adding, deleting or silencing nodes or connections
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N5/00 - Computing arrangements using knowledge-based models
    • G06N5/02 - Knowledge representation; Symbolic representation
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06V - IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 - Arrangements for image or video recognition or understanding
    • G06V10/20 - Image preprocessing
    • G06V10/255 - Detecting or recognising potential candidate objects based on visual cues, e.g. shapes
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06V - IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 - Arrangements for image or video recognition or understanding
    • G06V10/40 - Extraction of image or video features
    • G06V10/44 - Local feature extraction by analysis of parts of the pattern, e.g. by detecting edges, contours, loops, corners, strokes or intersections; Connectivity analysis, e.g. of connected components
    • G06V10/443 - Local feature extraction by analysis of parts of the pattern, e.g. by detecting edges, contours, loops, corners, strokes or intersections; Connectivity analysis, e.g. of connected components by matching or filtering
    • G06V10/449 - Biologically inspired filters, e.g. difference of Gaussians [DoG] or Gabor filters
    • G06V10/451 - Biologically inspired filters, e.g. difference of Gaussians [DoG] or Gabor filters with interaction between the filter responses, e.g. cortical complex cells
    • G06V10/454 - Integrating the filters into a hierarchical structure, e.g. convolutional neural networks [CNN]
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06V - IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 - Arrangements for image or video recognition or understanding
    • G06V10/40 - Extraction of image or video features
    • G06V10/46 - Descriptors for shape, contour or point-related descriptors, e.g. scale invariant feature transform [SIFT] or bags of words [BoW]; Salient regional features
    • G06V10/462 - Salient features, e.g. scale invariant feature transforms [SIFT]
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06V - IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 - Arrangements for image or video recognition or understanding
    • G06V10/70 - Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/74 - Image or video pattern matching; Proximity measures in feature spaces
    • G06V10/75 - Organisation of the matching processes, e.g. simultaneous or sequential comparisons of image or video features; Coarse-fine approaches, e.g. multi-scale approaches; using context analysis; Selection of dictionaries
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06V - IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 - Arrangements for image or video recognition or understanding
    • G06V10/70 - Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/762 - Arrangements for image or video recognition or understanding using pattern recognition or machine learning using clustering, e.g. of similar faces in social networks
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06V - IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 - Arrangements for image or video recognition or understanding
    • G06V10/70 - Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/77 - Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
    • G06V10/80 - Fusion, i.e. combining data from various sources at the sensor level, preprocessing level, feature extraction level or classification level
    • G06V10/806 - Fusion, i.e. combining data from various sources at the sensor level, preprocessing level, feature extraction level or classification level of extracted features
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06V - IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 - Arrangements for image or video recognition or understanding
    • G06V10/70 - Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/82 - Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06V - IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 - Arrangements for image or video recognition or understanding
    • G06V10/94 - Hardware or software architectures specially adapted for image or video understanding
    • G06V10/95 - Hardware or software architectures specially adapted for image or video understanding structured as a network, e.g. client-server architectures
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06V - IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00 - Scenes; Scene-specific elements
    • G06V20/50 - Context or environment of the image
    • G06V20/56 - Context or environment of the image exterior to a vehicle by using sensors mounted on the vehicle
    • G06V20/58 - Recognition of moving objects or obstacles, e.g. vehicles or pedestrians; Recognition of traffic objects, e.g. traffic signs, traffic lights or roads
    • H - ELECTRICITY
    • H04 - ELECTRIC COMMUNICATION TECHNIQUE
    • H04L - TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L12/00 - Data switching networks
    • H04L12/28 - Data switching networks characterised by path configuration, e.g. LAN [Local Area Networks] or WAN [Wide Area Networks]
    • H04L12/40 - Bus networks
    • H - ELECTRICITY
    • H04 - ELECTRIC COMMUNICATION TECHNIQUE
    • H04L - TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L67/00 - Network arrangements or protocols for supporting network services or applications
    • H04L67/01 - Protocols
    • H04L67/12 - Protocols specially adapted for proprietary or special-purpose networking environments, e.g. medical networks, sensor networks, networks in vehicles or remote metering networks
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06V - IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V2201/00 - Indexing scheme relating to image or video recognition or understanding
    • G06V2201/08 - Detecting or categorising vehicles
    • H - ELECTRICITY
    • H04 - ELECTRIC COMMUNICATION TECHNIQUE
    • H04L - TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L12/00 - Data switching networks
    • H04L12/28 - Data switching networks characterised by path configuration, e.g. LAN [Local Area Networks] or WAN [Wide Area Networks]
    • H04L12/40 - Bus networks
    • H04L2012/40267 - Bus for use in transportation systems
    • H04L2012/40273 - Bus for use in transportation systems the transportation system being a vehicle

Definitions

  • the invention belongs to the field of image detection, and in particular relates to a road detection method and vehicle-mounted electronic equipment based on the Internet of Vehicles.
  • the embodiments of the present invention provide a road detection method based on the Internet of Vehicles and an in-vehicle electronic device.
  • the specific technical solutions are as follows:
  • an embodiment of the present invention provides a road detection method based on the Internet of Vehicles, which is applied to a vehicle terminal, including:
  • acquiring the target road image captured by the image acquisition end; inputting the target road image into the pre-trained improved YOLOv3 network, and using the densely connected backbone network to perform feature extraction on the target road image to obtain x feature maps of different scales; x is a natural number greater than or equal to 4;
  • the improved FPN network is used to perform top-down, densely connected feature fusion on the x feature maps of different scales, and the prediction results corresponding to each scale are obtained; based on all prediction results, the attribute information of the target road image is obtained, and the attribute information includes the position and category of the target in the target road image;
  • the improved YOLOv3 network includes the densely connected backbone network and the improved FPN network; the improved YOLOv3 network is formed on the basis of the YOLOv3 network by replacing the residual modules in the backbone network with dense connection modules, increasing the feature extraction scales, optimizing the feature fusion method of the FPN network, and performing pruning combined with knowledge-distillation-guided network recovery.
  • an embodiment of the present invention provides an in-vehicle electronic device, including a processor, a communication interface, a memory, and a communication bus, wherein the processor, the communication interface, and the memory communicate with each other through the communication bus;
  • the processor is configured to implement the steps of any one of the road detection methods based on the Internet of Vehicles provided in the first aspect when executing the program stored in the memory.
  • the residual module in the backbone network of the YOLOv3 network is replaced with a dense connection module, and the feature fusion mode is changed from parallel to serial, so that when the backbone network performs feature extraction, the early feature maps can be directly used as the input of each subsequent layer; the resulting feature maps carry more information, which strengthens feature propagation, so the detection accuracy can be improved when the target road image is detected.
  • the number of parameters and the amount of computation can be reduced by multiplexing the feature map parameters of the shallow network.
  • using multiple feature extraction scales to add fine-grained feature extraction scales for small targets can improve the detection accuracy of small targets in target road images.
  • the feature fusion method of the FPN network is changed: the feature maps extracted by the backbone network are fused in a top-down, densely connected manner, and the deep features are directly upsampled by different multiples so that all transferred feature maps have the same size.
  • in this way, high-dimensional semantic information also participates in the shallow branches, which helps improve the detection accuracy; at the same time, by directly receiving the features of the shallower network, more concrete features can be obtained, which effectively reduces the loss of features, reduces the amount of parameters that need to be calculated, improves the detection speed, and realizes real-time detection.
  • the network volume can be reduced and most of the redundant computation can be eliminated, which can greatly improve the detection speed while maintaining the detection accuracy.
  • the invention deploys the cloud-side detection process on edge devices with very limited storage and computing resources, so the vehicle-mounted device can realize road detection beyond the line of sight and achieve high-precision, highly real-time detection of targets on the road, which is beneficial to the driver's safe driving.
  • FIG. 1 is a schematic flowchart of a road detection method based on the Internet of Vehicles provided by an embodiment of the present invention
  • FIG. 2 is a schematic structural diagram of a YOLOv3 network in the prior art
  • FIG. 3 is a schematic structural diagram of an improved YOLOv3 network provided by an embodiment of the present invention.
  • FIG. 4 is a schematic structural diagram of a transition module provided by an embodiment of the present invention.
  • Figure 5-1 is a comparison diagram of mAP curves between YOLOv3 and Dense-YOLO-1 of an embodiment of the present invention
  • Figure 5-2 is a comparison diagram of loss curves of YOLOv3 and Dense-YOLO-1 of an embodiment of the present invention
  • Figure 6-1 is a comparison diagram of mAP curves of Dense-YOLO-1 and MultiScale-YOLO-1 according to an embodiment of the present invention;
  • Figure 6-2 is a comparison diagram of loss curves of Dense-YOLO-1 and MultiScale-YOLO-1 according to an embodiment of the present invention;
  • Figure 7-1 is a comparison diagram of mAP curves of Dense-YOLO-1 and Dense-YOLO-2 according to an embodiment of the present invention;
  • Figure 7-2 is a comparison diagram of loss curves of Dense-YOLO-1 and Dense-YOLO-2 according to an embodiment of the present invention;
  • Figure 8-1 is a comparison diagram of mAP curves of Dense-YOLO-1 and MultiScale-YOLO-2 according to an embodiment of the present invention;
  • Figure 8-2 is a comparison diagram of loss curves of Dense-YOLO-1 and MultiScale-YOLO-2 according to an embodiment of the present invention;
  • FIG. 9-1 is a weight shift diagram of parameter combination 5 selected in an embodiment of the present invention
  • FIG. 9-2 is a weight overlap diagram of parameter combination 5 selected in an embodiment of the present invention
  • FIG. 10 is a performance comparison diagram of an improved YOLOv3 network (YOLO-Terse) and a YOLOv3 network according to an embodiment of the present invention
  • FIG. 11 is a schematic structural diagram of an in-vehicle electronic device according to an embodiment of the present invention.
  • the embodiments of the present invention provide a road detection method based on the Internet of Vehicles and an in-vehicle electronic device.
  • the execution subject of the road detection method based on the Internet of Vehicles may be a road detection device based on the Internet of Vehicles, and the road detection device based on the Internet of Vehicles may run in the in-vehicle electronic equipment.
  • the in-vehicle electronic device may be a plug-in in an image processing tool, or a program independent of an image processing tool, which is of course not limited thereto.
  • an embodiment of the present invention provides a road detection method based on the Internet of Vehicles.
  • the road detection method based on the Internet of Vehicles will be introduced first.
  • a road detection method based on the Internet of Vehicles provided by an embodiment of the present invention, applied to a vehicle terminal, may include the following steps:
  • the target road image is an image of the road area captured by the image acquisition device at the image acquisition end.
  • the image collection terminal can be other vehicles, pedestrians, road facilities, service platforms, etc. that are connected to the current vehicle through the Internet of Vehicles technology.
  • the image acquisition terminal can also be elevated roadside facilities such as street light poles and overpasses, or flying equipment such as drones; image acquisition devices are deployed on these image acquisition terminals.
  • the image capturing device may include a still camera, a video camera, a camera module, a mobile phone, etc.; in an optional embodiment, the image capturing device may be a high-resolution camera.
  • the image acquisition device can continuously collect road images of the corresponding area at certain time intervals, such as shooting at a rate of 30fps, and the image acquisition terminal where it is located transmits the collected road images to the corresponding vehicle.
  • the time interval can also be adjusted according to the density of objects on the road or according to demand.
  • a major problem in the Internet of Vehicles is the beyond-line-of-sight problem: the driver's sight distance is limited when driving on the road, and road conditions beyond the sight distance cannot be observed with the naked eye, especially when there are large vehicles ahead or at intersections, where the sight distance is even more limited.
  • the Internet of Vehicles needs to solve the problem of beyond the line of sight, so that drivers can obtain information on road conditions beyond the line of sight and adjust their driving plans as soon as possible.
  • the size of the target road image is 416×416×3. Therefore, in this step, in one embodiment, the vehicle-mounted terminal can directly obtain a target road image with a size of 416×416×3 from the image acquisition terminal; in another embodiment, the vehicle-mounted terminal can obtain an image of any size sent by the image acquisition terminal and then scale it to obtain a target road image with a size of 416×416×3.
  • image enhancement operations such as cropping, splicing, smoothing, filtering, and edge filling can also be performed on the acquired image, so as to enhance the features of interest in the image and improve the generalization ability of the data set.
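  • as a rough illustration of this preprocessing step, the following Python sketch (assuming OpenCV and NumPy are available; the plain resize and normalization shown are illustrative choices, not the patent's prescribed pipeline) scales an arbitrary road image to the 416×416×3 input size:

```python
import cv2
import numpy as np

def prepare_target_road_image(image_bgr: np.ndarray) -> np.ndarray:
    """Scale a road image of arbitrary size to the 416x416x3 network input.

    A plain resize is used here; letterboxing or the enhancement operations
    mentioned above (cropping, smoothing, edge filling) could be added as needed."""
    resized = cv2.resize(image_bgr, (416, 416), interpolation=cv2.INTER_LINEAR)
    # Normalize to [0, 1] and reorder to channels-first (3, 416, 416) for the detector.
    return np.transpose(resized.astype(np.float32) / 255.0, (2, 0, 1))
```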
  • the network structure of the YOLOv3 network in the prior art is introduced.
  • the part inside the dashed box is the YOLOv3 network.
  • the part in the dotted line box is the backbone network of the YOLOv3 network, that is, the darknet-53 network; the backbone network of the YOLOv3 network is composed of a CBL module and five resn modules connected in series.
  • the CBL module is a convolutional network module, including a serially connected convolutional (conv) layer, a BN (Batch Normalization) layer, and a Leaky ReLU activation layer; CBL stands for conv + BN + Leaky ReLU.
  • the resn module is a residual module, where n represents a natural number, as shown in Figure 2.
  • the resn module includes a serially connected zero padding layer, a CBL module, and a residual unit group.
  • the residual unit group is represented by Res unit*n, which means that it includes n residual units (Res unit), and each residual unit uses the connection form of the residual network (ResNet).
  • the rest of the network is the FPN (Feature Pyramid Network), which is divided into three prediction branches Y1~Y3; the scales of the prediction branches Y1~Y3 correspond one-to-one to the scales of the feature maps output by the three residual modules res4, res8, and res8 counted along the reverse of the input direction.
  • the prediction results of each prediction branch are represented by Y1, Y2, and Y3, respectively, and the scales of Y1, Y2, and Y3 increase in sequence.
  • Each prediction branch of the FPN network includes a convolutional network module group, specifically including five convolutional network modules, namely CBL*5 in Figure 2.
  • the US (up sampling) module is an upsampling module; concat indicates that feature fusion adopts a cascade method (concat is short for concatenate).
  • the improved YOLOv3 network includes a densely connected backbone network and an improved FPN network; the improved YOLOv3 network is formed on the basis of the YOLOv3 network by replacing the residual modules in the backbone network with dense connection modules, adding feature extraction scales, optimizing the feature fusion method of the FPN network, and performing pruning combined with knowledge distillation to guide network recovery; the improved YOLOv3 network is obtained by training based on sample road images and the positions and categories of the targets corresponding to the sample road images. The network training process will be introduced later.
  • the structure of the improved YOLOv3 network is first introduced below, first of all, its backbone network part.
  • FIG. 3 is a schematic diagram of the structure of the improved YOLOv3 network provided by the embodiment of the present invention; the portion inside the dot-dash frame in FIG. 3 is the backbone network of the improved YOLOv3 network.
  • compared with the backbone network of the YOLOv3 network, one improvement idea of the backbone network of the improved YOLOv3 network provided by the embodiment of the present invention is that, drawing on the connection method of the dense convolutional network DenseNet, a specific dense connection module is proposed to replace the residual module (resn module) in the backbone network of the YOLOv3 network. That is, the backbone network of the improved YOLOv3 network adopts the backbone network in the form of dense connection. It is known that ResNets combine features by summation before passing the features to the next layer, that is, feature fusion is performed in a parallel manner.
  • in the dense connection method, all layers (with matching feature map sizes) are directly connected to each other in order to ensure that information flows between layers in the network to the greatest extent. Specifically, for each layer, all feature maps of its previous layers are used as its input, and its own feature maps are used as the input of all subsequent layers, that is, feature fusion adopts a cascade (concatenation) method. Therefore, compared with the residual module used in the YOLOv3 network, the improved YOLOv3 network can obtain feature maps with more information by switching to the dense connection module, which enhances feature propagation and improves detection accuracy when performing road image detection.
  • the feature maps are transferred from shallow to deep, and feature maps of at least four scales are extracted, so that the network can detect objects of different scales.
  • the detection accuracy can be improved for small targets.
  • the small targets in the embodiments of the present invention include objects with small volumes on the road, such as road signs, small obstacles, small animals, etc., or objects with small areas in the image due to the long shooting distance.
  • the backbone network in the form of dense connection may include:
  • densely connected modules and transition modules connected in series alternately; the densely connected modules are denoted as denm in Figure 3.
  • the number of densely connected modules is y; the densely connected modules include serially connected convolutional network modules and densely connected unit groups; the convolutional network modules include serially connected convolutional layers, BN layers, and Leaky relu layers; densely connected unit groups It includes m densely connected units; each densely connected unit includes multiple convolutional network modules connected in a densely connected form, and uses a cascaded method to fuse the feature maps output by multiple convolutional network modules; where y is greater than or equal to 4 , and m is a natural number greater than 1.
  • the number of densely connected modules in Figure 3 is 5.
  • the improved YOLOv3 network composed of 5 densely connected modules has higher accuracy.
  • the convolutional network module is denoted as CBL as before; the densely connected unit group is denoted as den unit*m, which means that the densely connected unit group includes m densely connected units, and m can be 2.
  • Each dense connection unit is represented as a den unit; it includes multiple convolutional network modules connected in the form of dense connections, and uses a cascaded method to fuse the feature maps output by multiple convolutional network modules.
  • the cascade method is concat, which means tensor splicing; this operation is different from the add operation in the residual module: concat expands the dimension of the tensor, while add simply adds directly and does not change the dimension of the tensor.
  • the dense connection module is used to change the feature fusion method from parallel to serial, which allows the early feature maps to be used directly as the input of each subsequent layer to strengthen feature propagation, and reduces the number of parameters and the amount of computation by reusing the feature map parameters of the shallow network.
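  • for illustration, a minimal PyTorch sketch of the CBL module and a dense connection unit (den unit) is given below; the channel widths, growth rate, and layer count are assumptions for the example only, not the exact configuration of the embodiment:

```python
import torch
import torch.nn as nn

class CBL(nn.Module):
    """Convolutional network module: conv + BN + Leaky ReLU."""
    def __init__(self, in_ch: int, out_ch: int, k: int = 3, s: int = 1):
        super().__init__()
        self.block = nn.Sequential(
            nn.Conv2d(in_ch, out_ch, k, s, padding=k // 2, bias=False),
            nn.BatchNorm2d(out_ch),
            nn.LeakyReLU(0.1, inplace=True),
        )

    def forward(self, x):
        return self.block(x)

class DenseUnit(nn.Module):
    """den unit: each CBL receives the concatenation of all earlier feature maps,
    and the unit's output is the concatenation of its input and every layer output."""
    def __init__(self, in_ch: int, growth: int = 32, num_layers: int = 2):
        super().__init__()
        self.layers = nn.ModuleList(
            CBL(in_ch + i * growth, growth) for i in range(num_layers)
        )

    def forward(self, x):
        features = [x]
        for layer in self.layers:
            features.append(layer(torch.cat(features, dim=1)))  # cascade (concat) fusion
        return torch.cat(features, dim=1)
```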
  • the backbone network in the form of dense connection extracts feature maps of at least 4 scales for the feature fusion of the subsequent prediction branches. Therefore, the number y of dense connection modules is greater than or equal to 4, so that the feature maps it outputs can be correspondingly fused into each prediction branch. It can be seen that, compared with the YOLOv3 network, the improved YOLOv3 network adds at least one finer-grained feature extraction scale to the backbone network. Please refer to Figure 3: compared with the YOLOv3 network, the feature map output by the fourth dense connection module along the reverse of the input direction is added for subsequent feature fusion.
  • the backbone network in the form of dense connection outputs corresponding feature maps respectively along the input reverse direction of the four dense connection modules, and the scales of these four feature maps increase in turn.
  • the scales of the feature maps are 13×13×72, 26×26×72, 52×52×72, and 104×104×72, respectively.
  • five feature extraction scales may also be set, that is, the feature map output by the fifth dense connection module in the reverse direction of the input is added to perform subsequent feature fusion, and so on.
  • in step S2, x feature maps of different scales are obtained, including:
  • the feature maps respectively output by the first dense connection module to the fourth dense connection module along the input reverse direction are obtained, and the size of these four feature maps increases in turn.
  • the transition module is a convolutional network module. That is to use the CBL module as a transition module. Then, when building the backbone network of the improved YOLOv3 network, it is only necessary to replace the residual module with a dense connection module, and then connect the dense connection module and the original CBL module in series. In this way, the network construction process will be faster, and the resulting network structure will be simpler.
  • such a transition module only uses the convolutional layer for transition, that is, the feature map is dimensionally reduced directly by increasing the stride. This can only attend to local area features and cannot combine the information of the entire image, so much information in the feature map will be lost.
  • alternatively, the transition module includes a convolutional network module and a maximum pooling layer; the input of the convolutional network module and the input of the maximum pooling layer are shared, and the feature map output by the convolutional network module and the feature map output by the maximum pooling layer are fused in a cascaded manner.
  • FIG. 4 is a schematic structural diagram of a transition module according to an embodiment of the present invention.
  • the transition module is represented by a tran module, and the MP layer is a maximum pooling layer (Maxpool, abbreviated MP, meaning maximum pooling).
  • the step size of the MP layer can be selected to be 2.
  • the introduced MP layer can reduce the dimension of the feature map with a larger receptive field; it uses relatively few parameters, so the amount of computation does not increase much, which can reduce the possibility of overfitting and improve the generalization ability of the network model; combined with the original CBL module, this can be regarded as dimensionality reduction of feature maps from different receptive fields, so more information can be retained.
  • the number of convolutional network modules included in the transition module is two or three, and the convolutional network modules are connected in series. Compared with using one convolutional network module, using two or three convolutional network modules in series can increase the complexity of the model and fully extract features.
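  • a minimal PyTorch sketch of such a transition module (tran), reusing the CBL module from the sketch above, might look as follows; the use of two serial CBLs and a stride-2 max pooling layer follows the description here, while the channel numbers are illustrative:

```python
import torch
import torch.nn as nn

class Transition(nn.Module):
    """tran module: a serial CBL branch and a stride-2 max-pooling branch share the
    same input; both halve the spatial size and their outputs are concatenated."""
    def __init__(self, in_ch: int, out_ch: int):
        super().__init__()
        self.cbl = nn.Sequential(           # two convolutional network modules in series
            CBL(in_ch, out_ch, k=1, s=1),
            CBL(out_ch, out_ch, k=3, s=2),  # learned downsampling by stride 2
        )
        self.mp = nn.MaxPool2d(kernel_size=2, stride=2)  # MP layer, step size 2

    def forward(self, x):
        return torch.cat([self.cbl(x), self.mp(x)], dim=1)
```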
  • the improved FPN network includes x prediction branches Y1~Yx whose scales increase in turn; the scales of the prediction branches Y1~Yx correspond one-to-one with the scales of the x feature maps; exemplarily, the improved FPN network of FIG. 3 has 4 prediction branches Y1~Y4, whose scales correspond to the scales of the aforementioned 4 feature maps, respectively.
  • the improved FPN network is used to perform top-down, densely connected feature fusion on feature maps of x different scales, including:
  • for each prediction branch, the feature map after convolution processing is cascaded and fused with the feature maps obtained after upsampling by the prediction branches Yi-1 to Y1, respectively;
  • the sizes of the three feature maps are the same, namely 52×52×72. In this way, the prediction branch Y3 can continue to perform convolution and other processing after the cascade fusion to obtain the prediction result Y3, and the size of Y3 is 52×52×72.
  • as for the prediction branch Y1, it obtains the feature map output by the first dense connection module along the reverse of the input direction and then performs the subsequent prediction process by itself, and does not accept feature maps from the remaining prediction branches for fusion.
  • in the feature fusion method of the FPN network of the original YOLOv3 network, the deep and shallow network features are first added and then upsampled together; after the features are added, feature maps must be extracted through a convolutional layer, and such operations destroy some of the original feature information.
  • the feature fusion combines the horizontal method and the top-down dense connection method.
  • the original top-down method is changed so that each smaller-scale prediction branch transfers its own features to every larger-scale prediction branch, and the feature fusion method becomes a dense fusion method, that is, the deep features are directly upsampled by different multiples so that all transferred feature maps have the same size.
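  • the following sketch illustrates this dense top-down fusion idea in PyTorch; the per-branch convolution processing before and after the fusion is omitted, and nearest-neighbour interpolation is an assumed choice of upsampling:

```python
import torch
import torch.nn.functional as F

def dense_top_down_fusion(branch_maps):
    """branch_maps: feature maps of the prediction branches Y1..Yx, ordered from the
    deepest, smallest-scale branch (Y1) to the shallowest, largest-scale branch (Yx).

    Y1 keeps only its own features; every other branch Yi is concatenated with all
    deeper maps Y1..Y(i-1), each directly upsampled by the multiple needed to reach
    Yi's spatial size."""
    fused = [branch_maps[0]]
    for i in range(1, len(branch_maps)):
        target_hw = branch_maps[i].shape[-2:]
        upsampled = [
            F.interpolate(branch_maps[j], size=target_hw, mode="nearest")
            for j in range(i)
        ]
        fused.append(torch.cat([branch_maps[i], *upsampled], dim=1))
    return fused
```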
  • each prediction branch mainly uses some convolution operations to perform prediction.
  • for details, reference may be made to the related prior art, which will not be described here.
  • the above-mentioned top-down and dense connection mode feature fusion can be respectively adopted for the improved YOLOv3 network that adopts two different forms of transition modules.
  • this step is implemented in a modified YOLOv3 network employing the transition module shown in FIG. 4 .
  • the improved YOLOv3 network refers to the network obtained in Figure 3 combined with Figure 4.
  • the four prediction branches output feature maps of four scales, which are 13×13×72, 26×26×72, 52×52×72, and 104×104×72, respectively.
  • the smallest 13×13×72 feature map has the largest receptive field and is suitable for detecting larger targets;
  • the medium 26×26×72 feature map has a medium receptive field and is suitable for detecting medium-sized targets;
  • the large 52×52×72 feature map has a smaller receptive field and is suitable for detecting smaller targets;
  • the largest 104×104×72 feature map has an even smaller receptive field and is therefore suitable for detecting even smaller targets.
  • it can be seen that the embodiment of the present invention divides images more finely, and the prediction results are more targeted for objects of smaller sizes.
  • Network training is done in the server, and network training can include three processes: network pre-training, network pruning, and network fine-tuning. Specifically, the following steps may be included:
  • each sample road image is marked in the form of a target frame containing the target, this target frame is true and accurate, and each target frame is marked with coordinate information to reflect the target's position in the image.
  • determining the anchor box size in the sample road image may include the following steps:
  • the anchor boxes are several boxes of different sizes obtained by statistics or clustering from the ground truth in the training set; the anchor boxes actually constrain the range of the predicted objects and add a size prior, so as to achieve the purpose of multi-scale learning.
  • regarding the anchor boxes, since it is desired to add a finer-grained feature extraction scale, a clustering method needs to be used to cluster the sizes of the target frames (that is, the real frames) already marked in the sample road images, so as to obtain anchor box sizes suitable for the scene of the embodiment of the present invention.
  • determining the number of clusters used to cluster the anchor box sizes in the sample road images includes:
  • This step is actually to obtain the size of each target frame in the sample road image.
  • the size of each target box may be clustered using the K-Means clustering method to obtain a clustering result of the anchor box size; the clustering process will not be repeated here.
  • for different anchor boxes, the distance is defined as the Euclidean distance of their widths and heights: d(1,2) = sqrt((w1 - w2)^2 + (h1 - h2)^2), where d(1,2) represents the distance between two anchor boxes, w1 and w2 represent the widths of the anchor boxes, and h1 and h2 represent the heights of the anchor boxes.
  • the clustering results of the anchor box size can be: (13,18), (20,27), (26,40), (38,35), (36,61), (56 ,45), (52,89), (70,61), (85,89), (69,155), (127,112), (135,220). specific:
  • Anchor box size for prediction branch Y 1 (69,155), (127,112), (135,220);
  • Anchor box size for prediction branch Y 2 (52,89), (70,61), (85,89);
  • Anchor box size for prediction branch Y 3 (38,35), (36,61), (56,45);
  • Anchor box size for prediction branch Y 4 (13,18), (20,27), (26,40);
  • the clustering result is written into the configuration file of each prediction branch of the road image detection network according to the anchor box size corresponding to different prediction branches, and then the network can be pre-trained.
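  • as an illustration of this clustering step, a minimal NumPy sketch is shown below; it is a plain K-Means on (width, height) pairs with the Euclidean distance defined above, and the number of clusters, iteration count, and the split of the results across prediction branches are illustrative assumptions:

```python
import numpy as np

def cluster_anchor_sizes(box_wh: np.ndarray, k: int = 12, iters: int = 100, seed: int = 0):
    """K-Means over (width, height) pairs of the labeled target frames, using the
    Euclidean distance d = sqrt((w1 - w2)^2 + (h1 - h2)^2) defined above.

    box_wh has shape (N, 2); the k cluster centres are returned sorted by area so
    they can be split three per prediction branch."""
    rng = np.random.default_rng(seed)
    centers = box_wh[rng.choice(len(box_wh), size=k, replace=False)].astype(float)
    for _ in range(iters):
        dist = np.linalg.norm(box_wh[:, None, :] - centers[None, :, :], axis=2)
        assign = dist.argmin(axis=1)
        for j in range(k):
            members = box_wh[assign == j]
            if len(members):
                centers[j] = members.mean(axis=0)
    order = np.argsort(centers[:, 0] * centers[:, 1])
    return centers[order].round().astype(int)
```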
  • pre-training the built network including the following steps:
  • the residual module in the backbone network is changed to a dense connection module, the feature extraction scale is increased, and the feature fusion method of the FPN network is optimized.
  • layer pruning is performed on the dense connection modules of the backbone network to obtain the YOLOv3-1 network;
  • channel pruning is performed directly during the simplified processing of the YOLOv3 network, but the inventor found in experiments that it is still difficult to achieve rapid speed improvement only through channel pruning. Therefore, a layer pruning process is added before channel pruning.
  • this step can perform layer pruning on the dense connection modules of the backbone network in the aforementioned complex network, that is, perform layer pruning on the number m of dense connection units in each dense connection module, reduce m to 2, and obtain the YOLOv3-1 network.
  • the YOLOv3-1 network is sparsely trained to obtain a YOLOv3-2 network with sparse distribution of BN layer scaling coefficients; it may include:
  • the YOLOv3-1 network is sparsely trained. During the training process, sparse regularization is added to the scaling factor ⁇ .
  • the loss function of sparse training is: L = Σ_(x,y) l(f(x, W), y) + λ Σ_(γ∈Γ) |γ|, where the first term is the original training loss of the network over inputs x and labels y with weights W, the second term is an L1 sparsity penalty on the BN-layer scaling factors γ, and λ balances the two terms.
  • the application scenario of the embodiment of the present invention is a road target detection scenario, and the number of target types to be detected can be set to 13, which is far less than the 80 in the original YOLOv3 data set. Therefore, a larger value of λ can be selected, and the convergence speed of sparse training will not be very slow; at the same time, convergence can be further accelerated by increasing the learning rate of the model. However, if these parameters are selected too large, a certain loss of network model accuracy will result.
  • the combination of a learning rate of 0.25× and a λ of 0.1× is finally determined as the preferred parameter combination for sparse training.
  • the preferred combination of learning rate and weight in the embodiment of the present invention is more favorable for the distribution of the weights after sparse training, and the accuracy of the network model is also higher.
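  • one common way to implement such sparse regularization is sketched below, under the assumption that the penalty is an L1 term on the BN scaling factors applied as a subgradient after the ordinary backward pass; the helper detection_loss is hypothetical:

```python
import torch
import torch.nn as nn

def add_bn_sparsity_grad(model: nn.Module, lam: float) -> None:
    """After the ordinary loss.backward(), add the subgradient of lam * |gamma|
    to every BN scaling factor, so sparse regularization acts on the gamma coefficients."""
    for m in model.modules():
        if isinstance(m, nn.BatchNorm2d) and m.weight.grad is not None:
            m.weight.grad.add_(lam * torch.sign(m.weight.detach()))

# Typical training step (sketch):
#   loss = detection_loss(model(images), targets)   # ordinary YOLO loss (assumed helper)
#   loss.backward()
#   add_bn_sparsity_grad(model, lam=...)            # sparsity term on the BN scaling factors
#   optimizer.step(); optimizer.zero_grad()
```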
  • pruning a channel basically corresponds to removing all incoming and outgoing connections of that channel, and a lightweight network can be directly obtained without using any special sparse computing package.
  • scaling factors act as a proxy for channel selection; since they are co-optimized with network weights, the network can automatically identify insignificant channels that can be safely removed without greatly affecting generalization performance.
  • this step may include the following steps:
  • the channel pruning ratio may be 60%.
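  • a sketch of selecting the channels to prune from the sparsely trained BN scaling factors is given below; taking a single global threshold at the chosen pruning ratio is an assumed simplification:

```python
import torch
import torch.nn as nn

def channel_prune_masks(model: nn.Module, prune_ratio: float = 0.6) -> dict:
    """Collect the absolute BN scaling factors of the sparsely trained network,
    take the value at the given pruning ratio as a global threshold, and build a
    keep/remove mask per BN layer (True = keep the channel)."""
    gammas = torch.cat([
        m.weight.detach().abs().flatten()
        for m in model.modules() if isinstance(m, nn.BatchNorm2d)
    ])
    threshold = torch.quantile(gammas, prune_ratio)
    return {
        name: m.weight.detach().abs() > threshold
        for name, m in model.named_modules() if isinstance(m, nn.BatchNorm2d)
    }
```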
  • the YOLOv3-3 network is subjected to knowledge distillation to obtain an improved YOLOv3 network.
  • knowledge distillation is introduced into the YOLOv3-3 network, the aforementioned complex network is used as the teacher network, and the YOLOv3-3 network is used as the student network.
  • the teacher network guides the student network to restore and adjust the accuracy, and obtain an improved YOLOv3 network.
  • specifically, the output before the Softmax layer of the aforementioned complex network can be divided by the temperature coefficient to soften the predicted values finally output by the teacher network, and then the student network uses the softened predicted values as labels to assist in training the YOLOv3-3 network, so that the accuracy of the YOLOv3-3 network finally becomes comparable to that of the teacher network; the temperature coefficient is a preset value and does not change with network training.
  • the reason for introducing the temperature parameter T is that the classification result of the input data of a trained network with high accuracy is basically the same as the real label.
  • for example, if the real training class label is [1, 0, 0], the predicted result may be [0.95, 0.02, 0.03], which is very close to the real label value. Therefore, for the student network, there is little difference between using the classification results of the teacher network to assist training and directly using the data for training.
  • the temperature parameter T can be used to control the softening degree of the predicted labels, that is, it can increase the bias of the classification results of the teacher network.
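  • a minimal sketch of the temperature-softened distillation loss is shown below; the KL-divergence form and the T² scaling are common conventions, and the value T = 3.0 is purely illustrative:

```python
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, T: float = 3.0):
    """Soften the teacher's pre-Softmax outputs by dividing by the temperature T and
    train the student to match the softened distribution via KL divergence.
    T is preset and does not change during training; T = 3.0 is only an example value."""
    soft_targets = F.softmax(teacher_logits / T, dim=-1)
    log_student = F.log_softmax(student_logits / T, dim=-1)
    # The T*T factor keeps gradient magnitudes comparable to the hard-label loss.
    return F.kl_div(log_student, soft_targets, reduction="batchmean") * (T * T)
```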
  • the in-vehicle device may be a device placed in the car, such as a navigator, a mobile phone, and the like.
  • the improved YOLOv3 network also includes a classification network and a non-maximum suppression module; the classification network and the non-maximum suppression module are concatenated after the FPN network.
  • the attribute information of the target road image is obtained, including:
  • the classification network includes SoftMax classifier.
  • the purpose is to achieve mutually exclusive classification of multiple vehicle classes.
  • the classification network can also use the logistic regression of the YOLOv3 network for classification to achieve multiple independent binary classifications.
  • the non-maximum value suppression module is used for NMS (non_max_suppression, non-maximum value suppression) processing. It is used to exclude prediction boxes with relatively low confidence in multiple prediction boxes that repeatedly select the same target.
  • the detection result is in the form of a vector, including the position of the predicted frame, the confidence level of the vehicle in the predicted frame, and the category of the target in the predicted frame.
  • the position of the prediction frame is used to represent the position of the target in the target road image; specifically, the position of each prediction frame is represented by four values of bx, by, bw, and bh, and bx and by are used to represent the position of the center point of the prediction frame.
  • bw and bh are used to represent the width and height of the prediction box. For example, suppose there are 1 bus, 5 cars and 2 pedestrians on the road, located at different positions of the target road image. If, with the upper left corner of the image as the origin, the bus is located 230 pixels horizontally and 180 pixels vertically, with a width of 20 and a height of 50, then its attribute information can be "230, 180, 20, 50, bus".
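  • a minimal NumPy sketch of such non-maximum suppression over (bx, by, bw, bh) boxes is given below; the IoU threshold of 0.5 is an illustrative value:

```python
import numpy as np

def nms(boxes: np.ndarray, scores: np.ndarray, iou_thresh: float = 0.5):
    """boxes: (N, 4) rows of (bx, by, bw, bh) with (bx, by) the box centre.
    Keeps the highest-confidence box among predictions that repeatedly frame the
    same target and discards the lower-confidence overlaps."""
    x1 = boxes[:, 0] - boxes[:, 2] / 2
    y1 = boxes[:, 1] - boxes[:, 3] / 2
    x2 = boxes[:, 0] + boxes[:, 2] / 2
    y2 = boxes[:, 1] + boxes[:, 3] / 2
    areas = (x2 - x1) * (y2 - y1)
    order = scores.argsort()[::-1]
    keep = []
    while order.size > 0:
        i = order[0]
        keep.append(int(i))
        xx1 = np.maximum(x1[i], x1[order[1:]])
        yy1 = np.maximum(y1[i], y1[order[1:]])
        xx2 = np.minimum(x2[i], x2[order[1:]])
        yy2 = np.minimum(y2[i], y2[order[1:]])
        inter = np.clip(xx2 - xx1, 0, None) * np.clip(yy2 - yy1, 0, None)
        iou = inter / (areas[i] + areas[order[1:]] - inter)
        order = order[1:][iou <= iou_thresh]
    return keep
```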
  • the category of the target is the category of the object to which the target belongs, such as people, animals, buildings, vehicles, signs and so on.
  • the target may only be a vehicle
  • the categories may include cars, single-deck buses, double-deck buses, large trucks, vans, bicycles, motorcycles, and the like.
  • the method may further include:
  • the attribute information may be displayed, including: displaying the attribute information on the in-vehicle device.
  • the attribute information may be displayed on a display screen in the vehicle, and the display screen may be a display screen of a navigation device or a display screen of a driver's mobile phone.
  • the target road image marked with attribute information can be directly displayed on the display screen in the car, so that the driver in the car can directly observe the attribute information, so as to understand the location and category of each target displayed in the target road image.
  • a driver in the distance can obtain the road conditions beyond his line of sight, and make appropriate driving behaviors in advance, such as slowing down, route planning, object avoidance, etc., to achieve the purpose of safe driving.
  • the attribute information can also be displayed in the form of other text, which is reasonable.
  • the attribute information can be played in the form of voice, so that the driver can easily receive the attribute information even when it is inconvenient to view the image while driving, which is conducive to safe driving.
  • the above two methods can be combined.
  • display attribute information on the in-vehicle device which may include:
  • a special reminder can be given for small targets.
  • for example, the size of the prediction frame where the target is located can be determined, and it can be judged whether the size of the prediction frame is smaller than a preset prediction frame size; if so, it can be determined that the target is a small target to be reminded. Alternatively, target categories can be divided in advance, and some obviously smaller object categories such as signs can be preset as small-target categories; by judging whether the category of a target belongs to a preset small-target category, it can be determined whether the target is a small target to be reminded.
  • the position and category of the target can be combined to determine the small target to be reminded.
  • the attribute information can be displayed on the in-vehicle device in a reminder mode; for example, the target in the target road image is marked with brightly colored fonts, or marked in a flashing form, or supplemented by voice prompts, and so on.
  • a combination of multiple reminder methods can be used.
  • the general mode can be used to display the attribute information on the in-vehicle device, that is, a consistent mode is adopted for all targets, which will not be repeated here.
  • the method may further include:
  • the driver can send the attribute information to the image acquisition terminal or other vehicles, pedestrians, etc., so that multiple terminals in the Internet of Vehicles system can obtain the attribute information to achieve information statistics, safe driving and other purposes.
  • when sending the attribute information, the vehicle can also carry its current position information, such as coordinate information obtained through GPS (Global Positioning System), and the current time information, so that the receiver can have a clearer understanding of the road condition information.
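  • a minimal sketch of packaging such a message is given below; the JSON structure and field names are illustrative assumptions, since the embodiment does not prescribe a message format:

```python
import json
import time

def build_road_condition_message(detections, latitude, longitude):
    """Package the attribute information together with the sender's GPS position and
    the current time before sending it over the Internet of Vehicles.
    Field names are illustrative; no particular message format is prescribed."""
    return json.dumps({
        "timestamp": time.time(),
        "position": {"lat": latitude, "lon": longitude},
        "targets": [
            {"bx": d[0], "by": d[1], "bw": d[2], "bh": d[3], "category": d[4]}
            for d in detections
        ],
    })

# Example: the bus described earlier.
# build_road_condition_message([(230, 180, 20, 50, "bus")], 34.26, 108.95)
```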
  • a plurality of target road images within a predetermined period of time may be acquired to perform target detection, and the position and category of the same target may be used to achieve target trajectory tracking, and so on.
  • the original YOLOv3 network contains more convolutional layers because it has to handle more target categories, 80 in total.
  • the target is mainly an object on the road, and the number of categories of the target is small, so a large number of convolutional layers are unnecessary, which will waste network resources and reduce the processing speed.
  • by setting the number of densely connected units contained in each densely connected module to 2, the number of convolutional layers in the backbone network can be reduced for the target road images in the embodiment of the present invention without affecting the accuracy of the network.
  • the improved YOLOv3 network can also be obtained by adjusting the value of k in the convolutional network module group of each prediction branch in the FPN network, that is, reducing k from 5 in the original YOLOv3 network to 4 or 3, i.e., changing the original CBL*5 to CBL*4 or CBL*3; this also reduces the number of convolutional layers in the FPN network without affecting the network accuracy, so that for the target road images of the embodiment of the present invention, the overall number of network layers is simplified and the network processing speed is improved.
  • the residual module in the backbone network of the YOLOv3 network is replaced with a dense connection module, and the feature fusion mode is changed from parallel to serial, so that when the backbone network performs feature extraction, the early feature maps can be directly used as the input of each subsequent layer; the resulting feature maps carry more information, which strengthens feature propagation, so the detection accuracy can be improved when the target road image is detected.
  • the number of parameters and the amount of computation can be reduced by multiplexing the feature map parameters of the shallow network.
  • using multiple feature extraction scales to add fine-grained feature extraction scales for small targets can improve the detection accuracy of small targets in target road images.
  • the feature fusion method of the FPN network is changed: the feature maps extracted by the backbone network are fused in a top-down, densely connected manner, and the deep features are directly upsampled by different multiples so that all transferred feature maps have the same size.
  • in this way, high-dimensional semantic information also participates in the shallow branches, which helps improve the detection accuracy; at the same time, by directly receiving the features of the shallower network, more concrete features can be obtained, which effectively reduces the loss of features, reduces the amount of parameters that need to be calculated, improves the detection speed, and realizes real-time detection.
  • the network volume can be reduced and most of the redundant computation can be eliminated, which can greatly improve the detection speed while maintaining the detection accuracy.
  • the invention deploys the cloud-side detection process on edge devices with very limited storage and computing resources, so the vehicle-mounted device can realize road detection beyond the line of sight and achieve high-precision, highly real-time detection of targets on the road, which is beneficial to the driver's safe driving.
  • the present invention selects the UA-DETRAC data set for experiments.
  • the shooting location of the UA-DETRAC dataset is the road crossing overpasses in Beijing and Tianjin.
  • the shooting equipment is a Canon EOS 550D
  • the video frame rate is 25FPS
  • the data format is JPEG
  • the image size is 960×540.
  • the dataset contains 60 videos, shot on sunny days, cloudy days, rainy days and nights, including data under different weather conditions.
  • the total number of images is 82085 and the objects are annotated. These annotations are manually annotated, so the annotation data is more accurate. All images in each video are numbered sequentially under the same folder, and the annotation data of all images in each video are recorded in an XML file with the same name as the video folder.
  • the random sampling method is used to extract the data in the data set.
  • the entire dataset contains a total of 82,085 images, and 10,000 images are sampled for the experiments. The training set and the test set are allocated at a ratio of 4:1. In order to ensure that the training set and the test set do not contain the same images, the 10,000 extracted images are randomly sampled again when allocating the data sets.
  • training the YOLO network needs to use data in VOC format or COCO format, that is, five numbers are used to represent the type of frame-selected object, the position of the upper left corner, and the length and width of the object, and these data are stored in text documents. Therefore, Python script is used to convert the annotation format of the dataset, and at the same time, the types and proportions of targets in the dataset are counted.
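  • a minimal sketch of such a conversion script is shown below; the UA-DETRAC tag and attribute names used here are assumptions that should be checked against the actual annotation files, and the output format (class id followed by the upper-left corner and the box width and height) follows the description above:

```python
import xml.etree.ElementTree as ET

def convert_detrac_xml(xml_path: str, class_ids: dict) -> dict:
    """Convert one UA-DETRAC annotation file into per-frame label lines of the form
    'class_id x_left y_top width height' for YOLO training.
    The tag and attribute names follow the UA-DETRAC XML layout and should be
    verified against the actual files."""
    labels = {}
    root = ET.parse(xml_path).getroot()
    for frame in root.iter("frame"):
        lines = []
        for target in frame.iter("target"):
            box = target.find("box")
            attr = target.find("attribute")
            cls = class_ids.get(attr.get("vehicle_type")) if attr is not None else None
            if box is None or cls is None:
                continue
            lines.append("{} {} {} {} {}".format(
                cls, box.get("left"), box.get("top"),
                box.get("width"), box.get("height")))
        labels[frame.get("num")] = lines
    return labels
```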
  • the network obtained by replacing the residual modules in the backbone network of the YOLOv3 network with dense connection modules and improving the transition modules is named Dense-YOLO-1; the structure of the Dense-YOLO-1 network can be understood with reference to the backbone networks in FIG. 2 and FIG. 3, and will not be repeated here.
  • Dense-YOLO-1 is tested against the YOLOv3 network. The mAP (mean Average Precision) of the model is selected as the evaluation metric; the value of mAP is between 0 and 1, and the larger the mAP, the better the model accuracy. The loss curve of the model is also referred to in order to observe the convergence of the model.
  • Figure 5-1 is a comparison diagram of mAP curves of YOLOv3 and Dense-YOLO-1 according to the embodiment of the present invention
  • Figure 5-2 is a comparison diagram of loss curves of YOLOv3 and Dense-YOLO-1 according to the embodiment of the present invention.
  • Table 1 The volumes of the YOLOv3 and Dense-YOLO-1 network models and their detection times on different platforms
  • the time for the network to perform road image detection on different platforms is shown in Table 1. It can be seen that adding dense connections to the network can reduce the size of the network and reduce the time required for detection.
  • on the basis of Dense-YOLO-1, one multi-scale improvement idea is to add a more fine-grained target detection scale to YOLOv3, so that the network can detect smaller objects.
  • specifically, the scale of 104×104 is added, and the corresponding anchor box sizes are set; the obtained network is named MultiScale-YOLO-1.
  • the mAP and loss curves of Dense-YOLO-1 and MultiScale-YOLO-1 networks are shown in Figure 6-1 and Figure 6-2.
  • the multi-scale network has improved compared with the densely connected network, but the change is not obvious, only about 7%, and the difference in the loss curve is still not obvious. This may be because the number of small-sized objects in the data set is not large, and the demand for fine-grained recognition is not strong.
  • if the requirements are high, and time and energy are sufficient but there is no suitable dataset, the dataset can be labeled by yourself.
  • on the basis of Dense-YOLO-1, another multi-scale improvement idea is to start from the feature fusion method and try to improve it so that the detection process can integrate semantic information of more dimensions, thereby improving target recognition precision. Therefore, the feature fusion method of the FPN network is improved, adopting the top-down, densely connected fusion method, and the obtained network is named Dense-YOLO-2. The network structure is not shown again. The mAP and loss curves of the Dense-YOLO-1 and Dense-YOLO-2 networks are shown in Figure 7-1 and Figure 7-2.
  • the advantages of multi-scale are more obvious here. This may be because the densely connected feature fusion method retains more high-dimensional abstract semantic information than horizontal connections, which allows the model to discriminate objects more clearly.
  • the network accuracy after changing the fusion method is 18.2% higher than the original, and the loss curve is also slightly lower than before. According to the above graph, it can be seen that the improvement of the fusion method significantly improves the network accuracy.
  • the network should have a smaller parameter volume and a faster detection speed.
  • the volume of the network model after multi-scale improvement and the time for road image detection on different platforms are shown in Table 2.
  • Table 2 The volume of the multi-scale improved network model and its detection time on different platforms
  • the method of layer pruning is to change the densely connected blocks from groups of 4 densely connected units to groups of 2, which simplifies the network structure and reduces the amount of parameters and operations by nearly half.
  • the network after layer pruning is named MultiScale-YOLO-3 network, which can also be referred to as YOLOv3-1 network.
  • the YOLOv3-1 network is sparsely trained to obtain a YOLOv3-2 network with sparse distribution of BN layer scaling coefficients;
  • the channel pruning ratio can be 60%. This is because target categories with few samples in the target road images to be detected are greatly affected during network compression, which directly affects the mAP; therefore, both the data set and the network compression ratio should be considered.
  • on the one hand, the embodiment of the present invention merges target categories with smaller sample counts to balance the numbers of different categories, or directly uses a data set with a more balanced category distribution that matches the application scenario of the embodiment of the present invention.
  • the other is to control the compression ratio to ensure that the prediction accuracy of a small number of categories does not drop too much. According to the mAP simulation results, the compression ratio of 50%-60% is the turning point of the accuracy change, so the compression ratio of 60% can be initially selected.
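The channel-selection criterion itself can be sketched as follows. This is an assumed, simplified implementation: it only computes which channels fall below the global threshold derived from the pruning ratio; actually rebuilding the pruned convolution layers is omitted.

```python
import torch
import torch.nn as nn

def channel_prune_masks(model: nn.Module, ratio: float = 0.6):
    """Return per-BN-layer boolean masks; True means the channel is kept."""
    gammas = torch.cat([m.weight.detach().abs().flatten()
                        for m in model.modules() if isinstance(m, nn.BatchNorm2d)])
    # Global threshold: the value below which the smallest `ratio` of gammas fall.
    threshold = torch.sort(gammas).values[int(len(gammas) * ratio)]
    return {name: m.weight.detach().abs() > threshold
            for name, m in model.named_modules() if isinstance(m, nn.BatchNorm2d)}
```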
  • The relationship between detection time and model compression ratio should also be considered.
  • The image detection time of networks pruned at different ratios was simulated on different platforms. The results show that the compression ratio has only a weak influence on the detection time itself, but a larger influence on the time required for NMS (non-maximum suppression).
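For reference, the NMS step whose runtime is discussed here can be expressed with torchvision's built-in operator; the IoU threshold below is an assumed value, not one given in the document.

```python
import torch
from torchvision.ops import nms

def suppress(boxes: torch.Tensor, scores: torch.Tensor, iou_thr: float = 0.45) -> torch.Tensor:
    """boxes: (N, 4) as (x1, y1, x2, y2); scores: (N,). Returns indices of the kept boxes."""
    return nms(boxes, scores, iou_thr)
```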
  • The YOLOv3-3 network is then subjected to knowledge distillation to obtain the improved YOLOv3 network.
  • The aforementioned complex network serves as the teacher network.
  • The resulting network is named YOLO-Terse.
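A hedged sketch of temperature-softened distillation of the kind described here: the teacher's pre-softmax class scores are divided by a fixed temperature T, and the student is trained against the softened distribution in addition to the hard labels. Only the classification part is shown; the loss weights and T are illustrative assumptions, and a full YOLO distillation would also involve the box and objectness outputs.

```python
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, hard_labels, T=3.0, alpha=0.5):
    soft_targets = F.softmax(teacher_logits / T, dim=-1)           # softened teacher predictions
    soft_loss = F.kl_div(F.log_softmax(student_logits / T, dim=-1),
                         soft_targets, reduction="batchmean") * (T * T)
    hard_loss = F.cross_entropy(student_logits, hard_labels)       # ground-truth class labels
    return alpha * soft_loss + (1.0 - alpha) * hard_loss
```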
  • FIG. 10 is a performance comparison diagram of the improved YOLOv3 network (YOLO-Terse) and the YOLOv3 network according to the embodiment of the present invention. It can be seen that the accuracy of YOLO-Terse is 9.0% higher than that of YOLOv3, while the model size is reduced by 72.9%, and the detection times on Tesla V100 and Jetson TX2 are reduced by 18.9% and 15.3%, respectively. This shows that, with the accuracy somewhat improved, the model volume is greatly reduced and the detection speed on road images is increased.
  • The embodiments of the present invention further provide an in-vehicle electronic device, as shown in FIG. 11, including a processor 1101, a communication interface 1102, a memory 1103 and a communication bus 1104, wherein the processor 1101, the communication interface 1102 and the memory 1103 communicate with each other through the communication bus 1104.
  • The processor 1101 is configured to implement the steps of any of the foregoing road detection methods based on the Internet of Vehicles when executing the program stored in the memory 1103.
  • The communication bus mentioned in the above electronic device may be a Peripheral Component Interconnect (PCI) bus or an Extended Industry Standard Architecture (EISA) bus, or the like.
  • The communication bus can be divided into an address bus, a data bus, a control bus, and the like. For ease of presentation, only one thick line is used in the figure, but this does not mean that there is only one bus or one type of bus.
  • The communication interface is used for communication between the above electronic device and other devices.
  • The memory may include random access memory (RAM), and may also include non-volatile memory (NVM), such as at least one disk storage.
  • The memory may also be at least one storage device located away from the aforementioned processor.
  • The above-mentioned processor can be a general-purpose processor, including a central processing unit (CPU), a network processor (NP), etc.; it can also be a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, or a discrete hardware component.

Abstract

本发明公开了一种基于车联网的道路检测方法,应用于车载端,包括:获取图像采集端拍摄的目标道路图像;输入改进型YOLOv3网络中,利用密集连接形式的主干网络进行特征提取,得到x个不同尺度的特征图;x为大于等于4的自然数;利用改进型FPN网络对x个不同尺度的特征图进行自顶向下、密集连接方式的特征融合,得到各尺度对应的预测结果;基于所有预测结果,得到目标道路图像的属性信息,属性信息包括目标道路图像中目标的位置和类别;改进型YOLOv3网络是在YOLOv3网络基础上,将主干网络中的残差模块改为密集连接模块、增加特征提取尺度、优化FPN网络的特征融合方式,进行剪枝及结合知识蒸馏引导网络恢复处理后形成的。

Description

一种基于车联网的道路检测方法 技术领域
本发明属于图像检测领域,具体涉及一种基于车联网的道路检测方法及车载电子设备。
背景技术
近年来,车联网技术得到了快速发展。利用该技术,行驶中的车辆,能够借助新一代信息通信技术,实现车辆、行人、路面设施、服务平台之间的网络连接,提升车辆整体的智能驾驶水平及交通运行效率。
为了向驾驶者提供有效的道路信息,以便驾驶者及时规划路线、紧急避让,实现安全驾驶。一些研究者将基于神经网络的目标检测算法与车联网技术相结合,通过拍摄道路图像,上传至云端进行图像检测,识别出其中车辆的类型和位置,再将检测结果传输给对应的车辆以供驾驶员使用。但数据上传至云端,以及从云端下载是需要占用一定的网络带宽资源的,并且需要较长的时间;同时,云端对图像进行处理,也需要耗费一定的时间。因此会带来较长的传输时延,导致检测实时性较差。但道路状况是瞬息万变的,许多交通事故在极短的时间内就会发生。
并且,随着车辆密集化、道路复杂化,对道路图像的检测精度提出了更高的要求,尤其是需要准确检测出图像中的小目标,比如小型车辆,或者由于拍摄距离远在图像中尺寸较小的车辆等等。而现有检测方法的检测精度,尤其是对于小目标的检测精度并不理想。
因此,急需提出一种基于车联网的道路检测方法,以实现高精度和高实时性检测。
发明内容
为了提出一种基于车联网的道路检测方法,以实现高精度和高实时性检测,本发明实施例提供了一种基于车联网的道路检测方法及车载电子设备。具体技术方案如下:
第一方面,本发明实施例提供了一种基于车联网的道路检测方法,应用于车载端,包括:
获取图像采集端拍摄的目标道路图像;将所述目标道路图像输入预先训练得到的改进型YOLOv3网络中,利用密集连接形式的主干网络对所述目标道路图像进行特征提取,得到x个不同尺度的特征图;x为大于等于4的自然数;利用改进型FPN网络对所述x个不同尺度的特征图进行自顶向下、密集连接方式的特征融合,得到各尺度对应的预测结果;基于所有预测结果,得到所述目标道路图像的属性信息,所述属性信息包括所述目标道路图像中目标的位置和类别;其中,所述改进型YOLOv3网络包括所述密集连接形式的主干网络、所述改进型FPN网络;所述改进型YOLOv3网络是在YOLOv3网络基础上,将主干网络中的残差模块更换为密集连接模块、增加特征提取尺度、优化FPN网络的特 征融合方式,以及进行剪枝及结合知识蒸馏引导网络恢复处理后形成的;所述改进型YOLOv3网络是根据样本道路图像,以及所述样本道路图像对应的目标的位置和类别训练得到的。
第二方面,本发明实施例提供了一种车载电子设备,包括处理器、通信接口、存储器和通信总线,其中,处理器,通信接口,存储器通过通信总线完成相互间的通信;
存储器,用于存放计算机程序;
处理器,用于执行存储器上所存放的程序时,实现上述第一方面提供的任意一种基于车联网的道路检测方法的步骤。
本发明实施例所提供的方案中,一方面,将YOLOv3网络的主干网络中的残差模块更换为密集连接模块,将特征融合方式从并行改为了串行,使得主干网络在进行特征提取时,能够直接将早期的特征图作为后面每一层的输入,获得的特征图的信息量更多,强化了特征的传递,因此在进行目标道路图像检测时,能够提高检测精度。并且,通过复用浅层网络的特征图参数能够减少参数的数量以及运算量。另一方面,采用多个特征提取尺度,为小目标增加细粒度的特征提取尺度,能够提升目标道路图像中小目标的检测精度。又一方面,改变FPN网络的特征融合方式,对主干网络提取的特征图采用自顶向下、密集连接的方式进行特征融合,将深层特征直接进行不同倍数的上采样,以此来使得传递的所有特征图具有相同的尺寸,将这些特征图和浅层的特征图通过串联的方式融合起来,可以利用到更多的原始信息,在浅层网络中也有高维语义信息的参与,有助于提高检测的精度;同时通过直接接收更浅层网络的特征,可以得到更加具体的特征,将有效的减少特征的损失,可以减少需要运算的参数量,提高检测速度,实现实时检测。再一方面,通过对预训练的网络进行层剪枝、稀疏化训练、通道剪枝,以及知识蒸馏处理,并在各个处理过程选取优化的处理参数,可以精简网络体积,摒除大部分的冗余计算,能够在维持检测精度的情况下大幅度提升检测速度。本发明将云端的检测过程部署在了存储资源和计算资源都非常有限的边缘设备当中,由车载设备就可以实现超视距道路检测,可以对道路上的目标实现高精度和高实时性检测,有利于驾驶者进行安全驾驶。
当然,实施本发明的任一产品或方法并不一定需要同时达到以上所述的所有优点。
以下将结合附图及实施例对本发明做进一步详细说明。
附图说明
图1为本发明实施例提供的一种基于车联网的道路检测方法的流程示意图;
图2为现有技术中的YOLOv3网络的结构示意图;
图3为本发明实施例所提供的改进型YOLOv3网络的结构示意图;
图4为本发明实施例所提供的一种过渡模块的结构示意图;
图5-1为YOLOv3和本发明实施例的Dense-YOLO-1的mAP曲线对比图;图5-2为YOLOv3和本发明实施例的Dense-YOLO-1的loss曲线对比图;
图6-1为本发明实施例的Dense-YOLO-1和MultiScale-YOLO-1的mAP曲线对比图; 图6-2为本发明实施例的Dense-YOLO-1和MultiScale-YOLO-1的loss曲线对比图;
图7-1为本发明实施例的Dense-YOLO-1和Dense-YOLO-2的mAP曲线对比图;图7-2为本发明实施例的Dense-YOLO-1和Dense-YOLO-2的loss曲线对比图;
图8-1为本发明实施例的Dense-YOLO-1和MultiScale-YOLO-2的mAP曲线对比图;图8-2为本发明实施例的Dense-YOLO-1和MultiScale-YOLO-2的loss曲线对比图;
图9-1为本发明实施例选择的参数组合5的权重偏移图;图9-2为本发明实施例选择的参数组合5的权重交叠图;
图10为本发明实施例的改进型YOLOv3网络(YOLO-Terse)与YOLOv3网络的性能对比图;
图11为本发明实施例所提供的一种车载电子设备的结构示意图。
具体实施方式
下面将结合本发明实施例中的附图,对本发明实施例中的技术方案进行描述。
为了提出一种基于车联网的道路检测方法,以实现高精度和高实时性检测,本发明实施例提供了一种基于车联网的道路检测方法及车载电子设备。
需要说明的是,本发明实施例所提供的一种基于车联网的道路检测方法的执行主体可以为一种基于车联网的道路检测装置,该基于车联网的道路检测装置可以运行于车载电子设备中。其中,该车载电子设备可以为一图像处理工具中的插件,或者独立于一图像处理工具之外的程序,当然并不局限于此。
第一方面,本发明实施例提供一种基于车联网的道路检测方法。下面,首先对该基于车联网的道路检测方法进行介绍。
如图1所示,本发明实施例所提供的一种基于车联网的道路检测方法,应用于车载端,可以包括如下步骤:
S1,获取图像采集端拍摄的目标道路图像;
目标道路图像为图像端的图像采集设备针对道路区域拍摄的图像。
图像采集端可以是通过车联网技术与当前车辆通信连接的其余车辆、行人、路面设施、服务平台等。比如图像端可以为路边的路灯杆、天桥等位置较高的路面设施,也可以为无人机等飞行设备,这些图像采集端上部署有图像采集设备。
图像采集设备可以包括摄像头、摄像机、照相机、手机等等;可选的实施方式中,图像采集设备可以为高分辨率摄像头。
图像采集设备可以以一定的时间间隔不断采集对应区域的道路图像,比如以30fps的速率进行拍摄,并由所在的图像采集端将采集到的道路图像传输给对应的车辆。当然,时间间隔也可以根据道路上物体的密集程度或者根据需求进行调整。
车联网中的一大问题就是超视距问题。由于在路面行驶过程中,行驶者的视距有限,无法用肉眼观测到超过视距范围的道路状况,尤其在前方有大型车辆、交叉路口等情形,视距更加受限。但是为了增加对路况的了解,车联网要解决超视距的问题,使得行驶者 能够获得视距之外的路况信息,及早的调整行驶计划。通过设置于距离当前车辆很远的图像采集端不断采集目标道路图像,如果这些目标道路图像能够得到有效检测,势必能够为当前车辆解决上述超视距问题,为行驶者带来极大的便利。
本发明实施例中,目标道路图像的尺寸为416×416×3。因此,在该步骤,一种实施方式中,车载端可以直接从图像采集端获得416×416×3尺寸的目标道路图像;另一种实施方式中,车载端可以获得图像采集端发送的任意尺寸的图像,车载端再将获得的图像经过一定的尺寸缩放处理,得到416×416×3尺寸的目标道路图像。
并且,在上述两种实施方式中,还可以对获取到的图像可以进行裁剪、拼接、平滑、滤波、边缘填充等图像增强操作,以增强图像中感兴趣的特征,扩展数据集的泛化能力。
S2,将目标道路图像输入预先训练得到的改进型YOLOv3网络中,利用密集连接形式的主干网络对目标道路图像进行特征提取,得到x个不同尺度的特征图;x为大于等于4的自然数;
为了便于理解本发明实施例所提出的改进型YOLOv3网络的网络结构,首先,对现有技术中的YOLOv3网络的网络结构进行介绍,请参见图2,图2为现有技术中的YOLOv3网络的结构示意图。在图2中,虚线框内的部分为YOLOv3网络。其中点划线框内的部分为YOLOv3网络的主干(backbone)网络,即darknet-53网络;YOLOv3网络的主干网络由CBL模块和5个resn模块串接构成。CBL模块为卷积网络模块,包括串行连接的conv层(Convolutional layer,卷积层,简称conv层)、BN(Batch Normalization,批量归一化)层和激活函数Leaky relu对应的Leaky relu层,CBL即表示conv+BN+Leakyrelu。resn模块为残差模块,n代表自然数,如图2所示,具体地,沿输入方向依次有res1、res2、res8、res8、res4;resn模块包括串行连接的zero padding(零填充)层、CBL模块和残差单元组,残差单元组用Res unit*n表示,含义是包括n个残差单元Res unit,每个残差单元包括采用残差网络(Residual Network,简称为ResNets)连接形式连接的多个CBL模块,特征融合方式采用并行方式,即add方式。
主干网络之外的其余部分为FPN(FeaturePyramidNetworks,特征金字塔网络)网络,FPN网络又分为三个预测支路Y 1~Y 3,预测支路Y 1~Y 3的尺度分别与沿输入逆向的3个残差模块res4、res8、res8分别输出的特征图的尺度一一对应。各预测支路的预测结果分别以Y1、Y2、Y3表示,Y1、Y2、Y3的尺度依次增大。
FPN网络的各个预测支路中均包括卷积网络模块组,具体包括5个卷积网络模块,即图2中的CBL*5。另外,US(up sampling,上采样)模块为上采样模块;concat表示特征融合采用级联方式,concat为concatenate的简称。
YOLOv3网络中各个主要模块的具体构成请参见图2中虚线框下的示意图。
本发明实施例中,改进型YOLOv3网络包括密集连接形式的主干网络、改进型FPN网络;改进型YOLOv3网络是在YOLOv3网络基础上,将主干网络中的残差模块更换为密集连接模块、增加特征提取尺度、优化FPN网络的特征融合方式,以及进行剪枝及结 合知识蒸馏引导网络恢复处理后形成的;改进型YOLOv3网络是根据样本道路图像,以及样本道路图像对应的目标的位置和类别训练得到的。关于网络训练过程在后文中予以介绍。
为了便于理解本发明方案,以下先对改进型YOLOv3网络的结构进行介绍,首先是其主干网络部分。
本发明实施例的改进型YOLOv3网络的结构请参见图3,图3为本发明实施例所提供的改进型YOLOv3网络的结构示意图;在图3中,可以看到主干网络发生了变化,请参见图3中的点划线框内的部分。
本发明实施例所提供的改进型YOLOv3网络的主干网络相比于YOLOv3网络的主干网络,一方面的改进思想在于,借鉴密集卷积网络DenseNet的连接方式,提出一种具体的密集连接模块,用来替换YOLOv3网络的主干网络中的残差模块(resn模块)。即改进型YOLOv3网络的主干网络采用的是密集连接形式的主干网络,已知的是,ResNets在将特征传递到图层之前通过求和来组合特征,即采用并行方式进行特征融合。而密集连接方式为了确保信息以最大程度在网络中各层之间流动,将所有层(具有匹配的特征图大小)彼此直接连接。具体的,针对每个层,它之前层的所有特征图被用作它的输入,它本身的特征图被用作它所有后续层的输入,也就是特征融合采用级联方式(也称为串联方式)。因此,相比于YOLOv3网络使用残差模块,改进型YOLOv3网络通过改用密集连接模块,获得的特征图的信息量更多,在进行道路图像检测时,能够增强特征传播,提高检测精度。同时,因为它不需要重新学习冗余的特征图,可以大大减小参数数量,减少计算量,还可以减轻梯度消失问题。另一方面,本发明实施例将特征图由浅到深进行传递,提取至少四个尺度的特征图,让网络能够检测不同尺度的物体,通过增加细粒度的特征提取尺度,可以使得在后续目标检测时,针对小目标能够提高检测精度。本发明实施例中的小目标包括道路上本身体积较小的物体,如道路标志牌、小障碍物、小动物等,或者由于拍摄距离远导致图像中面积较小的物体。
示例性的,请参见图3,密集连接形式的主干网络可以包括:
间隔串接的密集连接模块和过渡模块;图3中密集连接模块表示为denm。密集连接模块的数量为y;密集连接模块包括串行连接的卷积网络模块和密集连接单元组;卷积网络模块包括串行连接的卷积层、BN层、Leaky relu层;密集连接单元组包括m个密集连接单元;每个密集连接单元包括多个采用密集连接形式连接的卷积网络模块,并采用级联方式融合多个卷积网络模块输出的特征图;其中,y为大于等于4的自然数,m为大于1的自然数。
作为示例,图3中密集连接模块的数量为5个,相比于4个密集连接模块,5个密集连接模块所构成的改进型YOLOv3网络的精度更高。
卷积网络模块,如前表示为CBL;密集连接单元组表示为den unit*m,其含义是,密集连接单元组包括m个密集连接单元,m可以为2。每个密集连接单元表示为den unit; 其包括多个采用密集连接形式连接的卷积网络模块,并采用级联方式融合多个卷积网络模块输出的特征图,级联方式即concat,含义为张量拼接,该操作和残差模块中的add的操作是不一样的,concat会扩充张量的维度,而add只是直接相加不会导致张量维度的改变。因此,上述改进型YOLOv3网络的主干网络在进行特征提取时,利用密集连接模块,将特征融合方式从并行改为了串行,能够直接将早期的特征图作为后面每一层的输入,强化特征的传递,并通过复用浅层网络的特征图参数来减少参数的数量以及运算量。
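As an illustrative sketch only (the patent specifies no framework), the den unit described above can be written in PyTorch roughly as follows; the channel counts, the growth rate and the number of CBL blocks per unit are assumptions for illustration.

```python
import torch
import torch.nn as nn

class CBL(nn.Module):
    """conv + BN + LeakyReLU, the basic convolutional block."""
    def __init__(self, in_ch, out_ch, k=3):
        super().__init__()
        self.block = nn.Sequential(
            nn.Conv2d(in_ch, out_ch, k, padding=k // 2, bias=False),
            nn.BatchNorm2d(out_ch),
            nn.LeakyReLU(0.1, inplace=True),
        )
    def forward(self, x):
        return self.block(x)

class DenseUnit(nn.Module):
    """den unit: each CBL receives the concatenation of all earlier feature maps (concat, not add)."""
    def __init__(self, in_ch, growth=32, n_blocks=3):
        super().__init__()
        self.blocks = nn.ModuleList([CBL(in_ch + i * growth, growth) for i in range(n_blocks)])
    def forward(self, x):
        feats = [x]
        for blk in self.blocks:
            feats.append(blk(torch.cat(feats, dim=1)))   # dense connection
        return torch.cat(feats, dim=1)                    # cascade (concat) fusion
```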
本发明实施例中,密集连接形式的主干网络至少提取4个尺度的特征图以进行后续预测支路的特征融合,因此,密集连接模块的数量y大于等于4,以便将自身输出的特征图对应融合进各个预测支路。可见,改进型YOLOv3网络相比于YOLOv3网络,明显在主干网络增加了至少一个更细粒度的特征提取尺度。请参见图3,相比于YOLOv3网络增加了提取沿输入逆向的第四个残差模块输出的特征图进行后续的特征融合。因此,密集连接形式的主干网络沿输入逆向的四个密集连接模块分别输出对应的特征图,这四个特征图的尺度依次增大。具体的,各个特征图的尺度分别为13×13×72、26×26×72、52×52×72、104×104×72。
当然,在可选的实施方式中,也可以设置五个特征提取尺度,即再增加提取沿输入逆向的第五个密集连接模块输出的特征图进行后续的特征融合,等等。
具体的,针对S2步骤,得到x个不同尺度的特征图,包括:
得到沿输入逆向的x个密集连接模块输出的、尺度依次增大的x个特征图。
参见图3,即得到沿输入逆向的第一个密集连接模块至第四个密集连接模块分别输出的特征图,这四个特征图尺寸依次增大。
在本发明实施例中,对于过渡模块的结构:
可选的第一种实施方式中,过渡模块为卷积网络模块。也就是使用CBL模块作为过渡模块。那么,在搭建改进型YOLOv3网络的主干网络时,仅需要将残差模块更换为密集连接模块,再将密集连接模块和原有的CBL模块进行串联即可得到。这样,网络搭建过程会较为快速,所得到的网络结构较为简单。但这样的过渡模块仅使用卷积层进行过渡,即直接通过增加步长来对特征图进行降维,这样做只能照顾到局部区域特征,而不能结合整张图的信息,因此会使得特征图中的信息丢失较多。
可选的第二种实施方式中,过渡模块包括卷积网络模块和最大池化层;卷积网络模块的输入和最大池化层的输入共用,卷积网络模块输出的特征图和最大池化层输出的特征图采用级联方式融合。该种实施方式中过渡模块的结构请参见图4,图4为本发明实施例所提供的一种过渡模块的结构示意图。该种实施方式中,用tran模块表示该种过渡模块,MP层为最大池化层(Maxpool,缩写MP,含义为最大池化)。进一步的,MP层的步长可以选择为2。在该种实施方式中,引入的MP层可以以较大的感受野对特征图进行降维;使用的参数比较少,因此不会过多地增加计算量,可以减弱过拟合的可能,提高网络模型的泛化能力;并且结合原有的CBL模块,可以看做从不同的感受野对特征图进行 降维,因此可以保留更多信息。
针对上述第二种实施方式,可选的,过渡模块包括的卷积网络模块的数量为两个或三个,且各个卷积网络模块之间采用串接方式。相比于使用一个卷积网络模块,采用串接的两个或三个卷积网络模块,能够增加模型的复杂度,充分提取特征。
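An illustrative PyTorch sketch of the tran transition module described above: one CBL path and one max-pooling path share the same input, and the two downsampled maps are fused by concatenation. Using a single stride-2 CBL here is a simplification of the two- or three-CBL variant, and the channel counts are assumptions.

```python
import torch
import torch.nn as nn

class Transition(nn.Module):
    def __init__(self, in_ch, out_ch):
        super().__init__()
        self.cbl = nn.Sequential(                        # convolutional downsampling path
            nn.Conv2d(in_ch, out_ch, 3, stride=2, padding=1, bias=False),
            nn.BatchNorm2d(out_ch),
            nn.LeakyReLU(0.1, inplace=True),
        )
        self.mp = nn.MaxPool2d(kernel_size=2, stride=2)  # max-pooling path, stride 2
    def forward(self, x):
        return torch.cat([self.cbl(x), self.mp(x)], dim=1)
```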
S3,利用改进型FPN网络对x个不同尺度的特征图进行自顶向下、密集连接方式的特征融合,得到各尺度对应的预测结果;
以下结合图3的改进型FPN网络的结构,对其自顶向下、密集连接方式的特征融合方式进行介绍。
改进型FPN网络包括尺度依次增大的x个预测支路Y 1~Y x;预测支路Y 1~Y x的尺度与x个特征图的尺度一一对应;示例性的,图3的改进型FPN网络有4个预测支路Y 1~Y 4,其尺度分别与前述的4个特征图的尺度一一对应。
针对S3步骤,利用改进型FPN网络对x个不同尺度的特征图进行自顶向下、密集连接方式的特征融合,包括:
针对预测支路Y i,从x个特征图中,获取对应尺度的特征图并进行卷积处理;
将卷积处理后的特征图,与预测支路Y i-1~Y 1分别经上采样处理后的特征图进行级联融合;
其中,预测支路Y i-j的上采样倍数为2 j;i=2、3,…,x;j为小于i的自然数。
参见图3,以i=3也就是预测支路Y 3为例说明,其进行级联融合处理的特征图来源于三方面:第一方面,是从4个特征图中,获取对应尺度的特征图并进行卷积处理,也就是沿输入逆向的第三个密集连接模块输出的特征图经CBL模块后的特征图,该特征图也可以理解为经过1倍上采样,尺寸是52×52×72;第二方面来源于预测支路Y 2(即Y i-1=Y 2),即沿输入逆向的第二个密集连接模块输出的特征图(尺寸是26×26×72)经过预测支路Y 2的CBL模块再经2 1=2倍上采样处理后的特征图(尺寸是52×52×72);第三方面来源于预测支路Y 1(即Y i-2=Y 1),即沿输入逆向的第一个密集连接模块输出的特征图(尺寸是13×13×72)经预测支路Y 1的CBL模块后再经2 2=4倍上采样处理后的特征图(尺寸是52×52×72);那么,本领域技术人员可以理解的是,上述过程将其主干网络输出的3个不同尺度的特征图经过不同倍数的上采样处理后,可以使得待级联融合的3个特征图的尺寸一致,均为52×52×72。这样,预测支路Y 3可以在级联融合之后,继续进行卷积等处理,得到预测结果Y3,Y3尺寸为52×52×72。
关于预测支路Y 2和Y 4的特征融合过程,请参见预测支路Y 3,在此不再赘述。而针对预测支路Y 1,其获取沿输入逆向的第一个密集连接模块输出的特征图后自行进行后续的预测过程,并不接受其余预测支路的特征图与之融合。
原先的YOLOv3网络的FPN网络的特征融合方式中,使用的是先将深层和较浅层网络特征相加,再一起进行上采样的方法,这种方法在将特征相加后,要通过卷积层提取特征图,这样的操作会破坏一些原始的特征信息。而在本实施方式中,特征融合结合横 向方式与自顶向下密集连接方式,在这种方式中,原有的自顶向下的方式变成了尺度较小的预测支路的特征图直接向每一个尺度较大的预测支路传递自身的特征,将特征融合方式变为了密集的融合方法,即深层特征直接进行不同倍数的上采样,以此来使得传递的所有特征图具有相同的尺寸。将这些特征图和浅层的特征图通过串联的方式融合起来,对融合的结果再次提取特征来消除里面的噪声,保留主要信息,然后进行预测,这样可以利用到更多的原始信息,在浅层网络中也有高维语义信息的参与。因此,这样可以发挥密集连接网络保留更多特征图原始语义特征的优势,只不过对于自顶向下的方法来讲,保留的原始语义是更加高维的语义信息,这样可以对于物体的分类有帮助。通过直接接收更浅层网络的特征,可以得到更加具体的特征,这样将有效地减少特征的损失,并且可以减少需要运算的参数量,加速预测过程。
以上,主要针对特征融合方式进行介绍,各预测支路在特征融合之后主要是利用一些卷积操作进行预测,关于如何获取各自的预测结果请参见相关的现有技术,在此不进行说明。
那么,在本发明实施例中,可以针对采用两种不同形式过渡模块的改进型YOLOv3网络分别采用上述自顶向下、密集连接方式的特征融合。优选的实施方式中,在采用图4所示的过渡模块的改进型YOLOv3网络中实施该步骤。在后文中,改进型YOLOv3网络指的是图3结合图4得到的网络。
本发明实施例的改进型YOLOv3网络中,4个预测支路共输出四个尺度的特征图,分别为13×13×72、26×26×72、52×52×72、104×104×72,最小的13×13×72的特征图上由于其感受野最大,适合较大的目标检测;中等的26×26×72特征图上由于其具有中等感受野,适合检测中等大小的目标;较大的52×52×72特征图上由于其具有较小的感受野,适合检测较小的目标;最大的104×104×72特征图上由于其具有更小的感受野,故适合检测再小的目标。可见,本发明实施例对图像的划分更加精细,预测结果对尺寸较小的物体更有针对性。
以下对网络训练过程予以介绍。网络训练是在服务器中完成的,网络训练可以包括网络预训练、网络剪枝和网络微调三个过程。具体可以包括以下步骤:
(一),搭建网络结构;可以在YOLOv3网络基础上进行改进,将主干网络中的残差模块更换为密集连接模块、增加特征提取尺度、优化FPN网络的特征融合方式,改进过渡模块,得到如图3和图4结合得到的网络结构,作为搭建好的网络;其中m=4。
(二),获得若干样本道路图像,以及样本道路图像对应目标的位置和类别。在该过程中,各样本道路图像对应目标的位置和类别是已知的,确定各样本道路图像对应目标的位置和类别的方式可以是:通过人工识别,或者通过其他图像识别工具识别等等。之后,需要对样本道路图像进行标记,可以采用人工标记方式,当然也可以利用其余人工智能方法进行非人工标记,这都是合理的。其中,各样本道路图像对应目标的位置是以包含目标的目标框的形式标记的,这个目标框是真实准确的,各个目标框标记有坐标信 息,以此来体现目标在图像中的位置。
(三),确定样本道路图像中的锚盒尺寸;可以包括以下步骤:
a)确定针对样本道路图像中锚盒尺寸的待聚类数量;
在目标检测领域,锚盒(anchor box)就是从训练集中真实框(ground truth)中统计或聚类得到的几个不同尺寸的框;锚盒其实就是对预测的对象范围进行约束,并加入了尺寸先验经验,从而实现多尺度学习的目的。在本发明实施例中,由于希望加入更细粒度的特征提取尺度,需要利用聚类方式对样本道路图像中已经标注好的各个目标框(也就是真实框)的尺寸进行聚类,以得到适合本发明实施例场景的合适的锚盒尺寸。
其中,确定针对样本道路图像中锚盒尺寸的待聚类数量,包括:
确定每个尺度对应的锚盒尺寸的种类数;将每个尺度对应的锚盒尺寸的种类数与x的乘积,作为样本道路图像中锚盒尺寸的待聚类数量。
具体的,在本发明实施中,选择每个尺度对应的锚盒尺寸的种类数为3;有4个尺度,那么,得到的样本道路图像中锚盒尺寸的待聚类数量=3×4=12。
b)获取已标注目标框尺寸的若干样本道路图像;
该步骤实际是获取样本道路图像中各个目标框的尺寸。
c)基于已标注目标框尺寸的若干样本道路图像,利用K-Means聚类方法,获得样本道路图像中锚盒尺寸的聚类结果;
具体的,可以将各个目标框的尺寸利用K-Means聚类方法进行聚类,获得锚盒尺寸的聚类结果;关于聚类过程在此不再赘述。
其中,对于不同锚盒距离的定义即为其宽高的欧式距离:
d_{1,2}=\sqrt{(w_1-w_2)^2+(h_1-h_2)^2}
其中,d 1,2代表两个锚盒的欧氏距离,w 1,w 2代表锚盒的宽,h 1,h 2代表锚盒的高。
针对待聚类数量为12,锚盒尺寸的聚类结果可以为:(13,18)、(20,27)、(26,40)、(38,35)、(36,61)、(56,45)、(52,89)、(70,61)、(85,89)、(69,155)、(127,112)、(135,220)。具体的:
针对预测支路Y 1的锚盒尺寸:(69,155)、(127,112)、(135,220);
针对预测支路Y 2的锚盒尺寸:(52,89)、(70,61)、(85,89);
针对预测支路Y 3的锚盒尺寸:(38,35)、(36,61)、(56,45);
针对预测支路Y 4的锚盒尺寸:(13,18)、(20,27)、(26,40);
d)将聚类结果写入道路图像检测网络的配置文件中。
本领域技术人员可以理解的是,将聚类结果按照不同预测支路对应的锚盒尺寸,写入道路图像检测网络的各预测支路的配置文件中,之后可以进行网络预训练。
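The anchor clustering step above can be sketched as follows; the use of scikit-learn and the random boxes standing in for the labelled dataset are assumptions, but the plain Euclidean width/height distance matches the definition given earlier.

```python
import numpy as np
from sklearn.cluster import KMeans

def cluster_anchor_sizes(box_whs, n_scales=4, anchors_per_scale=3):
    """box_whs: (N, 2) array of ground-truth box (width, height) in pixels."""
    k = n_scales * anchors_per_scale                      # 12 clusters in this document
    centers = KMeans(n_clusters=k, n_init=10, random_state=0).fit(box_whs).cluster_centers_
    # Sort by area so the smallest anchors are assigned to the finest-grained branch.
    return sorted((tuple(int(v) for v in c) for c in np.round(centers)),
                  key=lambda wh: wh[0] * wh[1])

# Example with random box sizes standing in for the labelled ground-truth boxes.
print(cluster_anchor_sizes(np.random.randint(10, 200, size=(1000, 2))))
```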
(四),利用各样本道路图像,以及各样本道路图像对应目标的位置和类别,预训练搭建好的网络,包括以下步骤:
1)将每一样本道路图像对应目标的位置和类别作为该样本道路图像对应的真值,将各样本道路图像和对应的真值,通过搭建好的网络进行训练,获得各样本道路图像的 训练结果。
2)将每一样本道路图像的训练结果与该样本道路图像对应的真值进行比较,得到该样本道路图像对应的输出结果。
3)根据各个样本道路图像对应的输出结果,计算网络的损失值。
4)根据损失值,调整网络的参数,并重新进行1)-3)步骤,直至网络的损失值达到了一定的收敛条件,也就是损失值达到最小,这时,意味着每一样本道路图像的训练结果与该样本道路图像对应的真值一致,从而完成网络的预训练,得到一个准确率较高的复杂网络。
(五),网络剪枝和网络微调;该过程也就是进行剪枝及结合知识蒸馏引导网络恢复处理。
进行剪枝及结合知识蒸馏引导网络恢复处理,包括:
①对YOLOv3网络基础上将主干网络中的残差模块改为密集连接模块、增加特征提取尺度、优化FPN网络的特征融合方式后得到的网络(即前述的复杂网络)中,主干网络的密集连接模块进行层剪枝,得到YOLOv3-1网络;
通常在对YOLOv3网络的简化处理过程中会直接进行通道剪枝,但是发明人在实验中发现,仅通过通道剪枝仍难达到速度快速提升的效果。因此在通道剪枝前加入了层剪枝的处理过程。
具体的,该步骤可以对前述复杂网络中,主干网络的密集连接模块进行层剪枝,即将密集连接模块中密集连接单元的数量m进行层剪枝,将m减小为2,得到YOLOv3-1网络。
②对YOLOv3-1网络进行稀疏化训练,得到BN层缩放系数稀疏分布的YOLOv3-2网络;
示例性的,将YOLOv3-1网络进行稀疏化训练,得到了BN层缩放系数稀疏分布的YOLOv3-2网络;可以包括:
将YOLOv3-1网络进行稀疏化训练,训练过程中,为缩放因子γ添加稀疏正则化,稀疏训练的损失函数为:
L=\sum_{(x,y)}l(f(x,W),y)+\lambda\sum_{\gamma\in\Gamma}g(\gamma)
其中,
\sum_{(x,y)}l(f(x,W),y)
表示网络原始的损失函数,(x,y)表示训练过程的输入数据和目标数据,W表示可训练的权重,
\lambda\sum_{\gamma\in\Gamma}g(\gamma)
为比例系数添加的正则项,g(γ)是对比例系数进行稀疏训练的惩罚函数,λ为权重。由于要使得比例系数γ具有稀疏性,惩罚函数选择L1范数。同时,由于不知道后一项所占比重,引入λ参数进行调整。
由于λ的取值与稀疏训练的收敛速度相关,本发明实施例的应用场景为道路目标检测场景,其待检测的目标的种类数量可以设定为13种,远远小于原YOLOv3数据集中的80个种类,所以λ的取值可以选取较大的λ值,稀疏训练的收敛速度也不会很慢,同 时也可以通过提高模型学习率的方法来进一步加快收敛;但是考虑到参数选取过大又会对网络模型的精度造成一定的损失,在不断调整学习率和λ参数后,最后确定将学习率为0.25×,λ为0.1×的组合作为稀疏化训练的优选参数组合。本发明实施例优选的学习率与权重组合对于系数训练过后的权重的分布更有利,且网络模型的精度也较高。
③将YOLOv3-2网络进行通道剪枝,得到YOLOv3-3网络;
在稀疏化训练之后,得到了一个BN层缩放系数稀疏分布的网络模型,这就便于确定哪些通道的重要性更小。由此,可以剪去这些不太重要的通道,剪枝的方法是删除传入和传出连接以及相应的权重。
对网络进行通道剪枝操作,修剪一个通道基本上对应于删除该通道的所有传入和传出连接,可以直接获得一个轻量化的网络,而不需要使用任何特殊的稀疏计算包。通道剪枝过程中,缩放因子充当频道选择的代理;由于它们与网络权重共同优化,因此网络可以自动识别无关紧要的通道,这些通道可以安全地移除而不会极大地影响泛化性能。
具体的,该步骤可以包括以下步骤:
在所有层的所有通道中设定一个通道剪枝比例,然后将YOLOv3-2网络中所有的BN层缩放系数按照升序排列,按通道剪枝比例剪掉排在前面的BN层缩放系数对应的通道。
优选的实施方式中,通道剪枝比例可以为60%。
通过通道剪枝,可以删去冗余的通道,减少计算量,加快检测的速度。
但是在通道剪枝后,可能会由于参数减少而带来一些精度的下降,从不同的剪枝比例对网络精度的影响进行分析,如果网络剪枝比例过大,网络体积压缩更多,但也会造成网络精度的剧烈下降,会给网络准确率造成一定的损失,由此需要进行一个网络压缩比例与压缩后网络精度的权衡,因此引入知识蒸馏策略对网络进行微调,以使网络准确率回升。
④将YOLOv3-3网络进行知识蒸馏,得到改进型YOLOv3网络。
经过剪枝,获得了一个更为紧凑的YOLOv3-3网络模型,然后需要进行微调使得精度恢复。这里引入了知识蒸馏的策略。
具体的,对YOLOv3-3网络引入知识蒸馏,将前述复杂网络作为教师网络,YOLOv3-3网络作为学生网络,由教师网络引导学生网络进行精度恢复和调整,得到改进型YOLOv3网络。
作为一种优选的实施方式,可以前述复杂网络的Softmax层之前的输出结果除以温度系数,来使得教师网络最终输出的预测值软化,而后学生网络利用软化后的预测值作为标签来辅助训练YOLOv3-3网络,最终实现YOLOv3-3网络的精度与和教师网络相当;其中,温度系数是预先设定的值,不随网络训练发生变化。
引入温度参数T的原因是,一个完成训练、精度很高的网络对于输入数据的分类结果,和真实的标签是基本一致的。以三分类为例,真实已知的训练类别标签是[1,0,0], 预测结果可能会是[0.95,0.02,0.03],和真实的标签值是非常逼近的。因此,对于学生网络来讲,使用教师网络的分类结果辅助训练和直接利用数据进行训练,差别不大。温度参数T可以用来控制预测标签的软化程度,即可以增加教师网络分类结果的偏差。
将添加了知识蒸馏策略的微调过程和一般的微调过程进行对比,经过知识蒸馏调整恢复的网络精度更高。
通过对预训练的网络进行层剪枝、稀疏化训练、通道剪枝,以及知识蒸馏处理,并在各个处理过程选取优化的处理参数,得到精简的网络,该网络体积大幅缩小,摒除了大部分的冗余计算,经过该步骤后得到的网络即为后续对目标道路图像进行检测的改进型YOLOv3网络,基于该网络的检测速度可以得到大幅度的提升,而且能够维持检测的精度。可以满足对检测实时性高的要求,由于网络体积小,对资源需求小,完全可以部署在边缘设备上进行,即部署在车载设备上。车载设备可以为放置于车内的设备,如导航仪,手机等。
S4,基于所有预测结果,得到目标道路图像的属性信息,属性信息包括目标道路图像中目标的位置和类别;
改进型YOLOv3网络还包括分类网络和非极大值抑制模块;分类网络和非极大值抑制模块串接在FPN网络之后。
基于所有预测结果,得到目标道路图像的属性信息,包括:
对所有预测结果经由分类网络进行分类处理,再经由非极大值抑制模块进行预测框去重处理,得到目标道路图像的属性信息;
其中,分类网络包括SoftMax分类器。目的是实现多个车辆类别互斥的分类。可选的,分类网络也可以沿用YOLOv3网络的logistic回归进行分类,以实现多个独立的二分类。
非极大值抑制模块用于进行NMS(non_max_suppression,非极大值抑制)处理。用于在重复框选同一目标的多个预测框中,排除置信度相对较小的预测框。
关于分类网络和非极大值抑制模块的内容,可以参见现有技术相关说明,在此不再赘述。
针对每一目标,检测结果的形式为一向量,包含预测框的位置、预测框内车辆的置信度、预测框内目标的类别。预测框的位置用来表征目标在目标道路图像中的位置;具体的,每个预测框的位置用bx,by,bw,bh四个值表示,bx,by用于表示预测框的中心点位置,bw,bh为用于表示预测框的宽和高。比如道路中有1辆公交车、5辆小轿车和2个行人,它们分别位于目标道路图像的不同位置,公交车位于图像以左上角为原点,横向230像素,纵向180像素,公交车在图像中宽20,高50,那么其属性信息可以“230,180,20,50,bus”。
目标的类别为目标所属物体的种类,比如人、动物、建筑物、车辆、标志牌等等。
在可选的一种实施方式中,目标可以仅为车辆,类别可以包括小汽车、单层公交车、双层公交车、大卡车、面包车、自行车和摩托车等。
在可选的一种实施方式中,得到属性信息之后,方法还可以包括:
输出属性信息。
一种实施方式中,可以显示属性信息,包括:在车载设备上显示属性信息。
具体的,可以将属性信息显示在车内的显示屏上,显示屏可以为导航装置的显示屏幕或者驾驶者手机的显示屏幕。可以直接将标注有属性信息的目标道路图像显示在车内的显示屏上,使得车内的驾驶员可以直接观察属性信息,以便于了解目标道路图像中显示的各个目标的位置和类别,这样,在远处的驾驶者就可以获得他视距范围外的道路情况,提前做出适当的驾驶行为,例如减速慢行、路线规划、物体避让等,实现安全驾驶的目的。当然,也可以将属性信息以其他文本的形式进行显示,这都是合理的。
另一种实施方式中,可以将属性信息以语音形式进行播放,以便于驾驶员在驾驶状态,不方便观看图像的情况下,也能够较为方便地接收到属性信息,有利于安全驾驶。当然,上述两种方式可以结合。
可选的,在车载设备上显示属性信息,可以包括:
基于目标的位置和/或类别确定目标是否属于待提醒的小目标;
如果是,在车载设备上采用提醒模式显示属性信息;如果否,在车载设备上采用一般模式显示属性信息。
在该种实施方式中,可以针对小目标进行特别提醒,比如,根据目标的位置,可以确定目标所在的预测框的尺寸,判断一预测框的尺寸是否小于预设的预测框尺寸,如果是,则可以确定该目标属于待提醒的小目标;或者,可以提前针对目标的类别进行划分,将标志牌等一些明显属于较小的物体的类别预设为小目标类别,通过判断一目标的类别是否属于预设的小目标类别,来确定该目标是否属于待提醒的小目标,当然,为了精确确定小目标,可以结合目标的位置和类别来确定待提醒的小目标。
如果一目标属于待提醒的小目标,则可以在车载设备上采用提醒模式显示属性信息;比如在目标道路图像上采用颜色鲜艳的字体进行标注,或者以闪烁的形式进行标注,或者辅以语音提示,等等。当然,可以采用多种提醒方式的结合。
如果一目标不属于待提醒的小目标,则可以在车载设备上采用一般模式显示属性信息,即对所有目标采用一致的模式,该模式在此不再赘述。
在可选的一种实施方式中,得到属性信息之后,本方法还可以包括:
基于属性信息进行反馈。
具体的,驾驶员得到属性信息后,可以将属性信息发送给图像采集端或者其他车辆、行人等,以便于车联网系统内的多个终端可以获得该属性信息实现信息统计、安全驾驶等目的,进一步的,在本车辆发送该信息时,可以携带本车辆的当前位置信息,比如通过GPS(Global Positioning System,全球定位系统)得到的坐标信息,以及当前时间信息,以便于接收方对路况信息有更清晰的了解。
在可选的一种实施方式中,可以获取预定时间段内的多张目标道路图像进行目标检 测,利用同一目标的位置和类别实现目标轨迹追踪,等等。
并且,原有的YOLOv3网络中含有较多的卷积层,原因在于其针对的目标的类别较多,有80种。而在本发明实施例中,目标主要为道路上的物体,目标的类别数较少,那么,大量的卷积层是没有必要的,这样会浪费网络资源,降低处理速度。
因此,如前所述,相比原有YOLOv3网络中主干网络的多个残差模块含有的卷积层数量,改进型YOLOv3网络中,通过将密集连接模块中含有的密集连接单元数量设置为2,可以在不影响网络精度的情况下,针对本发明实施例的目标道路图像,减少主干网络中卷积层的数量。
另外,可选的,改进型YOLOv3网络还可以是针对FPN网络中每个预测支路的卷积网络模块组中的k的数值进行调整后得到的,即将k从原有YOLOv3网络中的5减少为4或3,也就是将原有的CBL*5改为CBL*4或者CBL*3;这样也可以减少FPN网络中卷积层的数量,在不影响网络精度的情况下,针对本发明实施例的目标道路图像,整体实现网络层数精简,提升网络处理速度。
本发明实施例所提供的方案中,一方面,将YOLOv3网络的主干网络中的残差模块更换为密集连接模块,将特征融合方式从并行改为了串行,使得主干网络在进行特征提取时,能够直接将早期的特征图作为后面每一层的输入,获得的特征图的信息量更多,强化了特征的传递,因此在进行目标道路图像检测时,能够提高检测精度。并且,通过复用浅层网络的特征图参数能够减少参数的数量以及运算量。另一方面,采用多个特征提取尺度,为小目标增加细粒度的特征提取尺度,能够提升目标道路图像中小目标的检测精度。又一方面,改变FPN网络的特征融合方式,对主干网络提取的特征图采用自顶向下、密集连接的方式进行特征融合,将深层特征直接进行不同倍数的上采样,以此来使得传递的所有特征图具有相同的尺寸,将这些特征图和浅层的特征图通过串联的方式融合起来,可以利用到更多的原始信息,在浅层网络中也有高维语义信息的参与,有助于提高检测的精度;同时通过直接接收更浅层网络的特征,可以得到更加具体的特征,将有效的减少特征的损失,可以减少需要运算的参数量,提高检测速度,实现实时检测。再一方面,通过对预训练的网络进行层剪枝、稀疏化训练、通道剪枝,以及知识蒸馏处理,并在各个处理过程选取优化的处理参数,可以精简网络体积,摒除大部分的冗余计算,能够在维持检测精度的情况下大幅度提升检测速度。本发明将云端的检测过程部署在了存储资源和计算资源都非常有限的边缘设备当中,由车载设备就可以实现超视距道路检测,可以对道路上的目标实现高精度和高实时性检测,有利于驾驶者进行安全驾驶。
以下结合发明人的实验过程,对本发明实施例的网络改进及对道路图像检测性能进行说明,以便于深入了解其性能。
本发明选用了UA-DETRAC数据集进行实验。UA-DETRAC数据集的拍摄地点是在北京和天津的道路过街天桥,拍摄设备是Cannon EOS550D,视频帧率为25FPS,数据格式是JPEG,图像尺寸是960×540。数据集当中包含了60个视频,分别在晴天、 阴天、雨天以及夜晚进行拍摄,包含了不同气象下的数据。图像总数是82085,对目标进行了标注。这些标注都是手动标注出来的,因此标注数据比较准确。每个视频中的所有图像在同一个文件夹下顺序编号,每个视频当中所有图像的标注数据都记录在和视频文件夹同名的XML文件当中。
为了使得数据分布更加具有随机性,让模型的泛化能力得到充分的提升,使用随机抽样的方法来抽取数据集当中的数据。整个数据集共包含82,085张图片,本文将从中抽取10,000张图片来进行实验。并且按照4:1的比例来分配训练集和测试集。为了保证训练集和测试集不包含相同的图片,要在抽取到的10,000张图片中,再次进行随机抽取,进行数据集分配。另外,训练YOLO网络需要使用VOC格式或者COCO格式的数据,即用五个数字来表示框选物体的种类、左上角的位置和物体的长、宽,并将这些数据存储在文本文档当中。所以这里用Python脚本进行数据集标注格式的转换,同时对数据集当中目标的种类及比例进行统计。
本发明实施例将YOLOv3网络主干网络中的残差模块更换为密集连接模块并改进过渡模块之后的网络命名为Dense-YOLO-1;Dense-YOLO-1网络的结构请参见图2网络,以及图3的主干网络进行理解,此处不再赘述。将Dense-YOLO-1与YOLOv3网络进行测试。选择模型的mAP(Mean Average Precision,平均精度均值)作为评估对象。mAP的值位于0和1之间,mAP越大,就说明模型精度越好。当然,还要参考模型损失loss曲线,观察模型的收敛情况。其中损失函数的构造仍然按照YOLOv3的损失函数。网络的体积和检测的速度也是需要考虑的,所以要记录不同网络的模型文件大小,以及不同模型分别在服务器Tesla V100和边缘设备Jetson TX2平台上进行道路图像检测的时间。请参见图5-1和图5-2,图5-1为YOLOv3和本发明实施例的Dense-YOLO-1的mAP曲线对比图;图5-2为YOLOv3和本发明实施例的Dense-YOLO-1的loss曲线对比图;从图中可以看到,Dense-YOLO-1的网络精度提升了4%左右,模型的损失函数差别极其细微,所以这里利用半对数坐标来放大它们之间的差异,可以看到,Dense-YOLO-1的loss比YOLOv3稍低。因此,从精度和损失曲线当中可以看到,将YOLOv3当中的残差结构更更换成为密集连接,并将密集连接模块之间的过渡模块进行改进,可以对网络性能有很大的提升。
表1YOLOv3和Dense-YOLO-1网络模型的体积及其在不同平台的检测时间
网络 模型大小 Tesla V100上的检测时间 Jetson TX2上的检测时间
YOLOv3 236M 42.8ms 221.1ms
Dense-YOLO-1 131M 39.0ms 214.7ms
网络在不同平台进行道路图像检测的时间如表1所示,可以看到,给网络增加密集连接可以减小网络的体积,减少检测所需时间。
在Dense-YOLO-1基础上,多尺度的一个改进思路是,为YOLO v3增加一个更细粒度的目标检测的尺度,让网络能够检测更小的物体。本发明实施例具体增加了104× 104的尺度,并设定相应的锚盒尺寸,将得到的网络命名为MultiScale-YOLO-1。网络结构请结合图2和图3理解,不再赘述。Dense-YOLO-1和MultiScale-YOLO-1网络的mAP和loss曲线如图6-1和图6-2所示。可以看到,多尺度网络相对于密集连接的网络而言,有所提升,但是变化并不明显,只有7%左右,loss曲线的差异依然不明显。这可能是因为数据集当中小尺寸的物体数量不多,对于细粒度的识别需求并不强烈,增加更加细致的物体检测粒度对于网络精度的增益不明显。对此,一方面可以寻找对小目标标注更加细致的数据集,让网络在训练过程中就能进行更细粒度的训练,在识别过程中也就能够识别更微小的物体了。当然,如果要求较高,在时间和精力充足并且没有合适数据集的情况下,可以自行标注数据集。
在Dense-YOLO-1基础上,多尺度的另外一个改进思路是,从特征融合的方法下手,尝试通过改进特征融合的方法,让检测过程融合更多维度的语义信息,由此来提高目标识别精度。因此对FPN网络的特征融合方式进行改进,采用自顶向下,密集连接形式的融合方式,将得到的网络命名为Dense-YOLO-2。网络结构不再示出。Dense-YOLO-1和Dense-YOLO-2网络的mAP和loss曲线如图7-1和图7-2所示。改变了融合方式,添加了自顶向下的密集连接特征融合方法的多尺度特征融合网络当中,多尺度的优势体现的更为明显,这可能是因为由于密集连接的特征融合方法当中,保留了比横向连接更多的高维度抽象语义信息,让模型对于物体的判别更加清晰。改变融合方式之后的网络精度,比原来提升了18.2%,loss曲线也比先前出现了一些降低。根据以上曲线图,可以看出融合方式的改进对于网络精度的提升十分明显。
综合考虑在Dense-YOLO-1基础上将上述两种多尺度改进方法结合,不仅利用多尺度的特征融合模型,增大网络的视野,提升不同尺度物体的定位准确性,而且使用自顶向下的密集连接方法来更加充分融合高维语义信息,使得网络对不同物体的分类效果增强,将最终得到的网络结构命名为MultiScale-YOLO-2,结构不再示出。该网络的精度和loss与Dense-YOLO-1的对比如图8-1和8-2所示。可以看到,相对于Dense-YOLO-1,该更细粒度视野的密集融合网络结构的精度提升了24.5%,loss曲线也进一步下降,这表明这样的改进方法是行之有效的。
作为希望在车联网当中使用的神经网络模型,网络应当具有较小的参数体积和较快的检测速度。进行多尺度改进后的网络模型的体积及其在不同平台上进行道路图像检测的时间如表2所示。
表2多尺度改进的网络模型的体积及其在不同平台的检测时间
网络 模型大小 Tesla V100上的检测时间 Jetson TX2上的检测时间
Dense-YOLO-2 489M 35.1ms 300.0ms
MultiScale-YOLO-1 132M 41.2ms 243.4ms
MultiScale-YOLO-2 491M 44.8ms 350.6ms
相对于表1中给出的Dense-YOLO的各项参数,虽然增加更细粒度的视野对网络 的精度增益比较小,但是其对网络参数体积和检测时间的影响更是微乎其微,因此,本文选择使用细粒度的视野。同时,使用密集的特征融合方式对引起了网络体积的增加,但是网络的检测时间受影响不大,后期还要进行网络的裁剪,因此,也保留密集连接的特征融合方式。根据上面的分析,最终选用MultiScale-YOLO-2作为改进后的网络。该网络也就是前文的复杂网络。
针对于稀疏化训练，学习率和λ可以此消彼长的进行调整，以保证收敛速度和精度。本方案尝试不同学习率和λ的取值，如表3所示。通过比较γ权重分布情况图，最终选取参数组合5。参数组合5的γ权重分布图请参见图9-1和9-2。图9-1为参数组合5的权重偏移图；图9-2为参数组合5的权重交叠图；
表3不同的学习率和λ组合
组合 学习率 λ
1
2 0.1×
3 0.1×
4 0.025×
5 0.25× 0.1×
事实上,本文最初的实验设计,并不包含对网络层的剪枝过程,原计划直接进行通道剪枝。但是,根据通道剪枝的结果分析,发现有超过半数的密集连接层的权重都很接近于0,因此按照通道剪枝的规则,整层的通道都将被剪去。这就表明,在前面设计的4个密集连接单元一组的密集连接模块中,存在有冗余的单元,因此在通道剪枝前,可以进行层剪枝来大幅降低冗余,而后再做相对更细粒度的通道剪枝。由于超过半数的密集连接单元都是冗余的单元,层剪枝的做法是,将密集连接块由4个密集连接单元一组,改为2个一组,简化网络结构,同时能够将网络的参数量和运算量减小接近一半。将经过层剪枝后的网络命名MultiScale-YOLO-3网络,也可以简称为YOLOv3-1网络。
之后,对YOLOv3-1网络进行稀疏化训练,得到BN层缩放系数稀疏分布的YOLOv3-2网络;
将YOLOv3-2网络进行通道剪枝,得到YOLOv3-3网络;
通道剪枝比例可以为60%。这是由于待检测的目标道路图像中数量较少的目标种类在网络压缩过程中受到影响比较大,这就会直接影响mAP,因此,要从数据集和网络压缩比例方面来考虑。对数据集的处理,本发明的实施例选择合并数量较少的目标的种类来使得不同种类的数量达到均衡,或者是直接采用种类分布更加均衡的数据集,这与本发明实施例的应用场景相吻合。另外就是控制压缩比例,保证数量较少的种类的预测精度不会下降太多。根据mAP仿真结果来看,50%-60%的压缩比例是精度变化的转折点,因此可以初步选择60%的压缩比例。
除了从精度来分析压缩的影响外,还要考虑检测时间和模型压缩比例的关系,通过 对不同剪枝比例处理的网络模型不同平台上(在Tesla V100服务器和Jetson TX2边缘设备)的上进行道路图像检测的时间进行仿真,根据仿真结果可以发现,不同网络压缩比例对于检测的时间影响很微弱,而对于NMS(非极大值抑制)所需时间影响较大,在压缩比例达到60%之前,检测速度随着网络压缩而加快,但是压缩比例超过60%后,检测速度反而出现了减慢。由此,最终选定通道剪枝比例为60%。
将YOLOv3-3网络进行知识蒸馏,得到改进型YOLOv3网络。
其中,前述的复杂网络,即MultiScale-YOLO-2网络,作为教师网络。
将最终得到的网络,即改进型YOLOv3网络命名为YOLO-Terse。
将YOLO-Terse与YOLOv3进行性能对比,结果请参见图10,图10为本发明实施例的改进型YOLOv3网络(YOLO-Terse)与YOLOv3网络的性能对比图。可以看出YOLO-Terse的精度比YOLOv3提升了9.0%,而模型大小却减小了72.9%,在Tesla V100和JetsonTX2上的检测时间分别减少了18.9%和15.3%。这说明在精度有部分提升的情况下,大幅缩小了模型体积,提高道路图像的检测速度。
第二方面,相应于上述方法实施例,本发明实施例还提供了一种车载电子设备,如图11所示,包括处理器1101、通信接口1102、存储器1103和通信总线1104,其中,处理器1101、通信接口1102、存储器1103通过通信总线1104完成相互间的通信,
存储器1103,用于存放计算机程序;
处理器1101,用于执行存储器1103上所存放的程序时,实现前述任意一种基于车联网的道路检测方法的步骤。
上述电子设备提到的通信总线可以是外设部件互连标准(Peripheral ComponentInterconnect,PCI)总线或扩展工业标准结构(Extended Industry StandardArchitecture,EISA)总线等。该通信总线可以分为地址总线、数据总线、控制总线等。为便于表示,图中仅用一条粗线表示,但并不表示仅有一根总线或一种类型的总线。
通信接口用于上述电子设备与其他设备之间的通信。
存储器可以包括随机存取存储器(Random Access Memory,RAM),也可以包括非易失性存储器(Non-Volatile Memory,NVM),例如至少一个磁盘存储器。可选的,存储器还可以是至少一个位于远离前述处理器的存储装置。
上述的处理器可以是通用处理器,包括中央处理器(Central Processing Unit,CPU)、网络处理器(Network Processor,NP)等;还可以是数字信号处理器(Digital SignalProcessing,DSP)、专用集成电路(Application Specific Integrated Circuit,ASIC)、现场可编程门阵列(Field-Programmable Gate Array,FPGA)或者其他可编程逻辑器件、分立门或者晶体管逻辑器件、分立硬件组件。
以上仅为本发明的较佳实施例而已,并非用于限定本发明的保护范围。凡在本发明的精神和原则之内所作的任何修改、等同替换、改进等,均包含在本发明的保护范围内。

Claims (10)

  1. 一种基于车联网的道路检测方法,其特征在于,应用于车载端,包括:
    获取图像采集端拍摄的目标道路图像;
    将所述目标道路图像输入预先训练得到的改进型YOLOv3网络中,利用密集连接形式的主干网络对所述目标道路图像进行特征提取,得到x个不同尺度的特征图;x为大于等于4的自然数;
    利用改进型FPN网络对所述x个不同尺度的特征图进行自顶向下、密集连接方式的特征融合,得到各尺度对应的预测结果;
    基于所有预测结果,得到所述目标道路图像的属性信息,所述属性信息包括所述目标道路图像中目标的位置和类别;
    其中,所述改进型YOLOv3网络包括所述密集连接形式的主干网络、所述改进型FPN网络;所述改进型YOLOv3网络是在YOLOv3网络基础上,将主干网络中的残差模块更换为密集连接模块、增加特征提取尺度、优化FPN网络的特征融合方式,以及进行剪枝及结合知识蒸馏引导网络恢复处理后形成的;所述改进型YOLOv3网络是根据样本道路图像,以及所述样本道路图像对应的目标的位置和类别训练得到的。
  2. 根据权利要求1所述的方法,其特征在于,所述密集连接形式的主干网络,包括:
    间隔串接的密集连接模块和过渡模块;所述密集连接模块的数量为y;所述密集连接模块包括串行连接的卷积网络模块和密集连接单元组;所述卷积网络模块包括串行连接的卷积层、BN层、Leaky relu层;所述密集连接单元组包括m个密集连接单元;每个密集连接单元包括多个采用密集连接形式连接的所述卷积网络模块,并采用级联方式融合多个卷积网络模块输出的特征图;其中,y为大于等于4的自然数,m为大于1的自然数。
  3. 根据权利要求2所述的方法,其特征在于,所述得到x个不同尺度的特征图,包括:
    得到沿输入逆向的x个密集连接模块输出的、尺度依次增大的x个特征图。
  4. 根据权利要求2所述的方法,其特征在于,所述过渡模块包括所述卷积网络模块和最大池化层;所述卷积网络模块的输入和所述最大池化层的输入共用,所述卷积网络模块输出的特征图和所述最大池化层输出的特征图采用级联方式融合。
  5. 根据权利要求4所述的方法,其特征在于,所述过渡模块包括的所述卷积网络模块的数量为两个或三个,且各个卷积网络模块之间采用串接方式。
  6. 根据权利要求3所述的方法,其特征在于,所述利用改进型FPN网络对所述x个不同尺度的特征图进行自顶向下、密集连接方式的特征融合,包括:
    针对预测支路Y i,从所述x个特征图中,获取对应尺度的特征图并进行卷积处理;
    将卷积处理后的特征图,与预测支路Y i-1~Y 1分别经上采样处理后的特征图进行级联融合;
    其中,所述改进型FPN网络包括尺度依次增大的x个预测支路Y 1~Y x;所述预测支路Y 1~Y x的尺度与所述x个特征图的尺度一一对应;预测支路Y i-j的上采样倍数为2 j;i=2、3,…,x;j为小于i的自然数。
  7. 根据权利要求2所述的方法,其特征在于,所述进行剪枝及结合知识蒸馏引导网络恢复处理,包括:
    对YOLOv3网络基础上将主干网络中的残差模块改为密集连接模块、增加特征提取尺度、优化FPN网络的特征融合方式后得到的网络中,主干网络的密集连接模块进行层剪枝,得到YOLOv3-1网络;
    对所述YOLOv3-1网络进行稀疏化训练,得到BN层缩放系数稀疏分布的YOLOv3-2网络;
    将所述YOLOv3-2网络进行通道剪枝,得到YOLOv3-3网络;
    将所述YOLOv3-3网络进行知识蒸馏,得到所述改进型YOLOv3网络。
  8. 根据权利要求1所述的方法,其特征在于,对所述改进型YOLOv3网络进行训练之前还包括:
    确定针对样本道路图像中锚盒尺寸的待聚类数量;
    获取已标注目标框尺寸的若干样本道路图像;
    基于已标注目标框尺寸的若干样本道路图像,利用K-Means聚类方法,获得样本道路图像中锚盒尺寸的聚类结果;
    将所述聚类结果写入所述改进型YOLOv3网络的配置文件中。
  9. 根据权利要求1所述的方法,其特征在于,所述改进型YOLOv3网络还包括分类网络和非极大值抑制模块;
    所述基于所有预测结果,得到所述目标道路图像的属性信息,包括:
    对所有预测结果经由所述分类网络进行分类处理,再经由所述非极大值抑制模块进行预测框去重处理,得到所述目标道路图像的属性信息;
    其中,所述分类网络包括SoftMax分类器。
  10. 一种车载电子设备,其特征在于,包括处理器、通信接口、存储器和通信总线,其中,处理器,通信接口,存储器通过通信总线完成相互间的通信;
    存储器,用于存放计算机程序;
    处理器,用于执行存储器上所存放的程序时,实现权利要求1-9任一所述的方法步骤。
PCT/CN2021/130684 2020-10-23 2021-11-15 一种基于车联网的道路检测方法 WO2022083784A1 (zh)

Priority Applications (1)

Application Number Priority Date Filing Date Title
US17/564,524 US20230154202A1 (en) 2020-10-23 2021-12-29 Method of road detection based on internet of vehicles

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202011147522.6A CN112380921A (zh) 2020-10-23 2020-10-23 一种基于车联网的道路检测方法
CN202011147522.6 2020-10-23

Related Child Applications (1)

Application Number Title Priority Date Filing Date
US17/564,524 Continuation US20230154202A1 (en) 2020-10-23 2021-12-29 Method of road detection based on internet of vehicles

Publications (1)

Publication Number Publication Date
WO2022083784A1 true WO2022083784A1 (zh) 2022-04-28

Family

ID=74580793

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2021/130684 WO2022083784A1 (zh) 2020-10-23 2021-11-15 一种基于车联网的道路检测方法

Country Status (3)

Country Link
US (1) US20230154202A1 (zh)
CN (1) CN112380921A (zh)
WO (1) WO2022083784A1 (zh)

Cited By (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114881227A (zh) * 2022-05-13 2022-08-09 北京百度网讯科技有限公司 模型压缩方法、图像处理方法、装置和电子设备
CN114912532A (zh) * 2022-05-20 2022-08-16 电子科技大学 一种自动驾驶汽车多源异构感知数据融合方法
CN115019071A (zh) * 2022-05-19 2022-09-06 昆明理工大学 光学图像与sar图像匹配方法、装置、电子设备及介质
CN115115974A (zh) * 2022-06-08 2022-09-27 中国船舶集团有限公司系统工程研究院 基于神经网络的智能航行态势感知系统
CN115272412A (zh) * 2022-08-02 2022-11-01 电子科技大学重庆微电子产业技术研究院 一种基于边缘计算的低小慢目标检测方法及跟踪系统
CN115272763A (zh) * 2022-07-27 2022-11-01 四川大学 一种基于细粒度特征融合的鸟类识别方法
CN115359360A (zh) * 2022-10-19 2022-11-18 福建亿榕信息技术有限公司 一种电力现场作业场景检测方法、系统、设备和存储介质
CN116343063A (zh) * 2023-05-26 2023-06-27 南京航空航天大学 一种路网提取方法、系统、设备及计算机可读存储介质
CN116563800A (zh) * 2023-04-26 2023-08-08 北京交通大学 基于轻量化YOLOv3的隧道内车辆检测方法及系统
CN116665188A (zh) * 2023-07-20 2023-08-29 南京博融汽车电子有限公司 一种大客车图像系统数据分析方法

Families Citing this family (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112380921A (zh) * 2020-10-23 2021-02-19 西安科锐盛创新科技有限公司 一种基于车联网的道路检测方法
CN112949500A (zh) * 2021-03-04 2021-06-11 北京联合大学 一种基于空间特征编码改进的YOLOv3车道线检测方法
CN112949604A (zh) * 2021-04-12 2021-06-11 石河子大学 一种基于深度学习的主动悬架智能控制方法及装置
CN113177937B (zh) * 2021-05-24 2022-09-13 河南大学 基于改进YOLOv4-tiny的布匹缺陷检测方法
CN113592784A (zh) * 2021-07-08 2021-11-02 浙江科技学院 一种基于轻量级卷积神经网络检测路面病害的方法及装置
CN116342894B (zh) * 2023-05-29 2023-08-08 南昌工程学院 基于改进YOLOv5的GIS红外特征识别系统及方法
CN116612379B (zh) * 2023-05-30 2024-02-02 中国海洋大学 一种基于多知识蒸馏的水下目标检测方法及系统
CN116416626B (zh) * 2023-06-12 2023-08-29 平安银行股份有限公司 圆形印章数据的获取方法、装置、设备及存储介质
CN117218129B (zh) * 2023-11-09 2024-01-26 四川大学 食道癌图像识别分类方法、系统、设备及介质

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109815886A (zh) * 2019-01-21 2019-05-28 南京邮电大学 一种基于改进YOLOv3的行人和车辆检测方法及系统
AU2019101142A4 (en) * 2019-09-30 2019-10-31 Dong, Qirui MR A pedestrian detection method with lightweight backbone based on yolov3 network
CN111401148A (zh) * 2020-02-27 2020-07-10 江苏大学 一种基于改进的多级YOLOv3的道路多目标检测方法
CN111553406A (zh) * 2020-04-24 2020-08-18 上海锘科智能科技有限公司 基于改进yolo-v3的目标检测系统、方法及终端
CN112380921A (zh) * 2020-10-23 2021-02-19 西安科锐盛创新科技有限公司 一种基于车联网的道路检测方法

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109815886A (zh) * 2019-01-21 2019-05-28 南京邮电大学 一种基于改进YOLOv3的行人和车辆检测方法及系统
AU2019101142A4 (en) * 2019-09-30 2019-10-31 Dong, Qirui MR A pedestrian detection method with lightweight backbone based on yolov3 network
CN111401148A (zh) * 2020-02-27 2020-07-10 江苏大学 一种基于改进的多级YOLOv3的道路多目标检测方法
CN111553406A (zh) * 2020-04-24 2020-08-18 上海锘科智能科技有限公司 基于改进yolo-v3的目标检测系统、方法及终端
CN112380921A (zh) * 2020-10-23 2021-02-19 西安科锐盛创新科技有限公司 一种基于车联网的道路检测方法

Cited By (17)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114881227B (zh) * 2022-05-13 2023-07-04 北京百度网讯科技有限公司 模型压缩方法、图像处理方法、装置和电子设备
CN114881227A (zh) * 2022-05-13 2022-08-09 北京百度网讯科技有限公司 模型压缩方法、图像处理方法、装置和电子设备
CN115019071A (zh) * 2022-05-19 2022-09-06 昆明理工大学 光学图像与sar图像匹配方法、装置、电子设备及介质
CN115019071B (zh) * 2022-05-19 2023-09-19 昆明理工大学 光学图像与sar图像匹配方法、装置、电子设备及介质
CN114912532A (zh) * 2022-05-20 2022-08-16 电子科技大学 一种自动驾驶汽车多源异构感知数据融合方法
CN114912532B (zh) * 2022-05-20 2023-08-25 电子科技大学 一种自动驾驶汽车多源异构感知数据融合方法
CN115115974A (zh) * 2022-06-08 2022-09-27 中国船舶集团有限公司系统工程研究院 基于神经网络的智能航行态势感知系统
CN115272763A (zh) * 2022-07-27 2022-11-01 四川大学 一种基于细粒度特征融合的鸟类识别方法
CN115272763B (zh) * 2022-07-27 2023-04-07 四川大学 一种基于细粒度特征融合的鸟类识别方法
CN115272412A (zh) * 2022-08-02 2022-11-01 电子科技大学重庆微电子产业技术研究院 一种基于边缘计算的低小慢目标检测方法及跟踪系统
CN115272412B (zh) * 2022-08-02 2023-09-26 电子科技大学重庆微电子产业技术研究院 一种基于边缘计算的低小慢目标检测方法及跟踪系统
CN115359360A (zh) * 2022-10-19 2022-11-18 福建亿榕信息技术有限公司 一种电力现场作业场景检测方法、系统、设备和存储介质
CN116563800A (zh) * 2023-04-26 2023-08-08 北京交通大学 基于轻量化YOLOv3的隧道内车辆检测方法及系统
CN116343063B (zh) * 2023-05-26 2023-08-11 南京航空航天大学 一种路网提取方法、系统、设备及计算机可读存储介质
CN116343063A (zh) * 2023-05-26 2023-06-27 南京航空航天大学 一种路网提取方法、系统、设备及计算机可读存储介质
CN116665188A (zh) * 2023-07-20 2023-08-29 南京博融汽车电子有限公司 一种大客车图像系统数据分析方法
CN116665188B (zh) * 2023-07-20 2023-10-10 南京博融汽车电子有限公司 一种大客车图像系统数据分析方法

Also Published As

Publication number Publication date
CN112380921A (zh) 2021-02-19
US20230154202A1 (en) 2023-05-18

Similar Documents

Publication Publication Date Title
WO2022083784A1 (zh) 一种基于车联网的道路检测方法
CN110766098A (zh) 基于改进YOLOv3的交通场景小目标检测方法
CN110348384B (zh) 一种基于特征融合的小目标车辆属性识别方法
CN111460919B (zh) 一种基于改进YOLOv3的单目视觉道路目标检测及距离估计方法
CN112487862A (zh) 基于改进EfficientDet模型的车库行人检测方法
CN112417973A (zh) 一种基于车联网的无人驾驶系统
CN112800906B (zh) 一种基于改进YOLOv3的自动驾驶汽车跨域目标检测方法
CN111428558A (zh) 一种基于改进YOLOv3方法的车辆检测方法
CN110781850A (zh) 道路识别的语义分割系统和方法、计算机存储介质
CN112364719A (zh) 一种遥感图像目标快速检测方法
CN113762209A (zh) 一种基于yolo的多尺度并行特征融合路标检测方法
CN114092917B (zh) 一种基于mr-ssd的被遮挡交通标志检测方法及系统
CN112528934A (zh) 一种基于多尺度特征层的改进型YOLOv3的交通标志检测方法
CN112364721A (zh) 一种道面异物检测方法
CN112990065A (zh) 一种基于优化的YOLOv5模型的车辆分类检测方法
CN114445430A (zh) 轻量级多尺度特征融合的实时图像语义分割方法及系统
CN114821492A (zh) 一种基于YOLOv4的道路车辆检测系统及方法
CN112819000A (zh) 街景图像语义分割系统及分割方法、电子设备及计算机可读介质
CN112395953A (zh) 一种道面异物检测系统
CN115346071A (zh) 高置信局部特征与全局特征学习的图片分类方法及系统
CN114639067A (zh) 一种基于注意力机制的多尺度全场景监控目标检测方法
CN112364864A (zh) 一种车牌识别方法、装置、电子设备及存储介质
CN112288701A (zh) 一种智慧交通图像检测方法
CN112308066A (zh) 一种车牌识别系统
CN112288702A (zh) 一种基于车联网的道路图像检测方法

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 21882196

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 21882196

Country of ref document: EP

Kind code of ref document: A1


32PN Ep: public notification in the ep bulletin as address of the adressee cannot be established

Free format text: NOTING OF LOSS OF RIGHTS PURSUANT TO RULE 112(1) EPC (EPO FORM 1205A DATED 10.10.2023)