CN117541881B - Road damage detection method and system - Google Patents

Road damage detection method and system

Info

Publication number: CN117541881B (granted; earlier publication CN117541881A)
Application number: CN202410003960.7A
Authority: CN (China)
Original language: Chinese (zh)
Legal status: Active
Prior art keywords: feature, channel, module, information, layer
Inventors: 刘美, 苏鹏, 韩惠子, 杨涛, 刘世杰
Assignee (original and current): Guangdong University of Petrochemical Technology
Application filed by Guangdong University of Petrochemical Technology
Priority: CN202410003960.7A

Classifications

    • G06V10/764 — Image or video recognition using pattern recognition or machine learning, using classification (e.g. of video objects)
    • G06N3/0464 — Convolutional networks [CNN, ConvNet]
    • G06N3/08 — Neural network learning methods
    • G06T7/0004 — Image analysis; inspection of images; industrial image inspection
    • G06V10/454 — Local feature extraction; filters integrated into a hierarchical structure, e.g. convolutional neural networks [CNN]
    • G06V10/806 — Fusion of extracted features at the feature extraction level
    • G06V10/82 — Image or video recognition using neural networks
    • G06T2207/20081 — Training; learning
    • G06T2207/20084 — Artificial neural networks [ANN]
    • G06T2207/30108 — Industrial image inspection


Abstract

The invention discloses a road damage detection method and system in the technical field of image detection. Based on the YOLOX network architecture, a multi-scale channel shuffle fusion pipeline (MSFTube) module is constructed on the multi-scale channel shuffle fusion convolution (MSFConv); an MSFLayer module is constructed on the MSFTube module; and an MSFHead is constructed on the depth-separable channel shuffle convolution (DSFConv). Together these form the MSF-YOLO road damage detection model. Image data of the road to be detected are input into the MSF-YOLO model: the MSFTube module extracts feature layers; the MSFLayer module fuses the extracted feature layers; and the fused feature layers are input into the MSFHead, which extracts the image feature map used to detect road damage. The accuracy of model detection is improved, and accidents caused by road hazards are reduced.

Description

Road damage detection method and system
Technical Field
The invention relates to the technical field of image detection, in particular to a road damage detection method and system.
Background
At present, road damage detection methods mainly fall into traditional digital image processing and machine learning methods. Detecting road damage with conventional digital image processing methods, such as wavelet transform, threshold segmentation, and edge detection, can suffer greatly reduced accuracy because of numerous uncertain environmental factors (e.g., fog and lighting).
To address these problems, researchers have tried machine learning methods, such as a road damage detection framework based on random structured forests, and two feature extraction methods for road damage detection based on region analysis and orthogonal-axis Cartesian feature analysis. In recent years, deep learning methods have drawn attention across industries for their excellent performance, and many researchers have attempted road damage detection with deep-learning-based methods, including entirely new network architectures.
Current deep-learning-based target detection methods fall mainly into two-stage and one-stage methods. Two-stage methods, chiefly R-CNN, Faster R-CNN, and the like, have an advantage in accuracy but may lack the detection speed demanded by road damage detection, a field with strict real-time requirements. Representative one-stage methods include SSD, the YOLO series, RetinaNet, and the like; they perform well in both detection precision and detection speed and can satisfy the real-time requirement of road damage detection tasks.
However, in road damage detection tasks the coverage of a damaged area within the field of view keeps changing with its distance from the moving vehicle, and the gray values of background pixels are often close to those of damaged regions. A single receptive field, as used in existing detection methods, cannot cope well with these situations, so the accuracy of damage detection models drops.
Disclosure of Invention
Aiming at the defect of the prior art that a single receptive field cannot cope well with the continual change, during driving, of the coverage of road hazards in the field of view as the vehicle's distance varies, which reduces the accuracy of damage detection models, the invention provides a road damage detection method and system.
A road damage detection method, comprising:
acquiring image data of a road to be detected;
based on a YOLOX network architecture, constructing a multi-scale channel shuffle fusion pipeline MSFTube module on the multi-scale channel shuffle fusion convolution MSFConv, constructing an MSFLayer module on the MSFTube module, replacing the CSPLayer module with the MSFLayer module (in which the Bottleneck module of the CSPLayer is replaced by the MSFTube module), and constructing an MSFHead with multi-scale extraction capability using the depth-separable channel shuffle convolution DSFConv, thereby constructing an MSF-YOLO road damage detection model;
inputting image data of a road to be detected into the MSF-YOLO road damage detection model, and extracting feature layers of different sizes of receptive fields and multi-channel information interaction through an MSFTube module;
feature fusion is carried out on the feature layer extracted by the MSFTube module through the MSFLayer module, so that a feature layer with multi-scale cross-channel interaction information is obtained;
inputting a feature layer with multi-scale cross-channel interaction information into MSFHead to extract an image feature map;
and obtaining a detection image of the road damage to be detected according to the image feature map.
Further, extracting the feature layers with receptive fields of different sizes and multi-channel information interaction through the MSFTube module comprises two processing routes: one route retains the original feature information; the other obtains, through MSFConv, feature layers whose receptive fields differ in size and whose channel information is fully interacted, and integrates the information of each receptive field through one BaseConv.
Further, the method for obtaining the feature layer with different sizes of receptive fields and fully interacted channel information through MSFConv comprises the following steps:
aggregating channel information and adjusting the number of channels by a 1×1 convolution;
extracting features with a 3×3 convolution;
further extracting features from the result through a 3×3 convolution with dilation rate 2;
and feeding the extracted features into a 3×3 convolution with dilation rate 3 for feature extraction.
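The point of stacking 3×3 convolutions with increasing dilation rates is that the receptive field grows quickly without extra parameters. As an illustrative aside (the helper and layer list below are not part of the claimed method), the effective receptive field of such a stride-1 stack can be computed as follows:

```python
def receptive_field(layers):
    """Effective receptive field of a stack of stride-1 convolutions.

    Each layer is (kernel_size, dilation); a stride-1 layer grows the
    receptive field by (kernel_size - 1) * dilation.
    """
    rf = 1
    for k, d in layers:
        rf += (k - 1) * d
    return rf

# 1x1 aggregation, then 3x3 convolutions with dilation rates 1, 2, 3
msfconv_stack = [(1, 1), (3, 1), (3, 2), (3, 3)]
print(receptive_field(msfconv_stack))  # 13
```

So the deepest branch sees a 13×13 neighborhood while the earlier branches see 3×3 and 7×7, which is the multi-scale spread the step list above builds up.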
Further, the method also comprises adding a residual strategy in the process of extracting the receptive fields of different sizes, assigning four parameter weights $w_1, w_2, w_3, w_4$ to the successively superposed receptive fields, fusing the multiple receptive fields through a concat operation, and letting each of the different receptive fields fully interact its feature information through a channel shuffle operation.
Further, the feature layers of the receptive fields with different sizes and the multi-channel information interaction are extracted through the MSFTube module, and the specific process is as follows:
$$C_1 = f_{1\times1}(X),\quad C_2 = f_{3\times3}^{1}(C_1),\quad C_3 = f_{3\times3}^{2}(C_2)+C_2,\quad C_4 = f_{3\times3}^{3}(C_3)+C_3$$
$$C_{cat} = f_{CS}\big(\mathrm{Concat}(w_1 C_1,\, w_2 C_2,\, w_3 C_3,\, w_4 C_4)\big)$$
$$X_1 = f_{1\times1}(C_{cat}),\quad X_2 = f_{CS}(X_1),\quad X_{out} = \mathrm{Concat}(X,\; w_5 X_2)$$
wherein $X$ represents the input feature layer; $C_1$, $C_2$, $C_3$, $C_4$ represent the feature information of the four different scale sizes in MSFConv; $C_{cat}$ is the new feature layer in which the channel shuffle fusion operation has broken the information gap between the different scales; $f_{1\times1}$ denotes a 1×1 convolution layer and $f_{3\times3}^{d}$ a 3×3 convolution layer with dilation rate $d \in \{1,2,3\}$; $f_{CS}$ represents the channel shuffle operation; $w_1,\dots,w_5$ represent the 5 weight parameters for adaptive channel interaction information; $X_1$ represents the feature layer after the feature information is integrated by the 1×1 convolution layer; $X_2$ represents the new feature layer, with channel interaction information different from before, obtained after $X_1$ undergoes the channel shuffle operation; and $X_{out}$ represents the completely new feature layer with multi-scale cross-channel interaction characteristics.
Further, the feature layer extracted by the MSFTube module is input into the MSFLayer module and split into two routes: one serves as a residual structure that retains the original information; the other extracts features through n MSFTube modules, and the extracted feature layer undergoes a channel shuffle operation to obtain a new feature layer.
Further, performing feature fusion on the feature layer extracted by the MSFTube module through the MSFLayer module comprises adding and fusing the original feature layer with the feature layer extracted by the n MSFTube modules, or adding and fusing the original feature layer with the new feature layer obtained after the extracted feature layer undergoes the channel shuffle operation.
Further, the MSFHead divides the feature layer output by the Neck layer into two routes for feature fusion, so as to complete the classification and regression predictions of the features respectively; the two feature fusion routes comprise:
through DSFConv, the MSFHead adds and fuses the original feature layer with the feature layer after the DWConv operation;
or the feature layer after the DWConv operation undergoes a channel shuffle operation to obtain a new feature layer, and the original feature layer is concatenated and fused with this new feature layer.
Further, the categories of road damage include seven classes: longitudinal cracks, transverse cracks, alligator cracks, potholes, blurred crosswalks, blurred white lines, and manhole covers.
Further, a road damage detection system includes:
the acquisition module is used for acquiring image data of the road to be detected;
the model construction module is used for constructing, based on a YOLOX network architecture, a multi-scale channel shuffle fusion pipeline MSFTube module on the multi-scale channel shuffle fusion convolution MSFConv, constructing an MSFLayer module on the MSFTube module, and constructing, with the depth-separable channel shuffle convolution DSFConv that simultaneously represents the original inter-channel association information and the cross-channel interaction information, an MSFHead with multi-scale extraction capability, thereby constructing an MSF-YOLO road damage detection model;
the first feature extraction module is used for inputting image data of a road to be detected into the MSF-YOLO road damage detection model, and extracting feature layers of different sizes of receptive fields and multi-channel information interaction through the MSFTube module;
the feature fusion module is used for carrying out feature fusion on the feature layer extracted by the MSFTube module through the MSFLayer module to obtain a feature layer with multi-scale cross-channel interaction information;
the second feature extraction module is used for inputting a feature layer with multi-scale cross-channel interaction information into the MSFHead and extracting an image feature map;
and the detection module is used for obtaining a detection image of the road damage to be detected according to the image feature map.
The invention provides a road damage detection method and a road damage detection system, which have the following beneficial effects:
according to the invention, based on a YOLOX network architecture, MSFConv, MSFConv can be designed by using the idea of multi-scale channel interaction, so that the influence of the scale change of a road hazard area on the network can be reduced, and the information gap between features with different scales is reduced; meanwhile, MSFTube is designed based on MSFConv, so that the original bottleneck feature extraction form is changed, rich channel information is protected in a straight-tube feature extraction mode, and the extracted features can perform information interaction with original information in more channel layers; the method is characterized in that an MSFLayer is built on the basis of MSFTube, a mode that the traditional feature is firstly reduced in dimension and then extracted, the inter-channel association information is possibly damaged is abandoned, the feature is extracted in a non-dimension reduction mode, more adjustable inter-channel association information is endowed to a network through channel interaction information self-adaption, a depth separable convolution DSFConv capable of simultaneously representing the original inter-channel association information and inter-channel interaction information is provided, an MSFHead with multi-scale feature extraction capability is designed on the basis of DSFConv, the accuracy of model detection is improved, accidents caused by road hazards are reduced, and driving safety is improved.
Drawings
FIG. 1 is a schematic diagram of a MSF-YOLO algorithm framework in an embodiment of the present invention;
FIG. 2 is a flow chart of the MSFConv module operation in an embodiment of the present invention;
FIG. 3 is a flowchart of the MSFTube module operation in an embodiment of the present invention;
FIG. 4 is a MSFLayer workflow diagram in an embodiment of the present invention;
FIG. 5 is a workflow diagram of DSFConv in an embodiment of the present invention;
FIG. 6 is a flowchart of MSFHead operation in accordance with an embodiment of the present invention;
FIG. 7 is a schematic diagram of the NVIDIA Jetson Orin Nano developer kit device according to an embodiment of the invention;
fig. 8 is a schematic diagram of a process of detecting a road damage detection model according to an embodiment of the present invention.
Detailed Description
The following describes the embodiments of the present invention clearly and completely with reference to the accompanying drawings. The embodiments described are apparently only some, not all, of the embodiments of the present invention.
The invention provides a road damage detection method, which comprises the following steps:
and step 1, acquiring image data of a road to be detected.
Step 2, constructing an MSF-YOLO road damage detection model; the method comprises the following steps:
(1) Designing the network architecture of MSF-YOLO based on the YOLOX network architecture. The backbone feature extraction network used by the YOLOX architecture is CSPDarknet, which by default extracts features with convolutions of a single-scale kernel size (e.g., 3×3); this may not handle well the missed and false detections caused by an insufficient receptive field in road damage detection. To address these problems, MSF-YOLO is proposed; the redesigned network architecture of MSF-YOLO is shown in FIG. 1.
(2) A multi-scale channel shuffling fusion pipeline (MSFTube) is constructed based on multi-scale channel shuffling fusion convolution (MSFConv), and the MSFTube is a pipeline module which keeps channel information from reducing dimension and has a multi-scale channel shuffling fusion function, so that feature extraction is carried out on road hazard features with different coverage areas.
As shown in FIG. 2, MSFConv first aggregates channel information and adjusts the number of channels with a 1×1 convolution, then extracts features with a 3×3 convolution. The features extracted by the previous layer are then further extracted by a 3×3 convolution with dilation rate 2, and then by a 3×3 convolution with dilation rate 3. Notably, a residual strategy is added during the successive superposition of receptive fields, so that the extracted features suffer no fault in perceptual information and each receptive field can better understand the information expressed by the previous layer during the superposition. Finally, four trainable parameter weights $w_1, w_2, w_3, w_4$ are assigned to the successively superposed receptive fields to balance their different proportions; the receptive fields are then fused through a concat operation, and a channel shuffle operation lets each receptive field fully interact its feature information with the others, solving the problem that channel information from different receptive fields would otherwise interact only in adjacent regions.
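The channel shuffle operation used throughout is the ShuffleNet-style regrouping: reshape the channel axis to (groups, channels_per_group), transpose, and flatten, so that channels originating from different receptive-field branches become neighbors. A minimal sketch on a flat list of channel indices (the group count of 4 is an assumption for illustration, not a value from the patent):

```python
def channel_shuffle(channels, groups):
    """ShuffleNet-style channel shuffle on a flat channel list:
    reshape to (groups, per_group), transpose, flatten."""
    n = len(channels)
    assert n % groups == 0, "channel count must divide evenly into groups"
    per = n // groups
    # output position j takes input channel (j % groups) * per + j // groups
    return [channels[(j % groups) * per + j // groups] for j in range(n)]

# 8 channels in 4 groups of 2: neighbours end up interleaved across groups
print(channel_shuffle(list(range(8)), groups=4))  # [0, 2, 4, 6, 1, 3, 5, 7]
```

The operation is a pure permutation, so no channel information is lost; only the grouping is broken up.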
As FIG. 3 shows, the MSFTube module designed by the invention does not adopt the conventional Bottleneck feature extraction strategy for reducing the number of parameters; instead, it adopts a feature extraction strategy that loses no channel information. The input features are split into two routes: one keeps the original feature information, while the other first obtains, through MSFConv, feature layers whose receptive fields differ and whose channel information is fully interacted, and then fully integrates the information of each receptive field through one BaseConv, so that the information between channels interacts fully.
The above process can be written as
$$C_1 = f_{1\times1}(X),\quad C_2 = f_{3\times3}^{1}(C_1),\quad C_3 = f_{3\times3}^{2}(C_2)+C_2,\quad C_4 = f_{3\times3}^{3}(C_3)+C_3$$
$$C_{cat} = f_{CS}\big(\mathrm{Concat}(w_1 C_1,\, w_2 C_2,\, w_3 C_3,\, w_4 C_4)\big)$$
$$X_1 = f_{1\times1}(C_{cat}),\quad X_2 = f_{CS}(X_1),\quad X_{out} = \mathrm{Concat}(X,\; w_5 X_2)$$
where $X$ represents the input feature layer; $C_1$, $C_2$, $C_3$, $C_4$ represent the feature information of the four different scale sizes in MSFConv; $C_{cat}$ is the new feature layer in which the channel shuffle fusion operation has broken the information gap between the different scales; $f_{1\times1}$ denotes a 1×1 convolution layer and $f_{3\times3}^{d}$ a 3×3 convolution layer with dilation rate $d \in \{1,2,3\}$; $f_{CS}$ represents the channel shuffle operation; $w_1,\dots,w_5$ represent the 5 trainable weight parameters for the adaptive channel interaction information; $X_1$ represents the feature layer after the feature information is integrated by the 1×1 convolution layer; $X_2$ represents the new feature layer, with channel interaction information different from before, obtained after $X_1$ undergoes the channel shuffle operation; and $X_{out}$ represents the completely new feature layer with multi-scale cross-channel interaction characteristics.
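A shape-level sketch of the weighted concat-and-shuffle fusion described above (assuming NumPy; the convolutions themselves are elided, and the branch outputs, weights, and group count are hypothetical):

```python
import numpy as np

def channel_shuffle(x, groups):
    """Shuffle the channel axis of a (C, H, W) array."""
    c = x.shape[0]
    return x.reshape(groups, c // groups, *x.shape[1:]).swapaxes(0, 1).reshape(x.shape)

def msfconv_fuse(scale_feats, weights, groups=4):
    """Weight each receptive-field branch, concat along channels, then shuffle
    so every branch's channel information mixes with the others."""
    weighted = [w * f for w, f in zip(weights, scale_feats)]
    return channel_shuffle(np.concatenate(weighted, axis=0), groups)

# four hypothetical branch outputs, 2 channels each, on a 4x4 map
feats = [np.full((2, 4, 4), float(i)) for i in range(4)]
fused = msfconv_fuse(feats, weights=[1.0, 1.0, 1.0, 1.0])
print(fused.shape)     # (8, 4, 4)
print(fused[:, 0, 0])  # channels interleaved across the four branches
```

After the shuffle, adjacent channels of the fused layer come from different receptive-field branches, which is what lets the following BaseConv integrate all scales at once.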
(3) Constructing the MSFLayer module based on the MSFTube module, and replacing the CSPLayer module (whose Bottleneck module is replaced by the MSFTube module) with the MSFLayer module. So that the network acquires as much feature information as possible during its continual iteration, MSFLayer is designed to acquire more inter-channel interaction information during feature fusion, as shown in FIG. 4. MSFLayer continues to optimize the features extracted by the n MSFTube modules. The input features are first divided into two routes: one serves as a residual structure keeping the original information, while the other extracts features through n MSFTube modules, and the extracted feature layer $X_1$ undergoes a channel shuffle operation to yield a new feature layer $X_2$. The flow then divides into a main route and a secondary route: the main route adds and fuses the original feature layer $X$ with the feature layer $X_1$; the secondary route adds and fuses, across channels, the original feature layer $X$ with the new feature layer $X_2$ obtained from $X_1$ by the channel shuffle operation.
$$X_1 = F_{MSFTube}^{n}(X),\quad X_2 = f_{CS}(X_1),\quad X_{out} = \mathrm{Concat}\big(X + X_1,\; w_1 (X + X_2)\big)$$
where $F_{MSFTube}^{n}(X)$ represents n MSFTube feature extraction operations on the input feature $X$; $f_{CS}$ represents the channel shuffle operation; $w_1$ represents a trainable weight parameter for the adaptive channel interaction information; $X_1$ represents the new feature layer obtained from the input feature $X$ after n MSFTube feature extraction operations; $X_2$ represents the new feature layer, with channel interaction information different from before, obtained from $X_1$ by the channel shuffle operation; and $X_{out}$ represents the completely new feature layer with multi-scale cross-channel interaction information.
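The dual-route fusion of MSFLayer — a residual main route plus a weighted, channel-shuffled secondary route, concatenated along the channel axis — can be sketched as follows (assuming NumPy; the group count of 4 and the toy tensors are illustrative, not from the patent):

```python
import numpy as np

def channel_shuffle(x, groups=4):
    """Shuffle the channel axis of a (C, H, W) array."""
    c = x.shape[0]
    return x.reshape(groups, c // groups, *x.shape[1:]).swapaxes(0, 1).reshape(x.shape)

def msflayer_fuse(x, x1, w1=1.0):
    """x: original (C, H, W) feature layer; x1: output of the n MSFTube modules.
    Main route keeps a residual (x + x1); the secondary route adds the shuffled
    variant x2, scaled by the trainable weight w1; the routes are concatenated."""
    x2 = channel_shuffle(x1)
    return np.concatenate([x + x1, w1 * (x + x2)], axis=0)

x = np.zeros((8, 4, 4))
x1 = np.ones((8, 4, 4))
out = msflayer_fuse(x, x1, w1=0.5)
print(out.shape)  # (16, 4, 4): channel count doubles across the two routes
```

The concatenation doubles the channel count, so in a full model a following convolution would bring the channels back to the layer's output width.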
(4) Constructing the MSFHead with multi-scale extraction capability using the depth-separable channel shuffle convolution DSFConv. The conventional YOLO-series detection head generally unifies channel dimensions with a 1×1 convolution, but unifying the channel number inevitably increases computation and damages channel information. The depth-separable channel shuffle convolution (DSFConv) and MSFHead are therefore proposed to address this problem, as shown in FIGS. 5 and 6. DSFConv protects the inter-channel association information lost through the DWConv operation: the original feature layer, rich in inter-channel association information, is added to and fused with the feature layer $X_1$ after the DWConv operation, effectively protecting the rich inter-channel association information while greatly reducing the parameter quantity and computation. Meanwhile, the feature layer $X_1$ after the DWConv operation undergoes a channel shuffle operation to yield a new feature layer $X_2$. Processing follows a main route and a secondary route: the main route adds and fuses the original feature layer $X$ with $X_1$, protecting the original inter-channel association information; the secondary route adds and fuses $X$ with the channel-shuffled feature layer $X_2$, letting the network learn more feature information.
$$X_1 = f_{DW}(X),\quad X_2 = f_{CS}(X_1),\quad X_{out} = \mathrm{Concat}\big(X + X_1,\; w_1 (X + X_2)\big)$$
where $f_{DW}$ represents the depthwise convolution operation; $f_{CS}$ represents the channel shuffle operation; $w_1$ represents a trainable weight parameter that adaptively adjusts the cross-channel interaction information; $X_1$ represents the new feature layer obtained from the input feature $X$ by the depthwise convolution; $X_2$ represents the new feature layer, with channel interaction information different from before, obtained from $X_1$ by the channel shuffle operation; and $X_{out}$ represents the completely new feature layer with multi-scale cross-channel interaction information. $X_{Cls}$, $X_{Reg}$, $X_{IoU}$ represent the prediction results for classification, bounding-box regression, and confidence, respectively.
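The parameter saving attributed to DWConv here is the standard depthwise-separable accounting: a dense k×k convolution mixes all input and output channels at once, while the depthwise-plus-pointwise split factors that cost. A quick check with illustrative channel widths (the 256 is an assumption, not a figure from the patent):

```python
def standard_conv_params(c_in, c_out, k):
    """Weight count of a dense k x k convolution (bias ignored)."""
    return k * k * c_in * c_out

def depthwise_separable_params(c_in, c_out, k):
    """One k x k depthwise filter per input channel, then a 1 x 1 pointwise mix."""
    return k * k * c_in + c_in * c_out

c_in = c_out = 256  # hypothetical head width
print(standard_conv_params(c_in, c_out, 3))        # 589824
print(depthwise_separable_params(c_in, c_out, 3))  # 67840
```

At this width the split gives roughly 8.7× fewer weights for the single layer, which is why the head can afford the extra fusion routes on top of DWConv.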
And 3, inputting image data of the road to be detected into the MSF-YOLO road damage detection model, and extracting feature layers of different sizes of receptive fields and multi-channel information interaction through an MSFTube module.
And 4, carrying out feature fusion on the feature layer extracted by the MSFTube module through the MSFLayer module to obtain a feature layer with multi-scale cross-channel interaction information.
And 5, inputting the feature layer with multi-scale cross-channel interaction information into MSFHead to extract an image feature map.
And 6, obtaining a detection image of the road damage to be detected according to the image feature map.
Experiments show that with the parameter quantity and computation of MSF-YOLO-s reduced by 8.7% and 20.9% respectively compared with YOLOX-s, accuracy on the GRDDC dataset improves by 5.8 percentage points, reaching 65.9%. Its detection performance exceeds other mainstream algorithms.
MSFConv is designed with the idea of multi-scale channel interaction, reducing the influence of scale changes of the road hazard area on the network; its channel shuffle fusion strategy narrows the information gap between features of different scales. MSFTube, designed on MSFConv, changes the original Bottleneck feature extraction form: a straight-tube feature extraction pattern protects the rich channel information, and through the channel shuffle fusion strategy the extracted features can interact with the original information at more channel levels. MSFLayer is built on MSFTube, abandoning the traditional pattern of first reducing dimensionality and then extracting features, which may damage inter-channel association information; features are extracted without dimensionality reduction, and adaptive channel interaction information gives the network more adjustable inter-channel association information. A depth-separable convolution DSFConv that simultaneously represents the original inter-channel association information and the cross-channel interaction information is proposed, and the MSFHead with multi-scale feature extraction capability is designed on DSFConv; the MSFHead workflow is shown in FIG. 6.
YOLOX improves markedly on the traditional YOLO series: it innovatively replaces the original anchor-based mechanism with an anchor-free mechanism and uses the simplified SimOTA dynamic positive-sample matching method, improving the overall detection efficiency of the model. The overall architecture of YOLOX can be roughly divided into a Backbone feature extraction layer, a feature fusion layer (Neck), and a prediction layer. After the input picture is resized to 640×640 pixels, it is first fed into the Backbone for feature extraction, and the last three Backbone layers (dark3: 80×80×256, dark4: 40×40×512, dark5: 20×20×1024) are fed into the Neck, where shallow and deep features are fused before prediction. The feature fusion proceeds gradually from deep to shallow layers, then gradually from shallow to deep layers, and the fused features are passed into the prediction layer. The prediction layer decouples the regression and classification tasks, computes them separately, integrates the results, and outputs a prediction through the related calculation.
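As an illustration of the decoupled prediction layer just described, a minimal PyTorch sketch (layer widths and activation choices are assumptions, not the patent's exact head): classification and regression are computed by separate branches and merged only at the output.

```python
import torch
import torch.nn as nn

class DecoupledHead(nn.Module):
    """Minimal sketch of a YOLOX-style decoupled prediction head: the
    classification and regression tasks run in separate branches and are
    concatenated into one fused map per feature scale at the end."""
    def __init__(self, in_ch, num_classes, width=128):
        super().__init__()
        self.stem = nn.Conv2d(in_ch, width, 1)           # unify the channel number
        self.cls_branch = nn.Sequential(
            nn.Conv2d(width, width, 3, padding=1), nn.SiLU(),
            nn.Conv2d(width, num_classes, 1))            # per-cell class scores
        self.reg_branch = nn.Sequential(
            nn.Conv2d(width, width, 3, padding=1), nn.SiLU(),
            nn.Conv2d(width, 4 + 1, 1))                  # box offsets (4) + objectness (1)

    def forward(self, x):
        x = self.stem(x)
        # integrate both task outputs into a single prediction map
        return torch.cat([self.reg_branch(x), self.cls_branch(x)], dim=1)
```

Applied to the dark3 output (80×80×256) with the 7 road-damage classes used later in the experiments, each cell of the 80×80 map carries 4 + 1 + 7 = 12 prediction channels.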
In a road damage detection task, the coverage of the road damage area in the field of view changes with its distance from the moving vehicle, and a single receptive field in a conventional network clearly cannot cope well with this scale variation or the gray-value changes of the damaged region it causes. Adopting a multi-scale design improves the model's robustness to such situations, while adopting a cross-channel interaction strategy enriches the original inter-channel interaction information, so the network can learn more useful information from the existing feature information, improving the detection precision of the model.
Experiments prove the above: all experiments and tests of the present invention used the environmental parameters and training strategies shown in Table 1. The GRDDC2020 dataset employed contains 21041 road damage images from three countries, the Czech Republic, India, and Japan, with 2829, 7706, and 10506 samples respectively. 10000 road damage images were randomly selected for the experiments, covering 7 categories: D00 (longitudinal crack), D10 (transverse crack), D20 (alligator crack), D40 (pothole), D43 (crosswalk blur), D44 (white line blur), and D50 (manhole cover).
TABLE 1 environmental parameters and training strategies
The experimental results are shown in Table 2. When MSFTube is used, mAP increases by 2.0% while the parameter count is reduced by 18.2% and the computation by 16.7%. When MSFLayer is adopted, although the parameter count increases considerably, it brings a direct 4.7% improvement in precision. When MSFHead is adopted, mAP improves by 2.6%, the parameter count is reduced by 12.9%, the computation by 33.0%, and the detection speed is 2.0 ms. When multiple components are combined, detection performance improves further as the components complement each other; in particular, when all components are used, mAP reaches 65.9% while the parameter count is reduced by 8.7% and the computation by 21.2%. The results show that the detection performance of the MSF-YOLO-s algorithm is remarkably improved over that of YOLOX-s.
Table 2 experimental results
To demonstrate the advantages of the new algorithm proposed by the present invention, it was compared with other mainstream algorithms, including SSD, FCOS, CenterNet, YOLOv3, YOLOv4, YOLOv5, YOLOX, YOLOv7, and YOLOv8, on the GRDDC2020 road damage dataset. The comparative test results are shown in Table 3, where Size denotes the image input size, Latency the network detection speed, and Parameters the network parameter count.
Table 3 results of comparative tests
As shown in Table 3, although the YOLOv5 series is advantageous in detection speed, it lags in detection accuracy: the YOLOv5-l version, with the largest parameter count, reaches only 44.3% mAP. The YOLOX series performs better among the compared algorithms; the YOLOX-s version achieves a notable 60.1% mAP at a low parameter count, and the YOLOX-m version reaches a similar mAP with a parameter count significantly lower than the FCOS and CenterNet algorithms. The YOLOv7 series performs poorly: YOLOv7-tiny achieves a 5.8 ms advantage in detection speed but only 37.5% mAP. The YOLOv8 series performs better overall, with YOLOv8-l reaching 64.8% mAP, the best result apart from the MSF-YOLO algorithms. The MSF-YOLO series achieves a significant detection-performance advantage over all compared mainstream algorithms, with MSF-YOLO-l reaching the highest mAP of the whole set of experiments, 67.1%. MSF-YOLO-nano achieves a 1.5% mAP improvement at a low parameter count of 1.55M, less than half that of YOLOv8-n, and the tiny and s versions of MSF-YOLO achieve significant mAP advantages with parameter counts markedly lower than the s, m, and l versions of YOLOv8; in particular, MSF-YOLO-s reaches a striking 65.9% mAP with 8.16M parameters, exceeding the 64.8% achieved by YOLOv8-l. This demonstrates that the MSF-YOLO series is significantly advantageous over other algorithms in road damage detection tasks.
To verify that the method works well in practice, the invention sets up a crack detection scheme for full-vehicle deployment. The edge computing device adopted is an NVIDIA Jetson Orin Nano developer kit, and the crack detection model is the MSF-YOLO-s model. The device is fixed on a vehicle and moves with it along urban roads, acquiring real-time information about crack features and predicting crack positions. FIG. 7 illustrates the deployment location of the edge computing device on the vehicle, presented in three orientations.
As seen in fig. 8, the deployment flow is as follows: image data is acquired through a camera connected to the Jetson Orin Nano developer kit (road crack picture capture module); the acquired image data, which may contain crack features, is then passed to the crack detection model on the device for detection (road crack picture recognition module); finally, a result image with the crack predictions is displayed on a screen (road crack result display module), reminding the driver that cracks may lie ahead and that careful driving is needed.
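The capture → recognize → display flow above can be sketched as follows. The detection call is a stub standing in for the MSF-YOLO-s model, and the camera and screen ends are replaced by synthetic frames and a returned list; all function names and the prediction tuple layout here are hypothetical, not the patent's API.

```python
import numpy as np

def detect_cracks(frame):
    """Stand-in for the crack detection model (hypothetical API): returns a list
    of (x1, y1, x2, y2, class_id, score) crack predictions for one BGR frame."""
    return [(10, 20, 50, 60, 0, 0.90)]

def run_pipeline(frames, conf_thresh=0.5):
    """Capture -> recognize -> display flow of the on-vehicle scheme. In the
    real deployment, frames come from the camera attached to the Jetson Orin
    Nano and kept detections are overlaid on screen to warn the driver; both
    ends are stubbed here with synthetic data."""
    kept_per_frame = []
    for frame in frames:
        detections = detect_cracks(frame)                      # recognition module
        kept = [d for d in detections if d[5] >= conf_thresh]  # confidence filter
        kept_per_frame.append(kept)                            # display module would draw these
    return kept_per_frame

# three synthetic 640x640 "road" frames stand in for the camera feed
frames = [np.full((640, 640, 3), 128, dtype=np.uint8) for _ in range(3)]
results = run_pipeline(frames)
```

In a real deployment the frame source would be an OpenCV capture loop and the output a drawn-on frame; the per-frame structure of the loop stays the same.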
The invention provides a novel MSF-YOLO series network for road damage detection. Aiming at the problem that the road damage area changes with the distance between the area and the vehicle, MSF-YOLO provides a multi-scale receptive field to reduce the influence of these scale changes and narrows the information gap between different scales through channel interaction. The road damage experiments show that: (1) with the parameter count and computation of MSF-YOLO-s reduced by 8.7% and 20.9% respectively compared with YOLOX-s, the mAP on the GRDDC dataset improves by 5.8 percentage points to 65.9%, and the detection performance exceeds other mainstream algorithms; (2) a real-vehicle deployment scheme for crack detection is provided to realize road damage detection while the vehicle drives on the road, reducing accidents caused by road damage and improving driving safety.
Based on the same inventive concept, the invention also provides a road damage detection system, comprising:
and the acquisition module is used for acquiring the image data of the road to be detected.
The model construction module is used for, based on a YOLOX network architecture, constructing a multi-scale channel-shuffle fusion pipeline (MSFTube) module based on the multi-scale channel-shuffle fusion convolution MSFConv, constructing an MSFLayer module based on the MSFTube module, proposing a depthwise-separable channel-shuffle convolution DSFConv that simultaneously represents the original inter-channel association information and the cross-channel interaction information, and constructing an MSFHead with multi-scale extraction capability; and then constructing the MSF-YOLO road damage detection model.
The first feature extraction module is used for inputting the image data of the road to be detected into the MSF-YOLO road damage detection model and extracting, through the MSFTube module, feature layers with receptive fields of different sizes and multi-channel information interaction.
And the feature fusion module is used for carrying out feature fusion on the feature layer extracted by the MSFTube module through the MSFLayer module to obtain a feature layer with multi-scale cross-channel interaction information.
And the second feature extraction module is used for inputting the feature layer with multi-scale cross-channel interaction information into the MSFHead and extracting an image feature map.
And the detection module is used for obtaining a detection image of the road damage to be detected according to the image feature map.
The foregoing is only a preferred embodiment of the present invention, but the scope of the present invention is not limited thereto. Any equivalent substitution or modification made by a person skilled in the art according to the technical scheme of the present invention and its inventive concept, within the scope disclosed by the present invention, shall be covered by the scope of the present invention.

Claims (10)

1. A method for detecting road damage, comprising:
acquiring image data of a road to be detected;
based on a YOLOX network architecture, constructing a multi-scale channel-shuffle fusion pipeline MSFTube module based on a multi-scale channel-shuffle fusion convolution MSFConv, constructing an MSFLayer module based on the MSFTube module, replacing the CSPLayer module with the MSFLayer module, replacing the Bottleneck module in the CSPLayer module with the MSFTube module, and constructing an MSFHead with multi-scale extraction capability using a depthwise-separable channel-shuffle convolution DSFConv; further constructing an MSF-YOLO road damage detection model; wherein the MSFConv extracts road damage features with multi-scale convolution, uses dilated convolution for the extraction, and performs channel shuffling on the extracted feature layers; the MSFTube performs MSFConv feature extraction without dimension reduction of the channel information and residually connects the extracted feature layer with the original input feature layer; the MSFLayer comprises a plurality of MSFTubes and residually connects the feature layer obtained after feature extraction by the plurality of MSFTubes with the original input feature layer; the DSFConv protects the inter-channel association information lost in the depthwise convolution operation by residually connecting, adding, and fusing the original feature layer, rich in inter-channel association information, with the feature layer after the depthwise convolution operation; the MSFHead reduces computation when unifying the channel number by avoiding dimension reduction, protecting the rich channel information; and the MSF-YOLO is used for obtaining a detection image of the road damage to be detected;
inputting the image data of the road to be detected into the MSF-YOLO road damage detection model, and extracting, through the MSFTube module, feature layers with receptive fields of different sizes and multi-channel information interaction;
performing feature fusion on the feature layer extracted by the MSFTube module through the MSFLayer module to obtain a feature layer with multi-scale cross-channel interaction information;
inputting the feature layer with multi-scale cross-channel interaction information into the MSFHead to extract an image feature map;
and obtaining a detection image of the road damage to be detected according to the image feature map.
2. The method of claim 1, wherein extracting the feature layers with receptive fields of different sizes and multi-channel information interaction through the MSFTube module comprises two processing routes: either keeping the original feature information of the feature layer input to the MSFTube module, or obtaining feature layers with receptive fields of different sizes and fully interacted channel information through MSFConv and integrating the different-receptive-field information obtained by MSFConv through one BaseConv.
3. The method for detecting road damage according to claim 2, wherein obtaining the feature layers with receptive fields of different sizes and fully interacted channel information through MSFConv comprises the following steps:
after aggregating the channel information and adjusting the channel number through a 1×1 convolution, extracting features using a 3×3 convolution;
extracting features again from the extracted features through a 3×3 convolution with dilation rate 2;
and feeding the re-extracted features into a 3×3 convolution with dilation rate 3 for feature extraction.
4. The method for detecting road damage according to claim 3, further comprising adding a residual strategy in the process of extracting the receptive fields of different sizes, assigning four parameter weights w1, w2, w3, w4 to the successively stacked multiple receptive fields, fusing the multiple receptive fields through a concat function, and fully interacting the feature information of each different receptive field through a channel shuffle operation.
5. The method for detecting road damage according to claim 1, wherein the feature layers with receptive fields of different sizes and multi-channel information interaction are extracted through the MSFTube module by the following process:
X_w1 = f(X_in)
X_w2 = f_d1(X_w1)
X_w3 = f_d2(X_w2)
X_w4 = f_d3(X_w3)
X_out = f_CS(f_concat(w1·X_w1, w2·X_w2, w3·X_w3, w4·X_w4))
X_1 = f(X_out)
X_2 = f_CS(X_1)
X_output = (X_1 + X_in) + w5·(X_2 + X_in)
wherein X_in represents the input feature layer; X_w1, X_w2, X_w3, X_w4 represent the feature information of the four different scales in MSFConv; X_out is the new feature layer, obtained after the channel-shuffle fusion operation on the feature information of different scales, that breaks the information gap between the scales; f denotes a 1×1 convolution layer, and f_d1, f_d2, f_d3 denote 3×3 dilated convolution layers with dilation rates {1, 2, 3}; f_CS represents the channel shuffle operation; w1, w2, w3, w4, w5 represent 5 weight parameters for adaptive channel-interaction information; X_1 represents the feature layer after integrating the feature information through the 1×1 convolution layer; X_2 represents the new feature layer, with channel-interaction information different from before, obtained after the channel shuffle operation on X_1; and X_output represents the completely new feature layer with multi-scale cross-channel interaction characteristics.
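The equation sequence of this claim can be transcribed directly into PyTorch. This is a sketch under the assumption that each scale branch carries one quarter of the channels; the patent fixes only the operator sequence, not these widths, and the module name is illustrative.

```python
import torch
import torch.nn as nn

def channel_shuffle(x, groups=4):
    """f_CS: interleave channels across groups."""
    b, c, h, w = x.shape
    x = x.view(b, groups, c // groups, h, w).transpose(1, 2).contiguous()
    return x.view(b, c, h, w)

class MSFTube(nn.Module):
    """Straight-cylinder (no dimension reduction) extraction tube following the
    claimed equations: channel count is preserved from input to output."""
    def __init__(self, ch):
        super().__init__()
        mid = ch // 4
        self.f_in = nn.Conv2d(ch, mid, 1)               # f applied to X_in
        self.fd1 = nn.Conv2d(mid, mid, 3, padding=1, dilation=1)
        self.fd2 = nn.Conv2d(mid, mid, 3, padding=2, dilation=2)
        self.fd3 = nn.Conv2d(mid, mid, 3, padding=3, dilation=3)
        self.f_out = nn.Conv2d(ch, ch, 1)               # f applied to X_out
        self.w = nn.Parameter(torch.ones(5))            # w1..w5

    def forward(self, x):
        xw1 = self.f_in(x)                              # X_w1 = f(X_in)
        xw2 = self.fd1(xw1)                             # X_w2 = f_d1(X_w1)
        xw3 = self.fd2(xw2)                             # X_w3 = f_d2(X_w2)
        xw4 = self.fd3(xw3)                             # X_w4 = f_d3(X_w3)
        x_out = channel_shuffle(torch.cat(              # X_out = f_CS(f_concat(...))
            [self.w[0] * xw1, self.w[1] * xw2,
             self.w[2] * xw3, self.w[3] * xw4], dim=1))
        x1 = self.f_out(x_out)                          # X_1 = f(X_out)
        x2 = channel_shuffle(x1)                        # X_2 = f_CS(X_1)
        return (x1 + x) + self.w[4] * (x2 + x)          # X_output
```

Each code line maps one-to-one onto an equation of the claim, which makes it easy to check that the residual connections land on the original input feature layer.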
6. The method for detecting road damage according to claim 1, wherein the feature layer extracted by the MSFTube module is input into the MSFLayer module by two routes: either the original features are maintained for a residual structure, or features are extracted by n MSFTube modules and the extracted feature layer is subjected to a channel shuffle operation to obtain a new feature layer.
7. The method for detecting road damage according to claim 6, wherein performing feature fusion on the feature layers extracted by the MSFTube modules through the MSFLayer module comprises adding and fusing the original features with the feature layer extracted by the n MSFTube modules, or adding and fusing the original features with the new feature layer obtained after the channel shuffle operation on the feature layer extracted by the n MSFTube modules.
8. The method for detecting road damage according to claim 1, wherein the MSFHead performs feature fusion on the feature layers output by the Neck layer through two routes so as to complete the classification prediction of the features respectively; the two feature fusion routes comprise:
the MSFHead adds and fuses, through DSFConv, the original features and the feature layer subjected to the DWConv operation;
or a new feature layer is obtained from the feature layer subjected to the DWConv operation through a channel shuffle operation, and the original features are added and fused with the new feature layer.
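A hedged sketch of the two DSFConv fusion routes just described. Only the DWConv → optional channel shuffle → residual add-fusion structure is taken from the text; the pointwise convolution (implied by "depthwise-separable") and the route-selection flag are illustrative assumptions.

```python
import torch
import torch.nn as nn

def channel_shuffle(x, groups=4):
    b, c, h, w = x.shape
    x = x.view(b, groups, c // groups, h, w).transpose(1, 2).contiguous()
    return x.view(b, c, h, w)

class DSFConv(nn.Module):
    """Sketch of the depthwise-separable channel-shuffle convolution. The
    depthwise (per-channel) convolution cannot capture inter-channel association
    information, so the original feature layer is add-fused back in; the optional
    channel shuffle realizes the second fusion route of the claim."""
    def __init__(self, ch, use_shuffle=False):
        super().__init__()
        self.dw = nn.Conv2d(ch, ch, 3, padding=1, groups=ch)  # DWConv: one filter per channel
        self.pw = nn.Conv2d(ch, ch, 1)                        # pointwise channel mixing
        self.use_shuffle = use_shuffle

    def forward(self, x):
        y = self.pw(self.dw(x))
        if self.use_shuffle:
            y = channel_shuffle(y)  # route 2: shuffle the DWConv-processed features first
        return x + y                # add-fuse with the association-rich original features
```

The residual add is what "protects" the inter-channel association information that a plain depthwise convolution would lose.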
9. The method according to claim 1, wherein the categories of road damage include seven categories: longitudinal cracks, transverse cracks, alligator cracks, potholes, crosswalk blur, white line blur, and manhole covers.
10. A road damage detection system, comprising:
the acquisition module is used for acquiring image data of the road to be detected;
the model construction module is used for, based on a YOLOX network architecture, constructing a multi-scale channel-shuffle fusion pipeline MSFTube module based on the multi-scale channel-shuffle fusion convolution MSFConv, constructing an MSFLayer module based on the MSFTube module, proposing a depthwise-separable channel-shuffle convolution DSFConv that simultaneously represents the original inter-channel association information and the cross-channel interaction information, and constructing an MSFHead with multi-scale extraction capability; further constructing an MSF-YOLO road damage detection model; wherein the MSFConv extracts road damage features with multi-scale convolution, uses dilated convolution for the extraction, and performs channel shuffling on the extracted feature layers; the MSFTube performs MSFConv feature extraction without dimension reduction of the channel information and residually connects the extracted feature layer with the original input feature layer; the MSFLayer comprises a plurality of MSFTubes and residually connects the feature layer obtained after feature extraction by the plurality of MSFTubes with the original input feature layer; the DSFConv protects the inter-channel association information lost in the depthwise convolution operation by residually connecting, adding, and fusing the original feature layer, rich in inter-channel association information, with the feature layer after the depthwise convolution operation; the MSFHead reduces computation when unifying the channel number by avoiding dimension reduction, protecting the rich channel information; and the MSF-YOLO is used for obtaining a detection image of the road damage to be detected;
the first feature extraction module is used for inputting the image data of the road to be detected into the MSF-YOLO road damage detection model and extracting, through the MSFTube module, feature layers with receptive fields of different sizes and multi-channel information interaction;
the feature fusion module is used for carrying out feature fusion on the feature layer extracted by the MSFTube module through the MSFLayer module to obtain a feature layer with multi-scale cross-channel interaction information;
the second feature extraction module is used for inputting a feature layer with multi-scale cross-channel interaction information into the MSFHead and extracting an image feature map;
and the detection module is used for obtaining a detection image of the road damage to be detected according to the image feature map.
CN202410003960.7A 2024-01-03 2024-01-03 Road damage detection method and system Active CN117541881B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202410003960.7A CN117541881B (en) 2024-01-03 2024-01-03 Road damage detection method and system


Publications (2)

Publication Number Publication Date
CN117541881A CN117541881A (en) 2024-02-09
CN117541881B (en) 2024-04-16

Family

ID=89788360

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202410003960.7A Active CN117541881B (en) 2024-01-03 2024-01-03 Road damage detection method and system

Country Status (1)

Country Link
CN (1) CN117541881B (en)

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115223017A (en) * 2022-05-31 2022-10-21 昆明理工大学 Multi-scale feature fusion bridge detection method based on depth separable convolution
CN115760770A (en) * 2022-11-18 2023-03-07 武汉轻工大学 Knee bone joint image intelligent detection method and device, electronic equipment and readable medium
CN116310785A (en) * 2022-12-23 2023-06-23 兰州交通大学 Unmanned aerial vehicle image pavement disease detection method based on YOLO v4
CN117132889A (en) * 2023-08-19 2023-11-28 南昌大学 Multi-scale pavement crack detection method based on deep and shallow attention feature fusion
CN117315441A (en) * 2023-09-28 2023-12-29 广东工业大学 Road well lid state detection method based on improved YOLOX


Non-Patent Citations (4)

* Cited by examiner, † Cited by third party
Title
MOD-YOLO: Rethinking the YOLO architecture at the level of feature information and applying it to crack detection; Peng Su et al.; Expert Systems With Applications; 2023-09-01; pp. 1-16 *
MSCFNet: A Lightweight Network With Multi-Scale Context Fusion for Real-Time Semantic Segmentation; Guangwei Gao et al.; IEEE Transactions on Intelligent Transportation Systems; 2022; Vol. 23, No. 12; pp. 25489-25499 *
Pavement crack recognition method based on a deep convolutional neural network fusion model; Sun Chaoyun et al.; Journal of Chang'an University (Natural Science Edition); 2020-07-15; No. 04 *
Lightweight recognition algorithm for OCT images of fundus lesions; Hou Xiaohu et al.; Journal of Zhejiang University (Engineering Science); 2023; Vol. 57, No. 12; pp. 2448-2455, 2466 *



Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant