CN115439750A - Road disease detection method and device, electronic equipment and storage medium


Info

Publication number
CN115439750A
Authority
CN
China
Prior art keywords
feature map
target
module
target feature
map
Prior art date
Legal status
Pending
Application number
CN202211158865.1A
Other languages
Chinese (zh)
Inventor
许明
张弓
郑睿博
骆庚
郑文青
王广涛
熊茜楠
孙权泽
徐红
党韫垚
纪超
Current Assignee
Arsc Underground Space Technology Development Co ltd
Original Assignee
Arsc Underground Space Technology Development Co ltd
Priority date
Filing date
Publication date
Application filed by Arsc Underground Space Technology Development Co ltd
Priority to CN202211158865.1A
Publication of CN115439750A

Classifications

    • G06V20/182: Scenes; terrestrial scenes; network patterns, e.g. roads or rivers
    • G06N3/04: Computing arrangements based on biological models; neural networks; architecture, e.g. interconnection topology
    • G06N3/08: Computing arrangements based on biological models; neural networks; learning methods
    • G06V10/40: Arrangements for image or video recognition or understanding; extraction of image or video features
    • G06V10/774: Generating sets of training patterns; bootstrap methods, e.g. bagging or boosting
    • G06V10/806: Fusion, i.e. combining data from various sources at the sensor, preprocessing, feature extraction or classification level, of extracted features
    • G06V10/82: Arrangements for image or video recognition or understanding using neural networks
    • G06V2201/07: Indexing scheme relating to image or video recognition or understanding; target detection

Abstract

The application provides a road disease detection method and device, an electronic device and a storage medium. A group of small-target prediction heads is added to the original YOLOv5 neural network structure, so that the target detection model can extract a first target feature map, a second target feature map, a third target feature map and a fourth target feature map from the image to be detected. Because the four target feature maps represent features of different granularities, the features of small targets can be detected from the target feature maps obtained by the target detection model, the influence of severe changes in target scale is relieved, the detection performance on small target objects is improved, and the extracted target features are more comprehensive. As a result, the prediction result of the image to be detected, obtained from the four target feature maps, is more accurate.

Description

Road disease detection method and device, electronic equipment and storage medium
Technical Field
The application relates to the technical field of intelligent traffic, in particular to a road disease detection method and device, electronic equipment and a storage medium.
Background
In recent years, road collapse accidents have occurred frequently owing to factors such as damage to and leakage from urban underground pipe networks, affecting public production and daily life. Road maintenance safeguards the carrying capacity and transport efficiency of the road network, reduces the probability of traffic accidents, and prolongs the service life of the road surface. Before maintenance, the road must be inspected for diseases so that problems such as cavities, loosening and voids are found in time.
In the prior art, the road disease condition can be detected based on a neural network model.
However, existing neural network models cannot detect tiny disease conditions.
Disclosure of Invention
An object of the present application is to provide a road disease detection method and device, an electronic device and a storage medium that improve the accuracy of road disease detection.
In order to achieve the above purpose, the embodiments of the present application adopt the following technical solutions:
in a first aspect, an embodiment of the present application provides a method for detecting a road disease, where the method includes:
receiving an image to be detected sent by terminal equipment;
inputting the image to be detected into a pre-trained target detection model, extracting a first target feature map, a second target feature map, a third target feature map and a fourth target feature map corresponding to the image to be detected in the target detection model, and obtaining a prediction result of the image to be detected according to the first target feature map, the second target feature map, the third target feature map and the fourth target feature map, where the prediction result is used to indicate disease information of the road corresponding to the image to be detected, the first target feature map, the second target feature map, the third target feature map and the fourth target feature map are feature maps output by four prediction heads of different scales in the target detection model, each of the four target feature maps represents features of one granularity, and the granularities corresponding to the first target feature map, the second target feature map, the third target feature map and the fourth target feature map become coarser in sequence.
Optionally, the extracting, in the target detection model, a first target feature map, a second target feature map, a third target feature map, and a fourth target feature map corresponding to the image to be detected, and obtaining a prediction result of the image to be detected according to the first target feature map, the second target feature map, the third target feature map, and the fourth target feature map includes:
inputting the image to be detected into a feature extraction sub-network of the target detection model to obtain a first intermediate feature map, a second intermediate feature map, a third intermediate feature map and a fourth intermediate feature map;
inputting a first intermediate feature map, a second intermediate feature map, a third intermediate feature map and a fourth intermediate feature map into a feature fusion sub-network of the target detection model to obtain a first target feature map, a second target feature map, a third target feature map and a fourth target feature map;
and inputting the first target characteristic diagram, the second target characteristic diagram, the third target characteristic diagram and the fourth target characteristic diagram into a detection sub-network of the target detection model to obtain a prediction result of the image to be detected.
Optionally, the inputting the image to be detected into the feature extraction sub-network of the target detection model to obtain a first intermediate feature map, a second intermediate feature map, a third intermediate feature map, and a fourth intermediate feature map includes:
inputting the image to be detected to a first standard convolution module in a feature extraction sub-network of the target detection model, and performing feature extraction by the first standard convolution module and a first feature extraction module connected with the first standard convolution module to obtain a first intermediate feature map;
inputting the first intermediate feature map into a second standard convolution module in a feature extraction sub-network, and performing feature extraction by the second standard convolution module and a second feature extraction module connected with the second standard convolution module to obtain a second intermediate feature map;
inputting the second intermediate feature map into a third standard convolution module in a feature extraction sub-network, and performing feature extraction by the third standard convolution module and a third feature extraction module connected with the third standard convolution module to obtain a third intermediate feature map;
and inputting the third intermediate feature map into a fourth standard convolution module in a feature extraction sub-network, and performing feature extraction by the fourth standard convolution module and a fourth feature extraction module connected with the fourth standard convolution module to obtain a fourth intermediate feature map.
Optionally, the inputting the first intermediate feature map, the second intermediate feature map, the third intermediate feature map, and the fourth intermediate feature map into a feature fusion sub-network of the target detection model to obtain the first target feature map, the second target feature map, the third target feature map, and the fourth target feature map includes:
in the feature fusion sub-network, obtaining a first merged output feature map according to the first intermediate feature map, the second intermediate feature map, the third intermediate feature map and the fourth intermediate feature map, inputting the first merged output feature map into a fifth feature extraction module of the feature fusion sub-network, and performing feature fusion by the fifth feature extraction module and a first convolution layer connected to the fifth feature extraction module to obtain a first target feature map;
obtaining a second combined output feature map according to the second intermediate feature map, the third intermediate feature map and the fourth intermediate feature map, obtaining a third combined output feature map according to the second combined output feature map and the first combined output feature map, inputting the third combined output feature map into a sixth feature extraction module of the feature fusion sub-network, and performing feature fusion by the sixth feature extraction module and a second convolution layer connected with the sixth feature extraction module to obtain a second target feature map;
obtaining a fourth combined output feature map according to the third intermediate feature map and the fourth intermediate feature map, obtaining a fifth combined output feature map according to the fourth combined output feature map and the third combined output feature map, inputting the fifth combined output feature map into a seventh feature extraction module of the feature fusion sub-network, and performing feature fusion by the seventh feature extraction module and a third convolution layer connected with the seventh feature extraction module to obtain a third target feature map;
and obtaining a sixth combined output feature map according to the fifth combined output feature map and the fourth intermediate feature map, inputting the sixth combined output feature map into an eighth feature extraction module of the feature fusion sub-network, and performing feature fusion by the eighth feature extraction module and a fourth convolution layer connected with the eighth feature extraction module to obtain a fourth target feature map.
Optionally, each feature extraction module respectively comprises a first convolution module, a second convolution module, an attention mechanism module, a merging module and a third convolution module, which are connected in sequence;
the attention mechanism module comprises a fourth convolution module, a fifth convolution module, a channel attention module and a space attention module which are connected in sequence.
Optionally, the inputting the first target feature map, the second target feature map, the third target feature map, and the fourth target feature map into a detection sub-network of the target detection model to obtain a prediction result of the image to be detected includes:
in the detection sub-network, fusing the first target feature map, the second target feature map, the third target feature map and the fourth target feature map to obtain a fused target feature map, and performing prediction on the fused target feature map to obtain information of a predicted bounding box, where the information of the predicted bounding box includes position information and height and width information of the predicted bounding box;
and determining the prediction result of the image to be detected according to the information of the predicted bounding box and the height and width of the real box.
Optionally, the method further includes:
acquiring a plurality of sample atlas data, and inputting the plurality of sample atlases into an initial detection model;
the initial detection model performs feature extraction and prediction on the atlas data of each sample to obtain a prediction result of the atlas data of each sample;
determining loss information of the initial detection model according to the prediction result of the atlas data of each sample;
and performing iterative optimization on the initial detection model according to the loss information of the initial detection model to obtain the target detection model.
In a second aspect, an embodiment of the present application further provides a road disease detection device, where the device includes:
the receiving module is used for receiving the image to be detected sent by the terminal equipment;
the extraction module is configured to input the image to be detected into a pre-trained target detection model, extract a first target feature map, a second target feature map, a third target feature map and a fourth target feature map corresponding to the image to be detected in the target detection model, and obtain a prediction result of the image to be detected according to the first target feature map, the second target feature map, the third target feature map and the fourth target feature map, where the prediction result is used to indicate disease information of the road corresponding to the image to be detected, the first target feature map, the second target feature map, the third target feature map and the fourth target feature map are feature maps output by four prediction heads of different scales in the target detection model, each of the four target feature maps represents features of one granularity, and the granularities corresponding to the first target feature map, the second target feature map, the third target feature map and the fourth target feature map become coarser in sequence.
Optionally, the extracting module is specifically configured to:
inputting the image to be detected into a feature extraction sub-network of the target detection model to obtain a first intermediate feature map, a second intermediate feature map, a third intermediate feature map and a fourth intermediate feature map;
inputting a first intermediate feature map, a second intermediate feature map, a third intermediate feature map and a fourth intermediate feature map into a feature fusion sub-network of the target detection model to obtain a first target feature map, a second target feature map, a third target feature map and a fourth target feature map;
and inputting the first target characteristic diagram, the second target characteristic diagram, the third target characteristic diagram and the fourth target characteristic diagram into a detection sub-network of the target detection model to obtain a prediction result of the image to be detected.
Optionally, the extraction module is specifically configured to:
inputting the image to be detected to a first standard convolution module in a feature extraction sub-network of the target detection model, and performing feature extraction by the first standard convolution module and a first feature extraction module connected with the first standard convolution module to obtain a first intermediate feature map;
inputting the first intermediate feature map into a second standard convolution module in a feature extraction sub-network, and performing feature extraction by the second standard convolution module and a second feature extraction module connected with the second standard convolution module to obtain a second intermediate feature map;
inputting the second intermediate feature map into a third standard convolution module in a feature extraction sub-network, and performing feature extraction by the third standard convolution module and a third feature extraction module connected with the third standard convolution module to obtain a third intermediate feature map;
and inputting the third intermediate feature map into a fourth standard convolution module in a feature extraction sub-network, and performing feature extraction by the fourth standard convolution module and a fourth feature extraction module connected with the fourth standard convolution module to obtain a fourth intermediate feature map.
Optionally, the extracting module is specifically configured to:
in the feature fusion sub-network, obtaining a first merged output feature map according to the first intermediate feature map, the second intermediate feature map, the third intermediate feature map and the fourth intermediate feature map, inputting the first merged output feature map into a fifth feature extraction module of the feature fusion sub-network, and performing feature fusion by the fifth feature extraction module and a first convolution layer connected to the fifth feature extraction module to obtain a first target feature map;
obtaining a second combined output feature map according to the second intermediate feature map, the third intermediate feature map and the fourth intermediate feature map, obtaining a third combined output feature map according to the second combined output feature map and the first combined output feature map, inputting the third combined output feature map into a sixth feature extraction module of the feature fusion sub-network, and performing feature fusion by the sixth feature extraction module and a second convolution layer connected with the sixth feature extraction module to obtain a second target feature map;
obtaining a fourth combined output feature map according to the third intermediate feature map and the fourth intermediate feature map, obtaining a fifth combined output feature map according to the fourth combined output feature map and the third combined output feature map, inputting the fifth combined output feature map into a seventh feature extraction module of the feature fusion sub-network, and performing feature fusion by the seventh feature extraction module and a third convolution layer connected with the seventh feature extraction module to obtain a third target feature map;
and obtaining a sixth combined output feature map according to the fifth combined output feature map and the fourth intermediate feature map, inputting the sixth combined output feature map into an eighth feature extraction module of the feature fusion sub-network, and performing feature fusion by the eighth feature extraction module and a fourth convolution layer connected with the eighth feature extraction module to obtain a fourth target feature map.
Optionally, each feature extraction module respectively comprises a first convolution module, a second convolution module, an attention mechanism module, a merging module and a third convolution module, which are connected in sequence;
the attention mechanism module comprises a fourth convolution module, a fifth convolution module, a channel attention module and a space attention module which are connected in sequence.
Optionally, the extraction module is specifically configured to:
in the detection sub-network, fusing the first target feature map, the second target feature map, the third target feature map and the fourth target feature map to obtain a fused target feature map, and performing prediction on the fused target feature map to obtain information of a predicted bounding box, where the information of the predicted bounding box includes position information and height and width information of the predicted bounding box;
and determining the prediction result of the image to be detected according to the information of the predicted bounding box and the height and width of the real box.
Optionally, the extracting module is specifically configured to:
acquiring a plurality of sample atlas data, and inputting the plurality of sample atlases into an initial detection model;
the initial detection model performs feature extraction and prediction on the atlas data of each sample to obtain a prediction result of the atlas data of each sample;
determining loss information of the initial detection model according to the prediction result of the atlas data of each sample;
and performing iterative optimization on the initial detection model according to the loss information of the initial detection model to obtain the target detection model.
In a third aspect, an embodiment of the present application further provides an electronic device, including a processor, a storage medium and a bus, where the storage medium stores program instructions executable by the processor; when the application program runs, the processor and the storage medium communicate through the bus, and the processor executes the program instructions to perform the steps of the road disease detection method according to the first aspect.
In a fourth aspect, an embodiment of the present application further provides a computer-readable storage medium on which a computer program is stored; when the computer program is read and executed, the steps of the road disease detection method according to the first aspect are performed.
The beneficial effect of this application is:
according to the road disease detection method, the device, the electronic equipment and the storage medium, a group of small target detection prediction heads are added in an original YOLOv5 neural network structure, so that a target detection model can extract a first target feature map, a second target feature map, a third target feature map and a fourth target feature map of an image to be detected, and the four target feature maps represent features with different granularities, so that the features of a small target can be detected according to the target feature maps obtained by the target detection model, the influence caused by severe target scale change is relieved, the detection performance of a small target object is improved, and the obtained target features are more comprehensive; and the prediction result of the image to be detected, which is obtained according to the four target characteristic graphs, is more accurate.
Drawings
In order to illustrate the technical solutions of the embodiments of the present application more clearly, the drawings required in the embodiments are briefly described below. It should be understood that the following drawings illustrate only some embodiments of the present application and therefore should not be regarded as limiting the scope; for those skilled in the art, other related drawings can be obtained from these drawings without inventive effort.
Fig. 1 is a schematic diagram of an exemplary scenario provided in an embodiment of the present application;
fig. 2 is a schematic diagram of a server according to an embodiment of the present disclosure;
fig. 3 is a schematic view of a complete flow of a road disease detection method provided in the embodiment of the present application;
fig. 4 is a schematic structural diagram of a target detection model according to an embodiment of the present disclosure;
fig. 5 is a schematic structural diagram of a feature extraction module according to an embodiment of the present disclosure;
fig. 6 is a schematic structural diagram of an attention mechanism module according to an embodiment of the present disclosure;
fig. 7 is an overall flow frame diagram of a road disease detection method provided in the embodiment of the present application;
fig. 8 is a schematic view of a device for a road disease detection method provided in an embodiment of the present application;
fig. 9 is a block diagram of an electronic device according to an embodiment of the present application.
Detailed Description
In order to make the purpose, technical solutions and advantages of the embodiments of the present application clearer, the technical solutions in the embodiments are described below clearly and completely with reference to the drawings. It should be understood that the drawings in the present application are for illustrative and descriptive purposes only and are not used to limit the scope of protection of the present application. Additionally, it should be understood that the schematic drawings are not necessarily drawn to scale. The flowcharts used in this application illustrate operations implemented according to some embodiments of the present application. It should be understood that the operations of the flowcharts may be performed out of order, and that steps without a logical context may be reversed in order or performed concurrently. One skilled in the art, under the guidance of this application, may add one or more other operations to, or remove one or more operations from, the flowcharts.
In addition, the described embodiments are only a part of the embodiments of the present application, and not all of the embodiments. The components of the embodiments of the present application, generally described and illustrated in the figures herein, can be arranged and designed in a wide variety of different configurations. Thus, the following detailed description of the embodiments of the present application, presented in the accompanying drawings, is not intended to limit the scope of the claimed application, but is merely representative of selected embodiments of the application. All other embodiments, which can be derived by a person skilled in the art from the embodiments of the present application without making any creative effort, shall fall within the protection scope of the present application.
It should be noted that in the embodiments of the present application, the term "comprising" is used to indicate the presence of the features stated hereinafter, but does not exclude the addition of further features.
With the continuous upgrading of geological radar equipment and technology, the three-dimensional ground-penetrating radar method has become the preferred technique and equipment for rapid detection and census surveys of hidden collapse dangers in urban roads, owing to its high efficiency, high precision and high resolution. However, the volume of acquired detection data reaches the terabyte level, and the massive radar map data still depend mainly on manual interpretation, which is inefficient; interpretation standards differ between personnel, so the accuracy of hidden-danger judgments is unstable and misjudgments and missed judgments occur easily; and intelligent identification systems for screening road cavities and other diseases are almost non-existent.
At present, research on intelligent identification technologies for road disease (cavity) detection is relatively scarce, and much of the existing research targets pavement cracks on expressways with similar detection methods. However, the road disease characteristics of the shallow underground space are more complex: the same type of disease may present different radar map signatures, and different diseases may present similar signatures. Mainstream target detection algorithms currently fall into two types: two-stage detection algorithms represented by Fast R-CNN and Faster R-CNN, and single-stage detection algorithms represented by SSD and YOLO. Two-stage models are large, long to train and slow to detect; single-stage models are small, but their detection performance on small targets needs to be improved.
This application provides a detection model based on an improved YOLOv5 model structure: a prediction head for tiny targets is added to the original YOLOv5 model structure, an attention mechanism structure is introduced, and the loss function of the model is optimized, which improves the detection accuracy and efficiency of the detection model.
Fig. 1 is a schematic view of an exemplary scenario provided by an embodiment of the present application, and as shown in fig. 1, the method is applied to a server, and the server is in communication connection with a terminal device, where the terminal device may be an electronic device such as a desktop computer, a notebook computer, a mobile phone, and a tablet computer. The terminal equipment sends a detection service request to the server, and the server performs detection by using the method of the embodiment of the application according to the received detection service request to obtain a detection result.
Fig. 2 is a schematic diagram of an architecture of a server according to an embodiment of the present application, and as shown in fig. 2, the server includes a database module, a model operation module, and a service module.
Optionally, the database module (Data-base) includes a model library (models), an application database (app-Data), and a log Data storage library (logs), where the model library is used to store algorithm models of different detection applications; the application database is used for storing atlas image data to be detected; the log data storage is used for storing log records of the system during operation.
Optionally, the model-runtime module (model-runtime) may support multiple programming languages, such as C++, Python and Go.
Optionally, the service module (API-server) may be configured to deploy a plurality of application services, and start one application service to detect the atlas image data that needs to be detected; when the application service is started, the application service can communicate with the model operation module, and the detection result is input into the database module for storage.
Fig. 3 is a schematic flow chart of a road disease detection method provided in an embodiment of the present application; the method is executed by the server described above. As shown in fig. 3, the method includes:
s101, receiving an image to be detected sent by the terminal equipment.
Optionally, after the server and the terminal device establish a communication connection, the terminal device may send a detection image request to the server, and the server receives an image to be detected sent by the terminal device, where the image to be detected may be an image to be detected that is pre-stored in a terminal device database, or may be an image that is acquired by the terminal device in real time.
Optionally, the image to be detected is a road disease image, for example road disease image data acquired by surveying an actual road with a three-dimensional geological radar; common road disease types include voids, water-rich zones, loosening and the like.
S102, inputting an image to be detected into a target detection model obtained through pre-training, extracting a first target characteristic diagram, a second target characteristic diagram, a third target characteristic diagram and a fourth target characteristic diagram corresponding to the image to be detected from the target detection model, and obtaining a prediction result of the image to be detected according to the first target characteristic diagram, the second target characteristic diagram, the third target characteristic diagram and the fourth target characteristic diagram.
Optionally, the server may detect the image to be detected by using a target detection model installed in the server in advance, according to the received image detection request. The target detection model is an improvement of the YOLOv5 neural network model: a prediction head for detecting tiny target features is added to the original YOLOv5 neural network model, a CBAM attention mechanism is introduced, and the original C3 structure is replaced by a C3-CBAM structure. Fig. 4 is a structural schematic diagram of the target detection model provided by the embodiment of the application; as shown in fig. 4, it includes a feature extraction sub-network and a feature fusion sub-network. The original YOLOv5 neural network structure only outputs layers 2, 3 and 4, which are its three prediction heads; the output layers of the target detection model in this application comprise layers 1, 2, 3 and 4, where layer 1 is the group of tiny-target prediction heads added in this application. Combined with the 3 original prediction heads, the target detection model can therefore predict bounding boxes of the image to be detected at 4 scales, which relieves the influence of severe target scale changes, makes the detection of tiny objects more accurate, and improves the detection performance on tiny targets. In addition, the C3CBAM structure in fig. 4 replaces the C3 structure of the original YOLOv5 neural network structure, so that attention maps can be inferred sequentially along the two independent dimensions of channel and space during feature extraction, further refining the extracted features.
Optionally, the server inputs the image to be detected into the pre-trained target detection model and performs feature extraction on it in the target detection model to obtain a first target feature map, a second target feature map, a third target feature map and a fourth target feature map, which are the feature maps output by the four prediction heads of different scales in the target detection model. Each of the four target feature maps represents features of one granularity, and the granularities corresponding to the first, second, third and fourth target feature maps become coarser in sequence. Here, granularity indicates the pixel size of the extracted feature map; that is, each target feature map is the feature map extracted from the image at one granularity. Because the granularities become coarser in sequence, the targets captured by the first, second, third and fourth target feature maps become larger in sequence: for example, the first target feature map may capture the features of tiny targets, the second the features of normal-size targets, the third the features of large targets, and the fourth the features of even larger targets.
For example, if the image to be detected is 640 × 640 pixels, the first target feature map obtained by feature extraction in the target detection model is 160 × 160, the second target feature map is 80 × 80, the third target feature map is 40 × 40, and the fourth target feature map is 20 × 20. The first target feature map has the highest resolution and yields the clearest target features, which improves the detection of small targets.
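The sizes in this example follow directly from the input size and per-head down-sampling factors. A minimal sketch of that arithmetic, assuming strides of 4, 8, 16 and 32 for the four heads (the head names are illustrative only, not terms from the patent):

```python
# Feature-map sizes implied by the 640 x 640 example above, assuming the four
# prediction heads operate at down-sampling strides of 4, 8, 16 and 32.
input_size = 640
strides = {"first (tiny-target) head": 4, "second head": 8, "third head": 16, "fourth head": 32}
for name, stride in strides.items():
    side = input_size // stride
    print(f"{name}: {side} x {side}")
# -> 160 x 160, 80 x 80, 40 x 40, 20 x 20
```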
Optionally, a prediction result of the image to be detected is obtained according to the first target feature map, the second target feature map, the third target feature map and the fourth target feature map, where the prediction result is used to indicate disease information, such as a type of a disease, of a road corresponding to the image to be detected.
In this embodiment, a group of small-target prediction heads is added to the original YOLOv5 neural network structure, so that the target detection model can extract a first, second, third and fourth target feature map from the image to be detected. Because the four target feature maps represent features of different granularities, the features of small targets can be detected from the target feature maps obtained by the model, the influence of severe target scale changes is relieved, the detection performance on small target objects is improved, and the obtained target features are more comprehensive; the prediction result of the image to be detected, obtained from the four target feature maps, is therefore more accurate.
Optionally, the extracting, in the step S102, a first target feature map, a second target feature map, a third target feature map, and a fourth target feature map corresponding to the image to be detected from the target detection model, and obtaining the prediction result of the image to be detected according to the first target feature map, the second target feature map, the third target feature map, and the fourth target feature map may include:
optionally, the image to be detected is input into a feature extraction sub-network of the target detection model to obtain a first intermediate feature map, a second intermediate feature map, a third intermediate feature map and a fourth intermediate feature map, where the feature extraction sub-network of the target detection model is used to perform feature extraction on a target in the image to be detected.
Optionally, the first intermediate feature map, the second intermediate feature map, the third intermediate feature map and the fourth intermediate feature map have different sizes; the smaller the pixel size of a feature map, the larger the targets it extracts.
Optionally, the first intermediate feature map, the second intermediate feature map, the third intermediate feature map, and the fourth intermediate feature map are input into a feature fusion sub-network of the target detection model to obtain a first target feature map, a second target feature map, a third target feature map, and a fourth target feature map, and feature fusion is performed on the feature maps of the four sizes in the feature fusion sub-network, that is, feature mixing and combining are performed on the first intermediate feature map, the second intermediate feature map, the third intermediate feature map, and the fourth intermediate feature map to obtain a target feature map.
Optionally, the first target feature map, the second target feature map, the third target feature map, and the fourth target feature map are input into a detection subnetwork of the target detection model, and the first target feature map, the second target feature map, the third target feature map, and the fourth target feature map are detected by the detection subnetwork of the target detection model, so as to obtain a prediction result of the image to be detected.
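The three-stage flow above can be summarized in a short sketch; the `backbone`, `neck` and `head` attribute names are assumptions used only to mirror the feature extraction, feature fusion and detection sub-networks described in this embodiment:

```python
def detect_road_disease(image, model):
    """Hypothetical end-to-end forward pass mirroring the three sub-networks described above."""
    f1, f2, f3, f4 = model.backbone(image)        # feature extraction sub-network: 4 intermediate feature maps
    t1, t2, t3, t4 = model.neck(f1, f2, f3, f4)   # feature fusion sub-network: 4 target feature maps
    return model.head(t1, t2, t3, t4)             # detection sub-network: predicted boxes and disease classes
```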
Optionally, the inputting the image to be detected into the feature extraction sub-network of the target detection model to obtain the first intermediate feature map, the second intermediate feature map, the third intermediate feature map, and the fourth intermediate feature map may include:
optionally, the image to be detected is input to a first standard convolution module in the feature extraction sub-network of the target detection model, and feature extraction is performed by the first standard convolution module and a first feature extraction module connected to the first standard convolution module, so as to obtain a first intermediate feature map. The standard convolution module (CBS) is a basic module in the target detection model, and is composed of a convolution layer (conv), a Normalization layer (BN for short), and a silu activation function. Before the first standard convolution module, the image to be detected needs to be input into a zeroth standard convolution module, and then the image to be detected is sequentially input into the zeroth standard convolution module, the first standard convolution module and the first feature extraction module to obtain a first intermediate feature map.
Optionally, the first intermediate feature map is input to a second standard convolution module in the feature extraction sub-network, and feature extraction is performed by the second standard convolution module and a second feature extraction module connected to the second standard convolution module, so as to obtain a second intermediate feature map.
Optionally, the second intermediate feature map is input to a third standard convolution module in the feature extraction sub-network, and feature extraction is performed by the third standard convolution module and a third feature extraction module connected to the third standard convolution module, so as to obtain a third intermediate feature map.
Optionally, the third intermediate feature map is input to a fourth standard convolution module in the feature extraction sub-network, and feature extraction is performed by the fourth standard convolution module and a fourth feature extraction module connected to the fourth standard convolution module; the feature map output by the fourth feature extraction module is then passed through the pooling layer to obtain the fourth intermediate feature map.
In this embodiment, feature extraction is performed on the image to be detected layer by layer, so that the obtained feature maps are more comprehensive.
Optionally, the inputting the first intermediate feature map, the second intermediate feature map, the third intermediate feature map, and the fourth intermediate feature map into a feature fusion sub-network of the target detection model to obtain the first target feature map, the second target feature map, the third target feature map, and the fourth target feature map may include:
optionally, the structure of the target detection model in fig. 4 is specifically: a zeroth CBS module-a first C3CBAM module-a second CBS module-a second C3CBAM module-a third CBS module-a third C3CBAM module-a fourth CBS module-a fourth C3CBAM module-a first SPFF module-a fifth CBS module-a first upsample layer-a first merge layer-a ninth C3CBAM module-a sixth CBS module-a second upsample layer-a second merge layer-a tenth C3CBAM module-a seventh CBS module-a third upsample layer-a third merge layer-a fifth C3CBAM module-an eighth CBS module-a fourth merge layer-a sixth C3CBAM module-a ninth CBS module-a fifth merge layer-a seventh C3CBAM module-a tenth CBS module-a sixth merge layer-an eighth C3CBAM module.
The CBS module is a standard convolution module, and the C3CBAM module is a feature extraction module.
Optionally, in the feature fusion sub-network, a first merged output feature map is obtained according to the first intermediate feature map, the second intermediate feature map, the third intermediate feature map, and the fourth intermediate feature map, the first merged output feature map is input into a fifth feature extraction module of the feature fusion sub-network, and the fifth feature extraction module and a first convolution layer connected to the fifth feature extraction module perform feature fusion to obtain a first target feature map.
Specifically, the fourth intermediate feature map is sequentially input into the fifth CBS module, the first upsample layer, the first merging layer, the ninth C3CBAM module, the sixth CBS module, the second upsample layer, the second merging layer, the tenth C3CBAM module, the seventh CBS module, the third upsample layer, and the third merging layer; inputting the third intermediate feature map into the first merging layer, the ninth C3CBAM module, the sixth CBS module, the second upsample layer, the second merging layer, the tenth C3CBAM module, the seventh CBS module, the third upsample layer and the third merging layer in sequence; and the second intermediate feature map is sequentially input into a second merging layer, a tenth C3CBAM module, a seventh CBS module, a third upsample layer and a third merging layer, a first merging output feature map of the third merging layer is obtained in the third merging layer according to the input first intermediate feature map, the second intermediate feature map, the third intermediate feature map and the fourth intermediate feature map, the first merging output feature map is input into a fifth feature extraction module of a feature fusion sub-network, and feature fusion is carried out by the fifth feature extraction module and a first convolution layer connected with the fifth feature extraction module to obtain a first target feature map.
Optionally, a second merged output feature map is obtained according to the second intermediate feature map, the third intermediate feature map and the fourth intermediate feature map, a third merged output feature map is obtained according to the second merged output feature map and the first merged output feature map, the third merged output feature map is input into a sixth feature extraction module of the feature fusion sub-network, and feature fusion is performed by the sixth feature extraction module and a second convolution layer connected to the sixth feature extraction module, so as to obtain a second target feature map.
Specifically, the fourth intermediate feature map is sequentially input to the fifth CBS module, the first upsample layer, the first merging layer, the ninth C3CBAM module, the sixth CBS module, the second upsample layer, and the second merging layer; inputting the third intermediate feature map into the first merging layer, the ninth C3CBAM module, the sixth CBS module, the second upsample layer and the second merging layer in sequence, inputting the second intermediate feature map into the second merging layer, obtaining a second merging output map in the second merging layer according to the input second intermediate feature map, the third intermediate feature map and the fourth intermediate feature map, and inputting the second merging output map into the tenth C3CBAM module, the seventh CBS module and the fourth merging layer in sequence; inputting the first merged output map to a fifth C3CBAM module, an eighth CBS module and a fourth merged layer in sequence; and obtaining a third combined output map of the fourth combined layer, inputting the third combined output map into a sixth C3CBAM module, and performing feature fusion by the sixth C3CBAM module and a second convolution layer connected with the sixth C3CBAM module to obtain a second target feature map.
Optionally, a fourth merged output feature map is obtained according to the third intermediate feature map and the fourth intermediate feature map, a fifth merged output feature map is obtained according to the fourth merged output feature map and the third merged output feature map, the fifth merged output feature map is input into a seventh feature extraction module of the feature fusion sub-network, and feature fusion is performed by the seventh feature extraction module and a third convolution layer connected to the seventh feature extraction module, so as to obtain a third target feature map.
Specifically, the fourth intermediate feature map is sequentially input into a fifth CBS module, a first upsample layer, and a first merge layer; inputting the third intermediate characteristic diagram into the first merging layer to obtain a fourth merging output characteristic diagram of the first merging layer; inputting the fourth merged output feature map to a ninth C3CBAM module, a sixth CBS module and a fifth merging layer in sequence; inputting the third combined output characteristic diagram to a sixth C3CBAM module, a ninth CBS module and a fifth combined layer in sequence; and obtaining a fifth merging output characteristic diagram of the fifth merging layer, and sequentially inputting the fifth merging output characteristic diagram to the seventh C3CBAM module-the third convolution layer to obtain a third target characteristic diagram.
Optionally, a sixth combined output feature map is obtained according to the fifth combined output feature map and the fourth intermediate feature map, the sixth combined output feature map is input into an eighth feature extraction module of the feature fusion sub-network, and the eighth feature extraction module and a fourth convolution layer connected to the eighth feature extraction module perform feature fusion to obtain a fourth target feature map.
Specifically, the fourth intermediate feature map is sequentially input to a fifth CBS module-a sixth merging layer; inputting the fifth combined output graph to a seventh C3CBAM module, a tenth CBS module and a sixth combined layer in sequence; and obtaining a sixth merging output feature map of the sixth merging layer, inputting the sixth merging output feature map into an eighth feature extraction module of the feature merging subnetwork, and performing feature merging by the eighth feature extraction module and a fourth convolution layer connected with the eighth feature extraction module to obtain a fourth target feature map.
In this embodiment, the feature fusion sub-network adopts a feature pyramid structure together with a path aggregation network structure, which strengthens the network's ability to fuse features of objects at different scales. The feature pyramid structure improves the detection of small target objects mainly by fusing high-level and low-level features, and on this basis the path aggregation network structure adds a bottom-up path so that low-level features can be transmitted to higher levels. Combining the two structures fuses the features of different layers and improves the detection performance on dense targets.
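As an illustration of one fusion step in this pyramid, the sketch below shows the pattern repeated along the top-down path: channel reduction, 2× upsampling, concatenation with a higher-resolution intermediate map, and refinement (the bottom-up path follows the same concatenate-and-refine pattern with strided convolutions). In the patented network the refinement stage is the C3CBAM feature extraction module; a plain CBS block stands in for it here, and all module names are assumptions:

```python
import torch
import torch.nn as nn

def cbs(c_in, c_out, k=1):
    # Conv -> BatchNorm -> SiLU, the standard convolution block described earlier.
    return nn.Sequential(
        nn.Conv2d(c_in, c_out, k, padding=k // 2, bias=False),
        nn.BatchNorm2d(c_out),
        nn.SiLU(),
    )

class TopDownMerge(nn.Module):
    """One top-down fusion step: shrink channels of the deeper map, upsample x2,
    concatenate with the shallower (higher-resolution) intermediate map, then refine."""
    def __init__(self, c_deep, c_shallow):
        super().__init__()
        self.reduce = cbs(c_deep, c_shallow, k=1)
        self.up = nn.Upsample(scale_factor=2, mode="nearest")
        self.refine = cbs(c_shallow * 2, c_shallow, k=3)  # C3CBAM in the actual model

    def forward(self, deep, shallow):
        return self.refine(torch.cat([self.up(self.reduce(deep)), shallow], dim=1))
```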
Optionally, each feature extraction module includes a first convolution module, a second convolution module, an attention mechanism module, a merging module and a third convolution module, which are connected in sequence, where the attention mechanism module includes a fourth convolution module, a fifth convolution module, a channel attention module and a spatial attention module, which are connected in sequence. Fig. 5 is a schematic structural diagram of a feature extraction module according to an embodiment of the present application, in which the attention mechanism structure (CBAM) is combined with the C3 structure; the C3 structure includes the first convolution module (CBS), the second convolution module (CBS), the merging module (concat) and the third convolution module (CBS), where each convolution module has the same structure as the standard convolution module and is composed of a convolution layer (conv), a Batch Normalization layer (BN) and a SiLU activation function, referred to as a CBS module for short. Fig. 6 is a schematic structural diagram of the attention mechanism module according to an embodiment of the present application, where the attention mechanism module includes a fourth convolution module (CBS), a fifth convolution module (CBS), a channel attention module and a spatial attention module, which are connected in sequence.
Optionally, the attention mechanism structure (CBAM) is composed of two modules, a Channel Attention Module (CAM) and a Spatial Attention Module (SAM). The CAM makes the network attend to the foreground of the image, so that it focuses more on meaningful regions, while the SAM makes the network attend to the positions in the whole picture that are rich in contextual information.
In the embodiment of the application, by combining the attention mechanism structure (CBAM) and the C3 structure, in the feature extraction process, attention maps can be sequentially inferred along two independent dimensions of a channel and a space, and the attention maps are multiplied by input feature maps, so that adaptive feature refinement can be performed, and a detection result is improved.
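A minimal PyTorch sketch of the channel-plus-spatial attention described above; the reduction ratio and kernel size are common defaults assumed here, and the two leading CBS convolutions of the patent's attention module are omitted for brevity:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ChannelAttention(nn.Module):
    def __init__(self, channels, reduction=16):  # assumes channels >= reduction
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Conv2d(channels, channels // reduction, 1, bias=False),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels // reduction, channels, 1, bias=False),
        )

    def forward(self, x):
        # "What" to attend to: squeeze spatial dims with avg- and max-pooling, share one MLP.
        w = torch.sigmoid(self.mlp(F.adaptive_avg_pool2d(x, 1)) +
                          self.mlp(F.adaptive_max_pool2d(x, 1)))
        return x * w

class SpatialAttention(nn.Module):
    def __init__(self, kernel_size=7):
        super().__init__()
        self.conv = nn.Conv2d(2, 1, kernel_size, padding=kernel_size // 2, bias=False)

    def forward(self, x):
        # "Where" to attend to: pool across channels, infer a 2-D attention map.
        avg = x.mean(dim=1, keepdim=True)
        mx, _ = x.max(dim=1, keepdim=True)
        return x * torch.sigmoid(self.conv(torch.cat([avg, mx], dim=1)))

class CBAM(nn.Module):
    """Channel attention followed by spatial attention, each multiplied onto the feature map."""
    def __init__(self, channels):
        super().__init__()
        self.ca = ChannelAttention(channels)
        self.sa = SpatialAttention()

    def forward(self, x):
        return self.sa(self.ca(x))
```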
Optionally, the inputting the first target feature map, the second target feature map, the third target feature map, and the fourth target feature map into a detection subnetwork of the target detection model to obtain the prediction result of the image to be detected may include:
optionally, in the sub-network of detectors, the first target feature map, the second target feature map, the third target feature map, and the fourth target feature map are fused to obtain a fused target feature map, where the fused target feature map includes a plurality of detection frames, and the detection frames indicate types of road defects and position information of road defects.
Optionally, the fused target feature map may be processed by using a non-maximum suppression (NMS) algorithm: redundant detection frames are eliminated according to the scores of the detection frames in the fused target feature map, and the detection frame with the highest score is retained as the prediction result of the image to be detected. For example, a red detection frame may represent a void, a purple detection frame may represent loosening, a brown detection frame may represent a water-rich area, and so on. The detection frame may further include its position information, that is, the geographic coordinate information of the road disease.
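As a concrete illustration of this post-processing, the sketch below applies class-wise non-maximum suppression with torchvision and maps class indices to disease names; the thresholds and the index-to-name mapping are illustrative assumptions only.

```python
# Minimal NMS post-processing sketch for the fused detection frames.
import torch
from torchvision.ops import nms

DISEASE_NAMES = {0: "void", 1: "loosening", 2: "water-rich area"}  # assumed class mapping


def postprocess(boxes, scores, labels, score_thr=0.25, iou_thr=0.45):
    """boxes: (N, 4) in (x1, y1, x2, y2); scores: (N,); labels: (N,) class indices."""
    keep = scores > score_thr                       # drop low-confidence detection frames
    boxes, scores, labels = boxes[keep], scores[keep], labels[keep]
    results = []
    for cls in labels.unique():                     # class-wise non-maximum suppression
        m = labels == cls
        kept = nms(boxes[m], scores[m], iou_thr)    # keep highest-scoring, non-overlapping frames
        for b, s in zip(boxes[m][kept], scores[m][kept]):
            results.append({"disease": DISEASE_NAMES.get(int(cls), str(int(cls))),
                            "box": b.tolist(), "score": float(s)})
    return results
```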
Optionally, the method may further include:
optionally, a plurality of sample atlas data are obtained, and the plurality of sample atlases are input into the initial detection model, where the initial detection model is the initial detection model before the target detection model is optimized. The initial detection model performs feature extraction and prediction on each sample map data to obtain a prediction result of each sample map data, and specifically, the feature extraction and prediction process is the feature extraction and prediction process in the above specific embodiment, which is not described herein again.
Optionally, loss information of the initial detection model is determined according to the prediction result of each piece of sample atlas data, and iterative optimization is performed on the initial detection model according to the loss information to obtain the target detection model. The loss information may be the loss function of the detection model, which may be optimized over the sample atlas data by stochastic gradient descent; when a convergence condition is reached, the initial detection model at that point is taken as the target detection model. The convergence condition may be that the model has converged, or that a preset number of optimization iterations has been reached; when the preset number of iterations is reached, the model obtained in the last iteration is used as the target detection model.
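A minimal sketch of this iterative optimization is given below, assuming a PyTorch model whose forward pass returns the loss on a batch of sample map data; the learning rate, momentum value and the fixed number of epochs (standing in for the convergence condition) are assumptions of the sketch.

```python
# Minimal training-loop sketch: stochastic gradient descent until a preset iteration count.
import torch


def train(initial_model, data_loader, epochs=100, lr=0.01):
    optimizer = torch.optim.SGD(initial_model.parameters(), lr=lr, momentum=0.9)
    for epoch in range(epochs):                     # preset number of optimization rounds
        for images, targets in data_loader:
            loss = initial_model(images, targets)   # loss information for this batch
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()                        # one stochastic gradient descent step
    return initial_model                            # model after the last iteration = target model
```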
Optionally, in the training process of the target detection model, the first target feature map, the second target feature map, the third target feature map, and the fourth target feature map are predicted in the detection sub-network, and the information of the predicted bounding box corresponding to each target feature map includes the position information and the height and width information of the predicted bounding box.
The loss result of the target detection model is optimized according to the information of the predicted bounding box corresponding to each target feature map and the height and width of the real box. During optimization, the first target feature map, the second target feature map, the third target feature map and the fourth target feature map of each piece of sample map data are obtained from the plurality of pieces of sample map data, the loss result of the target detection model is iteratively optimized through these target feature maps, and the loss result obtained when the convergence condition is reached is taken as the loss result of the target detection model.
Specifically, the calculation may be performed according to the following formula.
L_EIoU = 1 − IoU + ρ²(b_p, b_g)/(C_w² + C_h²) + ρ²(w_p, w_g)/C_w² + ρ²(h_p, h_g)/C_h²

where ρ(b_p, b_g) is the Euclidean distance between the center points of the predicted box and the real box, ρ(w_p, w_g) and ρ(h_p, h_g) are the width and height differences between the predicted bounding box and the real box, C_w is the width of the smallest closed box enclosing the predicted box and the real box, and C_h is the height of that enclosing box;

IoU = (A ∩ B)/(A ∪ B)

where A is the area of the real box, B is the area of the predicted bounding box, and IoU is the ratio of the intersection to the union of the predicted bounding box and the real box.
The loss function of the target detection model obtained according to the method is L_Focal-EIoU = IoU^γ · L_EIoU, where γ is a parameter controlling the degree of suppression of outliers.
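For reference, the EIoU and Focal-EIoU terms above can be computed as in the sketch below, which assumes boxes in (x1, y1, x2, y2) format; the default γ = 0.5 and the detaching of the IoU weighting factor are choices made for the sketch, not values stated in the embodiment.

```python
# Minimal Focal-EIoU loss sketch for predicted and real (ground-truth) boxes.
import torch


def focal_eiou_loss(pred, target, gamma=0.5, eps=1e-7):
    """pred, target: (N, 4) boxes in (x1, y1, x2, y2) format."""
    # IoU: ratio of intersection to union of the predicted box and the real box
    x1 = torch.max(pred[:, 0], target[:, 0]); y1 = torch.max(pred[:, 1], target[:, 1])
    x2 = torch.min(pred[:, 2], target[:, 2]); y2 = torch.min(pred[:, 3], target[:, 3])
    inter = (x2 - x1).clamp(0) * (y2 - y1).clamp(0)
    area_p = (pred[:, 2] - pred[:, 0]) * (pred[:, 3] - pred[:, 1])
    area_g = (target[:, 2] - target[:, 0]) * (target[:, 3] - target[:, 1])
    iou = inter / (area_p + area_g - inter + eps)

    # width (C_w) and height (C_h) of the smallest box enclosing both boxes
    cw = torch.max(pred[:, 2], target[:, 2]) - torch.min(pred[:, 0], target[:, 0])
    ch = torch.max(pred[:, 3], target[:, 3]) - torch.min(pred[:, 1], target[:, 1])

    # squared center distance and squared width/height differences
    rho_b = ((pred[:, 0] + pred[:, 2]) / 2 - (target[:, 0] + target[:, 2]) / 2) ** 2 \
          + ((pred[:, 1] + pred[:, 3]) / 2 - (target[:, 1] + target[:, 3]) / 2) ** 2
    rho_w = ((pred[:, 2] - pred[:, 0]) - (target[:, 2] - target[:, 0])) ** 2
    rho_h = ((pred[:, 3] - pred[:, 1]) - (target[:, 3] - target[:, 1])) ** 2

    eiou = 1 - iou + rho_b / (cw ** 2 + ch ** 2 + eps) + rho_w / (cw ** 2 + eps) + rho_h / (ch ** 2 + eps)
    return (iou.detach().clamp(min=eps) ** gamma * eiou).mean()   # Focal-EIoU weighting
```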
On the basis of CIoU, the aspect-ratio loss term is split into separate width and height difference terms between the predicted box and the real box, because CIoU only reflects the difference in aspect ratio rather than the real width and height differences of the bounding boxes, which can sometimes hinder the model from effectively optimizing the similarity. Fig. 7 is an overall flow diagram of the road disease detection method provided in an embodiment of the present application. As shown in Fig. 7, the recognition algorithm in Fig. 7 is the algorithm in the initial detection model; the initial detection model is trained using the disease sample data set and the resulting target detection model is evaluated on test sample data, and when the evaluation passes, the target detection model is exported and deployed, so as to be published and installed on the server.
After the server receives a connection request and an image detection request from the terminal device, the target detection model in the server is started and used to detect the image to be detected. Before detection, the image to be detected needs to be cropped and divided into blocks, and each divided image carries a corresponding mark. The target detection model detects each divided image separately; after the prediction result of each divided image is obtained, the divided images are spliced and restored according to their marks to obtain the prediction result of the image to be detected. The prediction result of the image to be detected is then returned to the terminal device, and the terminal device stores the received prediction result.
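The block-wise detection and re-assembly described above can be sketched as follows, assuming the image is an H×W×C array and the detector is a callable returning boxes in tile-local coordinates; the tile size and the result format are illustrative assumptions.

```python
# Minimal sketch of block-wise detection: split, detect per block, shift boxes back by each block's mark.
def detect_by_tiles(image, detect_fn, tile=640):
    """image: (H, W, C) array; detect_fn(block) -> list of {"box": [x1, y1, x2, y2], ...}."""
    h, w = image.shape[:2]
    merged = []
    for ty in range(0, h, tile):
        for tx in range(0, w, tile):                      # (tx, ty) acts as the divided image's mark
            block = image[ty:ty + tile, tx:tx + tile]
            for det in detect_fn(block):
                x1, y1, x2, y2 = det["box"]
                det = dict(det, box=[x1 + tx, y1 + ty, x2 + tx, y2 + ty])  # restore full-image coordinates
                merged.append(det)
    return merged
```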
Fig. 8 is a schematic view of a device of a road disease detection method provided in an embodiment of the present application, and as shown in fig. 8, the device includes:
a receiving module 201, configured to receive an image to be detected sent by a terminal device;
an extracting module 202, configured to input the image to be detected into a target detection model obtained through pre-training, extract a first target feature map, a second target feature map, a third target feature map, and a fourth target feature map corresponding to the image to be detected in the target detection model, and obtain a prediction result of the image to be detected according to the first target feature map, the second target feature map, the third target feature map, and the fourth target feature map, where the prediction result is used to indicate disease information of the road corresponding to the image to be detected, the first target feature map, the second target feature map, the third target feature map, and the fourth target feature map are feature maps output by prediction heads of four different scales in the target detection model, each of the four target feature maps represents features of one granularity, and the granularities corresponding to the first target feature map, the second target feature map, the third target feature map, and the fourth target feature map are successively coarser.
Optionally, the extracting module 202 is specifically configured to:
inputting the image to be detected into a feature extraction sub-network of the target detection model to obtain a first intermediate feature map, a second intermediate feature map, a third intermediate feature map and a fourth intermediate feature map;
inputting a first intermediate feature map, a second intermediate feature map, a third intermediate feature map and a fourth intermediate feature map into a feature fusion sub-network of the target detection model to obtain the first target feature map, the second target feature map, the third target feature map and the fourth target feature map;
and inputting the first target characteristic diagram, the second target characteristic diagram, the third target characteristic diagram and the fourth target characteristic diagram into a detection sub-network of the target detection model to obtain a prediction result of the image to be detected.
Optionally, the extracting module 202 is specifically configured to:
inputting the image to be detected to a first standard convolution module in a feature extraction sub-network of the target detection model, and performing feature extraction by the first standard convolution module and a first feature extraction module connected with the first standard convolution module to obtain a first intermediate feature map;
inputting the first intermediate feature map into a second standard convolution module in a feature extraction sub-network, and performing feature extraction by the second standard convolution module and a second feature extraction module connected with the second standard convolution module to obtain a second intermediate feature map;
inputting the second intermediate feature map into a third standard convolution module in a feature extraction sub-network, and performing feature extraction by the third standard convolution module and a third feature extraction module connected with the third standard convolution module to obtain a third intermediate feature map;
and inputting the third intermediate feature map into a fourth standard convolution module in a feature extraction sub-network, and performing feature extraction by the fourth standard convolution module and a fourth feature extraction module connected with the fourth standard convolution module to obtain the fourth intermediate feature map.
Optionally, the extracting module 202 is specifically configured to:
in the feature fusion sub-network, obtaining a first merged output feature map according to the first intermediate feature map, the second intermediate feature map, the third intermediate feature map and the fourth intermediate feature map, inputting the first merged output feature map into a fifth feature extraction module of the feature fusion sub-network, and performing feature fusion by the fifth feature extraction module and a first convolution layer connected with the fifth feature extraction module to obtain a first target feature map;
obtaining a second combined output feature map according to the second intermediate feature map, the third intermediate feature map and the fourth intermediate feature map, obtaining a third combined output feature map according to the second combined output feature map and the first combined output feature map, inputting the third combined output feature map into a sixth feature extraction module of the feature fusion sub-network, and performing feature fusion by the sixth feature extraction module and a second convolution layer connected with the sixth feature extraction module to obtain a second target feature map;
obtaining a fourth combined output feature map according to the third intermediate feature map and the fourth intermediate feature map, obtaining a fifth combined output feature map according to the fourth combined output feature map and the third combined output feature map, inputting the fifth combined output feature map into a seventh feature extraction module of the feature fusion sub-network, and performing feature fusion by the seventh feature extraction module and a third convolution layer connected with the seventh feature extraction module to obtain a third target feature map;
and obtaining a sixth combined output feature map according to the fifth combined output feature map and the fourth intermediate feature map, inputting the sixth combined output feature map into an eighth feature extraction module of the feature fusion sub-network, and performing feature fusion by the eighth feature extraction module and a fourth convolution layer connected with the eighth feature extraction module to obtain a fourth target feature map.
Optionally, each feature extraction module respectively comprises a first convolution module, a second convolution module, an attention mechanism module, a merging module and a third convolution module, which are connected in sequence;
the attention mechanism module comprises a fourth convolution module, a fifth convolution module, a channel attention module and a space attention module which are connected in sequence.
Optionally, the extracting module 202 is specifically configured to:
in the detection subnetwork, fusing the first target feature map, the second target feature map, the third target feature map and the fourth target feature map to obtain a fused target feature map, and predicting the fused target feature map to obtain information of a predicted bounding box, wherein the information of the predicted bounding box includes: predicting position information, height and width information of the bounding box;
and determining the prediction result of the image to be detected according to the information of the prediction boundary box and the height and width of the real box.
Optionally, the extracting module 202 is specifically configured to:
acquiring a plurality of sample atlas data, and inputting the plurality of sample atlases into an initial detection model;
the initial detection model performs feature extraction and prediction on the atlas data of each sample to obtain a prediction result of the atlas data of each sample;
determining loss information of the initial detection model according to the prediction result of the atlas data of each sample;
and performing iterative optimization on the initial detection model according to the loss information of the initial detection model to obtain the target detection model.
Fig. 9 is a block diagram of an electronic device 300 according to an embodiment of the present disclosure, and as shown in fig. 9, the electronic device may include: a processor 301, a memory 302.
Optionally, a bus 303 may be further included, where the memory 302 is configured to store machine-readable instructions executable by the processor 301; when the electronic device 300 is operated, the processor 301 and the memory 302 communicate via the bus 303, and the machine-readable instructions are executed by the processor 301 to perform the method steps in the above method embodiments.
The embodiment of the present application further provides a computer-readable storage medium, where a computer program is stored on the computer-readable storage medium, and when the computer program is executed by a processor, the computer program executes the method steps in the above-mentioned road disease detection method embodiment.
It can be clearly understood by those skilled in the art that, for convenience and brevity of description, the specific working processes of the system and the apparatus described above may refer to corresponding processes in the method embodiments, and are not described in detail in this application. In the several embodiments provided in the present application, it should be understood that the disclosed system, apparatus and method may be implemented in other ways. The above-described apparatus embodiments are merely illustrative, and for example, the division of the modules is only one logical functional division, and other divisions may be realized in practice, and for example, a plurality of modules or components may be combined or integrated into another system, or some features may be omitted, or not executed. In addition, the shown or discussed mutual coupling or direct coupling or communication connection may be an indirect coupling or communication connection of devices or modules through some communication interfaces, and may be in an electrical, mechanical or other form.
In addition, functional units in the embodiments of the present application may be integrated into one processing unit, or each unit may exist alone physically, or two or more units are integrated into one unit. The functions, if implemented in the form of software functional units and sold or used as a stand-alone product, may be stored in a computer readable storage medium. Based on such understanding, the technical solution of the present application or portions thereof that substantially contribute to the prior art may be embodied in the form of a software product stored in a storage medium and including instructions for causing a computer device (which may be a personal computer, a server, or a network device) to execute all or part of the steps of the method according to the embodiments of the present application. And the aforementioned storage medium includes: a U-disk, a removable hard disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk or an optical disk, and other various media capable of storing program codes.
The above description is only for the specific embodiments of the present application, but the scope of the present application is not limited thereto, and any person skilled in the art can easily conceive of the changes or substitutions within the technical scope of the present application, and shall be covered by the scope of the present application.

Claims (10)

1. A road disease detection method is applied to a server, the server is in communication connection with a terminal device, and the method comprises the following steps:
receiving an image to be detected sent by terminal equipment;
inputting the image to be detected into a pre-trained target detection model, extracting a first target feature map, a second target feature map, a third target feature map and a fourth target feature map corresponding to the image to be detected from the target detection model, and obtaining a prediction result of the image to be detected according to the first target feature map, the second target feature map, the third target feature map and the fourth target feature map, wherein the prediction result is used for indicating disease information of a road corresponding to the image to be detected, the first target feature map, the second target feature map, the third target feature map and the fourth target feature map are respectively feature maps output by four prediction heads with different scales in the target detection model, the first target feature map, the second target feature map, the third target feature map and the fourth target feature map are respectively used for representing features of one granularity, and the granularity corresponding to the first target feature map, the second target feature map, the third target feature map and the fourth target feature map is sequentially coarsened.
2. The method for detecting the road disease according to claim 1, wherein the step of extracting a first target feature map, a second target feature map, a third target feature map and a fourth target feature map corresponding to the image to be detected from the target detection model and obtaining a prediction result of the image to be detected according to the first target feature map, the second target feature map, the third target feature map and the fourth target feature map comprises the steps of:
inputting the image to be detected into a feature extraction sub-network of the target detection model to obtain a first intermediate feature map, a second intermediate feature map, a third intermediate feature map and a fourth intermediate feature map;
inputting a first intermediate feature map, a second intermediate feature map, a third intermediate feature map and a fourth intermediate feature map into a feature fusion sub-network of the target detection model to obtain a first target feature map, a second target feature map, a third target feature map and a fourth target feature map;
and inputting the first target characteristic diagram, the second target characteristic diagram, the third target characteristic diagram and the fourth target characteristic diagram into a detection sub-network of the target detection model to obtain a prediction result of the image to be detected.
3. The method for detecting road diseases according to claim 2, wherein the step of inputting the image to be detected into a feature extraction sub-network of the target detection model to obtain a first intermediate feature map, a second intermediate feature map, a third intermediate feature map and a fourth intermediate feature map comprises the steps of:
inputting the image to be detected to a first standard convolution module in a feature extraction sub-network of the target detection model, and performing feature extraction by the first standard convolution module and a first feature extraction module connected with the first standard convolution module to obtain a first intermediate feature map;
inputting the first intermediate feature map into a second standard convolution module in a feature extraction sub-network, and performing feature extraction by the second standard convolution module and a second feature extraction module connected with the second standard convolution module to obtain a second intermediate feature map;
inputting the second intermediate feature map into a third standard convolution module in a feature extraction sub-network, and performing feature extraction by the third standard convolution module and a third feature extraction module connected with the third standard convolution module to obtain a third intermediate feature map;
and inputting the third intermediate feature map into a fourth standard convolution module in a feature extraction sub-network, and performing feature extraction by the fourth standard convolution module and a fourth feature extraction module connected with the fourth standard convolution module to obtain a fourth intermediate feature map.
4. The method for detecting a road disease according to claim 2, wherein the step of inputting the first intermediate feature map, the second intermediate feature map, the third intermediate feature map and the fourth intermediate feature map into a feature fusion sub-network of the target detection model to obtain the first target feature map, the second target feature map, the third target feature map and the fourth target feature map comprises:
in the feature fusion sub-network, obtaining a first merged output feature map according to the first intermediate feature map, the second intermediate feature map, the third intermediate feature map and the fourth intermediate feature map, inputting the first merged output feature map into a fifth feature extraction module of the feature fusion sub-network, and performing feature fusion by the fifth feature extraction module and a first convolution layer connected to the fifth feature extraction module to obtain a first target feature map;
obtaining a second combined output feature map according to the second intermediate feature map, the third intermediate feature map and the fourth intermediate feature map, obtaining a third combined output feature map according to the second combined output feature map and the first combined output feature map, inputting the third combined output feature map into a sixth feature extraction module of the feature fusion sub-network, and performing feature fusion by the sixth feature extraction module and a second convolution layer connected with the sixth feature extraction module to obtain a second target feature map;
obtaining a fourth combined output feature map according to the third intermediate feature map and the fourth intermediate feature map, obtaining a fifth combined output feature map according to the fourth combined output feature map and the third combined output feature map, inputting the fifth combined output feature map into a seventh feature extraction module of the feature fusion sub-network, and performing feature fusion by the seventh feature extraction module and a third convolution layer connected with the seventh feature extraction module to obtain a third target feature map;
and obtaining a sixth combined output feature map according to the fifth combined output feature map and the fourth intermediate feature map, inputting the sixth combined output feature map into an eighth feature extraction module of the feature fusion sub-network, and performing feature fusion by the eighth feature extraction module and a fourth convolution layer connected with the eighth feature extraction module to obtain a fourth target feature map.
5. The road disease detection method according to claim 3, wherein each feature extraction module comprises a first convolution module, a second convolution module, an attention mechanism module, a merging module and a third convolution module which are connected in sequence;
the attention mechanism module comprises a fourth convolution module, a fifth convolution module, a channel attention module and a space attention module which are connected in sequence.
6. The method for detecting a road disease according to claim 2, wherein the step of inputting the first target feature map, the second target feature map, the third target feature map and the fourth target feature map into a detection sub-network of the target detection model to obtain a prediction result of the image to be detected comprises:
in the detection sub-network, fusing the first target feature map, the second target feature map, the third target feature map and the fourth target feature map to obtain a fused target feature map, wherein the fused target feature map comprises a plurality of detection frames, and the detection frames indicate the types of the road diseases and the position information of the road diseases;
and screening the detection frames according to the scores of the detection frames in the fused target feature map, and taking the detection frame with the highest score as a prediction result of the image to be detected.
7. The method for detecting a road disease according to any one of claims 1 to 6, further comprising:
acquiring a plurality of sample map data, and inputting the sample maps into an initial detection model;
the initial detection model performs feature extraction and prediction on the atlas data of each sample to obtain a prediction result of the atlas data of each sample;
determining loss information of the initial detection model according to the prediction result of the atlas data of each sample;
and performing iterative optimization on the initial detection model according to the loss information of the initial detection model to obtain the target detection model.
8. A road disease detection device, characterized by, includes:
the receiving module is used for receiving the image to be detected sent by the terminal equipment;
the extraction module is used for inputting the image to be detected into a target detection model obtained through pre-training, extracting a first target feature map, a second target feature map, a third target feature map and a fourth target feature map corresponding to the image to be detected from the target detection model, and obtaining a prediction result of the image to be detected according to the first target feature map, the second target feature map, the third target feature map and the fourth target feature map, wherein the prediction result is used for indicating disease information of a road corresponding to the image to be detected, the first target feature map, the second target feature map, the third target feature map and the fourth target feature map are respectively used for representing features of one granularity, and the granularities corresponding to the first target feature map, the second target feature map, the third target feature map and the fourth target feature map are successively coarser.
9. An electronic device, comprising a memory and a processor, wherein the memory stores a computer program executable by the processor, and the processor implements the steps of the road damage detection method according to any one of claims 1 to 7 when executing the computer program.
10. A computer-readable storage medium, characterized in that the computer-readable storage medium has stored thereon a computer program which, when being executed by a processor, carries out the steps of the road disease detection method according to any one of claims 1 to 7.
CN202211158865.1A 2022-09-22 2022-09-22 Road disease detection method and device, electronic equipment and storage medium Pending CN115439750A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211158865.1A CN115439750A (en) 2022-09-22 2022-09-22 Road disease detection method and device, electronic equipment and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202211158865.1A CN115439750A (en) 2022-09-22 2022-09-22 Road disease detection method and device, electronic equipment and storage medium

Publications (1)

Publication Number Publication Date
CN115439750A true CN115439750A (en) 2022-12-06

Family

ID=84249774

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202211158865.1A Pending CN115439750A (en) 2022-09-22 2022-09-22 Road disease detection method and device, electronic equipment and storage medium

Country Status (1)

Country Link
CN (1) CN115439750A (en)

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116758467A (en) * 2023-05-05 2023-09-15 广州白云国际机场建设发展有限公司 Monitoring alarm method and device in civil aviation security equipment field
CN117237448A (en) * 2023-08-11 2023-12-15 北京交通大学 Train fusion positioning method and device based on machine vision
CN117078235A (en) * 2023-10-17 2023-11-17 深圳市城市交通规划设计研究中心股份有限公司 Road network maintenance method, electronic equipment and storage medium for comprehensive evaluation
CN117078235B (en) * 2023-10-17 2024-03-26 深圳市城市交通规划设计研究中心股份有限公司 Road network maintenance method, electronic equipment and storage medium for comprehensive evaluation

Similar Documents

Publication Publication Date Title
US20210319561A1 (en) Image segmentation method and system for pavement disease based on deep learning
CN110084095B (en) Lane line detection method, lane line detection apparatus, and computer storage medium
CN115439750A (en) Road disease detection method and device, electronic equipment and storage medium
CN110163213B (en) Remote sensing image segmentation method based on disparity map and multi-scale depth network model
CN108776772A (en) Across the time building variation detection modeling method of one kind and detection device, method and storage medium
CN113033604B (en) Vehicle detection method, system and storage medium based on SF-YOLOv4 network model
CN110675408A (en) High-resolution image building extraction method and system based on deep learning
CN109543647B (en) Road abnormity identification method, device, equipment and medium
CN110798805B (en) Data processing method and device based on GPS track and storage medium
CN111523439B (en) Method, system, device and medium for target detection based on deep learning
CN109740410A (en) A kind of train groups fault recognition method and device without presetting template
CN114639102B (en) Cell segmentation method and device based on key point and size regression
CN113255580A (en) Method and device for identifying sprinkled objects and vehicle sprinkling and leaking
Khatriker et al. Building footprint extraction from high resolution satellite imagery using segmentation
Buza et al. Unsupervised method for detection of high severity distresses on asphalt pavements
CN116597411A (en) Method and system for identifying traffic sign by unmanned vehicle in extreme weather
Wang et al. Instance segmentation of soft‐story buildings from street‐view images with semiautomatic annotation
CN113870196A (en) Image processing method, device, equipment and medium based on anchor point cutting graph
CN114882490B (en) Unlimited scene license plate detection and classification method based on point-guided positioning
CN110135382A (en) A kind of human body detecting method and device
CN113435427B (en) Method and device for aggregating lane lines
CN115880580A (en) Intelligent extraction method for optical remote sensing image road information under influence of cloud layer
CN111738324B (en) Multi-frequency and multi-scale fusion automatic crack detection method based on frequency division convolution
CN115171031A (en) Method and device for detecting surface water accumulation based on vehicle reference object and application
CN114140707A (en) Power grid fault inspection method

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination