CN114913162A - Bridge concrete crack detection method and device based on lightweight Transformer - Google Patents

Bridge concrete crack detection method and device based on lightweight Transformer

Info

Publication number
CN114913162A
CN114913162A (application CN202210577574.XA)
Authority
CN
China
Prior art keywords
patch
image
transform
layer
stage
Prior art date
Legal status
Pending
Application number
CN202210577574.XA
Other languages
Chinese (zh)
Inventor
许华杰
苏国韶
秦远卓
刘鑫
梁金福
候攀
江浩
李仁杰
Current Assignee
Guangxi University
Original Assignee
Guangxi University
Priority date
Filing date
Publication date
Application filed by Guangxi University
Priority to CN202210577574.XA
Publication of CN114913162A
Legal status: Pending


Classifications

    • G06T 7/0004 — Image analysis; inspection of images, e.g. flaw detection; industrial image inspection
    • G06N 3/045 — Neural networks; architecture, e.g. interconnection topology; combinations of networks
    • G06N 3/047 — Neural networks; probabilistic or stochastic networks
    • G06N 3/048 — Neural networks; activation functions
    • G06N 3/084 — Neural networks; learning methods; backpropagation, e.g. using gradient descent
    • G06T 2207/20081 — Indexing scheme for image analysis; special algorithmic details; training/learning
    • G06T 2207/20084 — Indexing scheme for image analysis; special algorithmic details; artificial neural networks [ANN]


Abstract

The invention discloses a bridge concrete crack detection method and device based on a lightweight Transformer, comprising the following steps: collecting bridge concrete crack images with an unmanned aerial vehicle and returning the collected data; preprocessing the images, converting them into a fixed input format, and establishing a data set; constructing a lightweight Transformer network and training and testing it on the images to obtain a recognition model, wherein the lightweight Transformer adopts a two-stage Patch transformation method that discards part of the Transformer's input sequence during training, reducing overfitting and enhancing the diversity among patches; and deploying the recognition model to recognize the bridge surface images to be identified. The proposed lightweight Transformer reduces the computational load of the bridge concrete crack detection method, so that it can be deployed on platforms with limited computing resources.

Description

Bridge concrete crack detection method and device based on lightweight Transformer
Technical Field
The invention belongs to the fields of crack detection and computer vision, and particularly relates to a bridge concrete crack detection method and device based on a lightweight Transformer.
Background
Cracks are one of the most common forms of surface damage in concrete; their formation is usually related to factors such as natural disasters, long-term daily overloading, and structural corrosion. Cracks on the surface of structures such as bridges weaken the structure and reduce its load capacity, affecting service life and traffic safety. Once the surface structure collapses, significant loss of life and property results. The earlier cracks are detected and repaired, the earlier these safety risks can be eliminated.
Early crack detection relied mainly on visual inspection, which is strongly affected by the external environment and suffers from low efficiency and low precision. With the application of optical and acoustic nondestructive testing methods and the continual upgrading of monitoring equipment, the precision and speed of crack detection have steadily improved. Computer vision technology acquires images of the structure surface with non-contact means such as cameras and analyzes them automatically; it offers high accuracy, high flexibility, and low cost, and is increasingly widely applied in the crack detection field. Deep-learning-based crack detection methods, such as those built on convolutional neural networks, construct depth models suited to crack detection and largely overcome the problems of low recognition precision and of poor image quality degrading results.
The emerging Vision Transformer model, which replaces the convolutional architecture with a pure attention mechanism, matches or even surpasses the performance of convolutional neural networks. However, the Vision Transformer is a data-hungry model: it needs a sufficiently large data set to show its excellent performance, which leads to high demands on computing resources, and it lacks some of the inductive biases inherent to convolutional neural networks. It therefore generalizes poorly when trained with insufficient data, limiting its applicability in the crack detection field.
Disclosure of Invention
The invention aims to solve the problem that existing bridge concrete crack detection methods demand excessive computing resources, and provides a bridge concrete crack detection method and device based on a lightweight Transformer.
To achieve this purpose, the invention provides the following technical scheme:
on the one hand, a bridge concrete crack detection method based on a lightweight Transformer is provided, comprising the following steps:
step S1: collecting bridge concrete crack images by using an unmanned aerial vehicle, and returning the collected data; the unmanned aerial vehicle first carries out flight preparation, including weather checking, flight scheme design, and a pre-flight test, then takes off to execute the image acquisition task, and finally returns so that the acquired data can be checked; if the image quality is not acceptable, the corresponding shooting area needs to be re-photographed;
step S2: preprocessing the images, converting them into a fixed input format, and establishing a data set; the pictures shot by the unmanned aerial vehicle are imported into a local computer and processed by picture labeling, noise reduction and quality enhancement, geometric correction, auxiliary information extraction, and the like; a certain number of images with and without cracks are then selected to establish a data set of bridge surface images;
step S3: constructing a lightweight Transformer network, and training and testing it on the images to obtain a recognition model; the network comprises a Patch-based tokenization part and a Transformer part with sequence pooling, where the Patch-based tokenization comprises patching, two-stage Patch transformation, and linear projection, and the Transformer with sequence pooling comprises a Transformer encoder, a sequence pooling layer, and a linear layer; positional encoding is inserted between the two parts; before being input into the network, a bridge surface image is segmented into 8 × 8 patches, two-stage Patch transformation is then performed, and the input sequence of the Transformer encoder is formed after linear projection, with an embedding dimension of 256 for each patch; the Transformer encoder has 6 blocks, each comprising a multi-head attention layer (4 heads) and a position-wise feed-forward network (FFN), where the first FFN layer has dimension 512, the second 256, and the activation function is GELU; the outputs of the multi-head attention layer and the feed-forward network are each sent to an 'add and norm' layer comprising a residual structure and layer normalization; the sequence pooling layer pools the output of the Transformer encoder and maps the sequence output to a single class index; the last linear layer is a two-layer perceptron, where the output dimension of the first layer is 1024 and the second layer performs classification with output dimension 2 (the number of categories), representing crack-free and cracked images;
step S4: deploying the recognition model, and recognizing the bridge surface images to be identified; the trained lightweight Transformer model is integrated into bridge concrete crack detection software; the user transfers the images to be identified to a local computer and performs crack detection with the software; during detection, the user can upload a single picture or multiple pictures, the software calls the model to perform recognition, and the recognition results are returned to the software interface.
Optionally, in one possible implementation, the two-stage Patch transformation method is as follows:
in the first stage, grid-shaped Patch dropping discards a certain number of patches at grid-like intervals among all the 8 × 8 patches generated from the image. Let Patches be a patch matrix of m rows and n columns; the grid-shaped Patch dropping formula is:
Drop=Reshape(Cycle(0,1,p))
Patches=Patches*Drop
first, the Drop matrix is computed: a binary matrix containing only 0 and 1, where 0 marks a patch to be discarded and 1 a patch to be retained; the Cycle function generates an alternating vector of 0s and 1s, and p = m × n is the total number of patches; a continuous vector of length p is then taken from the alternating sequence and reshaped into a matrix of the same size as Patches; multiplying Patches element-wise by the Drop matrix yields the final retained Patches matrix;
in the second stage, each patch discarded in the first stage is converted into a 4 × 4 thumbnail and randomly mixed with the retained input patches, according to the formula:
x′ = Mx₁ + Φ(T(x₂))
where x′ is the new mixed sample (patch), M is a binary mask marking the clipped and preserved region of the patch, Φ is a fill operation that generates a patch of the same size as x₂, and T(x₂) is the thumbnail operation;
the two-stage Patch transformation method discards part of the Transformer's input sequence during training, forcing the Transformer to train with an incomplete sequence; the method is not used during testing, where the complete input sequence is used for computation.
In another aspect, a lightweight Transformer-based bridge concrete crack detection device is provided, including:
the acquisition module is used for acquiring a bridge concrete crack image by using an unmanned aerial vehicle and returning acquired data;
and the data set module is used for preprocessing the images and converting them into a fixed input format to establish a data set;
The training module is used for constructing a lightweight Transformer network, and training and testing the image to obtain an identification model;
and the evaluation module is used for deploying an identification model and carrying out identification on the surface image of the bridge to be identified.
Compared with the prior art, the invention has the beneficial effects that:
in the two-stage Patch transformation method used during model training, the first-stage transformation reduces the Transformer's overfitting on small data sets and improves its running efficiency. In the second-stage transformation, patches deleted in the first stage are used as thumbnails and randomly pasted over the retained patches, enhancing the diversity among patches: thumbnails containing cracks strengthen the model's learning of fine cracks, while crack-free thumbnails act as occlusion, improving the model's generalization ability. In addition, the lightweight Transformer reduces the computational load of the bridge concrete crack detection method, so that it can be deployed on platforms with limited computing resources.
Drawings
FIG. 1 is a flow chart of the lightweight Transformer-based bridge concrete crack detection method;
fig. 2 is a flow chart of unmanned aerial vehicle data acquisition;
FIG. 3 is an architecture diagram of a lightweight Transformer;
FIG. 4 is a flow chart of a lightweight Transformer-based bridge concrete crack detection device;
FIG. 5 is an operational flow of bridge concrete crack detection software;
FIG. 6 is a main interface of bridge concrete crack detection software;
FIG. 7 is an identification interface of bridge concrete crack detection software;
FIG. 8 is a graph of the recognition results of the bridge concrete crack detection software.
Detailed Description
Technical solutions in embodiments of the present application will be described below in order to enable those skilled in the art to better understand the present application, and it is obvious that the described embodiments are only some embodiments of the present application, and not all embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present application.
It should be understood that, in various embodiments of the present invention, the sequence numbers of the processes do not mean the execution sequence, and the execution sequence of the processes should be determined by the functions and the internal logic of the processes, and should not constitute any limitation on the implementation process of the embodiments of the present invention.
It should be understood that in the present application, "comprising" and "having" and any variations thereof, are intended to cover a non-exclusive inclusion, such that a process, method, system, article, or apparatus that comprises a list of steps or elements is not necessarily limited to those steps or elements expressly listed, but may include other steps or elements not expressly listed or inherent to such process, method, article, or apparatus.
It should be noted that the embodiments and features of the embodiments in the present application may be combined with each other without conflict. The technical solution of the present invention will be described in detail with reference to the accompanying drawings and embodiments.
The invention provides a bridge concrete crack detection method based on a lightweight Transformer; its flow chart is shown in FIG. 1, and the specific process is as follows:
step S1: collecting a bridge concrete crack image by using an unmanned aerial vehicle, and returning collected data;
the flowchart of step S1 is shown in fig. 2, and includes:
s101, unmanned aerial vehicle flight preparation;
and carrying out weather detection, flight scheme design and pre-flight test. Weather detection is for mastering the particular case of weather, observes illumination, air visibility etc. and prevents weather such as strong wind, cloudy day from to the influence that unmanned aerial vehicle caused. The design of the flight scheme comprises planning a flight route, a flight height, a flight speed, an overlapping degree and searching a proper flying starting point. Before formal flight, the test before flight is carried out, whether each function of the unmanned aerial vehicle normally operates is checked, whether an operation instruction can be completely responded is checked, and information such as weather conditions, lifting positions and the like of the day is recorded for future reference and analysis.
S102, carrying out flying by an unmanned aerial vehicle and collecting images;
Flying the unmanned aerial vehicle requires two technicians, a pilot and an observer: the pilot controls the movement of the aircraft, and the observer monitors its flight state and the data acquisition. After take-off, the pilot controls the aircraft in real time through the remote controller, while the observer watches its state through the parameters transmitted back. According to the specific flight conditions, manual mode can be switched in at any time to adjust the aircraft's state and ensure flight safety. The unmanned aerial vehicle flies along the inspected surface of the bridge and photographs the bridge's bottom surface, column surfaces, cross beams, and other structural surfaces.
S103, unmanned aerial vehicle return and data inspection;
The unmanned aerial vehicle lands at the predetermined site after completing its mission; during landing, the safety of the landing point should be ensured and bystanders kept away, and under special circumstances a suitable safe alternative landing site must be selected. After landing, the image data are checked on site through companion application software installed on a mobile terminal (such as a mobile phone or tablet), verifying that the image data in the camera carried by the unmanned aerial vehicle and related information are complete; if the image quality is unqualified, the corresponding area must be re-photographed.
Step S2: preprocessing an image, converting the image into a fixed input format, and establishing a data set;
s201, exporting and processing the picture;
The memory card carried by the unmanned aerial vehicle is removed and placed into a card reader, and the photos are imported into a local computer. The photos are first processed, including picture labeling (cracked or crack-free), noise reduction and quality enhancement, geometric correction, and auxiliary information extraction. If the quality of a processed picture is still low, for example blurred or wrongly exposed, it can be deleted.
S202, constructing a data set;
A data set of bridge surface images is established by selecting a certain number of pictures with and without cracks, keeping the two classes as balanced as possible. Pictures with high overlap are then screened: if the overlap between two pictures reaches 90%, their similarity is considered high and only one of them is added to the data set. All pictures of both classes are converted into RGB images in PNG format and placed into two separate folders to complete the data set construction; to ensure the accuracy of the trained model, the data set should contain more than 500 pictures.
Step S3: constructing a lightweight Transformer network, and training and testing images to obtain an identification model;
s301, a lightweight Transformer network;
The structure of the lightweight Transformer network is shown in FIG. 3. It comprises two parts, Patch-based Tokenization and a Transformer with Sequence Pooling. Patch-based tokenization comprises Patching, two-stage Patch transformation, and Linear Projection; the Transformer with sequence pooling comprises a Transformer encoder, a sequence pooling layer, and a Linear Layer. Positional Encoding is inserted between the two parts.
Patching slices the input image into 8 × 8 patches. The two-stage Patch transformation is used in the model training stage: it transforms the input patches during training, discarding part of the input information so that the Transformer trains with an incomplete input sequence, while the complete input sequence is used for computation during inference. The first-stage transformation operates at the patch level and uses grid-shaped Patch dropping. Let Patches be a patch matrix with m rows and n columns (usually m = n); the grid-shaped Patch dropping formula is:
Drop=Reshape(Cycle(0,1,p))
Patches=Patches*Drop
first, a Drop matrix is computed, which is a binary matrix containing only 0 and 1, where 0 represents a patch to be discarded and 1 represents a patch to be retained. The Cycle function is used to generate alternating vectors of 0 and 1, where p ═ m × n represents the total number of dispatches, and then a continuous segment of vectors of length p is taken from the alternating matrix, and finally a matrix of the same size as dispatches is reconstructed. Multiplying Patches with the Drop matrix yields the remaining patch.
The second stage of transformation is internal to the patch, the patch discarded in the first stage is changed into a 4 x 4 thumbnail, and the thumbnail is randomly mixed with the rest input patches in pairs, and two samples x are set 1 And x 2 Using thumbnails
Figure BDA0003660917730000061
To replace x 1 But without changing the label, h and w represent the thumbnail images T (x), respectively 2 ) Is typically set to half the width and height of the original patch. The transformation formula is as follows:
x′=Mx 1 +Φ(T(x 2 ))
wherein x' is after mixingM is a binary mask, representing the region where patch is clipped and reserved, Φ represents the fill operation, which generates an and x 2 The same size patch. To obtain the binary mask M, the pair x is required 1 The coordinate B of the bounding box of the upper clipping area is equal to (r) x ,r y ,r w ,r h ) Sampling is performed and then x is removed 1 And in thumbnail view T (x) 2 ) And (6) filling. The coordinates of the bounding box are uniformly sampled, as shown in the following equation:
r x ~Uniform(0,W),r w =w
r y ~Uniform(0,H),r h =h
where W and H are the width and height, respectively, of the original patch. This hybrid strategy enables the network to learn the same image at different scales.
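The second-stage mixing can be sketched as follows. The strided downsampling standing in for the thumbnail operation T(x₂), and the clipping of the sampled coordinates so that the box fits inside the patch, are assumptions not fixed by the text:

```python
import numpy as np

def thumbnail_mix(x1: np.ndarray, x2: np.ndarray, rng=None) -> np.ndarray:
    """Mix a thumbnail of x2 into a random region of x1 (second-stage sketch).

    x1, x2: (H, W, C) patches. The thumbnail is x2 reduced to
    (H//2, W//2) by simple striding (a stand-in for proper resizing).
    """
    rng = np.random.default_rng() if rng is None else rng
    H, W = x1.shape[:2]
    h, w = H // 2, W // 2            # thumbnail = half the patch size
    thumb = x2[::2, ::2]             # naive 2x downsample: T(x2)
    rx = rng.integers(0, W - w + 1)  # r_x ~ Uniform(0, W), clipped to fit
    ry = rng.integers(0, H - h + 1)  # r_y ~ Uniform(0, H), clipped to fit
    out = x1.copy()                  # M preserves everything outside the box...
    out[ry:ry + h, rx:rx + w] = thumb  # ...and Phi fills the box with T(x2)
    return out
```

The label of x₁ is kept unchanged, so a crack-free thumbnail simply acts as occlusion, as the beneficial-effects section notes.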
After the two-stage Patch transformation, the input sequence of the Transformer encoder is formed by linear projection, with an embedding dimension of 256 for each patch.
The Transformer encoder has 6 blocks; each block includes a multi-head attention layer (4 heads) and a position-wise feed-forward network (FFN), where the first FFN layer has dimension 512, the second 256, and the activation function is GELU. The outputs of the multi-head attention layer and the feed-forward network are each sent to an 'add and norm' layer comprising a residual structure and layer normalization. The sequence pooling layer pools the output of the Transformer encoder and maps the sequence output to a single class index. The last linear layer is a two-layer perceptron: the output dimension of the first layer is 1024, and the second layer performs classification with output dimension 2 (the number of categories), representing crack-free and cracked images.
The Transformer encoder architecture includes two parts: a self-attention layer (multi-head attention mechanism) and a feed-forward neural network. In the self-attention layer, the input sequence is first converted into three different vectors, the query vector Q, the key vector K, and the value vector V, with dimensions d_q, d_k, and d_v. The attention function between the input vectors is computed as:
Attention(Q, K, V) = Softmax(QKᵀ / √d_k) V
First, the score QKᵀ between each pair of input vectors is computed; when encoding the data at the current position, this score determines how much attention is paid to the other input data. The score is then divided by √d_k for normalization, which stabilizes gradients, and converted to a probability with the Softmax function. Finally, the probability is multiplied by the value vector, so that vectors with larger probability values receive additional attention. Since the self-attention layer by itself cannot capture positional information in the input sequence, additional positional encoding is added to the original input embedding to address this.
The self-attention layer used in the Transformer architecture is a multi-head self-attention mechanism. It gains the ability to focus on multiple specific positions by giving attention different subspace representations; the vectors generated by the multiple heads are concatenated and mapped to the final output. In this mechanism, the input vector is mapped into three different groups of matrices {Q_i}, {K_i}, and {V_i}, and the process of multi-head self-attention can be expressed as:
head_i = Attention(Q_i, K_i, V_i), i = 1…h
MultiHead(Q′, K′, V′) = Concat(head_1, head_2, …, head_h) W_O
where h is the number of attention heads, Q′, K′, and V′ are the concatenations of {Q_i}, {K_i}, and {V_i} respectively, and W_O ∈ R^(d×d) is a linear transformation matrix (d is the embedding dimension). Similar to the sparse connectivity of convolution, multi-head attention separates the input into h independent attention heads with d/h-dimensional vectors and integrates the features of each head, enriching the diversity of feature subspaces without additional computational cost.
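A minimal NumPy sketch of the multi-head self-attention described above (function names are illustrative; a production implementation such as the PyTorch one used for training would add batching, masking, and dropout):

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def multi_head_attention(x, Wq, Wk, Wv, Wo, h):
    """Multi-head self-attention on a sequence x of shape (n, d).

    Wq, Wk, Wv, Wo: (d, d) projection matrices; h: number of heads.
    Each head attends over a d/h-dimensional slice, as in the text.
    """
    n, d = x.shape
    dk = d // h
    Q, K, V = x @ Wq, x @ Wk, x @ Wv
    # split into h heads: (h, n, dk)
    split = lambda M: M.reshape(n, h, dk).transpose(1, 0, 2)
    Qh, Kh, Vh = split(Q), split(K), split(V)
    scores = Qh @ Kh.transpose(0, 2, 1) / np.sqrt(dk)   # QK^T / sqrt(d_k)
    heads = softmax(scores) @ Vh                        # Attention(Q_i,K_i,V_i)
    concat = heads.transpose(1, 0, 2).reshape(n, d)     # Concat(head_1..head_h)
    return concat @ Wo                                  # final projection W_O
```

With d = 256 and h = 4 as in this network, each head works in a 64-dimensional subspace.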
The output of the self-attention layer is then fed into a feed-forward neural network (FFN) consisting of two linear transformation layers and a nonlinear activation function, which can be expressed as:
FFN(x) = W₂ σ(W₁ x)
where W₁ and W₂ are the parameter matrices of the two linear transformation layers and σ is the nonlinear activation function GELU. This position-wise feed-forward layer can be viewed as a point-wise convolution that transforms the representations of all positions in the input sequence with the same multi-layer perceptron.
The Transformer encoder is followed by a sequence pooling layer. Sequence pooling is a method proposed in the literature "Escaping the Big Data Paradigm with Compact Transformers" that pools the output of the Transformer backbone over the whole sequence and maps the sequence output to a single class index. A Vision Transformer typically adds an extra learnable [class] token to the patch embedding sequence, representing the classification parameters of the entire image and containing the latent information for classification; its output after the Transformer encoder is used for classification. Sequence pooling instead increases the flexibility of the model and allows information from different parts of the input image to be included, eliminating the mandatory need for a [class] token and keeping the model compact. The operation can be seen as a mapping transformation T: R^(b×n×d) → R^(b×d). Given
x_L = f(x₀) ∈ R^(b×n×d)
where x_L (i.e. f(x₀)) is the output of the L-layer Transformer encoder, b is the mini-batch size, n is the length of the input sequence, and d is the embedding dimension, x_L is fed into a linear layer g(x_L) ∈ R^(d×1) and the Softmax function is applied:
x′_L = Softmax(g(x_L)ᵀ) ∈ R^(b×1×n)
Then
z = x′_L x_L ∈ R^(b×1×d)
is computed, and by squeezing the second dimension the output z ∈ R^(b×d) is obtained. This output can then be fed into a linear classifier for classification.
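The sequence pooling computation can be sketched directly from the mapping above (the function and variable names are illustrative):

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def sequence_pool(xL, g):
    """Sequence pooling over encoder output xL of shape (b, n, d).

    g: (d, 1) weight of the small linear layer. Returns (b, d): an
    attention-weighted average over the n sequence positions, replacing
    the [class] token of a standard Vision Transformer.
    """
    attn = softmax((xL @ g).transpose(0, 2, 1))   # Softmax(g(x_L)^T): (b, 1, n)
    z = attn @ xL                                 # x'_L x_L: (b, 1, d)
    return z.squeeze(1)                           # squeeze -> (b, d)
```

The pooled (b, d) output is then fed to the final two-layer perceptron for classification.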
The last linear layer is a two-layer perceptron (MLP): the output dimension of the first layer is 1024, and the second layer performs classification with output dimension 2, representing crack-free and cracked.
S302, training and testing;
Pre-training is first performed on the ImageNet-1K data set with an input picture resolution of 224 × 224. The pre-trained model is then fine-tuned on the data set obtained in step S2.
The data set is divided into a training set and a test set at a ratio of 8:2. The model parameters are set as follows: batch_size is set to 32, the learning rate lr to 5e-4, the minimum learning rate min_lr to 0.00001, the learning-rate decay mode sched to cosine, the warm-up learning rate warmup_lr to 1e-6, the number of warm-up epochs warmup_epochs to 10, the weight decay coefficient weight_decay to 5e-2, and the number of iterations to 100. During model training, the training-set images undergo random horizontal flipping, random noise addition, random brightness adjustment, random contrast adjustment, and normalization; the test-set images only undergo normalization.
Training uses the PyTorch deep learning framework. The training-set images are input into the lightweight Transformer network; in each iteration of the optimization process, 32 training samples are randomly sampled to form a batch, and the network is optimized with the back-propagation algorithm to update its parameters. After training, the model with the highest accuracy on the test set is saved as the crack detection model.
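The warm-up-plus-cosine schedule implied by the settings above can be sketched as a plain function of the epoch; this is a minimal illustration of the schedule shape, not the exact scheduler implementation used:

```python
import math

# Stated settings: warmup_lr=1e-6 -> lr=5e-4 over 10 warm-up epochs,
# then cosine decay down to min_lr=1e-5 over the remaining epochs.
LR, MIN_LR, WARMUP_LR = 5e-4, 1e-5, 1e-6
WARMUP_EPOCHS, EPOCHS = 10, 100

def lr_at(epoch):
    if epoch < WARMUP_EPOCHS:  # linear warm-up
        return WARMUP_LR + (LR - WARMUP_LR) * epoch / WARMUP_EPOCHS
    # cosine decay over the remaining epochs
    t = (epoch - WARMUP_EPOCHS) / (EPOCHS - WARMUP_EPOCHS)
    return MIN_LR + 0.5 * (LR - MIN_LR) * (1.0 + math.cos(math.pi * t))

print(round(lr_at(0), 8))    # 1e-06  (warm-up start)
print(round(lr_at(10), 8))   # 0.0005 (peak lr)
print(round(lr_at(100), 8))  # 1e-05  (min_lr at the end)
```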
Step S4: deploy the recognition model and identify images of the bridge surface to be inspected;
The trained model is integrated into bridge concrete crack detection software. This example uses the open-source Electron framework to build PC-side bridge concrete crack detection software. Electron is a cross-platform desktop application framework based on Chromium and Node.js; desktop applications can be developed with Web technologies such as HTML, CSS and JavaScript, and applications built on the framework run on the Windows, macOS and Linux operating systems. Electron combines the Chromium V8 engine with the Node.js runtime and adopts an architecture that separates front end and back end: the front end is developed in the usual Web manner, and the back end handles inter-process communication. The model file and the program written to call the model are placed in the same directory of the Electron project to complete deployment.
The user transfers the images to be identified to a local computer and performs crack detection with the bridge concrete crack detection software. During detection, the user may upload a single picture or multiple pictures; the software calls the model for recognition and returns the recognition result to the software interface.
The invention also provides a bridge concrete crack detection device based on a lightweight Transformer, as shown in fig. 4, comprising:
the acquisition module is used for collecting bridge concrete crack images with an unmanned aerial vehicle and returning the collected data;
the data set module is used for preprocessing the images and converting them into a fixed input format to establish a data set;
the training module is used for constructing the lightweight Transformer network and training and testing on the images to obtain a recognition model;
the evaluation module is used for deploying the recognition model and identifying images of the bridge surface to be inspected.
Further, the acquisition module comprises: flight preparation of the unmanned aerial vehicle, including weather detection, flight-scheme design and a pre-flight test; flight of the unmanned aerial vehicle and image acquisition; and return of the unmanned aerial vehicle and data inspection.
Weather detection serves to understand the specific weather conditions: illumination, air visibility and so on are observed to prevent weather such as strong wind or overcast skies from affecting the unmanned aerial vehicle. Flight-scheme design includes planning the flight route, flight height, flight speed and image overlap, and finding a suitable take-off point. Before formal flight, a pre-flight test is carried out to check whether every function of the unmanned aerial vehicle runs normally and whether operating instructions can be completed; information such as the day's wind speed, weather and take-off/landing coordinates is recorded, and the data are stored for future reference and analysis.
Two technicians are needed when the unmanned aerial vehicle flies: a pilot, who controls the movement of the airframe, and an observer, who monitors the flight state of the unmanned aerial vehicle and the data acquisition. The unmanned aerial vehicle flies along the inspection faces of the bridge and photographs structural surfaces such as the bridge soffit, columns and beams through control equipment such as the remote controller. After completing its mission the unmanned aerial vehicle lands at the predetermined site; the safety of the landing point should be ensured during descent, and unrelated personnel should be kept away. After landing, whether the image data and related information in the onboard camera are complete is checked; if the image quality is unqualified, a supplementary flight over the photographed area is required.
Further, the data set module comprises: 1) picture export and processing — the storage card carried by the unmanned aerial vehicle is removed and placed in a card reader, and the pictures taken by the unmanned aerial vehicle are imported into a local computer; the pictures are then processed, including labelling (with crack / without crack), noise reduction and quality enhancement, geometric correction, and auxiliary-information extraction; pictures whose quality remains low after processing, e.g. blurred or wrongly exposed, may be deleted. 2) Data set construction — a data set of bridge surface images is established by selecting a certain number of pictures with cracks and without cracks, keeping the two classes as balanced as possible. Pictures with a high degree of overlap are then screened: if the overlap between two pictures reaches 90%, their similarity is considered high and only one of them is added to the data set. Finally, both classes of pictures are converted into RGB images in PNG format and placed into two separate folders to complete data set construction; to ensure the accuracy of the trained model, the data set should contain more than 500 pictures.
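The overlap screening step can be sketched as follows; the similarity measure here is an assumption, since the patent does not specify how the 90% overlap between two pictures is computed:

```python
import numpy as np

# Sketch of overlap screening: if two images agree on ~90% of their pixels,
# treat them as near-duplicates and keep only one.
def similarity(a, b):
    return float((a == b).mean())   # fraction of identical pixels (assumed metric)

def screen(images, threshold=0.9):
    kept = []
    for img in images:
        # keep an image only if it is sufficiently different from all kept ones
        if all(similarity(img, k) < threshold for k in kept):
            kept.append(img)
    return kept

rng = np.random.default_rng(1)
base = rng.integers(0, 256, (32, 32))
near = base.copy()
near[:3] = 0                         # still >90% identical to `base`
other = rng.integers(0, 256, (32, 32))
kept = screen([base, near, other])
print(len(kept))                     # 2: the near-duplicate is screened out
```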
Further, the training module comprises the lightweight Transformer network and its training and testing. The lightweight Transformer comprises two parts: Patch-based segmentation, and a Transformer with sequence pooling. Patch-based segmentation comprises blocking, the two-stage Patch transform and linear projection; the Transformer with sequence pooling comprises a Transformer encoder, a sequence pooling layer and a linear layer; position encoding is inserted between the two parts. The two-stage Patch transform discards part of the Transformer's input sequence during training, forcing the Transformer to train on an incomplete sequence; the method is not used during testing, where the complete input sequence is used for computation.
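A minimal NumPy sketch of the two-stage Patch transform described above, under assumed patch pixel sizes: stage 1 performs grid-like dropping (Drop = Reshape(Cycle(0, 1, p)), Patches = Patches * Drop), and stage 2 blends 4 × 4 thumbnails of the discarded patches into randomly chosen retained patches per x' = M * x1 + Phi(T(x2)):

```python
import numpy as np

rng = np.random.default_rng(0)

def grid_patch_drop(patches):
    """Stage 1: grid-like Patch dropping. `patches` has shape (m, n, ph, pw);
    Drop is a 0/1 mask alternating in grid order (0 = discard, 1 = keep)."""
    m, n = patches.shape[:2]
    cycle = np.arange(m * n) % 2                  # Cycle(0, 1, p)
    drop = cycle.reshape(m, n)                    # Drop = Reshape(...)
    return patches * drop[:, :, None, None], drop # Patches * Drop

def mix_dropped(kept, patches, drop):
    """Stage 2: each discarded patch is shrunk to a 4x4 thumbnail T(x2),
    padded back to patch size by Phi, and blended into a randomly chosen
    retained patch x1: x' = M * x1 + Phi(T(x2))."""
    m, n, ph, pw = patches.shape
    out = kept.copy()
    keep_pos = np.argwhere(drop == 1)
    mask = np.ones((ph, pw))
    mask[:4, :4] = 0                              # M: clip the thumbnail region
    for i, j in np.argwhere(drop == 0):
        thumb = patches[i, j, ::ph // 4, ::pw // 4]  # T(x2): 4x4 thumbnail
        pad = np.zeros((ph, pw))
        pad[:4, :4] = thumb                       # Phi: pad to patch size
        ki, kj = keep_pos[rng.integers(len(keep_pos))]
        out[ki, kj] = mask * out[ki, kj] + pad    # mixed sample x'
    return out

patches = rng.random((8, 8, 16, 16))  # 8x8 grid; 16x16-pixel patches assumed
kept, drop = grid_patch_drop(patches)
mixed = mix_dropped(kept, patches, drop)
print(int(drop.sum()))                # 32 of 64 patches retained
```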
Further, the evaluation module comprises a deployment unit for deploying the lightweight Transformer model into the bridge concrete crack detection software, and a recognition unit for detecting bridge concrete cracks.
The front end of the bridge concrete crack detection software is responsible for file selection, recognition and result return; the back end is responsible for the main process and communication. The front end is developed with the desktop component library Element Plus, based on Vue 3. The file-selection function selects the images to be identified on the local computer; a user can select a single file or a folder: a single file for recognizing one bridge surface image, a folder for recognizing several, in which case the images must be placed under the folder path; common image formats such as jpg, png, bmp and tiff are supported. The recognition function uses Electron's child_process module to create a child process, runs an external executable program in the child process, and thereby calls the recognition model; taking Windows as an example, the python file is packaged into an exe program, and when child_process is used the Windows cmd execution command can be invoked. The result-return function captures and displays the result output by the recognition unit: the recognition result output by the child process is captured with a JavaScript regular expression, parsed, and listed in a table on the interface. The back end is responsible for controlling the life cycle of the crack detection program, creating and managing the software window, and controlling desktop function modules such as menus and dialog boxes; it provides access to the various menus and the closing logic.
The deployment unit further comprises a calling program, used by the bridge concrete crack detection software to call the lightweight Transformer model: it receives a path parameter (the user input of the file-selection unit — the path of a single file or of a folder), locates the images to be identified according to the path parameter, calls the recognition model to identify them, obtains the recognition result (including whether there is a crack, and the recognition confidence), and outputs the result as a character string. The calling program is packaged with the third-party python library PyInstaller to generate an executable program. PyInstaller packs a python source file into a stand-alone executable that can run in an environment without python installed; the single file is also convenient to transfer and manage. After the executable is generated, the program and the model file are placed in the same directory of the Electron project to complete deployment. The software is then packaged with the electron-builder module, and the generated installer can be installed on other users' computers.
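A hedged sketch of such a calling program; the `predict` function is a placeholder for the real model call, and the argument and output field names are illustrative, not the patent's:

```python
import argparse
import json
import os
import sys

def predict(image_path):
    # Placeholder for the real model call (loading the lightweight Transformer
    # and running inference); the result fields are illustrative.
    return {"file": os.path.basename(image_path), "crack": True, "prob": 0.93}

def collect_images(path):
    """Resolve the path parameter: a single file, or every image in a folder."""
    exts = (".jpg", ".png", ".bmp", ".tiff")
    if os.path.isdir(path):
        return [os.path.join(path, f) for f in sorted(os.listdir(path))
                if f.lower().endswith(exts)]
    return [path]

def main(argv):
    parser = argparse.ArgumentParser(description="bridge crack recognition")
    parser.add_argument("path", help="image file or folder to identify")
    args = parser.parse_args(argv)
    for image in collect_images(args.path):
        # One result string per image; the Electron front end captures this
        # stdout with a regular expression and lists it in a table.
        print(json.dumps(predict(image)))

if __name__ == "__main__":
    main(sys.argv[1:])
```

A script of this shape is what PyInstaller would pack into the stand-alone executable invoked by the child_process module.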
A user installs the generated installer on the local computer on which crack detection is needed, transfers the bridge surface images to be identified to that computer, and uses the software for crack detection; the operating flow of the software is shown in FIG. 5.
The main interface is shown in FIG. 6. Clicking the "crack identification" button in the main interface, the user selects the file (single-image inspection) or folder (batch-image inspection) to be inspected. After selection, the recognition interface is entered; as shown in fig. 7, clicking the "start detection" button makes the software call the trained crack detection model and input the user-selected images into the model for detection. Once detection finishes, the software interface returns the file names, detection results and detection probabilities of all selected images, as shown in fig. 8. The result table lists two columns, "filename" and "result and probability": "filename" is the file name of the uploaded image; "result and probability" is the detection result — whether a crack exists, limited to "crack" and "no crack" — together with the confidence of that result.
For batch crack detection, the software generates a txt file list of cracked images, containing the file names of the images identified as cracked. The txt file is generated according to a probability threshold: by default, an image path is added to the txt file when the model's probability of identifying a crack exceeds 50%. Alternatively, a probability threshold of 50%, 60%, 70%, 80%, 90% or a custom value may be selected. The generated txt file can be used for subsequent inspection.
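The threshold-based txt export can be sketched as follows; the function and result field names are illustrative:

```python
# Sketch of the batch-result txt export: keep the files whose crack
# probability exceeds the chosen threshold (50% by default).
def export_crack_list(results, threshold=0.5):
    """results: iterable of (filename, is_crack, probability) tuples."""
    return [name for name, is_crack, prob in results
            if is_crack and prob > threshold]

results = [("a.png", True, 0.93), ("b.png", False, 0.88), ("c.png", True, 0.42)]
lines = export_crack_list(results, threshold=0.5)
with open("cracked_images.txt", "w") as fh:   # the generated txt file list
    fh.write("\n".join(lines))
print(lines)   # ['a.png']
```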
The user transfers the images to be identified to a local computer and performs crack detection with the software; during detection, the user may upload a single picture or multiple pictures, the software calls the model for recognition, and the recognition result is returned to the software interface.
Finally, it should be noted that: the above embodiments are only used to illustrate the technical solution of the present invention, and not to limit the same; while the invention has been described in detail and with reference to the foregoing embodiments, it will be understood by those skilled in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some or all of the technical features may be equivalently replaced; and these modifications or substitutions do not depart from the spirit of the corresponding technical solutions of the embodiments of the present invention.

Claims (3)

1. A bridge concrete crack detection method based on a lightweight Transformer, characterized by comprising the following steps:
step S1: collecting bridge concrete crack images with an unmanned aerial vehicle and returning the collected data; the unmanned aerial vehicle first carries out flight preparation, including weather detection, flight-scheme design and a pre-flight test, then takes off to execute the image acquisition task, and finally returns so that the collected data can be checked; if the image quality is unqualified, a supplementary flight over the corresponding shooting area is required;
step S2: preprocessing the images, converting them into a fixed input format, and establishing a data set; the pictures taken by the unmanned aerial vehicle are imported into a local computer and processed, including picture labelling, noise reduction and quality enhancement, geometric correction, and auxiliary-information extraction; a certain number of images with and without cracks are then selected to establish a data set of bridge surface images;
step S3: constructing a lightweight Transformer network, and training and testing on the images to obtain a recognition model; the network comprises a Patch-based segmentation part and a Transformer-with-sequence-pooling part, wherein the Patch-based segmentation part comprises blocking, the two-stage Patch transform and linear projection, and the Transformer-with-sequence-pooling part comprises a Transformer encoder, a sequence pooling layer and a linear layer; position encoding is inserted between the two parts; the bridge surface image is segmented into 8 × 8 patches before being input into the network, the two-stage Patch transform is then applied, and the input sequence of the Transformer encoder is formed after linear projection, the embedding dimension of each Patch being 256; the Transformer encoder has 6 blocks, each block comprising a multi-head attention layer (4 heads) and a position-wise feed-forward network (FFN), wherein the FFN has a first-layer dimension of 512, a second-layer dimension of 256 and a GELU activation function; the outputs of the multi-head attention layer and the feed-forward network are each sent to an "add and norm" layer for processing, the layer comprising a residual structure and layer normalization; the sequence pooling layer pools the output of the Transformer encoder and maps the sequence output to a single class index; the last linear layer is a two-layer perceptron, whose first layer has an output dimension of 1024 and whose second layer is used for classification with an output dimension of 2 classes, representing a crack-free image and a crack-containing image;
step S4: deploying the recognition model and identifying images of the bridge surface to be inspected; the trained lightweight Transformer model is integrated into the bridge concrete crack detection software; the user transfers the images to be identified to a local computer and performs crack detection with the software; during detection, the user may upload a single picture or multiple pictures, the software calls the model for recognition, and the recognition result is returned to the software interface.
2. The bridge concrete crack detection method of claim 1, characterized in that the two-stage Patch transform in step S3 is as follows:
in the first stage, grid-like Patch dropping is used: a certain number of patches are discarded, at grid-shaped intervals, from all 8 × 8 patches generated from the image; the resulting Patches form an m-row, n-column patch matrix, and the formula of grid-like Patch dropping is:

Drop = Reshape(Cycle(0, 1, p))
Patches = Patches * Drop

first, the Drop matrix is computed: it is a binary matrix containing only 0 and 1, where 0 marks a patch to be discarded and 1 a patch to be retained; the Cycle function generates a vector of alternating 0s and 1s, and p = m*n is the total number of patches; a contiguous vector of length p is taken from the alternating sequence and reshaped into a matrix of the same size as Patches; multiplying Patches by the Drop matrix element-wise yields the finally retained Patches matrix;
in the second stage, each patch discarded in the first stage is converted into a 4 × 4 thumbnail and randomly mixed with the retained input patches, with the formula:

x' = M * x1 + Φ(T(x2))

where x' is the newly mixed sample (Patch), M is a binary mask representing the region where the patch is clipped and retained, Φ denotes a padding operation that generates a patch of the same size from x2, and T(x2) is the thumbnail operation;
the two-stage Patch transform discards part of the Transformer's input sequence during training, forcing the Transformer to train with an incomplete sequence; the method is not used during testing, where the complete input sequence is used for computation.
3. A bridge concrete crack detection device based on a lightweight Transformer, characterized by comprising the following modules:
the acquisition module is used for acquiring a bridge concrete crack image by using an unmanned aerial vehicle and returning acquired data;
the data set module is used for preprocessing the image and converting the image into a fixed input format to establish a data set;
the training module is used for constructing a lightweight Transformer network, and training and testing the image to obtain an identification model;
the lightweight Transformer network comprises Patch-based segmentation and a Transformer with sequence pooling, wherein the Patch-based segmentation comprises blocking, the two-stage Patch transform and linear projection, and the Transformer with sequence pooling comprises a Transformer encoder, a sequence pooling layer and a linear layer; position encoding is inserted between the two parts;
the two-stage Patch transform comprises the following steps:
in the first stage, grid-like Patch dropping is used: a certain number of patches are discarded, at grid-shaped intervals, from all 8 × 8 patches generated from the image;
in the second stage, each patch discarded in the first stage is converted into a 4 × 4 thumbnail and randomly mixed with the input patches retained in the first stage;
the two-stage Patch transform discards part of the Transformer's input sequence during training, forcing the Transformer to train with an incomplete sequence; the method is not used during testing, where the complete input sequence is used for computation;
the evaluation module is used for deploying the recognition model and identifying images of the bridge surface to be inspected, and comprises: a deployment unit for deploying the lightweight Transformer model; and a recognition unit for detecting bridge concrete cracks, which calls the software to perform crack detection on the images the user has transferred to the local computer and returns the recognition result to the software interface.
CN202210577574.XA 2022-05-25 2022-05-25 Bridge concrete crack detection method and device based on lightweight transform Pending CN114913162A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210577574.XA CN114913162A (en) 2022-05-25 2022-05-25 Bridge concrete crack detection method and device based on lightweight transform

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210577574.XA CN114913162A (en) 2022-05-25 2022-05-25 Bridge concrete crack detection method and device based on lightweight transform

Publications (1)

Publication Number Publication Date
CN114913162A true CN114913162A (en) 2022-08-16

Family

ID=82769288

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210577574.XA Pending CN114913162A (en) 2022-05-25 2022-05-25 Bridge concrete crack detection method and device based on lightweight transform

Country Status (1)

Country Link
CN (1) CN114913162A (en)


Cited By (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115511836A (en) * 2022-09-28 2022-12-23 丽水市市政设施管理中心(丽水市节约用水管理中心) Bridge crack grade evaluation method and system based on reinforcement learning algorithm
CN115511836B (en) * 2022-09-28 2023-10-27 丽水市市政设施管理中心(丽水市节约用水管理中心) Bridge crack grade assessment method and system based on reinforcement learning algorithm
CN115879109A (en) * 2023-02-06 2023-03-31 山东省计算中心(国家超级计算济南中心) Malicious software identification method based on visual transform
CN116934179A (en) * 2023-09-15 2023-10-24 菏泽建工建筑设计研究院 Building engineering quality detection data analysis management system based on big data
CN116934179B (en) * 2023-09-15 2023-12-01 菏泽建工建筑设计研究院 Building engineering quality detection data analysis management system based on big data
CN117390151A (en) * 2023-10-10 2024-01-12 哈尔滨工业大学 Method for establishing structural health diagnosis visual-language basic model and multi-mode interaction system
CN117390151B (en) * 2023-10-10 2024-07-12 哈尔滨工业大学 Method for establishing structural health diagnosis visual-language basic model and multi-mode interaction system
CN117522925A (en) * 2024-01-05 2024-02-06 成都合能创越软件有限公司 Method and system for judging object motion state in mobile camera under attention mechanism
CN117522925B (en) * 2024-01-05 2024-04-16 成都合能创越软件有限公司 Method and system for judging object motion state in mobile camera under attention mechanism

Similar Documents

Publication Publication Date Title
CN114913162A (en) Bridge concrete crack detection method and device based on lightweight transform
CN113780211A (en) Lightweight aircraft detection method based on improved yolk 4-tiny
CN116187398B (en) Method and equipment for constructing lightweight neural network for unmanned aerial vehicle ocean image detection
CN114937016A (en) Bridge concrete crack real-time detection method and device based on edge calculation and Transformer
CN113034444A (en) Pavement crack detection method based on MobileNet-PSPNet neural network model
CN113569672A (en) Lightweight target detection and fault identification method, device and system
CN117474914B (en) Airplane skin surface defect detection method based on lightweight neural network
CN112464745A (en) Ground feature identification and classification method and device based on semantic segmentation
CN113487529A (en) Meteorological satellite cloud picture target detection method based on yolk
CN116740344A (en) Knowledge distillation-based lightweight remote sensing image semantic segmentation method and device
CN116740422A (en) Remote sensing image classification method and device based on multi-mode attention fusion technology
CN116385958A (en) Edge intelligent detection method for power grid inspection and monitoring
CN115272894A (en) Unmanned aerial vehicle-oriented image target detection method and device, electronic equipment and storage medium
CN116844032A (en) Target detection and identification method, device, equipment and medium in marine environment
CN117132890A (en) Remote sensing image target detection method and system based on Kubernetes edge computing cluster
CN115497002A (en) Multi-scale feature fusion laser radar remote sensing classification method
US11587323B2 (en) Target model broker
CN115272412B (en) Edge calculation-based low-small slow target detection method and tracking system
CN115457411B (en) Unmanned inspection method and device for aviation oil pipeline and aviation oil safety operation and maintenance system
Meng et al. A modified fully convolutional network for crack damage identification compared with conventional methods
CN116030365A (en) Model training method, apparatus, computer device, storage medium, and program product
Chen et al. Aviation visibility forecasting by integrating Convolutional Neural Network and long short-term memory network
TSEKHMYSTRO et al. INVESTIGATION OF THE EFFECT OF OBJECT SIZE ON ACCURACY OF HUMAN LOCALISATION IN IMAGES ACQUIRED FROM UNMANNED AERIAL VEHICLES
CN117611877B (en) LS-YOLO network-based remote sensing image landslide detection method
CN117152646B (en) Unmanned electric power inspection AI light-weight large model method and system

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
CB03 Change of inventor or designer information
CB03 Change of inventor or designer information

Inventor after: Xu Huajie

Inventor after: Su Guoshao

Inventor after: Qin Yuanzhuo

Inventor after: Liu Xin

Inventor after: Liang Jinfu

Inventor after: Hou Pan

Inventor after: Jiang Hao

Inventor after: Li Renjie

Inventor before: Xu Huajie

Inventor before: Su Guoshao

Inventor before: Qin Yuanzhuo

Inventor before: Liu Xin

Inventor before: Liang Jinfu

Inventor before: Hou Pan

Inventor before: Jiang Hao

Inventor before: Li Renjie