CN109408985A - Accurate recognition method for bridge steel structure cracks based on computer vision - Google Patents

Accurate recognition method for bridge steel structure cracks based on computer vision

Info

Publication number
CN109408985A
CN109408985A (application CN201811295283.1A)
Authority
CN
China
Prior art keywords
width
layer operation
depth
input
layers
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201811295283.1A
Other languages
Chinese (zh)
Inventor
李顺龙 (Li Shunlong)
郭亚朋 (Guo Yapeng)
徐阳 (Xu Yang)
李惠 (Li Hui)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Harbin Institute of Technology
Original Assignee
Harbin Institute of Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Harbin Institute of Technology
Priority to CN201811295283.1A
Publication of CN109408985A
Legal status: Pending

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F 30/00: Computer-aided design [CAD]
    • G06F 30/20: Design optimisation, verification or simulation
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00: Computing arrangements based on biological models
    • G06N 3/02: Neural networks

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Evolutionary Computation (AREA)
  • General Engineering & Computer Science (AREA)
  • Artificial Intelligence (AREA)
  • Computational Linguistics (AREA)
  • Health & Medical Sciences (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Computer Hardware Design (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Geometry (AREA)
  • Data Mining & Analysis (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Image Processing (AREA)

Abstract

The present invention relates to an accurate recognition method for bridge steel structure cracks based on computer vision, which addresses the low degree of automation of existing bridge steel box girder crack identification and the lack of corresponding intelligent algorithms. The method comprises: using crack images of common steel box girder locations as raw data and manually annotating them at the pixel level, obtaining annotation images in which different colors denote different classes; cropping the original and annotation images, then extracting the annotation images by color and assigning a different integer to each class as its label; feeding the training set into a deep fully convolutional neural network for training; and cropping the image to be recognized, feeding the pieces into the trained network, and stitching the pixel-level recognition results back to the original image size. The invention is convenient and accurate, and improves the accuracy and stability of steel box girder crack identification.

Description

Accurate recognition method for bridge steel structure cracks based on computer vision
Technical field
The present invention relates to the field of bridge engineering health monitoring, detection, and maintenance, and in particular to an accurate recognition method for bridge steel structure cracks based on computer vision.
Background technique
Bridges are the throat of the transportation arteries. In recent years bridge construction has surged, and more and more newly built bridges have entered service, playing a considerable role in improving China's transportation efficiency and driving its rapid economic growth. Steel box girders, as a chief component of bridges, especially long-span bridges, have been widely used in China and around the world. However, under the repeated cyclic action of vehicle loads, the junction between the U-rib and the top plate of a steel box girder is prone to micro-cracks, and the propagation of these cracks can severely harm the normal service of the girder. The detection of steel box girder cracks has therefore become a hot research issue in both academia and industry in recent years. With the development of bridge health monitoring technology, researchers have developed a variety of vibration-based non-destructive testing methods to detect surface and internal damage of steel box girders. These methods, however, have some unavoidable shortcomings: first, their effective range is too small, since they are strongly affected by the environment and by the vibration of the structure itself, so they can only detect within a limited area; second, the required hardware is overly complex, too expensive, and not robust to the environment.
With the booming development of computer vision, using computer vision techniques for steel box girder defect detection and identification has entered researchers' field of view. However, most computer-vision-based crack detection methods rest on image classification principles and cannot reach pixel-level accuracy, which poses a serious obstacle to subsequent quantitative crack evaluation. How to propose an effective and robust crack identification method for raw images of vulnerable steel box girder regions, raising the precision of crack identification to the pixel level, therefore remains an open problem.
Summary of the invention
The purpose of the present invention is to address the low degree of automation of existing bridge steel box girder crack identification and the lack of corresponding intelligent algorithms, by proposing an accurate recognition method for bridge steel structure cracks based on computer vision, comprising the following steps:
Step 1: using crack images of common steel box girder locations as raw data, perform manual pixel-level annotation, obtaining pixel-level annotation images in which different colors denote different classes;
Step 2: crop the original and annotation images to a fixed size to reduce the computational cost; extract the annotation images by color and assign a different integer to each class as its label;
Step 3: feed the training set into the deep fully convolutional neural network for training; the loss function used in training is the Dice-coefficient loss, and the optimizer is the adaptive moment estimation (Adam) algorithm;
Step 4: crop the bridge steel box girder region-of-interest image to be recognized, feed the pieces into the trained neural network, obtain the pixel-level recognition results, and stitch them back to the original image size.
The present invention also has the following technical features:
1. Step 1 described above specifically comprises the following steps:
One, build a steel box girder crack defect image database, with image sizes above 4000 × 3000 pixels;
Two, annotate the original images by class with a pixel annotation tool, covering regions of different classes with different colors;
Three, apply horizontal and vertical flips simultaneously to the original image and its annotation image, or apply a 10% perturbation to the three BGR channels of the original image, obtaining the flipped or perturbed original images and their corresponding annotation images.
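The flip-and-perturb augmentation of sub-step Three can be sketched as follows. The function name and the modeling of the "10% perturbation" as a per-channel scale drawn from [0.9, 1.1] are our assumptions for illustration; the patent only specifies a 10% perturbation of the three BGR channels.

```python
import numpy as np

def augment_pair(image, mask, rng=None):
    """Return augmented (image, mask) pairs: the original, a horizontal
    flip, a vertical flip, and a +/-10% per-channel BGR intensity
    perturbation applied to the image only (the mask keeps its colors)."""
    rng = rng or np.random.default_rng(0)
    pairs = [(image, mask)]
    pairs.append((image[:, ::-1], mask[:, ::-1]))  # horizontal flip
    pairs.append((image[::-1, :], mask[::-1, :]))  # vertical flip
    # per-channel scale in [0.9, 1.1] over the 3 BGR channels
    scale = rng.uniform(0.9, 1.1, size=(1, 1, 3))
    jittered = np.clip(image.astype(np.float32) * scale, 0, 255).astype(np.uint8)
    pairs.append((jittered, mask))
    return pairs

img = np.zeros((8, 8, 3), dtype=np.uint8)
msk = np.zeros((8, 8, 3), dtype=np.uint8)
augmented = augment_pair(img, msk)
print(len(augmented))  # 4
```

Note that the flips are applied to image and mask together, so the pixel-level labels stay aligned, while the intensity perturbation deliberately leaves the mask untouched.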
2. In Step 3 described above, the structure of each layer of the deep fully convolutional neural network is as follows (the input depths of layers L6-5 and L7-6, inconsistent in the original text, are given here consistently with their neighboring layers):
Layer L1-0: input width 512, depth 3; performs a convolution, kernel width 3, 32 kernels, stride 1, zero-padding 1;
Layer L1-1: input width 512, depth 32; performs activation;
Layer L1-2: input width 512, depth 32; performs regularization;
Layer L1-3: input width 512, depth 32; performs a convolution, kernel width 3, 32 kernels, stride 1, zero-padding 1;
Layer L1-4: input width 512, depth 32; performs activation;
Layer L1-5: input width 512, depth 32; performs regularization;
Layer L1-6: input width 512, depth 32; performs a convolution, kernel width 3, 32 kernels, stride 1, zero-padding 1;
Layer L1-7: input width 512, depth 32; performs activation;
Layer L1-8: input width 512, depth 32; performs regularization;
Layer L1-9: input width 512, depth 32; performs pooling;
Layer L2-0: input width 256, depth 32; performs a convolution, kernel width 3, 32 kernels, stride 1, zero-padding 1;
Layer L2-1: input width 256, depth 32; performs activation;
Layer L2-2: input width 256, depth 32; performs regularization;
Layer L2-3: input width 256, depth 32; performs a convolution, kernel width 3, 64 kernels, stride 1, zero-padding 1;
Layer L2-4: input width 256, depth 64; performs activation;
Layer L2-5: input width 256, depth 64; performs regularization;
Layer L2-6: input width 256, depth 64; performs a convolution, kernel width 3, 64 kernels, stride 1, zero-padding 1;
Layer L2-7: input width 256, depth 64; performs activation;
Layer L2-8: input width 256, depth 64; performs regularization;
Layer L2-9: input width 256, depth 64; performs pooling;
Layer L3-0: input width 128, depth 64; performs a convolution, kernel width 3, 64 kernels, stride 1, zero-padding 1;
Layer L3-1: input width 128, depth 64; performs activation;
Layer L3-2: input width 128, depth 64; performs regularization;
Layer L3-3: input width 128, depth 64; performs a convolution, kernel width 3, 128 kernels, stride 1, zero-padding 1;
Layer L3-4: input width 128, depth 128; performs activation;
Layer L3-5: input width 128, depth 128; performs regularization;
Layer L3-6: input width 128, depth 128; performs a convolution, kernel width 3, 128 kernels, stride 1, zero-padding 1;
Layer L3-7: input width 128, depth 128; performs activation;
Layer L3-8: input width 128, depth 128; performs regularization;
Layer L3-9: input width 128, depth 128; performs pooling;
Layer L4-0: input width 64, depth 128; performs a convolution, kernel width 3, 128 kernels, stride 1, zero-padding 1;
Layer L4-1: input width 64, depth 128; performs activation;
Layer L4-2: input width 64, depth 128; performs regularization;
Layer L4-3: input width 64, depth 128; performs a convolution, kernel width 3, 256 kernels, stride 1, zero-padding 1;
Layer L4-4: input width 64, depth 256; performs activation;
Layer L4-5: input width 64, depth 256; performs regularization;
Layer L4-6: input width 64, depth 256; performs a convolution, kernel width 3, 256 kernels, stride 1, zero-padding 1;
Layer L4-7: input width 64, depth 256; performs activation;
Layer L4-8: input width 64, depth 256; performs regularization;
Layer L4-9: input width 64, depth 256; performs pooling;
Layer L5-0: input width 32, depth 256; performs a convolution, kernel width 3, 256 kernels, stride 1, zero-padding 1;
Layer L5-1: input width 32, depth 256; performs activation;
Layer L5-2: input width 32, depth 256; performs regularization;
Layer L5-3: input width 32, depth 256; performs a convolution, kernel width 3, 512 kernels, stride 1, zero-padding 1;
Layer L5-4: input width 32, depth 512; performs activation;
Layer L5-5: input width 32, depth 512; performs regularization;
Layer L5-6: input width 32, depth 512; performs a convolution, kernel width 3, 512 kernels, stride 1, zero-padding 1;
Layer L5-7: input width 32, depth 512; performs activation;
Layer L5-8: input width 32, depth 512; performs regularization;
Layer L5-9: input width 32, depth 512; performs pooling;
Layer L6-0: input width 16, depth 512; performs a convolution, kernel width 3, 1024 kernels, stride 1, zero-padding 1;
Layer L6-1: input width 16, depth 1024; performs activation;
Layer L6-2: input width 16, depth 1024; performs regularization;
Layer L6-3: input width 16, depth 1024; performs a convolution, kernel width 3, 1024 kernels, stride 1, zero-padding 1;
Layer L6-4: input width 16, depth 1024; performs activation;
Layer L6-5: input width 16, depth 1024; performs regularization;
Layer L6-6: input width 16, depth 1024; performs a convolution, kernel width 3, 1024 kernels, stride 1, zero-padding 1;
Layer L6-7: input width 16, depth 1024; performs activation;
Layer L6-8: input width 16, depth 1024; performs regularization;
Layer L7-0: input width 32, depth 1536; performs a transposed convolution (deconvolution), kernel width 3, 1536 kernels, stride 1, zero-padding 1;
Layer L7-1: input width 32, depth 1536; performs activation;
Layer L7-2: input width 32, depth 1536; performs regularization;
Layer L7-3: input width 32, depth 1536; performs a convolution, kernel width 3, 512 kernels, stride 1, zero-padding 1;
Layer L7-4: input width 32, depth 512; performs activation;
Layer L7-5: input width 32, depth 512; performs regularization;
Layer L7-6: input width 32, depth 512; performs a convolution, kernel width 3, 512 kernels, stride 1, zero-padding 1;
Layer L7-7: input width 32, depth 512; performs activation;
Layer L7-8: input width 32, depth 512; performs regularization;
Layer L8-0: input width 64, depth 768; performs a transposed convolution (deconvolution), kernel width 3, 768 kernels, stride 1, zero-padding 1;
Layer L8-1: input width 64, depth 768; performs activation;
Layer L8-2: input width 64, depth 768; performs regularization;
Layer L8-3: input width 64, depth 768; performs a convolution, kernel width 3, 256 kernels, stride 1, zero-padding 1;
Layer L8-4: input width 64, depth 256; performs activation;
Layer L8-5: input width 64, depth 256; performs regularization;
Layer L8-6: input width 64, depth 256; performs a convolution, kernel width 3, 256 kernels, stride 1, zero-padding 1;
Layer L8-7: input width 64, depth 256; performs activation;
Layer L8-8: input width 64, depth 256; performs regularization;
Layer L9-0: input width 128, depth 384; performs a transposed convolution (deconvolution), kernel width 3, 384 kernels, stride 1, zero-padding 1;
Layer L9-1: input width 128, depth 384; performs activation;
Layer L9-2: input width 128, depth 384; performs regularization;
Layer L9-3: input width 128, depth 384; performs a convolution, kernel width 3, 128 kernels, stride 1, zero-padding 1;
Layer L9-4: input width 128, depth 128; performs activation;
Layer L9-5: input width 128, depth 128; performs regularization;
Layer L9-6: input width 128, depth 128; performs a convolution, kernel width 3, 128 kernels, stride 1, zero-padding 1;
Layer L9-7: input width 128, depth 128; performs activation;
Layer L9-8: input width 128, depth 128; performs regularization;
Layer L10-0: input width 256, depth 192; performs a transposed convolution (deconvolution), kernel width 3, 192 kernels, stride 1, zero-padding 1;
Layer L10-1: input width 256, depth 192; performs activation;
Layer L10-2: input width 256, depth 192; performs regularization;
Layer L10-3: input width 256, depth 192; performs a convolution, kernel width 3, 64 kernels, stride 1, zero-padding 1;
Layer L10-4: input width 256, depth 64; performs activation;
Layer L10-5: input width 256, depth 64; performs regularization;
Layer L10-6: input width 256, depth 64; performs a convolution, kernel width 3, 64 kernels, stride 1, zero-padding 1;
Layer L10-7: input width 256, depth 64; performs activation;
Layer L10-8: input width 256, depth 64; performs regularization;
Layer L11-0: input width 512, depth 96; performs a transposed convolution (deconvolution), kernel width 3, 96 kernels, stride 1, zero-padding 1;
Layer L11-1: input width 512, depth 96; performs activation;
Layer L11-2: input width 512, depth 96; performs regularization;
Layer L11-3: input width 512, depth 96; performs a convolution, kernel width 3, 32 kernels, stride 1, zero-padding 1;
Layer L11-4: input width 512, depth 32; performs activation;
Layer L11-5: input width 512, depth 32; performs regularization;
Layer L11-6: input width 512, depth 32; performs a convolution, kernel width 3, 32 kernels, stride 1, zero-padding 1;
Layer L11-7: input width 512, depth 32; performs activation;
Layer L11-8: input width 512, depth 32; performs regularization;
Layer L12-0: input width 512, depth 32; performs a convolution, kernel width 3, 4 kernels, stride 1, zero-padding 1;
Layer L12-1: input width 512, depth 4; performs activation;
Layer L13: input width 512, depth 4; computes the loss.
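As a sanity check on the encoder half of this listing (layers L1 through L6), the width/depth progression can be reproduced with simple bookkeeping. The function name and the simplification of each block to its final kernel count are ours, not the patent's:

```python
def encoder_shapes(width=512, in_depth=3):
    """Track (width, depth) through six conv blocks: 3x3 stride-1
    zero-padded convolutions preserve width; every block except the
    last ends in a pooling layer that halves the width."""
    depths = [32, 64, 128, 256, 512, 1024]  # final kernel count of each block
    shapes = []
    d = in_depth
    for i, kernels in enumerate(depths):
        d = kernels          # convolutions set the depth to the kernel count
        shapes.append((width, d))
        if i < len(depths) - 1:
            width //= 2      # pooling at the end of blocks L1 through L5
    return shapes

print(encoder_shapes())
# [(512, 32), (256, 64), (128, 128), (64, 256), (32, 512), (16, 1024)]
```

The decoder then mirrors this progression back up to width 512, with the depths 1536, 768, 384, 192, and 96 at layers L7-0 through L11-0 consistent with concatenating each upsampled feature map with the matching encoder block's output (a U-Net-style skip connection, which the listing implies but does not name).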
3. The Dice-coefficient loss function in Step 3 described above is specifically:

$L_{Dice} = 1 - \dfrac{2\sum_{i=1}^{M} y_i \hat{y}_i + smooth}{\sum_{i=1}^{M} y_i + \sum_{i=1}^{M} \hat{y}_i + smooth}$

where $y_i$ and $\hat{y}_i$ are the true and predicted label values, $M$ is the number of pixels of the input image, and $smooth = 1$.
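A minimal NumPy implementation of this loss, shown in its binary (single-class) form for simplicity:

```python
import numpy as np

def dice_loss(y_true, y_pred, smooth=1.0):
    """Dice-coefficient loss over the M pixels of one image:
    L = 1 - (2*sum(y*y_hat) + smooth) / (sum(y) + sum(y_hat) + smooth)."""
    y_true = y_true.ravel().astype(np.float64)
    y_pred = y_pred.ravel().astype(np.float64)
    inter = np.sum(y_true * y_pred)
    return 1.0 - (2.0 * inter + smooth) / (y_true.sum() + y_pred.sum() + smooth)

# a perfect prediction gives loss 0: (2*100 + 1)/(100 + 100 + 1) = 1
print(dice_loss(np.ones(100), np.ones(100)))  # 0.0
```

The `smooth` term keeps the loss defined even when both the ground truth and the prediction are all-background, which is common for crack images where the crack occupies very few pixels.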
4. The adaptive moment estimation optimization algorithm in Step 3 described above is specifically:

$m_t = \beta_1 m_{t-1} + (1-\beta_1) g_t$
$v_t = \beta_2 v_{t-1} + (1-\beta_2) g_t^2$
$\hat{m}_t = m_t/(1-\beta_1^t), \quad \hat{v}_t = v_t/(1-\beta_2^t)$
$\theta_t = \theta_{t-1} - \eta\, \hat{m}_t/(\sqrt{\hat{v}_t} + \varepsilon)$

where $g_t$ is the gradient at step $t$, $m_t$ and $v_t$ are the first and second moments of the gradient at step $t$, $\beta_1$ and $\beta_2$ are the first- and second-order momentum decay coefficients, $\varepsilon$ is a numerical stability term, $\eta$ is the learning rate, and $\theta_t$ denotes the parameters being optimized at step $t$.
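One update step of this optimizer can be sketched in NumPy; this is the standard bias-corrected form of adaptive moment estimation, which the symbol definitions above match:

```python
import numpy as np

def adam_step(theta, grad, m, v, t, lr=1e-3, beta1=0.9, beta2=0.999, eps=1e-8):
    """One Adam step: update the first moment m and second moment v of the
    gradient, correct their initialization bias, and move theta."""
    m = beta1 * m + (1 - beta1) * grad
    v = beta2 * v + (1 - beta2) * grad**2
    m_hat = m / (1 - beta1**t)         # bias-corrected first moment
    v_hat = v / (1 - beta2**t)         # bias-corrected second moment
    theta = theta - lr * m_hat / (np.sqrt(v_hat) + eps)
    return theta, m, v

theta = np.array([1.0])
m, v = np.zeros(1), np.zeros(1)
theta, m, v = adam_step(theta, np.array([0.5]), m, v, t=1)
# the bias correction makes the first step move by almost exactly lr
print(theta)  # ~0.999
```

Because of the bias correction, the effective step size at $t = 1$ is close to the learning rate regardless of the gradient magnitude, which keeps early training stable.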
5. In Step 4 described above, the image to be recognized is cut to the size used in Step 2, the pieces are fed into the trained neural network and recognized separately, and the results are then stitched together.
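The cut-recognize-stitch procedure can be sketched as follows. Non-overlapping 512-pixel tiles are an assumption based on the network's input width; the patent only says the image is cut to the size used in Step 2.

```python
import numpy as np

def tile(image, size=512):
    """Cut an image into non-overlapping size x size patches, row-major
    (assumes the image dimensions are multiples of `size`)."""
    h, w = image.shape[:2]
    return [image[r:r+size, c:c+size]
            for r in range(0, h, size) for c in range(0, w, size)]

def stitch(patches, rows, cols):
    """Reassemble per-patch predictions back to the original layout."""
    return np.block([[patches[r*cols + c] for c in range(cols)]
                     for r in range(rows)])

img = np.arange(1024 * 1024).reshape(1024, 1024) % 255
patches = tile(img)                       # 4 patches of 512 x 512
restored = stitch(patches, rows=2, cols=2)
print(np.array_equal(restored, img))  # True
```

In the full pipeline the network's per-pixel predictions for each patch would be stitched instead of the patches themselves; since the network preserves the spatial width, the same `stitch` applies.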
The beneficial effects and advantages of the present invention are as follows. The invention realizes fully automated processing of the entire pipeline for accurate crack identification in steel box girders with complex background interference: neural network model training, crack identification at the original image size, and display of the results. It is convenient and accurate, and improves the accuracy and stability of steel box girder crack identification. The whole localization and identification process is automatic, greatly reducing manual involvement in inspection. The invention can also satisfy the demands of online steel box girder crack monitoring, early warning, and real-time data processing: without updating the training set, acquired images can be identified directly, with an output latency of ten seconds or less. The invention improves the automation, intelligence, accuracy, and robustness of intelligent bridge steel box girder crack identification, and provides a solution for the intelligent identification of steel box girder cracks in bridge engineering.
Detailed description of the invention
Fig. 1 is a flow diagram of the invention;
Fig. 2 is a structural diagram of the deep fully convolutional neural network of Step 3 of the present invention;
Fig. 3 is the original image in the embodiment;
Fig. 4 is the annotation image (i.e., ground truth) in the embodiment;
Fig. 5 shows the intelligent recognition result of the method proposed by the present invention.
Specific embodiment
The present invention is further described below by example, with reference to the accompanying drawings:
Embodiment 1
As shown in Fig. 1, an accurate recognition method for bridge steel structure cracks based on computer vision comprises the following steps. Step 1: using crack images of common steel box girder locations as raw data, perform manual pixel-level annotation, obtaining pixel-level annotation images in which different colors denote different classes; specifically:
(1) build a common steel box girder crack defect image database, with image sizes above 4000 × 3000 pixels;
(2) annotate the original images by class with a pixel annotation tool, covering regions of different classes with different colors;
(3) apply horizontal flips, vertical flips, etc. simultaneously to the original image and its annotation image, or apply a 10% perturbation, etc., to the three BGR channels of the original image, obtaining the flipped or perturbed original images and their corresponding annotation images.
Step 2: crop the original and annotation images to a fixed size, to reduce the computational cost and match the input requirements of the fully convolutional network; extract the annotation images by color and assign a different integer to each class as its label, storing the result as arrays;
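The color-to-integer label extraction can be sketched as below. The particular palette (black background, red crack, in BGR order) is an illustrative example; the patent does not specify its actual color coding.

```python
import numpy as np

def colors_to_labels(mask, palette):
    """Map an annotated BGR mask to an integer label array: each color in
    `palette` gets its index as the class label; unmatched pixels stay 0."""
    labels = np.zeros(mask.shape[:2], dtype=np.int64)
    for idx, color in enumerate(palette):
        labels[np.all(mask == np.array(color), axis=-1)] = idx
    return labels

palette = [(0, 0, 0), (0, 0, 255)]   # example: background, crack (BGR)
mask = np.zeros((4, 4, 3), dtype=np.uint8)
mask[1, 1] = (0, 0, 255)             # one annotated crack pixel
labels = colors_to_labels(mask, palette)
print(labels[1, 1], labels[0, 0])  # 1 0
```

Storing the labels as integer arrays rather than color images is what lets the network's final 4-channel output (layer L12-0) be compared against them directly in the loss layer.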
Step 3: feed the training set into the deep fully convolutional neural network for training. The loss function used in training is the Dice-coefficient loss:

$L_{Dice} = 1 - \dfrac{2\sum_{i=1}^{M} y_i \hat{y}_i + smooth}{\sum_{i=1}^{M} y_i + \sum_{i=1}^{M} \hat{y}_i + smooth}$

where $y_i$ and $\hat{y}_i$ are the true and predicted label values, $M$ is the number of pixels of the input image, and $smooth$ prevents the denominator from being 0 and is generally taken as 1. The optimizer is the adaptive moment estimation (Adam) algorithm:

$m_t = \beta_1 m_{t-1} + (1-\beta_1) g_t$
$v_t = \beta_2 v_{t-1} + (1-\beta_2) g_t^2$
$\hat{m}_t = m_t/(1-\beta_1^t), \quad \hat{v}_t = v_t/(1-\beta_2^t)$
$\theta_t = \theta_{t-1} - \eta\, \hat{m}_t/(\sqrt{\hat{v}_t} + \varepsilon)$

where $g_t$ is the gradient at step $t$, $m_t$ and $v_t$ are the first and second moments of the gradient at step $t$, $\beta_1$ and $\beta_2$ are the first- and second-order momentum decay coefficients, $\varepsilon$ is a numerical stability term, $\eta$ is the learning rate, and $\theta_t$ denotes the parameters being optimized at step $t$;
The wherein structure of each layer of the full convolutional neural networks of depth are as follows:
L1-0 layers: the width of input is 512, depth 3;Convolution layer operation is executed, the width of convolution layer operation is 3, number Amount is 32, step pitch 1, zero padding 1;
L1-1 layers: the width of input is 512, depth 32;Execute activation layer operation;
L1-2 layers: the width of input is 512, depth 32, executing rule layer operation;
L1-3 layers: the width of input is 512, depth 32;Convolution layer operation is executed, the width of convolution layer operation is 3, number Amount is 32, step pitch 1, zero padding 1;
L1-4 layers: the width of input is 512, depth 32;Execute activation layer operation;
L1-5 layers: the width of input is 512, depth 32, executing rule layer operation;
L1-6 layers: the width of input is 512, depth 32;Convolution layer operation is executed, the width of convolution layer operation is 3, number Amount is 32, step pitch 1, zero padding 1;
L1-7 layers: the width of input is 512, depth 32;Execute activation layer operation;
L1-8 layers: the width of input is 512, depth 32, executing rule layer operation;
L1-9 layers: the width of input is 512, and depth 32 executes pond layer operation;
L2-0 layers: the width of input is 256, depth 32;Convolution layer operation is executed, the width of convolution layer operation is 3, number Amount is 32, step pitch 1, zero padding 1;
L2-1 layers: the width of input is 256, depth 32;Execute activation layer operation;
L2-2 layers: the width of input is 256, depth 32, executing rule layer operation;
L2-3 layers: the width of input is 256, depth 32;Convolution layer operation is executed, the width of convolution layer operation is 3, number Amount is 64, step pitch 1, zero padding 1;
L2-4 layers: the width of input is 256, depth 64;Execute activation layer operation;
L2-5 layers: the width of input is 256, depth 64, executing rule layer operation;
L2-6 layers: the width of input is 256, depth 64;Convolution layer operation is executed, the width of convolution layer operation is 3, number Amount is 64, step pitch 1, zero padding 1;
L2-7 layers: the width of input is 256, depth 64;Execute activation layer operation;
L2-8 layers: the width of input is 256, depth 64, executing rule layer operation;
L2-9 layers: the width of input is 256, and depth 64 executes pond layer operation;
L3-0 layers: the width of input is 128, depth 64;Convolution layer operation is executed, the width of convolution layer operation is 3, number Amount is 64, step pitch 1, zero padding 1;
L3-1 layers: the width of input is 128, depth 64;Execute activation layer operation;
L3-2 layers: the width of input is 128, depth 64, executing rule layer operation;
L3-3 layers: the width of input is 128, depth 64;Convolution layer operation is executed, the width of convolution layer operation is 3, number Amount is 128, step pitch 1, zero padding 1;
L3-4 layers: the width of input is 128, depth 128;Execute activation layer operation;
L3-5 layers: the width of input is 128, depth 128, executing rule layer operation;
L3-6 layers: the width of input is 128, depth 128;Convolution layer operation is executed, the width of convolution layer operation is 3, Quantity is 128, step pitch 1, zero padding 1;
L3-7 layers: the width of input is 128, depth 128;Execute activation layer operation;
L3-8 layers: the width of input is 128, depth 128, executing rule layer operation;
L3-9 layers: the width of input is 128, and depth 128 executes pond layer operation;
L4-0 layers: the width of input is 64, depth 128;Convolution layer operation is executed, the width of convolution layer operation is 3, number Amount is 128, step pitch 1, zero padding 1;
L4-1 layers: the width of input is 64, depth 128;Execute activation layer operation;
L4-2 layers: the width of input is 64, depth 128, executing rule layer operation;
L4-3 layers: the width of input is 64, depth 128;Convolution layer operation is executed, the width of convolution layer operation is 3, number Amount is 256, step pitch 1, zero padding 1;
L4-4 layers: the width of input is 64, depth 256;Execute activation layer operation;
L4-5 layers: the width of input is 64, depth 256, executing rule layer operation;
L4-6 layers: the width of input is 64, depth 256;Convolution layer operation is executed, the width of convolution layer operation is 3, number Amount is 256, step pitch 1, zero padding 1;
L4-7 layers: the width of input is 64, depth 256;Execute activation layer operation;
L4-8 layers: the width of input is 64, depth 256, executing rule layer operation;
L4-9 layers: the width of input is 64, and depth 256 executes pond layer operation;
L5-0 layers: the width of input is 32, depth 256;Convolution layer operation is executed, the width of convolution layer operation is 3, number Amount is 256, step pitch 1, zero padding 1;
L5-1 layers: the width of input is 32, depth 256;Execute activation layer operation;
L5-2 layers: the width of input is 32, depth 256, executing rule layer operation;
L5-3 layers: the width of input is 32, depth 256;Convolution layer operation is executed, the width of convolution layer operation is 3, number Amount is 512, step pitch 1, zero padding 1;
L5-4 layers: the width of input is 32, depth 512;Execute activation layer operation;
L5-5 layers: the width of input is 32, depth 512, executing rule layer operation;
L5-6 layers: the width of input is 32, depth 512;Convolution layer operation is executed, the width of convolution layer operation is 3, number Amount is 512, step pitch 1, zero padding 1;
L5-7 layers: the width of input is 32, depth 512;Execute activation layer operation;
L5-8 layers: the width of input is 32, depth 512, executing rule layer operation;
L5-9 layers: the width of input is 32, and depth 512 executes pond layer operation;
Layer L6-0: input width 16, depth 512; perform convolution layer operation with kernel width 3, number 1024, stride 1, zero padding 1;
Layer L6-1: input width 16, depth 1024; perform activation layer operation;
Layer L6-2: input width 16, depth 1024; perform regularization layer operation;
Layer L6-3: input width 16, depth 1024; perform convolution layer operation with kernel width 3, number 1024, stride 1, zero padding 1;
Layer L6-4: input width 16, depth 1024; perform activation layer operation;
Layer L6-5: input width 16, depth 1024; perform regularization layer operation;
Layer L6-6: input width 16, depth 1024; perform convolution layer operation with kernel width 3, number 1024, stride 1, zero padding 1;
Layer L6-7: input width 16, depth 1024; perform activation layer operation;
Layer L6-8: input width 16, depth 1024; perform regularization layer operation;
Layer L7-0: input width 32, depth 1536; perform deconvolution layer operation with kernel width 3, number 1536, stride 1, zero padding 1;
Layer L7-1: input width 32, depth 1536; perform activation layer operation;
Layer L7-2: input width 32, depth 1536; perform regularization layer operation;
Layer L7-3: input width 32, depth 1536; perform convolution layer operation with kernel width 3, number 512, stride 1, zero padding 1;
Layer L7-4: input width 32, depth 512; perform activation layer operation;
Layer L7-5: input width 32, depth 512; perform regularization layer operation;
Layer L7-6: input width 32, depth 512; perform convolution layer operation with kernel width 3, number 512, stride 1, zero padding 1;
Layer L7-7: input width 32, depth 512; perform activation layer operation;
Layer L7-8: input width 32, depth 512; perform regularization layer operation;
Layer L8-0: input width 64, depth 768; perform deconvolution layer operation with kernel width 3, number 768, stride 1, zero padding 1;
Layer L8-1: input width 64, depth 768; perform activation layer operation;
Layer L8-2: input width 64, depth 768; perform regularization layer operation;
Layer L8-3: input width 64, depth 768; perform convolution layer operation with kernel width 3, number 256, stride 1, zero padding 1;
Layer L8-4: input width 64, depth 256; perform activation layer operation;
Layer L8-5: input width 64, depth 256; perform regularization layer operation;
Layer L8-6: input width 64, depth 256; perform convolution layer operation with kernel width 3, number 256, stride 1, zero padding 1;
Layer L8-7: input width 64, depth 256; perform activation layer operation;
Layer L8-8: input width 64, depth 256; perform regularization layer operation;
Layer L9-0: input width 128, depth 384; perform deconvolution layer operation with kernel width 3, number 384, stride 1, zero padding 1;
Layer L9-1: input width 128, depth 384; perform activation layer operation;
Layer L9-2: input width 128, depth 384; perform regularization layer operation;
Layer L9-3: input width 128, depth 384; perform convolution layer operation with kernel width 3, number 128, stride 1, zero padding 1;
Layer L9-4: input width 128, depth 128; perform activation layer operation;
Layer L9-5: input width 128, depth 128; perform regularization layer operation;
Layer L9-6: input width 128, depth 128; perform convolution layer operation with kernel width 3, number 128, stride 1, zero padding 1;
Layer L9-7: input width 128, depth 128; perform activation layer operation;
Layer L9-8: input width 128, depth 128; perform regularization layer operation;
Layer L10-0: input width 256, depth 192; perform deconvolution layer operation with kernel width 3, number 192, stride 1, zero padding 1;
Layer L10-1: input width 256, depth 192; perform activation layer operation;
Layer L10-2: input width 256, depth 192; perform regularization layer operation;
Layer L10-3: input width 256, depth 192; perform convolution layer operation with kernel width 3, number 64, stride 1, zero padding 1;
Layer L10-4: input width 256, depth 64; perform activation layer operation;
Layer L10-5: input width 256, depth 64; perform regularization layer operation;
Layer L10-6: input width 256, depth 64; perform convolution layer operation with kernel width 3, number 64, stride 1, zero padding 1;
Layer L10-7: input width 256, depth 64; perform activation layer operation;
Layer L10-8: input width 256, depth 64; perform regularization layer operation;
Layer L11-0: input width 512, depth 96; perform deconvolution layer operation with kernel width 3, number 96, stride 1, zero padding 1;
Layer L11-1: input width 512, depth 96; perform activation layer operation;
Layer L11-2: input width 512, depth 96; perform regularization layer operation;
Layer L11-3: input width 512, depth 96; perform convolution layer operation with kernel width 3, number 32, stride 1, zero padding 1;
Layer L11-4: input width 512, depth 32; perform activation layer operation;
Layer L11-5: input width 512, depth 32; perform regularization layer operation;
Layer L11-6: input width 512, depth 32; perform convolution layer operation with kernel width 3, number 32, stride 1, zero padding 1;
Layer L11-7: input width 512, depth 32; perform activation layer operation;
Layer L11-8: input width 512, depth 32; perform regularization layer operation;
Layer L12-0: input width 512, depth 32; perform convolution layer operation with kernel width 3, number 4, stride 1, zero padding 1;
Layer L12-1: input width 512, depth 4; perform activation layer operation;
Layer L13: input width 512, depth 4; perform loss layer operation.
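The pooling and deconvolution stages in the layer table above imply a symmetric spatial-width schedule (512 → 16 → 512). A quick sketch tracing it, under the assumptions that each pooling layer halves the width, each deconvolution doubles it, and the 3 × 3 convolutions with stride 1 and zero padding 1 preserve it:

```python
def encoder_decoder_widths(input_width=512, n_pools=5):
    """Trace the spatial width through the pooling and deconvolution stages."""
    widths = [input_width]
    for _ in range(n_pools):          # pooling layers L1-9 ... L5-9
        widths.append(widths[-1] // 2)
    for _ in range(n_pools):          # deconvolution layers L7-0 ... L11-0
        widths.append(widths[-1] * 2)
    return widths

print(encoder_decoder_widths())  # [512, 256, 128, 64, 32, 16, 32, 64, 128, 256, 512]
```

The widths produced match the "width of input" values listed for each stage of the table.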
Step 4: the bridge steel box girder region-of-interest image to be identified is cut into patches of the size used in Step 2 and input into the trained neural network; after the pixel-level recognition results are obtained, they are stitched together to restore the pixel-level recognition result at the original image size.
Embodiment 2
A computer-vision-based method for accurate pixel-level intelligent recognition of bridge steel box girder cracks, as shown in Figure 1, comprising:
Step 1: taking crack images from common steel box girder sites as raw data, perform manual pixel-level annotation to obtain annotation images in which different colors indicate different categories, wherein:
(1) the steel box girder crack database used in this embodiment is the APESS2018 Steel Girder Crack ID Dataset (https://github.com/dawnnao/APESS2018_Steel_Girder_Crack_ID_dataset);
(2) the original images and annotation images are flipped horizontally or vertically together, or a 10% perturbation is applied to the three BGR channels of the original images, yielding the flipped or perturbed original images and their corresponding annotation images.
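A minimal sketch of these augmentations on nested-list BGR images, assuming the "10% interference" means a ±10% intensity scaling of each channel (a real pipeline would use OpenCV or NumPy):

```python
def hflip(img):
    """Horizontal flip: reverse each row; applied to image and mask alike."""
    return [row[::-1] for row in img]

def vflip(img):
    """Vertical flip: reverse the row order."""
    return img[::-1]

def perturb_bgr(img, factor=1.10):
    """Scale all three BGR channels by `factor`, clipped to [0, 255]."""
    return [[[min(255, int(c * factor)) for c in px] for px in row] for row in img]

img = [[[10, 20, 30], [40, 50, 60]],
       [[70, 80, 90], [100, 110, 120]]]
assert hflip(hflip(img)) == img   # flipping twice restores the image
assert vflip(img)[0] == img[1]
```

Because the same flip is applied to the original image and its annotation mask, pixel-level labels stay aligned after augmentation.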
Step 2: the original images and annotation images are split according to a fixed size to reduce the computation cost; after the annotation images are separated by color, different digits are assigned to the different categories as labels;
In one embodiment, the split size is 512 × 512 pixels to match the network input; MATLAB or Python can be used to extract the colors and assign the digits;
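The color-to-digit labeling in Step 2 might look like the following sketch; the specific BGR colors and class names here are illustrative assumptions, not values given in the patent:

```python
# Hypothetical mapping from annotation color (BGR tuple) to integer class label.
COLOR_TO_LABEL = {
    (0, 0, 0): 0,       # background
    (0, 0, 255): 1,     # crack (red in BGR)
    (0, 255, 0): 2,     # handwriting / marking
    (255, 0, 0): 3,     # other disturbance
}

def mask_to_labels(mask):
    """Convert a BGR annotation mask (nested lists) to an integer label array."""
    return [[COLOR_TO_LABEL[tuple(px)] for px in row] for row in mask]

mask = [[[0, 0, 0], [0, 0, 255]],
        [[0, 255, 0], [255, 0, 0]]]
print(mask_to_labels(mask))  # [[0, 1], [2, 3]]
```

The resulting integer array is what "different digits ... as labels" stored "in a manner of array" would look like in practice.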
Step 3: the training set is input into the deep fully convolutional neural network for training; the loss function used in training is the Dice-coefficient loss function, and the optimization algorithm is the adaptive moment estimation (Adam) optimization algorithm;
The deep fully convolutional network is trained with the optimization algorithm using its default initialization parameters;
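The Dice-coefficient loss named above can be sketched in plain Python for flat binary label vectors (a sketch only; real training code would operate on framework tensors, and smooth = 1 follows the definition in claim 4):

```python
def dice_loss(y_true, y_pred, smooth=1.0):
    """1 minus the Dice coefficient over M pixels."""
    intersection = sum(t * p for t, p in zip(y_true, y_pred))
    total = sum(y_true) + sum(y_pred)
    return 1.0 - (2.0 * intersection + smooth) / (total + smooth)

perfect = dice_loss([1, 0, 1, 0], [1, 0, 1, 0])   # identical masks -> loss 0
print(round(perfect, 3))  # 0.0
```

The smooth term keeps the loss defined when both masks are empty, which matters for crack images where most patches contain no crack pixels.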
Step 4: cut the bridge steel box girder region-of-interest image to be identified, input it into the trained neural network, and after the pixel-level recognition results are obtained, stitch them together to restore the original image size, wherein:
(1) the image to be identified is cut; in this embodiment the cut size is 512 × 512 pixels;
(2) the cut sub-images are input into the trained neural network model to obtain the classification of each pixel, which is drawn in the color matching its category;
(3) the sub-image recognition results are stitched together to obtain the final crack recognition result at the original size.
In one embodiment, the corresponding algorithms can be developed in a Python environment and applied directly to crack-prone bridge steel box girder images captured with an ordinary consumer-grade camera; no special imaging or inspection equipment is required.
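The cut-predict-stitch flow of Step 4 can be sketched as follows, on nested-list images; tiny 2 × 2 tiles stand in for the 512 × 512 patches, and the trained network's per-tile prediction step is omitted:

```python
def cut(img, size):
    """Split an image whose sides are multiples of `size` into tiles, row-major."""
    tiles = []
    for i in range(0, len(img), size):
        for j in range(0, len(img[0]), size):
            tiles.append([row[j:j + size] for row in img[i:i + size]])
    return tiles

def stitch(tiles, rows, cols):
    """Reassemble row-major tiles into the original layout."""
    size = len(tiles[0])
    out = []
    for r in range(rows):
        for k in range(size):
            line = []
            for c in range(cols):
                line.extend(tiles[r * cols + c][k])
            out.append(line)
    return out

img = [[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12], [13, 14, 15, 16]]
assert stitch(cut(img, 2), 2, 2) == img   # round trip restores the image
```

In the full pipeline each tile returned by `cut` would be classified by the network before `stitch` reassembles the per-pixel results.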

Claims (6)

1. A computer-vision-based method for accurate recognition of bridge steel structure cracks, characterized in that the method comprises the following steps:
Step 1: taking crack images from steel box girder sites as raw data, perform manual pixel-level annotation to obtain annotation images in which different colors indicate different categories;
Step 2: split the original images and annotation images according to a fixed size; after separating the annotation images by color, assign different digits to the different categories as labels and store them as arrays;
Step 3: input the training set into the deep fully convolutional neural network for training, the loss function used in training being the Dice-coefficient loss function and the optimization algorithm being the adaptive moment estimation optimization algorithm;
Step 4: cut the bridge steel box girder region-of-interest image to be identified, input it into the trained neural network, and after the pixel-level recognition results are obtained, stitch them together to restore the original image size.
2. The computer-vision-based method for accurate recognition of bridge steel structure cracks according to claim 1, characterized in that Step 1 specifically comprises:
Step 1.1: use a steel box girder crack disease image database with image sizes of 4000 × 3000 pixels or more;
Step 1.2: annotate the original images by type with a pixel annotation tool, covering the different category regions in different colors;
Step 1.3: flip the original images and annotation images horizontally or vertically together, or apply a 10% perturbation to the three BGR channels of the original images, yielding the flipped or perturbed original images and their corresponding annotation images.
3. The computer-vision-based method for accurate recognition of bridge steel structure cracks according to claim 1, characterized in that the layer-by-layer structure of the deep fully convolutional neural network in Step 3 is:
Layer L1-0: input width 512, depth 3; perform convolution layer operation with kernel width 3, number 32, stride 1, zero padding 1;
Layer L1-1: input width 512, depth 32; perform activation layer operation;
Layer L1-2: input width 512, depth 32; perform regularization layer operation;
Layer L1-3: input width 512, depth 32; perform convolution layer operation with kernel width 3, number 32, stride 1, zero padding 1;
Layer L1-4: input width 512, depth 32; perform activation layer operation;
Layer L1-5: input width 512, depth 32; perform regularization layer operation;
Layer L1-6: input width 512, depth 32; perform convolution layer operation with kernel width 3, number 32, stride 1, zero padding 1;
Layer L1-7: input width 512, depth 32; perform activation layer operation;
Layer L1-8: input width 512, depth 32; perform regularization layer operation;
Layer L1-9: input width 512, depth 32; perform pooling layer operation;
Layer L2-0: input width 256, depth 32; perform convolution layer operation with kernel width 3, number 32, stride 1, zero padding 1;
Layer L2-1: input width 256, depth 32; perform activation layer operation;
Layer L2-2: input width 256, depth 32; perform regularization layer operation;
Layer L2-3: input width 256, depth 32; perform convolution layer operation with kernel width 3, number 64, stride 1, zero padding 1;
Layer L2-4: input width 256, depth 64; perform activation layer operation;
Layer L2-5: input width 256, depth 64; perform regularization layer operation;
Layer L2-6: input width 256, depth 64; perform convolution layer operation with kernel width 3, number 64, stride 1, zero padding 1;
Layer L2-7: input width 256, depth 64; perform activation layer operation;
Layer L2-8: input width 256, depth 64; perform regularization layer operation;
Layer L2-9: input width 256, depth 64; perform pooling layer operation;
Layer L3-0: input width 128, depth 64; perform convolution layer operation with kernel width 3, number 64, stride 1, zero padding 1;
Layer L3-1: input width 128, depth 64; perform activation layer operation;
Layer L3-2: input width 128, depth 64; perform regularization layer operation;
Layer L3-3: input width 128, depth 64; perform convolution layer operation with kernel width 3, number 128, stride 1, zero padding 1;
Layer L3-4: input width 128, depth 128; perform activation layer operation;
Layer L3-5: input width 128, depth 128; perform regularization layer operation;
Layer L3-6: input width 128, depth 128; perform convolution layer operation with kernel width 3, number 128, stride 1, zero padding 1;
Layer L3-7: input width 128, depth 128; perform activation layer operation;
Layer L3-8: input width 128, depth 128; perform regularization layer operation;
Layer L3-9: input width 128, depth 128; perform pooling layer operation;
Layer L4-0: input width 64, depth 128; perform convolution layer operation with kernel width 3, number 128, stride 1, zero padding 1;
Layer L4-1: input width 64, depth 128; perform activation layer operation;
Layer L4-2: input width 64, depth 128; perform regularization layer operation;
Layer L4-3: input width 64, depth 128; perform convolution layer operation with kernel width 3, number 256, stride 1, zero padding 1;
Layer L4-4: input width 64, depth 256; perform activation layer operation;
Layer L4-5: input width 64, depth 256; perform regularization layer operation;
Layer L4-6: input width 64, depth 256; perform convolution layer operation with kernel width 3, number 256, stride 1, zero padding 1;
Layer L4-7: input width 64, depth 256; perform activation layer operation;
Layer L4-8: input width 64, depth 256; perform regularization layer operation;
Layer L4-9: input width 64, depth 256; perform pooling layer operation;
Layer L5-0: input width 32, depth 256; perform convolution layer operation with kernel width 3, number 256, stride 1, zero padding 1;
Layer L5-1: input width 32, depth 256; perform activation layer operation;
Layer L5-2: input width 32, depth 256; perform regularization layer operation;
Layer L5-3: input width 32, depth 256; perform convolution layer operation with kernel width 3, number 512, stride 1, zero padding 1;
Layer L5-4: input width 32, depth 512; perform activation layer operation;
Layer L5-5: input width 32, depth 512; perform regularization layer operation;
Layer L5-6: input width 32, depth 512; perform convolution layer operation with kernel width 3, number 512, stride 1, zero padding 1;
Layer L5-7: input width 32, depth 512; perform activation layer operation;
Layer L5-8: input width 32, depth 512; perform regularization layer operation;
Layer L5-9: input width 32, depth 512; perform pooling layer operation;
Layer L6-0: input width 16, depth 512; perform convolution layer operation with kernel width 3, number 1024, stride 1, zero padding 1;
Layer L6-1: input width 16, depth 1024; perform activation layer operation;
Layer L6-2: input width 16, depth 1024; perform regularization layer operation;
Layer L6-3: input width 16, depth 1024; perform convolution layer operation with kernel width 3, number 1024, stride 1, zero padding 1;
Layer L6-4: input width 16, depth 1024; perform activation layer operation;
Layer L6-5: input width 16, depth 1024; perform regularization layer operation;
Layer L6-6: input width 16, depth 1024; perform convolution layer operation with kernel width 3, number 1024, stride 1, zero padding 1;
Layer L6-7: input width 16, depth 1024; perform activation layer operation;
Layer L6-8: input width 16, depth 1024; perform regularization layer operation;
Layer L7-0: input width 32, depth 1536; perform deconvolution layer operation with kernel width 3, number 1536, stride 1, zero padding 1;
Layer L7-1: input width 32, depth 1536; perform activation layer operation;
Layer L7-2: input width 32, depth 1536; perform regularization layer operation;
Layer L7-3: input width 32, depth 1536; perform convolution layer operation with kernel width 3, number 512, stride 1, zero padding 1;
Layer L7-4: input width 32, depth 512; perform activation layer operation;
Layer L7-5: input width 32, depth 512; perform regularization layer operation;
Layer L7-6: input width 32, depth 512; perform convolution layer operation with kernel width 3, number 512, stride 1, zero padding 1;
Layer L7-7: input width 32, depth 512; perform activation layer operation;
Layer L7-8: input width 32, depth 512; perform regularization layer operation;
Layer L8-0: input width 64, depth 768; perform deconvolution layer operation with kernel width 3, number 768, stride 1, zero padding 1;
Layer L8-1: input width 64, depth 768; perform activation layer operation;
Layer L8-2: input width 64, depth 768; perform regularization layer operation;
Layer L8-3: input width 64, depth 768; perform convolution layer operation with kernel width 3, number 256, stride 1, zero padding 1;
Layer L8-4: input width 64, depth 256; perform activation layer operation;
Layer L8-5: input width 64, depth 256; perform regularization layer operation;
Layer L8-6: input width 64, depth 256; perform convolution layer operation with kernel width 3, number 256, stride 1, zero padding 1;
Layer L8-7: input width 64, depth 256; perform activation layer operation;
Layer L8-8: input width 64, depth 256; perform regularization layer operation;
Layer L9-0: input width 128, depth 384; perform deconvolution layer operation with kernel width 3, number 384, stride 1, zero padding 1;
Layer L9-1: input width 128, depth 384; perform activation layer operation;
Layer L9-2: input width 128, depth 384; perform regularization layer operation;
Layer L9-3: input width 128, depth 384; perform convolution layer operation with kernel width 3, number 128, stride 1, zero padding 1;
Layer L9-4: input width 128, depth 128; perform activation layer operation;
Layer L9-5: input width 128, depth 128; perform regularization layer operation;
Layer L9-6: input width 128, depth 128; perform convolution layer operation with kernel width 3, number 128, stride 1, zero padding 1;
Layer L9-7: input width 128, depth 128; perform activation layer operation;
Layer L9-8: input width 128, depth 128; perform regularization layer operation;
Layer L10-0: input width 256, depth 192; perform deconvolution layer operation with kernel width 3, number 192, stride 1, zero padding 1;
Layer L10-1: input width 256, depth 192; perform activation layer operation;
Layer L10-2: input width 256, depth 192; perform regularization layer operation;
Layer L10-3: input width 256, depth 192; perform convolution layer operation with kernel width 3, number 64, stride 1, zero padding 1;
Layer L10-4: input width 256, depth 64; perform activation layer operation;
Layer L10-5: input width 256, depth 64; perform regularization layer operation;
Layer L10-6: input width 256, depth 64; perform convolution layer operation with kernel width 3, number 64, stride 1, zero padding 1;
Layer L10-7: input width 256, depth 64; perform activation layer operation;
Layer L10-8: input width 256, depth 64; perform regularization layer operation;
Layer L11-0: input width 512, depth 96; perform deconvolution layer operation with kernel width 3, number 96, stride 1, zero padding 1;
Layer L11-1: input width 512, depth 96; perform activation layer operation;
Layer L11-2: input width 512, depth 96; perform regularization layer operation;
Layer L11-3: input width 512, depth 96; perform convolution layer operation with kernel width 3, number 32, stride 1, zero padding 1;
Layer L11-4: input width 512, depth 32; perform activation layer operation;
Layer L11-5: input width 512, depth 32; perform regularization layer operation;
Layer L11-6: input width 512, depth 32; perform convolution layer operation with kernel width 3, number 32, stride 1, zero padding 1;
Layer L11-7: input width 512, depth 32; perform activation layer operation;
Layer L11-8: input width 512, depth 32; perform regularization layer operation;
Layer L12-0: input width 512, depth 32; perform convolution layer operation with kernel width 3, number 4, stride 1, zero padding 1;
Layer L12-1: input width 512, depth 4; perform activation layer operation;
Layer L13: input width 512, depth 4; perform loss layer operation.
4. The computer-vision-based method for accurate recognition of bridge steel structure cracks according to claim 1, characterized in that the Dice-coefficient loss function in Step 3 is specifically:
where y_i and ŷ_i are the true value and predicted value of the label respectively, M is the number of pixels of the input image, and smooth = 1.
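The loss formula itself did not survive extraction; a standard form of the Dice-coefficient loss consistent with the symbols defined in this claim would be (a reconstruction, not the patent's verbatim equation):

```latex
L_{\mathrm{Dice}} = 1 - \frac{2\sum_{i=1}^{M} y_i \hat{y}_i + \mathrm{smooth}}
                             {\sum_{i=1}^{M} y_i + \sum_{i=1}^{M} \hat{y}_i + \mathrm{smooth}}
```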
5. The computer-vision-based method for accurate recognition of bridge steel structure cracks according to claim 1, characterized in that the adaptive moment estimation optimization algorithm in Step 3 is specifically:
where g_t is the gradient at step t, m_t is the first moment of the gradient at step t, v_t is the second moment of the gradient at step t, β₁ is the first-order momentum decay coefficient, β₂ is the second-order momentum decay coefficient, ε is the numerical stability term, η is the learning rate, and θ_t denotes the parameters to be optimized at step t.
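The update equations themselves did not survive extraction; the standard Adam updates matching these symbol definitions would be (a reconstruction, not the patent's verbatim equations):

```latex
\begin{aligned}
m_t &= \beta_1 m_{t-1} + (1 - \beta_1)\, g_t \\
v_t &= \beta_2 v_{t-1} + (1 - \beta_2)\, g_t^2 \\
\hat{m}_t &= \frac{m_t}{1 - \beta_1^t}, \qquad \hat{v}_t = \frac{v_t}{1 - \beta_2^t} \\
\theta_t &= \theta_{t-1} - \eta\, \frac{\hat{m}_t}{\sqrt{\hat{v}_t} + \varepsilon}
\end{aligned}
```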
6. The computer-vision-based method for accurate recognition of bridge steel structure cracks according to claim 1, characterized in that, in Step 4, the image to be identified is cut into the size used in Step 2, the pieces are input into the trained neural network and recognized separately, and the results are then stitched together.
CN201811295283.1A 2018-11-01 2018-11-01 The accurate recognition methods in bridge steel structure crack based on computer vision Pending CN109408985A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201811295283.1A CN109408985A (en) 2018-11-01 2018-11-01 The accurate recognition methods in bridge steel structure crack based on computer vision


Publications (1)

Publication Number Publication Date
CN109408985A true CN109408985A (en) 2019-03-01

Family

ID=65471256

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201811295283.1A Pending CN109408985A (en) 2018-11-01 2018-11-01 The accurate recognition methods in bridge steel structure crack based on computer vision

Country Status (1)

Country Link
CN (1) CN109408985A (en)


Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
KR100826153B1 (en) * 2006-11-29 2008-04-30 한국표준과학연구원 Width measurement method of the crack by using the depth value in histogram of image
CN106018411A (en) * 2016-05-09 2016-10-12 广州市九州旗建筑科技有限公司 Crack width measuring and computing method and measuring device
CN106910187A (en) * 2017-01-13 2017-06-30 陕西师范大学 A kind of artificial amplification method of image data set for Bridge Crack detection
CN108346144A (en) * 2018-01-30 2018-07-31 哈尔滨工业大学 Bridge Crack based on computer vision monitoring and recognition methods automatically


Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
XINCONG YANG ET AL: "Automatic Pixel-Level Crack Detection and Measurement Using Fully Convolutional Network", HTTPS://DOI.ORG/10.1111/MICE.12412 *
YANG XU ET AL: "Identification framework for cracks on a steel structure surface by a restricted Boltzmann machines algorithm based on consumer-grade camera images", WILEY *
HUANG Hongwei et al.: "Deep learning based image recognition of water leakage diseases in shield tunnels", Chinese Journal of Rock Mechanics and Engineering *

Cited By (12)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110222701A (en) * 2019-06-11 2019-09-10 北京新桥技术发展有限公司 A kind of bridge defect automatic identifying method
CN110222701B (en) * 2019-06-11 2019-12-27 北京新桥技术发展有限公司 Automatic bridge disease identification method
CN110502972A (en) * 2019-07-05 2019-11-26 广东工业大学 A kind of pavement crack segmentation and recognition methods based on deep learning
CN110533629A (en) * 2019-07-10 2019-12-03 湖南交工智能技术有限公司 A kind of detection method and detection device of Bridge Crack
CN111507990A (en) * 2020-04-20 2020-08-07 南京航空航天大学 Tunnel surface defect segmentation method based on deep learning
CN111507990B (en) * 2020-04-20 2022-02-11 南京航空航天大学 Tunnel surface defect segmentation method based on deep learning
CN111737909A (en) * 2020-06-10 2020-10-02 哈尔滨工业大学 Structural health monitoring data anomaly identification method based on space-time graph convolutional network
CN111737909B (en) * 2020-06-10 2021-02-09 哈尔滨工业大学 Structural health monitoring data anomaly identification method based on space-time graph convolutional network
CN111795978A (en) * 2020-09-08 2020-10-20 湖南大学 Steel bridge structure health state assessment method, device, equipment and storage medium
CN111795978B (en) * 2020-09-08 2020-12-04 湖南大学 Steel bridge structure health state assessment method, device, equipment and storage medium
CN112233105A (en) * 2020-10-27 2021-01-15 江苏科博空间信息科技有限公司 Road crack detection method based on improved FCN
CN113406088A (en) * 2021-05-10 2021-09-17 同济大学 Fixed point type steel box girder crack development observation device

Similar Documents

Publication Title
CN109408985A (en) Accurate recognition method for bridge steel structure cracks based on computer vision
CN107423698B (en) Gesture estimation method based on parallel convolutional neural networks
CN104573731B (en) Fast target detection method based on convolutional neural networks
CN110246181B (en) Anchor-point-based attitude estimation model training method, attitude estimation method and system
CN108520516A (en) Bridge pavement crack detection and segmentation method based on semantic segmentation
CN108805070A (en) Deep learning pedestrian detection method based on an embedded terminal
CN110348376A (en) Real-time pedestrian detection method based on neural networks
CN106408015A (en) Road fork identification and depth estimation method based on a convolutional neural network
CN107038416A (en) Pedestrian detection method based on modified HOG features of binary images
CN110232379A (en) Vehicle attitude detection method and system
CN108288269A (en) Automatic identification method for bridge bearing defects based on unmanned aerial vehicles and convolutional neural networks
CN106127205A (en) Recognition method for digital instrument images suitable for indoor track robots
CN103020614B (en) Human motion recognition method based on spatio-temporal interest point detection
CN111368637B (en) Transfer robot target identification method based on a multi-mask convolutional neural network
CN109376676A (en) Safety early-warning method for highway construction site personnel based on an unmanned aerial vehicle platform
CN109543676A (en) Digit recognition method for print-wheel water meters based on image processing
CN107392251A (en) Method for improving target detection network performance using category images
CN102880870B (en) Facial feature extraction method and system
CN105469111A (en) Object classification method for small sample sets based on improved MFA and transfer learning
CN114140665A (en) Dense small-target detection method based on improved YOLOv5
CN113762009A (en) Crowd counting method based on multi-scale feature fusion and a dual-attention mechanism
CN105405138A (en) Water surface target tracking method based on saliency detection
CN110334719A (en) Method and system for extracting building images from remote sensing imagery
CN109635726A (en) Landslide identification method based on symmetric multi-scale pooling in deep networks
CN106909881A (en) Method and system for extracting ridge numbers of a corn breeding base from unmanned aerial vehicle remote sensing images

Legal Events

Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication (application publication date: 20190301)