CN113011525A

CN113011525A - Dependency decoding-based track slab crack semantic segmentation model

Info

Publication number: CN113011525A
Application number: CN202110425910.4A
Authority: CN
Inventors: 李文举; 陈慧玲; 何茂贤; 张耀星
Original assignee: Shanghai Institute of Technology
Current assignee: Shanghai Institute of Technology
Priority date: 2021-04-20
Filing date: 2021-04-20
Publication date: 2021-06-22
Anticipated expiration: 2041-04-20
Also published as: CN113011525B

Abstract

The invention discloses a track slab crack semantic segmentation model based on dependency decoding, comprising the following steps. Step S1: acquiring track slab crack images on site. Step S2: creating semantic segmentation labels for the acquired images, and dividing the crack images and their corresponding labels into a training set and a test set. Step S3: feeding the crack image labels of the training set into a restricted Boltzmann machine, which maps them from a high dimension to a low dimension, and using its reconstruction mechanism to learn the mapping parameters from the low dimension back to the high dimension. Step S4: constructing a track slab semantic segmentation model, normalizing the track slab images of the training set, and feeding them iteratively into the model, which extracts features from the track slab images. Step S5: sending the feature map extracted in step S4 to the hidden layer of the restricted Boltzmann machine, restoring it to the original size using the reconstruction parameters, and performing pixel-level prediction.

Description

Dependency decoding-based track slab crack semantic segmentation model

Technical Field

The invention relates to the field of semantic segmentation models, in particular to a track slab crack semantic segmentation model based on dependency decoding.

Background

Urban construction has driven the rapid development of the high-speed rail industry. While this brings convenience to people, large day-night temperature differences and the pressure that high-speed trains exert on the track slabs can cause track slab cracks to grow continuously, which may eventually lead to safety accidents. Detecting track slab cracks is therefore an important task for ensuring public safety and national stability. However, track slab cracks are traditionally screened manually, a process that depends entirely on the inspector's subjective judgment and experience and is therefore highly unreliable.

With the continuous development of artificial intelligence and digital image processing, manual inspection and traditional image-processing methods are being replaced by convolutional neural networks. Some results already exist in related fields: Chai et al. used a deep neural network to identify tunnel lining cracks, and Li et al. used a convolutional neural network to detect bridge cracks. Mainstream crack image detection methods generally fall into three categories: crack image classification, crack object detection, and crack image segmentation.

In crack image segmentation models, various neural networks are generally used to extract high-level abstract crack information, but the spatial size of the extracted feature maps becomes progressively smaller. The usual approach is to upsample the feature map by bilinear interpolation back to the size of the input crack image, which makes the whole track slab crack semantic segmentation model end-to-end. However, this way of restoring the image size for pixel-level prediction is too simple and ignores the data dependencies between feature points. The present invention instead uses an additional model to learn the mapping of the crack image from its low-dimensional spatial structure to the original image size, and uses this mapping in place of bilinear interpolation.

Disclosure of Invention

To overcome these shortcomings of the prior art, the invention provides a track slab crack semantic segmentation model based on dependency decoding that effectively obtains the specific location of cracks in an image.

In order to achieve the above purpose, the technical solution for solving the technical problem is as follows:

A track slab crack semantic segmentation model based on dependency decoding comprises the following steps:

step S1: acquiring a track slab crack image on site;

step S2: creating semantic segmentation labels for the acquired images, and dividing the crack images and their corresponding labels into a training set and a test set;

step S3: feeding the crack image labels of the training set into a restricted Boltzmann machine, which maps them from a high dimension to a low dimension, and using its reconstruction mechanism to learn the mapping parameters from the low dimension back to the high dimension;

step S4: constructing a track slab semantic segmentation model, normalizing the track slab images of the training set, and feeding them iteratively into the model, which extracts features from the track slab images;

step S5: sending the feature map extracted in step S4 to the hidden layer of the restricted Boltzmann machine, restoring it to the original size using the reconstruction parameters, and performing pixel-level prediction.

Further, for the crack images obtained in step S2, the Labelme tool is used to assist in creating crack region labels. Each resulting label image is a binary matrix of the same size as the image, in which 0 denotes the background region and 1 denotes the crack region.

Further, the training images in the training set of step S2 have a spatial size of $H \times W$. The corresponding label, an $H \times W$ binary matrix, is converted by one-hot encoding into a sparse matrix $Y \in \{0,1\}^{H \times W \times N}$, where N is the number of categories in the semantic segmentation task (here N = 2). The final matrix obtained in step S5 should therefore be as close as possible to this label matrix.
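The one-hot conversion can be sketched in NumPy as follows. This is a minimal illustration, assuming labels are stored as integer $H \times W$ arrays with 0 = background and 1 = crack; the helper name `one_hot_label` is hypothetical and not taken from the patent.

```python
import numpy as np


def one_hot_label(label: np.ndarray, num_classes: int = 2) -> np.ndarray:
    """Convert an H x W integer label map into a one-hot tensor Y of shape (H, W, N).

    Hypothetical helper: 0 = background, 1 = crack, N = num_classes.
    """
    if label.ndim != 2:
        raise ValueError("label must be a 2-D H x W matrix")
    # Indexing the N x N identity with the label picks row label[y, x] for every pixel.
    return np.eye(num_classes, dtype=np.uint8)[label.astype(np.int64)]


label = np.array([[0, 1, 1],
                  [0, 0, 1]])
Y = one_hot_label(label)
print(Y.shape)  # (2, 3, 2)
```

Channel 1 of Y reproduces the crack mask and channel 0 its complement, so every pixel has exactly one active class.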

Further, in step S3, the mapping from the training features to the pixel-level prediction labels is performed by a restricted Boltzmann machine (RBM). The RBM learns without labels (unsupervised) and has no intra-layer connections. The label map of a training image serves as the visible layer of the RBM, and the values obtained from a linear transformation followed by a nonlinear activation function serve as the hidden layer nodes. In the reconstruction stage, the hidden-layer values are passed through the same (transposed) weights w with a different bias to obtain an approximate reconstruction. The weights and biases are updated by computing the reconstruction loss so that this approximation approaches the original input. The RBM is specified by the formulas below, where v is the visible-layer input, w is a linear matrix that projects the high-dimensional v to the low-dimensional h, α and β are the biases of the projection and the reconstruction respectively, σ is the sigmoid function, and r is the reconstructed value. The difference between r and v is computed, the parameters are updated, and the trained w and β are finally saved:

$$h = \sigma(w \cdot v + \alpha), \qquad r = w^{\top} \cdot h + \beta$$
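The two formulas above can be read as a tied-weight encoder/decoder. The sketch below trains w, α and β on flattened label maps by plain gradient descent on the squared error between r and v. The loss, optimizer, learning rate and initialization are assumptions, since the text does not name them; a classical RBM would typically be trained with contrastive divergence instead. The names `train_projection` and `reconstruct` are hypothetical.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def reconstruct(v, w, alpha, beta):
    """r = w^T . sigma(w . v + alpha) + beta, applied column-wise."""
    return w.T @ sigmoid(w @ v + alpha) + beta


def train_projection(v, n_hidden, lr=0.1, epochs=500, seed=0):
    """Fit w, alpha, beta so that the reconstruction r approaches v.

    v: (n_visible, n_samples), one flattened label map per column.
    Assumed loss: L = 0.5 * ||r - v||^2 / n_samples, minimized by gradient descent.
    """
    rng = np.random.default_rng(seed)
    n_visible, m = v.shape
    w = rng.normal(0.0, 0.01, size=(n_hidden, n_visible))
    alpha = np.zeros((n_hidden, 1))
    beta = np.zeros((n_visible, 1))
    for _ in range(epochs):
        h = sigmoid(w @ v + alpha)        # projection: high dimension -> low dimension
        r = w.T @ h + beta                # reconstruction: low dimension -> high dimension
        d_r = (r - v) / m                 # dL/dr
        d_z = (w @ d_r) * h * (1.0 - h)   # back through r = w^T h, then through the sigmoid
        grad_w = h @ d_r.T + d_z @ v.T    # tied weights: w is used in both directions
        w -= lr * grad_w
        alpha -= lr * d_z.sum(axis=1, keepdims=True)
        beta -= lr * d_r.sum(axis=1, keepdims=True)
    return w, alpha, beta


rng = np.random.default_rng(1)
v = (rng.random((16, 20)) > 0.7).astype(float)        # 20 toy flattened 4x4 label maps
w0, a0, b0 = train_projection(v, n_hidden=4, epochs=0)  # untrained initialization
w1, a1, b1 = train_projection(v, n_hidden=4, epochs=500)
err0 = np.mean((reconstruct(v, w0, a0, b0) - v) ** 2)
err1 = np.mean((reconstruct(v, w1, a1, b1) - v) ** 2)
print(f"reconstruction MSE: {err0:.3f} -> {err1:.3f}")
```

Only w and β are needed later for decoding; α is used solely during this training stage.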

From the meaning of the label matrix in semantic segmentation, the i-th channel $Y^i \in \{0,1\}^{H \times W}$ of Y indicates the distribution of the i-th category in the image. $Y^i$ is flattened into a vector $v \in \{0,1\}^{(H \times W) \times 1}$ and sent to the RBM for unsupervised learning, and its projection matrix w is learned through forward propagation and reconstruction. Because each complete training run learns only the decoding transformation of the current class, each class must be trained separately to obtain its own projection matrix.
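The per-class preparation of the visible vectors can be sketched as below, assuming a batch of one-hot labels stored as an array of shape (n_images, H, W, N) and row-major flattening; both are assumptions, as the text does not fix a storage layout. Each returned matrix would feed a separate training run, yielding one projection matrix per class. The function name is hypothetical.

```python
import numpy as np


def per_class_visible_vectors(Y_batch: np.ndarray) -> list:
    """Split one-hot labels of shape (n_images, H, W, N) into N visible-layer matrices.

    Matrix i has shape (H*W, n_images); column k is Y^i of image k flattened
    (row-major) into v in {0,1}^{(H*W) x 1}.
    """
    n, H, W, N = Y_batch.shape
    return [Y_batch[..., i].reshape(n, H * W).T for i in range(N)]


labels = np.array([[[0, 1], [1, 1]],
                   [[0, 0], [0, 1]]])        # two toy 2x2 crack labels
Y_batch = np.eye(2, dtype=np.uint8)[labels]  # one-hot, shape (2, 2, 2, 2)
V = per_class_visible_vectors(Y_batch)
print(len(V), V[1].shape)                    # 2 (4, 2)
```

Keeping the classes in separate matrices matches the requirement that each class is trained independently.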

Further, in step S4, the feature maps obtained by downsampling feature extraction have a side length of 1/8 that of the original image, so the number of hidden layer nodes constructed in step S3 is 1/64 of the number of visible layer nodes; that is, the hidden layer has $(H/8) \times (W/8) = HW/64$ nodes and $w \in \mathbb{R}^{(HW/64) \times HW}$.
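The node-count arithmetic can be checked with a few lines of code. The 256 × 256 input size below is a hypothetical example, not a value given in the text; it shows how quickly the per-class matrix w grows with image size at a downsampling stride of 8.

```python
def decoder_shapes(H: int, W: int, stride: int = 8):
    """Return (visible nodes, hidden nodes, shape of w) for one class."""
    if H % stride or W % stride:
        raise ValueError("H and W are assumed to be multiples of the stride")
    n_visible = H * W                         # one visible node per label pixel
    n_hidden = (H // stride) * (W // stride)  # one hidden node per feature-map cell
    return n_visible, n_hidden, (n_hidden, n_visible)


n_v, n_h, w_shape = decoder_shapes(256, 256)
print(n_v, n_h, n_v // n_h, w_shape)  # 65536 1024 64 (1024, 65536)
```

For this size, each per-class w holds 1024 × 65536 = 67,108,864 entries.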

The saved w and β are used to project the last-layer features $F_{\text{last}}$ of the semantic segmentation model, giving the final pixel-level prediction E, where the argmax function selects the most probable category for each pixel. The prediction formula is:

$$E = \arg\max\left(w^{\top} \cdot F_{\text{last}} + \beta\right)$$
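A minimal NumPy sketch of this prediction step follows. It assumes, beyond what the text states explicitly, that $F_{\text{last}}$ has one channel per class, that each channel is projected with the w and β learned for that class in step S3, and that the argmax is taken over the class dimension. `dependency_decode` is a hypothetical name.

```python
import numpy as np


def dependency_decode(F_last, ws, betas, H, W):
    """Project low-resolution features back to H x W and take the per-pixel argmax.

    F_last:   (N, H//8, W//8) last-layer features, one channel per class (assumption).
    ws[i]:    (HW/64, HW) projection matrix learned for class i in step S3.
    betas[i]: (HW, 1) reconstruction bias for class i.
    Returns E: (H, W) array of predicted class indices.
    """
    scores = []
    for i, (w, beta) in enumerate(zip(ws, betas)):
        f = F_last[i].reshape(-1, 1)                   # flatten class-i channel: (HW/64, 1)
        scores.append((w.T @ f + beta).reshape(H, W))  # w^T f + beta at full resolution
    return np.argmax(np.stack(scores, axis=0), axis=0)  # most likely class per pixel


H = W = 16
rng = np.random.default_rng(0)
F_last = rng.normal(size=(2, H // 8, W // 8))
ws = [rng.normal(size=((H // 8) * (W // 8), H * W)) for _ in range(2)]
betas = [np.zeros((H * W, 1)), np.zeros((H * W, 1))]
E = dependency_decode(F_last, ws, betas, H, W)
print(E.shape)  # (16, 16)
```

Unlike bilinear interpolation, every output pixel here is a weighted combination of all HW/64 feature values of its class channel.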

During training of the semantic segmentation model, the upsampling parameters w and β are frozen, and the cross-entropy loss is computed to update only the weights and biases of the backbone network.
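The effect of freezing w and β can be illustrated as follows: the cross-entropy is back-propagated through the fixed projection to the backbone output $F_{\text{last}}$, while w and β themselves receive no update. This NumPy sketch uses the same one-channel-per-class assumption as above and a softmax over classes (also an assumption); in practice the gradient with respect to $F_{\text{last}}$ would continue into the backbone through a deep-learning framework's automatic differentiation. The function name is hypothetical.

```python
import numpy as np


def frozen_decoder_ce(F_last, ws, betas, Y):
    """Mean pixel-wise cross-entropy of the frozen decoder output against one-hot Y.

    F_last: (N, h, w) backbone features; ws[i]: (h*w, H*W); betas[i]: (H*W, 1);
    Y: (H, W, N) one-hot labels. Returns (loss, dL/dF_last). ws and betas are
    only read, never updated, mirroring the frozen upsampling parameters.
    """
    H, W, N = Y.shape
    logits = np.concatenate(
        [ws[i].T @ F_last[i].reshape(-1, 1) + betas[i] for i in range(N)], axis=1
    )                                                     # (H*W, N) class scores per pixel
    logits = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    p = np.exp(logits)
    p /= p.sum(axis=1, keepdims=True)                    # softmax over classes
    y = Y.reshape(H * W, N).astype(float)
    loss = -np.mean(np.sum(y * np.log(p + 1e-12), axis=1))
    d_logits = (p - y) / (H * W)                         # gradient of the mean cross-entropy
    grad_F = np.stack(
        [(ws[i] @ d_logits[:, i:i + 1]).reshape(F_last[i].shape) for i in range(N)]
    )                                                    # chain rule through w_i^T f_i
    return loss, grad_F


rng = np.random.default_rng(0)
H = W = 8
N = 2
F_last = rng.normal(size=(N, H // 8, W // 8))  # (2, 1, 1)
ws = [rng.normal(size=(1, H * W)) for _ in range(N)]
betas = [np.zeros((H * W, 1)) for _ in range(N)]
Y = np.eye(N)[rng.integers(0, N, size=(H, W))]
loss, grad_F = frozen_decoder_ce(F_last, ws, betas, Y)
```

Only `grad_F` is passed on to the backbone; no gradient for `ws` or `betas` is computed at all.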

Due to the adoption of the technical scheme, compared with the prior art, the invention has the following advantages and positive effects:

1. In the prior art, bilinear interpolation is too simple and only allows the semantic segmentation model to reach a suboptimal solution. The upsampling used by the invention is based on dependency decoding: whereas bilinear interpolation depends only on neighboring points, this method can capture global information during upsampling, and w can be updated flexibly and automatically to change how strongly surrounding pixel features influence the category of the target feature;

2. The invention decouples upsampling from the feature extraction of the semantic segmentation model, placing them in two separate models that are trained independently, and the upsampling factor can be customized.

Drawings

In order to more clearly illustrate the technical solutions of the embodiments of the present invention, the drawings used in the description of the embodiments will be briefly introduced below. It is obvious that the drawings in the following description are only some embodiments of the invention, and that for a person skilled in the art, other drawings can be derived from them without inventive effort. In the drawings:

FIG. 1 is a drawing of the Labelme labeling tool of the present invention;

FIG. 2 is an image of an actual crack and its corresponding crack label map in accordance with the present invention;

FIG. 3 is a schematic diagram of the projection of training features by the restricted Boltzmann machine of the present invention;

FIG. 4 is an overall architecture diagram of the model of the present invention.

Detailed Description

While the embodiments of the present invention will be described and illustrated in detail with reference to the accompanying drawings, it is to be understood that the invention is not limited to the specific embodiments disclosed, but is intended to cover various modifications, equivalents, and alternatives falling within the scope of the invention as defined by the appended claims.

The embodiment discloses a track slab crack semantic segmentation model based on dependency decoding, which comprises the following steps:

step S1: acquiring a track slab crack image on site;

Further, for the crack images obtained in step S2, the Labelme tool is used to assist in creating crack region labels, as shown in FIG. 1. Each resulting label image is a binary matrix of the same size as the image, in which 0 denotes the background region and 1 denotes the crack region. As shown in FIG. 2, the training images in the training set of step S2 have a spatial size of $H \times W$, and the corresponding label is converted by one-hot encoding into a sparse matrix $Y \in \{0,1\}^{H \times W \times N}$, where N is the number of categories in the semantic segmentation task (here N = 2). The final matrix obtained in step S5 should therefore be as close as possible to this label matrix.

Furthermore, in step S3, the mapping from the training features to the pixel-level prediction labels is performed by a restricted Boltzmann machine (RBM). The RBM learns without labels (unsupervised) and has no intra-layer connections, which helps it learn its weights and biases independently. The label map of a training image serves as the visible layer of the RBM, and the values obtained from a linear transformation followed by a nonlinear activation function serve as the hidden layer nodes; in this way the invention mimics the feature extraction performed by the semantic segmentation model. In the reconstruction stage, the hidden-layer values are passed through the same (transposed) weights w with a different bias to obtain an approximate reconstruction. The weights and biases are updated by computing the reconstruction loss so that this approximation approaches the original input. The RBM is specified by the formulas below, where v is the visible-layer input, w is a linear matrix that projects the high-dimensional v to the low-dimensional h, α and β are the biases of the projection and the reconstruction respectively, σ is the sigmoid function, and r is the reconstructed value. The difference between r and v is computed, the parameters are updated, and the trained w and β are finally saved:

$$h = \sigma(w \cdot v + \alpha), \qquad r = w^{\top} \cdot h + \beta$$

From the meaning of the label matrix in semantic segmentation, the i-th channel $Y^i \in \{0,1\}^{H \times W}$ of Y indicates the distribution of the i-th category in the image. $Y^i$ is flattened into a vector $v \in \{0,1\}^{(H \times W) \times 1}$ and sent to the RBM for unsupervised learning, and its projection matrix w is learned through forward propagation and reconstruction, as shown schematically in FIG. 3. Because each complete training run learns only the decoding transformation of the current class, each class must be trained separately to obtain its own projection matrix.

Further, in step S4, the feature maps obtained by downsampling feature extraction have a side length of 1/8 that of the original image, as is usual in general semantic segmentation models. This makes the number of hidden layer nodes constructed in step S3 equal to 1/64 of the number of visible layer nodes, that is, $(H/8) \times (W/8) = HW/64$. The saved w and β are then used to project the last-layer features $F_{\text{last}}$, and the pixel-level prediction is:

$$E = \arg\max\left(w^{\top} \cdot F_{\text{last}} + \beta\right)$$

In the dependency decoding-based track slab crack semantic segmentation model, whose overall structure is shown in FIG. 4, the specific position of a crack is determined while the crack is identified, which avoids failing to detect fine cracks accurately. Crack images are collected, the crack regions are annotated with the Labelme tool, and one-hot encoding is then used to produce the labels. The model consists of two parts: first, the label of each class is fed into a restricted Boltzmann machine to learn the reconstruction parameters of that single class; these parameters are then applied in the upsampling part of an FCN-based semantic segmentation model, replacing the original bilinear interpolation upsampling, to obtain pixel-level classification results. Compared with the prior art, the method can determine the class of each pixel from global information when restoring the feature map to the input size, is flexible, and allows the upsampling factor to be customized.

The above description is only for the preferred embodiment of the present invention, but the scope of the present invention is not limited thereto, and any changes or substitutions that can be easily conceived by those skilled in the art within the technical scope of the present invention are included in the scope of the present invention. Therefore, the protection scope of the present invention shall be subject to the protection scope of the claims.

Claims

1. A track slab crack semantic segmentation model based on dependency decoding is characterized by comprising the following steps:

step S1: acquiring a track slab crack image on site;

2. The dependency decoding-based track slab crack semantic segmentation model of claim 1, wherein a labeling tool is used to assist in labeling crack regions in the crack images obtained in step S2, and the resulting label image is a binary matrix of the same size as the image, where 0 denotes a background region and 1 denotes a crack region.

3. The dependency decoding-based track slab crack semantic segmentation model as claimed in claim 2, wherein the input size of the training images in the training set of step S2 is as follows

Corresponding label

4. The dependency decoding-based track slab crack semantic segmentation model according to claim 1, wherein in step S3, the mapping from the training features to the pixel-level prediction labels is performed by a restricted Boltzmann machine; the model learns without labels and has no intra-layer connections; the label map of a training image is used as the visible layer of the restricted Boltzmann machine, and the values obtained from a linear transformation followed by a nonlinear activation function are used as the hidden layer nodes; in the reconstruction stage of the restricted Boltzmann machine, the hidden-layer values are passed through the same (transposed) weights w with a different bias to obtain an approximate reconstruction; the weights and biases are updated by computing the reconstruction loss so that the approximation approaches the original input; the restricted Boltzmann machine is specified by the following formulas, where v is the visible-layer input, w is a linear matrix projecting the high-dimensional v to the low-dimensional h, α and β are the biases of the projection and the reconstruction respectively, σ is the sigmoid function, and r is the reconstructed value; the difference between r and v is computed, the parameters are updated, and the trained w and β are finally saved:

$$h = \sigma(w \cdot v + \alpha), \qquad r = w^{\top} \cdot h + \beta$$

as can be seen from the meaning of the label matrix in semantic segmentation,

5. The dependency decoding-based track slab crack semantic segmentation model as claimed in claim 1, wherein in step S4, the feature maps obtained by downsampling feature extraction have a side length of 1/8 that of the original image, so that the number of hidden layer nodes constructed in step S3 is 1/64 of the number of visible layer nodes, and the model is characterized in that:

$$E = \arg\max\left(w^{\top} \cdot F_{\text{last}} + \beta\right)$$