CN114973246A - Crack detection method of cross mode neural network based on optical flow alignment - Google Patents

Crack detection method of cross mode neural network based on optical flow alignment

Info

Publication number
CN114973246A
Authority
CN
China
Prior art keywords
multiplied
size
image
fusion
rgb
Prior art date
Legal status
Pending
Application number
CN202210643687.5A
Other languages
Chinese (zh)
Inventor
骆霖轩
韩晓东
黄嘉浩
吴欢娱
王嘉萁
陈采吟
Current Assignee
Minjiang University
Original Assignee
Minjiang University
Priority date
Filing date
Publication date
Application filed by Minjiang University filed Critical Minjiang University
Priority to CN202210643687.5A priority Critical patent/CN114973246A/en
Publication of CN114973246A publication Critical patent/CN114973246A/en
Pending legal-status Critical Current

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00Scenes; Scene-specific elements
    • G06V20/70Labelling scene content, e.g. deriving syntactic or semantic representations
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/20Image preprocessing
    • G06V10/25Determination of region of interest [ROI] or a volume of interest [VOI]
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/70Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/764Arrangements for image or video recognition or understanding using pattern recognition or machine learning using classification, e.g. of video objects
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/70Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/77Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
    • G06V10/774Generating sets of training patterns; Bootstrap methods, e.g. bagging or boosting
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/70Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/77Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
    • G06V10/80Fusion, i.e. combining data from various sources at the sensor level, preprocessing level, feature extraction level or classification level
    • G06V10/806Fusion, i.e. combining data from various sources at the sensor level, preprocessing level, feature extraction level or classification level of extracted features
    • YGENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02ATECHNOLOGIES FOR ADAPTATION TO CLIMATE CHANGE
    • Y02A10/00TECHNOLOGIES FOR ADAPTATION TO CLIMATE CHANGE at coastal zones; at river basins
    • Y02A10/40Controlling or monitoring, e.g. of flood or hurricane; Forecasting, e.g. risk assessment or mapping

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Evolutionary Computation (AREA)
  • Artificial Intelligence (AREA)
  • Software Systems (AREA)
  • General Health & Medical Sciences (AREA)
  • Computing Systems (AREA)
  • Multimedia (AREA)
  • Health & Medical Sciences (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Medical Informatics (AREA)
  • Computational Linguistics (AREA)
  • Databases & Information Systems (AREA)
  • Biomedical Technology (AREA)
  • Mathematical Physics (AREA)
  • General Engineering & Computer Science (AREA)
  • Molecular Biology (AREA)
  • Data Mining & Analysis (AREA)
  • Biophysics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Image Analysis (AREA)

Abstract

The invention provides a crack detection method of a cross-modal neural network based on optical flow alignment. The method is based on FAC-Net, an optical-flow-aligned cross-modal semantic segmentation neural network, and performs cross-modal crack detection by fusing RGB image and depth image feature information. It comprises the following steps: S0, constructing a data set and training the cross-modal characteristic neural network FAC-Net; S1, acquiring an RGB image and a depth image of a target to be detected; S2, passing the RGB image and the depth image of the same target through the pre-trained FAC-Net to obtain a classification result image, where the classification result comprises crack semantic pixels and background semantic pixels. The method fuses RGB image and depth image feature information and performs crack detection in a cross-modal way.

Description

Crack detection method of cross mode neural network based on optical flow alignment
Technical Field
The invention relates to the technical field of concrete structure crack identification, in particular to a crack detection method of a cross mode neural network based on optical flow alignment.
Background
Natural disasters such as floods, earthquakes and typhoons occur every year. They not only threaten human lives but also cause irreversible damage to concrete structures such as roads, bridges and buildings, so that an originally sound house can turn into a dangerous house with a severely damaged structure and unpredictable risks. Observing cracks in a building is a very important task in identifying dangerous houses, yet most existing dangerous-house surveys are carried out manually, which poses great risks to the life safety of survey personnel. Therefore, when searching for and judging the position of concrete cracks, a machine can be used to collect on-site images of the house, find the cracks through back-end processing and mark their approximate positions in the image, so that concrete cracks can be detected effectively and safely.
With the rapid development of deep learning, especially of image processing, object detection and computer vision technology, image-based non-destructive detection has become a research hotspot for defect detection in China and abroad. Most such detection methods adopt digital image processing techniques and machine learning algorithms and can detect some simple structural damage.
However, the prior art still cannot effectively solve some problems that arise in real scenes; in particular, cracks cannot be detected accurately in complex environments containing, for example, water stains or hand-written marks that are easily confused with cracks. This is because the above methods ignore the physical characteristic that distinguishes a crack from its background: compared with the surrounding environment, a crack is deep, and its depth changes rapidly and with large amplitude.
Optical flow is an important method for motion image analysis. The concept was first proposed by James J. Gibson in the 1940s and refers to the velocity of pattern motion in time-varying images. When an object moves, the brightness pattern of its corresponding points on the image also moves; this apparent motion of the image brightness pattern is the optical flow. The optical flow expresses the change of the image and, since it contains information about the movement of the object, can be used by an observer to determine that movement. The definition of optical flow extends to the optical flow field, a two-dimensional (2D) instantaneous velocity field formed by all pixel points in an image, where each two-dimensional velocity vector is the projection of the three-dimensional velocity vector of a visible point in the scene onto the imaging surface. The optical flow therefore contains not only motion information of the observed object but also rich information about the three-dimensional structure of the scene.
Disclosure of Invention
The invention provides a crack detection method of a cross mode neural network based on optical flow alignment, which can realize the cross-modal crack detection by fusing RGB image and depth image characteristic information.
The invention adopts the following technical scheme.
A crack detection method of a cross-modal neural network based on optical flow alignment is disclosed. The method is based on the optical-flow-aligned cross-modal semantic segmentation neural network FAC-Net and can perform cross-modal crack detection by fusing RGB image and depth image feature information; it comprises the following steps:
step S0, constructing a data set, training a cross-modal characteristic neural network FAC-Net,
s1, acquiring an RGB image and a depth image of a target to be detected;
s2, obtaining a classification result image from the RGB image and the depth image of the same target to be detected through a pre-trained cross-modal characteristic neural network FAC-Net, wherein the classification result comprises crack semantic pixels and background semantic pixels;
and identifying the crack region according to the crack semantic pixel and the background semantic pixel.
In the step S0, the specific method includes:
step S01, shooting the concrete object by using a camera with a depth perception technology and an RGB sensor, and acquiring the original image data of the concrete object to form a data set;
step S02, classifying and marking the data of the data set into a crack pixel class and a background pixel class;
step S03, dividing the data set into training set and testing set according to the proportion;
step S04, training the FAC-Net neural network by using a data set;
and step S05, arranging the trained neural network model on back-end equipment for crack detection.
In the step S01, the concrete objects include the concrete structure surfaces of roads, bridges and building structures;
in step S02, the types of the crack pixels and the background pixels can be expanded into a data set by data enhancement;
in step S03, the image data set includes RGB image data, depth image data, and tag data corresponding thereto.
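By way of illustration, a minimal sketch of how such a paired RGB/depth/label data set could be organized and split is shown below. PyTorch is assumed, and the folder layout (rgb/, depth/, label/), the file naming and the CrackDataset class are hypothetical details not taken from the patent:

```python
# Minimal sketch; folder layout and class names are assumptions, not part of the patent.
import os
import numpy as np
import torch
from PIL import Image
from torch.utils.data import Dataset, random_split

class CrackDataset(Dataset):
    """Paired RGB image, depth image and pixel-level crack/background label."""
    def __init__(self, root):
        self.root = root
        self.names = sorted(os.listdir(os.path.join(root, "rgb")))

    def __len__(self):
        return len(self.names)

    def __getitem__(self, i):
        name = self.names[i]
        rgb = np.array(Image.open(os.path.join(self.root, "rgb", name)).convert("RGB"))
        depth = np.array(Image.open(os.path.join(self.root, "depth", name)).convert("L"))
        label = np.array(Image.open(os.path.join(self.root, "label", name)).convert("L"))
        rgb = torch.from_numpy(rgb).permute(2, 0, 1).float() / 255.0   # 3 x H x W
        depth = torch.from_numpy(depth).unsqueeze(0).float() / 255.0   # 1 x H x W
        label = torch.from_numpy((label > 127).astype(np.int64))       # H x W, 1 = crack, 0 = background
        return rgb, depth, label

# 8:2 training/testing split, as described in the text.
full = CrackDataset("crack_dataset")                # hypothetical path
n_train = int(0.8 * len(full))
train_set, test_set = random_split(full, [n_train, len(full) - n_train])
```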
In step S1, a camera with depth perception technology and an RGB sensor is used to acquire an RGB image and a depth image of the target to be detected; the acquisition is not limited to any particular shooting mode, and an image is regarded as qualified as long as it is clear.
In step S2, the captured RGB image and depth image are transmitted to the back-end device through a network or a wired manner, and the back-end device inputs the RGB image and depth image into the trained FAC-Net neural network to obtain a corresponding concrete crack detection result.
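For illustration only, a minimal sketch of this back-end inference step is given below. PyTorch is assumed; `model` stands for any trained implementation of the FAC-Net described below, and the fixed 512 × 512 input size and the sigmoid threshold are assumptions based on the prediction-size description later in the text:

```python
import torch
import torch.nn.functional as F

@torch.no_grad()
def detect_cracks(model, rgb, depth, threshold=0.5):
    """Run a trained FAC-Net-style model on one RGB/depth pair.

    rgb: 1 x 3 x H x W tensor, depth: 1 x 1 x H x W tensor.
    Returns a boolean 512 x 512 mask (True = crack pixel, False = background).
    """
    model.eval()
    rgb = F.interpolate(rgb, size=(512, 512), mode="bilinear", align_corners=False)
    depth = F.interpolate(depth, size=(512, 512), mode="bilinear", align_corners=False)
    logits = model(rgb, depth)                     # 1 x 1 x 512 x 512 prediction map
    return (torch.sigmoid(logits) > threshold)[0, 0]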
The FAC-Net network structure is an encoder-decoder structure; the encoder is provided with 2 branches and 1 fusion area, wherein the branches are RGB branches and Depth branches respectively;
after each branch unit of the FAC-Net network, inputting the RGB characteristics and the Depth characteristics into an FAA fusion module to obtain fusion characteristics, and then adding the fusion characteristics to the original branch characteristics; after 4 units, inputting the highest level fusion features into a feature pyramid PPM to further extract features;
the decoder part uses an FA module and is used for enabling the low-resolution high-semantic feature image to flow to the high-resolution low-semantic feature image, and an output image of a classification result is obtained through 3 modules;
all branches of the encoder use a classical resnet50 network as a backbone network;
in the encoder backbone network structure, the first unit consists, in order, of a convolution layer, a maximum pooling layer and an integrated convolution layer; each subsequent unit consists of a three-layer integrated convolution structure.
In the encoder backbone network structure, the convolution layer of the first unit comprises 64 convolution kernels, the size of the convolution kernels is 7 x 7, the step length is 2, and the padding is 3; the pooling core size of the maximum pooling layer of the first unit is 3 × 3, and the step length is 2; the integrated convolutional layers of the first unit are of a three-layer structure, wherein the first convolutional layer comprises 64 convolution kernels, the size of the convolution kernels is 1 x 1, the second convolutional layer comprises 64 convolution kernels, the size of the convolution kernels is 3 x 3, the padding is 1, the third convolutional layer comprises 256 convolution kernels, and the size of the convolution kernels is 1 x 1;
the integrated convolutional layers of the second unit are of a three-layer structure, the first convolutional layer comprises 128 convolution kernels, the size of the convolution kernels is 1 x 1, the second convolutional layer comprises 128 convolution kernels, the size of the convolution kernels is 3 x 3, the padding is 1, the third convolutional layer comprises 512 convolution kernels, and the size of the convolution kernels is 1 x 1;
the comprehensive convolutional layer of the third unit is of a three-layer structure, the first convolutional layer comprises 256 convolutional kernels, the size of the convolutional kernels is 1 x 1, the second convolutional layer comprises 256 convolutional kernels, the size of the convolutional kernels is 3 x 3, the padding is 1, the third convolutional layer comprises 1024 convolutional kernels, and the size of the convolutional kernels is 1 x 1;
the integrated convolutional layer of the fourth unit has a three-layer structure, the first convolutional layer comprises 512 convolution kernels, the size of the convolution kernels is 1 x 1, the second convolutional layer comprises 512 convolution kernels, the size of the convolution kernels is 3 x 3, the padding is 1, the third convolutional layer comprises 2048 convolution kernels, and the size of the convolution kernels is 1 x 1.
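The unit structure described above follows the standard ResNet-50 bottleneck design. A minimal PyTorch sketch of one such three-layer integrated convolution block and of the first unit is given below; the batch normalization, the residual shortcut and the max-pool padding are standard ResNet-50 details assumed here rather than stated in the text:

```python
import torch.nn as nn

class Bottleneck(nn.Module):
    """Three-layer integrated convolution block: 1x1 reduce, 3x3, 1x1 expand."""
    def __init__(self, in_ch, mid_ch, out_ch, stride=1):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(in_ch, mid_ch, kernel_size=1, bias=False),            # e.g. 64 kernels, 1x1
            nn.BatchNorm2d(mid_ch), nn.ReLU(inplace=True),
            nn.Conv2d(mid_ch, mid_ch, kernel_size=3, stride=stride,
                      padding=1, bias=False),                               # e.g. 64 kernels, 3x3, padding 1
            nn.BatchNorm2d(mid_ch), nn.ReLU(inplace=True),
            nn.Conv2d(mid_ch, out_ch, kernel_size=1, bias=False),           # e.g. 256 kernels, 1x1
            nn.BatchNorm2d(out_ch),
        )
        # Projection shortcut when the channel count or resolution changes (ResNet convention).
        self.shortcut = (nn.Identity() if in_ch == out_ch and stride == 1 else
                         nn.Conv2d(in_ch, out_ch, kernel_size=1, stride=stride, bias=False))
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x):
        return self.relu(self.body(x) + self.shortcut(x))

# First unit of the RGB branch: 7x7/2 conv (64 kernels, padding 3), 3x3/2 max pool,
# then a bottleneck with 64/64/256 kernels, matching the sizes listed above.
# The Depth branch would use 1 input channel instead of 3 (assumption).
stem = nn.Sequential(
    nn.Conv2d(3, 64, kernel_size=7, stride=2, padding=3, bias=False),
    nn.MaxPool2d(kernel_size=3, stride=2, padding=1),   # padding 1 is a ResNet-50 assumption
    Bottleneck(64, 64, 256),
)
```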
In the detection method, the depth picture and the picture information of the RGB picture are aligned and then fused by using the optical flow characteristic of an FAA fusion module;
the FAA fusion module comprises an RGB branch and a Depth branch;
When the RGB branch operates, the RGB feature of the image is expressed as H × W × C (height × width × number of channels). The RGB feature first passes through the first convolution layer of the RGB branch, whose kernel size is 1 × 1 with stride 1 and padding 0; after this layer the feature map size becomes H × W × 1.
When the Depth branch operates, the Depth feature of the image is likewise expressed as H × W × C and passes through the first convolution layer of the Depth branch. The two convolved features are concatenated along the channel direction into an H × W × 2 feature, which is further processed by a fusion convolution layer with kernel size 3 × 3, stride 1 and padding 1 to obtain an H × W × 4 fused optical flow map.
Two channels of the fused optical flow map form the RGB-branch flow map and the other two form the Depth-branch flow map. The fused map is split into two H × W × 2 optical flow maps, and the original H × W × C RGB feature and the original H × W × C Depth feature are each corrected by their corresponding flow map. The corrected H × W × C RGB and Depth features are then fed into a spatial attention mechanism to further extract fusion features: average pooling and max pooling along the channel direction are applied to the two features, generating four H × W × 1 maps, which are concatenated along the channel direction into one H × W × 4 feature; this feature passes through a convolution layer with kernel size 3 × 3, stride 1 and padding 1 followed by a Sigmoid activation to obtain an H × W × 2 spatial attention weight matrix.
Finally, the H × W × 2 spatial attention weight matrix is split into two H × W × 1 weight matrices. The original H × W × C RGB feature and the original H × W × C Depth feature are multiplied by their corresponding weight matrices, the two weighted features are added, and the sum passes through a ReLU activation to give the H × W × C FAA fusion feature. The input RGB feature, the input Depth feature and the output FAA fusion feature all have the same size.
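A possible PyTorch sketch of the FAA fusion module described above is given below. The "feature correction through the optical flow map" is implemented here with a grid_sample-style warp, which is an assumption (the patent does not name the exact operator); following the text literally, the attention weights are applied to the original (un-warped) branch features:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def flow_warp(feat, flow):
    """Warp a B x C x H x W feature map with a B x 2 x H x W flow field.
    Assumption: the flow is interpreted as per-pixel offsets and applied with grid_sample."""
    b, _, h, w = feat.shape
    ys, xs = torch.meshgrid(torch.arange(h), torch.arange(w), indexing="ij")
    grid = torch.stack((xs, ys), dim=0).float().to(feat.device)       # 2 x H x W base grid
    grid = grid.unsqueeze(0) + flow                                    # add predicted offsets
    grid_x = 2.0 * grid[:, 0] / max(w - 1, 1) - 1.0                    # normalize to [-1, 1]
    grid_y = 2.0 * grid[:, 1] / max(h - 1, 1) - 1.0
    grid = torch.stack((grid_x, grid_y), dim=-1)                       # B x H x W x 2
    return F.grid_sample(feat, grid, align_corners=True)

class FAAFusion(nn.Module):
    """Sketch of the FAA fusion module; `channels` is the branch feature channel count C."""
    def __init__(self, channels):
        super().__init__()
        self.reduce_rgb = nn.Conv2d(channels, 1, kernel_size=1, stride=1, padding=0)
        self.reduce_depth = nn.Conv2d(channels, 1, kernel_size=1, stride=1, padding=0)
        self.flow_conv = nn.Conv2d(2, 4, kernel_size=3, stride=1, padding=1)  # H x W x 4 fused flow
        self.attn_conv = nn.Conv2d(4, 2, kernel_size=3, stride=1, padding=1)  # H x W x 2 attention

    def forward(self, rgb, depth):
        flow = self.flow_conv(torch.cat([self.reduce_rgb(rgb), self.reduce_depth(depth)], dim=1))
        flow_rgb, flow_depth = torch.split(flow, 2, dim=1)             # two H x W x 2 flow maps
        rgb_c = flow_warp(rgb, flow_rgb)                               # corrected RGB features
        depth_c = flow_warp(depth, flow_depth)                         # corrected Depth features
        # Spatial attention: mean and max along the channel direction of both corrected features.
        stats = torch.cat([rgb_c.mean(1, keepdim=True), rgb_c.amax(1, keepdim=True),
                           depth_c.mean(1, keepdim=True), depth_c.amax(1, keepdim=True)], dim=1)
        w_rgb, w_depth = torch.split(torch.sigmoid(self.attn_conv(stats)), 1, dim=1)
        # Per the text, the weights are applied to the original branch features, then summed.
        return F.relu(rgb * w_rgb + depth * w_depth)
```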
In the detection method, the FA module uses the skip connections between the encoder and the decoder and predicts a flow field by optical flow correction, so that the low-resolution, high-semantic feature image flows toward the high-resolution, low-semantic feature image and the detail features of the crack are preserved as far as possible. Specifically: the channel counts of the low-resolution high-semantic feature image and the high-resolution low-semantic feature image are each compressed to 2 by a first convolution layer with kernel size 1 × 1, stride 1 and padding 0; the H/2 × W/2 × 2 low-resolution high-semantic feature image is bilinearly interpolated to the same size as the H × W × 2 high-resolution low-semantic feature image; the two are then concatenated along the channel direction.
After concatenation, two fusion convolution layers with kernel size 3 × 3, stride 1 and padding 1 produce an H × W × 2 optical flow map and an H × W × 1 spatial attention weight matrix, respectively. The low-semantic feature image is multiplied by the spatial attention weight matrix, the high-semantic image is corrected through the optical flow map, and the two are added to obtain the combined feature.
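A corresponding sketch of the FA module is shown below, reusing the flow_warp helper from the FAA sketch above. The 1 × 1 channel-matching convolution on the high-semantic feature and the Sigmoid on the attention branch are assumptions added so that the final addition is dimensionally consistent; the text does not spell out either detail:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class FAModule(nn.Module):
    """Sketch of the FA decoder module: flow-align a low-resolution, high-semantic
    feature to a high-resolution, low-semantic feature and merge them."""
    def __init__(self, high_ch, low_ch):
        super().__init__()
        self.reduce = nn.Conv2d(high_ch, low_ch, kernel_size=1)        # assumed channel matching
        self.squeeze_high = nn.Conv2d(high_ch, 2, kernel_size=1, stride=1, padding=0)
        self.squeeze_low = nn.Conv2d(low_ch, 2, kernel_size=1, stride=1, padding=0)
        self.flow_conv = nn.Conv2d(4, 2, kernel_size=3, stride=1, padding=1)  # H x W x 2 flow map
        self.attn_conv = nn.Conv2d(4, 1, kernel_size=3, stride=1, padding=1)  # H x W x 1 attention

    def forward(self, high_sem, low_sem):
        h, w = low_sem.shape[-2:]
        # Compress both features to 2 channels and bring them to the same spatial size.
        hs = F.interpolate(self.squeeze_high(high_sem), size=(h, w),
                           mode="bilinear", align_corners=False)
        cat = torch.cat([hs, self.squeeze_low(low_sem)], dim=1)        # H x W x 4
        flow = self.flow_conv(cat)
        attn = torch.sigmoid(self.attn_conv(cat))                      # sigmoid assumed for weights
        # Warp the (upsampled, channel-matched) high-semantic feature with the flow,
        # weight the low-semantic feature with the spatial attention, and add.
        high_up = F.interpolate(self.reduce(high_sem), size=(h, w),
                                mode="bilinear", align_corners=False)
        return low_sem * attn + flow_warp(high_up, flow)               # flow_warp from the FAA sketch
```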
The method of predicting a flow field by optical flow correction makes the low-resolution, high-semantic feature image flow toward the high-resolution, low-semantic feature image. During prediction by the neural network, the feature sizes of the image change as follows: an RGB image of size 512 × 512 × 3 and a Depth picture of size 512 × 512 × 1 are input for detection; after fusion in the first unit, an image fusion feature of size 128 × 128 × 256 is obtained, and adding the fusion feature to the original features gives new RGB and Depth features of size 128 × 128 × 256; the second unit gives a fusion feature of size 64 × 64 × 512, and adding it to the original features gives new RGB and Depth features of size 64 × 64 × 512; the third unit gives a fusion feature of size 32 × 32 × 1024, and adding it to the original features gives new RGB and Depth features of size 32 × 32 × 1024; the fourth unit gives a fusion feature of size 16 × 16 × 2048.
The 16 × 16 × 2048 fusion feature is then further processed by the feature pyramid to obtain a new fusion feature of size 16 × 16 × 2048. The new 16 × 16 × 2048 fusion feature and the 32 × 32 × 1024 fusion feature are input into an FA module for optical flow correction, yielding a 32 × 32 × 1024 optical-flow-aligned fusion feature; this feature and the 64 × 64 × 512 fusion feature are input into an FA module, yielding a new 64 × 64 × 512 optical-flow-aligned fusion feature; this feature and the 128 × 128 × 256 fusion feature are input into an FA module, yielding a new 128 × 128 × 256 optical-flow-aligned fusion feature. The 128 × 128 × 256 optical-flow-aligned fusion feature is bilinearly interpolated and enlarged to 512 × 512 × 256, and then passed through a convolution layer with kernel size 1 × 1 and stride 1 to obtain the final 512 × 512 × 1 predicted image, whose pixel semantics are divided into two classes: crack pixels and background pixels.
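As a quick consistency check of the sizes listed above, the decoder path can be exercised with random tensors using the FAModule sketch from the previous block (untrained layers, shapes only; batch size 1; the numbers follow the text):

```python
import torch
import torch.nn.functional as F

f16 = torch.randn(1, 2048, 16, 16)      # unit-4 fusion feature after the PPM
f32 = torch.randn(1, 1024, 32, 32)      # unit-3 fusion feature
f64 = torch.randn(1, 512, 64, 64)       # unit-2 fusion feature
f128 = torch.randn(1, 256, 128, 128)    # unit-1 fusion feature

d32 = FAModule(2048, 1024)(f16, f32)    # -> 1 x 1024 x 32 x 32
d64 = FAModule(1024, 512)(d32, f64)     # -> 1 x 512 x 64 x 64
d128 = FAModule(512, 256)(d64, f128)    # -> 1 x 256 x 128 x 128

out = F.interpolate(d128, size=(512, 512), mode="bilinear", align_corners=False)
out = torch.nn.Conv2d(256, 1, kernel_size=1, stride=1)(out)   # 1 x 1 x 512 x 512 prediction
print(out.shape)
```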
In step S0, a pre-prepared crack image data set is sent to the cross-modal characteristic neural network FAC-Net for training. The crack image data set comprises N crack RGB images and the N corresponding crack depth images, and is divided into a test set and a training set in a 2:8 ratio; the training set is used to train FAC-Net, and the test set is used to observe how accurately the neural network model identifies cracks during training;
three indexes are set as comparison benchmarks for observing the neural network model, namely precision, recall and the F1 score; they are calculated as follows:
the parameter TP is the number of pixels predicted as positive (crack) that are actually positive;
the parameter FP is the number of pixels predicted as positive that are actually negative (background);
the parameter FN is the number of pixels predicted as negative that are actually positive;
the parameter TN is the number of pixels predicted as negative that are actually negative;
Precision = TP/(TP + FP)
Recall = TP/(TP + FN)
F1 score = 2 × precision × recall/(precision + recall).
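A small sketch of how these pixel-level metrics could be computed from a predicted crack mask and its label follows; it treats crack pixels as the positive class:

```python
import numpy as np

def pixel_metrics(pred, label):
    """Pixel-level precision, recall and F1 for binary crack masks (True = crack)."""
    pred = np.asarray(pred, dtype=bool)
    label = np.asarray(label, dtype=bool)
    tp = np.sum(pred & label)       # predicted crack, actually crack
    fp = np.sum(pred & ~label)      # predicted crack, actually background
    fn = np.sum(~pred & label)      # predicted background, actually crack
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1
```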
In this scheme, the FAA of the FAA fusion module is an abbreviation of Flow Alignment Attention Module; the feature pyramid PPM is an abbreviation of Pyramid Pooling Module; the FA module is an abbreviation of Flow Alignment Module.
Compared with the prior art, the invention has the technical advantages that:
1. cross-modal crack detection is achieved using RGB images and depth images as data.
The crack and the confusable background differ in physical characteristics: compared with the surrounding environment, a crack is deep, and its depth changes quickly and with large amplitude. Because the RGB image and the depth image are complementary, a cross-modal detection method that fuses them and combines the color semantic information of the RGB image with the depth semantic information of the depth image can distinguish cracks from a confusable background better than a method that uses the color semantic information of the RGB image alone.
2. It is proposed to align RGB information and depth information using an optical flow alignment method.
When the depth picture and the RGB picture are captured, the depth picture may suffer from objective effects such as edge blurring and misalignment with the RGB picture, so the information regions expressed by the two pictures are not consistent. Simple additive fusion would therefore mislead the neural network when it extracts crack-edge features. The FAA fusion module is designed to exploit the characteristics of optical flow: it aligns the information regions expressed by the depth picture and the RGB picture through optical flow alignment, so that better detail features are obtained when the two are fused.
The invention can carry out real-time crack marking and detection on the inner walls, the ground and the like of buildings such as factories, houses, commercial buildings and the like, and is mainly applied to the fields of dangerous house exploration, urban planning and the like.
The invention realizes a real-time crack semantic segmentation algorithm based on target detection. A real-time feature extraction module is added on the basis of the YOLOv5 target detection algorithm: when the size of a region of interest predicted by target detection is smaller than a threshold, the Otsu algorithm is used to rapidly detect the small region of interest; when it is larger than the threshold, the real-time semantic segmentation network FANet performs real-time slice-fusion detection on the large region of interest. The features detected in all regions of interest are mapped back to their positions in the original image to form the final crack feature map, so that the specific position of a crack is detected while real-time performance and accuracy are guaranteed.
Drawings
The invention is described in further detail below with reference to the following figures and detailed description:
FIG. 1 is a schematic diagram of the structure of a FAC-Net network;
FIG. 2 is a schematic structural diagram of a FAA fusion module;
FIG. 3 is a schematic diagram of a spatial attention module;
FIG. 4 is a schematic diagram of the FA module;
FIG. 5 is a schematic flow diagram of the present invention;
FIG. 6 is a pictorial illustration of a fracture picture dataset;
fig. 7 is a schematic diagram showing comparison between an image of an object to be detected and a detection result.
Detailed Description
As shown in the figures, the crack detection method of a cross-modal neural network based on optical flow alignment is based on the optical-flow-aligned cross-modal semantic segmentation neural network FAC-Net and performs cross-modal crack detection by fusing RGB image and depth image feature information; it comprises the following steps:
step S0, constructing a data set, training a cross-modal characteristic neural network FAC-Net,
s1, acquiring an RGB image and a depth image of a target to be detected;
s2, obtaining a classification result image from the RGB image and the depth image of the same target to be detected through a pre-trained cross-modal characteristic neural network FAC-Net, wherein the classification result comprises crack semantic pixels and background semantic pixels;
in the step S0, the specific method includes:
step S01, shooting the concrete object by using a camera with a depth perception technology and an RGB sensor, and acquiring the original image data of the concrete object to form a data set;
step S02, classifying and marking the data of the data set into a crack pixel class and a background pixel class;
step S03, dividing the data set into training set and testing set according to the proportion;
step S04, training the FAC-Net neural network by using a data set;
and step S05, arranging the trained neural network model on back-end equipment for crack detection.
In the step S01, the concrete objects include concrete structure surfaces of roads, bridges and buildings;
in step S02, the types of the crack pixels and the background pixels can be expanded into a data set by data enhancement;
in step S03, the image data set includes RGB image data, depth image data, and tag data corresponding thereto.
In step S1, a camera with depth perception technology and an RGB sensor is used to acquire an RGB image and a depth image of the target to be detected; the acquisition is not limited to any particular shooting mode, and an image is regarded as qualified as long as it is clear.
In step S2, the captured RGB image and depth image are transmitted to the back-end device through a network or a wired manner, and the back-end device inputs the RGB image and depth image into the trained FAC-Net neural network to obtain a corresponding concrete crack detection result.
As shown in fig. 1, the FAC-Net network structure is an encoder-decoder structure; the encoder is provided with 2 branches and 1 fusion area, wherein the branches are RGB branches and Depth branches respectively;
after each branch unit of the FAC-Net network, inputting the RGB characteristics and the Depth characteristics into an FAA fusion module to obtain fusion characteristics, and then adding the fusion characteristics to the original branch characteristics; after 4 units, inputting the highest level fusion features into a feature pyramid PPM to further extract features;
the decoder part uses an FA module and is used for enabling the low-resolution high-semantic feature image to flow to the high-resolution low-semantic feature image, and an output image of a classification result is obtained through 3 modules;
all branches of the encoder use a classical resnet50 network as a backbone network;
in the encoder backbone network structure, a first unit is sequentially provided with a convolution layer, a maximum pooling layer and a comprehensive convolution layer; the second unit is a comprehensive convolution layer three-layer structure; the third unit is a three-layer structure of a comprehensive rolling layer.
In the encoder backbone network structure, the convolution layer of the first unit comprises 64 convolution kernels, the size of the convolution kernels is 7 x 7, the step length is 2, and the padding is 3; the pooling core size of the maximum pooling layer of the first unit is 3 × 3, and the step length is 2; the integrated convolutional layers of the first unit are of a three-layer structure, wherein the first convolutional layer comprises 64 convolution kernels, the size of the convolution kernels is 1 x 1, the second convolutional layer comprises 64 convolution kernels, the size of the convolution kernels is 3 x 3, the padding is 1, the third convolutional layer comprises 256 convolution kernels, and the size of the convolution kernels is 1 x 1;
the integrated convolutional layers of the second unit are of a three-layer structure, the first convolutional layer comprises 128 convolution kernels, the size of the convolution kernels is 1 x 1, the second convolutional layer comprises 128 convolution kernels, the size of the convolution kernels is 3 x 3, the padding is 1, the third convolutional layer comprises 512 convolution kernels, and the size of the convolution kernels is 1 x 1;
the comprehensive convolutional layer of the third unit is of a three-layer structure, the first convolutional layer comprises 256 convolutional kernels, the size of the convolutional kernels is 1 x 1, the second convolutional layer comprises 256 convolutional kernels, the size of the convolutional kernels is 3 x 3, the padding is 1, the third convolutional layer comprises 1024 convolutional kernels, and the size of the convolutional kernels is 1 x 1;
the integrated convolutional layer of the fourth unit has a three-layer structure, the first convolutional layer comprises 512 convolution kernels, the size of the convolution kernels is 1 x 1, the second convolutional layer comprises 512 convolution kernels, the size of the convolution kernels is 3 x 3, the padding is 1, the third convolutional layer comprises 2048 convolution kernels, and the size of the convolution kernels is 1 x 1.
As shown in fig. 2, in the detection method, the depth picture and the picture information of the RGB picture are aligned by the optical flow characteristic of the FAA fusion module and then fused;
the FAA fusion module comprises an RGB branch and a Depth branch;
when RGB branches work, the expression mode of RGB characteristics of an image is H multiplied by W multiplied by C, namely the height multiplied by the width multiplied by the number of channels, the RGB characteristics firstly pass through a first convolution layer of the RGB branches, the size of a convolution kernel of the first convolution layer is 1 multiplied by 1, the step length is 1, the filling is 0, and the size of an image characteristic graph is changed into H multiplied by W multiplied by 1 after the first convolution layer processing;
when the Depth branch works, the expression mode of the Depth characteristics of the image is H multiplied by W multiplied by C, namely the height multiplied by the width multiplied by the number of channels, the Depth characteristics pass through a first convolution layer of the Depth branch, 2 characteristics after convolution are spliced in the channel direction to be H multiplied by W multiplied by 2, and a fusion convolution layer with the convolution kernel size of 3 multiplied by 3, the step length of 1 and the filling of 1 is further processed to obtain an H multiplied by W multiplied by 4 fusion light flow graph;
2 layers of the fused optical flow graph are RGB branch optical flow graphs, the other 2 layers of the fused optical flow graph are Depth branch optical flow graphs, then as shown in FIG. 3, the fused optical flow graph is split into 2H multiplied by W multiplied by 2 optical flow graphs, and firstly, the original RGB characteristics of H multiplied by W multiplied by C and the Depth characteristics of H multiplied by W multiplied by C are respectively subjected to characteristic correction through corresponding optical flow graphs; inputting corrected RGB (red, green and blue) features of H multiplied by W multiplied by C and Depth features of H multiplied by W multiplied by C into a correlation mechanism of a space attention module to further extract fusion features, specifically, carrying out averaging and maximum pooling operation on 2 features along a channel direction to generate 4 features with the size of H multiplied by W multiplied by 1, splicing the 4 features to one H multiplied by W multiplied by 4 feature along the channel direction, and then carrying out convolution kernel with the size of 3 multiplied by 3, the step length of 1, the convolution layer filled with 1 and a Sigmoid activation function to obtain a space attention weight matrix of H multiplied by W multiplied by 2;
finally, splitting the H multiplied by W multiplied by 2 spatial attention weight matrix into 2H multiplied by W multiplied by 1 spatial attention weight matrices; multiplying the original RGB characteristics of H multiplied by W multiplied by C and the original Depth characteristics of H multiplied by W multiplied by C with the corresponding weight matrix, then adding 2 weight-weighted characteristics, and finally outputting the obtained added characteristics through a ReLU activation function to obtain FAA fusion characteristics of H multiplied by W multiplied by C; the input RGB features, Depth features and output FAA fusion features are all the same size.
As shown in fig. 4, in the detection method, an FA module uses a skip connection between an encoder and a decoder and predicts a flow field by using an optical flow correction method to flow a low-resolution high-semantic feature image to a high-resolution low-semantic feature image, so as to retain the detail features of the crack as much as possible, and the specific method is as follows: compressing the number of channels of the low-resolution high-semantic feature image and the high-resolution low-semantic feature image to 2 through a first convolution layer with convolution kernel size of 1 multiplied by 1, step length of 1 and filling of 0, performing bilinear interpolation on the H/2 multiplied by W/2 multiplied by 2 low-resolution high-semantic feature image to enable the size of the low-resolution high-semantic feature image to be the same as that of the H multiplied by W multiplied by 2 high-resolution low-semantic feature image, and then splicing the low-resolution high-semantic feature image and the high-semantic feature image in the channel direction;
after splicing, 2 fusion convolution layers with convolution kernel size of 3 multiplied by 3, step length of 1 and filling of 1 are respectively processed to obtain H multiplied by W multiplied by 2 light flow graph and H multiplied by W multiplied by 1 space attention weight matrix; multiplying the low semantic feature image by a space attention weight matrix, carrying out feature correction on the high semantic image through an optical flow graph, and adding the two to obtain a combined feature.
The method for predicting the flow field by using the optical flow correction is used for enabling the low-resolution high-semantic feature image to flow to the high-resolution low-semantic feature image, and in the prediction process of the neural network, the feature size of the image changes as follows: inputting RGB images with the size of 512 multiplied by 3 and Depth pictures with the size of 512 multiplied by 1 for detection, obtaining image fusion characteristics with the size of 128 multiplied by 256 after fusion of the first unit, and obtaining new RGB characteristics and new Depth with the size of 128 multiplied by 256 after the addition of the fusion characteristics and the original characteristics; obtaining fusion characteristics with the size of 64 multiplied by 512 through a second unit, and obtaining new RGB characteristics and new Depth characteristics with the size of 64 multiplied by 512 after the addition of the fusion characteristics and the original characteristics; obtaining fusion characteristics with the size of 32 multiplied by 1024 through a third unit, and obtaining new RGB characteristics and new Depth characteristics with the size of 32 multiplied by 1024 after the addition of the fusion characteristics and the original characteristics; obtaining fusion features with the size of 16 multiplied by 2048 through a fourth unit;
then, further extracting the characteristics of the fusion characteristics with the size of 16 multiplied by 2048 through a characteristic pyramid to obtain new fusion characteristics with the size of 16 multiplied by 2048; inputting the new fusion features with the size of 16 multiplied by 2048 and the fusion features with the size of 32 multiplied by 1024 into an FA module for optical flow correction to obtain optical flow alignment fusion features with the size of 32 multiplied by 1024; inputting the optical flow alignment fusion features with the size of 32 multiplied by 1024 and the fusion features with the size of 64 multiplied by 512 into an FA module for optical flow correction to obtain new optical flow alignment fusion features with the size of 64 multiplied by 512; inputting the optical flow alignment fusion features with the size of 64 multiplied by 512 and the fusion features with the size of 128 multiplied by 256 into an FA module for optical flow correction to obtain new optical flow alignment fusion features with the size of 128 multiplied by 256; carrying out bilinear interpolation on the optical flow alignment fusion features with the size of 128 multiplied by 256 to enlarge the size of 512 multiplied by 256; and (3) subjecting the optical flow alignment fusion features with the size of 512 multiplied by 256 to convolution layers with the convolution kernel size of 1 multiplied by 1 and the step length of 1 to obtain a final 512 multiplied by 1 predicted image, wherein semantic information of the predicted image pixels is divided into two types of crack pixels and background pixels.
In step S0, a pre-prepared crack image data set is sent to the cross-modal characteristic neural network FAC-Net for training. The crack image data set comprises N crack RGB images and the N corresponding crack depth images, and is divided into a test set and a training set in a 2:8 ratio; the training set is used to train FAC-Net, and the test set is used to observe how accurately the neural network model identifies cracks during training;
three indexes are set as comparison benchmarks for observing the neural network model, namely precision, recall and the F1 score; they are calculated as follows:
the parameter TP is the number of pixels predicted as positive (crack) that are actually positive;
the parameter FP is the number of pixels predicted as positive that are actually negative (background);
the parameter FN is the number of pixels predicted as negative that are actually positive;
the parameter TN is the number of pixels predicted as negative that are actually negative;
Precision = TP/(TP + FP)
Recall = TP/(TP + FN)
F1 score = 2 × precision × recall/(precision + recall).
The following table compares the crack detection indexes of the proposed FAC-Net model and the classical semantic segmentation network U-Net under different environmental backgrounds:
[Table: crack detection indexes of FAC-Net versus U-Net under different environmental backgrounds; provided as an image in the original publication and not reproduced here.]
the data in the table show that the cross-mode neural network designed by the invention can better identify the characteristic difference between the crack and the confusable background under the condition of different environment backgrounds by using the cross-mode semantic segmentation neural network FAC-Net based on the alignment of the optical flow, thereby achieving a better crack detection effect and having stronger robustness.
In FIG. 7, a pixel region identified as a crack appears in the middle of both the left-hand and the right-hand image. It can be seen that FAC-Net can still exclude the various confusable information and accurately detect the position of the crack even when the background colors are relatively complex.
In the above, the FAA of the FAA fusion module is an abbreviation of Flow Alignment Attention Module; the feature pyramid PPM is an abbreviation of Pyramid Pooling Module; the FA module is an abbreviation of Flow Alignment Module.

Claims (10)

1. A crack detection method of a cross mode neural network based on optical flow alignment is characterized by comprising the following steps: the method is based on the cross mode semantic segmentation neural network FAC-Net of optical flow alignment, can perform crack detection in a cross mode by fusing RGB image and depth image characteristic information, and comprises the following steps;
step S0, constructing a data set, training a cross-modal characteristic neural network FAC-Net,
s1, acquiring an RGB image and a depth image of a target to be detected;
s2, obtaining a classification result image from the RGB image and the depth image of the same target to be detected through a pre-trained cross-modal characteristic neural network FAC-Net, wherein the classification result comprises crack semantic pixels and background semantic pixels;
in the step S0, the specific method includes:
step S01, shooting the concrete object by using a camera with a depth perception technology and an RGB sensor, and acquiring the original image data of the concrete object to form a data set;
step S02, classifying and marking the data of the data set into a crack pixel class and a background pixel class;
step S03, dividing the data set into training set and testing set according to the proportion;
step S04, training the FAC-Net neural network by using the data set;
and step S05, arranging the trained neural network model on back-end equipment for crack detection.
2. The method of claim 1, wherein the method comprises: in the step S01, the concrete objects include concrete structure surfaces of roads, bridges and buildings;
in step S02, the types of the crack pixels and the background pixels can be expanded into a data set by data enhancement;
in step S03, the image data set includes RGB image data, depth image data, and tag data corresponding thereto.
3. The method of claim 1, wherein the method comprises: in step S1, a camera with depth perception technology and an RGB sensor is used to acquire an RGB image and a depth image of the target to be detected; the acquisition is not limited to any particular shooting mode, and an image is regarded as qualified as long as it is clear.
4. The method of claim 3, wherein the method comprises: in step S2, the captured RGB image and depth image are transmitted to the back-end device through a network or a wired manner, and the back-end device inputs the RGB image and depth image into the trained FAC-Net neural network to obtain a corresponding concrete crack detection result.
5. The method of claim 1, wherein the method comprises: the FAC-Net network structure is an encoder-decoder structure; the encoder is provided with 2 branches and 1 fusion area, wherein the branches are RGB branches and Depth branches respectively;
after each branch unit of the FAC-Net network, inputting the RGB characteristics and the Depth characteristics into an FAA fusion module to obtain fusion characteristics, and then adding the fusion characteristics to the original branch characteristics; after 4 units, inputting the highest level fusion features into a feature pyramid PPM to further extract features;
the decoder part uses an FA module and is used for enabling the low-resolution high-semantic feature image to flow to the high-resolution low-semantic feature image, and an output image of a classification result is obtained through 3 modules;
all branches of the encoder use a classical resnet50 network as a backbone network;
in the encoder backbone network structure, the first unit consists, in order, of a convolution layer, a maximum pooling layer and an integrated convolution layer; each subsequent unit consists of a three-layer integrated convolution structure.
6. The method of claim 5, wherein the method comprises: in the encoder backbone network structure, the convolution layer of the first unit comprises 64 convolution kernels, the size of the convolution kernels is 7 x 7, the step length is 2, and the padding is 3; the pooling core size of the maximum pooling layer of the first unit is 3 × 3, and the step length is 2; the integrated convolutional layers of the first unit are of a three-layer structure, wherein the first convolutional layer comprises 64 convolution kernels, the size of the convolution kernels is 1 x 1, the second convolutional layer comprises 64 convolution kernels, the size of the convolution kernels is 3 x 3, the padding is 1, the third convolutional layer comprises 256 convolution kernels, and the size of the convolution kernels is 1 x 1;
the integrated convolutional layers of the second unit are of a three-layer structure, the first convolutional layer comprises 128 convolution kernels, the size of the convolution kernels is 1 x 1, the second convolutional layer comprises 128 convolution kernels, the size of the convolution kernels is 3 x 3, the padding is 1, the third convolutional layer comprises 512 convolution kernels, and the size of the convolution kernels is 1 x 1;
the comprehensive convolutional layer of the third unit is of a three-layer structure, the first convolutional layer comprises 256 convolutional kernels, the size of the convolutional kernels is 1 x 1, the second convolutional layer comprises 256 convolutional kernels, the size of the convolutional kernels is 3 x 3, the padding is 1, the third convolutional layer comprises 1024 convolutional kernels, and the size of the convolutional kernels is 1 x 1;
the integrated convolutional layer of the fourth unit has a three-layer structure, the first convolutional layer comprises 512 convolutional kernels, the size of the convolutional kernels is 1 × 1, the second convolutional layer comprises 512 convolutional kernels, the size of the convolutional kernels is 3 × 3, the padding is 1, the third convolutional layer comprises 2048 convolutional kernels, and the size of the convolutional kernels is 1 × 1.
7. The method of claim 5, wherein the method comprises: in the detection method, the depth picture and the picture information of the RGB picture are aligned and then fused by using the optical flow characteristic of an FAA fusion module;
the FAA fusion module comprises an RGB branch and a Depth branch;
when RGB branches work, the expression mode of RGB characteristics of an image is H multiplied by W multiplied by C, namely the height multiplied by the width multiplied by the number of channels, the RGB characteristics firstly pass through a first convolution layer of the RGB branches, the size of a convolution kernel of the first convolution layer is 1 multiplied by 1, the step length is 1, the filling is 0, and the size of an image characteristic graph is changed into H multiplied by W multiplied by 1 after the first convolution layer processing;
when the Depth branch works, the expression mode of the Depth feature of the image is H multiplied by W multiplied by C, namely the height multiplied by the width multiplied by the channel number, the Depth feature passes through a first convolution layer of the Depth branch, 2 features after convolution are spliced in the channel direction to be H multiplied by W multiplied by 2, and a fusion convolution layer with the convolution kernel size of 3 multiplied by 3, the step length of 1 and the filling of 1 is further processed to obtain an H multiplied by W multiplied by 4 fusion light flow graph;
2 layers of the fused optical flow graph are RGB branch optical flow graphs, the other 2 layers of the fused optical flow graph are Depth branch optical flow graphs, then the fused optical flow graph is split into 2H multiplied by W multiplied by 2 optical flow graphs, and the original RGB characteristics of H multiplied by W multiplied by C and the original Depth characteristics of H multiplied by W multiplied by C are respectively subjected to characteristic correction through corresponding optical flow graphs; inputting corrected RGB (red, green and blue) features of H multiplied by W multiplied by C and Depth features of H multiplied by W multiplied by C into a correlation mechanism of a space attention module to further extract fusion features, specifically, carrying out averaging and maximum pooling operation on 2 features along a channel direction to generate 4 features with the size of H multiplied by W multiplied by 1, splicing the 4 features to one H multiplied by W multiplied by 4 feature along the channel direction, and then carrying out convolution kernel with the size of 3 multiplied by 3, the step length of 1, the convolution layer filled with 1 and a Sigmoid activation function to obtain a space attention weight matrix of H multiplied by W multiplied by 2;
finally, splitting the H multiplied by W multiplied by 2 spatial attention weight matrix into 2H multiplied by W multiplied by 1 spatial attention weight matrices; multiplying the original RGB characteristics of H multiplied by W multiplied by C and the original Depth characteristics of H multiplied by W multiplied by C with the corresponding weight matrix, then adding 2 weight-weighted characteristics, and finally outputting the obtained added characteristics through a ReLU activation function to obtain FAA fusion characteristics of H multiplied by W multiplied by C; the input RGB features, Depth features and output FAA fusion features are all the same size.
8. The method of claim 7, wherein the method comprises: in the detection method, an FA module predicts a flow field by using a method of optical flow correction through jump connection of an encoder and a decoder to enable a low-resolution high-semantic feature image to flow to a high-resolution low-semantic feature image, so that the detail features of a crack are reserved as much as possible, and the specific method comprises the following steps: compressing the number of channels of the low-resolution high-semantic feature image and the high-resolution low-semantic feature image to 2 through a first convolution layer with convolution kernel size of 1 multiplied by 1, step length of 1 and filling of 0, performing bilinear interpolation on the H/2 multiplied by W/2 multiplied by 2 low-resolution high-semantic feature image to enable the size of the low-resolution high-semantic feature image to be the same as that of the H multiplied by W multiplied by 2 high-resolution low-semantic feature image, and then splicing the low-resolution high-semantic feature image and the high-semantic feature image in the channel direction;
after splicing, obtaining an H multiplied by W multiplied by 2 light flow graph and an H multiplied by W multiplied by 1 space attention weight matrix through 2 fusion convolution layers with convolution kernel size of 3 multiplied by 3, step length of 1 and filling of 1; multiplying the low semantic feature image by a space attention weight matrix, carrying out feature correction on the high semantic image through an optical flow graph, and adding the two to obtain a combined feature.
9. The method of claim 8, wherein the method comprises: the method for predicting the flow field by using the optical flow correction is used for enabling the low-resolution high-semantic feature image to flow to the high-resolution low-semantic feature image, and in the prediction process of the neural network, the feature size of the image changes as follows: inputting RGB images with the size of 512 multiplied by 3 and Depth pictures with the size of 512 multiplied by 1 for detection, obtaining image fusion characteristics with the size of 128 multiplied by 256 after fusion of the first unit, and obtaining new RGB characteristics and new Depth with the size of 128 multiplied by 256 after the addition of the fusion characteristics and the original characteristics; obtaining a fusion feature with the size of 64 multiplied by 512 through a second unit, and obtaining a new RGB feature and a new Depth feature with the size of 64 multiplied by 512 through the addition of the fusion feature and the original feature; obtaining fusion characteristics with the size of 32 multiplied by 1024 through a third unit, and obtaining new RGB characteristics and new Depth characteristics with the size of 32 multiplied by 1024 after the fusion characteristics and the original characteristics are added; obtaining fusion features with the size of 16 multiplied by 2048 through a fourth unit;
then further extracting features from the 16×16×2048 fusion features through a feature pyramid to obtain new fusion features of size 16×16×2048; inputting the new 16×16×2048 fusion features and the 32×32×1024 fusion features into an FA module for optical flow correction to obtain optical-flow-aligned fusion features of size 32×32×1024; inputting the 32×32×1024 optical-flow-aligned fusion features and the 64×64×512 fusion features into an FA module to obtain new optical-flow-aligned fusion features of size 64×64×512; inputting the 64×64×512 optical-flow-aligned fusion features and the 128×128×256 fusion features into an FA module to obtain new optical-flow-aligned fusion features of size 128×128×256; enlarging the 128×128×256 optical-flow-aligned fusion features to 512×512×256 by bilinear interpolation; and passing the 512×512×256 optical-flow-aligned fusion features through a convolution layer with a 1×1 kernel and stride 1 to obtain the final 512×512×1 predicted image, in which each pixel is classified as either a crack pixel or a background pixel.
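Distilled from this claim, the following trace lists the feature sizes along the prediction path. The square spatial sizes are inferred from the 512-pixel input and the stated downsampling, and the stage names are placeholders rather than names used in the patent.

```python
# Feature-size trace of the prediction path in claim 9 (batch dimension omitted,
# sizes written H x W x C). Stage names are illustrative, not from the patent.
SHAPE_TRACE = {
    "input_rgb":    (512, 512, 3),
    "input_depth":  (512, 512, 1),
    "unit1_fusion": (128, 128, 256),
    "unit2_fusion": (64, 64, 512),
    "unit3_fusion": (32, 32, 1024),
    "unit4_fusion": (16, 16, 2048),
    "pyramid_out":  (16, 16, 2048),
    "fa1_out":      (32, 32, 1024),   # FA(pyramid_out, unit3_fusion)
    "fa2_out":      (64, 64, 512),    # FA(fa1_out, unit2_fusion)
    "fa3_out":      (128, 128, 256),  # FA(fa2_out, unit1_fusion)
    "upsampled":    (512, 512, 256),  # bilinear interpolation, x4
    "prediction":   (512, 512, 1),    # 1x1 convolution; crack vs. background per pixel
}
```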
10. The method of claim 1, wherein: in step S0, a pre-prepared crack image dataset is fed into the cross-modal neural network FAC-Net for training; the crack image dataset comprises N crack RGB images and N corresponding crack depth images and is divided into a test set and a training set in a ratio of 2:8 (a sketch of this split appears after this claim); the training set is used to train FAC-Net, and the test set is used to observe the accuracy of the neural network model in identifying cracks during training;
setting three indexes as benchmarks for evaluating the neural network model, namely precision, recall and F1 score; the indexes are calculated as follows:
the parameter TP is the number of samples predicted as positive that are actually positive;
the parameter FP is the number of samples predicted as positive that are actually negative;
the parameter FN is the number of samples predicted as negative that are actually positive;
the parameter TN is the number of samples predicted as negative that are actually negative;
precision = TP/(TP + FP)
recall = TP/(TP + FN)
F1 score = 2 × precision × recall/(precision + recall).
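A short sketch of these three indexes in Python; the function name, signature and zero-division guards are illustrative assumptions, not part of the claim.

```python
def crack_metrics(tp: int, fp: int, fn: int) -> dict:
    """Precision, recall and F1 score from the confusion counts in claim 10."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return {"precision": precision, "recall": recall, "f1": f1}

# Example: 900 crack pixels correctly found, 100 false alarms, 50 missed
# -> precision 0.9, recall ~0.947, F1 ~0.923
print(crack_metrics(tp=900, fp=100, fn=50))
```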
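Returning to the 2:8 split at the start of claim 10, a minimal sketch under the assumption that each sample is an (RGB path, depth path, label path) triple and that a fixed random seed is acceptable:

```python
import random

def split_crack_dataset(samples, test_ratio=0.2, seed=0):
    """Shuffle and split (rgb_path, depth_path, label_path) triples into a
    training set and a test set in the 2:8 (test:train) proportion of claim 10."""
    items = list(samples)
    random.Random(seed).shuffle(items)
    n_test = int(round(len(items) * test_ratio))
    return items[n_test:], items[:n_test]   # (training set, test set)
```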
CN202210643687.5A 2022-06-09 2022-06-09 Crack detection method of cross mode neural network based on optical flow alignment Pending CN114973246A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210643687.5A CN114973246A (en) 2022-06-09 2022-06-09 Crack detection method of cross mode neural network based on optical flow alignment

Publications (1)

Publication Number Publication Date
CN114973246A true CN114973246A (en) 2022-08-30

Family

ID=82961064

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210643687.5A Pending CN114973246A (en) 2022-06-09 2022-06-09 Crack detection method of cross mode neural network based on optical flow alignment

Country Status (1)

Country Link
CN (1) CN114973246A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116823819A (en) * 2023-08-28 2023-09-29 常熟理工学院 Weld surface defect detection method, system, electronic equipment and storage medium
CN116823819B (en) * 2023-08-28 2023-11-07 常熟理工学院 Weld surface defect detection method, system, electronic equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination