CN114972882A - Wear surface damage depth estimation method and system based on multi-attention mechanism - Google Patents

Wear surface damage depth estimation method and system based on multi-attention mechanism

Info

Publication number
CN114972882A
CN114972882A (application CN202210689847.XA); granted publication CN114972882B
Authority
CN
China
Prior art keywords
wear surface
depth
damage
relu
estimation
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202210689847.XA
Other languages
Chinese (zh)
Other versions
CN114972882B (en)
Inventor
王硕
邵涛
武通海
王青华
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Xian Jiaotong University
Original Assignee
Xian Jiaotong University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Xian Jiaotong University filed Critical Xian Jiaotong University
Priority to CN202210689847.XA
Publication of CN114972882A
Application granted
Publication of CN114972882B
Legal status: Active
Anticipated expiration

Classifications

    • G06V 10/764: Image or video recognition or understanding using pattern recognition or machine learning; classification, e.g. of video objects
    • G06N 3/045: Neural network architectures; combinations of networks
    • G06N 3/084: Neural network learning methods; backpropagation, e.g. using gradient descent
    • G06V 10/26: Image preprocessing; segmentation of patterns in the image field; cutting or merging of image elements to establish the pattern region, e.g. clustering-based techniques; detection of occlusion
    • G06V 10/44: Local feature extraction by analysis of parts of the pattern, e.g. by detecting edges, contours, loops, corners, strokes or intersections; connectivity analysis, e.g. of connected components
    • G06V 10/7715: Feature extraction, e.g. by transforming the feature space, e.g. multi-dimensional scaling [MDS]; mappings, e.g. subspace methods
    • G06V 10/774: Generating sets of training patterns; bootstrap methods, e.g. bagging or boosting
    • G06V 10/80: Fusion, i.e. combining data from various sources at the sensor level, preprocessing level, feature extraction level or classification level

Abstract

The invention discloses a wear surface damage depth estimation method and system based on a multi-attention mechanism. A wear surface basic feature extraction layer is constructed by taking the four convolution blocks of a ResNet-50 encoding layer as the backbone and combining them with two Conv-ReLU convolution blocks; a damaged area segmentation branch network and a depth information estimation branch network are fused to form a wear surface depth estimation model; the loss function of the wear surface depth estimation model is obtained by weighting; wear surface images with typical damaged areas are selected as training samples, the loss function is taken as the optimization target, the wear surface depth estimation model is trained by an adaptive moment estimation method, and a single wear surface image is input into the trained model to obtain a damaged area segmentation result map and a depth information result map of the wear surface. The method and system effectively estimate three-dimensional depth information from a single wear surface image, and solve the problems of difficult, inefficient and complex depth information acquisition in the technical field of wear surface analysis.

Description

Wear surface damage depth estimation method and system based on multi-attention mechanism
Technical Field
The invention belongs to the technical field of wear surface analysis within machine fault diagnosis, and particularly relates to a wear surface damage depth estimation method and system based on a multi-attention mechanism.
Background
The wear of friction pairs reduces the operational reliability and stability of mechanical equipment and may even lead to serious failures. The wear surface is a direct product of the wear process, and the morphology of its damaged areas characterizes the wear evolution mechanism and the wear severity. Wear surface analysis is therefore considered the most direct and reliable technical means for condition monitoring and fault diagnosis of critical tribological systems. Driven by the "predict and prevent" maintenance concept for mechanical equipment, wear surface analysis is rapidly developing toward in-situ and three-dimensional measurement, and in-situ on-machine detection based on industrial endoscopes has become a main technical means. However, the accuracy and efficiency of wear surface analysis are restricted by the complicated shapes and inconsistent sizes of damage, and acquiring three-dimensional morphology information from a single two-dimensional surface image has become a difficult point in wear surface analysis research.
Machine-vision-based wear surface topography analysis extracts three-dimensional topography information from two-dimensional wear surface images. For example, taking wear surface images obtained by a scanning electron microscope as the research object, surface three-dimensional reconstruction has been realized with a multi-view geometric constraint method. A fusion of shape-from-shading and stereo vision enables estimation of wear surface depth information. A complex-wavelet-enhanced shape-from-shading method has been used to acquire the three-dimensional surface roughness of milled mechanical parts. In addition, photometric stereo vision has been applied innovatively to the extraction of three-dimensional topography information of in-situ wear surfaces. However, these methods depend on auxiliary information such as multiple views, multiple idealized assumptions and multiple light sources, while the complex industrial environment of mechanical equipment and the endoscope imaging system make it difficult to build multi-view systems and acquire the required images, limiting the application of such methods in endoscope-based wear surface detection. For this reason, three-dimensional topography reconstruction remains an important task for wear surface analysis.
In recent years, monocular depth estimation, which establishes a mapping between the pixel values and the depth values of a single two-dimensional image, has offered a promising research direction for depth estimation of a single wear surface. However, the problems of blurred damaged-area edges and inaccurate damage shape estimation remain in depth estimation of wear surface damaged areas. The key reason is that existing monocular depth estimation models process the entire wear surface with equal weight, whereas the damaged area occupies only a small part of the wear surface, so its three-dimensional morphology cannot be effectively reconstructed.
In general, the existing three-dimensional reconstruction techniques for wear surfaces have achieved a certain engineering effect in condition monitoring and fault diagnosis. However, owing to the limited endoscope volume and the complex imaging environment inside mechanical equipment, three-dimensional topography acquisition techniques relying on multi-view and multi-light-source assistance are of limited applicability, and single-image depth estimation models suffer from blurred edges of wear damage areas and inaccurate damage topography estimation, which reduces the accuracy of extracting the three-dimensional topography of wear surface damaged areas.
Disclosure of Invention
The technical problem to be solved by the present invention is to provide, in view of the defects in the prior art, a wear surface damage depth estimation method and system based on a multiple attention mechanism, in which the multiple attention mechanism is introduced into a U-net network architecture to extract feature maps that pay greater attention to the damaged areas of the wear surface, so as to realize damage depth information estimation from a single wear surface image and provide a more effective means of acquiring three-dimensional topography information for wear surface analysis.
The invention adopts the following technical scheme:
the method for estimating the damage depth of the wear surface based on the multi-attention machine system comprises the following steps:
S1, constructing a wear surface basic feature extraction layer by taking the four convolution blocks of a ResNet-50 encoding layer as the backbone and combining them with two Conv-ReLU convolution blocks;
S2, combining the U-Net architecture with the characteristics of wear surfaces, constructing a damaged area segmentation branch network and a depth information estimation branch network based on the wear surface basic feature extraction layer constructed in step S1, and fusing the two branch networks to form a wear surface depth estimation model;
S3, constructing a depth information mean square error loss, a damage detection cross entropy loss, an edge detection mean square error loss and a structure consistency loss function based on the damaged area segmentation network and the depth information estimation network, and obtaining the loss function of the wear surface depth estimation model constructed in step S2 by weighting;
S4, acquiring two-dimensional wear surface images and making corresponding damaged area marking maps and depth maps; selecting wear surface images with typical damaged areas as training samples, taking the loss function constructed in step S3 as the optimization target, training the wear surface depth estimation model constructed in step S2 by an adaptive moment estimation method, and taking a single wear surface image as the input of the trained wear surface depth estimation model to obtain a damaged area segmentation result map and a depth information result map of the wear surface.
Specifically, step S1 includes:
S101, constructing two convolution blocks with the standard Conv-ReLU structure to realize primary feature extraction of the wear surface image;
S102, extracting the four convolution blocks Res1, Res2, Res3 and Res4 of the ResNet-50 feature extraction network as high-level semantic feature extraction blocks for the wear surface, and combining them with the two Conv-ReLU convolution blocks of step S101 to establish the basic feature extraction network, realizing basic feature extraction of the wear surface image.
Further, in step S101, constructing the two convolution blocks with the standard Conv-ReLU structure specifically comprises:
The first and second convolution blocks adopt the Conv-ReLU structure for primary image feature extraction, and each contains two convolution layers. Each layer of the first convolution block performs convolution with 3 kernels of size 3×3 and stride 1, followed by a ReLU activation function for nonlinear feature mapping, yielding the F0 feature map. Each layer of the second convolution block performs convolution with 64 kernels of size 3×3, with strides of 2 and 1 respectively, followed by a ReLU activation function, yielding the F1 feature map; finally, a 64-channel feature map is output to the subsequent four convolution blocks.
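For illustration only, the following is a minimal PyTorch sketch of such a basic feature extraction layer, assuming torchvision's ResNet-50 supplies the Res1-Res4 blocks (layer1-layer4); the module names and any details not stated above are assumptions rather than the patented implementation.

```python
import torch
import torch.nn as nn
from torchvision.models import resnet50  # torchvision >= 0.13 assumed for the weights argument

class BasicFeatureExtractor(nn.Module):
    """Sketch of the wear surface basic feature extraction layer:
    two Conv-ReLU blocks followed by the four ResNet-50 blocks (Res1-Res4)."""
    def __init__(self, pretrained=True):
        super().__init__()
        # First Conv-ReLU block: two 3x3 convolutions, 3 channels, stride 1 -> F0
        self.block0 = nn.Sequential(
            nn.Conv2d(3, 3, 3, stride=1, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(3, 3, 3, stride=1, padding=1), nn.ReLU(inplace=True),
        )
        # Second Conv-ReLU block: two 3x3 convolutions, 64 channels, strides 2 and 1 -> F1
        self.block1 = nn.Sequential(
            nn.Conv2d(3, 64, 3, stride=2, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(64, 64, 3, stride=1, padding=1), nn.ReLU(inplace=True),
        )
        # Res1-Res4 taken from ResNet-50 (output channels 256, 512, 1024, 2048)
        backbone = resnet50(weights="IMAGENET1K_V1" if pretrained else None)
        self.res1, self.res2 = backbone.layer1, backbone.layer2
        self.res3, self.res4 = backbone.layer3, backbone.layer4

    def forward(self, x):
        f0 = self.block0(x)
        f1 = self.block1(f0)
        f2 = self.res1(f1)   # 256 channels
        f3 = self.res2(f2)   # 512 channels
        f4 = self.res3(f3)   # 1024 channels
        f5 = self.res4(f4)   # 2048 channels
        return f0, f1, f2, f3, f4, f5
```

In this sketch the F1 feature map feeds ResNet-50's layer1 directly, so the feature resolutions differ slightly from the canonical ResNet-50 stem; the text does not specify how this interface is handled, so that choice is an assumption.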
Specifically, in step S2, the damaged area segmentation branch network is established using convolution blocks with the Conv-BN-ReLU and Convtransp-BN-ReLU structures, and a damaged area segmentation result map matching the size of the two-dimensional wear surface is extracted; the depth information estimation branch network is established based on the U-Net network architecture to estimate the depth information of the damaged area.
Further, establishing the damaged area segmentation branch network specifically comprises:
Convolution layers with the Conv-BN-ReLU and Convtransp-BN-ReLU structures are connected in series. For the first structure, Conv-BN-ReLU, a convolution with a 3×3 kernel and stride 1 is performed first, the features are then normalized by a BN batch normalization operation to accelerate network convergence, and finally a ReLU activation function performs nonlinear feature mapping, yielding a noise-suppressed feature map. The second structure, Convtransp-BN-ReLU, differs from Conv-BN-ReLU only in that a 3×3 Convtransp deconvolution with stride 2 is used, enlarging the input feature map by a factor of 2. The resulting feature map is then concatenated with the feature map from the basic feature extraction layer. Finally, a convolution block with the Conv-ReLU structure maps the 32-channel P0 feature maps to the 3-channel damaged area segmentation map.
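A minimal PyTorch sketch of these two block types and of one decoder step of the segmentation branch is given below; the padding choices and the SegUpBlock/seg_head names are assumptions made for illustration, not the patented implementation.

```python
import torch
import torch.nn as nn

def conv_bn_relu(in_ch, out_ch):
    # Conv-BN-ReLU: 3x3 convolution, stride 1, batch normalization, ReLU
    return nn.Sequential(
        nn.Conv2d(in_ch, out_ch, 3, stride=1, padding=1),
        nn.BatchNorm2d(out_ch),
        nn.ReLU(inplace=True),
    )

def convtransp_bn_relu(in_ch, out_ch):
    # Convtransp-BN-ReLU: 3x3 transposed convolution, stride 2 (2x upsampling)
    return nn.Sequential(
        nn.ConvTranspose2d(in_ch, out_ch, 3, stride=2, padding=1, output_padding=1),
        nn.BatchNorm2d(out_ch),
        nn.ReLU(inplace=True),
    )

class SegUpBlock(nn.Module):
    """One decoder step of the damaged area segmentation branch (sketch):
    Conv-BN-ReLU + Convtransp-BN-ReLU (2x upsampling), then skip concatenation."""
    def __init__(self, in_ch, out_ch):
        super().__init__()
        self.refine = conv_bn_relu(in_ch, out_ch)
        self.up = convtransp_bn_relu(out_ch, out_ch)

    def forward(self, x, skip):
        x = self.up(self.refine(x))
        return torch.cat([x, skip], dim=1)  # skip connection with encoder features

# Final mapping of the 32-channel P0 feature map to the 3-class segmentation map
seg_head = nn.Sequential(nn.Conv2d(32, 3, 3, padding=1), nn.ReLU(inplace=True))
```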
Further, establishing the depth information estimation branch network based on the U-Net network architecture specifically comprises:
Convolution blocks with the Conv-BN-ReLU and Convtransp-BN-ReLU structures are adopted as the upsampling operation, enlarging the input feature map by a factor of 2 as the input of a coordinate attention module; the enlarged feature map is fed into the coordinate attention module to obtain a feature image containing position coordinate information, which is then concatenated with the feature map extracted by the efficient pyramid segmentation attention module; the feature image with local damaged area information is extracted from the encoding-layer feature map by the efficient pyramid segmentation attention module and fused and concatenated with the features extracted by the coordinate attention module, yielding a wear surface feature image with position information, spatial information and channel information.
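The sketch below illustrates one such decoder step in PyTorch. Because the text does not detail the internals of the coordinate attention (CA) and efficient pyramid segmentation attention (EPSA) modules, they are passed in as arguments; the DepthUpBlock name and the exact layer ordering are illustrative assumptions.

```python
import torch
import torch.nn as nn

class DepthUpBlock(nn.Module):
    """One decoder step of the depth estimation branch (sketch): 2x upsampling of the
    decoder features, coordinate attention on the result, EPSA attention on the
    corresponding encoder features, then concatenation. The CA and EPSA modules are
    injected because their internals are not specified here; any nn.Module with a
    matching channel count (e.g. nn.Identity() for a smoke test) can be substituted."""
    def __init__(self, in_ch, out_ch, ca_module, epsa_module):
        super().__init__()
        self.up = nn.Sequential(
            nn.Conv2d(in_ch, out_ch, 3, padding=1),
            nn.BatchNorm2d(out_ch), nn.ReLU(inplace=True),
            nn.ConvTranspose2d(out_ch, out_ch, 3, stride=2, padding=1, output_padding=1),
            nn.BatchNorm2d(out_ch), nn.ReLU(inplace=True),
        )
        self.ca = ca_module      # coordinate attention: position coordinate information
        self.epsa = epsa_module  # EPSA: local damaged area information from the encoder

    def forward(self, x, encoder_feat):
        x = self.ca(self.up(x))             # upsample by 2x, then coordinate attention
        skip = self.epsa(encoder_feat)      # attention on the encoder features
        return torch.cat([x, skip], dim=1)  # position + spatial + channel information
```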
Specifically, step S3 includes:
S301, constructing a depth estimation loss function $\mathcal{L}_{depth}$ with adaptive weight distribution for the damaged area;
S302, in view of the predicted-edge blurring caused by the large depth variation at the edge of the damaged area, and in order to improve the estimation accuracy of the damaged area edge depth values, constructing an edge detection loss function $\mathcal{L}_{edge}$ using a three-level two-dimensional Haar wavelet transform;
S303, dividing the wear surface into three classes (background, scratch and pit) and selecting a three-class cross entropy function as the loss function $\mathcal{L}_{seg}$ of the damaged area segmentation network branch;
S304, adopting a structure consistency loss function $\mathcal{L}_{ssim}$ to improve the similarity between the predicted depth map and the laser confocal microscope measurement;
S305, obtaining the loss function of the wear surface depth estimation model by weighted summation of the depth information mean square error loss obtained in step S301, the edge detection loss function obtained in step S302, the loss function of the damaged area segmentation network branch obtained in step S303 and the structure consistency loss function obtained in step S304.
In particular, the loss function $\mathcal{L}$ of the wear surface depth estimation model is:

$$\mathcal{L} = \mathcal{L}_{depth}\left(y, y^{*}\right) + \lambda\,\mathcal{L}_{edge}\left(y, y^{*}\right) + \mathcal{L}_{seg}\left(p, p^{*}\right) + \mathcal{L}_{ssim}\left(y, y^{*}\right)$$

where $y$ denotes the estimated wear surface depth map, $y^{*}$ the depth map measured by the laser confocal microscope, $p$ the predicted pixel class, $p^{*}$ the actual pixel class, and $\lambda$ the weight coefficient controlling the edge loss term of the depth map.
Specifically, step S4 includes:
S401, acquiring two-dimensional wear surface images with damaged areas, acquiring depth images of the corresponding areas and marking images of the corresponding damaged areas, and making the training samples and test samples;
S402, using the ResNet-50 network weights trained on the ImageNet data set as the encoder initialization weight parameters, and training the wear surface depth estimation model with the training samples made in step S401;
S403, setting the learning rate parameter, optimizing and training the wear surface depth estimation model of step S402 by the adaptive moment estimation method, and inputting the test samples made in step S401 into the optimally trained wear surface depth estimation model, realizing wear surface damaged area depth information estimation based on multi-attention surface feature extraction and damaged area localization guidance.
In a second aspect, an embodiment of the present invention provides a wear surface damage depth estimation system based on a multi-attention mechanism, comprising:
the extraction module is used for constructing a wear surface basic feature extraction layer by taking the four convolution blocks of a ResNet-50 encoding layer as the backbone and combining them with two Conv-ReLU convolution blocks;
the fusion module is used for combining the U-Net architecture with the characteristics of wear surfaces, constructing a damaged area segmentation branch network and a depth information estimation branch network based on the wear surface basic feature extraction layer constructed by the extraction module, and fusing the two branch networks as the wear surface depth estimation model;
the function module is used for constructing a depth information mean square error loss, a damage detection cross entropy loss, an edge detection mean square error loss and a structure consistency loss function based on the damaged area segmentation network and the depth information estimation network, and obtaining a loss function of the wear surface depth estimation model constructed by the fusion module in a weighting mode;
the estimation module is used for acquiring a two-dimensional wear surface image and making a corresponding damage area marking map and a depth map; selecting a wear surface image with a typical damage area as a training sample, taking a loss function constructed by a function module as an optimization target, training a wear surface depth estimation model constructed by a fusion module by adopting an adaptive moment estimation method, and taking a single wear surface image as the input of the trained wear surface depth estimation model to obtain a damage area segmentation result image and a depth information result image of the wear surface.
Compared with the prior art, the invention has at least the following beneficial effects:
the invention relates to a wear surface damage depth estimation method based on a multi-attention machine system, which takes a single two-dimensional wear surface image as a research object, selects a ResNet-50 coding layer as a basic coding layer trunk and extracts characteristic graphs of wear surfaces under different scales; establishing a damaged area segmentation branch network and a depth information estimation branch network by combining an efficient pyramid segmentation attention module (EPSA) module, a Coordinate Attention (CA) module and the like to detect a damaged area of the wear surface and estimate depth information of the damaged area; the network model adopts the weighted sum of four types of loss functions including depth information mean square error loss, damage detection cross entropy loss, edge detection mean square error loss and structure consistency loss as an overall loss function, parameter training is carried out by using an Adam optimization algorithm to obtain a final surface depth estimation model, the depth information estimation of a damaged area of a wear surface is realized, and more effective information is provided for wear mechanism and state analysis
Further, step S1 constructs a basic feature extraction network structure based on the Conv-ReLU convolution blocks and the ResNet-50 feature extraction convolution blocks, so as to extract semantic features of the wear surface, provide semantic feature maps of different scales for the subsequent damaged area segmentation network and depth estimation network, and help improve the damaged area depth estimation accuracy.
Further, step S101 adopts two Conv-ReLU convolution blocks for primary semantic feature extraction of the wear surface, extracting a 64-channel F1 feature map from the 3-channel damaged surface input and providing rich primary semantic features for the subsequent ResNet-50 feature extraction convolution blocks, which take the 64-channel F1 feature map up to the 2048-channel F5 feature map, thereby realizing multi-level semantic feature extraction of the wear surface.
Further, in step S2, a damaged region segmentation branch network is established by using convolution blocks with the structures of Conv-BN-ReLU and Convtransp-BN-ReLU, and a damaged region segmentation result map corresponding to the size of the two-dimensional wear surface is extracted, so as to provide more damaged region feature maps for the depth estimation network, constrain and position the damaged region, and improve the depth estimation accuracy of the depth estimation network for the damaged region; and establishing a depth information estimation branch network based on the U-Net network architecture, and fully utilizing the multi-level semantic features and the damaged area segmentation feature map in the step S1 to realize and improve the depth information estimation precision of the damaged area.
Further, a damage region segmentation branch network is constructed to obtain the position of the damage region and a corresponding characteristic diagram, so that rich damage region characteristic diagrams are provided for a subsequent depth estimation branch network.
Further, based on a U-Net network architecture, the multi-level semantic features extracted in step S1, the feature map extracted by the damaged region segmentation branch network, and the features extracted by the multi-attention mechanism module are continuously fused and superimposed, the feature map is continuously restored to the depth estimation result map with the same size as the input worn surface image by using upsampling, and meanwhile, higher damaged region texture detail information can be maintained.
Further, in step S3, a basic depth loss function $\mathcal{L}_{depth}$ for the damaged area is first constructed in step S301 to constrain the difference between the depth estimation result and the label depth map acquired by the LSCM; next, an edge detection loss function $\mathcal{L}_{edge}$ is added on the basis of step S301 to improve the sharpness of the estimated damaged area edges; after steps S301 and S302, a loss function $\mathcal{L}_{seg}$ is employed to classify the estimated pixels and segment the damaged area, so that the depth information of the damaged area can finally be estimated and extracted; finally, a structure consistency loss function $\mathcal{L}_{ssim}$ is adopted to improve the overall accuracy of the final estimated depth image of the damaged area.
Further, the final loss function is composed of the depth information mean square error loss, the damage detection cross entropy loss, the edge detection mean square error loss and the structure consistency loss function. The depth information mean square error loss $\mathcal{L}_{depth}$ is the loss function of the depth estimation branch and ensures the accuracy of the depth map estimation; the damage detection cross entropy loss $\mathcal{L}_{seg}$ accurately segments the damaged area; the edge detection mean square error loss $\mathcal{L}_{edge}$ is used to improve the edge sharpness of the damaged area in the depth map; and the structure consistency loss function is used to improve the overall topography accuracy of the final wear surface depth map.
Further, the wear surface damaged area marking maps and depth maps are made to serve as training data for the designed depth estimation network, and the loss function designed in step S3 is used as the optimization target to ensure that the result estimated by the final model is closer to the depth map acquired by the laser confocal microscope; the wear surface depth estimation model constructed in step S2 is trained by the adaptive moment estimation method to improve the convergence rate of the network, so that the depth estimation model can be trained quickly and a better depth estimation result can be obtained.
It is understood that the beneficial effects of the second aspect can be referred to the related description of the first aspect, and are not described herein again.
In conclusion, based on the monocular depth estimation model, the invention effectively estimates three-dimensional depth information from a single wear surface image and solves the problems of difficult, inefficient and complex depth information acquisition in the technical field of wear surface analysis.
The technical solution of the present invention is further described in detail by the accompanying drawings and embodiments.
Drawings
FIG. 1 is a framework diagram of a wear surface depth estimation step;
FIG. 2 is a flow chart of the production of a two-dimensional wear surface map, a wear surface depth map, and a damaged area signature map;
FIG. 3 is a three-dimensional visualization of typical wear surface damage maps and their corresponding depth maps, damage region signature maps, and depths;
FIG. 4 is a network architecture diagram of a wear surface damage region depth estimation model based on multiple self-attention mechanism fusion;
FIG. 5 is a comparison of loss values during model training with different attention mechanism models;
fig. 6 is a depth estimation result map of three types of wear surfaces and a corresponding depth three-dimensional visualization map.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are some, not all, embodiments of the present invention. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
In the description of the present invention, it should be understood that the terms "comprises" and/or "comprising" indicate the presence of the stated features, integers, steps, operations, elements, and/or components, but do not preclude the presence or addition of one or more other features, integers, steps, operations, elements, components, and/or groups thereof.
It is also to be understood that the terminology used in the description of the invention herein is for the purpose of describing particular embodiments only and is not intended to be limiting of the invention. As used in the specification of the present invention and the appended claims, the singular forms "a," "an," and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise.
It should be further understood that the term "and/or" as used in this specification and the appended claims refers to any and all possible combinations of one or more of the associated listed items, and including such combinations, e.g., a and/or B, may mean: a exists alone, A and B exist simultaneously, and B exists alone. In addition, the character "/" herein generally indicates that the former and latter related objects are in an "or" relationship.
It should be understood that although the terms first, second, third, etc. may be used to describe preset ranges, etc. in embodiments of the present invention, these preset ranges should not be limited to these terms. These terms are only used to distinguish preset ranges from each other. For example, the first preset range may also be referred to as a second preset range, and similarly, the second preset range may also be referred to as the first preset range, without departing from the scope of the embodiments of the present invention.
The word "if" as used herein may be interpreted as "at … …" or "when … …" or "in response to a determination" or "in response to a detection", depending on the context. Similarly, the phrases "if determined" or "if detected (a stated condition or event)" may be interpreted as "when determined" or "in response to a determination" or "when detected (a stated condition or event)" or "in response to a detection (a stated condition or event)", depending on the context.
Various structural schematics according to the disclosed embodiments of the invention are shown in the drawings. The figures are not drawn to scale, wherein certain details are exaggerated and possibly omitted for clarity of presentation. The shapes of various regions, layers and their relative sizes and positional relationships shown in the drawings are merely exemplary, and deviations may occur in practice due to manufacturing tolerances or technical limitations, and a person skilled in the art may additionally design regions/layers having different shapes, sizes, relative positions, according to actual needs.
The invention provides a wear surface damage depth estimation method based on a multi-attention mechanism, which takes a two-dimensional wear surface image as the research object, makes a corresponding damaged area marking map, and makes a corresponding standard depth map with a laser confocal microscope; on this basis, a dual-task wear surface depth estimation network model is constructed based on the U-Net framework; a ResNet-50 network serves as the basic feature extraction network to extract the wear surface basic features, which are input into the damaged area segmentation and depth information estimation dual-branch network; the feature maps of a coordinate attention (CA) module, an efficient pyramid segmentation attention (EPSA) module and the damaged area segmentation branch are merged into the depth estimation network branch to obtain feature maps containing damaged area information; the constructed network model adopts the weighted sum of the depth information mean square error loss, damage detection cross entropy loss, edge detection mean square error loss and structure consistency loss function as the overall loss function, and the final wear surface depth estimation model is obtained by training the model with the Adam optimization algorithm, realizing depth estimation of the wear surface damaged area; the method is based on a monocular depth estimation model, effectively estimates three-dimensional depth information from a single wear surface image, and solves the problems of difficult, inefficient and complex depth information acquisition in the technical field of wear surface analysis.
Referring to fig. 1, a depth estimation model of a wear surface damaged area based on the fusion of multiple self-attention mechanisms is shown. Three-dimensional morphology information of the wear surface is the basis of the abrasive particle analysis technique and directly affects the precision of wear surface damage detection and state identification. However, three-dimensional reconstruction methods based on multiple views, multiple light sources and the like are difficult to apply in the small and narrow measurement environment of an industrial endoscope, and wear surface damage depth estimation based on monocular depth estimation still suffers from inaccurate damaged area shape estimation and blurred edges, which greatly reduces the damage shape reconstruction accuracy of monocular depth estimation models in practical applications. In the invention, a multiple attention mechanism is integrated into the encoding layer of the depth estimation branch based on the U-Net network framework to improve the ability of the wear surface feature maps to characterize the damaged area; a damaged area segmentation branch is then introduced into the wear surface depth estimation model to provide additional damaged area localization feature maps for the depth estimation branch; finally, a dual-branch task model is constructed to realize detection of the wear surface damaged area and estimation of its depth information.
The wear surface damage depth estimation method based on the multi-attention mechanism of the invention comprises the following steps:
S1, constructing a wear surface basic feature extraction layer by taking the four convolution blocks of a ResNet-50 encoding layer as the backbone and combining them with two Conv-ReLU convolution blocks, so as to realize automatic extraction of multi-scale wear surface features as the feature input for step S2;
S101, constructing two convolution blocks with the standard Conv-ReLU structure to realize primary feature extraction of the wear surface image;
referring to fig. 2, the basic feature extraction layer includes six volume blocks, wherein the first and second volume blocks adopt convolution blocks with a structure of Conv-ReLU to realize primary feature extraction of the wear surface image;
the construction of two convolution blocks with the standard structure Conv-ReLU specifically comprises the following steps:
The first and second convolution blocks adopt the Conv-ReLU structure for primary image feature extraction, and each contains two convolution layers. Each layer of the first convolution block performs convolution with 3 kernels of size 3×3 and stride 1, followed by a ReLU activation function for nonlinear feature mapping, yielding the F0 feature map. Each layer of the second convolution block performs convolution with 64 kernels of size 3×3, with strides of 2 and 1 respectively, followed by a ReLU activation function, yielding the F1 feature map; finally, a 64-channel feature map is output to the subsequent four convolution blocks.
S102, extracting the four convolution blocks Res1, Res2, Res3 and Res4 of the ResNet-50 feature extraction network as high-level semantic feature extraction blocks for the wear surface, and combining them with the two Conv-ReLU convolution blocks of step S101 to establish the basic feature extraction network, realizing basic feature extraction of the wear surface image.
The last four convolution blocks adopt the four ResNet-50 blocks Res1, Res2, Res3 and Res4 for further wear surface feature extraction; the numbers of feature channels output by these blocks are 256, 512, 1024 and 2048 respectively, yielding the F2-F5 feature maps and realizing high-level feature extraction of the wear surface image.
S2, constructing a damaged area segmentation branch network and a depth information estimation branch network based on the wear surface basic feature extraction layer constructed in the step S1 by combining the U-Net structure and the wear surface characteristics, and fusing the damaged area segmentation branch network and the depth information estimation branch network to form a wear surface depth estimation model;
s201, establishing a damaged area segmentation branch by adopting convolution blocks with structures of Conv-BN-ReLU and Convtransp-BN-ReLU, and extracting a damaged area segmentation result graph corresponding to the size of the two-dimensional wear surface.
The damaged area segmentation branch is a decoder of the U-Net framework, and its forward propagation process is the reverse of the encoder. In constructing the decoder, each step applies an upsampling operation that enlarges the input feature layer by a factor of 2; after five upsampling operations and skip connection operations, the features are finally mapped into a damaged area segmentation result map of the same size as the wear surface image.
The specific steps of establishing the damaged area division branch network are as follows:
s2011, upsampling operation
The structure is mainly composed of convolution layers of Conv-BN-ReLU and Convtransp-BN-ReLU which are connected in series; for a first structure Conv-BN-ReLU, firstly carrying out convolution operation through a convolution kernel with the size of 3 multiplied by 3 and a step size stride equal to 1, then carrying out normalization processing on characteristics by adopting BN batch normalization operation, accelerating network convergence, and finally carrying out characteristic nonlinear mapping by adopting a ReLU activation function to obtain a characteristic diagram after noise suppression; for the second structure Convtransp-BN-ReLU, the difference from the structure Conv-BN-ReLU is that a Convtransp deconvolution operation of 3 × 3 size is used, and the step size stride is 2, finally achieving a 2-fold enlargement of the input feature map;
S2012, skip connection operation
After the upsampling operation, the feature map is concatenated with the feature map from the basic feature extraction layer, so that the feature maps retain higher resolution during upsampling.
S2013, convolution Block 1 operation
Finally, a convolution block with the Conv-ReLU structure maps the 32-channel P0 feature maps to the 3-channel damaged area segmentation map.
S202, establishing a depth information estimation branch network based on a U-Net network architecture to estimate depth information of a damaged area;
the depth information estimation branch structure also corresponds to the coding layer, and the convolution blocks using Conv-BN-ReLU and Convtransp-BN-ReLU perform an upsampling operation to restore the feature map size to be the same as the wear surface image. In order to fully extract the characteristic image with the local damage area information, a Coordinate Attention (CA) module and an Efficient Pyramid Segmentation Attention (EPSA) module are respectively introduced, and a characteristic map from a damage area segmentation branch is merged in the last convolution block operation, so that the attention to the typical wear area information of the wear surface in the wear surface depth estimation process is further strengthened, and the damage area depth information estimation is finally realized.
S2021, adopting convolution blocks with the structures of Conv-BN-ReLU and Convtransp-BN-ReLU as an upsampling operation;
The feature map obtained by upsampling the features from the basic encoding layer (enlarged by a factor of 2) is input into a Coordinate Attention (CA) module for further attention feature extraction, and a feature image containing position coordinate information is acquired.
S2022, embedding a Coordinate Attention (CA) module, and acquiring a feature image containing position coordinate information;
During the skip connection, the basic encoding-layer features at the corresponding positions are processed by an Efficient Pyramid Segmentation Attention (EPSA) module to obtain feature images with local damaged area information, which are then fused and concatenated with the feature images containing position coordinate information, yielding a wear surface feature map with damaged area position information, spatial information and channel information.
S2023, extracting the feature image with the local damage area information from the feature image of the coding layer through an Efficient Pyramid Segmentation Attention (EPSA) module, and fusing and cascading the feature image with the features extracted by the CA module to obtain the wear surface feature image with position information, space information and channel information.
After five rounds of the upsampling-CA-EPSA-skip-connection cascade operation, the feature maps from the damaged area segmentation network are fused, and finally the 64-channel C0 feature maps are mapped to the single-channel wear surface depth map by a convolution block with the Conv-ReLU structure.
S3, aiming at the damaged area segmentation and depth information estimation double-branch structure constructed in the step S2, constructing a depth information mean square error loss, a damage detection cross entropy loss, an edge detection mean square error loss and a structure consistency loss function, and obtaining an integral model loss function in a weighting mode to improve the depth estimation precision of the damaged area of the model;
the loss function is the target of network optimization training, and network parameter optimization is guided by back propagation through errors between prediction results of the damage segmentation graph and the depth graph and a real labeled graph. The constructed damaged region segmentation branch and the depth information estimation branch respectively correspond to damage detection cross entropy loss and depth information mean square error loss, and in order to solve the problems of edge blurring and damaged region morphology blurring of a predicted depth map, a Haar wavelet-based edge detection mean square error loss and structure consistency loss function are adopted, and an integral model loss function is constructed in a weighting mode.
S301, by calculating the mean square error between the estimated depth map of the worn surface and the depth map obtained by the laser confocal microscope, the difference between the predicted depth map and a target value is constrained, more weights are distributed to the damaged area, the problem that the worn surface area is not uniformly distributed is solved, and a depth estimation loss function of self-adaptive distribution of the weights of the damaged area is constructed, so that the accuracy of estimation of the depth information of the damaged area of the model is improved. As shown in formula (1) and formula (2);
Figure BDA0003701204660000141
a=Num other areas /Num damage areas (2)
wherein i represents the coordinate position of the pixel point, N represents the number of the image pixel points, and y i An estimated depth value representing the wear surface,
Figure BDA0003701204660000142
representing the depth value measured by a laser confocal microscope, wherein a is a super parameter for controlling the weight of each pixel point, and Num represents the number of the pixel points in the area;
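A minimal PyTorch sketch of this weighted mean square error, under the assumption just stated that the per-pixel weight equals a on damaged-area pixels and 1 elsewhere:

```python
import torch

def depth_loss(pred, target, damage_mask):
    """Adaptive-weight MSE depth loss (sketch). damage_mask is 1 on damaged-area
    pixels and 0 elsewhere; a = Num_other / Num_damage as in formula (2)."""
    num_damage = damage_mask.sum().clamp(min=1)
    num_other = (1 - damage_mask).sum()
    a = num_other / num_damage
    weights = torch.where(damage_mask.bool(), a, torch.ones_like(pred))
    return (weights * (pred - target) ** 2).mean()
```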
S302, in view of the predicted-edge blurring caused by the large depth variation at the edge of the damaged area, and in order to improve the estimation accuracy of the damaged area edge depth values, an edge detection loss function is constructed using a three-level two-dimensional Haar wavelet transform, as shown in formula (3):

$$\mathcal{L}_{edge} = \frac{1}{N}\sum_{j=1}^{3}\left(\left\|V_j(y)-V_j(y^{*})\right\|^2 + \left\|C_j(y)-C_j(y^{*})\right\|^2 + \left\|H_j(y)-H_j(y^{*})\right\|^2\right) \tag{3}$$

where $V$, $C$ and $H$ correspond to the vertical, diagonal and horizontal high-frequency coefficients of the depth image at each decomposition level $j$, $L$ is the low-frequency coefficient, $y$ denotes the estimated wear surface depth map, and $y^{*}$ denotes the depth map measured by the laser confocal microscope.
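A PyTorch sketch of such an edge loss follows, under the assumption that the loss accumulates the mean square error of the high-frequency Haar sub-bands over three decomposition levels; the single-level Haar transform is implemented with fixed 2×2 strided filters.

```python
import torch
import torch.nn.functional as F

def haar_dwt(x):
    """One level of a 2D Haar transform via fixed 2x2 strided filters.
    Returns (LL, LH, HL, HH): low-frequency, horizontal, vertical, diagonal bands."""
    k = 0.5 * torch.tensor([[[[1., 1.], [1., 1.]]],     # LL
                            [[[1., 1.], [-1., -1.]]],   # LH (horizontal detail)
                            [[[1., -1.], [1., -1.]]],   # HL (vertical detail)
                            [[[1., -1.], [-1., 1.]]]],  # HH (diagonal detail)
                           device=x.device, dtype=x.dtype)
    out = F.conv2d(x, k, stride=2)          # x: (B, 1, H, W) depth map
    return out[:, 0:1], out[:, 1:2], out[:, 2:3], out[:, 3:4]

def edge_loss(pred, target, levels=3):
    """Haar-wavelet edge loss (sketch, assumed form): MSE of the high-frequency
    sub-bands accumulated over `levels` decomposition levels."""
    loss = 0.0
    for _ in range(levels):
        pred, ph, pv, pd = haar_dwt(pred)
        target, th, tv, td = haar_dwt(target)
        loss = loss + F.mse_loss(ph, th) + F.mse_loss(pv, tv) + F.mse_loss(pd, td)
    return loss
```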
S303, considering that the typical damage types of the wear surface are scratches and pits, the wear surface is divided into three classes (background, scratch and pit), and a three-class cross entropy function is selected as the loss function of the damaged area segmentation network branch, as shown in formula (4):

$$\mathcal{L}_{seg} = -\frac{1}{N}\sum_{i=1}^{N}\sum_{c=1}^{M} p_{ic}\log\left(\hat{p}_{ic}\right) \tag{4}$$

where $N$ is the number of image pixels, $M = 3$ is the number of area classes, $p_{ic}$ is the area label vector of length 3 taking values 0 and 1, and $\hat{p}_{ic}$ is the predicted probability that sample $i$ belongs to class $c$.
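In PyTorch this three-class pixel-wise cross entropy can be sketched with the built-in criterion, assuming the segmentation branch outputs raw 3-channel logits and the marking map stores class indices (0 = background, 1 = scratch, 2 = pit):

```python
import torch.nn as nn

# seg_logits: (B, 3, H, W) raw outputs of the damaged area segmentation branch
# seg_labels: (B, H, W) integer class map: 0 = background, 1 = scratch, 2 = pit
seg_criterion = nn.CrossEntropyLoss()

def seg_loss(seg_logits, seg_labels):
    return seg_criterion(seg_logits, seg_labels)
```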
S304, in order to improve the similarity between the predicted depth map and the laser confocal microscope measurement, a structure consistency loss function is adopted, as shown in formula (5):

$$\mathcal{L}_{ssim} = 1 - \frac{\left(2\mu_{y}\mu_{y^{*}} + c_1\right)\left(2\sigma_{y y^{*}} + c_2\right)}{\left(\mu_{y}^2 + \mu_{y^{*}}^2 + c_1\right)\left(\sigma_{y}^2 + \sigma_{y^{*}}^2 + c_2\right)} \tag{5}$$

where $\mu_{y}$ and $\mu_{y^{*}}$ denote the means of $y$ and $y^{*}$ respectively, $\sigma_{y}^2$ and $\sigma_{y^{*}}^2$ denote their variances, $\sigma_{y y^{*}}$ denotes the covariance of $y$ and $y^{*}$, $c_1$ and $c_2$ are constants that prevent the denominator from being zero, $y$ denotes the estimated wear surface depth map, and $y^{*}$ denotes the depth map measured by the laser confocal microscope.
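A whole-image PyTorch sketch of such a structural consistency loss follows; practical SSIM implementations usually operate on local windows, so this simplified global form and the constant values are illustrative assumptions:

```python
import torch

def ssim_loss(pred, target, c1=1e-4, c2=9e-4):
    """Structure consistency loss (sketch): 1 - SSIM computed over the whole image.
    c1 and c2 are small stabilizing constants (values here are illustrative)."""
    mu_p, mu_t = pred.mean(), target.mean()
    var_p, var_t = pred.var(unbiased=False), target.var(unbiased=False)
    cov = ((pred - mu_p) * (target - mu_t)).mean()
    ssim = ((2 * mu_p * mu_t + c1) * (2 * cov + c2)) / \
           ((mu_p ** 2 + mu_t ** 2 + c1) * (var_p + var_t + c2))
    return 1 - ssim
```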
S305, the loss function of the overall model is obtained by weighted summation of the depth information mean square error loss, damage detection cross entropy loss, edge detection mean square error loss and structure consistency loss function, as shown in formula (6):

$$\mathcal{L} = \mathcal{L}_{depth} + \lambda\,\mathcal{L}_{edge} + \mathcal{L}_{seg} + \mathcal{L}_{ssim} \tag{6}$$

where $\lambda$ is the weight coefficient controlling the edge loss term of the depth map.
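Combining the sketches above, the total training loss might be assembled as follows; only the single coefficient λ named in the text is used, and weighting the remaining terms by 1 is an assumption:

```python
def total_loss(depth_pred, depth_gt, seg_logits, seg_labels, damage_mask, lam=1.0):
    """Weighted sum of the four loss terms (sketch), following formula (6);
    lam plays the role of the edge-loss weight coefficient lambda."""
    return (depth_loss(depth_pred, depth_gt, damage_mask)
            + lam * edge_loss(depth_pred, depth_gt)
            + seg_loss(seg_logits, seg_labels)
            + ssim_loss(depth_pred, depth_gt))
```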
S4, acquiring two-dimensional wear surface images and making corresponding damaged area marking maps and depth maps; selecting not less than 500 groups of wear surface images with typical damaged areas as training samples, taking the model loss function constructed in step S3 as the optimization target, training the constructed wear surface depth estimation model by the adaptive moment estimation method (Adam), and, with the trained depth estimation model, taking a single wear surface image as the model input and finally outputting a damaged area segmentation result map and a depth information result map of the wear surface.
Referring to fig. 3, a surface image of a given position on the surface to be measured is acquired with an in-situ surface image acquisition system, and the damaged area of the surface image is marked with image labeling software to obtain the damaged area marking map; fig. 4 shows three types of measured wear surface maps.
S401, acquiring a two-dimensional wear surface image with a damaged area through a handheld digital microscope, acquiring a depth image of a corresponding area through a laser confocal microscope, acquiring a marking image of the corresponding damaged area through image marking software, and realizing the manufacture of training and testing samples;
S402, using the ResNet-50 network weights trained on the ImageNet data set as the encoder initialization weight parameters, and further training the wear surface depth estimation model with the training samples made in step S401;
S403, setting the learning rate parameter, optimizing and training the wear surface depth estimation model by the adaptive moment estimation method (Adam), and realizing wear surface damaged area depth information estimation based on multi-attention surface feature extraction and damaged area localization guidance.
The learning rate parameter is set to 0.0003, and the constructed wear surface depth estimation model is trained and optimized using the adaptive moment estimation method (Adam), with 200 training iterations and 2 input images per batch. The convergence rates of the network models under different attention mechanism combinations are compared, and the training process is shown in FIG. 5. In this way, depth information estimation of the wear surface damaged area based on multi-attention surface feature extraction and damaged area localization guidance is realized.
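A minimal training-loop sketch consistent with these settings (Adam, learning rate 0.0003, 200 iterations, batch size 2) is given below; the dataset and the dual-branch model interface are placeholder assumptions, and total_loss refers to the composite loss sketched earlier.

```python
import torch
from torch.utils.data import DataLoader

# The dataset is assumed to yield (image, depth map, segmentation label map,
# damage mask) tuples, and the model to return (depth_pred, seg_logits);
# neither interface is specified in the original text.
def train(model, dataset, epochs=200, batch_size=2, lr=3e-4, device="cuda"):
    loader = DataLoader(dataset, batch_size=batch_size, shuffle=True)
    optimizer = torch.optim.Adam(model.parameters(), lr=lr)
    model.to(device).train()
    for epoch in range(epochs):
        for image, depth_gt, seg_labels, damage_mask in loader:
            image, depth_gt = image.to(device), depth_gt.to(device)
            seg_labels, damage_mask = seg_labels.to(device), damage_mask.to(device)
            depth_pred, seg_logits = model(image)
            loss = total_loss(depth_pred, depth_gt, seg_logits, seg_labels, damage_mask)
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()
```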
In another embodiment of the present invention, a wear surface damage depth estimation system based on a multi-attention mechanism is provided, which can be used to implement the above wear surface damage depth estimation method based on the multi-attention mechanism; specifically, the system includes an extraction module, a fusion module, a function module and an estimation module.
The extraction module is used for constructing a wear surface basic feature extraction layer by taking the four convolution blocks of a ResNet-50 encoding layer as the backbone and combining them with two Conv-ReLU convolution blocks;
The fusion module is used for combining the U-Net architecture with the characteristics of wear surfaces, constructing a damaged area segmentation branch network and a depth information estimation branch network based on the wear surface basic feature extraction layer constructed by the extraction module, and fusing the two branch networks as the wear surface depth estimation model;
the function module is used for constructing a depth information mean square error loss, a damage detection cross entropy loss, an edge detection mean square error loss and a structure consistency loss function based on the damaged area segmentation network and the depth information estimation network, and obtaining a loss function of the wear surface depth estimation model constructed by the fusion module in a weighting mode;
the estimation module is used for acquiring a two-dimensional wear surface image and making a corresponding damage area marking map and a depth map; selecting a wear surface image with a typical damage area as a training sample, taking a loss function constructed by a function module as an optimization target, training a wear surface depth estimation model constructed by a fusion module by adopting an adaptive moment estimation method, and taking a single wear surface image as the input of the trained wear surface depth estimation model to obtain a damage area segmentation result image and a depth information result image of the wear surface.
In order to make the objects, technical solutions and advantages of the embodiments of the present invention clearer, the technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are some, but not all, embodiments of the present invention. The components of the embodiments of the present invention generally described and illustrated in the figures herein may be arranged and designed in a wide variety of different configurations. Thus, the following detailed description of the embodiments of the present invention, presented in the figures, is not intended to limit the scope of the invention, as claimed, but is merely representative of selected embodiments of the invention. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
The present invention compares the variation of the training loss values of models without the attention mechanism and with different combinations of attention mechanisms, as shown in fig. 5. Number 1 denotes the model without any attention mechanism, number 2 the model with only the CA coordinate attention module, number 3 the model with only the EPSA efficient pyramid segmentation attention module, and number 4 the depth estimation model with both the CA and EPSA attention modules. As can be seen from the figure, number 4 converges fastest during training and attains the minimum loss error value of 38.6, while the errors of numbers 1, 2 and 3 are 83.2, 69.1 and 45.6 respectively, which further demonstrates the superiority of the model constructed by the invention.
Based on the model trained with the attention combination of Number 4, three damaged wear surface maps, namely a scratch map, a pit map, and a combined scratch-and-pit map, are input, and the corresponding depth maps are predicted. To verify the effectiveness of the constructed damaged area depth estimation model, the differences between the wear surface depth estimation results of different methods and the depth maps acquired by a laser confocal microscope are compared, with the results shown in fig. 6. As can be seen from the figure, the damaged area depth estimation model constructed by the invention achieves the smallest estimation error: compared with the depth map acquired by the laser confocal microscope, its root mean square error (RMSE) is 1.49 μm, far smaller than the 61.64 μm of the SE-Net method and the 41.33 μm of the BS-Net method, which further proves the effectiveness of the model constructed by the invention.
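The RMSE figures quoted above compare each predicted depth map against the laser confocal reference pixel by pixel; a small helper of the following kind could compute it. The function name and the optional damage-region mask are illustrative assumptions, not part of the patent.

```python
# Hypothetical helper for the RMSE metric quoted above (units follow the depth maps,
# i.e. micrometres); the optional mask argument is an assumption.
import numpy as np


def depth_rmse(pred_depth, confocal_depth, mask=None):
    """RMSE between a predicted depth map and a laser-confocal reference map."""
    diff = (np.asarray(pred_depth, dtype=np.float64)
            - np.asarray(confocal_depth, dtype=np.float64))
    if mask is not None:
        diff = diff[np.asarray(mask, dtype=bool)]  # restrict to damaged-area pixels
    return float(np.sqrt(np.mean(diff ** 2)))


if __name__ == "__main__":
    pred = np.random.rand(256, 256) * 50.0   # stand-in predicted depth map, in micrometres
    ref = pred + np.random.randn(256, 256)   # stand-in confocal measurement
    print(f"RMSE = {depth_rmse(pred, ref):.2f} um")
```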
In conclusion, the wear surface damage depth estimation method and system based on the multi-attention mechanism construct a dual-task branch model for damaged area segmentation and depth information estimation, so that the depth information of the damaged area of the wear surface and the corresponding damage location and damage type are obtained synchronously within a single U-Net framework, providing effective guidance for wear-surface-based condition monitoring and fault diagnosis. A comprehensive loss function taking the depth information mean square error loss, the damage detection cross entropy loss, the edge detection mean square error loss and the structure consistency loss as optimization targets is constructed, which enhances the sensitivity to depth estimation deviations in the damaged area and thereby improves the accuracy of the depth information estimated by the model for the damaged area.
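As a rough illustration of how the four loss terms named above might be weighted into a single optimization target, consider the following sketch. The weight values and the placeholder handling of the structure consistency term are assumptions, since the exact formulas are not reproduced in this text.

```python
# Hypothetical composite loss combining the four terms named above; the individual
# term implementations and the weights w_* are placeholders, not the patented formulas.
import torch
import torch.nn.functional as F


def composite_loss(depth_pred, depth_gt, seg_logits, seg_gt,
                   edge_pred, edge_gt, ssim_term,
                   w_depth=1.0, w_seg=1.0, w_edge=0.5, w_ssim=0.5):
    l_depth = F.mse_loss(depth_pred, depth_gt)    # depth information mean square error loss
    l_seg = F.cross_entropy(seg_logits, seg_gt)   # damage detection cross entropy loss
    l_edge = F.mse_loss(edge_pred, edge_gt)       # edge detection mean square error loss
    l_struct = ssim_term                          # structure consistency term (precomputed here)
    return w_depth * l_depth + w_seg * l_seg + w_edge * l_edge + w_ssim * l_struct


if __name__ == "__main__":
    d_pred, d_gt = torch.rand(2, 1, 64, 64), torch.rand(2, 1, 64, 64)
    s_logits = torch.randn(2, 3, 64, 64)
    s_gt = torch.randint(0, 3, (2, 64, 64))
    e_pred, e_gt = torch.rand(2, 1, 64, 64), torch.rand(2, 1, 64, 64)
    ssim_term = torch.tensor(0.1)  # stand-in structure-consistency value
    print(composite_loss(d_pred, d_gt, s_logits, s_gt, e_pred, e_gt, ssim_term))
```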
As will be appreciated by one skilled in the art, embodiments of the present application may be provided as a method, system, or computer program product. Accordingly, the present application may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present application may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.
The present application is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems), and computer program products according to embodiments of the application. It will be understood that each flow and/or block of the flow diagrams and/or block diagrams, and combinations of flows and/or blocks in the flow diagrams and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
The above-mentioned contents are only for illustrating the technical idea of the present invention, and the protection scope of the present invention is not limited thereby, and any modification made on the basis of the technical idea of the present invention falls within the protection scope of the claims of the present invention.

Claims (10)

1. The method for estimating the damage depth of a wear surface based on a multi-attention mechanism is characterized by comprising the following steps:
S1, constructing a wear surface basic feature extraction layer by taking the four convolution blocks of the ResNet-50 coding layer as the main body and combining them with two Conv-ReLU convolution blocks;
S2, constructing a damaged area segmentation branch network and a depth information estimation branch network based on the wear surface basic feature extraction layer constructed in step S1, combining the U-Net architecture with the characteristics of the wear surface, and fusing the two branch networks to form a wear surface depth estimation model;
S3, constructing a depth information mean square error loss, a damage detection cross entropy loss, an edge detection mean square error loss and a structure consistency loss function based on the damaged area segmentation network and the depth information estimation network, and obtaining the loss function of the wear surface depth estimation model constructed in step S2 by weighting these terms;
S4, acquiring two-dimensional wear surface images and preparing the corresponding damaged area label maps and depth maps; selecting wear surface images with typical damaged areas as training samples, taking the loss function constructed in step S3 as the optimization target, training the wear surface depth estimation model constructed in step S2 by the adaptive moment estimation method, and taking a single wear surface image as the input of the trained wear surface depth estimation model to obtain the damaged area segmentation result map and the depth information result map of the wear surface.
2. The method for estimating the damage depth of a wear surface based on a multi-attention mechanism according to claim 1, wherein step S1 is specifically as follows:
S101, constructing two convolution blocks with the standard Conv-ReLU structure to realize preliminary feature extraction from the wear surface image;
S102, extracting the four convolution stages Res1, Res2, Res3 and Res4 of the ResNet-50 feature extraction network as high-level semantic feature extraction blocks for the wear surface, and combining them with the two Conv-ReLU convolution blocks of step S101 to establish the basic feature extraction network, thereby realizing basic feature extraction from the wear surface image.
3. The method for estimating damage depth of a wear surface based on a multi-attention mechanism as claimed in claim 2, wherein in step S101, constructing two convolution blocks with a standard structure of Conv-ReLU is specifically:
the first convolution block and the second convolution block adopt convolution blocks with the structure of Conv-ReLU to perform primary image feature extraction, and each convolution block comprises two layers of convolution operation; performing convolution operation on each layer of a first convolution block by adopting 3 convolution kernels with the size of 3 multiplied by 3 and a step size stride of 1, and performing characteristic nonlinear mapping by adopting a ReLU activation function to obtain an F0 characteristic diagram; the second convolution block carries out convolution operation by adopting 64 convolution kernels with the size of 3 × 3 and the step length of stride 2 and stride 1 respectively, then carries out characteristic nonlinear mapping by adopting a ReLU activation function to obtain an F1 characteristic diagram, and finally outputs the characteristic diagram of 64 channels to the following four convolution blocks.
4. The method for estimating damage depth to a wear surface based on a multi-attention mechanism as claimed in claim 1, wherein in step S2, a damage region segmentation branch network is established by using convolution blocks with the structures of Conv-BN-ReLU and Convtransp-BN-ReLU, and a damage region segmentation result map corresponding to the size of the two-dimensional wear surface is extracted; and establishing a depth information estimation branch network based on the U-Net network architecture, and estimating the depth information of the damaged area.
5. The method for estimating the damage depth of the wear surface based on the multi-attention mechanism according to claim 4, wherein the establishing of the damage region segmentation branch network specifically comprises:
the convolution layers with the structures of Conv-BN-ReLU and Convtransp-BN-ReLU are connected in series; for a first structure Conv-BN-ReLU, firstly carrying out convolution operation through a convolution kernel with the size of 3 multiplied by 3 and a step size stride equal to 1, then carrying out normalization processing on characteristics by adopting BN batch normalization operation, accelerating network convergence, and finally carrying out characteristic nonlinear mapping by adopting a ReLU activation function to obtain a characteristic diagram after noise suppression; for the second structure Convtransp-BN-ReLU, the difference from the structure Conv-BN-ReLU is that a Convtransp deconvolution operation of 3 × 3 size is used, and the step size stride is 2, which finally achieves a 2-fold enlargement of the input feature map; then cascading the feature graph with the feature graph from the basic feature extraction layer; and finally, mapping the P0 feature maps of the 32 channels to a damaged area segmentation map of the 3-dimensional channels by adopting a convolution block with the structure of Conv-ReLU.
6. The method for estimating the damage depth of the wear surface based on the multi-attention mechanism according to claim 4, wherein the establishing of the depth information estimation branch network based on the U-Net network architecture is specifically as follows:
convolution blocks with the structures of Conv-BN-ReLU and Convtransp-BN-ReLU are adopted as upsampling operation; expanding the input feature map by two times to be used as the input of a coordinate attention module; embedding the amplified feature map into a coordinate attention module, acquiring a feature image containing position coordinate information, and performing cascade combination on the feature image and the feature map extracted by the efficient pyramid module; and extracting a characteristic image with local damage area information from the characteristic image of the coding layer through a high-efficiency pyramid segmentation attention module, and fusing and cascading the characteristic image with the characteristics extracted by the coordinate attention module to obtain a wear surface characteristic image with position information, space information and channel information.
7. The method for estimating the damage depth of a wear surface based on a multi-attention mechanism according to claim 1, wherein step S3 is specifically as follows:
S301, constructing a depth estimation loss function with adaptive weight distribution over the damaged area;
S302, aiming at the problem of blurred edge prediction caused by the large depth variation at the edge of the damaged area, constructing an edge detection loss function based on a three-level two-dimensional Haar wavelet transform in order to improve the estimation accuracy of the edge depth values of the damaged area;
S303, dividing the wear surface into three classes, namely background, scratch and pit, and selecting a three-class cross entropy function as the loss function of the damaged area segmentation network branch;
S304, adopting a structure consistency loss function to improve the similarity between the predicted depth map and the measurement result of the laser confocal microscope;
and S305, obtaining a loss function of the wear surface depth estimation model through a weighted summation mode on the basis of the depth information mean square error loss obtained in the step S301, the edge detection loss function obtained in the step S302, the loss function of the damaged area segmentation network branch obtained in the step S303 and the structure consistency loss function obtained in the step S304.
8. The method for estimating the damage depth of a wear surface based on a multi-attention mechanism according to claim 1, wherein the loss function of the wear surface depth estimation model is obtained by weighted combination of the depth estimation loss, the edge detection loss, the damaged area segmentation loss and the structure consistency loss, wherein y represents the estimated wear surface depth map, ŷ represents the depth map measured by the laser confocal microscope, p is the predicted pixel class, p̂ is the actual pixel class, and λ is the weight coefficient controlling the edge loss term of the depth map.
9. The method for estimating the damage depth of a wear surface based on a multi-attention mechanism according to claim 1, wherein step S4 is specifically as follows:
S401, acquiring two-dimensional wear surface images with damaged areas, acquiring the depth maps and the damaged area label maps of the corresponding areas, and producing training samples and test samples;
S402, using the ResNet-50 network weights trained on the ImageNet data set as the initialization weight parameters of the encoder, and training the wear surface depth estimation model with the training samples produced in step S401;
and S403, setting the learning rate parameters, optimizing and training the wear surface depth estimation model of step S402 by the adaptive moment estimation method, and inputting the test samples produced in step S401 into the optimally trained wear surface depth estimation model, thereby realizing the estimation of wear surface damaged area depth information based on multi-attention surface feature extraction and damaged area positioning guidance.
10. A multi-attention mechanism based wear surface damage depth estimation system, comprising:
the extraction module is used for constructing a wear surface basic feature extraction layer by taking four layers of convolution blocks in a ResNet-50 coding layer as a main body and combining the convolution blocks of two layers of Conv-ReLU;
the fusion module is used for constructing a damaged area segmentation branch network and a depth information estimation branch network based on a wear surface basic feature extraction layer constructed by the extraction module by combining a U-Net structure architecture and wear surface characteristics, and fusing the damaged area segmentation branch network and the depth information estimation branch network to serve as a wear surface depth estimation model;
the function module is used for constructing a depth information mean square error loss, a damage detection cross entropy loss, an edge detection mean square error loss and a structure consistency loss function based on the damaged area segmentation network and the depth information estimation network, and obtaining a loss function of the wear surface depth estimation model constructed by the fusion module in a weighting mode;
the estimation module is used for acquiring a two-dimensional wear surface image and making a corresponding damage area marking map and a depth map; selecting a wear surface image with a typical damage area as a training sample, taking a loss function constructed by a function module as an optimization target, training a wear surface depth estimation model constructed by a fusion module by adopting an adaptive moment estimation method, and taking a single wear surface image as the input of the trained wear surface depth estimation model to obtain a damage area segmentation result image and a depth information result image of the wear surface.
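Purely as an illustrative sketch of the Conv-BN-ReLU and Convtransp (transposed convolution)-BN-ReLU blocks recited in claims 4 and 5, the following hypothetical PyTorch code shows a decoder stage that doubles the spatial resolution, concatenates the encoder skip features, and maps refined 32-channel features to a 3-channel damaged area segmentation map. Channel counts other than the final 32-to-3 mapping are assumptions and do not reproduce the patented network.

```python
# Hypothetical decoder-stage sketch for the blocks recited in claims 4-5
# (channel counts assumed; not the patented implementation).
import torch
import torch.nn as nn


def conv_bn_relu(in_ch, out_ch):
    # 3 x 3 convolution with stride 1, batch normalization, then ReLU.
    return nn.Sequential(
        nn.Conv2d(in_ch, out_ch, kernel_size=3, stride=1, padding=1),
        nn.BatchNorm2d(out_ch),
        nn.ReLU(inplace=True),
    )


def convtransp_bn_relu(in_ch, out_ch):
    # 3 x 3 transposed convolution with stride 2: doubles the spatial resolution.
    return nn.Sequential(
        nn.ConvTranspose2d(in_ch, out_ch, kernel_size=3, stride=2,
                           padding=1, output_padding=1),
        nn.BatchNorm2d(out_ch),
        nn.ReLU(inplace=True),
    )


class SegDecoderStage(nn.Module):
    """Upsample, concatenate the encoder skip features, then refine."""

    def __init__(self, in_ch, skip_ch, out_ch):
        super().__init__()
        self.up = convtransp_bn_relu(in_ch, out_ch)
        self.refine = conv_bn_relu(out_ch + skip_ch, out_ch)

    def forward(self, x, skip):
        x = self.up(x)
        return self.refine(torch.cat([x, skip], dim=1))


if __name__ == "__main__":
    stage = SegDecoderStage(in_ch=64, skip_ch=64, out_ch=32)
    head = nn.Sequential(nn.Conv2d(32, 3, kernel_size=3, padding=1), nn.ReLU())
    x, skip = torch.randn(1, 64, 32, 32), torch.randn(1, 64, 64, 64)
    print(head(stage(x, skip)).shape)  # torch.Size([1, 3, 64, 64])
```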
CN202210689847.XA 2022-06-17 2022-06-17 Wear surface damage depth estimation method and system based on multi-attention mechanism Active CN114972882B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210689847.XA CN114972882B (en) 2022-06-17 2022-06-17 Wear surface damage depth estimation method and system based on multi-attention mechanism

Publications (2)

Publication Number Publication Date
CN114972882A true CN114972882A (en) 2022-08-30
CN114972882B CN114972882B (en) 2024-03-01

Family

ID=82964067

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210689847.XA Active CN114972882B (en) 2022-06-17 2022-06-17 Wear surface damage depth estimation method and system based on multi-attention mechanism

Country Status (1)

Country Link
CN (1) CN114972882B (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115471505A (en) * 2022-11-14 2022-12-13 华联机械集团有限公司 Intelligent carton sealing machine regulation and control method based on visual identification

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112381770A (en) * 2020-11-03 2021-02-19 西安交通大学 Wear surface three-dimensional topography measuring method based on fusion convolution neural network
AU2020103715A4 (en) * 2020-11-27 2021-02-11 Beijing University Of Posts And Telecommunications Method of monocular depth estimation based on joint self-attention mechanism
CN112967327A (en) * 2021-03-04 2021-06-15 国网河北省电力有限公司检修分公司 Monocular depth method based on combined self-attention mechanism
CN114119694A (en) * 2021-11-10 2022-03-01 中国石油大学(华东) Improved U-Net based self-supervision monocular depth estimation algorithm

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
SHUO WANG et al.: "Optimized CNN model for identifying similar 3D wear particles in few samples", WEAR, 15 November 2020 (2020-11-15) *
CEN SHIJIE; HE YUANLIE; CHEN XIAOCONG: "Monocular depth estimation combining attention and unsupervised deep learning", Journal of Guangdong University of Technology, no. 04, 14 July 2020 (2020-07-14) *

Also Published As

Publication number Publication date
CN114972882B (en) 2024-03-01

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant