CN117152142A - Bearing defect detection model construction method and system - Google Patents

Bearing defect detection model construction method and system

Info

Publication number
CN117152142A
CN117152142A
Authority
CN
China
Prior art keywords
image
bearing
model
patch image
area
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202311415402.3A
Other languages
Chinese (zh)
Other versions
CN117152142B (en)
Inventor
王凯
田楷
晏文仲
陈立名
胡江洪
曹彬
方超群
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Fitow Tianjin Detection Technology Co Ltd
Original Assignee
Fitow Tianjin Detection Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Fitow Tianjin Detection Technology Co Ltd filed Critical Fitow Tianjin Detection Technology Co Ltd
Priority to CN202311415402.3A priority Critical patent/CN117152142B/en
Publication of CN117152142A publication Critical patent/CN117152142A/en
Application granted granted Critical
Publication of CN117152142B publication Critical patent/CN117152142B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • G06T7/0002Inspection of images, e.g. flaw detection
    • G06T7/0004Industrial image inspection
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • G06N3/0455Auto-encoder networks; Encoder-decoder networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/0499Feedforward networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • G06T7/60Analysis of geometric attributes
    • G06T7/62Analysis of geometric attributes of area, perimeter, diameter or volume
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • G06T7/70Determining position or orientation of objects or cameras
    • G06T7/73Determining position or orientation of objects or cameras using feature-based methods
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/20Image preprocessing
    • G06V10/30Noise filtering
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/40Extraction of image or video features
    • G06V10/44Local feature extraction by analysis of parts of the pattern, e.g. by detecting edges, contours, loops, corners, strokes or intersections; Connectivity analysis, e.g. of connected components
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/70Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/764Arrangements for image or video recognition or understanding using pattern recognition or machine learning using classification, e.g. of video objects
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/70Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/82Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/20Special algorithmic details
    • G06T2207/20036Morphological image processing
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/20Special algorithmic details
    • G06T2207/20081Training; Learning
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/20Special algorithmic details
    • G06T2207/20084Artificial neural networks [ANN]
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/30Subject of image; Context of image processing
    • G06T2207/30108Industrial image inspection

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Evolutionary Computation (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Software Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Computing Systems (AREA)
  • General Health & Medical Sciences (AREA)
  • Multimedia (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Mathematical Physics (AREA)
  • General Engineering & Computer Science (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Biomedical Technology (AREA)
  • Molecular Biology (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • Medical Informatics (AREA)
  • Quality & Reliability (AREA)
  • Geometry (AREA)
  • Image Processing (AREA)
  • Image Analysis (AREA)

Abstract

The application discloses a method and a system for constructing a bearing defect detection model, belonging to the technical field of defect detection and comprising the following steps: S1, performing image preprocessing on the sample images in a sample image set; S2, training a target detection model with the complete image set; S3, adjusting the parameter weights in the CLIP model, the adjustment process comprising: for each patch image input to the KQ branch of the cross attention module in the image encoder module, performing a linear transformation and then weighting it by dot multiplication; introducing the low-confidence detection results of target detection into the V branch of the cross attention module; S4, training the CLIP model with the patch image set and the confidence of target detection to obtain the bearing defect detection model. The application combines the target detection model with the CLIP model and can improve the reliability of results when bearing parts exhibit few-sample defects.

Description

Bearing defect detection model construction method and system
Technical Field
The application belongs to the technical field of defect detection, and particularly relates to a method and a system for constructing a bearing defect detection model.
Background
Bearings are an important component in mechanical equipment. Their main function is to support the rotating mechanical body, reduce the friction coefficient during its movement, and guarantee its rotation accuracy. The manufacturing of bearing parts involves forging, rolling, punching, turning, grinding, heat treatment and other processes.
In bearing defect detection, the traditional approach is as follows: defect images are first collected, then annotated and trained with a target detection algorithm, so the detection performance depends on the number and diversity of the collected defects. However, as the technology is continuously optimized and matured, the yield of production enterprises keeps improving, which makes defect images increasingly difficult to collect; the end result is that the target detection algorithm performs inadequately and defects are seriously missed.
Disclosure of Invention
The application provides a method and a system for constructing a bearing defect detection model to solve the above technical problems in the prior art, combining a target detection model with a CLIP model to improve the reliability of results when bearing parts exhibit few-sample defects.
The first object of the present application is to provide a method for constructing a bearing defect detection model, comprising:
s1, acquiring a sample image set of a bearing, and carrying out image preprocessing on sample images in the sample image set to obtain a complete image set containing complete information of the end face of the bearing and a patch image set containing local information of the end face of the bearing; each complete image consists of a plurality of patch images;
s2, training a target detection model by using the complete image set; obtaining the confidence coefficient of target detection;
s3, adjusting the parameter weight in the CLIP model; the image encoder of the CLIP model comprises twelve attention modules, wherein the first six attention modules are self-attention modules, and the last six attention modules are cross-attention modules; the adjustment process comprises the following steps:
for each patch image input to the KQ branch of the cross attention module, performing a linear transformation and then weighting it by dot multiplication;
introducing a low confidence detection result of target detection into a V branch of the cross attention module, wherein the range of the low confidence is 0.1-0.5;
s4, training the CLIP model by using the patch image set and the confidence coefficient of target detection to obtain a bearing defect detection model; wherein: the training is performed by using the cross attention module only.
Preferably, the image preprocessing specifically includes:
S101, firstly graying the sample image to obtain a gray image; then dividing the gray image into a target area and a background area according to the pixel values of the area to be detected; finally, filling the voids in the annular region by using a closing operation;
S102, firstly, retaining the contour of the bearing end face in the annular area to be detected through contour screening; then optimizing the detected annular closed contour by using a circle fitting algorithm, and finally cropping the annular region out of the original image;
S103, marking defects: marking the type and the position of each defect on the cropped-out image.
Preferably, in S3, for each patch image input to the KQ branch of the cross attention module, the weight of the patch image is calculated as: sin(rπ/2)³ + 1, with r ≤ 1; r represents the ratio of the area of the annotation frame falling within the current patch image to the total area of the annotation frame.
Preferably, the low-confidence detection result of target detection is introduced into the V branch of the cross attention module, which specifically comprises:
performing a linear transformation on each patch image input to the V branch, and then dot-multiplying by a confidence adjustment value derived from the target detection model, wherein the adjustment value is obtained through the following steps:
screening out the low-confidence detection frames;
then, for each annotation frame in the patch image, calculating the ratio of its area within the patch image to its total area, keeping the annotation frames whose ratio exceeds 40%, removing those whose IOU with a high-confidence detection frame exceeds 50%, and calculating the IOU between each remaining annotation frame and the low-confidence detection frames within the patch image area;
finally, inputting the IOU into the following formula for weight calculation to obtain the adjustment value: cos(rπ/2) + 1, where r is the IOU calculated above.
A second object of the present application is to provide a bearing defect detection model construction system, comprising:
sample module: acquiring a sample image set of a bearing, and carrying out image preprocessing on sample images in the sample image set to obtain a complete image set containing complete information of the end face of the bearing and a patch image set containing local information of the end face of the bearing; each complete image consists of a plurality of patch images;
a model preliminary training module; training a target detection model by using the complete image set; obtaining the confidence coefficient of target detection;
parameter weight adjustment module: adjusting the parameter weight in the CLIP model; the image encoder of the CLIP model comprises twelve attention modules, wherein the first six attention modules are self-attention modules, and the last six attention modules are cross-attention modules; the adjustment process comprises the following steps:
for each patch image input to the KQ branch of the cross attention module, performing a linear transformation and then weighting it by dot multiplication;
introducing a low confidence detection result of target detection into a V branch of the cross attention module, wherein the range of the low confidence is 0.1-0.5;
model retraining module: training the CLIP model by using the patch image set and the confidence coefficient of target detection to obtain a bearing defect detection model; wherein: the training is performed by using the cross attention module only.
Preferably, the image preprocessing specifically includes:
S101, firstly graying the sample image to obtain a gray image; then dividing the gray image into a target area and a background area according to the pixel values of the area to be detected; finally, filling the voids in the annular region by using a closing operation;
S102, firstly, retaining the contour of the bearing end face in the annular area to be detected through contour screening; then optimizing the detected annular closed contour by using a circle fitting algorithm, and finally cropping the annular region out of the original image;
S103, marking defects: marking the type and the position of each defect on the cropped-out image.
Preferably, in S3, for each patch image input to the KQ branch of the cross attention module, the weight of the patch image is calculated as: sin(rπ/2)³ + 1, with r ≤ 1; r represents the ratio of the area of the annotation frame falling within the current patch image to the total area of the annotation frame.
Preferably, the low-confidence detection result of target detection is introduced into the V branch of the cross attention module, which specifically comprises:
performing a linear transformation on each patch image input to the V branch, and then dot-multiplying by a confidence adjustment value derived from the target detection model, wherein the adjustment value is obtained through the following steps:
screening out the low-confidence detection frames;
then, for each annotation frame in the patch image, calculating the ratio of its area within the patch image to its total area, keeping the annotation frames whose ratio exceeds 40%, removing those whose IOU with a high-confidence detection frame exceeds 50%, and calculating the IOU between each remaining annotation frame and the low-confidence detection frames within the patch image area;
finally, inputting the IOU into the following formula for weight calculation to obtain the adjustment value: cos(rπ/2) + 1, where r is the IOU calculated above.
The third object of the present application is to provide a bearing defect detection model, which is constructed by the method for constructing a bearing defect detection model.
The fourth object of the present application is to provide a method for detecting a bearing defect, which uses the above-mentioned bearing defect detection model to complete the defect detection process.
The application has the advantages and positive effects that:
the image encoder of the CLIP model in the application comprises twelve attention modules, wherein the first six attention modules are self-attention modules, and the last six attention modules are cross-attention modules; during construction, firstly, preprocessing a sample image to obtain a complete image of a bearing and a plurality of patch images, and then processing the patch images by using a cross attention module in an initial CLIP model to obtain KQV branches of the cross attention module; processing the complete image by utilizing target detection to obtain modal information, and adjusting the weight of the V branch model of the cross attention module by utilizing a low confidence detection result in the modal information; and finally, training the cross attention module in the CLIP model by using the patch image again, namely freezing the model weights extracted from the first six self-attention features during training, and training by using only the last six cross attention modules.
The bearing defect detection model constructed by the application uses the CLIP pre-trained model, introduces new modal information, and performs defect judgment jointly with the target detection model. The key points of the application are the following two, through which the algorithm can quickly acquire the capability of distinguishing defect features from a small number of samples:
1. The contrastive learning algorithm of the CLIP model is improved: when training on bearing images, the base model of CLIP is fine-tuned by introducing the modal information of the target detection results, so that through the specific steps described above the base model quickly acquires feature-perception capability on the new data.
2. Because the CLIP algorithm has stronger feature-distinguishing capability than the target detection pre-trained model, the application selectively freezes part of the weights of the CLIP pre-trained model during training, so that the improved CLIP model can reach the same feature-extraction capability as the target detection model using only a small amount of data.
Drawings
FIG. 1 is a flow chart of a method for constructing a bearing defect detection model in a preferred embodiment of the present application;
FIG. 2 is a block diagram of a CLIP model in accordance with a preferred embodiment of the present application;
FIG. 3 is a system block diagram of a bearing defect detection model building system in accordance with a preferred embodiment of the present application;
FIG. 4 is a schematic view of a bearing prior to image preprocessing in a preferred embodiment of the present application;
fig. 5 is a schematic view of a bearing after image preprocessing in a preferred embodiment of the present application.
Detailed Description
For a further understanding of the application, its features and advantages, reference is now made to the following examples, which are illustrated in the accompanying drawings in which:
the following description of the embodiments of the present application will be made clearly and completely with reference to the accompanying drawings, in which it is apparent that the embodiments described are only some embodiments of the present application, but not all embodiments. Based on the technical solutions of the present application, all other embodiments obtained by a person skilled in the art without making any creative effort fall within the protection scope of the present application.
The CLIP model is a model trained on large-scale text-image pairs; 400 million text-image pairs were collected to train it. After large-scale text-image pre-training, the model encodes the input text and image and computes their cosine similarity to judge the matching degree of the data pair. The model can then be migrated directly to an image classification task without any fine-tuning on labeled data, directly realizing zero-shot classification.
The CLIP model learns image information by means of contrastive learning. As shown in fig. 2, taking a rusty bearing image as an example, the text "a photo of rust on a bearing" indicates an image of a rusty bearing. The model takes as input a batch of N image-text pairs. The N images are encoded by the Image Encoder into [I_1, I_2, …, I_N] with dimension (N, d_i); here the N images of one batch are encoded, I_1 is the encoding vector of the first image, I_N that of the N-th image, and each encoding vector has length d_i. Simultaneously, the N texts are encoded by the Text Encoder into [T_1, T_2, …, T_N] with dimension (N, d_t), each encoding vector having length d_t. The data pairs whose I and T indices correspond, such as I_1 and T_1 or I_2 and T_2, are positive sample pairs, and the rest are negative samples, finally giving N positive samples and N² − N negative samples. The cosine similarity between I_i and T_j is calculated; the larger the similarity, the higher the image-text correlation. The optimization target is to maximize the cosine similarity of the positive sample pairs (i.e., the similarity when i = j) and minimize the cosine similarity of the negative samples (i.e., the similarity when i ≠ j), and the weight parameters of the Text Encoder and Image Encoder are trained with a cross-entropy loss. This can be summarized in the following form:
"·" is the cosine similarity of the text vector and the image vector.
Self-Attention (Self-Attention) and Cross-Attention (Cross-Attention) are variants of the Attention mechanism for capturing associations between different elements in sequence or multimodal data.
In the field of bearing defect detection, target detection with AI technology is the conventional means, but its generalization is limited: it can only detect defect features that are fairly similar to the defects it was trained on, which creates the possibility of missed detection. The application therefore adds a CLIP algorithm model with strong feature-perception capability and adjusts it with a small amount of data, freezing the weights in the image encoder other than the MLP during training, so that the capability to distinguish normal and defective bearing data is added on top of the original feature perception. Finally, the two algorithms detect the captured bearing image in cascade, reducing missed detections.
Referring to fig. 1, a method for constructing a bearing defect detection model includes:
S1, acquiring a sample image set of a bearing, and performing image preprocessing on the sample images in the sample image set to obtain a complete image set containing the complete information of the bearing end face and a patch image set containing local information of the bearing end face; in this embodiment, the specific process of image preprocessing includes:
S101, firstly, reading a sample image containing a bearing and graying it to obtain a gray image; then dividing the gray image into a target area and a background area according to the pixel values of the area to be detected; finally, filling the voids in the annular region with a morphological closing operation, which also removes small noise and separates the annular region more cleanly;
S102, firstly, retaining only the contour of the bearing end face in the annular region to be detected through contour screening; then further optimizing the detected annular closed contour with a circle fitting algorithm, and finally cropping the annular region out of the original image;
S103, marking defects: marking the type and the position of each defect on the cropped-out image;
the cropped-out image is annotated in the target detection annotation format, i.e., with the defect type and position; the defect types of the bearing are defined as follows: rust, bump, crush and bulge.
Referring to fig. 4 and 5, fig. 4 is a sample image of the bearing before processing, and fig. 5 is the annotated bearing image, with four rust annotation frames in total.
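As an illustration of S101 to S103, the preprocessing chain can be sketched with OpenCV as follows (a minimal sketch; the threshold value, kernel size and largest-contour heuristic are assumptions, not parameters given in the application):

```python
import cv2
import numpy as np

def preprocess_bearing(sample_path, thresh=60, kernel_size=15):
    # S101: gray the sample image, split target/background by pixel value,
    # and fill the voids in the annular region with a morphological closing.
    gray = cv2.imread(sample_path, cv2.IMREAD_GRAYSCALE)
    _, mask = cv2.threshold(gray, thresh, 255, cv2.THRESH_BINARY)
    kernel = cv2.getStructuringElement(cv2.MORPH_ELLIPSE, (kernel_size, kernel_size))
    mask = cv2.morphologyEx(mask, cv2.MORPH_CLOSE, kernel)

    # S102: keep the bearing end-face contour by contour screening, refine it
    # with a fitted circle, and crop the annular region out of the original image.
    contours, _ = cv2.findContours(mask, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
    ring = max(contours, key=cv2.contourArea)          # assume the largest contour is the ring
    (cx, cy), radius = cv2.minEnclosingCircle(ring)    # fitted-circle refinement
    circle_mask = np.zeros_like(gray)
    cv2.circle(circle_mask, (int(cx), int(cy)), int(radius), 255, -1)
    return cv2.bitwise_and(gray, gray, mask=circle_mask)
```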
S2, model preliminary training; the method specifically comprises the following steps:
training the target detection model with the complete image set to obtain the detection result modal information of target detection, which comprises the category, coordinates and confidence of each detection; specifically, the annotated complete images are used to train the target detection model, and the purpose of this training is to supply the detection result modal information that is introduced when training the improved CLIP model.
S3, adjusting the parameter weight in the CLIP model; the image encoder of the CLIP model comprises twelve attention modules, wherein the first six attention modules are self-attention modules, and the last six attention modules are cross-attention modules; the adjustment process comprises the following steps:
since the area to be detected of the bearing image is in a ring state, the embodiment divides the bearing image into 9 patch images (other numbers can be adopted, such as 4 patch images or 16 patch images) according to the characteristics of the bearing image, and after the middle black irrelevant area is removed, the remaining 8 patch image blocks are used as the original input of KQV three branches of cross attention; the irrelevant area information can be screened out by removing the middle block, so that the training speed is improved.
For each patch image input to the KQ branch of the cross attention module, a linear transformation is performed and then a weight is applied by dot multiplication; patch images of different importance receive different weights. The weight of a patch image is calculated as follows:
firstly, the ratio r of the annotation-frame area falling within the current patch image to the total annotation-frame area is calculated; then r is input to the designed weight function sin(rπ/2)³ + 1, with r ≤ 1, to calculate the weight value of the patch image. The purpose of the function is to give patch images with a larger annotation-frame ratio more weight and those with a smaller ratio less weight, which is why sin(rπ/2) is raised to the third power. To ensure that the calculated gradient does not vanish during training, 1 is added to the sin term as a whole, which also guarantees that a patch image with no overlapping annotation frame obtains a weight of 1.
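A minimal sketch of this weight function (the formula is reconstructed from the garbled source text, so this reading is an assumption):

```python
import math

def kq_patch_weight(r):
    # r: ratio of the annotation-frame area falling inside the current patch
    # image to the total annotation-frame area, with 0 <= r <= 1.
    assert 0.0 <= r <= 1.0
    # sin(r*pi/2)**3 grows with r, so patches covering more of an annotation
    # frame receive more weight; the +1 keeps the weight of an overlap-free
    # patch at 1 and prevents the gradient from vanishing during training.
    return math.sin(r * math.pi / 2) ** 3 + 1.0
```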
A low-confidence detection result of target detection is introduced into the V branch of the cross attention module, where low confidence ranges from 0.1 to 0.5: the complete image is inferred with the target detection model, and the low-confidence detection results (confidence between 0.1 and 0.5) are introduced into the V branch, participating in training as information of another modality for the cross-attention V branch. Specifically, each patch image input to the V branch is linearly transformed and then dot-multiplied by a confidence adjustment value derived from the target detection model. The adjustment value is obtained as follows: first, the low-confidence detection frames are screened out; then, for each annotation frame in the patch image, the ratio of its area within the patch image to its total area is calculated, the annotation frames whose ratio exceeds 40% are kept, those whose IOU with a high-confidence detection frame exceeds 50% are removed, and the IOU between each remaining annotation frame and the low-confidence detection frames in the patch image area is calculated; finally, the IOU is input into the formula cos(rπ/2) + 1 for weight calculation. This function mainly gives a higher weight to patch images that overlap strongly with low-confidence detections and annotation frames, so that the CLIP model learns features in the patch images that target detection finds difficult, improving the detection capability for such features.
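A hedged sketch of the adjustment-value computation (the box representation, the IOU helper and the max-aggregation over low-confidence detections are assumptions; the 40% and 50% thresholds follow the text):

```python
import math

def v_branch_adjustments(annot_boxes, low_conf_boxes, high_conf_boxes, iou_fn):
    # annot_boxes: list of (box, ratio) pairs, where ratio is the share of the
    # annotation frame's area that falls inside the current patch image.
    # iou_fn(a, b) -> IOU of two boxes (assumed helper).
    weights = []
    for box, ratio in annot_boxes:
        if ratio <= 0.4:                                  # keep frames >40% inside the patch
            continue
        if any(iou_fn(box, hb) > 0.5 for hb in high_conf_boxes):
            continue                                      # drop frames already well detected
        # IOU with the low-confidence detections inside the patch area
        r = max((iou_fn(box, lb) for lb in low_conf_boxes), default=0.0)
        weights.append(math.cos(r * math.pi / 2) + 1.0)   # cos(r*pi/2) + 1
    return weights
```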
S4, training the CLIP model with the patch image set and the confidence of target detection to obtain the bearing defect detection model, wherein only the cross attention modules are trained; that is, the model weights of the first six self-attention feature-extraction modules are frozen during training, and only the last six modules, improved into cross attention modules, are trained. This operation both preserves the feature-extraction capability of the original pre-trained model and fits the new data.
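A hedged sketch of this selective freezing (the attribute names are illustrative; the application gives no implementation):

```python
def freeze_for_retraining(image_encoder):
    # image_encoder.blocks: the twelve attention modules; blocks[0:6] are the
    # frozen self-attention modules, blocks[6:12] the trainable cross-attention ones.
    for idx, block in enumerate(image_encoder.blocks):
        trainable = idx >= 6
        for p in block.parameters():
            p.requires_grad = trainable
```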
On the basis of the above embodiment, the method may further include S5, defining the labels of the patch images used in contrastive training:
if the area of an annotation frame within a patch image exceeds 10% of the annotation frame's total area, the category of that annotation frame is labeled on the patch image. The bearing patch image labels follow the naming convention "a photo of (rust) on a bearing": the name of the defect, such as rust, bump, crush or bulge, is filled into the brackets as the label, where normal denotes a defect-free image. Multi-label images merge the labels into one, e.g., "a photo of (rust and bulge) on a bearing".
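As a trivial sketch of this naming convention (the defect vocabulary follows the text; the function name is illustrative):

```python
def patch_label(defects):
    # defects: names among rust / bump / crush / bulge, or empty for a
    # normal (defect-free) patch image.
    name = " and ".join(defects) if defects else "normal"
    return f"a photo of ({name}) on a bearing"

# patch_label(["rust", "bulge"]) -> "a photo of (rust and bulge) on a bearing"
```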
Referring to fig. 3, a system for constructing a bearing defect detection model includes:
sample module: acquiring a sample image set of a bearing, and performing image preprocessing on the sample images in the sample image set to obtain a complete image set containing the complete information of the bearing end face and a patch image set containing local information of the bearing end face; the image preprocessing specifically comprises:
S101, firstly, reading a sample image containing a bearing and graying it to obtain a gray image; then dividing the gray image into a target area and a background area according to the pixel values of the area to be detected; finally, filling the voids in the annular region with a morphological closing operation, which also removes small noise and separates the annular region more cleanly;
S102, firstly, retaining only the contour of the bearing end face in the annular region to be detected through contour screening; then further optimizing the detected annular closed contour with a circle fitting algorithm, and finally cropping the annular region out of the original image;
S103, marking defects: marking the type and the position of each defect on the cropped-out image;
the cropped-out image is annotated in the target detection annotation format, i.e., with the defect type and position; the defect types of the bearing are defined as follows: rust, bump, crush and bulge.
A model preliminary training module; the model preliminary training specifically comprises the following steps:
training the target detection model with the complete image set to obtain the detection result modal information of target detection, which comprises the category, coordinates and confidence of each detection; specifically, the annotated complete images are used to train the target detection model, and the purpose of this training is to supply the detection result modal information introduced when training the improved CLIP model;
preliminarily training the CLIP model with the patch image set, wherein the image encoder of the CLIP model comprises twelve attention modules, the first six being self-attention modules and the last six cross-attention modules; only the cross attention modules are used during training;
parameter weight adjustment module: adjusting the parameter weights in the CLIP model; the image encoder of the CLIP model comprises twelve attention modules, wherein the first six attention modules are self-attention modules and the last six are cross-attention modules; the adjustment process comprises the following steps:
Since the area to be detected in a bearing image is annular, this embodiment divides the bearing image into 9 patch images according to the characteristics of the bearing image (other numbers may be adopted, such as 4 or 16 patch images); after the central black irrelevant area is removed, the remaining 8 patch image blocks are used as the original input of the three KQV branches of the cross attention. Removing the central block screens out irrelevant area information and improves the training speed.
For each patch image input to the KQ branch of the cross attention module, a linear transformation is performed and then a weight is applied by dot multiplication; patch images of different importance receive different weights. The weight of a patch image is calculated as follows: firstly, the ratio r of the annotation-frame area falling within the current patch image to the total annotation-frame area is calculated; then r is input to the designed weight function sin(rπ/2)³ + 1, with r ≤ 1, to calculate the weight value of the patch image. The purpose of the function is to give patch images with a larger annotation-frame ratio more weight and those with a smaller ratio less weight, which is why sin(rπ/2) is raised to the third power. To ensure that the calculated gradient does not vanish during training, 1 is added to the sin term as a whole, which also guarantees that a patch image with no overlapping annotation frame obtains a weight of 1.
A low-confidence detection result of target detection is introduced into the V branch of the cross attention module, where low confidence ranges from 0.1 to 0.5: the complete image is inferred with the target detection model, and the low-confidence detection results (confidence between 0.1 and 0.5) are introduced into the V branch, participating in training as information of another modality for the cross-attention V branch. Specifically, each patch image input to the V branch is linearly transformed and then dot-multiplied by a confidence adjustment value derived from the target detection model. The adjustment value is obtained as follows: first, the low-confidence detection frames are screened out; then, for each annotation frame in the patch image, the ratio of its area within the patch image to its total area is calculated, the annotation frames whose ratio exceeds 40% are kept, those whose IOU with a high-confidence detection frame exceeds 50% are removed, and the IOU between each remaining annotation frame and the low-confidence detection frames in the patch image area is calculated; finally, the IOU is input into the formula cos(rπ/2) + 1 for weight calculation. This function mainly gives a higher weight to patch images that overlap strongly with low-confidence detections and annotation frames, so that the CLIP model learns features in the patch images that target detection finds difficult, improving the detection capability for such features;
model retraining module: training the CLIP model with the patch image set and the confidence of target detection to obtain the bearing defect detection model, wherein only the cross attention modules are trained; that is, the model weights of the first six self-attention feature-extraction modules are frozen during training, and only the last six modules, improved into cross attention modules, are trained. This operation both preserves the feature-extraction capability of the original pre-trained model and fits the new data.
On the basis of the above embodiment, the system may further include a tag definition module: defining the labels of the patch images used in contrastive training;
if the area of an annotation frame within a patch image exceeds 10% of the annotation frame's total area, the category of that annotation frame is labeled on the patch image. The bearing patch image labels follow the naming convention "a photo of (rust) on a bearing": the name of the defect, such as rust, bump, crush or bulge, is filled into the brackets as the label, where normal denotes a defect-free image. Multi-label images merge the labels into one, e.g., "a photo of (rust and bulge) on a bearing".
A bearing defect detection model is constructed by the method for constructing the bearing defect detection model.
The bearing defect detection method utilizes the bearing defect detection model to finish the defect detection process; the specific detection process comprises the following steps:
the bearing image to be detected is input into the CLIP model and the target detection model; the two models process it separately, and a logical OR operation is applied to the defect information in the processing results, i.e., as long as either model detects defect information, the bearing is considered defective and the defect information is output.
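A hedged sketch of this cascaded, logical-OR judgment (the two predict interfaces are assumptions):

```python
def detect_bearing_defects(image, clip_predict, det_predict):
    # Each predict function is an assumed interface that returns a (possibly
    # empty) list of defect records for the input bearing image.
    defects = clip_predict(image) + det_predict(image)
    # Logical OR: the bearing is judged defective as long as either model
    # detects defect information, and the defect information is output.
    return (len(defects) > 0), defects
```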
The above embodiments may be implemented in whole or in part by software, hardware, firmware, or any combination thereof. When software is used, the implementation may take, in whole or in part, the form of a computer program product comprising one or more computer instructions. When the computer program instructions are loaded or executed on a computer, the flows or functions according to the embodiments of the present application are produced in whole or in part. The computer may be a general-purpose computer, a special-purpose computer, a computer network, or another programmable apparatus. The computer instructions may be stored in a computer-readable storage medium or transmitted from one computer-readable storage medium to another; for example, the computer instructions may be transmitted from one website, computer, server, or data center to another by wired (e.g., coaxial cable, optical fiber, digital subscriber line (DSL)) or wireless (e.g., infrared, radio, microwave) means. The computer-readable storage medium may be any available medium that can be accessed by a computer, or a data storage device such as a server or data center that integrates one or more available media. The available medium may be a magnetic medium (e.g., floppy disk, hard disk, magnetic tape), an optical medium (e.g., DVD), or a semiconductor medium (e.g., solid state disk (SSD)), etc.
The foregoing description covers only the preferred embodiments of the present application and is not intended to limit the application in any way; any simple modification, equivalent variation or improvement of the above embodiments according to the technical principles of the present application falls within the scope of the technical solutions of the present application.

Claims (10)

1. The method for constructing the bearing defect detection model is characterized by comprising the following steps of:
s1, acquiring a sample image set of a bearing, and carrying out image preprocessing on sample images in the sample image set to obtain a complete image set containing complete information of the end face of the bearing and a patch image set containing local information of the end face of the bearing; each complete image consists of a plurality of patch images;
s2, training a target detection model by using the complete image set; obtaining the confidence coefficient of target detection;
s3, adjusting the parameter weight in the CLIP model; the image encoder of the CLIP model comprises twelve attention modules, wherein the first six attention modules are self-attention modules, and the last six attention modules are cross-attention modules; the adjustment process comprises the following steps:
for each patch image input to the KQ branch of the cross attention module in the image encoder module, performing a linear transformation and then weighting it by dot multiplication;
introducing a low confidence detection result of target detection into a V branch of the cross attention module, wherein the range of the low confidence is 0.1-0.5;
s4, training the CLIP model by using the patch image set and the confidence coefficient of target detection to obtain a bearing defect detection model; wherein: the training is performed by using the cross attention module only.
2. The method for constructing a bearing defect detection model according to claim 1, wherein the image preprocessing specifically comprises:
S101, firstly graying the sample image to obtain a gray image; then dividing the gray image into a target area and a background area according to the pixel values of the area to be detected; finally, filling the voids in the annular region by using a closing operation;
S102, firstly, retaining the contour of the bearing end face in the annular area to be detected through contour screening; then optimizing the detected annular closed contour by using a circle fitting algorithm, and finally cropping the annular region out of the original image;
S103, marking defects: marking the type and the position of each defect on the cropped-out image.
3. The method of claim 1, wherein in S3, for each patch image input to the KQ branch of the cross-attention module, the weight of the patch image is calculated as: sin(rπ/2)³ + 1, with r ≤ 1; r represents the ratio of the area of the annotation frame falling within the current patch image to the total area of the annotation frame.
4. The method for constructing a bearing defect detection model according to claim 1, wherein the low confidence detection result of the target detection is introduced into the V branch of the cross attention module, specifically comprising:
performing a linear transformation on each patch image input to the V branch, and then dot-multiplying by the confidence adjustment value derived from the target detection model.
5. The method of claim 4, wherein the confidence adjustment value is obtained by:
screening out the low-confidence detection frames;
then, for each annotation frame in the patch image, calculating the ratio of its area within the patch image to its total area, keeping the annotation frames whose ratio exceeds 40%, removing those whose IOU with a high-confidence detection frame exceeds 50%, and calculating the IOU between each remaining annotation frame and the low-confidence detection frames within the patch image area;
finally, inputting the IOU into the following formula for weight calculation to obtain the adjustment value: cos(rπ/2) + 1, where r is the IOU calculated above.
6. A bearing defect detection model construction system, comprising:
sample module: acquiring a sample image set of a bearing, and carrying out image preprocessing on sample images in the sample image set to obtain a complete image set containing complete information of the end face of the bearing and a patch image set containing local information of the end face of the bearing; each complete image consists of a plurality of patch images;
a model preliminary training module; training a target detection model by using the complete image set; obtaining the confidence coefficient of target detection;
parameter weight adjustment module: adjusting the parameter weight in the CLIP model; the image encoder of the CLIP model comprises twelve attention modules, wherein the first six attention modules are self-attention modules, and the last six attention modules are cross-attention modules; the adjustment process comprises the following steps:
for each patch image input to the KQ branch of the cross attention module, performing a linear transformation and then weighting it by dot multiplication;
introducing a low confidence detection result of target detection into a V branch of the cross attention module, wherein the range of the low confidence is 0.1-0.5;
model retraining module: training the CLIP model by using the patch image set and the confidence coefficient of target detection to obtain a bearing defect detection model; wherein: the training is performed by using the cross attention module only.
7. The bearing defect detection model construction system of claim 6, wherein the image preprocessing specifically comprises:
S101, firstly graying the sample image to obtain a gray image; then dividing the gray image into a target area and a background area according to the pixel values of the area to be detected; finally, filling the voids in the annular region by using a closing operation;
S102, firstly, retaining the contour of the bearing end face in the annular area to be detected through contour screening; then optimizing the detected annular closed contour by using a circle fitting algorithm, and finally cropping the annular region out of the original image;
S103, marking defects: marking the type and the position of each defect on the cropped-out image.
8. The bearing defect detection model construction system according to claim 6, wherein in S3, for each patch image input to the KQ branch of the cross-attention module, the weight of the patch image is calculated as: sin(rπ/2)³ + 1, with r ≤ 1; r represents the ratio of the area of the annotation frame falling within the current patch image to the total area of the annotation frame.
9. The bearing defect detection model construction system of claim 6, wherein introducing low confidence detection results of target detection at the V-branch of the cross-attention module comprises:
performing a linear transformation on each patch image input to the V branch, and then dot-multiplying by a confidence adjustment value derived from the target detection model, wherein the adjustment value is obtained through the following steps:
screening out the low-confidence detection frames;
then, for each annotation frame in the patch image, calculating the ratio of its area within the patch image to its total area, keeping the annotation frames whose ratio exceeds 40%, removing those whose IOU with a high-confidence detection frame exceeds 50%, and calculating the IOU between each remaining annotation frame and the low-confidence detection frames within the patch image area;
finally, inputting the IOU into the following formula for weight calculation to obtain the adjustment value: cos(rπ/2) + 1, where r is the IOU calculated above.
10. A bearing defect detection model constructed by the method for constructing a bearing defect detection model according to any one of claims 1 to 5.
CN202311415402.3A 2023-10-30 2023-10-30 Bearing defect detection model construction method and system Active CN117152142B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202311415402.3A CN117152142B (en) 2023-10-30 2023-10-30 Bearing defect detection model construction method and system

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202311415402.3A CN117152142B (en) 2023-10-30 2023-10-30 Bearing defect detection model construction method and system

Publications (2)

Publication Number Publication Date
CN117152142A true CN117152142A (en) 2023-12-01
CN117152142B CN117152142B (en) 2024-02-02

Family

ID=88884756

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202311415402.3A Active CN117152142B (en) 2023-10-30 2023-10-30 Bearing defect detection model construction method and system

Country Status (1)

Country Link
CN (1) CN117152142B (en)

Patent Citations (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20210012150A1 (en) * 2019-07-11 2021-01-14 Xidian University Bidirectional attention-based image-text cross-modal retrieval method
CN113160192A (en) * 2021-04-28 2021-07-23 北京科技大学 Visual sense-based snow pressing vehicle appearance defect detection method and device under complex background
CN113220919A (en) * 2021-05-17 2021-08-06 河海大学 Dam defect image text cross-modal retrieval method and model
CN114283430A (en) * 2021-12-03 2022-04-05 苏州大创科技有限公司 Cross-modal image-text matching training method and device, storage medium and electronic equipment
CN113902926A (en) * 2021-12-06 2022-01-07 之江实验室 General image target detection method and device based on self-attention mechanism
US20230306732A1 (en) * 2022-03-25 2023-09-28 Facedapter Sàrl Heterogenous Face Recognition System and Method
CN115937091A (en) * 2022-10-24 2023-04-07 合肥中科融道智能科技有限公司 Transformer substation equipment defect image detection method based on changeable patch
CN116386081A (en) * 2023-03-01 2023-07-04 西北工业大学 Pedestrian detection method and system based on multi-mode images
CN116383671A (en) * 2023-03-27 2023-07-04 武汉大学 Text image cross-mode pedestrian retrieval method and system with implicit relation reasoning alignment
CN116630608A (en) * 2023-05-29 2023-08-22 广东工业大学 Multi-mode target detection method for complex scene

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
YAXIONG WANG, ET AL: "PFAN++: Bi-Directional Image-Text Retrieval With Position Focused Attention Network", IEEE Transactions on Multimedia *

Also Published As

Publication number Publication date
CN117152142B (en) 2024-02-02

Similar Documents

Publication Publication Date Title
CN108228915B (en) Video retrieval method based on deep learning
CN107133569B (en) Monitoring video multi-granularity labeling method based on generalized multi-label learning
CN113436169B (en) Industrial equipment surface crack detection method and system based on semi-supervised semantic segmentation
CN102385592B (en) Image concept detection method and device
CN111104555B (en) Video hash retrieval method based on attention mechanism
CN113888550B (en) Remote sensing image road segmentation method combining super-resolution and attention mechanism
CN110648310A (en) Weak supervision casting defect identification method based on attention mechanism
CN115937655B (en) Multi-order feature interaction target detection model, construction method, device and application thereof
CN112766218B (en) Cross-domain pedestrian re-recognition method and device based on asymmetric combined teaching network
CN116030396B (en) Accurate segmentation method for video structured extraction
CN114463340B (en) Agile remote sensing image semantic segmentation method guided by edge information
CN116091946A (en) Yolov 5-based unmanned aerial vehicle aerial image target detection method
Li et al. Efficient detection in aerial images for resource-limited satellites
CN111523586A (en) Noise-aware-based full-network supervision target detection method
CN114529894A (en) Rapid scene text detection method fusing hole convolution
CN113657473A (en) Web service classification method based on transfer learning
CN110738129B (en) End-to-end video time sequence behavior detection method based on R-C3D network
CN112418229A (en) Unmanned ship marine scene image real-time segmentation method based on deep learning
CN117132910A (en) Vehicle detection method and device for unmanned aerial vehicle and storage medium
CN117152142B (en) Bearing defect detection model construction method and system
CN116563844A (en) Cherry tomato maturity detection method, device, equipment and storage medium
CN114861718B (en) Bearing fault diagnosis method and system based on improved depth residual error algorithm
CN117011219A (en) Method, apparatus, device, storage medium and program product for detecting quality of article
CN112487927B (en) Method and system for realizing indoor scene recognition based on object associated attention
CN112926670A (en) Garbage classification system and method based on transfer learning

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant