CN117152142A - Bearing defect detection model construction method and system - Google Patents
- Publication number: CN117152142A (application CN202311415402.3A)
- Authority: CN
- China
- Prior art keywords: image, bearing, model, patch image, area
- Legal status: Granted (the legal status is an assumption and is not a legal conclusion; Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed)
Classifications
- G06T7/0004: Industrial image inspection
- G06N3/0455: Auto-encoder networks; encoder-decoder networks
- G06N3/0499: Feedforward networks
- G06N3/08: Neural-network learning methods
- G06T7/62: Analysis of geometric attributes of area, perimeter, diameter or volume
- G06T7/73: Determining position or orientation of objects or cameras using feature-based methods
- G06V10/30: Image preprocessing; noise filtering
- G06V10/44: Local feature extraction (edges, contours, corners; connectivity analysis)
- G06V10/764: Recognition using pattern recognition or machine learning; classification
- G06V10/82: Recognition using pattern recognition or machine learning; neural networks
- G06T2207/20036: Morphological image processing
- G06T2207/20081: Training; learning
- G06T2207/20084: Artificial neural networks [ANN]
- G06T2207/30108: Industrial image inspection
Abstract
The application discloses a method and a system for constructing a bearing defect detection model, belonging to the technical field of defect detection and comprising the following steps: S1, preprocessing the sample images in a sample image set; S2, training a target detection model with the complete image set; S3, adjusting the parameter weights in the CLIP model, where the adjustment comprises: for each patch image input to the K/Q branches of the cross-attention modules in the image encoder, applying a linear transformation followed by element-wise weighting; and introducing the low-confidence detection results of target detection into the V branch of the cross-attention modules; S4, training the CLIP model with the patch image set and the target-detection confidences to obtain the bearing defect detection model. By combining a target detection model with a CLIP model, the application improves the reliability of results when bearing parts exhibit few-sample defects.
Description
Technical Field
The application belongs to the technical field of defect detection, and particularly relates to a method and a system for constructing a bearing defect detection model.
Background
Bearings are an important component in mechanical equipment. Their main function is to support the rotating mechanical body, reduce the friction coefficient during its movement, and guarantee its rotational accuracy. The manufacturing of bearing parts involves forging, rolling, punching, turning, grinding, heat treatment and other processes.
In conventional bearing defect detection, defect images are first collected and then annotated and used to train a target detection algorithm, so the detection performance depends on the number and diversity of the collected defects. However, as manufacturing technology is continuously optimized and matures, production yields keep improving, which makes defect images increasingly difficult to collect; the end result is that the target detection algorithm performs poorly and defects are frequently missed.
Disclosure of Invention
The application provides a method and a system for constructing a bearing defect detection model to solve the above technical problems in the prior art: it combines a target detection model with a CLIP model to improve the reliability of results when bearing parts exhibit few-sample defects.
The first object of the present application is to provide a method for constructing a bearing defect detection model, comprising:
S1, acquiring a sample image set of a bearing and preprocessing the sample images to obtain a complete image set containing the complete information of the bearing end face and a patch image set containing local information of the bearing end face; each complete image consists of a plurality of patch images;
S2, training a target detection model with the complete image set and obtaining the confidences of target detection;
S3, adjusting the parameter weights in the CLIP model; the image encoder of the CLIP model comprises twelve attention modules, of which the first six are self-attention modules and the last six are cross-attention modules; the adjustment process comprises:
applying a linear transformation to each patch image input to the K/Q branches of the cross-attention modules and then weighting it by element-wise multiplication;
introducing the low-confidence detection results of target detection into the V branch of the cross-attention modules, where low confidence means the range 0.1-0.5;
S4, training the CLIP model with the patch image set and the target-detection confidences to obtain the bearing defect detection model; the training updates only the cross-attention modules.
Preferably, the image preprocessing specifically includes:
S101, first converting the sample image to grayscale; then dividing the gray image into a target region and a background region according to the pixel values of the region to be detected; finally, filling the holes in the annular region with a morphological closing operation;
S102, first retaining only the contour of the bearing end face in the annular region to be detected through contour screening; then refining the detected closed annular contour with a circle-fitting algorithm, and finally cropping the annular region out of the original image;
S103, defect annotation: marking the type and position of each defect on the cropped image.
Preferably, in S3, for each patch image input to the K/Q branches of the cross-attention modules, the weight of the patch image is computed as sin³(rπ/2) + 1, r ≤ 1, where r is the ratio of the annotation-box area falling inside the current patch image to the total annotation-box area.
Preferably, introducing the low-confidence detection results of target detection into the V branch of the cross-attention modules specifically comprises:
applying a linear transformation to each patch image input to the V branch and then multiplying it element-wise by an adjustment value derived from the confidences of the target detection model, the adjustment value being obtained as follows:
first, screening out the low-confidence detection boxes;
then, for each annotation box, computing the fraction of its total area that falls inside the patch image; keeping the annotation boxes whose fraction exceeds 40%, discarding those whose IOU with a high-confidence detection box exceeds 50%, and computing the IOU between each remaining annotation box and the low-confidence detection boxes within the patch image region;
finally, feeding that IOU r into the formula cos(rπ/2) + 1 to obtain the adjustment value.
A second object of the present application is to provide a bearing defect detection model construction system, comprising:
sample module: acquiring a sample image set of a bearing and preprocessing the sample images to obtain a complete image set containing the complete information of the bearing end face and a patch image set containing local information of the bearing end face; each complete image consists of a plurality of patch images;
model preliminary training module: training a target detection model with the complete image set and obtaining the confidences of target detection;
parameter weight adjustment module: adjusting the parameter weights in the CLIP model; the image encoder of the CLIP model comprises twelve attention modules, of which the first six are self-attention modules and the last six are cross-attention modules; the adjustment process comprises:
applying a linear transformation to each patch image input to the K/Q branches of the cross-attention modules and then weighting it by element-wise multiplication;
introducing the low-confidence detection results of target detection into the V branch of the cross-attention modules, where low confidence means the range 0.1-0.5;
model retraining module: training the CLIP model with the patch image set and the target-detection confidences to obtain the bearing defect detection model; the training updates only the cross-attention modules.
Preferably, the image preprocessing specifically includes:
S101, first converting the sample image to grayscale; then dividing the gray image into a target region and a background region according to the pixel values of the region to be detected; finally, filling the holes in the annular region with a morphological closing operation;
S102, first retaining only the contour of the bearing end face in the annular region to be detected through contour screening; then refining the detected closed annular contour with a circle-fitting algorithm, and finally cropping the annular region out of the original image;
S103, defect annotation: marking the type and position of each defect on the cropped image.
Preferably, for each patch image input to the K/Q branches of the cross-attention modules, the weight of the patch image is computed as sin³(rπ/2) + 1, r ≤ 1, where r is the ratio of the annotation-box area falling inside the current patch image to the total annotation-box area.
Preferably, introducing the low-confidence detection results of target detection into the V branch of the cross-attention modules specifically comprises:
applying a linear transformation to each patch image input to the V branch and then multiplying it element-wise by an adjustment value derived from the confidences of the target detection model, the adjustment value being obtained as follows:
first, screening out the low-confidence detection boxes;
then, for each annotation box, computing the fraction of its total area that falls inside the patch image; keeping the annotation boxes whose fraction exceeds 40%, discarding those whose IOU with a high-confidence detection box exceeds 50%, and computing the IOU between each remaining annotation box and the low-confidence detection boxes within the patch image region;
finally, feeding that IOU r into the formula cos(rπ/2) + 1 to obtain the adjustment value.
The third object of the present application is to provide a bearing defect detection model, which is constructed by the method for constructing a bearing defect detection model.
The fourth object of the present application is to provide a method for detecting a bearing defect, which uses the above-mentioned bearing defect detection model to complete the defect detection process.
The application has the advantages and positive effects that:
the image encoder of the CLIP model in the application comprises twelve attention modules, wherein the first six attention modules are self-attention modules, and the last six attention modules are cross-attention modules; during construction, firstly, preprocessing a sample image to obtain a complete image of a bearing and a plurality of patch images, and then processing the patch images by using a cross attention module in an initial CLIP model to obtain KQV branches of the cross attention module; processing the complete image by utilizing target detection to obtain modal information, and adjusting the weight of the V branch model of the cross attention module by utilizing a low confidence detection result in the modal information; and finally, training the cross attention module in the CLIP model by using the patch image again, namely freezing the model weights extracted from the first six self-attention features during training, and training by using only the last six cross attention modules.
The bearing defect detection model constructed by the application builds on the CLIP pre-trained model, introduces new modal information, and uses the CLIP model and the target detection model jointly to judge defects. The key points are the following two, which quickly give the algorithm the ability to distinguish defect features from a small number of samples:
1. The contrastive learning algorithm of the CLIP model is improved: when training on bearing images, the CLIP base model is fine-tuned by introducing the modal information of the target detection results, which quickly equips the base model with feature perception for the new data.
2. Because the CLIP algorithm has stronger feature-discrimination capability than the target detection pre-trained model, the application selectively freezes part of the CLIP pre-trained weights during training, so that the improved CLIP model reaches the same feature-extraction capability as the target detection model with only a small amount of data.
Drawings
FIG. 1 is a flow chart of a method for constructing a bearing defect detection model in a preferred embodiment of the present application;
FIG. 2 is a block diagram of a CLIP model in accordance with a preferred embodiment of the present application;
FIG. 3 is a system block diagram of a bearing defect detection model building system in accordance with a preferred embodiment of the present application;
FIG. 4 is a schematic view of a bearing prior to image preprocessing in a preferred embodiment of the present application;
fig. 5 is a schematic view of a bearing after image preprocessing in a preferred embodiment of the present application.
Detailed Description
For a further understanding of the application, its features and advantages, reference is now made to the following examples, which are illustrated in the accompanying drawings in which:
the following description of the embodiments of the present application will be made clearly and completely with reference to the accompanying drawings, in which it is apparent that the embodiments described are only some embodiments of the present application, but not all embodiments. Based on the technical solutions of the present application, all other embodiments obtained by a person skilled in the art without making any creative effort fall within the protection scope of the present application.
The CLIP model is trained on large-scale text-image pairs: 400 million text-image pairs were collected to train it. After large-scale text-image pre-training, the matching degree of a data pair is judged by encoding the input text and image and computing their cosine similarity. The model can then be transferred directly to image classification tasks without any fine-tuning on labeled data, realizing zero-shot classification.
The CLIP model learns image information by contrastive learning. As shown in fig. 2, taking a rusty bearing image as an example, the text "a photo of rust on a bearing" describes an image of a rusty bearing. The model takes the N image-text pairs of one batch as input. The N images are encoded by the Image Encoder into [I_1, I_2, …, I_N] of dimension (N, d_i), where I_1 is the encoding vector of the first image, I_N that of the N-th image, and each encoding vector has length d_i. Simultaneously, the N texts are encoded by the Text Encoder into [T_1, T_2, …, T_N] of dimension (N, d_t), each encoding vector having length d_t. The data pairs with matching indices, such as I_1 with T_1 or I_2 with T_2, are positive sample pairs; all the others are negative samples, giving N positive samples and N² − N negative samples. The cosine similarity between I_i and T_j is computed; a larger similarity indicates a stronger image-text correlation. The optimization target is to maximize the cosine similarity of the positive pairs (i = j) and minimize that of the negative pairs (i ≠ j), training the weight parameters of the Text Encoder and Image Encoder with a cross-entropy loss. This can be summarized in the following form:
"·" denotes the cosine similarity of the text vector and the image vector.
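The symmetric contrastive objective described above can be sketched as follows. This is a minimal NumPy illustration of cross-entropy over a cosine-similarity matrix, not the patent's implementation; the temperature value is an assumed placeholder.

```python
import numpy as np

def log_softmax(z, axis):
    # Numerically stable log-softmax along the given axis.
    z = z - z.max(axis=axis, keepdims=True)
    return z - np.log(np.exp(z).sum(axis=axis, keepdims=True))

def clip_loss(image_emb, text_emb, temperature=0.07):
    # Row-normalise so the dot products below are cosine similarities.
    img = image_emb / np.linalg.norm(image_emb, axis=1, keepdims=True)
    txt = text_emb / np.linalg.norm(text_emb, axis=1, keepdims=True)
    logits = img @ txt.T / temperature  # (N, N) similarity matrix
    diag = np.arange(logits.shape[0])
    # Positives sit on the diagonal (i == j); the N^2 - N off-diagonal
    # entries are negatives. Cross-entropy is taken along the rows
    # (image -> text) and the columns (text -> image), then averaged.
    loss_img = -log_softmax(logits, axis=1)[diag, diag].mean()
    loss_txt = -log_softmax(logits, axis=0)[diag, diag].mean()
    return (loss_img + loss_txt) / 2
```

A batch whose pairs line up on the diagonal yields a much smaller loss than one whose pairings are scrambled, which is exactly the signal that trains the two encoders.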
Self-Attention (Self-Attention) and Cross-Attention (Cross-Attention) are variants of the Attention mechanism for capturing associations between different elements in sequence or multimodal data.
In the field of bearing defect detection, AI-based target detection is the conventional means, but its generalization is limited: it can only detect defect features that closely resemble the trained defects, so missed detections are possible. The application therefore adds a CLIP algorithm model with strong feature-perception capability and adapts it with a small amount of data, freezing all weights in the image encoder other than the MLP for training, so that the ability to distinguish normal from defective bearing data is added on top of the original feature perception. Finally the two algorithms detect the captured bearing images in cascade, reducing missed detections.
Referring to fig. 1, a method for constructing a bearing defect detection model includes:
S1, acquiring a sample image set of a bearing and preprocessing the sample images to obtain a complete image set containing the bearing end-face information and a patch image set containing a plurality of pieces of local end-face information; in this embodiment, the image preprocessing specifically comprises:
S101, first reading a sample image containing a bearing and converting it to grayscale; then dividing the gray image into a target region and a background region according to the pixel values of the region to be detected; finally, filling the holes in the annular region with a morphological closing operation, which removes small noise and separates the annular region more cleanly;
S102, first retaining only the contour of the bearing end face in the annular region to be detected through contour screening; then further refining the detected closed annular contour with a circle-fitting algorithm, and finally cropping the annular region out of the original image;
S103, defect annotation: marking the type and position of each defect on the cropped image;
the cropped image is annotated in the target-detection annotation format, i.e. with defect type and position. The defect types of the bearing are defined as rust, bump, crush and bulge.
Referring to fig. 4 and 5, fig. 4 is the sample image of the bearing before processing and fig. 5 is the annotated bearing image, with four rust annotation boxes in total.
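The preprocessing of S101-S102 can be sketched in pure NumPy. This is an illustrative reconstruction under simplifying assumptions (a fixed global threshold, a square structuring element, and a circle fit taken from the foreground centroid and radii); a production pipeline would typically use a library's thresholding, morphology, contour-finding and circle-fitting routines instead.

```python
import numpy as np

def binary_close(mask, k=3):
    # Morphological closing (dilation then erosion) with a k x k square
    # structuring element; np.roll wraps at the borders, which is
    # acceptable for this sketch since the bearing sits away from them.
    r = k // 2
    def dilate(m):
        out = np.zeros_like(m)
        for dy in range(-r, r + 1):
            for dx in range(-r, r + 1):
                out |= np.roll(np.roll(m, dy, axis=0), dx, axis=1)
        return out
    def erode(m):
        return ~dilate(~m)
    return erode(dilate(mask))

def crop_annulus(image, thresh=128):
    # S101: grayscale, split into target/background by pixel value, and
    # fill holes in the annular region with a closing operation.
    gray = image.mean(axis=2) if image.ndim == 3 else np.asarray(image, float)
    mask = binary_close(gray > thresh)
    # S102: fit the annulus from the foreground pixels (centre = centroid,
    # inner/outer radii = min/max distance) and crop it out of the image.
    ys, xs = np.nonzero(mask)
    cy, cx = ys.mean(), xs.mean()
    dist = np.hypot(ys - cy, xs - cx)
    r_in, r_out = dist.min(), dist.max()
    yy, xx = np.indices(gray.shape)
    d = np.hypot(yy - cy, xx - cx)
    cropped = np.where((d >= r_in) & (d <= r_out), gray, 0.0)
    return cropped, (cy, cx, r_in, r_out)
```

Running this on a synthetic ring image keeps the annular band and blanks the central hole and outer background, mirroring figs. 4 and 5.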
S2, preliminary model training, specifically comprising:
training a target detection model with the complete image set and obtaining the modal information of the detection results, which comprises the category, coordinates and confidence of each detection. Concretely, the annotated complete images are used to train a target detection model; the purpose of this training is to supply the detection-result modal information when training the improved CLIP model.
S3, adjusting the parameter weights in the CLIP model; the image encoder of the CLIP model comprises twelve attention modules, of which the first six are self-attention modules and the last six are cross-attention modules; the adjustment process comprises:
since the area to be detected of the bearing image is in a ring state, the embodiment divides the bearing image into 9 patch images (other numbers can be adopted, such as 4 patch images or 16 patch images) according to the characteristics of the bearing image, and after the middle black irrelevant area is removed, the remaining 8 patch image blocks are used as the original input of KQV three branches of cross attention; the irrelevant area information can be screened out by removing the middle block, so that the training speed is improved.
Each patch image input to the K/Q branches of the cross-attention modules undergoes a linear transformation and is then weighted by element-wise multiplication; patch images of different importance receive different weights. The weight of a patch image is computed as follows:
first, the ratio r of the annotation-box area inside the current patch image to the total annotation-box area is computed; the ratio r (r ≤ 1) is then fed into the designed weight function sin³(rπ/2) + 1 to obtain the weight of the patch image. The function gives patch images with a larger annotation-box ratio more weight than those with a smaller ratio, which is why sin(rπ/2) is raised to the third power. To ensure that the computed gradient does not vanish during training, 1 is added to the sine term, which also guarantees that a patch image with no overlapping annotation box receives a weight of 1.
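The K/Q patch weight can be written as a small function. Note that the exponent placement is a reconstruction from the surrounding description (the sine is cubed so that the weight grows monotonically from 1 at r = 0 to 2 at r = 1); treat it as an interpretation of the patent's formula rather than a verbatim copy.

```python
import math

def kq_patch_weight(r):
    # r: fraction of the annotation box's area that falls inside this
    # patch image, 0 <= r <= 1. Cubing sin(r*pi/2) widens the gap between
    # large- and small-overlap patches; adding 1 keeps the gradient from
    # vanishing and gives weight 1 to patches with no annotation overlap.
    if not 0.0 <= r <= 1.0:
        raise ValueError("r must lie in [0, 1]")
    return math.sin(r * math.pi / 2) ** 3 + 1.0
```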
The low-confidence detection results of target detection, with confidence in the range 0.1-0.5, are introduced into the V branch of the cross-attention modules: the complete image is run through the target detection model, and its low-confidence detection results participate in training as information of another modality on the cross-attention V branch. Specifically, each patch image input to the V branch undergoes a linear transformation and is then multiplied element-wise by an adjustment value derived from the target-detection confidences. The adjustment value is obtained as follows: first the low-confidence detection boxes are screened out; then, for each annotation box, the fraction of its total area falling inside the patch image is computed, the annotation boxes whose fraction exceeds 40% are kept, those whose IOU with a high-confidence detection box exceeds 50% are discarded, and the IOU between each remaining annotation box and the low-confidence detection boxes within the patch image region is computed; this IOU r is fed into the formula cos(rπ/2) + 1 for the weight calculation. The function mainly gives a higher weight to patch images that overlap strongly with an annotation box yet are detected only with low confidence, so that the CLIP model learns the features of patch images that target detection finds hard, improving the detection capability for such features.
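The V-branch adjustment can be sketched as follows. The 40% and 50% thresholds follow the description above, while the (x1, y1, x2, y2) box representation and the choice to take the maximum IOU over the surviving box pairs are assumptions made for illustration.

```python
import math

def box_iou(a, b):
    # Intersection-over-union of two axis-aligned boxes (x1, y1, x2, y2).
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area = lambda r: (r[2] - r[0]) * (r[3] - r[1])
    union = area(a) + area(b) - inter
    return inter / union if union > 0 else 0.0

def v_adjustment(ann_boxes, low_conf_boxes, high_conf_boxes, patch_box):
    # Keep annotation boxes with > 40% of their area inside the patch and
    # without > 50% IOU against a high-confidence detection, take the IOU
    # against the low-confidence detections, and map it through
    # cos(r*pi/2) + 1.
    def frac_inside(box):
        cx1, cy1 = max(box[0], patch_box[0]), max(box[1], patch_box[1])
        cx2, cy2 = min(box[2], patch_box[2]), min(box[3], patch_box[3])
        inter = max(0.0, cx2 - cx1) * max(0.0, cy2 - cy1)
        return inter / ((box[2] - box[0]) * (box[3] - box[1]))
    kept = [b for b in ann_boxes
            if frac_inside(b) > 0.4
            and all(box_iou(b, h) <= 0.5 for h in high_conf_boxes)]
    r = max((box_iou(b, l) for b in kept for l in low_conf_boxes), default=0.0)
    return math.cos(r * math.pi / 2) + 1.0
```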
S4, training the CLIP model with the patch image set and the target-detection confidences to obtain the bearing defect detection model; the training uses only the cross-attention modules. That is, the model weights of the first six self-attention feature-extraction modules are frozen during training, and only the last six modules, improved into cross-attention modules, are trained. This both preserves the feature-extraction capability of the original pre-trained model and fits the new data.
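The partial freezing in S4 can be sketched framework-agnostically; in a framework such as PyTorch this would correspond to setting `requires_grad = False` on the parameters of the first six blocks. The dictionary-based module representation here is a hypothetical stand-in.

```python
def freeze_self_attention(blocks, n_frozen=6):
    # blocks: the 12 attention modules of the image encoder, in order.
    # The first n_frozen (self-attention) blocks keep their pre-trained
    # weights; only the later (cross-attention) blocks remain trainable.
    for i, block in enumerate(blocks):
        block["trainable"] = i >= n_frozen
    return blocks
```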
Based on the above embodiment, the method may further include S5: defining labels for the patch images used in contrastive training.
If the area of an annotation box inside a patch image exceeds 10% of the box's total area, the box's category is assigned to that patch image as a label. The patch image labels follow the naming convention "a photo of (rust) on a bearing", with the bracketed slot filled by the bearing defect name: rust, bump, crush, or swell; "normal" denotes a defect-free image. Multi-label patch images merge their labels into one, e.g. "a photo of (rust and bump) on a bearing".
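The label-naming convention can be sketched as a small helper; the function name and the empty-list handling for defect-free patches are assumptions for illustration, and the defect class list follows the categories defined earlier in the description:

```python
DEFECT_CLASSES = ["rust", "bump", "crush", "swell"]

def patch_prompt(labels):
    """Build the CLIP text label for a patch image following the
    'a photo of (...) on a bearing' template; multi-label patches merge
    their class names with 'and', and an empty list means 'normal'."""
    inner = " and ".join(labels) if labels else "normal"
    return f"a photo of ({inner}) on a bearing"
```

For example, a patch annotated with both rust and bump yields "a photo of (rust and bump) on a bearing".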
Referring to fig. 3, a system for constructing a bearing defect detection model includes:
Sample module: acquires a sample image set of the bearing and performs image preprocessing on the sample images to obtain a complete image set containing the bearing end-face information and a patch image set containing a plurality of patch images with local bearing end-face information. The preprocessing specifically comprises the following steps:
S101: first, a sample image containing the bearing is read and converted to grayscale to obtain a gray image; the gray image is then divided into a target area and a background area according to the pixel values of the area to be detected; finally, voids in the annular region are filled using a morphological closing operation, which removes small noise and separates the annular region more cleanly.
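The closing operation in S101 (dilation followed by erosion) can be illustrated with a small NumPy sketch; in a production pipeline this would typically be a library call such as OpenCV's `cv2.morphologyEx(img, cv2.MORPH_CLOSE, kernel)`, and the 3x3 structuring element here is an assumption:

```python
import numpy as np

def binary_close(mask):
    """Morphological closing (dilation followed by erosion) with a 3x3
    square structuring element, written out with padded shifted views."""
    h, w = mask.shape

    def dilate(m):
        p = np.pad(m, 1, constant_values=0)
        return np.max(np.stack([p[i:i + h, j:j + w]
                                for i in range(3) for j in range(3)]), axis=0)

    def erode(m):
        p = np.pad(m, 1, constant_values=1)
        return np.min(np.stack([p[i:i + h, j:j + w]
                                for i in range(3) for j in range(3)]), axis=0)

    return erode(dilate(mask))

# toy binary region with a one-pixel void that closing fills in
region = np.ones((5, 5), dtype=np.uint8)
region[2, 2] = 0
closed = binary_close(region)
```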
S102: first, contour screening retains only the bearing end-face contour within the annular area to be detected; the detected closed annular contour is then further refined with a circle-fitting algorithm, and finally the annular region is cropped out of the original image.
S103: defect annotation: the defect categories and positions are marked on the cropped image in the target-detection annotation format. The defect categories of the bearing are defined as rust, bump, crush, and swell.
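The circle-fitting refinement in S102 can be sketched with an algebraic (Kasa-style) least-squares fit; the patent does not specify which fitting algorithm it uses, so this is only one plausible choice for illustration:

```python
import numpy as np

def fit_circle(xs, ys):
    """Algebraic least-squares circle fit: solve x^2 + y^2 = 2a*x + 2b*y + c
    for the center (a, b) and radius sqrt(c + a^2 + b^2)."""
    A = np.column_stack([2 * xs, 2 * ys, np.ones_like(xs)])
    rhs = xs ** 2 + ys ** 2
    (a, b, c), *_ = np.linalg.lstsq(A, rhs, rcond=None)
    return a, b, np.sqrt(c + a ** 2 + b ** 2)

# points sampled from a circle of center (3, -1) and radius 2
t = np.linspace(0, 2 * np.pi, 60, endpoint=False)
cx, cy, r = fit_circle(3 + 2 * np.cos(t), -1 + 2 * np.sin(t))
```

Fitting the detected contour points this way smooths out small contour irregularities before the annular region is cropped.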
Model preliminary training module: performs the preliminary training, which specifically comprises: training a target detection model with the complete image set and obtaining the detection-result modality information of target detection, namely the category, coordinates, and confidence of each detection. Concretely, the annotated complete images are used to train the target detection model; the purpose of this training is to provide the detection-result modality information that is introduced when training the improved CLIP model.
The CLIP model is then preliminarily trained with the patch image set. The image encoder of the CLIP model comprises twelve attention modules, of which the first six are self-attention modules and the last six are cross-attention modules; during training, only the cross-attention modules are trained.
parameter weight adjustment module: adjusting the parameter weight in the CLIP model; the image encoder of the CLIP model comprises twelve attention modules, wherein the first six attention modules are self-attention modules, and the last six attention modules are cross-attention modules; the adjustment process comprises the following steps:
since the area to be detected of the bearing image is in a ring state, the embodiment divides the bearing image into 9 patch images (other numbers can be adopted, such as 4 patch images or 16 patch images) according to the characteristics of the bearing image, and after the middle black irrelevant area is removed, the remaining 8 patch image blocks are used as the original input of KQV three branches of cross attention; the irrelevant area information can be screened out by removing the middle block, so that the training speed is improved.
Each patch image input to the KQ branches of the cross-attention module is linearly transformed and then multiplied by a weight, so that patch images of different importance receive different weights. The weight of a patch image is calculated as follows: first, the ratio r of the annotation-box area lying inside the current patch image to the total annotation-box area is computed; r (with r ≤ 1) is then substituted into the designed weight function sin(rπ/2)³ + 1 to obtain the weight of the patch image. Cubing the sine term widens the gap between weights, so patch images containing a larger share of an annotation box receive markedly more weight than those containing a smaller share. Adding 1 to the sine term keeps the computed gradient from vanishing during training and guarantees that a patch image with no overlapping annotation box still receives a weight of 1.
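The KQ-branch weight function described above is straightforward to express directly; the function name is illustrative:

```python
import math

def kq_patch_weight(r):
    """Patch weight for the KQ branches: sin(r*pi/2)**3 + 1 with r <= 1,
    where r is the fraction of the annotation-box area lying in the patch."""
    return math.sin(r * math.pi / 2) ** 3 + 1.0
```

A patch with no overlapping annotation box (r = 0) gets weight 1, a patch containing an entire box (r = 1) gets weight 2, and cubing the sine depresses the weights of intermediate ratios relative to a plain sine.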
A low-confidence detection result of target detection (confidence in the range 0.1 to 0.5) is introduced into the V branch of the cross-attention module: the complete image is inferred using the target detection model, and the low-confidence detections participate in training as the other-modality information of the cross-attention V branch. Specifically, after each patch image input to the V branch undergoes linear transformation, it is multiplied by a confidence adjustment value derived from the target detection model. The adjustment value is obtained as follows: first, the low-confidence detection boxes are screened out; next, for each annotation box overlapping the patch image, the ratio of the box area inside the patch image to the box's total area is calculated, and only annotation boxes with a ratio above 40% are retained; annotation boxes whose IOU with a high-confidence detection box exceeds 50% are then removed; finally, the IOU r between the remaining annotation boxes and the low-confidence detection boxes within the patch image area is calculated and substituted into the weight formula cos(rπ/2) + 1. This function is intended to give greater weight to patch images that overlap strongly with annotation boxes yet were detected only with low confidence, so that the CLIP model can learn the features in patch images that are difficult for target detection, improving the detection capability for such features;
Model retraining module: trains the CLIP model using the patch image set and the target-detection confidence to obtain the bearing defect detection model, where only the cross-attention modules are trained; that is, the weights of the first six self-attention feature-extraction modules are frozen during training, and only the last six modules, converted into cross-attention modules, are updated. This operation both preserves the feature-extraction capability of the original pre-trained model and fits the new data.
On the basis of the above embodiment, the system may further include a label definition module that defines labels for the patch images used in contrastive training.
If the area of an annotation box inside a patch image exceeds 10% of the box's total area, the box's category is assigned to that patch image as a label. The patch image labels follow the naming convention "a photo of (rust) on a bearing", with the bracketed slot filled by the bearing defect name: rust, bump, crush, or swell; "normal" denotes a defect-free image. Multi-label patch images merge their labels into one, e.g. "a photo of (rust and bump) on a bearing".
A bearing defect detection model is constructed by the method for constructing the bearing defect detection model.
The bearing defect detection method utilizes the bearing defect detection model to finish the defect detection process; the specific detection process comprises the following steps:
The bearing image to be detected is input into both the CLIP model and the target detection model; the two models process it independently, and a logical OR operation is applied to the defect information in their results. That is, as long as either model detects defect information, the bearing is considered defective and the defect information is output.
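The logical-OR fusion of the two models' outputs can be sketched as follows; representing each model's output as a list of detected defect labels is an assumption for illustration:

```python
def fuse_results(clip_defects, det_defects):
    """Logical-OR fusion of the two models' outputs: the bearing is flagged
    defective if either model reports any defect, and the union of the
    defect labels is returned for output."""
    merged = sorted(set(clip_defects) | set(det_defects))
    return len(merged) > 0, merged
```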
The above embodiments may be implemented in whole or in part by software, hardware, firmware, or any combination thereof. When software is used, they may be implemented in whole or in part in the form of a computer program product comprising one or more computer instructions. When the computer instructions are loaded and executed on a computer, the flows or functions according to the embodiments of the present application are produced in whole or in part. The computer may be a general-purpose computer, a special-purpose computer, a computer network, or another programmable apparatus. The computer instructions may be stored in a computer-readable storage medium or transmitted from one computer-readable storage medium to another; for example, they may be transmitted from one website, computer, server, or data center to another by wired (e.g., coaxial cable, optical fiber, digital subscriber line (DSL)) or wireless (e.g., infrared, radio, microwave) means. The computer-readable storage medium may be any available medium accessible to a computer, or a data storage device such as a server or data center integrating one or more available media. The available medium may be a magnetic medium (e.g., floppy disk, hard disk, magnetic tape), an optical medium (e.g., DVD), or a semiconductor medium (e.g., solid-state disk (SSD)), etc.
The foregoing description is only of the preferred embodiments of the present application, and is not intended to limit the application in any way, but any simple modification, equivalent variation and modification of the above embodiments according to the technical principles of the present application are within the scope of the technical solutions of the present application.
Claims (10)
1. The method for constructing the bearing defect detection model is characterized by comprising the following steps of:
s1, acquiring a sample image set of a bearing, and carrying out image preprocessing on sample images in the sample image set to obtain a complete image set containing complete information of the end face of the bearing and a patch image set containing local information of the end face of the bearing; each complete image consists of a plurality of patch images;
s2, training a target detection model by using the complete image set; obtaining the confidence coefficient of target detection;
s3, adjusting the parameter weight in the CLIP model; the image encoder of the CLIP model comprises twelve attention modules, wherein the first six attention modules are self-attention modules, and the last six attention modules are cross-attention modules; the adjustment process comprises the following steps:
for each patch image input by the KQ branch of the cross attention module in the image encoder module, performing a linear transformation and then multiplying by a weight;
introducing a low confidence detection result of target detection into a V branch of the cross attention module, wherein the range of the low confidence is 0.1-0.5;
s4, training the CLIP model by using the patch image set and the confidence coefficient of target detection to obtain a bearing defect detection model; wherein: the training is performed by using the cross attention module only.
2. The method for constructing a bearing defect detection model according to claim 1, wherein the image preprocessing specifically comprises:
S101, firstly graying the sample image to obtain a gray image; then dividing the gray image into a target area and a background area according to the pixel values of the area to be detected; finally, filling the voids in the annular region using a morphological closing operation;
S102, firstly retaining the bearing end-face contour in the annular area to be detected through contour screening; then refining the detected closed annular contour using a circle-fitting algorithm, and finally cropping the annular region from the original image;
S103, defect annotation: marking the defect categories and positions on the cropped image.
3. The method of claim 1, wherein in S3, for each patch image input by the KQ branch of the cross-attention module, the expression for calculating the weight of the patch image is: sin(rπ/2)³ + 1, r ≤ 1; r represents the ratio of the annotation-box area inside the current patch image to the total annotation-box area.
4. The method for constructing a bearing defect detection model according to claim 1, wherein the low confidence detection result of the target detection is introduced into the V branch of the cross attention module, specifically comprising:
performing a linear transformation on each patch image input by the V branch, and then multiplying by the confidence adjustment value derived from the target detection model.
5. The method of claim 4, wherein the confidence adjustment value is obtained by:
screening out the low-confidence detection boxes;
then, for each annotation box in the patch image, calculating the ratio of the box area inside the patch image to the box's total area, retaining the annotation boxes with a ratio above 40%, removing the annotation boxes whose IOU with a high-confidence detection box exceeds 50%, and calculating the IOU between the remaining annotation boxes and the low-confidence detection boxes within the patch image area;
finally, substituting the IOU into the following formula for weight calculation: cos(rπ/2) + 1, thereby obtaining the adjustment value; r represents the IOU calculated above.
6. A bearing defect detection model construction system, comprising:
sample module: acquiring a sample image set of a bearing, and carrying out image preprocessing on sample images in the sample image set to obtain a complete image set containing complete information of the end face of the bearing and a patch image set containing local information of the end face of the bearing; each complete image consists of a plurality of patch images;
a model preliminary training module; training a target detection model by using the complete image set; obtaining the confidence coefficient of target detection;
parameter weight adjustment module: adjusting the parameter weight in the CLIP model; the image encoder of the CLIP model comprises twelve attention modules, wherein the first six attention modules are self-attention modules, and the last six attention modules are cross-attention modules; the adjustment process comprises the following steps:
for each patch image input by the KQ branch of the cross attention module, performing a linear transformation and then multiplying by a weight;
introducing a low confidence detection result of target detection into a V branch of the cross attention module, wherein the range of the low confidence is 0.1-0.5;
model retraining module: training the CLIP model by using the patch image set and the confidence coefficient of target detection to obtain a bearing defect detection model; wherein: the training is performed by using the cross attention module only.
7. The bearing defect detection model construction system of claim 6, wherein the image preprocessing specifically comprises:
S101, firstly graying the sample image to obtain a gray image; then dividing the gray image into a target area and a background area according to the pixel values of the area to be detected; finally, filling the voids in the annular region using a morphological closing operation;
S102, firstly retaining the bearing end-face contour in the annular area to be detected through contour screening; then refining the detected closed annular contour using a circle-fitting algorithm, and finally cropping the annular region from the original image;
S103, defect annotation: marking the defect categories and positions on the cropped image.
8. The bearing defect detection model construction system according to claim 6, wherein, for each patch image input by the KQ branch of the cross-attention module, the expression for calculating the weight of the patch image is: sin(rπ/2)³ + 1, r ≤ 1; r represents the ratio of the annotation-box area inside the current patch image to the total annotation-box area.
9. The bearing defect detection model construction system of claim 6, wherein introducing low confidence detection results of target detection at the V-branch of the cross-attention module comprises:
performing a linear transformation on each patch image input by the V branch, and then multiplying by the confidence adjustment value derived from the target detection model, wherein the adjustment value is obtained through the following steps:
screening out the low-confidence detection boxes;
then, for each annotation box in the patch image, calculating the ratio of the box area inside the patch image to the box's total area, retaining the annotation boxes with a ratio above 40%, removing the annotation boxes whose IOU with a high-confidence detection box exceeds 50%, and calculating the IOU between the remaining annotation boxes and the low-confidence detection boxes within the patch image area;
finally, substituting the IOU into the following formula for weight calculation: cos(rπ/2) + 1, thereby obtaining the adjustment value; r represents the IOU calculated above.
10. A bearing defect detection model constructed by the method for constructing a bearing defect detection model according to any one of claims 1 to 5.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202311415402.3A CN117152142B (en) | 2023-10-30 | 2023-10-30 | Bearing defect detection model construction method and system |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202311415402.3A CN117152142B (en) | 2023-10-30 | 2023-10-30 | Bearing defect detection model construction method and system |
Publications (2)
Publication Number | Publication Date |
---|---|
CN117152142A true CN117152142A (en) | 2023-12-01 |
CN117152142B CN117152142B (en) | 2024-02-02 |
Family
ID=88884756
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202311415402.3A Active CN117152142B (en) | 2023-10-30 | 2023-10-30 | Bearing defect detection model construction method and system |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN117152142B (en) |
Citations (10)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20210012150A1 (en) * | 2019-07-11 | 2021-01-14 | Xidian University | Bidirectional attention-based image-text cross-modal retrieval method |
CN113160192A (en) * | 2021-04-28 | 2021-07-23 | 北京科技大学 | Visual sense-based snow pressing vehicle appearance defect detection method and device under complex background |
CN113220919A (en) * | 2021-05-17 | 2021-08-06 | 河海大学 | Dam defect image text cross-modal retrieval method and model |
CN113902926A (en) * | 2021-12-06 | 2022-01-07 | 之江实验室 | General image target detection method and device based on self-attention mechanism |
CN114283430A (en) * | 2021-12-03 | 2022-04-05 | 苏州大创科技有限公司 | Cross-modal image-text matching training method and device, storage medium and electronic equipment |
CN115937091A (en) * | 2022-10-24 | 2023-04-07 | 合肥中科融道智能科技有限公司 | Transformer substation equipment defect image detection method based on changeable patch |
CN116383671A (en) * | 2023-03-27 | 2023-07-04 | 武汉大学 | Text image cross-mode pedestrian retrieval method and system with implicit relation reasoning alignment |
CN116386081A (en) * | 2023-03-01 | 2023-07-04 | 西北工业大学 | Pedestrian detection method and system based on multi-mode images |
CN116630608A (en) * | 2023-05-29 | 2023-08-22 | 广东工业大学 | Multi-mode target detection method for complex scene |
US20230306732A1 (en) * | 2022-03-25 | 2023-09-28 | Facedapter Sàrl | Heterogenous Face Recognition System and Method |
- 2023-10-30 CN CN202311415402.3A patent/CN117152142B/en active Active
Patent Citations (10)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20210012150A1 (en) * | 2019-07-11 | 2021-01-14 | Xidian University | Bidirectional attention-based image-text cross-modal retrieval method |
CN113160192A (en) * | 2021-04-28 | 2021-07-23 | 北京科技大学 | Visual sense-based snow pressing vehicle appearance defect detection method and device under complex background |
CN113220919A (en) * | 2021-05-17 | 2021-08-06 | 河海大学 | Dam defect image text cross-modal retrieval method and model |
CN114283430A (en) * | 2021-12-03 | 2022-04-05 | 苏州大创科技有限公司 | Cross-modal image-text matching training method and device, storage medium and electronic equipment |
CN113902926A (en) * | 2021-12-06 | 2022-01-07 | 之江实验室 | General image target detection method and device based on self-attention mechanism |
US20230306732A1 (en) * | 2022-03-25 | 2023-09-28 | Facedapter Sàrl | Heterogenous Face Recognition System and Method |
CN115937091A (en) * | 2022-10-24 | 2023-04-07 | 合肥中科融道智能科技有限公司 | Transformer substation equipment defect image detection method based on changeable patch |
CN116386081A (en) * | 2023-03-01 | 2023-07-04 | 西北工业大学 | Pedestrian detection method and system based on multi-mode images |
CN116383671A (en) * | 2023-03-27 | 2023-07-04 | 武汉大学 | Text image cross-mode pedestrian retrieval method and system with implicit relation reasoning alignment |
CN116630608A (en) * | 2023-05-29 | 2023-08-22 | 广东工业大学 | Multi-mode target detection method for complex scene |
Non-Patent Citations (1)
Title |
---|
YAXIONG WANG, ET AL: "PFAN++: Bi-Directional Image-Text Retrieval With Position Focused Attention Network", IEEE Transactions on Multimedia * |
Also Published As
Publication number | Publication date |
---|---|
CN117152142B (en) | 2024-02-02 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN108228915B (en) | Video retrieval method based on deep learning | |
CN107133569B (en) | Monitoring video multi-granularity labeling method based on generalized multi-label learning | |
CN113436169B (en) | Industrial equipment surface crack detection method and system based on semi-supervised semantic segmentation | |
CN102385592B (en) | Image concept detection method and device | |
CN111104555B (en) | Video hash retrieval method based on attention mechanism | |
CN113888550B (en) | Remote sensing image road segmentation method combining super-resolution and attention mechanism | |
CN110648310A (en) | Weak supervision casting defect identification method based on attention mechanism | |
CN115937655B (en) | Multi-order feature interaction target detection model, construction method, device and application thereof | |
CN112766218B (en) | Cross-domain pedestrian re-recognition method and device based on asymmetric combined teaching network | |
CN116030396B (en) | Accurate segmentation method for video structured extraction | |
CN114463340B (en) | Agile remote sensing image semantic segmentation method guided by edge information | |
CN116091946A (en) | Yolov 5-based unmanned aerial vehicle aerial image target detection method | |
Li et al. | Efficient detection in aerial images for resource-limited satellites | |
CN111523586A (en) | Noise-aware-based full-network supervision target detection method | |
CN114529894A (en) | Rapid scene text detection method fusing hole convolution | |
CN113657473A (en) | Web service classification method based on transfer learning | |
CN110738129B (en) | End-to-end video time sequence behavior detection method based on R-C3D network | |
CN112418229A (en) | Unmanned ship marine scene image real-time segmentation method based on deep learning | |
CN117132910A (en) | Vehicle detection method and device for unmanned aerial vehicle and storage medium | |
CN117152142B (en) | Bearing defect detection model construction method and system | |
CN116563844A (en) | Cherry tomato maturity detection method, device, equipment and storage medium | |
CN114861718B (en) | Bearing fault diagnosis method and system based on improved depth residual error algorithm | |
CN117011219A (en) | Method, apparatus, device, storage medium and program product for detecting quality of article | |
CN112487927B (en) | Method and system for realizing indoor scene recognition based on object associated attention | |
CN112926670A (en) | Garbage classification system and method based on transfer learning |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||
GR01 | Patent grant |