CN112802005A - Automobile surface scratch detection method based on improved Mask RCNN

Automobile surface scratch detection method based on improved Mask RCNN

Info

Publication number: CN112802005A
Application number: CN202110174413.1A
Authority: CN (China)
Prior art keywords: sample, candidate, network, model, algorithm
Legal status: Pending
Other languages: Chinese (zh)
Inventors: 徐向荣, 周攀, 刘雪飞, 朱永飞
Current and original assignee: Anhui University of Technology AHUT
Application filed by Anhui University of Technology AHUT
Priority and filing date: 2021-02-07
Publication date: 2021-05-14

Classifications

    • G06T7/0004 (Physics; Computing; Image data processing or generation, in general; Image analysis; Inspection of images, e.g. flaw detection; Industrial image inspection)
    • G06T2207/20081 (Indexing scheme for image analysis or image enhancement; Special algorithmic details; Training; Learning)
    • G06T2207/20084 (Indexing scheme for image analysis or image enhancement; Special algorithmic details; Artificial neural networks [ANN])


Abstract

The invention discloses an automobile surface scratch detection method based on an improved Mask RCNN, comprising the following steps: collecting automobile surface scratch pictures and making a training set and a verification set; constructing a deep learning network model based on the improved Mask RCNN; setting the weight initialization, hyper-parameters and optimizer of the training process; inputting the training set into the model constructed in step S2 for training; inputting the verification set into the trained model and calculating the mAP value of the model, stopping training and saving the model when the mAP value is greater than or equal to 90%, and returning to step S3 to reset the hyper-parameters and then repeating step S4 when the mAP value is less than 90%; and inputting the automobile scratch picture to be detected into the deep learning model saved in step S5 for inference to obtain the position, type and confidence of the scratch. The invention can effectively detect scratches on the surface of an automobile, avoids the time and labor cost of manual detection and the low accuracy of online manual review, and has relatively high detection accuracy and good practicability.

Description

Automobile surface scratch detection method based on improved Mask RCNN
Technical Field
The invention belongs to the technical field of surface scratch image recognition, and particularly relates to an automobile surface scratch detection method based on improved Mask RCNN.
Background
With social and economic development, our way of life is constantly changing, and the automobile has become an indispensable part of daily life. Automobiles are mainly used to carry people and goods quickly and conveniently to their destinations, and are therefore widely used in the contemporary environment.
In daily use, an automobile inevitably suffers scratches, and insurance companies then need to assess the scratched areas. An insurance company may send staff to the site for assessment, or have the user upload related images for online manual review. However, sending staff to the site consumes the manpower and material resources of both the client and the company, and is especially inconvenient when the client is far away; if the user instead uploads images for online manual review, the images must be of high quality, which is also inconvenient for the client. Computer vision technology is already used to detect scratches on automobile surfaces, but traditional computer vision relies on hand-crafted features, which generally lack robustness and cannot cope with variable real-world conditions, so the detection accuracy cannot be guaranteed.
A search of the prior art shows that related patents on image recognition technology have been disclosed. For example, Chinese patent application No. 201910986317.X discloses a road detection method for Gaofen-3 (GF-3) high-resolution SAR images based on Mask-RCNN, which comprises the following steps: S1: cropping a sample SAR image; S2: labeling the sub-images cropped in S1 and generating JSON-format files as a training set; S3: constructing a neural network model according to the Mask-RCNN theoretical model; S4: feeding the S2 training set into the neural network model constructed in S3 for training; S5: inputting the SAR image to be detected into the neural network model trained in S4 and automatically generating the road information in the image. Although this application also provides an image detection method, its detection effect is not ideal, and the method awaits further improvement.
Disclosure of Invention
1. Problems to be solved
The invention aims to solve the problems that manual on-site assessment and online manual review of uploaded images are both inconvenient ways of identifying automobile scratches, and that traditional computer vision techniques cannot guarantee the detection accuracy of automobile scratches; it therefore provides an automobile surface scratch detection method based on an improved Mask RCNN. With the technical scheme of the invention, scratches on the automobile surface can be detected effectively, the time and labor cost of manual detection and the low accuracy of online manual review are avoided, and the detection accuracy is relatively high with good practicability.
2. Technical scheme
In order to solve the problems, the technical scheme adopted by the invention is as follows:
the invention relates to an automobile surface scratch detection method based on improved Mask RCNN, which specifically comprises the following steps:
step S1: collecting automobile surface scratch pictures, and making a training set and a verification set;
step S2: constructing a deep learning network model based on an improved Mask RCNN;
step S3: setting weight initialization, hyper-parameters and an optimizer in the training process;
step S4: inputting the training set created in step S1 into the network model constructed in step S2 for training;
step S5: inputting the verification set in the step S1 into the model trained in the step S4, calculating the mAP value of the model, stopping training when the mAP value is more than or equal to 90%, and storing the model; when the mAP value is less than 90%, returning to the step S3 to reset the hyper-parameter setting, and then continuing to the step S4;
step S6: and inputting the automobile surface scratch picture to be detected into the deep learning model stored in the step S5 for inference, and obtaining the position, the type and the confidence of the scratch.
Further, after the car body scratch pictures are collected in step S1, each picture is cropped to a size of 112 × 112, the position and shape of the scratch parts are labeled in the pictures with the LabelMe tool, and all pictures are divided into a training set and a verification set at a ratio of 8:2.
Further, the specific process of step S2 is as follows:
s2.1: FPN + ResNet50 is used as the feature extraction network;
s2.2: the number and size of the anchor boxes in the RPN are adjusted with the Kmeans++ algorithm: the number of anchor boxes at the point where the average intersection-over-union (IoU) no longer rises is selected as k, the width and height of each target box in the training-set samples are taken as coordinate points on a two-dimensional coordinate system, and the adjusted anchor box sizes are calculated with the Kmeans algorithm;
s2.3: non-maximum suppression is performed using the Soft NMS algorithm, which is shown as follows:
s_i = s_i,                        IoU(M, b_i) < N_t
s_i = s_i · (1 − IoU(M, b_i)),    IoU(M, b_i) ≥ N_t

in the formula: s_i is the score of the i-th detection box; b_i is the box under consideration; M is the candidate box with the highest score; IoU(M, b_i) measures the degree of overlap between M and b_i; N_t is a preset threshold.

All candidate boxes are divided into different sets by category, and the candidate boxes in each set are arranged in descending order of score; in each set, the IoU between the highest-scoring candidate box and each remaining candidate box is calculated, and if the IoU is smaller than the set threshold N_t the score of that candidate box is kept unchanged; otherwise the score of the candidate box is set to s_i · (1 − IoU(M, b_i)).
S2.4: the ROI pooling layer adopts the ROI Align algorithm; the candidate region is divided into 4 sub-regions, and within each sub-region the ROI Align algorithm calculates the sampling values by bilinear interpolation and averages them.
Further, the specific process of step S3 is as follows:
s3.1: setting the weight initialization mode to a Gaussian distribution;
s3.2: setting the batch size to 64;
s3.3: the optimizer adopts Adam, and the model weights are updated by back-propagation of the gradients.
Further, the specific process of step S4 is as follows:
s4.1: sending the pictures in the training set into the ResNet + FPN feature extraction network for feature extraction to generate a feature map;
s4.2: sending the feature map generated in step S4.1 into the RPN for candidate region extraction, wherein the anchor box parameters of the RPN are set according to the result of the Kmeans algorithm in step S2.2;
s4.3: preliminarily classifying the candidate regions, and screening out high-confidence regions through the foreground score and the non-maximum suppression algorithm Soft NMS to generate candidate boxes;
s4.4: fusing the candidate boxes with the feature map, and generating candidate regions of fixed size through the ROI pooling layer;
s4.5: finally, inputting the candidate regions obtained in step S4.4 into the subsequent classification layer and bounding-box regression layer for target region identification, and segmenting the image through the segmentation mask generation network; the classification layer and the bounding-box regression layer use fully connected networks, and the segmentation mask generation network uses a fully convolutional network;
s4.6: the loss function of the network is the sum of the bounding-box regression loss, the classification loss and the pixel segmentation mask loss; the specific formulas are as follows:
L = L_cls + L_reg + L_mask

L_cls = (1/N_cls) · Σ_i −[p_i* · log(p_i) + (1 − p_i*) · log(1 − p_i)]

L_reg = (1/N_reg) · Σ_i p_i* · R(t_i − t_i*)

R(x) = 0.5x²,       |x| < 1
R(x) = |x| − 0.5,   |x| ≥ 1

L_mask = 1 − (1/n_cl) · Σ_i n_ii / (c_i + Σ_j n_ji − n_ii)

in the formula: L_cls denotes the classification loss, L_reg the bounding-box regression loss, and L_mask the pixel segmentation mask loss; i is the anchor box index; p_i is the predicted probability for the i-th anchor box; p_i* is the label of the i-th anchor box: p_i* = 1 when the i-th anchor box is a positive sample, and p_i* = 0 when the i-th anchor box is background; N_reg and N_cls are normalization constants; t_i = (t_x, t_y, t_w, t_h) is the parameterized vector of the center-point coordinates, width and height predicted for the i-th anchor box, and t_i* is the parameterized vector of the i-th labeled bounding box; R denotes the Smooth L1 function;
n_cl denotes the number of classes contained in the ground-truth segmentation map, n_ij the number of pixels of class i predicted as class j, n_ii the number of pixels of class i correctly predicted as class i, and c_i the total number of class-i pixels in the ground-truth segmentation.
Further, the mAP value in step S5 is the mean average precision, i.e. the mean of the average precision (AP) of the different categories, where AP is the area under the P-R curve; P and R are calculated as follows:

P = TP / (TP + FP)

R = TP / (TP + FN)

mAP = (1/N) · Σ_{i=1..N} AP_i

in the formula: P means precision; R means recall; N is the number of categories. TP (true positive) is a positive sample correctly identified as positive by the classifier; FP (false positive) is a negative sample incorrectly identified as positive; FN (false negative) is a positive sample incorrectly identified as negative; TN (true negative) is a negative sample correctly identified as negative.
3. Advantageous effects
Compared with the prior art, the invention has the beneficial effects that:
(1) The automobile surface scratch detection method based on the improved Mask RCNN of the invention first collects pictures to make a data set, then constructs a deep learning network model, sets the hyper-parameters, trains the model, saves the trained model, and finally tests the model to obtain the final detection results. Computer vision based on deep learning has good generalization performance and can cope with variable real-world conditions, so the method can effectively detect scratches on the automobile surface, avoids the time and labor cost of manual detection and the low accuracy of online manual review, and has good practicability.
(2) The automobile surface scratch detection method based on the improved Mask RCNN specifically adopts FPN + ResNet as the feature extraction network and, on this basis, improves the NMS algorithm: since the original NMS algorithm easily misses detections when target boxes are close together or overlap, the Soft NMS algorithm is used instead of traditional NMS for non-maximum suppression, which effectively improves the detection accuracy, provides good robustness, and makes it convenient to identify the scratched parts of the automobile.
Drawings
FIG. 1 is a flow chart of an automobile surface scratch detection method based on improved Mask RCNN of the present invention;
FIG. 2 is a flowchart of the Mask RCNN training process of the present invention;
FIG. 3 is a diagram of a ResNet network according to the present invention;
FIG. 4 is a schematic diagram of the FPN network of the present invention;
FIG. 5 is a schematic diagram of the ROI Align algorithm of the present invention;
fig. 6 is a schematic view of the results of the automotive surface scratch detection of the present invention.
Detailed Description
At present, when scratches occur on an automobile, insurance companies usually either send staff to the site for assessment or have users upload related images for online manual review. However, on-site assessment consumes the manpower and material resources of both the client and the company, and is especially inconvenient when the client is far away; online manual review requires high-quality images, which is also inconvenient for the client. Existing computer vision technology, meanwhile, generally relies on hand-crafted features, which lack robustness and cannot cope with variable real-world conditions, so the detection accuracy cannot be guaranteed.
Based on the above problems, the invention provides an automobile surface scratch detection method based on an improved Mask RCNN. Existing image detection methods, such as the patent with Chinese application No. 201910986317.X cited in the background art, do identify pictures, but their detection effect is not ideal, because traditional computer vision generally depends on hand-crafted features that lack robustness and cannot cope with variable real-world conditions; computer vision based on deep learning, by contrast, has good generalization performance. The invention improves on the original Mask RCNN algorithm: ResNet is used as the feature extraction network, the Kmeans algorithm re-clusters the anchor box parameters, the ROI Align algorithm replaces the ROI Pooling algorithm of the original method, and the Soft NMS algorithm replaces traditional NMS. Experimental results show that the improved algorithm performs better, copes with variable real-world conditions, has better robustness, and achieves relatively high detection accuracy; it can effectively detect scratches on the automobile surface, avoids the time and labor cost of manual detection and the low accuracy of online manual review, and has good practicability.
As shown in fig. 1, the automobile surface scratch detection method based on the improved Mask RCNN specifically includes the following steps:
Step S1: collecting car body surface scratch pictures with an industrial camera, uniformly cropping each picture to a size of 112 × 112, labeling the position and shape of the scratch parts in the pictures with the LabelMe tool, and dividing all pictures into a training set and a verification set at a ratio of 8:2.
Step S2: constructing the deep learning network model based on the improved Mask RCNN. The specific operation process of this step is as follows:
s2.1: FPN + ResNet is used as the feature extraction network. As shown in fig. 3-4, fig. 3 is a diagram of a ResNet network, and fig. 4 is a diagram of an FPN network.
S2.2: the number and size of the anchor boxes in the RPN are adjusted with the Kmeans++ algorithm. First, a graph of the average IoU against the number of anchor boxes is drawn, and the number of anchor boxes at the point where the average IoU no longer rises is selected as k; the width and height of each target box in the training-set samples are then taken as coordinate points on a two-dimensional coordinate system, and the adjusted anchor box sizes are calculated with the Kmeans algorithm. The Kmeans algorithm is specifically as follows: k points sufficiently far from one another are selected as centroids; while the cluster assignment of any point still changes, the distance between each data point in the data set and each centroid is calculated, each data point is assigned to the cluster of its nearest centroid, and the mean of each cluster is then calculated and taken as the updated centroid. That is, a set X of n samples is input, and then:
(1) Initialization: set t = 0 and randomly select k sample points from the sample set as the initial cluster centers m_1^(0), ..., m_k^(0).
(2) Cluster the sample points: for the fixed class centers m_1^(t), ..., m_k^(t), where m_l^(t) is the center of class G_l, calculate the distance from each sample to every class center, assign each sample point to the class of its nearest center, and form the new clustering result C^(t).
(3) Calculate the new class centers: for the clustering result C^(t), calculate the mean of the samples currently in each class as the new class center m_1^(t+1), ..., m_k^(t+1).
(4) When the iteration converges or the stopping condition is met, output C* = C^(t).
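For illustration, the anchor selection above could be sketched as follows (not the patent's code); boxes_wh, an (n, 2) array of ground-truth box widths and heights, is an assumed input, scikit-learn's KMeans with k-means++ initialisation stands in for the Kmeans++/Kmeans steps, and the corner-aligned IoU is the usual simplification for anchor clustering:

```python
# Sketch: pick k where the average IoU between ground-truth boxes and their
# nearest cluster centre stops rising, then use the centres as anchor sizes.
import numpy as np
from sklearn.cluster import KMeans

def avg_iou(boxes_wh, centers):
    # IoU of width/height pairs treated as boxes sharing a common corner.
    inter = np.minimum(boxes_wh[:, None, :], centers[None, :, :]).prod(axis=2)
    union = (boxes_wh.prod(axis=1)[:, None]
             + centers.prod(axis=1)[None, :] - inter)
    return (inter / union).max(axis=1).mean()

def anchor_curve(boxes_wh, k_max=12):
    results = []
    for k in range(1, k_max + 1):
        km = KMeans(n_clusters=k, init="k-means++", n_init=10,
                    random_state=0).fit(boxes_wh)
        results.append((k, avg_iou(boxes_wh, km.cluster_centers_),
                        km.cluster_centers_))
        print(f"k={k:2d}  average IoU={results[-1][1]:.3f}")
    return results   # choose k at the point where the average IoU stops rising

boxes_wh = np.random.rand(500, 2) * 112   # stand-in for real training boxes
curve = anchor_curve(boxes_wh)
```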
S2.3: the original NMS algorithm easily misses detections when target boxes are close together or overlap. To address this, the Soft NMS algorithm is used instead of the traditional NMS algorithm for non-maximum suppression. The NMS and Soft NMS rules are given by the following formulas:
NMS:
s_i = s_i,        IoU(M, b_i) < N_t
s_i = 0,          IoU(M, b_i) ≥ N_t

Soft NMS:
s_i = s_i,                        IoU(M, b_i) < N_t
s_i = s_i · (1 − IoU(M, b_i)),    IoU(M, b_i) ≥ N_t

in the formula: s_i is the score of the i-th detection box; b_i is the box under consideration; M is the candidate box with the highest score; IoU(M, b_i) measures the degree of overlap between M and b_i; N_t is a preset threshold.
The Soft NMS algorithm is specifically as follows: first, all candidate boxes are divided into different sets by category; the candidate boxes in each set are then arranged in descending order of score, and the IoU between the highest-scoring candidate box and each remaining candidate box in the set is calculated; if the IoU is smaller than the set threshold N_t, the score of that candidate box is kept unchanged, and otherwise its score is set to s_i · (1 − IoU(M, b_i)). When all the remaining candidate boxes have been processed, the highest-scoring candidate box is saved and the next highest-scoring box is selected, and the process is repeated. Finally, the procedure is repeated for every set, as in the sketch below.
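A minimal sketch of this linear Soft NMS for one category's set is given below for illustration (not the patent's code); the threshold N_t = 0.3 and the small score floor used to discard near-zero boxes are assumed values:

```python
# Sketch: linear Soft NMS over one category. `boxes` is (n, 4) as
# (x1, y1, x2, y2); `scores` is (n,). Returns kept indices and new scores.
import numpy as np

def soft_nms(boxes, scores, Nt=0.3, score_floor=0.001):
    boxes = boxes.astype(float)
    scores = scores.astype(float).copy()
    keep, idx = [], np.arange(len(scores))
    while idx.size > 0:
        m = idx[np.argmax(scores[idx])]          # highest-scoring box M
        keep.append(m)
        idx = idx[idx != m]
        if idx.size == 0:
            break
        # IoU(M, b_i) against every remaining box
        x1 = np.maximum(boxes[m, 0], boxes[idx, 0])
        y1 = np.maximum(boxes[m, 1], boxes[idx, 1])
        x2 = np.minimum(boxes[m, 2], boxes[idx, 2])
        y2 = np.minimum(boxes[m, 3], boxes[idx, 3])
        inter = np.clip(x2 - x1, 0, None) * np.clip(y2 - y1, 0, None)
        area_m = (boxes[m, 2] - boxes[m, 0]) * (boxes[m, 3] - boxes[m, 1])
        area_i = (boxes[idx, 2] - boxes[idx, 0]) * (boxes[idx, 3] - boxes[idx, 1])
        iou = inter / (area_m + area_i - inter)
        # Linear decay instead of hard removal: s_i *= (1 - IoU) when IoU >= Nt
        scores[idx] *= np.where(iou >= Nt, 1.0 - iou, 1.0)
        idx = idx[scores[idx] > score_floor]     # drop near-zero boxes
    return keep, scores
```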
S2.4: the ROI pooling layer adopts the ROI Align algorithm. As shown in fig. 5, the dotted box represents the feature map and the solid box represents the candidate region; the candidate region is divided into 4 sub-regions, and within each sub-region the ROI Align algorithm calculates the sampling values by bilinear interpolation and averages them, which solves the problem of misalignment between the image and the target region.
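For illustration, torchvision ships an RoI Align operator with this bilinear sampling-and-averaging behaviour; the sketch below runs it on dummy shapes (the feature-map size, box coordinates and 7 × 7 output size are assumed values, not ones fixed by the patent):

```python
# Sketch: RoI Align pooling a candidate box to a fixed-size output.
import torch
from torchvision.ops import roi_align

feature_map = torch.randn(1, 256, 28, 28)        # (batch, C, H, W)
# One candidate box in feature-map coordinates: (x1, y1, x2, y2).
boxes = [torch.tensor([[4.3, 6.1, 20.7, 24.9]])]
pooled = roi_align(feature_map, boxes, output_size=(7, 7),
                   spatial_scale=1.0, sampling_ratio=2)
print(pooled.shape)                              # torch.Size([1, 256, 7, 7])
```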
Step S3: setting the weight initialization, hyper-parameters and optimizer of the training process. The specific operation process of this step is as follows:
S3.1: setting the weight initialization mode to a Gaussian distribution;
S3.2: setting the batch size to 64;
S3.3: the optimizer adopts Adam, and the model weights are updated by back-propagation of the gradients, as sketched below.
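A minimal PyTorch sketch of these settings is given for illustration; the small stand-in network is an assumption (the patent does not give the full improved Mask RCNN at code level), and the learning rate is likewise an assumed value:

```python
# Sketch: Gaussian weight initialization (S3.1), batch size 64 (S3.2),
# and one Adam update via back-propagation (S3.3).
import torch
import torch.nn as nn

def gaussian_init(module):
    if isinstance(module, (nn.Conv2d, nn.Linear)):
        nn.init.normal_(module.weight, mean=0.0, std=0.01)
        if module.bias is not None:
            nn.init.zeros_(module.bias)

model = nn.Sequential(nn.Conv2d(3, 16, 3), nn.ReLU(),
                      nn.AdaptiveAvgPool2d(1), nn.Flatten(),
                      nn.Linear(16, 2))          # stand-in for the full model
model.apply(gaussian_init)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)

BATCH_SIZE = 64
x = torch.randn(BATCH_SIZE, 3, 112, 112)        # dummy 112 x 112 batch
y = torch.randint(0, 2, (BATCH_SIZE,))
loss = nn.functional.cross_entropy(model(x), y)
optimizer.zero_grad()
loss.backward()                                  # back-propagate the gradients
optimizer.step()                                 # update the model weights
```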
Step S4: the training set created in step S1 is input into the network model constructed in step S2 and trained. As shown in fig. 2, the specific operation process of this step is as follows:
S4.1: first, the pictures in the training set are sent into the ResNet + FPN feature extraction network for feature extraction to generate a feature map.
S4.2: the feature map generated in step S4.1 is sent into the RPN for candidate region extraction, where the anchor box parameters of the RPN are set according to the result of the Kmeans algorithm in step S2.2.
S4.3: the candidate regions are preliminarily classified, and high-confidence regions are then screened out through the foreground score (a confidence value generated by the RPN for each pre-selected box) and the non-maximum suppression algorithm Soft NMS, generating the candidate boxes.
S4.4: the candidate boxes are fused with the feature map, and candidate regions of fixed size are generated through the ROI pooling layer.
S4.5: finally, the candidate regions obtained in step S4.4 are input into the subsequent classification layer and bounding-box regression layer for target region identification, and the image is segmented through the segmentation mask generation network; the classification layer and the bounding-box regression layer use fully connected networks, and the segmentation mask generation network uses a fully convolutional network.
S4.6: the loss function of the network is the sum of the bounding-box regression loss, the classification loss and the pixel segmentation mask loss; the specific formulas are as follows:
L = L_cls + L_reg + L_mask

L_cls = (1/N_cls) · Σ_i −[p_i* · log(p_i) + (1 − p_i*) · log(1 − p_i)]

L_reg = (1/N_reg) · Σ_i p_i* · R(t_i − t_i*)

R(x) = 0.5x²,       |x| < 1
R(x) = |x| − 0.5,   |x| ≥ 1

L_mask = 1 − (1/n_cl) · Σ_i n_ii / (c_i + Σ_j n_ji − n_ii)

in the formula: L_cls denotes the classification loss, L_reg the bounding-box regression loss, and L_mask the pixel segmentation mask loss; i is the anchor box index; p_i is the predicted probability for the i-th anchor box; p_i* is the label of the i-th anchor box: p_i* = 1 when the i-th anchor box is a positive sample, and p_i* = 0 when the i-th anchor box is background; N_reg and N_cls are normalization constants; t_i = (t_x, t_y, t_w, t_h) is the parameterized vector of the center-point coordinates, width and height predicted for the i-th anchor box, and t_i* is the parameterized vector of the i-th labeled bounding box; R denotes the Smooth L1 function.
n_cl denotes the number of classes contained in the ground-truth segmentation map, n_ij the number of pixels of class i predicted as class j, n_ii the number of pixels of class i correctly predicted as class i, and c_i the total number of class-i pixels in the ground-truth segmentation.
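The composite loss could be sketched in PyTorch as below, purely as an illustration and not as the patent's implementation; per-pixel binary cross-entropy stands in for the L_mask term (the patent states the mask term through pixel counts rather than code), and all tensor shapes are assumed:

```python
# Sketch: L = L_cls + L_reg + L_mask over a batch of anchors.
import torch
import torch.nn.functional as F

def detection_loss(p, p_star, t, t_star, mask_logits, mask_gt):
    # L_cls: binary cross-entropy over all sampled anchors.
    l_cls = F.binary_cross_entropy(p, p_star)
    # L_reg: Smooth L1 (the R function), applied only where p_i* = 1.
    per_anchor = F.smooth_l1_loss(t, t_star, reduction="none").sum(dim=1)
    l_reg = (p_star * per_anchor).sum() / p_star.sum().clamp(min=1)
    # L_mask: per-pixel binary cross-entropy on the predicted masks.
    l_mask = F.binary_cross_entropy_with_logits(mask_logits, mask_gt)
    return l_cls + l_reg + l_mask

# Toy shapes: 8 anchors with 4 box parameters each, and 28 x 28 masks.
p, p_star = torch.rand(8), (torch.rand(8) > 0.5).float()
t, t_star = torch.randn(8, 4), torch.randn(8, 4)
mask_logits = torch.randn(8, 28, 28)
mask_gt = (torch.rand(8, 28, 28) > 0.5).float()
print(detection_loss(p, p_star, t, t_star, mask_logits, mask_gt))
```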
Step S5: the verification set from step S1 is input into the model trained in step S4 and the mAP value of the model is calculated; when the mAP value is greater than or equal to 90%, training stops and the model is saved; when the mAP value is less than 90%, the process returns to step S3 to reset the hyper-parameters and then proceeds to step S4. The mAP value is the mean average precision, i.e. the mean of the average precision (AP) of the different categories, where AP is the area under the P-R curve. The confusion matrix and the calculation formulas for P and R are as follows:

                         Positive sample    Negative sample
Predicted as positive    TP                 FP
Predicted as negative    FN                 TN

P = TP / (TP + FP)

R = TP / (TP + FN)

mAP = (1/N) · Σ_{i=1..N} AP_i

In the formula: P means precision; R means recall; N is the number of categories. TP (true positive) is a positive sample correctly identified as positive by the classifier; FP (false positive) is a negative sample incorrectly identified as positive; FN (false negative) is a positive sample incorrectly identified as negative; TN (true negative) is a negative sample correctly identified as negative.
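For illustration, the step-S5 evaluation could be computed as in the sketch below; the per-category result format of (scores, true-positive flags, ground-truth count) is an assumption, and AP is taken as the trapezoidal area under the P-R curve:

```python
# Sketch: precision/recall, AP per category, mAP, and the 90% stop check.
import numpy as np

def average_precision(scores, is_tp, n_gt):
    order = np.argsort(-np.asarray(scores, dtype=float))
    tp = np.asarray(is_tp, dtype=float)[order]
    cum_tp, cum_fp = np.cumsum(tp), np.cumsum(1.0 - tp)
    P = cum_tp / (cum_tp + cum_fp)     # P = TP / (TP + FP)
    R = cum_tp / max(n_gt, 1)          # R = TP / (TP + FN)
    return np.trapz(P, R)              # AP: area under the P-R curve

def mean_ap(per_class):
    return float(np.mean([average_precision(s, t, n) for s, t, n in per_class]))

# Toy results: two categories as (scores, true-positive flags, n_gt).
per_class = [([0.9, 0.8, 0.3], [1, 0, 1], 2),
             ([0.95, 0.5, 0.4], [1, 1, 0], 2)]
mAP = mean_ap(per_class)
print(f"mAP = {mAP:.3f}")
if mAP >= 0.90:
    print("stop training and save the model (step S5)")
else:
    print("return to step S3 and reset the hyper-parameters")
```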
Step S6: the automobile surface scratch picture to be detected is input into the deep learning model saved in step S5 for inference, and the position, type and confidence of the scratch are obtained; fig. 6 is a schematic diagram of the automobile surface scratch detection results.

Claims (6)

1. An automobile surface scratch detection method based on improved Mask RCNN is characterized by comprising the following steps:
step S1: collecting automobile surface scratch pictures, and making a training set and a verification set;
step S2: constructing a deep learning network model based on an improved Mask RCNN;
step S3: setting weight initialization, hyper-parameters and an optimizer in the training process;
step S4: inputting the training set created in step S1 into the network model constructed in step S2 for training;
step S5: inputting the verification set in the step S1 into the model trained in the step S4, calculating the mAP value of the model, stopping training when the mAP value is more than or equal to 90%, and storing the model; when the mAP value is less than 90%, returning to the step S3 to reset the hyper-parameter setting, and then continuing to the step S4;
step S6: and inputting the automobile surface scratch picture to be detected into the deep learning model stored in the step S5 for inference, and obtaining the position, the type and the confidence of the scratch.
2. The improved Mask RCNN-based automobile surface scratch detection method as claimed in claim 1, wherein: after the car body scratch pictures are collected in step S1, each picture is uniformly cropped to a size of 112 × 112, the position and shape of the scratch parts are labeled in the pictures with the LabelMe tool, and all pictures are divided into a training set and a verification set at a ratio of 8:2.
3. The improved Mask RCNN-based automobile surface scratch detection method according to claim 1 or 2, wherein the specific process of step S2 is as follows:
s2.1: FPN + ResNet50 is used as the feature extraction network;
s2.2: the number and size of the anchor boxes in the RPN are adjusted with the Kmeans++ algorithm: the number of anchor boxes at the point where the average IoU no longer rises is selected as k, the width and height of each target box in the training-set samples are taken as coordinate points on a two-dimensional coordinate system, and the adjusted anchor box sizes are calculated with the Kmeans algorithm;
s2.3: non-maximum suppression is performed using the Soft NMS algorithm, which is shown as follows:
Soft NMS:

s_i = s_i,                        IoU(M, b_i) < N_t
s_i = s_i · (1 − IoU(M, b_i)),    IoU(M, b_i) ≥ N_t

in the formula: s_i is the score of the i-th detection box; b_i is the box under consideration; M is the candidate box with the highest score; IoU(M, b_i) measures the degree of overlap between M and b_i; N_t is a preset threshold;

all candidate boxes are divided into different sets by category, and the candidate boxes in each set are arranged in descending order of score; in each set, the IoU between the highest-scoring candidate box and each remaining candidate box is calculated, and if the IoU is smaller than the set threshold N_t the score of that candidate box is kept unchanged; otherwise the score of the candidate box is set to s_i · (1 − IoU(M, b_i));
S2.4: the ROI pooling layer adopts the ROI Align algorithm; the candidate region is divided into 4 sub-regions, and within each sub-region the ROI Align algorithm calculates the sampling values by bilinear interpolation and averages them.
4. The improved Mask RCNN-based automobile surface scratch detection method according to claim 3, wherein the specific process of step S3 is as follows:
s3.1: setting the weight initialization mode to a Gaussian distribution;
s3.2: setting the batch size to 64;
s3.3: the optimizer adopts Adam, and the model weights are updated by back-propagation of the gradients.
5. The improved Mask RCNN-based automobile surface scratch detection method according to claim 4, wherein the specific process of step S4 is as follows:
s4.1: sending the pictures in the training set into the ResNet + FPN feature extraction network for feature extraction to generate a feature map;
s4.2: sending the feature map generated in step S4.1 into the RPN for candidate region extraction, wherein the anchor box parameters of the RPN are set according to the result of the Kmeans algorithm in step S2.2;
s4.3: preliminarily classifying the candidate regions, and screening out high-confidence regions through the foreground score and the non-maximum suppression algorithm Soft NMS to generate candidate boxes;
s4.4: fusing the candidate boxes with the feature map, and generating candidate regions of fixed size through the ROI pooling layer;
s4.5: finally, inputting the candidate regions obtained in step S4.4 into the subsequent classification layer and bounding-box regression layer for target region identification, and segmenting the image through the segmentation mask generation network; the classification layer and the bounding-box regression layer use fully connected networks, and the segmentation mask generation network uses a fully convolutional network;
s4.6: the loss function of the network is the sum of the bounding-box regression loss, the classification loss and the pixel segmentation mask loss; the specific formulas are as follows:
L = L_cls + L_reg + L_mask

L_cls = (1/N_cls) · Σ_i −[p_i* · log(p_i) + (1 − p_i*) · log(1 − p_i)]

L_reg = (1/N_reg) · Σ_i p_i* · R(t_i − t_i*)

R(x) = 0.5x²,       |x| < 1
R(x) = |x| − 0.5,   |x| ≥ 1

L_mask = 1 − (1/n_cl) · Σ_i n_ii / (c_i + Σ_j n_ji − n_ii)

in the formula: L_cls denotes the classification loss, L_reg the bounding-box regression loss, and L_mask the pixel segmentation mask loss; i is the anchor box index; p_i is the predicted probability for the i-th anchor box; p_i* is the label of the i-th anchor box: p_i* = 1 when the i-th anchor box is a positive sample, and p_i* = 0 when the i-th anchor box is background; N_reg and N_cls are normalization constants; t_i = (t_x, t_y, t_w, t_h) is the parameterized vector of the center-point coordinates, width and height predicted for the i-th anchor box, and t_i* is the parameterized vector of the i-th labeled bounding box; R denotes the Smooth L1 function;
n_cl denotes the number of classes contained in the ground-truth segmentation map, n_ij the number of pixels of class i predicted as class j, n_ii the number of pixels of class i correctly predicted as class i, and c_i the total number of class-i pixels in the ground-truth segmentation.
6. The improved Mask RCNN-based automobile surface scratch detection method as claimed in claim 5, wherein the mAP value in step S5 is the mean average precision, i.e. the mean of the average precision (AP) of the different categories, where AP is the area under the P-R curve; P and R are calculated as follows:

P = TP / (TP + FP)

R = TP / (TP + FN)

mAP = (1/N) · Σ_{i=1..N} AP_i

in the formula: P means precision, R means recall, and N is the number of categories; TP is a positive sample correctly identified as positive by the classifier, FP is a negative sample incorrectly identified as positive, FN is a positive sample incorrectly identified as negative, and TN is a negative sample correctly identified as negative.
CN202110174413.1A · Priority and filing date: 2021-02-07 · Automobile surface scratch detection method based on improved Mask RCNN · Status: Pending · Publication: CN112802005A (en)

Priority Applications (1)

Application Number: CN202110174413.1A · Priority Date: 2021-02-07 · Filing Date: 2021-02-07 · Title: Automobile surface scratch detection method based on improved Mask RCNN

Applications Claiming Priority (1)

Application Number: CN202110174413.1A · Priority Date: 2021-02-07 · Filing Date: 2021-02-07 · Title: Automobile surface scratch detection method based on improved Mask RCNN

Publications (1)

Publication Number: CN112802005A · Publication Date: 2021-05-14

Family

ID=75814900

Family Applications (1)

Application Number: CN202110174413.1A · Priority Date: 2021-02-07 · Filing Date: 2021-02-07 · Title: Automobile surface scratch detection method based on improved Mask RCNN

Country Status (1)

Country Link
CN (1) CN112802005A (en)

Patent Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109711474A (en) * 2018-12-24 2019-05-03 中山大学 A kind of aluminium material surface defects detection algorithm based on deep learning
CN110598784A (en) * 2019-09-11 2019-12-20 北京建筑大学 Machine learning-based construction waste classification method and device
CN111160120A (en) * 2019-12-11 2020-05-15 重庆邮电大学 Fast R-CNN article detection method based on transfer learning
CN111178197A (en) * 2019-12-19 2020-05-19 华南农业大学 Mass R-CNN and Soft-NMS fusion based group-fed adherent pig example segmentation method
CN111488920A (en) * 2020-03-27 2020-08-04 浙江工业大学 Bag opening position detection method based on deep learning target detection and recognition
CN111507967A (en) * 2020-04-17 2020-08-07 华南农业大学 Mango high-precision detection method in natural orchard scene
CN112001878A (en) * 2020-05-21 2020-11-27 合肥合工安驰智能科技有限公司 Deep learning ore scale measuring method based on binarization neural network and application system

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
ADON PHILLIPS et al.: "Segmentation of Prognostic Tissue Structures in Cutaneous Melanoma using Whole Slide Images", 《CVPRW 2019》 *
QINGHUI ZHANG et al.: "Vehicle-Damage-Detection Segmentation Algorithm Based on Improved Mask RCNN", 《DIGITAL OBJECT IDENTIFIER》 *
张超 等: "基于改进Mask RCNN的风机叶片缺陷检测" (Wind turbine blade defect detection based on improved Mask RCNN), 《可再生能源》 (Renewable Energy) *

Cited By (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113344148A (en) * 2021-08-06 2021-09-03 北京航空航天大学 Marine ship target identification method based on deep learning
CN114219757A (en) * 2021-11-08 2022-03-22 华南理工大学 Vehicle intelligent loss assessment method based on improved Mask R-CNN
CN114219757B (en) * 2021-11-08 2024-05-10 华南理工大学 Intelligent damage assessment method for vehicle based on improved Mask R-CNN
CN114202696A (en) * 2021-12-15 2022-03-18 安徽大学 SAR target detection method and device based on context vision and storage medium
CN114648703A (en) * 2022-04-08 2022-06-21 安徽工业大学 Fruit automatic picking method based on improved SimFC
CN114648703B (en) * 2022-04-08 2024-06-25 安徽工业大学 Fruit automatic picking method based on improvement SiamFC
CN115393691A (en) * 2022-09-05 2022-11-25 中国长江电力股份有限公司 Automatic detection method for on-off state of relay protection pressing plate based on Mask _ RCNN algorithm

Similar Documents

Publication Publication Date Title
CN112802005A (en) Automobile surface scratch detection method based on improved Mask RCNN
CN109685152B (en) Image target detection method based on DC-SPP-YOLO
CN108960245B (en) Tire mold character detection and recognition method, device, equipment and storage medium
CN107610087B (en) Tongue coating automatic segmentation method based on deep learning
CN109583489A (en) Defect classifying identification method, device, computer equipment and storage medium
CN107784288B (en) Iterative positioning type face detection method based on deep neural network
CN113160192A (en) Visual sense-based snow pressing vehicle appearance defect detection method and device under complex background
CN112861970B (en) Fine-grained image classification method based on feature fusion
CN108804992B (en) Crowd counting method based on deep learning
CN110543906B (en) Automatic skin recognition method based on Mask R-CNN model
CN113177456B (en) Remote sensing target detection method based on single-stage full convolution network and multi-feature fusion
CN111461213A (en) Training method of target detection model and target rapid detection method
CN113139979A (en) Edge identification method based on deep learning
CN113221956B (en) Target identification method and device based on improved multi-scale depth model
CN117237902B (en) Robot character recognition system based on deep learning
CN112784869A (en) Fine-grained image identification method based on attention perception and counterstudy
CN115457044B (en) Pavement crack segmentation method based on class activation mapping
CN115240119A (en) Pedestrian small target detection method in video monitoring based on deep learning
CN117949942B (en) Target tracking method and system based on fusion of radar data and video data
CN115019163A (en) City factor identification method based on multi-source big data
CN116740652B (en) Method and system for monitoring rust area expansion based on neural network model
CN112597875A (en) Multi-branch network anti-missing detection aerial photography target detection method
CN108765384B (en) Significance detection method for joint manifold sequencing and improved convex hull
CN110889418A (en) Gas contour identification method
CN112396648B (en) Target identification method and system capable of positioning mass center of target object

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 2021-05-14