CN113628178B

CN113628178B - Steel product surface defect detection method with balanced speed and precision

Info

Publication number: CN113628178B
Application number: CN202110872859.1A
Authority: CN
Inventors: 王兵; 汪文艳; 卢琨; 米春风; 王子; 杨海娟; 周阳; 李敏杰
Original assignee: Anhui University of Technology AHUT
Current assignee: Anhui University of Technology AHUT
Priority date: 2021-07-30
Filing date: 2021-07-30
Publication date: 2024-03-15
Anticipated expiration: 2041-07-30
Also published as: CN113628178A

Abstract

The invention discloses a steel product surface defect detection method with balanced speed and precision, which belongs to the technical field of steel surface defect detection and comprises the following steps: S1: obtaining typical image samples of hot-rolled strip steel surface defects from a database, and preprocessing the samples; S2: designing, based on the CenterNet target detection model, a hot-rolled strip steel surface defect target detection model comprising a layer jump connection module and a pyramid feature fusion module, and selecting a backbone network as the feature extractor of the target detection model; S3: initializing the parameters of the backbone network part of the target detection model with parameters trained on the ImageNet data set, and training the target detection model with the training samples; S4: testing the test samples with the trained target detection model, and outputting the detection results. The method locates hot-rolled strip steel surface defects with high accuracy and at high detection speed, and can be effectively applied to on-site real-time detection of hot-rolled strip steel surface defects.

Description

Steel product surface defect detection method with balanced speed and precision

Technical Field

The invention relates to the technical field of steel surface defect detection, in particular to a steel product surface defect detection method with balanced speed and precision.

Background

In the production of hot-rolled strip steel, various physical and chemical factors and the complexity of the hot-rolling process readily cause defects such as pressed oxide scale, scratches, pitting surfaces, inclusions, plaques and cracks on the strip surface, which seriously affect the performance and appearance of the product. Defective products that reach the market cause significant economic and reputational losses to the manufacturer. Real-time, accurate detection of surface defects on hot-rolled strip steel is therefore very important for strip steel production and quality control.

Currently, inspection of steel surface defects is broadly divided into manual inspection and automatic defect detection systems. Manual observation depends heavily on the inspectors, consumes a great deal of manpower, is strongly subjective, and makes it difficult to identify product surface defects scientifically and accurately. Automatic defect detection technologies such as infrared, magnetic flux leakage and machine vision are widely applied in industrial production and have gradually resolved the problems of manual inspection. In particular, the development of computer vision has greatly improved the accuracy and speed of defect classification. However, these methods cannot obtain accurate defect location information in the defect inspection task. In recent years, with the development of deep learning, convolutional neural networks have shown unique advantages in feature extraction and achieved better results in recognition tasks. Applying deep learning methods to industrial target detection is therefore a current research hotspot in product inspection.

Chinese patent application No. CN201611136821.3 discloses a method for accurately locating casting blank surface defects that carry over to the rolled material. That invention mainly studies the temporal and spatial correspondence of surface defects in large special-steel core products, infers the working procedure in which a surface defect forms, and provides an important reference for improving core product quality and production efficiency. In that method, holes of the required aperture and depth are made at set distances from the two ends of a casting blank surface defect along the rolling direction, and are filled and levelled by welding with welding rods. After rolling, the marks on the rolled material that correspond to the holes can be found quickly and accurately from this correspondence, and the casting blank surface defect is located from them for sampling, detection and analysis. This achieves fast and accurate location of the casting blank surface defects corresponding to rolled material defects, and improves the timeliness and accuracy of determining defect causes and improving the process.

In current deep learning methods, it has been demonstrated that feature maps at different levels of a convolutional neural network contain different image information. Specifically, shallow feature maps represent the location information of the target well, while high-level features contain rich semantic information about the target. The scientific paper with DOI 10.1109/TIM.2019.2995504 discloses an end-to-end algorithm for detecting surface defects of hot-rolled strip steel by fusing multiple levels of features. The paper mainly studies how fusing feature maps of several levels affects the ability of the detection model to detect hot-rolled strip steel surface defects. It first verifies that introducing the feature fusion method does not noticeably reduce the model's defect classification performance, and then applies the network structure to hot-rolled strip steel defect detection. The main point of this approach is that features of different levels are scaled to a uniform size using different down-sampling or up-sampling strategies and then fused. The experimental results show that the method improves the defect detection capability of the detection model, but no dedicated solution is designed for hot-rolled strip steel surface defect images with differing characteristics, and the feature maps of almost all layers are fused uniformly. In addition, the selected network backbone has a large number of training parameters, and the detection speed required in an actual industrial production environment is not considered. Therefore, a steel product surface defect detection method that balances speed and precision is proposed.

Disclosure of Invention

The technical problem to be solved by the invention is how to detect surface defects of steel products with high precision and at a speed that meets the requirements of a real-time production line. To this end, a steel product surface defect detection method with balanced speed and precision is provided.

The invention solves the above technical problem through the following technical solution, which comprises the following steps:

S1: obtaining typical image samples of hot-rolled strip steel surface defects from the NEU-DET database, and preprocessing the samples;

S2: designing, based on the CenterNet target detection model, a hot-rolled strip steel surface defect target detection model comprising a layer jump connection module and a pyramid feature fusion module, and selecting a backbone network with a small number of parameters as the feature extractor of the model;

S3: initializing the parameters of the backbone network part of the model with parameters trained on the ImageNet data set, and training the model with the training samples;

S4: testing the test samples with the trained model, and outputting the detection results.

Further, step S1 comprises the following steps:

S1.1: dividing the acquired samples into a training set and a testing set at a ratio of 7:3, both of which cover the 6 defect categories, and enlarging the samples in the training set to 384 x 384 pictures;

S1.2: performing a series of data enhancement operations, such as flipping, translation, brightness increase and cropping, on the training samples.
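The preprocessing in steps S1.1 and S1.2 can be illustrated with a minimal sketch. It assumes a simple random (non-stratified) 7:3 split and shows only horizontal flipping as one of the listed enhancement operations; the function names `split_samples` and `flip_horizontal`, the fixed seed and the 100 placeholder sample identifiers are illustrative assumptions, not details given in the text.

```python
import random


def split_samples(samples, train_ratio=0.7, seed=0):
    """Shuffle the samples and split them into training and testing sets."""
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)  # fixed seed keeps the split reproducible
    cut = round(len(shuffled) * train_ratio)
    return shuffled[:cut], shuffled[cut:]


def flip_horizontal(image, boxes):
    """Mirror an image (a list of pixel rows) and its defect boxes left to right.

    Each box is (x_min, y_min, x_max, y_max) in pixel coordinates.
    """
    width = len(image[0])
    flipped = [row[::-1] for row in image]
    mirrored = [(width - x2, y1, width - x1, y2) for x1, y1, x2, y2 in boxes]
    return flipped, mirrored


# 100 placeholder sample identifiers (hypothetical, for illustration only)
train_set, test_set = split_samples(range(100))
print(len(train_set), len(test_set))  # 70 30
```

Because the defect positions are part of the labels, any geometric enhancement has to transform the boxes together with the pixels, which is why `flip_horizontal` returns both.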

Further, step S2, in which a hot-rolled strip steel surface defect target detection model comprising a layer jump connection module and a pyramid feature fusion module is designed based on the CenterNet target detection model and a backbone network with a small number of parameters is selected as the feature extractor of the model, comprises the following steps:

S2.1: based on the CenterNet target detection model, selecting the lightweight backbone network ResNet18-dcn, which has a small number of parameters, as the feature extractor;

S2.2: to improve the detection capability of the model on defect images in which the target differs little from the background, designing a hot-rolled strip steel surface defect target detection model comprising a layer jump connection module;

S2.3: to improve the detection capability of the model on defect images with large intra-class differences and high inter-class similarity, adding a pyramid feature fusion module to the model that already contains the layer jump connection module, thereby obtaining the final detection model.

Furthermore, step S3, in which the parameters of the backbone network part of the model are initialized with parameters trained on the ImageNet data set and the model is trained with the training samples, comprises the following steps:

S3.1: in the improved model, the initial parameters of the ResNet-18 backbone network in ResNet18-dcn are parameters already trained on the ImageNet data set; the up-sampling layers in the decoding network of ResNet18-dcn and in the pyramid feature fusion module are initialized by linear interpolation; and the deformation convolution layers are initialized by Xavier Gaussian initialization;

S3.2: inputting the training set into the improved target detection model, and learning and updating the parameters with the BP algorithm.
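The linear-interpolation initialization of the up-sampling layers mentioned in S3.1 can be sketched as follows. The text does not give a formula, so this is only an illustration that assumes the commonly used bilinear kernel for transposed convolutions; the function name `bilinear_kernel` is hypothetical.

```python
import math


def bilinear_kernel(size):
    """Build a size x size kernel that makes a transposed convolution
    behave like bilinear interpolation (assumed formula, see lead-in)."""
    f = math.ceil(size / 2)
    c = (2 * f - 1 - f % 2) / (2 * f)
    return [
        [(1 - abs(i / f - c)) * (1 - abs(j / f - c)) for j in range(size)]
        for i in range(size)
    ]


kernel = bilinear_kernel(4)  # the 4 x 4, stride-2 up-sampling layer
for row in kernel:
    print(row)
```

For the 4 x 4 kernel the one-dimensional weights are 0.25, 0.75, 0.75 and 0.25, so the layer starts out as a smooth 2x enlargement and training only has to refine it.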

Further, step S4, in which the test samples are tested with the trained model and the detection results are output, comprises the following steps:

S4.1: enlarging the test samples by a factor of 1.6 and flipping them for data enhancement;

S4.2: taking the average of the test results of the enhanced samples and the original samples as the final test result.
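The averaging in S4.2 can be sketched for the flipped sample as follows. This is a simplified illustration that assumes the results being averaged are per-position score maps of equal size; the output for the 1.6-times enlarged sample would first have to be resized back to the same size, which is omitted here. The function name `average_with_flip` is hypothetical.

```python
def average_with_flip(scores, scores_from_flipped):
    """Average a score map with the score map predicted for the flipped image.

    The second map is mirrored back first so that both maps refer to the
    same image positions.
    """
    restored = [row[::-1] for row in scores_from_flipped]
    return [
        [(a + b) / 2 for a, b in zip(row_a, row_b)]
        for row_a, row_b in zip(scores, restored)
    ]


original = [[0.25, 0.75]]
flipped = [[0.5, 0.25]]  # prediction made on the mirrored image
print(average_with_flip(original, flipped))  # [[0.25, 0.625]]
```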

Furthermore, the CenterNet model in step S2 adopts three backbone network structures, ResNet-101, DLA-34 and Hourglass-104, as feature extractors in the target detection process. When ResNet is used as the backbone network, target features are extracted in an encoder-decoder manner: the ResNet network structure serves as the coding network, and three alternately connected deformation convolution layers and up-sampling layers are introduced after the encoder as the decoding network. The deformation convolution layers use a 3×3 convolution kernel with a stride of 1; the up-sampling layers use a 4×4 convolution kernel with a stride of 2 and are initialized by linear interpolation; and the feature maps after each up-sampling have 256, 128 and 64 channels.
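The effect of the decoding network on feature map size can be checked with the standard output-size formula for a transposed convolution. The sketch below assumes a padding of 1 for the 4×4, stride-2 up-sampling layers, a size-preserving 3×3 deformation convolution, and a 384-pixel input reduced 32-fold by the encoder; these values and the function names are assumptions for illustration.

```python
def transposed_conv_out(size, kernel, stride, padding, output_padding=0):
    """Output length of a transposed convolution along one spatial axis."""
    return (size - 1) * stride - 2 * padding + kernel + output_padding


def decoder_shapes(encoder_out, channels=(256, 128, 64)):
    """Return (channels, height, width) after each of the three up-sampling stages."""
    shapes = []
    size = encoder_out
    for ch in channels:
        # The 3x3, stride-1 deformation convolution is assumed to keep the size
        # unchanged; only the 4x4, stride-2 up-sampling layer alters it.
        size = transposed_conv_out(size, kernel=4, stride=2, padding=1)
        shapes.append((ch, size, size))
    return shapes


print(decoder_shapes(384 // 32))
# [(256, 24, 24), (128, 48, 48), (64, 96, 96)]
```

Under these assumptions each stage exactly doubles the spatial size, so three stages take the 1/32-size encoder output back to 1/4 of the input size.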

Further, the design of the layer jump connection module in step S2.2 comprises the following steps:

S2.2.1: based on the ResNet-18 coding network and the decoding network consisting of deformation convolution layers and up-sampling layers, combining two feature maps of the two networks that have the same number of channels and the same size, that is, adding the pixel values at the same positions of the two feature maps;

S2.2.2: inputting the combined feature map into a deformation convolution layer in the decoding network;
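The pairing rule of the layer jump connection (same channel count and same size) can be sketched as follows. The encoder channel counts are those of a standard ResNet-18 and the down-sampling factors follow the detailed description; which pairs are actually wired together in the model is an assumption here, as are the dictionary and function names.

```python
# (channels, down-sampling factor relative to the input image)
ENCODER = {"conv2": (64, 4), "conv3": (128, 8), "conv4": (256, 16), "conv5": (512, 32)}
DECODER = {"up-conv1": (256, 16), "up-conv2": (128, 8), "up-conv3": (64, 4)}


def matching_pairs(encoder, decoder):
    """List (encoder module, decoder module) pairs whose outputs can be added."""
    return [
        (enc_name, dec_name)
        for dec_name, dec_spec in decoder.items()
        for enc_name, enc_spec in encoder.items()
        if enc_spec == dec_spec
    ]


def add_feature_maps(a, b):
    """Add two equally sized feature maps position by position."""
    return [[x + y for x, y in zip(row_a, row_b)] for row_a, row_b in zip(a, b)]


print(matching_pairs(ENCODER, DECODER))
# [('conv4', 'up-conv1'), ('conv3', 'up-conv2'), ('conv2', 'up-conv3')]
print(add_feature_maps([[1, 2], [3, 4]], [[10, 20], [30, 40]]))
# [[11, 22], [33, 44]]
```

Because the combination is a plain addition, it introduces no additional trainable parameters and leaves the feature map size unchanged.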

Further, the design of the pyramid feature fusion module in step S2.3 comprises the following steps:

S2.3.1: up-sampling again the feature map of each up-sampling layer in the decoding network, wherein the convolution kernel sizes of these up-sampling layers are 16×16, 8×8 and 4×4 respectively, the strides are 8, 4 and 2 respectively, and the parameters are initialized by linear interpolation;

S2.3.2: combining the four up-sampled feature maps by adding the pixel values at the same positions.
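The kernel sizes and strides in S2.3.1 can be checked arithmetically: with a suitable padding they bring feature maps at 1/32, 1/16 and 1/8 of the input size to the same 1/4 size, so that position-wise addition is possible. The sketch below assumes a padding of (kernel - stride) / 2, a 384-pixel input, and that the re-up-sampled maps are those at 1/32, 1/16 and 1/8 scale; none of these is stated explicitly in S2.3.1, and the function names are hypothetical.

```python
def transposed_conv_out(size, kernel, stride):
    """Output length of a transposed convolution along one spatial axis."""
    padding = (kernel - stride) // 2  # assumed padding, not stated in the text
    return (size - 1) * stride - 2 * padding + kernel


def fused_sizes(input_size=384):
    """Return the spatial size of each feature map entering the addition."""
    # (down-sampling factor, kernel, stride) for the three re-up-sampled maps
    branches = [(32, 16, 8), (16, 8, 4), (8, 4, 2)]
    sizes = [transposed_conv_out(input_size // d, k, s) for d, k, s in branches]
    sizes.append(input_size // 4)  # the 1/4-size map is used as is
    return sizes


def add_maps(maps):
    """Add feature maps of identical size position by position."""
    return [[sum(values) for values in zip(*rows)] for rows in zip(*maps)]


print(fused_sizes())  # [96, 96, 96, 96]
print(add_maps([[[1, 2], [3, 4]], [[10, 20], [30, 40]]]))  # [[11, 22], [33, 44]]
```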

Compared with the prior art, the invention has the following advantages:

(1) Based on the first-order CenterNet detection model, which has a higher detection speed, a lighter, modular ResNet-18 network is adopted as the backbone structure, and a deformation convolution layer is added before each up-sampling layer in the decoding network to adapt to industrial data with varied defect morphologies, which enhances the robustness of the model; the residual structure in ResNet-18 simplifies network learning, accelerates gradient propagation and avoids network degradation.

(2) On the basis of a lightweight feature extraction network structure, two modules, layer jump connection and pyramid feature fusion, are designed; the layer jump connection module realizes deep supervision of the whole network structure without increasing the number of network parameters, and at the same time fuses the deep and shallow features of the network to integrate local and global features, which improves the detection capability of the model on low-quality defect samples; the pyramid feature fusion module fuses four feature maps with rich semantic information into one output feature map of larger size, which improves the detection capability of the model on defect samples with large intra-class differences and high inter-class similarity.

(3) Compared with existing network models, the method not only improves detection accuracy but also meets the minimum speed requirement for detecting sample defects in actual production.

Drawings

FIG. 1 is a flow chart of a method for detecting surface defects of steel products with balanced speed and accuracy in an embodiment of the invention;

FIG. 2 is a graph of 6 exemplary defect image samples in the NEU database according to one embodiment of the present invention;

FIG. 3 is a diagram of a network architecture in an embodiment of the present invention;

FIG. 4 is a diagram showing the effect of detecting a defective picture according to an embodiment of the present invention;

FIG. 5 is a schematic diagram of different layer jump connection module structures.

Detailed Description

The following describes in detail the examples of the present invention, which are implemented on the premise of the technical solution of the present invention, and detailed embodiments and specific operation procedures are given, but the scope of protection of the present invention is not limited to the following examples.

This embodiment provides a technical solution: a steel product surface defect detection method with balanced speed and precision, used to identify and locate surface defects of hot-rolled strip steel. As shown in FIG. 1, the method comprises the following steps:

S1: obtaining typical image samples of hot-rolled strip steel surface defects from the NEU-DET database, and preprocessing the samples.

The NEU database refers to the Northeastern University (NEU) surface defect database, from which images of six typical surface defects of hot-rolled strip steel and the position coordinates of the defects in each image can be obtained. The six typical defect types are cracks (Cr), pressed oxide scale (Rs), pitting surfaces (Ps), plaques (Pa), inclusions (In) and scratches (Sc), as shown in FIG. 2.

This step comprises the following two sub-steps:

S1.1: dividing the acquired samples into a training set and a testing set at a ratio of 7:3, both of which cover the 6 defect categories, and enlarging the samples in the training set to 384 x 384 pictures;

S1.2: performing a series of data enhancement operations, such as flipping, translation, brightness increase and cropping, on the training samples.

S2: based on a CenterNet target detection model, a ResNet18-dcn network structure with a small quantity of parameters is used as a feature extractor of the model, and a hot-rolled strip steel surface defect target detection model comprising a layer jump connection and pyramid feature fusion module is built.

The ResNet18-dcn model is a feature extractor that combines an encoder and a decoder. It uses the conventional ResNet18 model as the backbone network (also called the encoder in ResNet18-dcn) and three stacked pairs of deformation convolution layers and up-sampling layers as the decoding network of ResNet18-dcn. ResNet-18 includes 17 convolutional layers and 1 max pooling layer. The structure of the defect target detection network is shown in FIG. 3 and is described below with reference to that figure.

First, the original defect image is input into one convolution layer and one max pooling layer connected in sequence in the encoder (named conv1); the size of the output feature map is 1/4 of the original image size (here and below, a fractional value denotes the ratio of the output feature map size to the original input image size). Four convolution modules of identical structure (conv2, conv3, conv4 and conv5) are then connected in sequence, each comprising two residual structures with two convolution layers each. In conv2, all convolution layers have a stride of 1; in conv3, conv4 and conv5, the first convolution layer of the first residual structure has a stride of 2 and the remaining convolution layers have a stride of 1. The output feature map sizes of these four convolution modules are 1/4, 1/8, 1/16 and 1/32 in sequence.

Next come three up-sampling modules of identical structure in the decoding network, up-conv1, up-conv2 and up-conv3. Each comprises a deformation convolution layer followed by an up-sampling layer. The deformation convolution layer uses a 3×3 convolution kernel with a stride of 1; the up-sampling layer uses a 4×4 convolution kernel with a stride of 2 and is initialized by linear interpolation. The feature maps after up-sampling have 256, 128 and 64 channels, and the output feature map sizes of the three up-sampling modules are 1/16, 1/8 and 1/4.

The layer jump connection module (SCM) connects feature maps of the same output size in the coding network and the decoding network. The actual combination mode is the addition of pixel values between the feature maps, and the size of the fused feature map after addition is unchanged. The pyramid feature fusion module (PFM) first adjusts four feature maps of different sizes to a uniform size through up-sampling layers and then combines them by addition into one feature map of larger size. Three output modules, each consisting of two convolution layers, are then connected: the first module (cls) performs defect classification, the second module (loc_wh) predicts the width and height of the defect target detection box, and the third module (loc_offset) predicts the offset of the center point coordinates of the defect target in the x and y directions.
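How the three output modules combine into one detection box can be sketched as follows. This is a minimal illustration of CenterNet-style decoding that assumes the center is a peak location on the 1/4-size output map, that width and height are predicted in output-map units, and that all values are scaled back to input-image pixels by a stride of 4; the function name `decode_box` and its arguments are hypothetical.

```python
def decode_box(cx, cy, offset, size, stride=4):
    """Turn one predicted center into a box in input-image pixels.

    cx, cy : integer peak location on the output (classification) map
    offset : (dx, dy) sub-pixel correction of the center point
    size   : (w, h) predicted box width and height in output-map units
    stride : ratio between input image size and output map size
    """
    x = cx + offset[0]
    y = cy + offset[1]
    w, h = size
    box = (x - w / 2, y - h / 2, x + w / 2, y + h / 2)
    return tuple(v * stride for v in box)


print(decode_box(10, 20, offset=(0.5, 0.25), size=(4, 8)))
# (34.0, 65.0, 50.0, 97.0)
```

The offset term compensates for the position lost when the center point is mapped onto the coarser output grid, which is why it is predicted separately from the width and height.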

S3: initializing the parameters of the backbone network part of the model with parameters trained on the ImageNet data set, and training the model with the training samples.

The process of training the target detection model comprises the following steps:

S3.1: in the improved model, the initial parameters of the ResNet-18 backbone network in ResNet18-dcn are parameters already trained on the ImageNet data set; the up-sampling layers in the decoding network of ResNet18-dcn and in the pyramid feature fusion module are initialized by linear interpolation; and the deformation convolution layers are initialized by Xavier Gaussian initialization;

S3.2: inputting the training set into the improved target detection model to learn and update the parameters.
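The Xavier Gaussian initialization named in S3.1 draws weights from a zero-mean normal distribution whose standard deviation depends on the layer's fan-in and fan-out. The sketch below computes that standard deviation for a square convolution kernel; the channel counts in the example (512 in, 256 out) are assumed for illustration and the function name is hypothetical.

```python
import math


def xavier_normal_std(in_channels, out_channels, kernel_size, gain=1.0):
    """Standard deviation of Xavier (Glorot) normal initialization
    for a convolution with a square kernel."""
    fan_in = in_channels * kernel_size ** 2
    fan_out = out_channels * kernel_size ** 2
    return gain * math.sqrt(2.0 / (fan_in + fan_out))


# A 3x3 layer with assumed channel counts of 512 in and 256 out
print(round(xavier_normal_std(512, 256, 3), 5))
```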

The improved target detection model is trained with the BP (back-propagation) algorithm. Because the invention is mainly aimed at detecting surface defect targets of steel products, the improved detection network can be called a steel product surface defect detector. Network parameters are updated according to the error between the network output and the defect class and defect position of the sample. Each batch of 24 pictures is used to compute the network error and update the weights. The Adam optimization algorithm is used during training, with an initial learning rate of 1.25e-4; the learning rate is reduced by a factor of 10 at rounds 60 and 120, and training ends at round 160.
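The learning rate schedule described above can be written as a small step function. This sketch assumes rounds are counted from 0 and that each reduction takes effect at the start of rounds 60 and 120; the function name `learning_rate` is hypothetical.

```python
def learning_rate(epoch, base_lr=1.25e-4, milestones=(60, 120), factor=0.1):
    """Step schedule: multiply the rate by `factor` at each milestone reached."""
    drops = sum(1 for m in milestones if epoch >= m)
    return base_lr * factor ** drops


TOTAL_EPOCHS = 160
for epoch in (0, 59, 60, 119, 120, TOTAL_EPOCHS - 1):
    print(epoch, learning_rate(epoch))
```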

The process of testing the object detection model comprises the following steps:

Using the method of the invention, defect detection was performed on a group of steel product surface samples. For comparison, the most recent existing first-order and second-order target detection methods in terms of detection speed or precision were also used to identify steel product surface defects, and their detection precision and speed were compared. The second-order detection models are the Faster RCNN, Cascade RCNN and DDN models, and the first-order detection models are the M2Det, SSD, FCOS, ATSS, YOLOv and CenterNet models. To obtain more competitive detection results, the Faster RCNN, DDN, M2Det and CenterNet detection models were trained with several different backbone networks. The detection results of each model on the different defect types obtained after training are shown in Table 1. The detection results for some of the images are shown in FIG. 4.

Table 1. Evaluation of steel product surface defect detection performance based on different models

As can be seen from Table 1, the second-order detectors achieve a higher mean average precision (mAP) in detecting steel product surface defects; the detection accuracy of the three models is 77.9%, 73.3% and 82.3%, respectively, but their detection speed per sample is slow. For example, when the DDN detector uses ResNet50 as the feature extraction network, its detection speed is only 11 FPS. In contrast, the first-order detectors achieve faster detection speeds, but their detection accuracy varies widely, most notably for the M2Det model. Compared with the other detection models, the steel product surface defect detector of the invention (Our work) has both a high mAP and a high speed. Compared with the CenterNet model that uses ResNet18-dcn as the backbone, the layer jump connection and pyramid feature fusion modules designed in the invention improve the mAP by 6.1% without noticeably reducing the detection speed. Overall, the defect target detection model of the invention achieves a detection precision of 80.0% and a detection speed of 64 FPS, which is the best balance of speed and precision.

Design and discussion of model Structure

Layer jump connection module

Shallow features in convolutional neural networks can provide more target location information. The sections above briefly described how the layer jump connection module (as shown in FIG. 5a) improves detection performance by fusing features of the same resolution. However, do other combinations of these layers yield better performance? To answer this, the present embodiment designs two other combination modes on the basis of ResNet18-dcn that integrate feature maps of different sizes, as shown in FIGS. 5b and 5c.

Applying these different combination modes to hot-rolled strip steel surface defect detection improves detection performance compared with the network that uses ResNet18-dcn alone as the backbone. Specifically, the low-level fusion mode SCM_L, which combines the large feature maps in the coding network with the small feature maps in the decoding network, obtains an mAP of 77.2% at a detection speed of 61 FPS. The high-level feature fusion mode SCM_H, which combines the small feature maps in the coding network with the large feature maps in the decoding network, obtains the best mAP of 80.2%. However, the additional deconvolution operations in this mode reduce the detection speed of the model to 59 FPS, which is slower than the SCM_S mode in which feature maps of the same level are fused. By comparison, SCM_S achieves an mAP of 80.0% at a speed of 64 FPS, which can meet the minimum speed requirements of different industrial production scenarios.

More specifically, the detection performance of the different combination modes on the different defect types is shown in Table 3. The detection performance of the three combination modes on scratches, plaques, pressed oxide scale and inclusions is relatively stable, whereas the average detection accuracy on crack defects is 53.7%, 58.6% and 45.0% respectively, and on pitting surfaces 87%, 84.5% and 80.1% respectively, which shows that the different feature fusion modes affect the detection of these two defect types very differently. A likely reason is that crack and pitting surface defects closely resemble the background, and the SCM_L combination mode uses max pooling to fuse the features of different layers at a lower level, which can lose more target position information and lead to poor detection results. In contrast, SCM_H enlarges the shallow features by deconvolution and extracts more semantic features, which gives a better detection result. However, the additional convolution operations also reduce the detection speed of the model. Ultimately, SCM_S achieves the best performance trade-off.
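The trade-off reasoning above amounts to choosing the most accurate combination mode among those that are fast enough. The sketch below uses the mAP and FPS values reported in this section; the speed floor of 60 FPS is a hypothetical threshold chosen for illustration only, since no numerical minimum is stated, and the function name is likewise hypothetical.

```python
# (mAP in %, speed in FPS) as reported for the three combination modes
RESULTS = {"SCM_L": (77.2, 61), "SCM_H": (80.2, 59), "SCM_S": (80.0, 64)}


def best_under_speed_floor(results, min_fps):
    """Return the mode with the highest mAP among those meeting the speed floor."""
    eligible = {name: r for name, r in results.items() if r[1] >= min_fps}
    if not eligible:
        return None
    return max(eligible, key=lambda name: eligible[name][0])


print(best_under_speed_floor(RESULTS, min_fps=60))  # SCM_S (hypothetical 60 FPS floor)
print(best_under_speed_floor(RESULTS, min_fps=0))   # SCM_H (accuracy only)
```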

Table 3. Detection performance of the model under different combination modes

Pyramid feature fusion module

Experiments show that the pyramid feature fusion module improves detection performance by combining four depth features. To explore which levels of features should be fused to achieve the best accuracy without reducing the resolution of the output feature map, this embodiment compares the detection performance obtained by fusing features from different levels, namely the conv5 module in the coding network and the up-conv1, up-conv2 and up-conv3 modules in the decoding network. As before, ResNet18-dcn serves as the baseline. The detection results of the different fusion modes are shown in Table 4. With SCM introduced, the mAP of the model is 77.6%. As the number of fused feature maps increases, the detection accuracy improves to varying degrees while the detection speed is not significantly reduced. Fusing two or three feature maps gives detection precisions of 77.8%, 78.6% and 78.5%, and the speed deviates by only 1 FPS. This shows that the feature maps of different layers provide distinct defect information, and that fusing more feature information is an effective way to improve the detection precision of steel surface defects.

Table 4. Detection performance of models combining feature maps of different levels

conv5	up-conv1	up-conv2	up-conv3	mAP (%)	FPS
-	-	-	√	77.6	70
-	-	√	√	77.8	69
-	√	√	√	78.6	68
√	√	√	-	78.5	64
√	√	√	√	80.0	64

In summary, the steel product surface defect detection method with balanced speed and precision of the above embodiment locates hot-rolled strip steel surface defects with high accuracy and at high detection speed, can be effectively applied to on-site real-time detection of hot-rolled strip steel surface defects, and is worthy of popularization and application.

While embodiments of the present invention have been shown and described above, it will be understood that the above embodiments are illustrative and are not to be construed as limiting the invention, and that changes, modifications, substitutions and variations may be made to the above embodiments by one of ordinary skill in the art within the scope of the invention.

Claims

1. The method for detecting the surface defects of the steel product with balanced speed and precision is characterized by comprising the following steps of:

s1: obtaining a typical image sample of the surface defects of the hot rolled strip steel from a database, and preprocessing the sample;

the step S1 includes the steps of:

s11: dividing the acquired samples into a training set and a testing set according to a set proportion, wherein the training set and the testing set comprise 6 typical surface defect categories, and expanding the samples in the training set into 384 x 384 pictures;

s12: carrying out data enhancement processing on the training sample, wherein the data enhancement processing mode comprises overturning, translating, increasing brightness, cutting and amplifying;

s2: designing a hot-rolled strip steel surface defect target detection model comprising a layer jump connection and pyramid feature fusion module based on a CenterNet target detection model, and selecting a backbone network as a feature extractor of the target detection model;

the step S2 includes the steps of:

s21: selecting a backbone network ResNet18-dcn as a feature extractor based on a CenterNet target detection model;

s22: designing a hot-rolled strip steel surface defect target detection model comprising a layer jump connection module;

s23: adding a pyramid feature fusion module into the target detection model added with the layer jump connection module to obtain a final target detection model;

in the step S23, the final target detection model includes an encoding network, a decoding network, a layer jump connection module, a pyramid feature fusion module, and an output module; the coding network comprises a first convolution module, a second convolution module, a third convolution module, a fourth convolution module and a fifth convolution module which are sequentially connected, wherein the first convolution module comprises a convolution layer and a pooling layer which are sequentially connected, the second convolution module, the third convolution module, the fourth convolution module and the fifth convolution module have the same structure and comprise two residual structures which are sequentially connected and provided with two convolution layers, the decoding network comprises a first upsampling module, a second upsampling module and a third upsampling module which are sequentially connected, and the first upsampling module, the second upsampling module and the third upsampling module have the same structure and comprise a deformation convolution layer and an upsampling layer which are sequentially connected; the layer jump connection module is used for realizing connection between the feature images with the same size as the output feature images in the coding network and the decoding network, wherein the connection combination mode is addition of pixel values between the feature images, and the output size of the added fusion feature images is unchanged; the pyramid feature fusion module is used for adjusting feature graphs with different sizes into uniform sizes through an up-sampling layer, and then combining the feature graphs into a feature graph with a larger size through an addition mode;

in the encoding network, the output feature map of the first convolution module is 1/4 the size of the original image; the strides of all convolution layers in the second convolution module are 1; in each of the third, fourth and fifth convolution modules, the stride of the first convolution layer in the first residual structure is 2 and the strides of the remaining convolution layers are all 1; and the output feature maps of the second, third, fourth and fifth convolution modules are 1/4, 1/8, 1/16 and 1/32 of the original image size, respectively;
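The size relationships stated above can be checked with simple arithmetic. The following is a minimal sketch only: the function name `encoder_output_sizes` and the 512-pixel input size are assumptions made for illustration and are not taken from the claims.

```python
def encoder_output_sizes(input_size):
    """Return the spatial size after each of the five convolution modules.

    Module 1 (convolution + pooling) reduces the input to 1/4 of its size.
    Module 2 keeps the size (all strides are 1). Modules 3, 4 and 5 each
    halve it (one stride-2 convolution layer per module).
    """
    module_strides = [4, 1, 2, 2, 2]
    sizes = []
    size = input_size
    for stride in module_strides:
        size //= stride
        sizes.append(size)
    return sizes


# Assumed input size of 512 pixels, for illustration only.
print(encoder_output_sizes(512))
```

For a 512-pixel input, the five modules yield sizes of 128, 128, 64, 32 and 16, that is, 1/4, 1/4, 1/8, 1/16 and 1/32 of the input.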

in the decoding network, the deformable convolution layer uses a convolution kernel of size 3 × 3 with a stride of 1, the upsampling layer uses a convolution kernel of size 4 × 4 with a stride of 2, and the feature maps after each upsampling have 256, 128 and 64 channels, respectively;
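The effect of these kernel sizes and strides on the feature map size can be sketched with the standard output-size formulas for convolution and transposed convolution. This is an illustration under assumptions: the upsampling layer is taken to be a transposed convolution, a padding of 1 is assumed for both layers, and the helper names are hypothetical; none of these details is stated in the claims.

```python
def conv_out(size, kernel, stride, padding):
    """Output size of an ordinary convolution."""
    return (size + 2 * padding - kernel) // stride + 1


def transposed_conv_out(size, kernel, stride, padding):
    """Output size of a transposed convolution (no output padding)."""
    return (size - 1) * stride - 2 * padding + kernel


def decoder_shapes(size):
    """Return (channels, height, width) after each of the three upsampling modules."""
    shapes = []
    for channels in (256, 128, 64):
        # 3 x 3 convolution, stride 1, assumed padding 1: size is unchanged.
        size = conv_out(size, kernel=3, stride=1, padding=1)
        # 4 x 4 upsampling, stride 2, assumed padding 1: size is doubled.
        size = transposed_conv_out(size, kernel=4, stride=2, padding=1)
        shapes.append((channels, size, size))
    return shapes


# Starting from an assumed 16 x 16 map (1/32 of a 512-pixel input).
print(decoder_shapes(16))
```

Under these assumptions each upsampling module doubles the spatial size, so three modules bring a 1/32-scale map back to 1/4 scale.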

in the step S23, the final target detection model further includes a first output module, a second output module, and a third output module connected to the pyramid feature fusion module, wherein the first output module is used for defect classification, the second output module is used for predicting the width and height of the defect target detection box, and the third output module is used for predicting the offsets of the center point coordinates of the defect target in the x and y directions;
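One way the three outputs could be combined into a detection box is sketched below, following the usual CenterNet-style decoding. The claims name only the three outputs; the decoding rule, the function name `decode_peak`, the data layout (nested lists indexed as `[y][x]`), and the output stride of 4 are assumptions for illustration.

```python
def decode_peak(heatmap, wh, offset, stride=4):
    """Decode the highest-scoring center point of one class into a box.

    heatmap[y][x] is the class score, wh[y][x] is (width, height) and
    offset[y][x] is (dx, dy), all on the output feature map grid.
    Returns (x1, y1, x2, y2, score) in input-image pixels.
    """
    score, x, y = max(
        (value, col, row)
        for row, line in enumerate(heatmap)
        for col, value in enumerate(line)
    )
    dx, dy = offset[y][x]
    w, h = wh[y][x]
    # Refine the integer grid position with the predicted sub-pixel offset,
    # then map from the output grid back to input-image pixels.
    cx, cy = (x + dx) * stride, (y + dy) * stride
    bw, bh = w * stride, h * stride
    return (cx - bw / 2, cy - bh / 2, cx + bw / 2, cy + bh / 2, score)


# Toy 3 x 3 output maps with a single peak at x=1, y=2.
heatmap = [[0.1, 0.2, 0.1], [0.0, 0.3, 0.1], [0.2, 0.9, 0.1]]
wh = [[(2.0, 1.0)] * 3 for _ in range(3)]
offset = [[(0.5, 0.25)] * 3 for _ in range(3)]
print(decode_peak(heatmap, wh, offset))
```

A full detector would keep several local maxima per class rather than a single peak; the sketch keeps one to show how the classification, size and offset outputs relate.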

s3: initializing parameters of a backbone network structure part in a target detection model by adopting parameters trained on an ImageNet data set, and training the target detection model by using a training sample;

s4: testing the test sample by using the trained target detection model, and outputting a detection result;

the step S4 includes the steps of:

s41: enlarging and flipping the test sample to obtain an enhanced test sample, and inputting the enhanced test sample and the original test sample into the trained target detection model;

s42: obtaining the detection results of the enhanced test sample and of the original test sample respectively, and taking the average of the two detection results as the final detection result.
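Steps S41 and S42 describe a form of test-time augmentation. The sketch below illustrates the flipping part only and averages per-pixel score maps; the enlargement step is omitted, and the function names, the choice of averaging score maps rather than decoded boxes, and the toy `model` callable are all assumptions for illustration.

```python
def hflip(grid):
    """Flip a 2D grid (list of rows) horizontally."""
    return [row[::-1] for row in grid]


def average_with_flip(model, image):
    """Average the model output on an image and on its horizontal flip.

    `model` is any callable mapping a 2D grid to a 2D score map of the
    same shape. The output for the flipped image is flipped back so that
    both score maps are aligned before averaging.
    """
    original = model(image)
    flipped_back = hflip(model(hflip(image)))
    return [
        [(a + b) / 2 for a, b in zip(row_a, row_b)]
        for row_a, row_b in zip(original, flipped_back)
    ]


def toy_model(image):
    """Hypothetical stand-in for a trained detector; its output depends on position."""
    return [[value * (x + 1) for x, value in enumerate(row)] for row in image]


print(average_with_flip(toy_model, [[1, 2], [3, 4]]))
```

Flipping the second output back before averaging matters: without it, the two score maps would refer to mirrored positions and the average would blur the defect locations.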

2. The steel product surface defect detection method with balanced speed and precision according to claim 1, characterized in that: in the step S1, the database is the NEU surface defect database, from which images of six typical surface defects of hot rolled strip steel and the position coordinate information of the defects in each image are obtained, wherein the six typical surface defects are crazing, rolled-in scale, pitted surface, patches, inclusions and scratches.

3. The steel product surface defect detection method with balanced speed and precision according to claim 1, characterized in that: the step S3 includes the steps of:

s31: the ResNet-18 backbone network structure in ResNet18-dcn, which is the encoding network structure, is initialized with the parameters trained on the ImageNet data set; the up-sampling layers in the decoding network structure and in the pyramid feature fusion module of ResNet18-dcn are initialized by linear interpolation; and the deformable convolution layers are initialized by Xavier Gaussian initialization;