CN113487551B - Gasket detection method and device for improving dense target performance based on deep learning - Google Patents


Info

Publication number
CN113487551B
CN113487551B (application CN202110732858.7A)
Authority
CN
China
Prior art keywords
layer
gasket
convolution
detection
feature map
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202110732858.7A
Other languages
Chinese (zh)
Other versions
CN113487551A (en)
Inventor
黄坤山
李霁峰
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Foshan Nanhai Guangdong Technology University CNC Equipment Cooperative Innovation Institute
Original Assignee
Foshan Nanhai Guangdong Technology University CNC Equipment Cooperative Innovation Institute
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Foshan Nanhai Guangdong Technology University CNC Equipment Cooperative Innovation Institute filed Critical Foshan Nanhai Guangdong Technology University CNC Equipment Cooperative Innovation Institute
Priority to CN202110732858.7A priority Critical patent/CN113487551B/en
Publication of CN113487551A publication Critical patent/CN113487551A/en
Application granted granted Critical
Publication of CN113487551B publication Critical patent/CN113487551B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical


Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • G06T7/0002Inspection of images, e.g. flaw detection
    • G06T7/0004Industrial image inspection
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/20Special algorithmic details
    • G06T2207/20081Training; Learning
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/20Special algorithmic details
    • G06T2207/20092Interactive image processing based on input by user
    • G06T2207/20104Interactive definition of region of interest [ROI]
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/20Special algorithmic details
    • G06T2207/20212Image combination
    • G06T2207/20221Image fusion; Image merging

Abstract

The invention discloses a gasket detection method for improving dense-target performance based on deep learning, which comprises the following steps: collecting gasket images on a detection assembly line, preprocessing the gasket images, and feeding the preprocessed images into a trained gasket detection model for detection to obtain gasket detection and positioning results. The gasket detection model comprises a deep feature extraction network Backbone, a feature pyramid network Neck and a detection network Head. The collected gasket image data are organized, cleaned and annotated, and finally made into a training set and a test set. Gasket types and labels are defined according to the manufacturer's requirements for gasket detection, and the model is trained and generalization-tested repeatedly with the training set until a gasket detection model whose performance meets the requirements is obtained. Compared with common visual detection algorithms, the invention performs better on dense-target detection and realizes high-speed, high-precision, non-contact detection and positioning of gaskets.

Description

Gasket detection method and device for improving dense target performance based on deep learning
Technical Field
The invention relates to the field of deep learning, in particular to a gasket detection method and device for improving dense target performance based on deep learning.
Background
A gasket is an indispensable part between a screw and a nut: it increases the contact area, reduces pressure, prevents loosening, and protects both the part and the screw. Factories producing gaskets generally convey them along a production line to each processing area, so the gaskets must be precisely controlled during transport and counted and inspected in a timely manner.
The conventional approach is to have workers count and inspect at both ends of the line, but the gaskets are numerous and small, so this method is inefficient and costly, and worker fatigue over long shifts causes many missed inspections.
Disclosure of Invention
The invention aims to provide a gasket detection method and device for improving the performance of a dense target based on deep learning.
To accomplish this, the invention adopts the following technical scheme:
a gasket detection method for improving dense target performance based on deep learning comprises the following steps:
collecting gasket images on a detection assembly line, preprocessing the gasket images, and sending the preprocessed gasket images into a trained gasket detection model for detection to obtain gasket detection and positioning results;
the gasket detection model comprises a deep feature extraction network Backbone, a feature pyramid network Neck and a detection network Head, wherein:
the deep feature extraction network Backbone has a 19-layer structure, in which the first and second layers are both convolution layers and the third layer is a maximum pooling layer; the fourth to seventh layers, eighth to eleventh layers, twelfth to fifteenth layers and sixteenth to nineteenth layers each form a residual unit, each residual unit comprising two residual blocks, and each residual block being formed by two adjacent convolution layers; within each residual unit, the stride and number of convolution kernels of the first three convolution layers are the same, while the stride and number of convolution kernels of the last layer are twice those of the first three; the feature maps output by the seventh, eleventh, fifteenth and nineteenth layers are denoted C2, C3, C4 and C5 respectively;
the feature pyramid network Neck has a five-layer structure of P2, P3, P4, P5, and P6, respectively, wherein:
the P6 layer receives the feature map output by the P5 layer and performs a convolution operation for feature scaling and channel-number reduction; the P5 layer performs a convolution operation on feature map C5 to reduce its channel number; the P4 layer performs a convolution on feature map C4 to reduce its channel number, performs a convolution on the P5 output for up-sampling, and then fuses the channel-reduced feature map with the up-sampled one; P3 fuses the channel-reduced C3 with the up-sampled P4 output; P2 fuses the channel-reduced C2 with the up-sampled P3 output;
the detection network Head has a first branch structure and a second branch structure, wherein:
the first layer of the first branch is a convolution layer that takes the feature maps output by the Neck network layers P2, P3, P4, P5 and P6 and performs a convolution operation keeping size and channel number unchanged, to stabilize the features; the second layer is a convolution layer that reduces the channel number of the first layer's output; the third layer is a bounding-box generation and prediction layer that generates a prediction box pixel by pixel on the feature map output by the second layer, performs a regression operation, and finally outputs an initial four-dimensional vector of the detected target to establish an ROI (region of interest); the fourth layer is a deconvolution layer that performs a deconvolution operation on the ROI obtained by the previous layer; the fifth layer is a bounding-box generation and prediction layer that performs prediction-box generation and regression again on the feature map output by the fourth layer, yielding a new four-dimensional vector; the sixth layer is a bounding-box correction layer that corrects the initial four-dimensional vector from the third layer and the new four-dimensional vector from the fifth layer to obtain an accurate bounding box;
the second branch feeds the feature map output by the second layer of the first branch, together with the accurate bounding box obtained by the sixth layer of the first branch, into a classification layer, which identifies the targets inside the bounding boxes.
Further, the convolution kernel size of each of the first and second layers of the Backbone network is 3×3×64, with stride 1 in the first layer and stride 2 in the second; the pooling kernel of the third layer is 3×3×64 with stride 1; the fourth to sixth convolution layers have stride 1 and 64 kernels, while the seventh layer has stride 2 and the kernel count rises to 128; the eighth to tenth layers have stride 1 and 128 kernels, while the eleventh layer has stride 2 and the kernel count rises to 256; the twelfth to fourteenth layers have stride 1 and 256 kernels, while the fifteenth layer has stride 2 and the kernel count rises to 512; the sixteenth to eighteenth layers have stride 1 and 512 kernels, while the nineteenth layer has stride 2 and the kernel count rises to 1024.
Further, the convolution kernel in the P6 layer of the Neck network is 1×1 with stride 2 and 128 kernels; in the P5 layer it is 1×1 with stride 1 and 128 kernels; the P4 layer has two convolution layers: the first kernel is 1×1 with stride 1 and 128 kernels, and the second is 1×1 with stride 1/2 (i.e., 2× up-sampling) and 128 kernels.
Further, in the Head network, the convolution kernel of the first layer is 3×3, and that of the second layer is 1×1 with 36 kernels.
Further, the initial four-dimensional vector from the third layer and the new four-dimensional vector from the fifth layer are corrected to obtain the accurate bounding box, where:
Obox={x,y,w,h}
Tbox={x′,y′,w′,h′}
where Obox denotes the initial four-dimensional vector, Tbox the new four-dimensional vector, and Nbox the final bounding-box positioning result.
Further, the training process of the gasket detection model is as follows:
step 1, collecting a large number of gasket pictures, and ensuring the quantity balance of various types of gaskets in all gasket pictures;
step 2, preprocessing the acquired gasket pictures, including arrangement, cleaning and labeling;
step 3, manufacturing a training set and a testing set for the preprocessed gasket picture; wherein the training set does not coincide with the test set;
and 4, training the established gasket detection model by using a training set, performing generalization test on the trained gasket detection model by using a testing set, detecting performance indexes of the gasket detection model according to a testing result, and obtaining the trained gasket detection model after the performance indexes meet design requirements.
Further, the preprocessing of the collected gasket picture specifically includes:
Arrangement adjusts the size and orientation of the gasket pictures so that they are unified into the same format; cleaning defines the labels of the various gasket types according to the producer's definitions; annotation marks the corresponding type label on each gasket with an annotation tool and generates an annotation file as a training positive sample, while non-gasket regions are classed as training negative samples.
A deep learning-based gasket detection device that promotes dense target performance, comprising:
the image acquisition module is used for acquiring and detecting gasket images on the assembly line and preprocessing the gasket images;
the identification detection module is used for sending the preprocessed gasket image into a trained gasket detection model for detection to obtain a gasket detection and positioning result; the gasket detection model comprises a deep feature extraction network Backbone, a feature pyramid network Neck and a detection network Head.
A terminal device is installed on a gasket detection pipeline and comprises a memory, a processor and a computer program which is stored in the memory and can run on the processor, wherein the processor realizes the steps of the gasket detection method based on deep learning to improve dense target performance when executing the computer program.
A computer-readable storage medium storing a computer program which, when executed by a processor, performs the steps of the aforementioned deep learning-based gasket detection method for improving dense target performance.
Compared with the prior art, the invention has the following technical characteristics:
The invention learns from the production-object gaskets and can finally detect and position the gaskets on the production line. Experiments show that the method not only reduces the detection workload but also performs better on dense-target detection than common visual detection algorithms, thereby realizing high-speed, high-precision, non-contact detection and positioning of gaskets.
Drawings
FIG. 1 is a schematic diagram of a gasket detection model;
FIG. 2 is a flowchart of a training process for a pad inspection model;
FIG. 3 is an image of a target pad to be inspected in one embodiment of the invention;
FIG. 4 is a graph showing the output of the object detection pad after model detection in accordance with one embodiment of the present invention.
Detailed Description
Referring to fig. 1 and 2, the method for detecting the gasket based on deep learning to improve the dense target performance comprises the following steps:
collecting gasket images on a detection assembly line, preprocessing the gasket images, and feeding the preprocessed images into a trained gasket detection model for detection to obtain gasket detection and positioning results; the gasket detection model comprises a deep feature extraction network Backbone, a feature pyramid network Neck and a detection network Head, wherein:
1. Deep feature extraction network Backbone
The Backbone network has a 19-layer structure, specifically:
The first and second layers are both convolution layers, processed in series to suit this project's small-target input; each convolution kernel is 3×3×64 (height × width × channels), since the reduced kernel size allows more small-target features to be captured. The stride of the first layer is 1 and that of the second layer is 2, so a 224×224×3 input gasket image passes through these two layers to give a 112×112×64 output, denoted feature map C1.
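The shape arithmetic above can be checked with the standard convolution output-size formula; this is an illustrative sketch (the helper name and padding of 1 are our assumptions, not stated in the patent):

```python
# Hypothetical sketch: output-size arithmetic for the first two 3x3
# convolution layers, assuming padding of 1 ("same"-style padding).
def conv_out(size: int, kernel: int = 3, stride: int = 1, pad: int = 1) -> int:
    """Standard convolution output-size formula."""
    return (size + 2 * pad - kernel) // stride + 1

h = 224                      # input gasket image is 224x224x3
h = conv_out(h, stride=1)    # layer 1: 3x3 conv, stride 1 -> 224
h = conv_out(h, stride=2)    # layer 2: 3x3 conv, stride 2 -> 112
print(h)                     # 112, matching the stated 112x112x64 output
```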
The third layer is a maximum pooling layer, which reinforces the feature map C1 obtained from the first two convolutions so that features are not lost in the subsequent repeated convolutions; its pooling kernel is 3×3×64 with stride 1, so the output remains 112×112×64.
The fourth to seventh layers adopt small blocks designed on the residual principle, with every two convolution layers forming one residual block; the convolution kernels of the two layers in each block are 1×1×64 and 3×3×64 respectively. The fourth to sixth layers have stride 1 and 64 kernels, while the seventh layer has stride 2 and the kernel count rises to 128. The third layer's output is 112×112×64; after the two residual blocks, the seventh layer outputs 56×56×128, denoted C2.
The eighth to eleventh layers likewise adopt residual blocks of two convolution layers each, with kernels of 1×1×128 and 3×3×128 respectively. The eighth to tenth layers have stride 1 and 128 kernels, while the eleventh layer has stride 2 and the kernel count rises to 256. The seventh layer's output is 56×56×128; after the two residual blocks, the eleventh layer outputs 28×28×256, denoted C3.
The twelfth to fifteenth layers likewise adopt residual blocks of two convolution layers each, with kernels of 1×1×256 and 3×3×256 respectively. The twelfth to fourteenth layers have stride 1 and 256 kernels, while the fifteenth layer has stride 2 and the kernel count rises to 512. The eleventh layer's output is 28×28×256; after the two residual blocks, the fifteenth layer outputs 14×14×512, denoted C4.
The sixteenth to nineteenth layers likewise adopt residual blocks of two convolution layers each, with kernels of 1×1×512 and 3×3×512 respectively. The sixteenth to eighteenth layers have stride 1 and 512 kernels, while the nineteenth layer has stride 2 and the kernel count rises to 1024. The fifteenth layer's output is 14×14×512; after the two residual blocks, the nineteenth layer outputs 7×7×1024, denoted C5.
This completes the Backbone feature extraction design. The final feature-map outputs are C2 at 56×56×128, C3 at 28×28×256, C4 at 14×14×512 and C5 at 7×7×1024, which serve as inputs to the subsequent feature pyramid network Neck.
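The C2 to C5 shapes follow a simple pattern from the layer description above: each residual unit halves the spatial size (one stride-2 layer) and doubles the channel count. A hedged sketch of that bookkeeping (it mirrors the stated numbers only, and is not an implementation of the network):

```python
# Trace the Backbone feature-map shapes: starting from the 112x112x64
# output of the pooling layer, each of the 4 residual units halves the
# spatial size and doubles the channels.
def backbone_shapes(size=112, channels=64, units=4):
    shapes = []
    for _ in range(units):
        size //= 2          # the unit's last conv has stride 2
        channels *= 2       # kernel count doubles
        shapes.append((size, size, channels))
    return shapes

print(backbone_shapes())
# [(56, 56, 128), (28, 28, 256), (14, 14, 512), (7, 7, 1024)] -> C2..C5
```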
2. Feature pyramid network Neck
The Neck network adopts a feature pyramid structure and mainly fuses the multi-scale features output by the Backbone network to improve feature recognition at various scales. To suit the small, densely packed targets of gasket detection, this application uses an FPN with a 5-layer structure: the first four layers P2, P3, P4 and P5 correspond to the Backbone outputs C2, C3, C4 and C5 respectively, while the fifth layer P6 applies a further 1×1 convolution with stride 2 to C5, shrinking the feature map again to capture even smaller targets. Five output feature maps are finally obtained, all unified to 128 channels, with sizes 56×56, 28×28, 14×14, 7×7 and 3×3 respectively.
The P6 layer takes the 7×7×1024 feature map forwarded identically through P5 (i.e., C5) and applies 1×1 convolution kernels with stride 2 and 128 kernels, performing both feature scaling and channel-number reduction; the P6 output is 3×3×128.
The P5 layer takes the 7×7×1024 feature map C5 and applies a 1×1 convolution with stride 1 and 128 kernels, performing only channel-number reduction; P5 outputs 7×7×128.
The P4 layer takes the 14×14×512 feature map C4 and the 7×7×128 feature map output by P5. P4 first applies to C4 a 1×1 convolution with stride 1 and 128 kernels, completing the channel-number reduction and yielding a 14×14×128 feature map; it then applies to the P5 output a 1×1 convolution with stride 1/2 and 128 kernels, completing the up-sampling and yielding a 14×14×128 feature map; finally P4 fuses the two feature maps, giving the final 14×14×128 feature map.
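The P4 fusion only works because the two branches end up with identical shapes; a minimal sketch of that shape check (the helper names and the (H, W, C) tuple convention are our assumptions, not the patent's code):

```python
# Shape bookkeeping for the P4 fusion: a channel-reduced C4 and a
# 2x-upsampled P5 must agree before element-wise fusion.
def reduce_channels(shape, c=128):
    """1x1 convolution with stride 1 keeps H, W and sets channels to c."""
    h, w, _ = shape
    return (h, w, c)

def upsample2x(shape):
    """Stride-1/2 convolution doubles the spatial size."""
    h, w, c = shape
    return (2 * h, 2 * w, c)

c4, p5 = (14, 14, 512), (7, 7, 128)
a = reduce_channels(c4)   # -> (14, 14, 128)
b = upsample2x(p5)        # -> (14, 14, 128)
assert a == b == (14, 14, 128)   # fusion is now well-defined
```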
The processing mode of P3 and P2 is consistent with P4:
that is, P3 fuses the feature map after the channel down processing of C3 and the feature map after the up sampling processing of the output of P4, and obtains a feature map with a final output size of 28×28×128.
And P2 fuses the channel-reduced C2 with the up-sampled P3 output, obtaining a feature map with final output size 56×56×128.
Thus, the Neck network design ends. In the final output, the feature size of P2 is 56×56×128, the feature size of P3 is 28×28×128, the feature size of P4 is 14×14×128, the feature size of P5 is 7×7×128, and the feature size of P6 is 3×3×128.
3. Detecting network Head
To improve dense-detection performance, the Head network in this scheme is split into two branches, responsible respectively for accurate class-confidence detection and accurate bounding-box localization on the feature maps output from P2 to P6.
The structure of the first branch is as follows:
the first layer is a convolution layer, and Neck output is obtained and subjected to 3*3 convolution operation; taking P5 as an example, the feature map of 7×7×128 output by P5, the size and the channel number are all kept unchanged, and this operation is mainly to keep the feature stable and strengthen the effect of the Neck network fusion.
The second layer is a convolution layer, which reduces the channel number of the first layer's output in preparation for generating bounding boxes on the feature map. It uses 1×1 convolution kernels with 36 kernels; the input is 7×7×128 and the output is 7×7×36.
The third layer is a bounding-box generation and prediction layer, which generates a prediction box pixel by pixel on the 7×7×36 feature map output by the second layer, performs a regression operation, and finally outputs an initial four-dimensional vector (x, y, w, h) of the detected object to establish the ROI region.
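A minimal sketch of what "pixel-by-pixel" generation could look like, shown purely as our interpretation (the seeding stride, box size and function name are assumptions; regression would then refine these seeds into the initial ROIs):

```python
# Seed one candidate box (x, y, w, h) at every cell of a 7x7 feature map.
def seed_boxes(feat_h=7, feat_w=7, stride=32, box=32):
    boxes = []
    for i in range(feat_h):
        for j in range(feat_w):
            # project the cell centre back onto the input image
            x, y = j * stride + stride // 2, i * stride + stride // 2
            boxes.append((x, y, box, box))   # centre-x, centre-y, w, h
    return boxes

print(len(seed_boxes()))   # 49 candidate boxes for a 7x7 feature map
```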
The fourth layer is a deconvolution layer (deconv), which performs a deconvolution operation on the ROI (region of interest) obtained by the previous layer; that is, with the target position locked, the feature map is restored to its pre-convolution state, giving the target's extent in lower-level semantics.
The fifth layer is a bounding-box generation and prediction layer with the same structure as the third layer; it performs prediction-box generation and regression again on the deconvolved feature map, yielding a new four-dimensional vector (x′, y′, w′, h′).
The sixth layer is a bounding-box correction layer, which compares and corrects the four-dimensional vectors obtained by the third and fifth layers to finally obtain an accurate bounding box, calculated as follows:
Obox={x,y,w,h}
Tbox={x′,y′,w′,h′}
where Obox denotes the target's initial four-dimensional vector, Tbox the new four-dimensional vector, and Nbox the finally obtained bounding-box positioning result.
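The text does not spell out the formula that combines Obox and Tbox into Nbox. Purely as an illustrative assumption (not the claimed method), a simple element-wise average of the two vectors would look like this:

```python
# Hypothetical correction step: average the initial and re-predicted
# four-dimensional vectors. The actual correction formula is not given
# in the text; this is only a placeholder illustration.
def correct_box(obox, tbox):
    return tuple((o + t) / 2 for o, t in zip(obox, tbox))

obox = (10.0, 12.0, 30.0, 30.0)   # initial (x, y, w, h)
tbox = (11.0, 11.0, 32.0, 28.0)   # re-predicted (x', y', w', h')
print(correct_box(obox, tbox))    # (10.5, 11.5, 31.0, 29.0)
```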
The second branch feeds the 7×7×36 feature map output by the second layer of the first branch, together with the accurate bounding box obtained by the sixth layer of the first branch, into the classification layer, which identifies the target inside the bounding box.
The feature maps at different scales output by P2 to P6 in the Neck network are each fed into the Head network for processing; if no target is detected when a feature map is processed, no bounding box is output for it, otherwise the bounding box and the identified target are output.
For the gasket detection model, the training process of the invention is as follows:
step 1, collecting a large number of gasket pictures, and ensuring the quantity balance of various types of gaskets in all gasket pictures;
step 2, preprocessing the acquired gasket pictures, including arrangement, cleaning and labeling;
the size direction of the gasket pictures is adjusted, and the gasket pictures can be unified into the same format, so that the gasket pictures can be placed into a gasket detection model for training and testing. The cleaning is to define the labels of various types of gaskets according to the definition of the gasket by the producer; labeling is to label the corresponding type label in the gasket by using a labeling tool, and generate a labeling file as a training positive sample, and classify the non-gasket part as a training negative sample.
Step 3, manufacturing a training set and a testing set for the preprocessed gasket picture; wherein the training set does not coincide with the test set.
Step 4: train the established gasket detection model with the training set, run generalization tests on it with the test set, check performance indicators such as accuracy against the test results, and once the indicators meet the design requirements, take the trained gasket detection model and apply it to the detection assembly line of gasket production.
While checking the performance indicators, the classification and recognition accuracy of the gaskets is counted, and the false-detection rate and accuracy are calculated separately; if the performance does not meet the standard, steps 1 to 4 are repeated, continually increasing the number and variety of gaskets to enrich the detection algorithm's coverage of gaskets, while adjusting the ratio of positive to negative samples to strengthen its rejection of negative samples.
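The indicators named above can be computed from simple counts; a minimal sketch (the exact definitions the authors use are not stated, so the false-detection-rate definition below is our assumption):

```python
# Accuracy and false-detection rate from confusion-matrix counts.
def accuracy(tp, tn, fp, fn):
    """Fraction of all samples classified correctly."""
    return (tp + tn) / (tp + tn + fp + fn)

def false_detection_rate(fp, tp):
    """Assumed definition: fraction of predicted gaskets that are not gaskets."""
    return fp / (tp + fp)

# e.g. 90 true positives, 5 true negatives, 3 false positives, 2 false negatives
print(accuracy(90, 5, 3, 2))          # 0.95
print(false_detection_rate(10, 90))   # 0.1
```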
Another aspect of the present invention further provides a spacer detection apparatus for improving dense target performance based on deep learning, including:
the image acquisition module is used for acquiring and detecting gasket images on the assembly line and preprocessing the gasket images;
the identification detection module is used for sending the preprocessed gasket image into a trained gasket detection model for detection to obtain a gasket detection and positioning result; the gasket detection model comprises a deep feature extraction network Backbone, a feature pyramid network Neck and a detection network Head.
It should be noted that the specific structures of the deep feature extraction network Backbone, the feature pyramid network Neck and the detection network Head are explained in the corresponding steps of the foregoing method embodiments and are not repeated here.
The embodiment of the application further provides a terminal device, which can be a computer or a server; the method comprises a memory, a processor and a computer program stored in the memory and capable of running on the processor, wherein the processor realizes the steps of the gasket detection method for improving the dense target performance based on deep learning when executing the computer program.
A computer program may also be split into one or more modules/units that are stored in memory and executed by a processor to complete the present application. One or more modules/units may be a series of instruction segments of a computer program capable of performing a specific function, where the instruction segments are used to describe an execution process of the computer program in the terminal device, for example, the computer program may be divided into a picture acquisition module and an identification detection module, and the functions of each module are referred to in the foregoing apparatus and are not described herein.
Embodiments of the present application further provide a computer-readable storage medium storing a computer program which, when executed by a processor, performs the steps of the deep learning-based gasket detection method for improving dense target performance.
The integrated modules/units, if implemented in the form of software functional units and sold or used as stand-alone products, may be stored in a computer-readable storage medium. Based on this understanding, the present application may implement all or part of the flow of the above method embodiments by instructing the relevant hardware through a computer program, which may be stored in a computer-readable storage medium and, when executed by a processor, implements the steps of each method embodiment described above. The computer program comprises computer program code, which may be in the form of source code, object code, an executable file, some intermediate form, etc. The computer-readable medium may include: any entity or device capable of carrying computer program code, a recording medium, a USB flash drive, a removable hard disk, a magnetic disk, an optical disk, a computer memory, a Read-Only Memory (ROM), a Random Access Memory (RAM), an electrical carrier signal, a telecommunications signal, a software distribution medium, and so forth. It should be noted that the content of the computer-readable medium may be appropriately increased or decreased according to the legislation and patent practice of the jurisdiction; for example, in some jurisdictions, the computer-readable medium does not include electrical carrier signals and telecommunications signals.
The above embodiments are intended only to illustrate the technical solution of the present application, not to limit it. Although the present application has been described in detail with reference to the foregoing embodiments, those of ordinary skill in the art should understand that the technical solutions described in the foregoing embodiments may still be modified, or some of their technical features may be replaced by equivalents; such modifications and substitutions do not depart from the spirit and scope of the technical solutions of the embodiments of the present application and are intended to be included within the scope of the present application.

Claims (10)

1. A gasket detection method for improving dense target performance based on deep learning, characterized by comprising the following steps:
collecting gasket images on a detection assembly line, preprocessing the gasket images, and sending the preprocessed gasket images into a trained gasket detection model for detection to obtain gasket detection and positioning results;
the gasket detection model comprises a deep feature extraction network Backbone, a feature pyramid network Neck and a detection network Head, wherein:
the deep feature extraction network Backbone has a 19-layer network structure, wherein the first layer and the second layer are both convolution layers and the third layer is a maximum pooling layer; the fourth to seventh layers, the eighth to eleventh layers, the twelfth to fifteenth layers and the sixteenth to nineteenth layers each form a residual unit, each residual unit comprises two residual blocks, and each residual block is formed by two adjacent convolution layers; in each residual unit, the first three convolution layers share the same step size and number of convolution kernels, while the last convolution layer has twice the step size and twice the number of convolution kernels of the first three; the feature maps output by the seventh, eleventh, fifteenth and nineteenth layers are denoted C2, C3, C4 and C5, respectively;
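For illustration only (not part of the claimed subject matter), the Backbone layer schedule recited above can be traced numerically; the 512×512 input resolution and "same" padding (output spatial size = ceil(size / stride)) are assumptions introduced here for the trace:

```python
import math

# Illustrative sketch of the 19-layer Backbone recited in claims 1 and 2.
# The 512x512 input resolution and "same" padding are assumptions.
LAYERS = [
    ("conv", 1, 64), ("conv", 2, 64), ("maxpool", 1, 64),   # layers 1-3
    ("conv", 1, 64), ("conv", 1, 64), ("conv", 1, 64),      # residual unit 1
    ("conv", 2, 128),
    ("conv", 1, 128), ("conv", 1, 128), ("conv", 1, 128),   # residual unit 2
    ("conv", 2, 256),
    ("conv", 1, 256), ("conv", 1, 256), ("conv", 1, 256),   # residual unit 3
    ("conv", 2, 512),
    ("conv", 1, 512), ("conv", 1, 512), ("conv", 1, 512),   # residual unit 4
    ("conv", 2, 1024),
]
TAPS = {7: "C2", 11: "C3", 15: "C4", 19: "C5"}  # layers whose output is kept

def backbone_shapes(h, w):
    """Trace (height, width, channels) through the 19 layers."""
    feats = {}
    for idx, (_, stride, channels) in enumerate(LAYERS, start=1):
        h, w = math.ceil(h / stride), math.ceil(w / stride)
        if idx in TAPS:
            feats[TAPS[idx]] = (h, w, channels)
    return feats

shapes = backbone_shapes(512, 512)
for name in ("C2", "C3", "C4", "C5"):
    print(name, shapes[name])
```

Each residual unit ends with a stride-2, double-width convolution, so C2 through C5 halve in spatial size while doubling in channels, which is what the Neck below relies on.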
the feature pyramid network Neck has a five-layer structure of P2, P3, P4, P5, and P6, respectively, wherein:
the P6 layer receives the feature map output by the P5 layer and performs a convolution operation to scale the features and reduce the number of channels; the P5 layer performs a convolution operation on feature map C5 to reduce its number of channels; the P4 layer performs a convolution operation on feature map C4 to reduce its number of channels, performs a convolution operation on the feature map output by P5 to up-sample it, and then fuses the channel-reduced feature map with the up-sampled feature map; P3 fuses the channel-reduced feature map of C3 with the up-sampled output of P4; P2 fuses the channel-reduced feature map of C2 with the up-sampled output of P3;
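For illustration only, the top-down fusion recited above can be sketched with NumPy, modelling each 1×1 convolution as a per-pixel linear projection, up-sampling as nearest-neighbour repetition, and P6's stride-2 convolution as subsampling plus projection; the 128-channel width follows claim 3, while the random weights and input spatial sizes are assumptions:

```python
import numpy as np

# Illustrative sketch of the Neck's top-down feature fusion.
rng = np.random.default_rng(0)

def project(feat, out_ch=128):
    """Per-pixel linear map over channels (stands in for a 1x1 conv)."""
    w = rng.standard_normal((feat.shape[-1], out_ch)) * 0.01
    return feat @ w

def upsample2x(feat):
    """Nearest-neighbour 2x spatial up-sampling."""
    return feat.repeat(2, axis=0).repeat(2, axis=1)

# Backbone outputs C2..C5 with the channel counts of claims 1-2.
sizes = {"C2": (128, 128), "C3": (64, 256), "C4": (32, 512), "C5": (16, 1024)}
C = {k: rng.standard_normal((s, s, ch)) for k, (s, ch) in sizes.items()}

P5 = project(C["C5"])                    # channel reduction only
P4 = project(C["C4"]) + upsample2x(P5)   # lateral path + top-down path
P3 = project(C["C3"]) + upsample2x(P4)
P2 = project(C["C2"]) + upsample2x(P3)
P6 = project(P5[::2, ::2])               # stride-2 conv: halve size, project
print([p.shape for p in (P2, P3, P4, P5, P6)])
```

The fusion is element-wise addition here; since every level is reduced to the same channel count first, the lateral and up-sampled maps always align in shape.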
the detection network Head has a first branch structure and a second branch structure, wherein:
the first layer of the first branch is a convolution layer that takes the feature maps output by the Neck layers P2, P3, P4, P5 and P6 and performs a convolution operation, keeping the size and number of channels unchanged to stabilize the features; the second layer is a convolution layer used to reduce the number of channels of the first layer's output; the third layer is a bounding box generation and prediction layer used to generate a prediction box pixel by pixel on the feature map output by the second layer, perform a regression operation, and output an initial four-dimensional vector for each detected target in the feature map to establish a region of interest (ROI); the fourth layer is a deconvolution layer used to perform a deconvolution operation on the ROI obtained by the previous layer; the fifth layer is a bounding box generation and prediction layer used to perform prediction box generation and regression again on the feature map output by the fourth layer, yielding a new four-dimensional vector; the sixth layer is a bounding box correction layer used to correct the initial and new four-dimensional vectors obtained by the third and fifth layers to obtain an accurate bounding box;
the second branch takes the feature map output by the second layer of the first branch and the accurate bounding box obtained by the sixth layer of the first branch as input to the classification layer, and identifies the targets within the bounding box.
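For illustration only, the first branch's two-pass box refinement can be sketched as follows. The patent's exact correction formula combining Obox and Tbox is not reproduced in this text, so the `fuse()` below uses a plain element-wise average purely as a hypothetical stand-in:

```python
import numpy as np

# Illustrative sketch of the Head's pixel-wise box generation, two
# regression passes, and a hypothetical correction step.
rng = np.random.default_rng(0)

def generate_boxes(feat_h, feat_w, stride):
    """One prediction box per feature-map pixel: (cx, cy, w, h)."""
    ys, xs = np.mgrid[0:feat_h, 0:feat_w]
    cx, cy = (xs + 0.5) * stride, (ys + 0.5) * stride
    wh = np.full_like(cx, float(stride), dtype=float)
    return np.stack([cx, cy, wh, wh], axis=-1).reshape(-1, 4)

def regress(boxes, offsets):
    """Regression step: shift centres, scale widths/heights."""
    out = boxes.astype(float).copy()
    out[:, :2] += offsets[:, :2]
    out[:, 2:] *= np.exp(offsets[:, 2:])
    return out

def fuse(obox, tbox):
    """Hypothetical correction layer: average the two passes."""
    return (obox + tbox) / 2.0

anchors = generate_boxes(4, 4, stride=8)              # pixel-wise boxes
obox = regress(anchors, rng.normal(0, 0.1, (16, 4)))  # third layer: Obox
tbox = regress(obox, rng.normal(0, 0.1, (16, 4)))     # fifth layer: Tbox
nbox = fuse(obox, tbox)                               # sixth layer: Nbox
print(nbox.shape)
```

Running the regression twice, the second time on deconvolved ROI features, is what the claim relies on for dense targets: the second pass sees a higher-resolution view of each candidate region.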
2. The gasket detection method for improving dense target performance based on deep learning according to claim 1, wherein the convolution kernel size of each of the first and second layers of the Backbone network is 3×3×64, the step size of the first layer being 1 and that of the second layer being 2; the pooling kernel size of the third layer is 3×3×64 with a step size of 1; the convolution layers of the fourth to sixth layers have a step size of 1 and 64 convolution kernels, while the seventh convolution layer has a step size of 2 and the number of kernels rises to 128; the eighth to tenth layers have a step size of 1 and 128 kernels, while the eleventh layer has a step size of 2 and the number of kernels rises to 256; the twelfth to fourteenth layers have a step size of 1 and 256 kernels, while the fifteenth layer has a step size of 2 and the number of kernels rises to 512; the sixteenth to eighteenth layers have a step size of 1 and 512 kernels, while the nineteenth layer has a step size of 2 and the number of kernels rises to 1024.
3. The gasket detection method for improving dense target performance based on deep learning according to claim 1, wherein the convolution kernel size in the P6 layer of the Neck network is 1×1 with a step size of 2 and 128 convolution kernels; the convolution kernel size in the P5 layer is 1×1 with a step size of 1 and 128 convolution kernels; the P4 layer has two convolution layers, the first with a 1×1 kernel, a step size of 1 and 128 kernels, and the second with a 1×1 kernel, a step size of 1/2 (i.e., 2× up-sampling) and 128 kernels.
4. The gasket detection method for improving dense target performance based on deep learning according to claim 1, wherein the convolution kernel size of the first layer in the Head network is 3×3, and that of the second layer is 1×1 with 36 kernels.
5. The gasket detection method for improving dense target performance based on deep learning according to claim 1, wherein the correction of the initial and new four-dimensional vectors obtained by the third and fifth layers to obtain the accurate bounding box is calculated as follows:
where Obox denotes the initial four-dimensional vector, Tbox denotes the new four-dimensional vector, and Nbox denotes the final positioning result of the bounding box.
6. The gasket detection method for improving dense target performance based on deep learning according to claim 1, wherein the training process of the gasket detection model is as follows:
step 1, collecting a large number of gasket pictures, and ensuring the quantity balance of various types of gaskets in all gasket pictures;
step 2, preprocessing the acquired gasket pictures, including arrangement, cleaning and labeling;
step 3, dividing the preprocessed gasket pictures into a training set and a testing set, wherein the training set does not overlap with the testing set;
step 4, training the established gasket detection model with the training set, performing a generalization test on the trained gasket detection model with the testing set, checking the model's performance indicators against the test results, and obtaining the trained gasket detection model once the performance indicators meet the design requirements.
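For illustration only, step 3's requirement that the two sets be disjoint can be sketched as below; the 8:2 split ratio and the file names are assumptions not recited in the claims:

```python
import random

# Illustrative sketch of a disjoint train/test split (claim 6, step 3).
def split_dataset(paths, train_frac=0.8, seed=0):
    """Shuffle the picture list and cut it into two non-overlapping sets."""
    shuffled = paths[:]
    random.Random(seed).shuffle(shuffled)
    cut = int(len(shuffled) * train_frac)
    return shuffled[:cut], shuffled[cut:]

pictures = [f"gasket_{i:04d}.jpg" for i in range(100)]  # hypothetical names
train_set, test_set = split_dataset(pictures)
assert not set(train_set) & set(test_set)  # the claim requires no overlap
print(len(train_set), len(test_set))
```

Splitting by shuffled file list (rather than by slicing a sorted list) avoids accidentally grouping all pictures of one gasket type into the same set.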
7. The gasket detection method for improving dense target performance based on deep learning according to claim 6, wherein the preprocessing of the collected gasket pictures is specifically as follows:
the arrangement adjusts the size and orientation of the gasket pictures so that all pictures are unified into the same format; the cleaning defines the labels of the various gasket types according to the producer's definition of the gaskets; the labeling marks the corresponding type label on each gasket with a labeling tool and generates an annotation file as a training positive sample, while non-gasket regions are classified as training negative samples.
8. A gasket detection device for improving dense target performance based on deep learning, characterized by comprising:
an image acquisition module, used for acquiring gasket images on a detection assembly line and preprocessing the gasket images;
an identification detection module, used for sending the preprocessed gasket image into a trained gasket detection model for detection to obtain a gasket detection and positioning result;
wherein the gasket detection model comprises a deep feature extraction network Backbone, a feature pyramid network Neck and a detection network Head, wherein:
the deep feature extraction network Backbone has a 19-layer network structure, wherein the first layer and the second layer are both convolution layers and the third layer is a maximum pooling layer; the fourth to seventh layers, the eighth to eleventh layers, the twelfth to fifteenth layers and the sixteenth to nineteenth layers each form a residual unit, each residual unit comprises two residual blocks, and each residual block is formed by two adjacent convolution layers; in each residual unit, the first three convolution layers share the same step size and number of convolution kernels, while the last convolution layer has twice the step size and twice the number of convolution kernels of the first three; the feature maps output by the seventh, eleventh, fifteenth and nineteenth layers are denoted C2, C3, C4 and C5, respectively;
the feature pyramid network Neck has a five-layer structure of P2, P3, P4, P5, and P6, respectively, wherein:
the P6 layer receives the feature map output by the P5 layer and performs a convolution operation to scale the features and reduce the number of channels; the P5 layer performs a convolution operation on feature map C5 to reduce its number of channels; the P4 layer performs a convolution operation on feature map C4 to reduce its number of channels, performs a convolution operation on the feature map output by P5 to up-sample it, and then fuses the channel-reduced feature map with the up-sampled feature map; P3 fuses the channel-reduced feature map of C3 with the up-sampled output of P4; P2 fuses the channel-reduced feature map of C2 with the up-sampled output of P3;
the detection network Head has a first branch structure and a second branch structure, wherein:
the first layer of the first branch is a convolution layer that takes the feature maps output by the Neck layers P2, P3, P4, P5 and P6 and performs a convolution operation, keeping the size and number of channels unchanged to stabilize the features; the second layer is a convolution layer used to reduce the number of channels of the first layer's output; the third layer is a bounding box generation and prediction layer used to generate a prediction box pixel by pixel on the feature map output by the second layer, perform a regression operation, and output an initial four-dimensional vector for each detected target in the feature map to establish a region of interest (ROI); the fourth layer is a deconvolution layer used to perform a deconvolution operation on the ROI obtained by the previous layer; the fifth layer is a bounding box generation and prediction layer used to perform prediction box generation and regression again on the feature map output by the fourth layer, yielding a new four-dimensional vector; the sixth layer is a bounding box correction layer used to correct the initial and new four-dimensional vectors obtained by the third and fifth layers to obtain an accurate bounding box;
the second branch takes the feature map output by the second layer of the first branch and the accurate bounding box obtained by the sixth layer of the first branch as input to the classification layer, and identifies the targets within the bounding box.
9. A terminal device installed on a gasket detection pipeline, characterized by comprising a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein the processor, when executing the computer program, implements the steps of the gasket detection method for improving dense target performance based on deep learning according to any one of claims 1-7.
10. A computer readable storage medium storing a computer program, characterized in that the computer program, when executed by a processor, implements the steps of the gasket detection method for improving dense target performance based on deep learning according to any one of claims 1-7.
CN202110732858.7A 2021-06-30 2021-06-30 Gasket detection method and device for improving dense target performance based on deep learning Active CN113487551B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110732858.7A CN113487551B (en) 2021-06-30 2021-06-30 Gasket detection method and device for improving dense target performance based on deep learning


Publications (2)

Publication Number Publication Date
CN113487551A CN113487551A (en) 2021-10-08
CN113487551B true CN113487551B (en) 2024-01-16

Family

ID=77936821

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110732858.7A Active CN113487551B (en) 2021-06-30 2021-06-30 Gasket detection method and device for improving dense target performance based on deep learning

Country Status (1)

Country Link
CN (1) CN113487551B (en)

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110287849A (en) * 2019-06-20 2019-09-27 北京工业大学 A lightweight deep-network image object detection method suitable for Raspberry Pi
CN110532859A (en) * 2019-07-18 2019-12-03 西安电子科技大学 Remote Sensing Target detection method based on depth evolution beta pruning convolution net
CN111783590A (en) * 2020-06-24 2020-10-16 西北工业大学 Multi-class small target detection method based on metric learning
CN111814750A (en) * 2020-08-14 2020-10-23 深延科技(北京)有限公司 Intelligent garbage classification method and system based on deep learning target detection and image recognition
CN111832513A (en) * 2020-07-21 2020-10-27 西安电子科技大学 Real-time football target detection method based on neural network
CN111898668A (en) * 2020-07-24 2020-11-06 佛山市南海区广工大数控装备协同创新研究院 Small target object detection method based on deep learning
CN112613541A (en) * 2020-12-08 2021-04-06 北京迈格威科技有限公司 Target detection method and device, storage medium and electronic equipment


Non-Patent Citations (4)

* Cited by examiner, † Cited by third party
Title
A-DFPN: Adversarial Learning and Deformation Feature Pyramid Networks for Object Detection; Miao Cheng; IEEE Xplore; full text *
Rethinking Classification and Localization for Object Detection; Yue Wu; arXiv; full text *
Research on Object Detection Algorithms Based on Feature Pyramid Networks; 刘佳甲; CNKI (China National Knowledge Infrastructure); full text *
Mask R-CNN Industrial Surface Defect Detection with an Improved FPN; 王海云; Manufacturing Automation, Vol. 42, No. 12, 2020; full text *


Similar Documents

Publication Publication Date Title
CN111681273B (en) Image segmentation method and device, electronic equipment and readable storage medium
CN111932511B (en) Electronic component quality detection method and system based on deep learning
CN110827312A (en) Learning method based on cooperative visual attention neural network
CN113177957B (en) Cell image segmentation method and device, electronic equipment and storage medium
CN116309612B (en) Semiconductor silicon wafer detection method, device and medium based on frequency decoupling supervision
CN113487551B (en) Gasket detection method and device for improving dense target performance based on deep learning
CN116912674A (en) Target detection method and system based on improved YOLOv5s network model under complex water environment
CN116824135A (en) Atmospheric natural environment test industrial product identification and segmentation method based on machine vision
CN116403062A (en) Point cloud target detection method, system, equipment and medium
CN116051532A (en) Deep learning-based industrial part defect detection method and system and electronic equipment
CN113763384B (en) Defect detection method and defect detection device in industrial quality inspection
CN115995017A (en) Fruit identification and positioning method, device and medium
CN113223037B (en) Unsupervised semantic segmentation method and unsupervised semantic segmentation system for large-scale data
CN115631197A (en) Image processing method, device, medium, equipment and system
CN115734072A (en) Internet of things centralized monitoring method and device for industrial automation equipment
CN114964628A (en) Shuffle self-attention light-weight infrared detection method and system for ammonia gas leakage
CN113487550B (en) Target detection method and device based on improved activation function
CN116109627B (en) Defect detection method, device and medium based on migration learning and small sample learning
CN117576098B (en) Cell division balance evaluation method and device based on segmentation
CN117440104B (en) Data compression reconstruction method based on target significance characteristics
Bian et al. Swin transformer UNet for very high resolution image dehazing
CN112381832A (en) Image semantic segmentation method based on optimized convolutional neural network
CN116702878A (en) Migration learning method for improving unsupervised domain adaptation effect based on countermeasure
CN117474898A (en) Method and device for detecting foreign matters in cans
CN116612284A (en) Small part image segmentation method and related equipment

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant