CN113487550B - Target detection method and device based on improved activation function - Google Patents

Target detection method and device based on improved activation function

Info

Publication number
CN113487550B
CN113487550B
Authority
CN
China
Prior art keywords
layer
convolution
gasket
residual
layers
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202110732476.4A
Other languages
Chinese (zh)
Other versions
CN113487550A (en)
Inventor
黄坤山
李霁峰
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Foshan Nanhai Guangdong Technology University CNC Equipment Cooperative Innovation Institute
Original Assignee
Foshan Nanhai Guangdong Technology University CNC Equipment Cooperative Innovation Institute
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Foshan Nanhai Guangdong Technology University CNC Equipment Cooperative Innovation Institute filed Critical Foshan Nanhai Guangdong Technology University CNC Equipment Cooperative Innovation Institute
Priority to CN202110732476.4A priority Critical patent/CN113487550B/en
Publication of CN113487550A publication Critical patent/CN113487550A/en
Application granted granted Critical
Publication of CN113487550B publication Critical patent/CN113487550B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical


Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • G06T7/0002Inspection of images, e.g. flaw detection
    • G06T7/0004Industrial image inspection
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/20Special algorithmic details
    • G06T2207/20081Training; Learning
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/20Special algorithmic details
    • G06T2207/20212Image combination
    • G06T2207/20221Image fusion; Image merging

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • General Health & Medical Sciences (AREA)
  • General Engineering & Computer Science (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Artificial Intelligence (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Biomedical Technology (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Quality & Reliability (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Image Analysis (AREA)

Abstract

The invention discloses a target detection method and device based on an improved activation function. The method collects a gasket image on a detection line, preprocesses it, and feeds the preprocessed image into a trained target detection model to obtain the gasket's detection and positioning result. The target detection model comprises a deep feature extraction network (Backbone), a feature pyramid network (Neck) and a detection network (Head). Because the training data volume is large, the data types are numerous, and continual version iteration is required, the activation function is improved to obtain better learning performance. The scheme can rapidly detect and position gaskets, which facilitates subsequent robotic-arm sorting and accelerates intelligent gasket classification. At the same time, its stable and reliable performance allows long-term operation, realizing high-speed, high-precision, non-contact gasket detection and positioning, improving efficiency and saving cost.

Description

Target detection method and device based on improved activation function
Technical Field
The invention relates to the field of target detection and neural networks, and in particular to a target detection method and device based on an improved activation function.
Background
Parts produced on factory lines, especially small parts of various kinds, are extremely common, and the workpieces produced must be checked and inspected promptly during production. In current practice, a dedicated inspection line is set up and workers standing on both sides of the line count and inspect the workpieces flowing past. This is a lagging approach: it is inefficient and costly, and as working hours lengthen the workers tire and miss many inspections.
Disclosure of Invention
The invention aims to provide a target detection method and device based on an improved activation function, to solve the problems of low efficiency and high cost in existing inspection methods.
To achieve this aim, the invention adopts the following technical scheme:
A target detection method based on an improved activation function, comprising the following steps:
collecting a gasket image on the detection line, preprocessing it, and feeding the preprocessed image into a trained target detection model for detection to obtain the gasket's detection and positioning result;
the target detection model comprises a deep feature extraction network Backbone, a feature pyramid network Neck and a detection network Head, wherein:
the depth feature extraction network backhaul has a 53-layer network structure, wherein the first layer and the second layer are both convolution layers, and the third layer is a maximum pooling layer; the fourth layer to the twelfth layer, the thirteenth layer to the twenty-fourth layer, the twenty-fifth layer to the thirty-third layer, and the thirty-fourth layer to the fifty-third layer respectively constitute first residual error units to fourth residual error units, wherein: the first residual error unit and the third residual error unit both comprise two residual error blocks, and each residual error block is composed of two convolution layers of adjacent layers; the second residual error unit and the fourth residual error unit respectively comprise three residual error blocks and five residual error blocks, and each residual error block consists of four convolution layers of adjacent layers; in each residual error unit, the step length and the number of convolution kernels of the rest convolution layers are the same except for the last convolution layer, and the step length and the number of the convolution kernels of the last layer are multiples of the step length and the number of the convolution kernels of the first three convolution layers; the feature maps output by the twelfth layer, the twenty-fourth layer, the thirty-third layer and the fifty-third layer are respectively marked as C2, C3, C4 and C5;
the feature pyramid network Neck has a five-layer structure of P2, P3, P4, P5, and P6, respectively, wherein:
the P6 layer receives the feature map output by the P5 layer and carries out convolution operation to carry out feature scaling and channel descent treatment; the P5 layer carries out convolution operation on the feature map C5 so as to carry out channel number reduction processing; the P4 layer carries out convolution operation on the feature map C4 to carry out channel number reduction processing, carries out convolution operation on the feature map output by the P5 to carry out up-sampling processing, and then carries out fusion on the feature map after the channel number reduction processing and the feature map after the up-sampling processing; p3 fuses the feature map after the channel descent processing of C3 and the feature map after the up-sampling processing of the output of P4; p2 fuses the characteristic diagram after the C2 is subjected to channel descent treatment and the characteristic diagram after the P3 output is subjected to up-sampling treatment;
the detection network Head comprises an average pooling layer, a full connection layer and a frame regression and classification layer, wherein:
five average pooling layers are arranged and correspond to five feature graphs output by the Neck networks P2 to P6 respectively; after carrying out average pooling treatment on the corresponding feature graphs by the five average pooling layers respectively, inputting the results of the five average pooling layers into the full-connection layer, and finally entering a frame regression and classification layer to carry out prediction frame generation and regression operation to obtain classification results;
the activation functions used in the deep feature extraction network backbox, the feature pyramid network neg and the detection network Head are expressed as follows:
wherein x represents input data, e is a natural logarithmic base, beta is an adjustable parameter, and the value range is [1, + ] infinity.
Further, the adjustable parameter β in the activation function is obtained either by manual tuning or by adaptive learning.
Further, the convolution kernel size of each of the first and second layers of the Backbone network is 3×3×64, with stride 1 for the first layer and stride 2 for the second; the pooling kernel of the third layer is 3×3×64 with stride 1; the fourth to eleventh convolution layers have stride 1 and 64 kernels; the twelfth convolution layer has stride 2 and its kernel count rises to 256; the thirteenth to twenty-third layers have stride 1 and 256 kernels; the twenty-fourth layer has stride 2 and its kernel count rises to 512; the twenty-fifth to thirty-second layers have stride 1 and 512 kernels; the thirty-third layer has stride 2 and its kernel count rises to 1024; the thirty-fourth to fifty-second layers have stride 1 and 1024 kernels; and the fifty-third layer has stride 2 and its kernel count rises to 2048.
Further, in the Neck network the P6 layer uses 1×1 convolution kernels with stride 2 and 256 kernels; the P5 layer uses 1×1 kernels with stride 1 and 256 kernels; the P4 layer contains two convolution layers, the first using 1×1 kernels with stride 1 and 256 kernels, and the second using 1×1 kernels with stride 1/2 (i.e., 2× upsampling) and 256 kernels.
Further, the training process of the target detection model is as follows:
step 1, collecting a large number of gasket pictures, and ensuring the quantity balance of various types of gaskets in all gasket pictures;
step 2, preprocessing the acquired gasket pictures, including arrangement, cleaning and labeling;
step 3, manufacturing a training set and a testing set for the preprocessed gasket picture; wherein the training set does not coincide with the test set;
and 4, training the established target detection model by using a training set, performing generalization test on the trained target detection model by using a testing set, detecting the performance index of the target detection model according to a testing result, and obtaining the trained target detection model after the performance index meets the design requirement.
Further, the preprocessing of the collected gasket pictures is specifically:
arrangement adjusts the size and orientation of the gasket pictures so as to unify them into one format; cleaning defines the labels of each gasket type according to the producer's definitions; labeling marks the corresponding type label on each gasket with a labeling tool and generates an annotation file as training positive samples, while non-gasket regions are classed as training negative samples.
A target detection apparatus based on an improved activation function, comprising:
an image acquisition module for collecting gasket images on the detection line and preprocessing them;
a recognition detection module for feeding the preprocessed gasket image into a trained target detection model for detection to obtain the gasket's detection and positioning result; the target detection model comprises a deep feature extraction network Backbone, a feature pyramid network Neck and a detection network Head.
A terminal device installed on a gasket detection line, comprising a memory, a processor, and a computer program stored in the memory and executable on the processor; when executing the computer program, the processor implements the steps of the aforementioned target detection method based on an improved activation function.
A computer-readable storage medium storing a computer program which, when executed by a processor, implements the steps of the aforementioned target detection method based on an improved activation function.
Compared with the prior art, the invention has the following technical characteristics:
according to the method, a target detection model of the deep learning method is introduced into gasket detection, so that production automation and intellectualization are realized, an activation function more suitable for the project is provided, and better learning performance is obtained; the target detection model can rapidly detect and position the gaskets, thereby providing convenience for the sorting of the subsequent mechanical arms and accelerating the realization of the intellectualization of the gasket sorting; meanwhile, the device can run for a long time due to stable and reliable performance, realizes high-speed, high-precision and non-contact detection and positioning of the gasket, replaces an inefficient manual method, improves efficiency and saves cost.
Drawings
FIG. 1 is a schematic diagram of a structure of a target detection model;
FIG. 2 is a flowchart of a training process for a target detection model;
FIG. 3 is an image of a target gasket to be detected in one embodiment of the invention;
FIG. 4 shows the detection result output for the target gasket after model detection in one embodiment of the invention.
Detailed Description
Referring to FIG. 1, the target detection method based on an improved activation function of the present invention includes the following steps:
collecting a gasket image on the detection line, preprocessing it, and feeding the preprocessed image into a trained target detection model for detection to obtain the gasket's detection and positioning result; the target detection model comprises a deep feature extraction network Backbone, a feature pyramid network Neck and a detection network Head, wherein:
1. Deep feature extraction network Backbone
The Backbone network has a 53-layer network structure, specifically:
the first layer and the second layer are both convolution layers, and are processed by adopting two convolution layers which are processed in series in sequence to adapt to the small target input of the project, and the size of each convolution kernel is 3×3×64 (the number of long×wide×channel), because the kernel size is reduced, more small target features can be captured; the step length of the first layer is 1, the step length of the second layer is 2, so that the gasket image with the input of 224×224×3 is processed by the two layers, and the output of 112×112×64 is obtained, and the output characteristic diagram is marked as C1.
The third layer is a max-pooling layer, which reinforces the feature map C1 obtained from the first two convolutions so that features are not lost in the subsequent chain of convolutions; its pooling kernel size is 3×3×64 with stride 1, so the final output remains 112×112×64.
The fourth to twelfth layers (the first residual unit) use small blocks designed on the residual principle, with every three convolution layers forming one residual block, giving three residual blocks; the kernel sizes of the three convolution layers in each block are 1×1×64, 3×3×64 and 1×1×64. The fourth to eleventh layers have stride 1 and 64 kernels; the twelfth layer has stride 2 and its kernel count rises to 256. The feature map output by the third layer is 112×112×64; after the three residual blocks, the twelfth layer outputs 56×56×256, denoted C2.
The thirteenth to twenty-fourth layers (the second residual unit) likewise use residual blocks, with every four convolution layers forming one block, giving three residual blocks; the convolution layers in each block use kernels of 1×1×256, 3×3×256 and 1×1×256. The thirteenth to twenty-third layers have stride 1 and 256 kernels; the twenty-fourth layer has stride 2 and its kernel count rises to 512. The feature map input to the thirteenth layer is 56×56×256; after the three residual blocks, the twenty-fourth layer outputs 28×28×512, denoted C3.
The twenty-fifth to thirty-third layers (the third residual unit) share the structure of the fourth to twelfth layers: every three convolution layers form one residual block, giving three residual blocks whose layers use kernels of 1×1×512, 3×3×512 and 1×1×512. The twenty-fifth to thirty-second layers have stride 1 and 512 kernels; the thirty-third layer has stride 2 and its kernel count rises to 1024. The feature map input to the twenty-fifth layer is 28×28×512; after the three residual blocks, the thirty-third layer outputs 14×14×1024, denoted C4.
The thirty-fourth to fifty-third layers (the fourth residual unit) again use residual blocks, with every four convolution layers forming one block, giving five residual blocks; the convolution layers in each block use kernels of 1×1×1024, 3×3×1024 and 1×1×1024. The thirty-fourth to fifty-second layers have stride 1 and 1024 kernels; the fifty-third layer has stride 2 and its kernel count rises to 2048. The feature map input to the thirty-fourth layer is 14×14×1024; after the five residual blocks, the fifty-third layer outputs 7×7×2048, denoted C5.
This completes the Backbone feature extraction design. Its final feature map outputs are C2 at 56×56×256, C3 at 28×28×512, C4 at 14×14×1024 and C5 at 7×7×2048, which serve as inputs to the subsequent feature pyramid network Neck.
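As a concrete illustration, the following PyTorch sketch builds the first residual unit as described: three bottleneck residual blocks whose final convolution carries the stride-2 downsampling and the 64→256 channel expansion. The shortcut projection on the downsampling layer is an assumption, since the text does not state how the skip connection handles the shape change, and plain ReLU stands in for the patent's improved activation.

```python
import torch
import torch.nn as nn

class ResidualBlock(nn.Module):
    """Bottleneck residual block: 1x1 -> 3x3 -> 1x1 convolutions plus a skip path."""
    def __init__(self, channels, out_channels=None, stride=1):
        super().__init__()
        out_channels = out_channels or channels
        self.body = nn.Sequential(
            nn.Conv2d(channels, channels, 1, bias=False),
            nn.BatchNorm2d(channels),
            nn.ReLU(inplace=True),  # stand-in for the patent's improved activation
            nn.Conv2d(channels, channels, 3, padding=1, bias=False),
            nn.BatchNorm2d(channels),
            nn.ReLU(inplace=True),
            # the last layer of the block carries the stride and channel expansion,
            # matching the text's "last convolution layer" rule
            nn.Conv2d(channels, out_channels, 1, stride=stride, bias=False),
            nn.BatchNorm2d(out_channels),
        )
        # assumed 1x1 projection so the shortcut matches shape when stride/channels change
        self.skip = (nn.Identity() if stride == 1 and out_channels == channels
                     else nn.Conv2d(channels, out_channels, 1, stride=stride, bias=False))

    def forward(self, x):
        return torch.relu(self.body(x) + self.skip(x))

# First residual unit (layers 4-12): three blocks of three convolution layers;
# the final block downsamples and expands 64 -> 256 channels.
first_unit = nn.Sequential(
    ResidualBlock(64),
    ResidualBlock(64),
    ResidualBlock(64, out_channels=256, stride=2),
)
x = torch.randn(1, 64, 112, 112)   # C1-sized input from the stem
print(first_unit(x).shape)         # torch.Size([1, 256, 56, 56])
```

The other three residual units follow the same pattern with their own block counts, depths and channel widths.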
2. Feature pyramid network Neck
The Neck network adopts a feature pyramid structure whose main job is to fuse and output the multi-scale features produced by the Backbone, improving feature recognition across scales. To suit the small, densely packed targets of gasket detection, this application uses an FPN with a 5-layer structure: the first four layers, P2, P3, P4 and P5, correspond to the Backbone outputs C2, C3, C4 and C5 respectively, and the fifth layer, P6, applies a further 1×1 convolution with stride 2 to C5, shrinking the feature map again to capture smaller targets. This yields 5 output feature maps with a unified channel count of 256 and sizes of 56×56, 28×28, 14×14, 7×7 and 3×3 respectively.
The P6 layer takes the 7×7×2048 feature map that is also fed to P5 and passes it through 1×1 convolution kernels with stride 2 and 256 kernels, thereby scaling the features and reducing the channel count; P6 outputs 3×3×256.
The P5 layer takes the 7×7×2048 feature map C5 and convolves it with 1×1 kernels, stride 1 and 256 kernels, performing only the channel-reduction operation on the feature map; P5 outputs 7×7×256.
The P4 layer takes the 14×14×1024 feature map C4 and the 7×7×256 feature map output by P5. It convolves the C4 output with 1×1 kernels, stride 1 and 256 kernels, giving a channel-reduced feature map of 14×14×256, and convolves the P5 output with 1×1 kernels, stride 1/2 and 256 kernels, completing the upsampling and giving a 14×14×256 feature map; finally P4 fuses the two to obtain an output feature map of 14×14×256.
P3 and P2 are processed in the same way as P4:
that is, P3 fuses the channel-reduced C3 with the upsampled P4 output, obtaining a final output feature map of 28×28×256;
and P2 fuses the channel-reduced C2 with the upsampled P3 output, obtaining a final output feature map of 56×56×256.
This completes the Neck network design. In the final output, P2 is 56×56×256, P3 is 28×28×256, P4 is 14×14×256, P5 is 7×7×256 and P6 is 3×3×256.
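For illustration, a minimal PyTorch sketch of the Neck follows. It assumes 1×1 lateral convolutions for the channel reduction, nearest-neighbour 2× upsampling for the stride-1/2 operation, and element-wise addition as the fusion operator, none of which the text pins down. Note that under standard convolution arithmetic, a 1×1, stride-2 convolution maps a 7×7 input to 4×4 rather than the 3×3 the text reports.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class Neck(nn.Module):
    def __init__(self, in_channels=(256, 512, 1024, 2048), out_channels=256):
        super().__init__()
        # 1x1 lateral convolutions reducing C2..C5 to a unified 256 channels
        self.lateral = nn.ModuleList(
            nn.Conv2d(c, out_channels, 1) for c in in_channels)
        # P6: 1x1 convolution with stride 2 applied to C5
        self.p6_conv = nn.Conv2d(in_channels[-1], out_channels, 1, stride=2)

    def forward(self, c2, c3, c4, c5):
        p5 = self.lateral[3](c5)                                      # 7x7x256
        p4 = self.lateral[2](c4) + F.interpolate(p5, scale_factor=2)  # 14x14x256
        p3 = self.lateral[1](c3) + F.interpolate(p4, scale_factor=2)  # 28x28x256
        p2 = self.lateral[0](c2) + F.interpolate(p3, scale_factor=2)  # 56x56x256
        p6 = self.p6_conv(c5)   # 4x4x256 under standard arithmetic (text reports 3x3)
        return p2, p3, p4, p5, p6

neck = Neck()
outs = neck(torch.randn(1, 256, 56, 56), torch.randn(1, 512, 28, 28),
            torch.randn(1, 1024, 14, 14), torch.randn(1, 2048, 7, 7))
print([tuple(o.shape[2:]) for o in outs])  # [(56, 56), (28, 28), (14, 14), (7, 7), (4, 4)]
```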
3. Detection network Head
The detection network Head in this scheme comprises an average pooling layer, a fully connected layer, and a box regression and classification layer, wherein:
Five average pooling layers are provided, one for each of the five feature maps output by Neck layers P2 to P6. Each average pooling layer reduces its corresponding feature map to a size of 1×1; the results of the five average pooling layers are then fed into the fully connected layer and finally into the box regression and classification layer, which generates and regresses prediction boxes to produce the final classification result.
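A minimal sketch of this Head follows; the hidden width and number of gasket classes are illustrative assumptions, as the text fixes neither.

```python
import torch
import torch.nn as nn

class Head(nn.Module):
    def __init__(self, in_channels=256, num_levels=5, hidden=1024, num_classes=4):
        super().__init__()
        self.pool = nn.AdaptiveAvgPool2d(1)          # average-pool each level to 1x1
        self.fc = nn.Linear(in_channels * num_levels, hidden)
        self.box = nn.Linear(hidden, 4)              # box regression: x, y, w, h
        self.cls = nn.Linear(hidden, num_classes)    # gasket class scores

    def forward(self, features):
        # features: the P2..P6 maps output by the Neck
        pooled = [self.pool(f).flatten(1) for f in features]
        h = torch.relu(self.fc(torch.cat(pooled, dim=1)))
        return self.box(h), self.cls(h)

head = Head()
levels = [torch.randn(1, 256, s, s) for s in (56, 28, 14, 7, 3)]
boxes, scores = head(levels)
print(boxes.shape, scores.shape)  # torch.Size([1, 4]) torch.Size([1, 4])
```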
This scheme adaptively modifies the traditional activation function, mainly because conventional activations converge too slowly during training, while gasket detection involves a large volume of training data, many data types, and continual version iteration. The modified activation function is as follows:
where x denotes the input data, e is the base of the natural logarithm, and β is a manually adjustable parameter with range [1, +∞). β can also be learned adaptively; however, since the detected target is small and of a single kind, background interference is weak and the chance of falling into a local minimum during training is nearly zero. Therefore, to reduce the number of training parameters and speed up training, β is not obtained by adaptive learning here but is tuned manually through extensive experiments and tests, which showed that the activation function then achieves its best learning effect. In this scheme the activation function is applied after the convolution layers, the max-pooling layer, the average pooling layers and the fully connected layer.
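The formula itself is an image not reproduced in this text. Its stated ingredients (input x, natural base e, a tunable β ≥ 1) match the Swish family of activations, so the sketch below assumes f(x) = x·sigmoid(βx) purely for illustration, with β either fixed by hand or registered as a learnable parameter:

```python
import torch
import torch.nn as nn

class BetaActivation(nn.Module):
    """Assumed Swish-style form f(x) = x * sigmoid(beta * x) with tunable beta >= 1.

    The patent's exact formula is an image not reproduced here; this form is an
    assumption consistent with the stated ingredients (x, e, adjustable beta).
    """
    def __init__(self, beta=1.0, learnable=False):
        super().__init__()
        beta = torch.tensor(float(beta))
        # beta may be hand-tuned (fixed) or learned adaptively, as the text describes
        self.beta = nn.Parameter(beta) if learnable else beta

    def forward(self, x):
        return x * torch.sigmoid(self.beta * x)

act = BetaActivation(beta=1.5)             # hand-tuned beta, the route the patent prefers
print(act(torch.linspace(-2.0, 2.0, 5)))   # smooth, slightly non-monotone below zero
```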
For the above target detection model, the training process of the invention is as follows:
Step 1, collecting a large number of gasket pictures, ensuring a balanced number of each gasket type across all pictures;
Step 2, preprocessing the collected gasket pictures, including arrangement, cleaning and labeling;
Arrangement adjusts the size and orientation of the gasket pictures and unifies them into one format so that they can be fed into the target detection model for training and testing (a sketch of this step follows below). Cleaning defines the labels of each gasket type according to the producer's definitions. Labeling marks the corresponding type label on each gasket with a labeling tool and generates an annotation file as training positive samples, while non-gasket regions are classed as training negative samples.
Step 3, building a training set and a test set from the preprocessed gasket pictures, the training set not overlapping the test set.
Step 4, training the established target detection model with the training set, performing a generalization test on the trained model with the test set, checking performance indices such as accuracy against the test results, obtaining the trained target detection model once the indices meet the design requirements, and deploying the trained model on the gasket production inspection line.
When checking the performance indices, the gasket classification accuracy is tallied, and the false detection rate and the accuracy are computed separately. If performance falls short, steps 1 to 4 are repeated, continually increasing the number and variety of gaskets to enrich the detection algorithm's exposure to gaskets, while adjusting the ratio of positive to negative samples to strengthen the algorithm's rejection of negative samples.
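A minimal sketch of this performance check follows; the counting conventions (a label of -1 for "no gasket", false detections counted over negative samples) are assumptions, since the text does not define the formulas.

```python
def evaluate(predictions, labels):
    """Classification accuracy and false-detection rate over the test set.

    predictions/labels are per-image class ids; -1 denotes 'no gasket'
    (an assumed convention -- the patent does not define the formulas).
    """
    correct = sum(p == t for p, t in zip(predictions, labels))
    false_det = sum(p != -1 and t == -1 for p, t in zip(predictions, labels))
    negatives = sum(t == -1 for t in labels) or 1   # avoid division by zero
    return correct / len(labels), false_det / negatives

acc, fdr = evaluate([0, 1, 2, -1, 1], [0, 1, 2, -1, 2])
print(f"accuracy={acc:.2f}, false detection rate={fdr:.2f}")
```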
Another aspect of the present invention provides a target detection apparatus based on an improved activation function, comprising:
an image acquisition module for collecting gasket images on the detection line and preprocessing them;
a recognition detection module for feeding the preprocessed gasket image into the trained target detection model for detection to obtain the gasket's detection and positioning result; the target detection model comprises a deep feature extraction network Backbone, a feature pyramid network Neck and a detection network Head.
It should be noted that for the specific structures of the deep feature extraction network Backbone, the feature pyramid network Neck and the detection network Head, reference can be made to the corresponding steps and explanations in the foregoing method embodiment, which are not repeated here.
The embodiment of the application further provides a terminal device, which can be a computer or a server, comprising a memory, a processor, and a computer program stored in the memory and executable on the processor; when executing the computer program, the processor implements the steps of the above target detection method based on an improved activation function.
The computer program may also be split into one or more modules/units, which are stored in the memory and executed by the processor to complete the present application. The one or more modules/units may be a series of computer program instruction segments capable of performing specific functions, used to describe the execution of the computer program in the terminal device; for example, the computer program may be divided into an image acquisition module and a recognition detection module, whose functions are described in the foregoing apparatus and are not repeated here.
An implementation of the present application provides a computer-readable storage medium storing a computer program which, when executed by a processor, implements the steps of the above target detection method based on an improved activation function.
The integrated modules/units, if implemented as software functional units and sold or used as stand-alone products, may be stored in a computer-readable storage medium. On this understanding, the present application may implement all or part of the flow of the above method embodiments by instructing the relevant hardware through a computer program; the computer program may be stored in a computer-readable storage medium and, when executed by a processor, implements the steps of each method embodiment described above. The computer program comprises computer program code, which may be in source code, object code or executable-file form, or some intermediate form. The computer-readable medium may include: any entity or device capable of carrying the computer program code, a recording medium, a USB flash drive, a removable hard disk, a magnetic disk, an optical disk, a computer memory, a read-only memory (ROM), a random access memory (RAM), an electrical carrier signal, a telecommunications signal, a software distribution medium, and so forth. It should be noted that the content of the computer-readable medium may be increased or decreased as required by legislation and patent practice in the relevant jurisdiction; for example, in some jurisdictions the computer-readable medium excludes electrical carrier signals and telecommunications signals.
The above embodiments are only intended to illustrate the technical solution of the present application, not to limit it. Although the present application has been described in detail with reference to the foregoing embodiments, those of ordinary skill in the art should understand that the technical solutions recorded in the foregoing embodiments may still be modified, or some of their technical features replaced by equivalents; such modifications and substitutions do not depart from the spirit and scope of the technical solutions of the embodiments of the present application and are intended to fall within the scope of the present application.

Claims (9)

1. A method of target detection based on an improved activation function, comprising the steps of:
collecting gasket images on the detection line and preprocessing them, including arrangement, cleaning and labeling; feeding the preprocessed gasket image into a trained target detection model for detection to obtain the gasket's detection and positioning result;
the target detection model comprises a Backbone employing a deep feature extraction network, a Neck employing a feature pyramid structure, and a detection network Head, wherein:
the Backbone has a 53-layer network structure, in which the first and second layers are both convolution layers and the third layer is a max-pooling layer; the fourth to twelfth, thirteenth to twenty-fourth, twenty-fifth to thirty-third, and thirty-fourth to fifty-third layers respectively constitute the first to fourth residual units, wherein: the first and third residual units each comprise three residual blocks, each composed of three adjacent convolution layers; the second and fourth residual units comprise three and five residual blocks respectively, each composed of four adjacent convolution layers; within each residual unit, all convolution layers except the last share the same stride and kernel count, and the stride and kernel count of the last layer are multiples of those of the preceding convolution layers; the feature maps output by the twelfth, twenty-fourth, thirty-third and fifty-third layers are denoted C2, C3, C4 and C5 respectively;
the Neck has a five-layer structure: P2, P3, P4, P5 and P6, wherein:
the P6 layer receives the feature map output by the P5 layer and applies a convolution operation that scales the features and reduces the channel count; the P5 layer applies a convolution operation to feature map C5 to reduce its channel count; the P4 layer applies a convolution operation to feature map C4 to reduce its channel count, applies a convolution operation to the feature map output by P5 to upsample it, and then fuses the channel-reduced feature map with the upsampled feature map; P3 fuses the channel-reduced C3 with the upsampled P4 output; P2 fuses the channel-reduced C2 with the upsampled P3 output;
the Head comprises an average pooling layer, a fully connected layer, and a box regression and classification layer, wherein:
five average pooling layers are provided, corresponding to the five feature maps output by the P2 to P6 layers of the Neck; after each average pooling layer processes its corresponding feature map, the results of the five average pooling layers are fed into the fully connected layer and finally into the box regression and classification layer, which generates and regresses prediction boxes to obtain the classification result;
the activation function used in the Backbone, Neck and Head is expressed as follows:
where x denotes the input data, e is the base of the natural logarithm, and β is an adjustable parameter with value range [1, +∞).
2. The target detection method based on an improved activation function according to claim 1, wherein the adjustable parameter β in the activation function is obtained either by manual tuning or by adaptive learning.
3. The target detection method based on an improved activation function according to claim 1, wherein the convolution kernel size of each of the first and second layers of the Backbone is 3×3×64, with stride 1 for the first layer and stride 2 for the second; the pooling kernel of the third layer is 3×3×64 with stride 1; the fourth to eleventh convolution layers have stride 1 and 64 kernels; the twelfth convolution layer has stride 2 and its kernel count rises to 256; the thirteenth to twenty-third layers have stride 1 and 256 kernels; the twenty-fourth layer has stride 2 and its kernel count rises to 512; the twenty-fifth to thirty-second layers have stride 1 and 512 kernels; the thirty-third layer has stride 2 and its kernel count rises to 1024; the thirty-fourth to fifty-second layers have stride 1 and 1024 kernels; and the fifty-third layer has stride 2 and its kernel count rises to 2048.
4. The target detection method based on an improved activation function according to claim 1, wherein the P6 layer of the Neck uses 1×1 convolution kernels with stride 2 and 256 kernels; the P5 layer uses 1×1 kernels with stride 1 and 256 kernels; the P4 layer contains two convolution layers, the first using 1×1 kernels with stride 1 and 256 kernels, and the second using 1×1 kernels with stride 1/2 and 256 kernels.
5. The target detection method based on an improved activation function according to claim 1, wherein the training process of the target detection model is as follows:
Step 1, collecting a large number of gasket pictures, ensuring a balanced number of each gasket type across all pictures;
Step 2, preprocessing the collected gasket pictures, including arrangement, cleaning and labeling;
Step 3, building a training set and a test set from the preprocessed gasket pictures, the training set not overlapping the test set;
and Step 4, training the target detection model with the training set, performing a generalization test on the trained model with the test set, checking the model's performance indices against the test results, and obtaining the trained target detection model once the performance indices meet the design requirements.
6. The target detection method based on an improved activation function according to claim 5, wherein the preprocessing of the collected gasket pictures is specifically:
arrangement adjusts the size and orientation of the gasket pictures so as to unify them into one format; cleaning defines the labels of each gasket type according to the producer's definitions; labeling marks the corresponding type label on each gasket with a labeling tool and generates an annotation file as training positive samples, while non-gasket regions are classed as training negative samples.
7. A target detection device based on an improved activation function, comprising:
an image acquisition module for collecting gasket images on the detection line and preprocessing them, including arrangement, cleaning and labeling;
a recognition detection module for feeding the preprocessed gasket image into a trained target detection model for detection to obtain the gasket's detection and positioning result;
the target detection model comprises a Backbone employing a deep feature extraction network, a Neck employing a feature pyramid structure, and a detection network Head, wherein:
the Backbone has a 53-layer network structure, in which the first and second layers are both convolution layers and the third layer is a max-pooling layer; the fourth to twelfth, thirteenth to twenty-fourth, twenty-fifth to thirty-third, and thirty-fourth to fifty-third layers respectively constitute the first to fourth residual units, wherein: the first and third residual units each comprise three residual blocks, each composed of three adjacent convolution layers; the second and fourth residual units comprise three and five residual blocks respectively, each composed of four adjacent convolution layers; within each residual unit, all convolution layers except the last share the same stride and kernel count, and the stride and kernel count of the last layer are multiples of those of the preceding convolution layers; the feature maps output by the twelfth, twenty-fourth, thirty-third and fifty-third layers are denoted C2, C3, C4 and C5 respectively;
the Neck has a five-layer structure: P2, P3, P4, P5 and P6, wherein:
the P6 layer receives the feature map output by the P5 layer and applies a convolution operation that scales the features and reduces the channel count; the P5 layer applies a convolution operation to feature map C5 to reduce its channel count; the P4 layer applies a convolution operation to feature map C4 to reduce its channel count, applies a convolution operation to the feature map output by P5 to upsample it, and then fuses the channel-reduced feature map with the upsampled feature map; P3 fuses the channel-reduced C3 with the upsampled P4 output; P2 fuses the channel-reduced C2 with the upsampled P3 output;
the Head comprises an average pooling layer, a fully connected layer, and a box regression and classification layer, wherein:
five average pooling layers are provided, corresponding to the five feature maps output by the P2 to P6 layers of the Neck; after each average pooling layer processes its corresponding feature map, the results of the five average pooling layers are fed into the fully connected layer and finally into the box regression and classification layer, which generates and regresses prediction boxes to obtain the classification result;
the activation function used in the Backbone, Neck and Head is expressed as follows:
where x denotes the input data, e is the base of the natural logarithm, and β is an adjustable parameter with value range [1, +∞).
8. A terminal device, installed on a gasket detection line, comprising a memory, a processor and a computer program stored in the memory and executable on the processor, wherein the processor, when executing the computer program, implements the steps of the target detection method based on an improved activation function according to any one of claims 1-6.
9. A computer-readable storage medium storing a computer program, wherein the computer program, when executed by a processor, implements the steps of the target detection method based on an improved activation function according to any one of claims 1-6.
CN202110732476.4A 2021-06-30 2021-06-30 Target detection method and device based on improved activation function Active CN113487550B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110732476.4A CN113487550B (en) 2021-06-30 2021-06-30 Target detection method and device based on improved activation function

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110732476.4A CN113487550B (en) 2021-06-30 2021-06-30 Target detection method and device based on improved activation function

Publications (2)

Publication Number Publication Date
CN113487550A CN113487550A (en) 2021-10-08
CN113487550B true CN113487550B (en) 2024-01-16

Family

ID=77936813

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110732476.4A Active CN113487550B (en) 2021-06-30 2021-06-30 Target detection method and device based on improved activation function

Country Status (1)

Country Link
CN (1) CN113487550B (en)

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110532859A (en) * 2019-07-18 2019-12-03 西安电子科技大学 Remote Sensing Target detection method based on depth evolution beta pruning convolution net
CN111626120A (en) * 2020-04-24 2020-09-04 南京理工大学 Target detection method based on improved YOLO-6D algorithm in industrial environment
KR20200129314A (en) * 2019-05-08 2020-11-18 전북대학교산학협력단 Object detection in very high-resolution aerial images feature pyramid network
CN112330682A (en) * 2020-11-09 2021-02-05 重庆邮电大学 Industrial CT image segmentation method based on deep convolutional neural network
CN112434672A (en) * 2020-12-18 2021-03-02 天津大学 Offshore human body target detection method based on improved YOLOv3

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US10726244B2 (en) * 2016-12-07 2020-07-28 Samsung Electronics Co., Ltd. Method and apparatus detecting a target

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
KR20200129314A (en) * 2019-05-08 2020-11-18 전북대학교산학협력단 Object detection in very high-resolution aerial images feature pyramid network
CN110532859A (en) * 2019-07-18 2019-12-03 西安电子科技大学 Remote Sensing Target detection method based on depth evolution beta pruning convolution net
CN111626120A (en) * 2020-04-24 2020-09-04 南京理工大学 Target detection method based on improved YOLO-6D algorithm in industrial environment
CN112330682A (en) * 2020-11-09 2021-02-05 重庆邮电大学 Industrial CT image segmentation method based on deep convolutional neural network
CN112434672A (en) * 2020-12-18 2021-03-02 天津大学 Offshore human body target detection method based on improved YOLOv3

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
Wen Boyuan; Wu Muqing. Study on Pedestrian Detection Based on an Improved YOLOv4 Algorithm. 2020 IEEE 6th International Conference on Computer and Communications (ICCC). 2021, full text. *
Research on fault diagnosis methods for rolling bearings based on deep learning; Guo Xiaolin; China Master's Theses Full-text Database, Engineering Science and Technology II; full text *
Research progress of deep convolutional neural networks in object detection; Yao Qunli; Hu Xian; Lei Hong; Computer Engineering and Applications (17); full text *

Also Published As

Publication number Publication date
CN113487550A (en) 2021-10-08

Similar Documents

Publication Publication Date Title
CN110543878B (en) Pointer instrument reading identification method based on neural network
CN111402248A (en) Transmission line lead defect detection method based on machine vision
CN111950453A (en) Optional-shape text recognition method based on selective attention mechanism
CN111179249A (en) Power equipment detection method and device based on deep convolutional neural network
CN110827297A (en) Insulator segmentation method for generating countermeasure network based on improved conditions
CN111368825B (en) Pointer positioning method based on semantic segmentation
CN110726898B (en) Power distribution network fault type identification method
CN111401358B (en) Instrument dial correction method based on neural network
CN110263790A (en) A kind of power plant's ammeter character locating and recognition methods based on convolutional neural networks
CN115908142B (en) Visual identification-based damage inspection method for tiny contact net parts
CN116071315A (en) Product visual defect detection method and system based on machine vision
CN116703885A (en) Swin transducer-based surface defect detection method and system
CN111767826A (en) Timing fixed-point scene abnormity detection method
CN114359167A (en) Insulator defect detection method based on lightweight YOLOv4 in complex scene
CN113487550B (en) Target detection method and device based on improved activation function
CN116523885A (en) PCB defect detection method based on multi-scale fusion and deep learning
CN115937492A (en) Transformer equipment infrared image identification method based on feature identification
CN115690001A (en) Method for detecting defects in steel pipe welding digital radiographic image
CN115100546A (en) Mobile-based small target defect identification method and system for power equipment
CN114841980A (en) Insulator defect detection method and system based on line patrol aerial image
CN114494236A (en) Fabric defect detection method and system based on over-complete convolutional neural network
CN114140662A (en) Insulator lightning stroke image sample amplification method based on cyclic generation countermeasure network
CN113408805A (en) Lightning ground flashover identification method, device, equipment and readable storage medium
KR20220027674A (en) Apparatus and Method for Classifying States of Semiconductor Device based on Deep Learning
CN113034432A (en) Product defect detection method, system, device and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant