CN117994232A - Defect detection method and device for industrial product, electronic equipment and storage medium

Info

Publication number: CN117994232A
Application number: CN202410168156.4A
Authority: CN
Inventors: 吴亚晖; 王昶力; 黄世永; 谈钱辉; 李生辉
Original assignee: China Telecom Corp Ltd
Current assignee: China Telecom Corp Ltd
Priority date: 2024-02-05
Filing date: 2024-02-05
Publication date: 2024-05-07

Abstract

The invention discloses a defect detection method and device for industrial products, electronic equipment and a storage medium, and relates to the field of artificial intelligence, wherein the defect detection method comprises the following steps: obtaining a picture to be detected of a target industrial product, inputting the picture to be detected into a preset detection model, outputting a marking picture, analyzing the marking picture to obtain a defect type marked by each marking frame and a defect probability value corresponding to the defect type, and adding the defect type corresponding to the defect probability value into a product defect set of the target industrial product under the condition that the defect probability value is larger than a preset defect threshold. The invention solves the technical problem that the defect detection of industrial products cannot be carried out under the condition of limited equipment resources in the related technology.

Description

Defect detection method and device for industrial product, electronic equipment and storage medium

Technical Field

The invention relates to the field of artificial intelligence, in particular to a defect detection method and device for industrial products, electronic equipment and a storage medium.

Background

Machine vision is an important branch of artificial intelligence technology, and is capable of acquiring and processing images of real objects by means of cameras, sensors, etc. to extract the required information therefrom. The need for machine vision technology in the industry has continually driven the development of this area. Currently, industrial vision has been widely used in a variety of industrial scenarios, including vision inspection of pipelined products, vision-guided robotic arm operation, intelligent factory construction, and the like.

Computing power is the basis for industrial vision processing technology, and as industrial manufacturing technology continues to improve, more and more intelligent applications will migrate to edge computing devices to achieve immediate response and reduce latency. This results in a higher demand on computing power. The task of algorithms based on deep learning in the field of industrial vision mainly consists in identifying, classifying, detecting and segmenting specific objects in industrial equipment, workers and manufacturing environments. Currently, general vision algorithms are implemented based on deep neural networks, which often possess complex and bulky structures, with large amounts of data and parameters.

With the continuous development of artificial intelligence, more and more organizations are beginning to use machine vision technology to improve production efficiency and product quality. Machine vision enables automatic, high-precision product inspection, while multi-sensor fusion can provide more comprehensive information to support an organization's production decisions. In addition, edge detection technology can help an organization monitor production and issue early warnings in real time, ensuring the stability and safety of the production process. However, the use of machine vision in organizations is also somewhat limited: because an organization's hardware and computing resources are limited, and vision algorithm models are large and highly complex, the models are difficult to deploy on the relevant hardware.

In the related art, although most visual detection algorithms basically meet organizations' accuracy requirements, the models are too large to be deployed where hardware is limited. If the algorithm model can be made smaller while its detection accuracy is preserved, the application of machine vision in industry will be better supported, and machine vision technology can be used more widely in organizations whose hardware and computing resources are limited.

Therefore, machine vision faces several problems in industry: (1) limited hardware and computing resources; (2) high algorithm complexity: machine vision algorithms generally involve a large amount of computation and a large number of parameters; (3) insufficient and unbalanced data: developing machine vision technology requires a large amount of data for training and testing, but in practical applications it is difficult to obtain a sufficient and balanced data set; (4) high application cost: industrial vision applications must be paired with dedicated hardware and software selected for the actual scenario, which costs organizations considerable money and time and makes the technology difficult to spread among small and medium-sized organizations; (5) data security and privacy issues: machine vision requires processing large amounts of image and video data that may involve an organization's core technology and confidential information.

In view of the foregoing, there is a need for a machine vision inspection method that works with limited hardware resources, so as to improve organizations' production efficiency and product quality, reduce cost, and promote the development of industrial automation.

In view of the above problems, no effective solution has been proposed at present.

Disclosure of Invention

The embodiment of the invention provides a defect detection method and device for industrial products, electronic equipment and a storage medium, which at least solve the technical problem that the defect detection of the industrial products cannot be carried out under limited equipment resources in the related technology.

According to an aspect of an embodiment of the present invention, there is provided a defect detection method for an industrial product, including: obtaining a picture to be detected of a target industrial product; inputting the picture to be detected into a preset detection model, and outputting a marked picture, wherein the preset detection model is a detection model constructed based on a lightweight backbone network, and a plurality of marked frames are arranged on the marked picture; analyzing the marking pictures to obtain the defect type marked by each marking frame and a defect probability value corresponding to the defect type; and adding the defect type corresponding to the defect probability value into a product defect set of the target industrial product under the condition that the defect probability value is larger than a preset defect threshold value.

Further, before inputting the picture to be detected to the preset detection model and outputting the marked picture, the method further comprises: constructing a lightweight module, wherein the lightweight module comprises: a plurality of preset modules; constructing a light backbone network based on the light module; constructing an initial detection model based on the lightweight backbone network; and training the initial detection model by adopting a historical data set to obtain a preset detection model.

Further, the step of constructing a lightweight module includes: constructing a first channel branch structure and a second channel branch structure, wherein the first channel branch structure comprises: two preset convolutions and a depth separable convolution, the second channel branch structure comprising: two of the preset convolutions and two of the depth separable convolutions; the first channel branch structure is used for processing the received feature map and outputting a first initial feature representation, and the second channel branch structure is used for processing the received feature map and outputting a second initial feature representation; based on the first channel branch structure and the second channel branch structure, the preset module is constructed, wherein the preset module at least comprises: a separation module and a channel combination module; the separation module is used for equally dividing the channel number of the received feature images to obtain a first channel feature image and a second channel feature image, transmitting the first channel feature image to the first channel branch structure and transmitting the second channel feature image to the second channel branch structure; the channel merging module is used for merging the channels of the first initial characteristic representation and the second initial characteristic representation to obtain a merged characteristic representation; and constructing the light module based on the preset module, wherein the light module is used for outputting a preset feature representation, and the preset feature representation is the combined feature representation output by the last preset module.
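The data flow of one preset module described above (equal channel split, two branches, channel merge) can be sketched as follows. This is a minimal illustration, not the claimed implementation: the feature map is modeled as a plain list of channels, the two branches are passed in as placeholder callables because the kernel sizes and layer ordering of the preset and depth separable convolutions are not given here, and all function names are hypothetical. The two parameter-count helpers use the standard weight counts for a k x k convolution and for a k x k depthwise convolution followed by a 1 x 1 pointwise convolution, which shows why depth separable convolutions suit a lightweight design.

```python
from typing import Callable, List

FeatureMap = List[List[float]]  # hypothetical layout: one flat list of values per channel


def preset_module(
    feature_map: FeatureMap,
    branch_one: Callable[[FeatureMap], FeatureMap],
    branch_two: Callable[[FeatureMap], FeatureMap],
) -> FeatureMap:
    """Split channels equally, run each half through its branch, merge channels."""
    if len(feature_map) % 2 != 0:
        raise ValueError("channel count must be even for an equal split")
    half = len(feature_map) // 2
    first_half, second_half = feature_map[:half], feature_map[half:]
    first_repr = branch_one(first_half)    # first initial feature representation
    second_repr = branch_two(second_half)  # second initial feature representation
    return first_repr + second_repr        # channel merge (concatenation)


def standard_conv_params(k: int, c_in: int, c_out: int) -> int:
    """Weight count of a standard k x k convolution (bias ignored)."""
    return k * k * c_in * c_out


def depthwise_separable_params(k: int, c_in: int, c_out: int) -> int:
    """Weight count of a k x k depthwise conv plus a 1 x 1 pointwise conv."""
    return k * k * c_in + c_in * c_out


if __name__ == "__main__":
    fmap = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0], [7.0, 8.0]]  # 4 channels
    # Placeholder branches (assumptions): identity and element-wise doubling.
    merged = preset_module(
        fmap,
        lambda x: x,
        lambda x: [[2 * v for v in ch] for ch in x],
    )
    print(len(merged))
    print(standard_conv_params(3, 64, 64), depthwise_separable_params(3, 64, 64))
```

For a 3 x 3 kernel with 64 input and 64 output channels, the standard convolution needs 36864 weights while the depth separable form needs 4672, roughly an eightfold reduction.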

Further, based on the lightweight module, the step of constructing a lightweight backbone network includes: constructing a convolution pooling structure and a preset normalization structure, and constructing a preset operation structure based on the preset normalization structure; determining a preset number of the light modules; and constructing the light backbone network based on the convolution pooling structure, the preset number of light modules and the preset operation structure, wherein the light backbone network is used for extracting abstract feature representations of the pictures to be detected.

Further, based on the lightweight backbone network, the step of constructing an initial detection model includes: constructing a feature fusion network, wherein the feature fusion network at least comprises: the device comprises an up-sampling module and a channel merging module, wherein the up-sampling module is used for amplifying a received reduced feature map to obtain an amplified feature map; the channel merging module is used for merging the channels of the enlarged feature map and the feature map indicated by the abstract feature representation, the reduced feature map is a feature map obtained by processing the feature map indicated by the abstract feature representation through a preset normalization structure, and the feature fusion network is used for outputting a target feature map; constructing a classification network, wherein the classification network is used for classifying the target feature images and outputting detection results with different sizes, and the detection results comprise: the marking frame marking the input picture and the defect information marked for each marking frame, wherein the defect information comprises: the defect type and the defect probability value; and constructing the initial detection model based on the lightweight backbone network, the feature fusion network and the classification network.
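The two operations of the feature fusion network named above, up-sampling a reduced feature map and merging its channels with another feature map, can be illustrated with a short sketch. It assumes nearest-neighbor up-sampling with a scale factor of 2 and a list-of-channels layout; the interpolation method and scale factor are not stated in the text, and the function names are hypothetical.

```python
from typing import List

Channel = List[List[float]]  # one 2-D channel: rows of values
FeatureMap = List[Channel]   # hypothetical layout: list of channels


def upsample_nearest(fmap: FeatureMap, scale: int = 2) -> FeatureMap:
    """Enlarge every channel by repeating each value `scale` times per axis."""
    enlarged = []
    for channel in fmap:
        rows = []
        for row in channel:
            wide_row = [v for v in row for _ in range(scale)]
            rows.extend(list(wide_row) for _ in range(scale))
        enlarged.append(rows)
    return enlarged


def merge_channels(a: FeatureMap, b: FeatureMap) -> FeatureMap:
    """Concatenate two feature maps along the channel axis."""
    if a and b and (len(a[0]), len(a[0][0])) != (len(b[0]), len(b[0][0])):
        raise ValueError("spatial sizes must match before channel merging")
    return a + b


if __name__ == "__main__":
    reduced = [[[1.0, 2.0], [3.0, 4.0]]]              # 1 channel, 2 x 2
    backbone = [[[0.0] * 4 for _ in range(4)]]         # 1 channel, 4 x 4
    target = merge_channels(upsample_nearest(reduced), backbone)
    print(len(target), len(target[0]), len(target[0][0]))
```

Up-sampling first is what makes the merge possible: channel concatenation requires both inputs to share the same height and width.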

Further, the step of training the initial detection model by using a historical data set to obtain a preset detection model includes: collecting the historical data set of the product to be detected, wherein the historical data set comprises: a plurality of product pictures and the defect types corresponding to the product pictures; dividing the historical data set into a training set, a verification set and a test set; training the initial detection model by using the training set to obtain the trained initial detection model; verifying the trained initial detection model by using the verification set to obtain a verification result; in the case that the verification result indicates that verification is passed, testing the trained initial detection model by using the test set to obtain a test result, wherein the test result comprises: a preset index value; and in the case that the preset index value falls within a preset index threshold range, determining that training of the initial detection model is completed, and obtaining the preset detection model.
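The data-set division and the final acceptance check in this training procedure can be sketched as below. The 70/15/15 split ratios, the fixed random seed and both function names are assumptions made for illustration; the text specifies neither the ratios nor which index (for example accuracy or mAP) is tested against the threshold range.

```python
import random
from typing import List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def split_dataset(
    samples: Sequence[T],
    train_ratio: float = 0.7,  # assumed ratio, not given in the text
    val_ratio: float = 0.15,   # assumed ratio, not given in the text
    seed: int = 0,
) -> Tuple[List[T], List[T], List[T]]:
    """Shuffle the samples and divide them into training, verification and test sets."""
    if train_ratio <= 0 or val_ratio < 0 or train_ratio + val_ratio >= 1:
        raise ValueError("ratios must leave a non-empty share for the test set")
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)
    n_train = round(len(shuffled) * train_ratio)
    n_val = round(len(shuffled) * val_ratio)
    return (
        shuffled[:n_train],
        shuffled[n_train:n_train + n_val],
        shuffled[n_train + n_val:],
    )


def training_complete(
    verification_passed: bool,
    index_value: float,
    index_range: Tuple[float, float],
) -> bool:
    """Training counts as complete only if verification passed and the
    test index value lies within the preset threshold range."""
    low, high = index_range
    return verification_passed and low <= index_value <= high


if __name__ == "__main__":
    train, val, test = split_dataset(list(range(100)))
    print(len(train), len(val), len(test))
    print(training_complete(True, 0.93, (0.90, 1.00)))
```

Keeping the test set out of both training and verification means the index value checked in the last step reflects data the model has never seen.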

Further, the step of inputting the picture to be detected to the preset detection model and outputting a marked picture includes: determining a target detection result corresponding to the target size from detection results of different sizes output by a classification network by adopting the preset detection model based on the size information of the target industrial product; and generating the marked picture based on the target detection result.
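One way to picture the selection step is a lookup over the per-scale outputs of the classification network. The sketch below is purely illustrative: it assumes each detection result is keyed by a numeric scale and that the result whose scale is nearest to the product size is chosen, with ties going to the smaller scale. The text does not state how size information is mapped to a detection result, so the selection rule and all names here are hypothetical.

```python
from typing import Dict, List, Tuple

Detection = Tuple[str, float]  # (defect type, defect probability value)


def select_target_result(
    results_by_scale: Dict[int, List[Detection]],
    product_size: int,
) -> List[Detection]:
    """Pick the detection result whose scale key is closest to the product size.

    Hypothetical rule: nearest scale wins, ties resolved toward the smaller scale.
    """
    if not results_by_scale:
        raise ValueError("no detection results available")
    best_scale = min(results_by_scale, key=lambda s: (abs(s - product_size), s))
    return results_by_scale[best_scale]


if __name__ == "__main__":
    # Placeholder defect names and scales, for illustration only.
    results = {
        20: [("defect_a", 0.9)],
        40: [("defect_b", 0.8)],
        80: [("defect_c", 0.7)],
    }
    print(select_target_result(results, 35))
```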

According to another aspect of the embodiment of the present invention, there is also provided a defect detecting device for an industrial product, including: the acquisition unit is used for acquiring a picture to be detected of the target industrial product; the output unit is used for inputting the picture to be detected into a preset detection model and outputting a marked picture, wherein the preset detection model is a detection model constructed based on a lightweight backbone network, and the marked picture is provided with a plurality of marked frames; the analyzing unit is used for analyzing the marking pictures to obtain the defect type marked by each marking frame and the defect probability value corresponding to the defect type; and the adding unit is used for adding the defect type corresponding to the defect probability value into the product defect set of the target industrial product under the condition that the defect probability value is larger than a preset defect threshold value.

Further, the defect detecting device further includes: the first construction module is used for constructing a light module before inputting the picture to be detected into the preset detection model and outputting the marked picture, wherein the light module comprises: a plurality of preset modules; the second construction module is used for constructing a light backbone network based on the light module; the third construction module is used for constructing an initial detection model based on the lightweight backbone network; the first training module is used for training the initial detection model by adopting a historical data set to obtain a preset detection model.

Further, the first building block includes: the first construction submodule is used for constructing a first channel branch structure and a second channel branch structure, wherein the first channel branch structure comprises: two preset convolutions and a depth separable convolution, the second channel branch structure comprising: two of the preset convolutions and two of the depth separable convolutions; the first channel branch structure is used for processing the received feature map and outputting a first initial feature representation, and the second channel branch structure is used for processing the received feature map and outputting a second initial feature representation; the second constructing submodule is configured to construct the preset module based on the first channel branch structure and the second channel branch structure, where the preset module at least includes: a separation module and a channel combination module; the separation module is used for equally dividing the channel number of the received feature images to obtain a first channel feature image and a second channel feature image, transmitting the first channel feature image to the first channel branch structure and transmitting the second channel feature image to the second channel branch structure; the channel merging module is used for merging the channels of the first initial characteristic representation and the second initial characteristic representation to obtain a merged characteristic representation; and the third construction submodule is used for constructing the light weight module based on the preset module, wherein the light weight module is used for outputting a preset characteristic representation, and the preset characteristic representation is the combined characteristic representation output by the last preset module.

Further, the second building block includes: a fourth construction submodule, configured to construct a convolution pooling structure and a preset normalization structure, and construct a preset operation structure based on the preset normalization structure; a first determining sub-module for determining a preset number of the lightweight modules; and a fifth building sub-module, configured to build the light-weight backbone network based on the convolution pooling structure, the preset number of light-weight modules, and the preset operation structure, where the light-weight backbone network is used to extract an abstract feature representation of the picture to be detected.

Further, the third building block includes: a sixth construction submodule, configured to construct a feature fusion network, where the feature fusion network at least includes: the device comprises an up-sampling module and a channel merging module, wherein the up-sampling module is used for amplifying a received reduced feature map to obtain an amplified feature map; the channel merging module is used for merging the channels of the enlarged feature map and the feature map indicated by the abstract feature representation, the reduced feature map is a feature map obtained by processing the feature map indicated by the abstract feature representation through a preset normalization structure, and the feature fusion network is used for outputting a target feature map; a seventh construction submodule, configured to construct a classification network, where the classification network is configured to classify the target feature map, output detection results with different sizes, and the detection results include: the marking frame marking the input picture and the defect information marked for each marking frame, wherein the defect information comprises: the defect type and the defect probability value; and an eighth construction submodule, configured to construct the initial detection model based on the lightweight backbone network, the feature fusion network and the classification network.

Further, the first training module includes: the first collection submodule is used for collecting the historical data set of the product to be detected, wherein the historical data set comprises: a plurality of product pictures and the defect types corresponding to the product pictures; the first dividing sub-module is used for dividing the historical data set into a training set, a verification set and a test set; the first training submodule is used for training the initial detection model by using the training set to obtain the trained initial detection model; the first verification sub-module is used for verifying the trained initial detection model by using the verification set to obtain a verification result; the first testing sub-module is configured to test the trained initial detection model by using the test set to obtain a test result when the verification result indicates that verification is passed, where the test result includes: a preset index value; and the second determining submodule is used for determining that training of the initial detection model is completed when the preset index value falls within the preset index threshold range, and obtaining the preset detection model.

Further, the output unit includes: the first determining module is used for determining a target detection result corresponding to the target size from detection results of different sizes output by the classification network by adopting the preset detection model based on the size information of the target industrial product; the first generation module is used for generating the mark picture based on the target detection result.

According to another aspect of the embodiment of the present invention, there is also provided a computer readable storage medium, where the computer readable storage medium includes a stored computer program, and when the computer program runs, the device in which the computer readable storage medium is located is controlled to execute any one of the above defect detection methods for an industrial product.

According to another aspect of the embodiments of the present invention, there is also provided an electronic device including one or more processors and a memory for storing one or more programs, wherein the one or more programs, when executed by the one or more processors, cause the one or more processors to implement any one of the above defect detection methods for an industrial product.

In the method, a picture to be detected of a target industrial product is obtained, the picture to be detected is input into a preset detection model, a marking picture is output, the marking picture is analyzed, a defect type marked by each marking frame and a defect probability value corresponding to the defect type are obtained, and the defect type corresponding to the defect probability value is added into a product defect set of the target industrial product under the condition that the defect probability value is larger than a preset defect threshold.

In the invention, the picture to be detected of the industrial product can be input into the preset detection model constructed based on the lightweight backbone network. If the defect probability value of a marking frame on the output marked picture is larger than the preset defect threshold, the defect type of that marking frame is added to the product defect set of the industrial product, and the relevant personnel are promptly reminded that the industrial product may have the defect types shown in the product defect set. Because the preset detection model constructed on the lightweight backbone network can be deployed in terminal equipment with limited equipment resources, real-time defect detection of industrial products can be realized, which solves the technical problem in the related art that defect detection of industrial products cannot be carried out with limited equipment resources.

Drawings

The accompanying drawings, which are included to provide a further understanding of the invention and are incorporated in and constitute a part of this specification, illustrate embodiments of the invention and together with the description serve to explain the invention and do not constitute a limitation on the invention. In the drawings:

FIG. 1 is a flow chart of an alternative method of defect detection for an industrial product according to an embodiment of the present invention;

FIG. 2 is a schematic diagram of an alternative product detection process based on a preset detection model according to an embodiment of the present invention;

FIG. 3 is a schematic diagram of an alternative Block module according to an embodiment of the present invention;

FIG. 4 is a schematic diagram of an alternative Stage architecture according to an embodiment of the invention;

FIG. 5 is a schematic diagram of an alternative backbone network architecture according to an embodiment of the present invention;

FIG. 6 is a schematic diagram of an alternative detection model structure according to an embodiment of the present invention;

FIG. 7 is a schematic diagram of an alternative printed circuit board according to an embodiment of the invention;

FIG. 8 is a schematic diagram of an alternative industrial product defect detection device according to an embodiment of the present invention;

FIG. 9 is a block diagram of a hardware configuration of an electronic device (or mobile device) for a defect detection method of an industrial product according to an embodiment of the present invention.

Detailed Description

In order that those skilled in the art may better understand the present invention, the technical solutions in the embodiments of the present invention are described clearly and completely below with reference to the accompanying drawings. The described embodiments are evidently only some, not all, of the embodiments of the present invention. All other embodiments obtained by those skilled in the art based on the embodiments of the present invention without inventive effort shall fall within the scope of protection of the present invention.

It should be noted that the terms "first," "second," and the like in the description and the claims of the present invention and the above figures are used for distinguishing between similar objects and not necessarily for describing a particular sequential or chronological order. It is to be understood that the data so used may be interchanged where appropriate such that the embodiments of the invention described herein may be implemented in sequences other than those illustrated or otherwise described herein. Furthermore, the terms "comprises," "comprising," and "having," and any variations thereof, are intended to cover a non-exclusive inclusion, such that a process, method, system, article, or apparatus that comprises a list of steps or elements is not necessarily limited to those steps or elements expressly listed but may include other steps or elements not expressly listed or inherent to such process, method, article, or apparatus.

To facilitate an understanding of the invention by those skilled in the art, some terms or nouns involved in the various embodiments of the invention are explained below:

Lightweighting: the process of simplifying, converting and reducing a model in terms of its geometric entities, the information it carries, its construction logic, and so on. The purpose of lightweighting is to enable fast online transmission and to reduce the resource consumption of computers and mobile devices, while still meeting requirements such as lossless information, model accuracy and service functionality.

Target detection: a computer technology for identifying and locating specific objects of interest in an image or video.

Backbone network: a network for extracting features, which automatically learns feature representations from input data and uses those features for subsequent tasks such as classification, regression or segmentation.

Conv (convolution): a special linear operation, generally used to process data with a grid-like structure, such as time series data and image data.
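As a small illustration of the convolution term defined above, the following sketch applies a kernel to a one-dimensional sequence such as a time series. It uses the sliding-window (cross-correlation) form commonly used in deep learning, with no padding and a stride of 1; the function name is hypothetical.

```python
from typing import List


def conv1d_valid(signal: List[float], kernel: List[float]) -> List[float]:
    """Slide the kernel over the signal without padding and sum the products."""
    n, k = len(signal), len(kernel)
    if k == 0 or k > n:
        raise ValueError("kernel must be non-empty and no longer than the signal")
    return [
        sum(signal[i + j] * kernel[j] for j in range(k))
        for i in range(n - k + 1)
    ]


if __name__ == "__main__":
    print(conv1d_valid([1.0, 2.0, 3.0, 4.0], [1.0, 0.0, -1.0]))
```

The operation is linear: convolving the sum of two signals gives the sum of their individual convolutions, which is what the definition means by a linear operation.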

It should be noted that, the relevant information (including but not limited to user equipment information, user personal information, etc.) and data (including but not limited to data for analysis, stored data, presented data, etc.) related to the present invention are information and data authorized by the user or fully authorized by each party, and the collection, use and processing of the relevant data need to comply with relevant laws and regulations and standards of the relevant area, and are provided with corresponding operation entries for the user to select authorization or rejection.

To address the problems of deploying deep-learning-based vision algorithms at the edge (namely, the original model is highly complex, inference takes a long time, and real-time requirements are not met), the related art mainly adopts model compression and inference acceleration, wherein:

Model compression reduces the size of a model by eliminating redundant parameters and complexity in the original model, thereby reducing the demands on storage, communication bandwidth and computing resources. It can be divided into four types, namely quantization, pruning, knowledge distillation and network structure search, as follows: (1) quantization: model parameters are normally stored and computed as 32-bit floating point numbers; by reducing the number of bits each parameter occupies, that is, quantizing the parameters to 8 bits, 4 bits or even binary values, the size of the whole model can be reduced; (2) model pruning: the size and computational cost of the model are reduced by cutting away unimportant parameters in the neural network; pruning can be divided into weight pruning, convolution kernel pruning and the like; (3) knowledge distillation: a large "teacher" model is constructed, and a "student" model continuously learns from it, using the teacher's knowledge to reach similar or higher accuracy; through continued learning, the "student" model reaches accuracy similar to that of the "teacher" model and replaces it; (4) network structure search: based on a search optimization algorithm, an optimal model structure that meets the requirements is found within a specific search space, ultimately reducing the complexity of the model.
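The quantization idea described above can be made concrete with a small sketch that maps floating point weights to 8-bit integers and back. It assumes a simple affine (scale and zero-point) scheme over the range [0, 255]; this is one common way to quantize and is not stated to be the scheme used in the related art, and the function names are hypothetical.

```python
from typing import List, Tuple


def quantize_uint8(weights: List[float]) -> Tuple[List[int], float, int]:
    """Affine-quantize float weights to integers in [0, 255]."""
    # Extend the range to include 0.0 so that zero is representable.
    w_min = min(min(weights), 0.0)
    w_max = max(max(weights), 0.0)
    scale = (w_max - w_min) / 255 or 1.0  # avoid a zero scale for all-zero weights
    zero_point = round(-w_min / scale)
    quantized = [
        max(0, min(255, round(w / scale) + zero_point)) for w in weights
    ]
    return quantized, scale, zero_point


def dequantize(quantized: List[int], scale: float, zero_point: int) -> List[float]:
    """Map 8-bit integers back to approximate float weights."""
    return [(q - zero_point) * scale for q in quantized]


if __name__ == "__main__":
    weights = [-1.0, -0.25, 0.0, 0.5, 1.0]
    q, scale, zp = quantize_uint8(weights)
    restored = dequantize(q, scale, zp)
    print(q)
    print(max(abs(a - b) for a, b in zip(weights, restored)))
```

Each weight then occupies 8 bits instead of 32, a fourfold storage reduction, at the cost of a rounding error of at most about one quantization step per weight.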

Inference acceleration refers to optimizing runtime performance on a specific device platform, improving the computational efficiency of an AI (Artificial Intelligence) model and reducing inference latency. Depending on the optimization method, it is divided into operator optimization, graph optimization and the like, specifically: (1) operator optimization refers to optimizing the computation logic of a specific computing unit in the neural network, for example optimizing the convolution operation so that the optimized convolution requires fewer multiplications, which reduces the computation of the model; (2) graph optimization refers to optimizing the computation logic across multiple operators; the computational efficiency of the model is significantly improved by eliminating or fusing operators in the original algorithm and by applying various conventional optimizations to the execution logic of the computation graph.
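A widely used instance of operator fusion is folding a batch normalization step into the linear operation that precedes it, so that two operators become one at inference time. The text does not say which operators are fused in the related art, so the sketch below is only an illustrative example; it uses scalar weights for clarity (in practice the folding is applied per output channel), and the function names are hypothetical.

```python
import math
from typing import Tuple


def linear_then_bn(
    x: float, w: float, b: float,
    gamma: float, beta: float, mean: float, var: float, eps: float = 1e-5,
) -> float:
    """Unfused reference: a linear operation followed by batch normalization."""
    y = w * x + b
    return gamma * (y - mean) / math.sqrt(var + eps) + beta


def fold_batchnorm(
    w: float, b: float,
    gamma: float, beta: float, mean: float, var: float, eps: float = 1e-5,
) -> Tuple[float, float]:
    """Return fused weight and bias so that w_f * x + b_f equals the unfused result."""
    factor = gamma / math.sqrt(var + eps)
    return w * factor, (b - mean) * factor + beta


if __name__ == "__main__":
    params = dict(w=0.5, b=0.1, gamma=1.5, beta=-0.2, mean=0.3, var=4.0)
    w_f, b_f = fold_batchnorm(**params)
    for x in (-2.0, 0.0, 3.5):
        print(abs((w_f * x + b_f) - linear_then_bn(x, **params)) < 1e-9)
```

The fused form gives the same output with one multiply and one add per value, which is the kind of saving graph optimization aims for.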

However, although the above solutions can reduce the parameters and size of a model or increase inference speed, the detection model structures they use remain complex and still cannot meet the product detection requirements of some real-time scenarios.

Therefore, the invention provides a lightweight, deep-learning-based method for real-time detection of production defects, which is used for detecting product defects in real time in industrial production. On an industrial production line, the invention can scan a product from all directions and detect its defect types; that is, while the product is being manufactured, its defects can be scanned and detected and the defect types displayed in real time, so that quality inspection is carried out as the product is manufactured, providing reliable support for related business and safety.

Starting from the constraints of the industrial field, namely limited hardware, limited computing resources and high real-time requirements, the invention optimizes the neural network structure by adopting a model lightweighting strategy. On the premise of preserving detection accuracy, it reduces the complexity of the network and the parameter scale and computational burden of the whole algorithm, so that the system can perform real-time product detection on resource-limited hardware, while also enhancing the robustness of machine vision.

The present invention will be described in detail with reference to the following examples.

Example 1

According to an embodiment of the present invention, an embodiment of a defect detection method for an industrial product is provided. It should be noted that the steps shown in the flowchart of the drawings may be performed in a computer system, for example as a set of computer-executable instructions, and that, although a logical order is shown in the flowchart, in some cases the steps shown or described may be performed in a different order.

FIG. 1 is a flow chart of an alternative method of defect detection for an industrial product, as shown in FIG. 1, according to an embodiment of the present invention, the method comprising the steps of:

Step S101, obtaining a picture to be detected of a target industrial product.

Step S102, inputting a picture to be detected into a preset detection model, and outputting a marked picture, wherein the preset detection model is a detection model constructed based on a lightweight backbone network, and the marked picture is provided with a plurality of marked frames.

Step S103, analyzing the marked picture to obtain the defect type marked by each marking frame and the defect probability value corresponding to the defect type.

Step S104, adding the defect type corresponding to the defect probability value into the product defect set of the target industrial product under the condition that the defect probability value is larger than the preset defect threshold value.

Through the above steps, a picture to be detected of the target industrial product can be obtained and input into the preset detection model, which outputs a marked picture; the marked picture is analyzed to obtain the defect type marked by each marking frame and the defect probability value corresponding to the defect type; and the defect type is added to the product defect set of the target industrial product if its defect probability value is larger than the preset defect threshold. In the embodiment of the invention, the picture to be detected of the industrial product is input into the preset detection model constructed based on the lightweight backbone network. If the defect probability value of a marking frame on the output marked picture is larger than the preset defect threshold, the defect type of that marking frame is added to the product defect set of the industrial product, and related personnel are promptly reminded that the industrial product may have the defect types in the product defect set. Because the preset detection model is constructed on a lightweight backbone network, it can be deployed in terminal equipment with limited resources, so that real-time defect detection of industrial products can be achieved, which solves the technical problem in the related art that defect detection of industrial products cannot be carried out under limited equipment resources.

Embodiments of the present invention will be described in detail with reference to the following steps.

In an embodiment of the present invention, a defect detection algorithm is provided, which includes the design of the Stage structure (i.e., the lightweight module) and the design of the defect detection network structure (i.e., the preset detection model). The algorithm is suited to scenarios with limited hardware and computing resources and can be easily deployed on terminal equipment.

An optional embodiment, before inputting the picture to be detected into the preset detection model and outputting the marked picture, further comprises: constructing a lightweight module, wherein the lightweight module comprises: a plurality of preset modules; based on the light weight module, constructing a light weight backbone network; constructing an initial detection model based on a lightweight backbone network; training the initial detection model by adopting a historical data set to obtain a preset detection model.

In the embodiment of the invention, the lightweight network design follows four lightweight network principles: (1) the memory access cost (MAC) is minimal when the numbers of input and output channels are equal; (2) an excessive number of groups in group convolution increases the MAC; (3) network fragmentation hinders parallel acceleration; (4) element-by-element operations increase memory consumption. These four principles are therefore strictly followed in designing the defect detection algorithm in the embodiments of the present invention.
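
The first principle can be checked numerically. The sketch below assumes a common accounting for a 1 x 1 convolution that the text does not spell out: for an h x w feature map with c_in input and c_out output channels, the memory access cost is taken as h*w*(c_in + c_out) + c_in*c_out and the computation as h*w*c_in*c_out. The function names and the example sizes are illustrative only.

```python
def mac_1x1_conv(h, w, c_in, c_out):
    """Memory access cost of a 1x1 convolution under the assumed accounting:
    read the input map, write the output map, and read the weights."""
    return h * w * (c_in + c_out) + c_in * c_out


def flops_1x1_conv(h, w, c_in, c_out):
    """Multiply-accumulate count of the same 1x1 convolution."""
    return h * w * c_in * c_out


# Three channel configurations with the same computation (c_in * c_out is fixed).
configs = [(128, 128), (64, 256), (32, 512)]
for c_in, c_out in configs:
    print(c_in, c_out,
          flops_1x1_conv(56, 56, c_in, c_out),
          mac_1x1_conv(56, 56, c_in, c_out))
```

With the computation held constant, the balanced configuration (128 input and 128 output channels) gives the smallest memory access cost of the three, which is the behaviour principle (1) describes.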

In the embodiment of the invention, the design of the Block module (namely the preset module) can be carried out according to four light-weight network principles, and then the Stage structure is constructed based on the Block module (namely the light-weight module is constructed and comprises a plurality of preset modules), so that the parameter quantity and the calculation burden can be obviously reduced while the image characteristics are extracted efficiently, and the complexity of the model is reduced. Then, a backbone network (i.e., a lightweight backbone network) may be designed, and the backbone network may be composed of a plurality of stages (based on a lightweight module, a lightweight backbone network is constructed) to extract image features, and then based on the lightweight backbone network, an initial detection model is constructed.

In the embodiment of the invention, the initial detection model can be trained by adopting the historical data set to obtain a trained preset detection model (such as DETECTNET model), and then classification and candidate frame regression can be performed by adopting the preset detection model, so that the detection and the positioning of the defects are realized.

Fig. 2 is a schematic diagram of an optional product detection flow based on the preset detection model according to an embodiment of the present invention. As shown in fig. 2, an input picture is preprocessed, features are extracted from the preprocessed picture by the lightweight backbone feature extraction network (i.e., the lightweight backbone network), and the preset detection model then performs object classification and candidate frame regression to obtain a detection result, that is, an object located on the picture by a candidate frame; in the figure, the object is detected as bus (car) with a probability of 0.77.

Optionally, the step of constructing the lightweight module includes: constructing a first channel branch structure and a second channel branch structure, wherein the first channel branch structure comprises two preset convolutions and one depthwise separable convolution, and the second channel branch structure comprises two preset convolutions and two depthwise separable convolutions; the first channel branch structure is used for processing the received feature map and outputting a first initial feature representation, and the second channel branch structure is used for processing the received feature map and outputting a second initial feature representation; constructing a preset module based on the first channel branch structure and the second channel branch structure, wherein the preset module at least comprises a separation module and a channel merging module; the separation module is used for evenly dividing the channels of the received feature map to obtain a first channel feature map and a second channel feature map, transmitting the first channel feature map to the first channel branch structure and transmitting the second channel feature map to the second channel branch structure; the channel merging module is used for merging the channels of the first initial feature representation and the second initial feature representation to obtain a merged feature representation; and constructing the lightweight module based on the preset module, wherein the lightweight module is used for outputting a preset feature representation, and the preset feature representation is the merged feature representation output by the last preset module.

In the embodiment of the invention, the Stage structure is designed by combining the idea of avoiding gradient vanishing with the four lightweight network principles, so that the computation and parameter burden can be reduced and the scale of the whole model is effectively reduced. Specifically, a first channel branch structure (which processes the received feature map and outputs a first initial feature representation) and a second channel branch structure (which processes the received feature map and outputs a second initial feature representation) may be constructed. The first channel branch structure may include two preset convolutions (e.g., 1 x 1 conv) and one depthwise separable convolution (e.g., 3 x 3 dwconv); for example, a 1 x 1 conv is connected to a 3 x 3 dwconv and finally to a 1 x 1 conv. The second channel branch structure may include two preset convolutions and two depthwise separable convolutions; for example, a 1 x 1 conv is connected to a 3 x 3 dwconv, then to another 3 x 3 dwconv, and finally to a 1 x 1 conv. Then, based on the first channel branch structure and the second channel branch structure, a preset module is constructed, which may include: a convolutional layer (i.e., ConvLayer), a separation module (i.e., Split module), a channel merging module (i.e., Concat module), etc. For example, a ConvLayer is connected to the Split module, the Split module is connected to the first channel branch structure and the second channel branch structure in parallel, both branches are connected to the Concat module, and the Concat module is finally connected to a ConvLayer.
The separation module can evenly divide the channels of the received feature map to obtain a first channel feature map and a second channel feature map, and then transmit the first channel feature map to the first channel branch structure and the second channel feature map to the second channel branch structure; the channel merging module can merge the channels of the first initial feature representation and the second initial feature representation to obtain a merged feature representation. Then, based on the preset module, the lightweight module is constructed, and the lightweight module can output a preset feature representation, namely the merged feature representation output by the last preset module (i.e., the Stage structure can be composed of a plurality of Block modules and is used for extracting features of an input picture).
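
The data flow through the Split module, the two branches and the Concat module can be traced with a minimal sketch. It models a feature map as a plain list of channel names and replaces the convolutions with a placeholder that only records which layers a channel passed through; the ConvLayer before the Split module and after the Concat module is left out. All function and variable names are hypothetical.

```python
# Layer sequences of the two branches as described in the text.
FIRST_BRANCH = ["1x1 conv", "3x3 dwconv", "1x1 conv"]
SECOND_BRANCH = ["1x1 conv", "3x3 dwconv", "3x3 dwconv", "1x1 conv"]


def split_channels(feature_map):
    """Split a feature map (a list of channels) evenly into two halves."""
    if len(feature_map) % 2 != 0:
        raise ValueError("channel count must be even to split evenly")
    half = len(feature_map) // 2
    return feature_map[:half], feature_map[half:]


def run_branch(channels, layers):
    """Placeholder branch: tag each channel with the layers it passed through.
    A real branch would apply the convolutions; only the routing is traced here."""
    return [(name, tuple(layers)) for name in channels]


def concat_channels(first, second):
    """Merge two feature maps along the channel dimension."""
    return first + second


def block_forward(feature_map):
    """Split -> two parallel branches -> Concat."""
    first_in, second_in = split_channels(feature_map)
    first_out = run_branch(first_in, FIRST_BRANCH)
    second_out = run_branch(second_in, SECOND_BRANCH)
    return concat_channels(first_out, second_out)


merged = block_forward(["c0", "c1", "c2", "c3"])
print(len(merged))  # same channel count as the input
```

Because the halves are concatenated rather than added, the channel count of the merged representation equals that of the input without any element-by-element operation.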

The Block module is the minimum unit of the Stage structure. It divides the input data evenly by channel into two paths (i.e., the inputs to the first channel branch structure and the second channel branch structure); each channel branch structure uses 1 x 1 convolutions to keep the number of channels consistent before and after the operation; and finally a Concat operation (i.e., channel merging) is performed on the results of the two paths, which avoids element-by-element addition and reduces memory consumption.

In this embodiment, 1 x 1 convolutions are adopted in the Block module so that the numbers of input and output channels are the same, which minimizes the memory access cost (MAC); depthwise separable convolutions are adopted in the Block module to avoid an excessive number of groups, which also reduces the MAC; only two branches are constructed in the Block module, which avoids the gradient vanishing problem and reduces network fragmentation; and a Concat operation is applied to the data processed by the two branches, which avoids element-by-element addition and further reduces memory consumption. Therefore, the lightweight design of the Block module in this embodiment can reduce the number of parameters and the amount of computation of the model.

FIG. 3 is a schematic diagram of an optional Block module according to an embodiment of the present invention. As shown in FIG. 3, the Block module may be designed as follows: a ConvLayer is connected to the Split module, the Split module is connected to the first channel branch structure and the second channel branch structure in parallel, both branches are connected to the Concat module, and the Concat module is finally connected to a ConvLayer. The first channel branch structure can be designed as: a 1 x 1 conv connected to a 3 x 3 dwconv and finally to a 1 x 1 conv; the second channel branch structure can be designed as: a 1 x 1 conv connected to a 3 x 3 dwconv, then to another 3 x 3 dwconv, and finally to a 1 x 1 conv.
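
To give a sense of the parameter savings, the following sketch counts the weights of the two branches and compares them with a single standard 3 x 3 convolution of the same width. It is a rough estimate under stated assumptions: each branch keeps a constant channel width equal to half the Block input, and biases, batch normalization and the ConvLayer around the branches are ignored. The function names and the width of 128 channels are illustrative.

```python
def standard_conv_params(c_in, c_out, k):
    """Weights of a standard k x k convolution."""
    return c_in * c_out * k * k


def depthwise_conv_params(channels, k):
    """Weights of a k x k depthwise convolution (one filter per channel)."""
    return channels * k * k


def pointwise_conv_params(c_in, c_out):
    """Weights of a 1 x 1 convolution."""
    return c_in * c_out


def branch_params(channels, num_depthwise):
    """1x1 conv -> num_depthwise x (3x3 dwconv) -> 1x1 conv at constant width."""
    return (2 * pointwise_conv_params(channels, channels)
            + num_depthwise * depthwise_conv_params(channels, 3))


def block_branch_params(total_channels):
    """Both branches of a Block, each operating on half of the channels."""
    half = total_channels // 2
    return branch_params(half, 1) + branch_params(half, 2)


print(block_branch_params(128))           # weights in the two branches
print(standard_conv_params(128, 128, 3))  # one standard 3x3 conv, same width
```

Under these assumptions the two branches together hold roughly an eighth of the weights of one standard 3 x 3 convolution at 128 channels.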

In the embodiment of the invention, in the Stage structure, each Block module performs nonlinear mapping, extracts local features from input data, learns spatial and temporal information (such as spatial position relation among pixels, corresponding position relation of pixels on a plurality of channels and the like) of the input data, and after the local features are subjected to feature fusion, higher-level feature representation (namely preset feature representation) can be formed, so that valuable feature input is provided for subsequent tasks such as image recognition, classification and the like.

Fig. 4 is a schematic diagram of an optional Stage structure according to an embodiment of the present invention. As shown in fig. 4, the Stage structure is composed of a plurality of Block modules (for example, block1, block2, block3, …, block n) that perform feature extraction on the input picture. The structure of each Block module is as follows: a ConvLayer is connected to the Split module, the Split module is connected to branch 1 and branch 2 in parallel, both branches are connected to the Concat module, and the Concat module is finally connected to the ConvLayer of the next Block module. The structure of branch 1 is: a 1 x 1 conv connected to a 3 x 3 dwconv, then to another 3 x 3 dwconv, and finally to a 1 x 1 conv; the structure of branch 2 is: a 1 x 1 conv connected to a 3 x 3 dwconv and finally to a 1 x 1 conv.

Optionally, the step of constructing a lightweight backbone network based on the lightweight module includes: constructing a convolution pooling structure and a preset normalization structure, and constructing a preset operation structure based on the preset normalization structure; determining the preset number of the light modules; based on the convolution pooling structure, a preset number of light-weight modules and a preset operation structure, a light-weight backbone network is constructed, wherein the light-weight backbone network is used for extracting abstract feature representation of pictures to be detected.

In the embodiment of the invention, the lightweight, deep-learning-based real-time detection network for production defects (i.e., the preset detection model, such as the DETECTNET model) can be divided into three parts: Backbone (i.e., the lightweight backbone network), Neck (i.e., the feature fusion network) and Head (i.e., the classification network). The lightweight backbone network is responsible for extracting the features of the input image; the Neck network further integrates the extracted features, i.e., performs feature fusion; and the Head network performs the defect detection task on the fused features.

In the embodiment of the invention, the lightweight backbone network can be composed of a plurality of Stage structures. Specifically, a convolution pooling structure (conv_maxpool) and a preset normalization structure (CBL) may be constructed first, where the CBL structure may consist of a Conv layer (convolution layer), a BN layer (batch normalization layer) and a ReLU layer (Rectified Linear Unit, i.e., the activation function layer). A preset operation structure (SPP) is then constructed based on the preset normalization structure; the SPP structure may consist of two CBL structures, three parallel maxpool layers (pooling layers) and a Concat layer. Then, the preset number of lightweight modules can be determined, and the lightweight backbone network is constructed based on the convolution pooling structure, the preset number of lightweight modules and the preset operation structure; the lightweight backbone network can extract an abstract feature representation of the picture to be detected.

In the embodiment of the invention, the design of the Stage can reduce the calculation burden and the parameter quantity of the whole network, thereby effectively reducing the scale of the whole network model, and the backbone network occupies a considerable proportion in the whole network, so that the backbone network is obtained by combining a plurality of stages, thereby being beneficial to reducing the calculation resource and the storage requirement, improving the operation efficiency of the network and simultaneously keeping higher performance and accuracy. And by optimizing the Stage design, the whole network can effectively reduce the scale of the backbone network while maintaining good performance, and lays a foundation for downstream tasks.

Table 1 is a table of the structure of an alternative backbone network, as shown in table 1:

TABLE 1

Where K represents the convolution kernel size.

Fig. 5 is a schematic diagram of an optional backbone network architecture according to an embodiment of the present invention. As shown in fig. 5, the backbone network may be designed as follows: conv_maxpool is connected to a plurality of Stage structures (e.g., Stage2, Stage3, Stage4), which are then connected to the SPP structure. Each Stage structure may be composed of a plurality of Block modules; for example, Stage2 is composed of Block_x1 through Block_x7, Stage3 of Block_x1 through Block_x7, and Stage4 of Block_x1 through Block_x3, where Block_x1 through Block_xn denote multiple Block modules. The SPP structure may be designed as: a CBL structure connected to three parallel maxpool layers, then to a Concat layer, and finally to a CBL structure. The CBL structure may be designed as: a Conv layer connected to a BN layer and then to a ReLU layer.
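
The backbone layout described for fig. 5 can be written down as a small declarative structure. The sketch below records only module order and the example Block counts given above (7, 7 and 3); channel widths and kernel sizes are omitted, and the class and helper names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class StageSpec:
    """A Stage structure made of num_blocks Block modules."""
    name: str
    num_blocks: int


# Module order of the backbone, using the example Block counts from the text.
BACKBONE_LAYOUT = [
    "conv_maxpool",
    StageSpec("Stage2", 7),
    StageSpec("Stage3", 7),
    StageSpec("Stage4", 3),
    "SPP",
]

# Internal order of the SPP and CBL structures as described.
SPP_LAYOUT = ["CBL", ["maxpool", "maxpool", "maxpool"], "Concat", "CBL"]
CBL_LAYOUT = ["Conv", "BN", "ReLU"]


def expand(layout):
    """Flatten the layout into an ordered list of module names."""
    modules = []
    for item in layout:
        if isinstance(item, StageSpec):
            modules.extend(
                f"{item.name}.Block_x{i}" for i in range(1, item.num_blocks + 1)
            )
        else:
            modules.append(item)
    return modules


for module_name in expand(BACKBONE_LAYOUT):
    print(module_name)
```

Keeping the layout as data makes the "preset number of lightweight modules" an explicit, adjustable setting rather than something buried in model code.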

Optionally, the step of constructing an initial detection model based on the lightweight backbone network includes: constructing a feature fusion network, wherein the feature fusion network at least comprises: the up-sampling module is used for amplifying the received reduced feature map to obtain an amplified feature map; the channel merging module is used for merging the channels of the enlarged feature map and the feature map indicated by the abstract feature representation, and the reduced feature map is a feature map obtained by processing the feature map indicated by the abstract feature representation through a preset normalization structure, and the feature fusion network is used for outputting a target feature map; constructing a classification network, wherein the classification network is used for classifying the target feature images and outputting detection results with different sizes, and the detection results comprise: marking frames marked on the input picture and defect information marked on each marking frame, wherein the defect information comprises: defect type and defect probability value; an initial detection model is constructed based on the lightweight backbone network, the feature fusion network and the classification network.

In the embodiment of the present invention, a feature fusion network (i.e. Neck network) may be constructed first, where the feature fusion network may include: the up-sampling module can amplify the received reduced feature map (the reduced feature map is a feature map obtained by processing the feature map indicated by the abstract feature representation through a preset normalization structure) so as to obtain an amplified feature map (i.e. a restored feature map), and the channel merging module can merge the amplified feature map and the channel of the feature map indicated by the abstract feature representation. The feature fusion network can output the target feature map processed by the up-sampling module and the channel merging module.

Illustratively, in the DETECTNET network, the Neck network consists essentially of up-sampling and Concat operations. Up-sampling enlarges the feature maps extracted by the backbone network so that they better suit the target detection task; Concat merges feature maps of different levels to provide a richer feature representation. In the Neck network, the feature maps extracted by the backbone network are first enlarged layer by layer through up-sampling, and the feature maps of different levels are then merged with the up-sampled maps through Concat to form a higher-level feature representation. This fusion helps extract more feature information and improves the accuracy and robustness of target detection.

In the embodiment of the invention, methods for fusing features include attention mechanisms, element-by-element addition, Concat and the like. Although an attention network can fuse global information, its parameter count and computation are large and unsuitable for scenarios with limited computing resources, and element-by-element addition violates the lightweight principles, so Concat is adopted for feature fusion in this embodiment.
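
The up-sampling and Concat steps can be illustrated on tiny feature maps represented as nested lists (channels x rows x columns). The sketch assumes nearest-neighbour up-sampling with a scale factor of 2, which the text does not specify; the function names are hypothetical.

```python
def upsample_nearest(channel, scale=2):
    """Enlarge one 2-D channel by repeating each value scale x scale times."""
    rows = []
    for row in channel:
        wide = [value for value in row for _ in range(scale)]
        rows.extend(list(wide) for _ in range(scale))
    return rows


def upsample_map(feature_map, scale=2):
    """Up-sample every channel of a feature map."""
    return [upsample_nearest(channel, scale) for channel in feature_map]


def concat_channels(first, second):
    """Merge two feature maps along the channel dimension.
    Both maps must have the same spatial size."""
    if first and second:
        same_height = len(first[0]) == len(second[0])
        same_width = len(first[0][0]) == len(second[0][0])
        if not (same_height and same_width):
            raise ValueError("spatial sizes must match before Concat")
    return first + second


deep = [[[1, 2], [3, 4]]]                   # 1 channel, 2 x 2 (reduced map)
shallow = [[[0] * 4 for _ in range(4)]]     # 1 channel, 4 x 4 (earlier level)
fused = concat_channels(upsample_map(deep), shallow)
print(len(fused), len(fused[0]), len(fused[0][0]))
```

Up-sampling brings the deeper map to the spatial size of the shallower one, after which Concat stacks the channels without any element-by-element arithmetic.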

In the embodiment of the present invention, a classification network is further required to be constructed, where the classification network is used to classify the target feature map, so as to output detection results with different sizes, where the detection results may include: a mark frame marked on the input picture and defect information (the defect information includes a defect type and a defect probability value) marked on each mark frame. Then, an initial detection model is constructed based on the lightweight backbone network, the feature fusion network and the classification network.

Illustratively, the Head network may perform a classification task on the Neck processed feature maps, generating a plurality of differently sized (e.g., 3 differently sized) detection results. And transmitting the Neck fused feature images to a Head network, classifying the feature images, and finally generating detection results with different sizes. The detection results with multiple sizes can be better suitable for detection requirements of objects with different sizes, and the accuracy and the robustness of target detection can be improved.

In the embodiment of the invention, the DETECTNET network up-samples the features extracted by the backbone network multiple times and fuses them with earlier features, so that richer image feature information can be acquired. Finally, the network generates a plurality of detection results of different sizes: the smaller size has a larger receptive field and is suitable for detecting larger objects, while the larger size has a smaller receptive field and is suitable for detecting smaller objects. The network can therefore effectively detect objects of various sizes in the original image, which improves the detection precision of the algorithm.

Fig. 6 is a schematic diagram of an optional detection model structure according to an embodiment of the present invention. As shown in fig. 6, the input picture is first processed by conv_maxpool, then by the Stage structures connected to conv_maxpool (e.g., Stage2, Stage3, Stage4), and then by the SPP structure connected to Stage4. The output of the SPP structure is processed by a first CBL structure and a first up-sampling module, and a first Concat module merges the feature map transmitted by the SPP structure with the feature map transmitted by the first up-sampling module. The merged feature map is processed by a second CBL structure and a second up-sampling module, and a second Concat module merges the feature map transmitted by Stage3 with the feature map transmitted by the second up-sampling module; the result is processed by a C3 structure and then by a Conv layer to obtain the maximum-size detection result. A third Concat module merges the feature map transmitted by the second CBL structure with the feature map transmitted by a third CBL structure connected to the second Concat module; the merged feature map is processed by a C3 structure and then by a Conv layer to obtain the middle-size detection result. A fourth Concat module merges the feature map transmitted by the first CBL structure with the feature map transmitted by a fourth CBL structure connected to the third Concat module; the merged feature map is processed by a C3 structure and then by a Conv layer to obtain the minimum-size detection result.
The C3 structure can be designed as follows: the input is fed into two branches; one branch is a CBL structure connected to another CBL structure and then to a Conv layer; the other branch is a Conv layer; the outputs of the two branches are merged by a Concat module and then processed by a CBL structure.

Optionally, training the initial detection model by using the historical data set to obtain a preset detection model, including: collecting a historical data set of a product to be detected, wherein the historical data set comprises: a plurality of product pictures and defect types corresponding to the product pictures; dividing the historical data set into a training set, a verification set and a test set; training an initial detection model by adopting a training set to obtain a trained initial detection model; adopting a verification set to verify the trained initial detection model to obtain a verification result; under the condition that the verification result indicates that verification is passed, testing the trained initial detection model by adopting a test set to obtain a test result, wherein the test result comprises the following steps: presetting an index value; under the condition that the preset index value belongs to the preset index threshold range, training of the initial detection model is determined to be completed, and a preset detection model is obtained.

In the embodiment of the invention, the initial detection model can be trained with the historical data set to obtain the trained preset detection model, specifically as follows. A historical data set of the product to be detected (e.g., a printed circuit board (PCB)) may first be collected, which may include: a plurality of product pictures and the defect types corresponding to the product pictures (e.g., missing hole, mouse bite, open circuit, short circuit, spur, spurious copper, etc.). The historical data set may then be divided into a training set, a verification set and a test set according to a preset ratio (e.g., 0.8:0.1:0.1), and the training set is used to train the initial detection model until the model converges, thereby obtaining a trained initial detection model. The verification set may then be used to verify the trained initial detection model (e.g., to check whether the predicted defect type of a verification picture is consistent with its true defect type). If the verification result indicates that verification is passed (i.e., the predicted defect types of all verification pictures are consistent with the corresponding true defect types), the trained initial detection model may be tested with the test set to obtain a test result. The test result may include preset index values, for example, the value of a confusion matrix index, the value of an F1 index (which evaluates the balance between the precision and recall of the model), the value of a PR index (which evaluates the performance of the model under different thresholds), etc. If all preset index values fall within the corresponding preset index threshold ranges, training of the initial detection model is determined to be complete, and the preset detection model is obtained.
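
The final acceptance check can be sketched as follows. The example computes precision, recall and F1 from counts of true positives, false positives and false negatives, then tests whether every index value lies inside its threshold range. The counts, the metric names and the ranges are made-up illustrations, not values from the embodiment.

```python
def precision_recall_f1(tp, fp, fn):
    """Precision, recall and F1 from detection counts."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


def training_complete(index_values, threshold_ranges):
    """True when every index value lies within its (low, high) range."""
    return all(
        low <= index_values[name] <= high
        for name, (low, high) in threshold_ranges.items()
    )


# Hypothetical test-set counts and acceptance ranges.
precision, recall, f1 = precision_recall_f1(tp=90, fp=10, fn=30)
index_values = {"precision": precision, "recall": recall, "f1": f1}
threshold_ranges = {"precision": (0.85, 1.0), "f1": (0.80, 1.0)}
print(index_values, training_complete(index_values, threshold_ranges))
```

If any index falls outside its range, the check fails and the model would go back to training rather than being adopted as the preset detection model.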

Step S101, obtaining a picture to be detected of a target industrial product.

In the embodiment of the invention, the pictures to be detected of the target industrial products (such as printed circuit boards and other products) to be detected on the production line can be obtained through shooting equipment (such as video cameras, scanning equipment and the like).

Optionally, the step of inputting the picture to be detected into a preset detection model and outputting the marked picture includes: determining a target detection result corresponding to the target size from detection results of different sizes output by a classification network by adopting a preset detection model based on the size information of the target industrial product; and generating a marked picture based on the target detection result.

In the embodiment of the invention, the acquired picture to be detected can be input into the trained preset detection model (the preset detection model is a detection model constructed based on a lightweight backbone network) to obtain a marked picture (the marked picture carries a plurality of marking frames). Specifically, according to the size information of the target industrial product (i.e., the size of the product), the preset detection model can be used to determine, from the detection results of different sizes output by the classification network, the target detection result corresponding to the target size (for example, the small-size detection result is selected if the product is larger, and the large-size detection result is selected if the product is smaller), and the marked picture is then generated from the selected target detection result.

In the embodiment of the invention, the marked picture can be analyzed to obtain the defect type marked by each marking frame and the defect probability value corresponding to the defect type.
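
A minimal sketch of this selection rule is given below. The two size limits, the unit of product size and the choice of the middle-size result for products between the limits are assumptions made for illustration; the text only states the rule for larger and smaller products.

```python
def select_detection_result(results, product_size, small_limit, large_limit):
    """Pick one of the multi-size detection results by product size.

    results: dict with keys "small", "medium", "large" (detection result sizes).
    Larger products use the small-size result (larger receptive field);
    smaller products use the large-size result (smaller receptive field).
    The medium fallback is an assumption of this sketch.
    """
    if product_size >= large_limit:
        return results["small"]
    if product_size <= small_limit:
        return results["large"]
    return results["medium"]


# Hypothetical detection results keyed by output size.
results = {"small": "result_small", "medium": "result_medium", "large": "result_large"}
print(select_detection_result(results, product_size=300, small_limit=50, large_limit=200))
print(select_detection_result(results, product_size=20, small_limit=50, large_limit=200))
```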

In the embodiment of the invention, if the defect probability value is larger than the preset defect threshold value, the product can be determined to have the defect with high probability, and the defect type corresponding to the defect probability value can be added into the product defect set of the target industrial product, so that relevant personnel can be informed of confirming and adjusting the defect type included in the product defect set in time.
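
The thresholding step can be sketched in a few lines. The example assumes that parsing the marked picture yields one (defect type, probability) pair per marking frame; the function name, the sample values and the threshold of 0.5 are illustrative.

```python
def collect_defects(marked_frames, defect_threshold):
    """Build the product defect set from parsed marking frames.

    marked_frames: iterable of (defect_type, probability) pairs, one per frame.
    A defect type is added only when its probability is strictly greater
    than the threshold.
    """
    defect_set = set()
    for defect_type, probability in marked_frames:
        if probability > defect_threshold:
            defect_set.add(defect_type)
    return defect_set


# Hypothetical parse result for one marked picture.
frames = [
    ("short circuit", 0.91),
    ("spur", 0.42),
    ("open circuit", 0.50),
    ("short circuit", 0.30),
]
print(collect_defects(frames, defect_threshold=0.5))
```

Using a set means a defect type reported by several marking frames appears only once in the product defect set.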

The following detailed description is directed to alternative embodiments.

In the embodiment of the invention, the lightweight production defect real-time detection method based on deep learning can be divided into two modules: one (1) is Stage structure. In an industrial environment, efficient image processing is generally required under the condition of limited computing and storage resources, and parameters and computing complexity of a network can be reduced through a designed Stage structure, so that efficiency and practicability of the whole system are improved: (2) another is DETECTNET network architecture. In the module, a backbone network is formed by means of Stage designed by the first module, the characteristics of the image are extracted, then Neck networks are used for fusing the extracted characteristics, and finally a Head network is used for detecting various defects possibly existing in the product, wherein the task of detecting the defects comprises deep analysis of the image, and any abnormal characteristics or modes are found. Defects may encompass multiple types, such as surface cracks, color anomalies, dimensional failure, and the like. The two tightly combined modules enable the industrial defect detection method based on machine vision to provide quality guarantee for industrial production, and improve efficiency.

In the embodiment of the invention, a Stage structure based on a lightweight network architecture is provided. The Stage structure is designed to follow the four design principles of lightweight networks, and the resulting new Stage structure effectively reduces the parameter count and the amount of computation of the whole network. A target detection algorithm based on the lightweight network is also provided: the image is processed with the lightweight Stage structure together with up-sampling and Concat operations, which yields a detection algorithm that is lightweight yet efficient, suits a variety of practical application scenarios, and provides a reliable solution for tasks such as industrial automation and object recognition.

Illustratively, a printed circuit board (PCB) defect dataset was collected, containing 693 images covering 6 defect types (missing hole, mouse bite, open circuit, short circuit, spur, and spurious copper). The dataset was randomly partitioned in proportion, e.g., into 560 training set pictures, 63 validation set pictures, and 70 test set pictures.
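The random partition described above can be sketched as follows. The function name `split_dataset`, the file names, and the fixed seed are assumptions made for illustration; the embodiment does not specify how the shuffle is performed.

```python
import random


def split_dataset(items, n_train, n_val, n_test, seed=0):
    """Shuffle items and cut them into train, validation and test subsets."""
    if n_train + n_val + n_test != len(items):
        raise ValueError("subset sizes must sum to the dataset size")
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    train = shuffled[:n_train]
    val = shuffled[n_train:n_train + n_val]
    test = shuffled[n_train + n_val:]
    return train, val, test


# Hypothetical file names standing in for the 693 collected PCB images.
images = [f"pcb_{i:04d}.jpg" for i in range(693)]
train, val, test = split_dataset(images, 560, 63, 70)
print(len(train), len(val), len(test))  # 560 63 70
```

The three subsets are disjoint and together cover all 693 images, so no picture is used for both training and testing.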

Fig. 7 is a schematic view of an alternative printed circuit board according to an embodiment of the present invention. As shown in fig. 7, the printed circuit board contains a defect.

In this embodiment, two models of the YOLOv5 algorithm (You Only Look Once version 5, a real-time target detection algorithm based on deep learning), namely the yolov5s.pt model (a small model of YOLOv5) and the yolov5x.pt model (a large model of YOLOv5), may be trained and tested using the collected printed circuit board defect dataset. The DetectNet model proposed in this embodiment is trained and tested in the same way, giving the test results shown in Table 2.

TABLE 2

As can be seen from Table 2, the detection accuracy of the DetectNet algorithm is close to that of the YOLOv5 algorithm, with no significant loss.

The model size and parameter count of the DetectNet algorithm and the YOLOv5 algorithm were also tested, and the results are shown in Table 3:

TABLE 3

Model	Layers	Parameters	GFLOPs
YOLOv5	607	87278019	217.9
DetectNet	308	3806435	8.1

Here, Layers denotes the number of layers in the network structure, Parameters denotes the number of parameters of the network structure, and GFLOPs denotes the number of floating-point operations, in billions.

As can be seen from Table 3, the DetectNet network structure has about half as many layers as the YOLOv5 network structure, and both its parameter count and its floating-point operation count are more than one order of magnitude lower (by factors of about 23 and 27, respectively). The amount of computation of the whole network is thus greatly reduced, achieving the goal of a lightweight model.
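The reduction factors implied by Table 3 can be checked with a few lines of arithmetic. The sketch below only restates the table values; the dictionary names are chosen here for illustration.

```python
import math

# Values copied from Table 3.
yolov5 = {"layers": 607, "parameters": 87278019, "gflops": 217.9}
detectnet = {"layers": 308, "parameters": 3806435, "gflops": 8.1}

# Ratio of the YOLOv5 value to the DetectNet value for each column.
ratios = {key: yolov5[key] / detectnet[key] for key in yolov5}
for key, ratio in ratios.items():
    print(f"{key}: {ratio:.1f}x smaller, {math.log10(ratio):.2f} orders of magnitude")
```

With the Table 3 values, the ratios come out at roughly 2x for the layer count, 23x for the parameter count, and 27x for the floating-point operation count.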

Therefore, the DetectNet network greatly reduces the scale of the network while keeping the accuracy almost unchanged, and can be deployed where hardware equipment resources are limited.

In the embodiment of the invention, the designed Stage structure reduces the parameter count and computational complexity of the network, which improves the efficiency and practicability of the whole system. The backbone network formed from the Stage structure extracts the features of the image, a Neck network then fuses the extracted features, and finally the Head network detects the various defects that may be present in the product. Detection can therefore be performed with limited hardware equipment resources, quality can be safeguarded in industrial production, and detection efficiency is effectively improved.

The following describes in detail another embodiment.

Example two

The defect detecting device for industrial products provided in this embodiment includes a plurality of implementation units, each of which corresponds to each implementation step in the first embodiment.

FIG. 8 is a schematic diagram of an alternative defect detection apparatus for industrial products according to an embodiment of the present invention. As shown in FIG. 8, the defect detection apparatus may include an acquisition unit 80, an output unit 81, a parsing unit 82, and an adding unit 83, wherein:

An acquisition unit 80 for acquiring a picture to be detected of a target industrial product;

The output unit 81 is configured to input a picture to be detected into a preset detection model, and output a marked picture, where the preset detection model is a detection model constructed based on a lightweight backbone network, and the marked picture has a plurality of marked frames;

The parsing unit 82 is configured to parse the marking pictures to obtain a defect type marked by each marking frame and a defect probability value corresponding to the defect type;

An adding unit 83, configured to add, in a case where the defect probability value is greater than the preset defect threshold, a defect type corresponding to the defect probability value to the product defect set of the target industrial product.

In the above defect detection device, the acquisition unit 80 may acquire the picture to be detected of the target industrial product; the output unit 81 may input the picture to be detected into the preset detection model and output the marked picture; the parsing unit 82 may parse the marked picture to obtain the defect type marked by each marking frame and the defect probability value corresponding to the defect type; and the adding unit 83 may add the defect type corresponding to the defect probability value to the product defect set of the target industrial product when the defect probability value is greater than the preset defect threshold. In the embodiment of the invention, the picture to be detected of the industrial product can be input into the preset detection model constructed based on the lightweight backbone network. If the defect probability value of any marking frame on the output marked picture is greater than the preset defect threshold, the defect type of that marking frame is added to the product defect set of the industrial product, so that relevant personnel are promptly reminded that the industrial product may have the defect types in the product defect set. Because the preset detection model is constructed from a lightweight backbone network, it can be deployed on terminal equipment with limited equipment resources, which enables real-time defect detection of industrial products and solves the technical problem in the related art that defect detection of industrial products cannot be carried out with limited equipment resources.

Optionally, the defect detecting device further includes: a first construction module, used for constructing a lightweight module before the picture to be detected is input into the preset detection model and the marked picture is output, wherein the lightweight module comprises a plurality of preset modules; a second construction module, used for constructing a lightweight backbone network based on the lightweight module; a third construction module, used for constructing an initial detection model based on the lightweight backbone network; and a first training module, used for training the initial detection model with the historical data set to obtain the preset detection model.

Optionally, the first construction module includes: a first construction submodule, used for constructing a first channel branch structure and a second channel branch structure, wherein the first channel branch structure comprises two preset convolutions and one depth separable convolution, and the second channel branch structure comprises two preset convolutions and two depth separable convolutions; the first channel branch structure is used for processing the received feature map and outputting a first initial feature representation, and the second channel branch structure is used for processing the received feature map and outputting a second initial feature representation; a second construction submodule, used for constructing a preset module based on the first channel branch structure and the second channel branch structure, wherein the preset module at least comprises a separation module and a channel merging module; the separation module is used for dividing the channels of the received feature map evenly to obtain a first channel feature map and a second channel feature map, transmitting the first channel feature map to the first channel branch structure, and transmitting the second channel feature map to the second channel branch structure; the channel merging module is used for merging the channels of the first initial feature representation and the second initial feature representation to obtain a merged feature representation; and a third construction submodule, used for constructing the lightweight module based on the preset module, wherein the lightweight module is used for outputting a preset feature representation, and the preset feature representation is the merged feature representation output by the last preset module.
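The data flow through one preset module (separation, two branches, channel merging) can be sketched in plain Python as follows. A feature map is modelled as a list of 2D channels. The `scale_by` branches are placeholders assumed here purely to make the flow visible; they stand in for the convolution stacks of the two channel branch structures and do not implement the preset or depth separable convolutions.

```python
def split_channels(feature_map):
    """Divide the channels of a feature map evenly into two halves."""
    n = len(feature_map)
    if n % 2 != 0:
        raise ValueError("channel count must be even to split evenly")
    return feature_map[: n // 2], feature_map[n // 2:]


def merge_channels(first, second):
    """Concatenate two feature representations along the channel axis."""
    return first + second


def preset_module(feature_map, first_branch, second_branch):
    """Split, process each half in its own branch, then merge the results."""
    first_half, second_half = split_channels(feature_map)
    return merge_channels(first_branch(first_half), second_branch(second_half))


def scale_by(factor):
    """Placeholder branch: stands in for the convolution stack of a branch."""
    def branch(channels):
        return [[[v * factor for v in row] for row in ch] for ch in channels]
    return branch


# A feature map with 4 channels of 2x2 values: channel c is filled with c.
fmap = [[[float(c)] * 2 for _ in range(2)] for c in range(4)]
out = preset_module(fmap, scale_by(2.0), scale_by(-1.0))
print(len(out), [ch[0][0] for ch in out])
```

Each branch only sees half of the input channels, which is one reason a split-and-merge design needs fewer parameters than a single branch that processes all channels.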

Optionally, the second construction module includes: a fourth construction submodule, configured to construct a convolution pooling structure and a preset normalization structure, and to construct a preset operation structure based on the preset normalization structure; a first determining submodule, used for determining the preset number of lightweight modules; and a fifth construction submodule, configured to construct the lightweight backbone network based on the convolution pooling structure, the preset number of lightweight modules, and the preset operation structure, where the lightweight backbone network is used to extract abstract feature representations of the picture to be detected.

Optionally, the third construction module includes: a sixth construction submodule, configured to construct a feature fusion network, where the feature fusion network at least includes an up-sampling module and a channel merging module; the up-sampling module is used for enlarging the received reduced feature map to obtain an enlarged feature map; the channel merging module is used for merging the channels of the enlarged feature map and the feature map indicated by the abstract feature representation; the reduced feature map is a feature map obtained by processing the feature map indicated by the abstract feature representation through the preset normalization structure; and the feature fusion network is used for outputting a target feature map; a seventh construction submodule, configured to construct a classification network, where the classification network is configured to classify the target feature map and output detection results of different sizes, and the detection results include: marking frames marked on the input picture and defect information marked on each marking frame, wherein the defect information comprises a defect type and a defect probability value; and an eighth construction submodule, configured to construct an initial detection model based on the lightweight backbone network, the feature fusion network, and the classification network.
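The up-sampling and channel merging performed by the feature fusion network can be illustrated with a small pure-Python sketch. Nearest-neighbour interpolation with a factor of 2 is an assumption made here; the embodiment does not state which up-sampling method or factor is used, and the function names are hypothetical.

```python
def upsample_nearest(channel, factor=2):
    """Enlarge one 2D channel by repeating each value factor times per axis."""
    enlarged = []
    for row in channel:
        wide_row = [v for v in row for _ in range(factor)]
        enlarged.extend(list(wide_row) for _ in range(factor))
    return enlarged


def fuse(reduced_map, backbone_map, factor=2):
    """Upsample the reduced map, then concatenate it with the backbone map."""
    enlarged = [upsample_nearest(ch, factor) for ch in reduced_map]
    for ch in enlarged:
        if len(ch) != len(backbone_map[0]) or len(ch[0]) != len(backbone_map[0][0]):
            raise ValueError("spatial sizes must match before channel merging")
    return enlarged + backbone_map


reduced = [[[1, 2], [3, 4]]]                                 # 1 channel, 2x2
backbone = [[[0] * 4 for _ in range(4)] for _ in range(2)]   # 2 channels, 4x4
fused = fuse(reduced, backbone)
print(len(fused), fused[0])
```

The merged result has three channels of size 4x4: the enlarged channel followed by the two backbone channels. Channel merging requires equal spatial sizes, which is why the reduced map is enlarged first.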

Optionally, the first training module includes: a first collection submodule, used for collecting a historical data set of a product to be detected, wherein the historical data set comprises a plurality of product pictures and the defect types corresponding to the product pictures; a first dividing submodule, used for dividing the historical data set into a training set, a verification set, and a test set; a first training submodule, used for training the initial detection model with the training set to obtain a trained initial detection model; a first verification submodule, used for verifying the trained initial detection model with the verification set to obtain a verification result; a first testing submodule, used for testing the trained initial detection model with the test set when the verification result indicates that the verification is passed, so as to obtain a test result, wherein the test result comprises a preset index value; and a second determining submodule, used for determining that training of the initial detection model is completed when the preset index value falls within the preset index threshold range, and obtaining the preset detection model.
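The train, verify, test, and accept sequence can be sketched as a small control-flow function. All names below (`train_and_accept`, `train_fn`, `validate_fn`, `test_fn`, `index_range`) are hypothetical, and returning `None` when verification fails or the index value is out of range is an assumption: the embodiment does not state what happens in those cases.

```python
def train_and_accept(model, train_set, val_set, test_set,
                     train_fn, validate_fn, test_fn, index_range):
    """Train, verify, test, and accept the model only if the index is in range."""
    trained = train_fn(model, train_set)
    if not validate_fn(trained, val_set):
        return None  # verification failed, so the model is not tested
    index_value = test_fn(trained, test_set)
    low, high = index_range
    if low <= index_value <= high:
        return trained  # accepted as the preset detection model
    return None


# Stub callables standing in for real training, verification and test code.
accepted = train_and_accept(
    model={"weights": 0},
    train_set=[1, 2, 3], val_set=[4], test_set=[5],
    train_fn=lambda m, data: {"weights": len(data)},
    validate_fn=lambda m, data: m["weights"] > 0,
    test_fn=lambda m, data: 0.95,
    index_range=(0.9, 1.0),
)
print(accepted)
```

The test set is only consulted after verification passes, which mirrors the order of the submodules described above.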

Optionally, the output unit includes: the first determining module is used for determining a target detection result corresponding to the target size from detection results of different sizes output by the classification network by adopting a preset detection model based on the size information of the target industrial product; the first generation module is used for generating a mark picture based on the target detection result.

The defect detecting device may further include a processor and a memory, wherein the acquisition unit 80, the output unit 81, the parsing unit 82, the adding unit 83, and the like are stored in the memory as program units, and the processor executes the program units stored in the memory to realize the corresponding functions.

The processor includes a kernel, and the kernel fetches the corresponding program unit from the memory. One or more kernels may be provided. By adjusting the kernel parameters, the defect type corresponding to a defect probability value is added to the product defect set of the target industrial product when the defect probability value is greater than the preset defect threshold.

The memory may include volatile memory in a computer-readable medium, such as random access memory (RAM), and/or nonvolatile memory, such as read-only memory (ROM) or flash memory (flash RAM); the memory includes at least one memory chip.

The invention also provides a computer program product adapted to perform, when executed on a data processing device, a program initialized with the method steps of: obtaining a picture to be detected of a target industrial product, inputting the picture to be detected into a preset detection model, outputting a marking picture, analyzing the marking picture to obtain a defect type marked by each marking frame and a defect probability value corresponding to the defect type, and adding the defect type corresponding to the defect probability value into a product defect set of the target industrial product under the condition that the defect probability value is larger than a preset defect threshold.

According to another aspect of the embodiments of the present invention, there is also provided a computer readable storage medium, including a stored computer program, where the computer program is executed to control a device in which the computer readable storage medium is located to perform the above-described defect detection method for an industrial product.

According to another aspect of the embodiments of the present invention, there is also provided an electronic device including one or more processors and a memory for storing one or more programs, wherein the one or more programs, when executed by the one or more processors, cause the one or more processors to implement the method for detecting defects of industrial products described above.

Fig. 9 is a block diagram of a hardware configuration of an electronic device (or mobile device) for a defect detection method of an industrial product according to an embodiment of the present invention. As shown in fig. 9, the electronic device may include one or more processors 902 (shown in fig. 9 as 902a, 902b, ..., 902n), which may include, but are not limited to, a processing device such as a microprocessor (MCU) or a programmable logic device (FPGA), and a memory 904 for storing data. In addition, the electronic device may further include: a display, an input/output interface (I/O interface), a Universal Serial Bus (USB) port (which may be included as one of the ports of the I/O interface), a network interface, a keyboard, a power supply, and/or a camera. It will be appreciated by those skilled in the art that the configuration shown in fig. 9 is merely illustrative and is not intended to limit the configuration of the electronic device. For example, the electronic device may also include more or fewer components than shown in fig. 9, or have a different configuration than shown in fig. 9.

The foregoing embodiment numbers of the present invention are merely for the purpose of description, and do not represent the advantages or disadvantages of the embodiments.

In the foregoing embodiments of the present invention, the description of each embodiment has its own emphasis; for a part that is not described in detail in one embodiment, reference may be made to the related descriptions of the other embodiments.

In the several embodiments provided in the present invention, it should be understood that the disclosed technology may be implemented in other manners. The above-described embodiments of the apparatus are merely exemplary, and the division of the units, for example, may be a logic function division, and may be implemented in another manner, for example, a plurality of units or components may be combined or may be integrated into another system, or some features may be omitted, or not performed. Alternatively, the coupling or direct coupling or communication connection shown or discussed with each other may be through some interfaces, units or modules, or may be in electrical or other forms.

The units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution of this embodiment.

In addition, each functional unit in the embodiments of the present invention may be integrated in one processing unit, or each unit may exist alone physically, or two or more units may be integrated in one unit. The integrated units may be implemented in hardware or in software functional units.

The integrated units, if implemented in the form of software functional units and sold or used as stand-alone products, may be stored in a computer readable storage medium. Based on such understanding, the technical solution of the present invention may be embodied, in essence or in whole or in part, in the form of a software product stored in a storage medium, including instructions for causing a computer device (which may be a personal computer, a server, a network device, etc.) to perform all or part of the steps of the method according to the embodiments of the present invention. The aforementioned storage medium includes: a USB flash drive, a read-only memory (ROM), a random access memory (RAM), a removable hard disk, a magnetic disk, an optical disk, or various other media capable of storing program code.

The foregoing is merely a preferred embodiment of the present invention and it should be noted that modifications and adaptations to those skilled in the art may be made without departing from the principles of the present invention, which are intended to be comprehended within the scope of the present invention.

Claims

1. A method for detecting defects in an industrial product, comprising:

Obtaining a picture to be detected of a target industrial product;

Inputting the picture to be detected into a preset detection model, and outputting a marked picture, wherein the preset detection model is a detection model constructed based on a lightweight backbone network, and a plurality of marked frames are arranged on the marked picture;

analyzing the marking pictures to obtain the defect type marked by each marking frame and a defect probability value corresponding to the defect type;

and adding the defect type corresponding to the defect probability value into a product defect set of the target industrial product under the condition that the defect probability value is larger than a preset defect threshold value.

2. The defect detection method according to claim 1, further comprising, before inputting the picture to be detected to the preset detection model and outputting a mark picture:

Constructing a lightweight module, wherein the lightweight module comprises: a plurality of preset modules;

constructing a lightweight backbone network based on the lightweight module;

constructing an initial detection model based on the lightweight backbone network;

and training the initial detection model by adopting a historical data set to obtain a preset detection model.

3. The defect detection method of claim 2, wherein the step of constructing a lightweight module comprises:

constructing a first channel branch structure and a second channel branch structure, wherein the first channel branch structure comprises: two preset convolutions and a depth separable convolution, the second channel branch structure comprising: two of the preset convolutions and two of the depth separable convolutions; the first channel branch structure is used for processing the received feature map and outputting a first initial feature representation, and the second channel branch structure is used for processing the received feature map and outputting a second initial feature representation;

Based on the first channel branch structure and the second channel branch structure, the preset module is constructed, wherein the preset module at least comprises: a separation module and a channel combination module; the separation module is used for equally dividing the channel number of the received feature images to obtain a first channel feature image and a second channel feature image, transmitting the first channel feature image to the first channel branch structure and transmitting the second channel feature image to the second channel branch structure; the channel merging module is used for merging the channels of the first initial characteristic representation and the second initial characteristic representation to obtain a merged characteristic representation;

and constructing the lightweight module based on the preset module, wherein the lightweight module is used for outputting a preset feature representation, and the preset feature representation is the combined feature representation output by the last preset module.

4. The defect detection method of claim 2, wherein the step of constructing a lightweight backbone network based on the lightweight module comprises:

Constructing a convolution pooling structure and a preset normalization structure, and constructing a preset operation structure based on the preset normalization structure;

Determining a preset number of the lightweight modules;

And constructing the lightweight backbone network based on the convolution pooling structure, the preset number of lightweight modules and the preset operation structure, wherein the lightweight backbone network is used for extracting abstract feature representations of the pictures to be detected.

5. The defect detection method of claim 2, wherein the step of constructing an initial detection model based on the lightweight backbone network comprises:

Constructing a feature fusion network, wherein the feature fusion network at least comprises: the device comprises an up-sampling module and a channel merging module, wherein the up-sampling module is used for amplifying a received reduced feature map to obtain an amplified feature map; the channel merging module is used for merging the channels of the enlarged feature map and the feature map indicated by the abstract feature representation, the reduced feature map is a feature map obtained by processing the feature map indicated by the abstract feature representation through a preset normalization structure, and the feature fusion network is used for outputting a target feature map;

Constructing a classification network, wherein the classification network is used for classifying the target feature images and outputting detection results with different sizes, and the detection results comprise: the marking frame marking the input picture and the defect information marked for each marking frame, wherein the defect information comprises: the defect type and the defect probability value;

and constructing the initial detection model based on the lightweight backbone network, the feature fusion network and the classification network.

6. The defect detection method of claim 2, wherein the step of training the initial detection model with a set of historical data to obtain a preset detection model comprises:

Collecting the historical data set of the product to be detected, wherein the historical data set comprises: a plurality of product pictures and the defect types corresponding to the product pictures;

Dividing the historical data set into a training set, a verification set and a test set;

Training the initial detection model by adopting the training set to obtain the trained initial detection model;

adopting the verification set to verify the initial detection model after training to obtain a verification result;

and under the condition that the verification result indicates that verification is passed, testing the trained initial detection model by adopting the test set to obtain a test result, wherein the test result comprises: a preset index value;

and under the condition that the preset index value belongs to a preset index threshold range, determining that training of the initial detection model is completed, and obtaining the preset detection model.

7. The defect detection method according to claim 1, wherein the step of inputting the picture to be detected to the preset detection model and outputting a mark picture comprises:

Determining a target detection result corresponding to the target size from detection results of different sizes output by a classification network by adopting the preset detection model based on the size information of the target industrial product;

and generating the marked picture based on the target detection result.

8. A defect detection device for an industrial product, comprising:

The acquisition unit is used for acquiring a picture to be detected of the target industrial product;

the output unit is used for inputting the picture to be detected into a preset detection model and outputting a marked picture, wherein the preset detection model is a detection model constructed based on a lightweight backbone network, and the marked picture is provided with a plurality of marked frames;

The analyzing unit is used for analyzing the marking pictures to obtain the defect type marked by each marking frame and the defect probability value corresponding to the defect type;

and the adding unit is used for adding the defect type corresponding to the defect probability value into the product defect set of the target industrial product under the condition that the defect probability value is larger than a preset defect threshold value.

9. A computer readable storage medium, characterized in that the computer readable storage medium comprises a stored computer program, wherein the computer program, when run, controls a device in which the computer readable storage medium is located to perform the defect detection method of the industrial product according to any one of claims 1 to 7.

10. An electronic device comprising one or more processors and a memory for storing one or more programs, wherein the one or more programs, when executed by the one or more processors, cause the one or more processors to implement the method of defect detection for an industrial product of any of claims 1-7.