CN110555839A

CN110555839A - Defect detection and identification method and device, computer equipment and storage medium

Info

Publication number: CN110555839A
Application number: CN201910843972.XA
Authority: CN
Inventors: 高斌斌; 高立钊; 贾佳亚; 戴宇荣; 沈小勇
Original assignee: Tencent Cloud Computing Beijing Co Ltd
Current assignee: Tencent Cloud Computing Beijing Co Ltd
Priority date: 2019-09-06
Filing date: 2019-09-06
Publication date: 2019-12-10

Abstract

The invention discloses a defect detection and identification method, a defect detection and identification device, computer equipment and a medium, and belongs to the field of computer vision detection and identification. The method comprises the steps of obtaining a mask image by segmenting the background and the foreground in a target product image, positioning a defect target in the target product image according to the spatial position distribution and the number of connected domains in the mask image of the target product image, and further identifying a target product image block corresponding to a target positioning frame.

Description

Defect detection and identification method and device, computer equipment and storage medium

Technical Field

The present invention relates to the field of computer vision inspection, and in particular, to a method and an apparatus for defect detection and identification, a computer device, and a computer-readable storage medium.

Background

the defect detection and identification are widely applied to the fields of industrial production, manufacturing, quality monitoring and the like, such as liquid crystal panel defect identification, workpiece surface quality detection, cloth surface flaw identification, aerospace equipment quality detection and the like. Through defect detection, defects on the surface of a product can be found, and can be corrected in time by maintenance personnel to ensure the quality of the product, however, in order to accurately judge whether the quality of the product is qualified or not, select which process to maintain and the like, after a target product image suspected to include the defects of the product is obtained, the image needs to be carefully analyzed and finely identified. Therefore, a defect detection and identification method is needed to realize automatic positioning of the surface defects of the product and intelligent identification of the defect types.

the current defect detection and identification methods mainly comprise three types: firstly, zooming an original target product image of a product to a fixed size, then identifying the defect type in the original target product image by using a Convolutional Neural Network (CNN), sampling the original target product image by using a sliding window mode to obtain target product image blocks in the actual implementation process, then performing defect positioning on each target product image block by using the CNN, and realizing the identification of the defect type; secondly, firstly, a cascade detection network is constructed by using a target detection algorithm such as a Single-shot Detector (SSD), uniform real-time target detection (You Only Look one, YOLO) and the like to locate the defect of the target product image block, and then the CNN is used for identifying the defect type of the located target product image block; thirdly, designing a system structure of a cascade self-encoder so as to divide the acquired original target product image on the surface of the product to obtain a mask image of the target product image and further obtain a minimum peripheral boundary frame of the target product image, thereby realizing defect positioning, and finally sending the positioned target product image block into a CNN (compressed natural number network) to realize accurate identification of the defect type.

In the process of implementing the invention, the inventor finds that the prior art has at least the following problems:

In the first method, the original image is input when the target product image is sampled in a sliding window mode, accurate identification is difficult to achieve when the defect is too small, and the defect is positioned in the sliding window mode, so that the position of the defect can be only roughly obtained, and the defect positioning is inaccurate; in the second method, the method for constructing the cascade detection network by the target detection algorithm is difficult to realize accurate segmentation of the defect boundary and shape, and influences accurate positioning of the core defect position; in the third method, when the mask image of the target product image has noise or scattered distribution, the result obtained by using the positioning method based on the minimum peripheral frame of the mask is difficult to accurately express the position information of the defect, and further, the performance of identifying the defect type is affected, so that the defect type identification result is inaccurate.

Disclosure of Invention

The embodiment of the invention provides a defect detection and identification method, a defect detection and identification device, computer equipment and a computer readable storage medium, which can solve the problems of inaccurate defect positioning and poor defect type identification precision in the related technology. The technical scheme is as follows:

in one aspect, a defect detection and identification method is provided, and the method includes:

Acquiring a mask image of the target product image based on the target product image;

According to the spatial position distribution and the number of the connected domains in the mask image of the target product image, determining a target positioning frame in the target product image, wherein the connected domains and the image background contained in the target positioning frame meet target conditions;

and identifying the target product image block corresponding to the target positioning frame in the target product image.

In one aspect, a defect detection and identification apparatus is provided, the apparatus including:

the segmentation module is used for acquiring a mask image of the target product image based on the target product image;

The positioning module is used for determining a target positioning frame in the target product image according to the spatial position distribution and the number of the connected domains in the mask image of the target product image;

And the identification module is used for identifying the target product image block corresponding to the target positioning frame in the target product image.

In one possible implementation, the positioning module is further configured to:

when only one connected domain exists in the mask image of the target product image, determining the positioning frame of the connected domain as the target positioning frame, wherein the first connected domain is the largest connected domain in the mask image;

And when two or more connected domains exist in the mask image of the target product image, determining the target positioning frame according to the area ratio of the combined frame and the combined mask ratio.

when the area ratio of the merging frame meets a first value range or the area ratio of the merging mask meets a second value range, determining the positioning frame of the first connected domain as a target positioning frame;

and when the area ratio of the merging frame does not satisfy the first value range and the area ratio of the merging mask does not satisfy the second value range, determining a positioning frame which is positioned in the positioning frame of the first merging domain and contains the positioning frame of the first connected domain in the mask map as the target positioning frame, wherein the first merging domain is obtained by merging all the connected domains.

In a possible implementation manner, the location module is further configured to determine a location box of a first connected domain in the mask map as an initial location box;

the positioning module is further configured to determine a positioning frame of a second merged domain based on a nearest connected domain of the first connected domain, where the second merged domain includes the first merged domain and the nearest connected domain of the first connected domain;

A calculation module, configured to calculate the merge frame area ratio and the merge mask ratio of the second merge domain;

The positioning module is further configured to determine the enlarged positioning frame as the target positioning frame when the area ratio of the merging frame satisfies a first value range or the area ratio of the merging mask satisfies a second value range.

in one possible implementation, the apparatus further includes:

The extraction module is used for extracting a characteristic diagram of the target product image through a convolutional neural network of the defect detection model;

the pyramid module is used for inputting the feature map into a spatial pyramid module of the defect detection model to obtain feature maps with different particle sizes of the target product image;

The upsampling module is used for performing upsampling processing on the feature maps with different granularities through the spatial pyramid module to obtain a final feature map;

And the segmentation mask extraction module is used for extracting the defect mask by using the convolution layer of 1x1 based on the final feature map to obtain a mask map of the target product image.

in one possible implementation, the apparatus further includes:

The intercepting module is used for intercepting a square target product image block by taking the longest edge of the target positioning frame as the side length based on the center of the target positioning frame;

The identification module is also used for identifying the square target product image block.

in one aspect, a computer device is provided that includes one or more processors and one or more memories having at least one program code stored therein, the program code being loaded and executed by the one or more processors to perform the operations performed by the defect detection and identification method.

in one aspect, a computer-readable storage medium having at least one program code stored therein is provided, the program code being loaded into and executed by a processor to implement the operations performed by the defect detection and identification method.

The method comprises the steps of obtaining a mask image by segmenting the background and the foreground in a target product image, positioning a defect target frame in the target product image according to the spatial position distribution and the number of the connected domains in the mask image of the target product image, and further identifying a target product image block corresponding to the target positioning frame.

drawings

In order to more clearly illustrate the technical solutions in the embodiments of the present invention, the drawings needed to be used in the description of the embodiments will be briefly introduced below, and it is obvious that the drawings in the following description are only some embodiments of the present invention, and it is obvious for those skilled in the art to obtain other drawings based on these drawings without creative efforts.

FIG. 1 is a schematic diagram of an implementation environment of a defect detection and identification method according to an embodiment of the present invention;

FIG. 2 is a schematic structural diagram of a defect detection and identification model according to an embodiment of the present invention;

FIG. 3 is a flowchart of a defect detection and identification method according to an embodiment of the present invention;

Fig. 4 is a schematic structural diagram of a spatial pyramid module according to an embodiment of the present invention;

FIG. 5 is a diagram illustrating a connected component merging according to an embodiment of the present invention;

FIG. 6 is a schematic diagram of a positioning result of a target positioning frame according to an embodiment of the present invention;

FIG. 7 is a schematic diagram of a defect detection and identification apparatus according to an embodiment of the present invention;

Fig. 8 is a schematic structural diagram of a computer device according to an embodiment of the present invention.

Detailed Description

in order to make the objects, technical solutions and advantages of the present invention more apparent, embodiments of the present invention will be described in detail with reference to the accompanying drawings.

artificial Intelligence (AI) is a theory, method, technique and application system that uses a digital computer or a machine controlled by a digital computer to simulate, extend and expand human Intelligence, perceive the environment, acquire knowledge and use the knowledge to obtain the best results. In other words, artificial intelligence is a comprehensive technique of computer science that attempts to understand the essence of intelligence and produce a new intelligent machine that can react in a manner similar to human intelligence. Artificial intelligence is the research of the design principle and the realization method of various intelligent machines, so that the machines have the functions of perception, reasoning and decision making.

the artificial intelligence technology is a comprehensive subject and relates to the field of extensive technology, namely the technology of a hardware level and the technology of a software level. The artificial intelligence infrastructure generally includes technologies such as sensors, dedicated artificial intelligence chips, cloud computing, distributed storage, big data processing technologies, operation/interaction systems, mechatronics, and the like. The artificial intelligence software technology mainly comprises a computer vision technology, a voice processing technology, a natural language processing technology, machine learning/deep learning and the like.

computer Vision technology (CV) Computer Vision is a science for researching how to make a machine "see", and further refers to that a camera and a Computer are used to replace human eyes to perform machine Vision such as identification, tracking and measurement on a target, and further image processing is performed, so that the Computer processing becomes an image more suitable for human eyes to observe or transmitted to an instrument to detect. As a scientific discipline, computer vision research-related theories and techniques attempt to build artificial intelligence systems that can capture information from images or multidimensional data. Computer vision technologies generally include image processing, image Recognition, image semantic understanding, image retrieval, Optical Character Recognition (OCR), video processing, video semantic understanding, video content/behavior Recognition, Three-Dimensional object reconstruction, Three-Dimensional graphics (3D) technology, virtual reality, augmented reality, synchronous positioning, map construction, and other technologies, and also include common biometric technologies such as face Recognition and fingerprint Recognition.

In brief, in the image semantic division, a computer performs division according to the semantics of an image, and in the image field, the semantics refers to the content of the image, and the division means that different objects in the image are divided from the perspective of pixels, and each pixel in the original image is identified.

Machine Learning (ML) is a multi-domain cross discipline, and relates to a plurality of disciplines such as probability theory, statistics, approximation theory, convex analysis, algorithm complexity theory and the like. The special research on how a computer simulates or realizes the learning behavior of human beings so as to acquire new knowledge or skills and reorganize the existing knowledge structure to continuously improve the performance of the computer. Machine learning is the core of artificial intelligence, is the fundamental approach for computers to have intelligence, and is applied to all fields of artificial intelligence. Machine learning and deep learning generally include techniques such as artificial neural networks, belief networks, reinforcement learning, transfer learning, inductive learning, and formal education learning.

The CNN is a feedforward neural network, and its artificial neurons can respond to surrounding units in a part of coverage range, and has excellent performance for large-scale image processing, and specifically includes convolutional layers, pooling layers, regularization layers, random inactivation layers, activation function layers, and the like.

With the research and progress of artificial intelligence technology, the artificial intelligence technology is developed and applied in a plurality of fields, such as common smart homes, smart wearable devices, virtual assistants, smart speakers, smart marketing, unmanned driving, automatic driving, unmanned aerial vehicles, robots, smart medical care, smart customer service, and the like.

The defect detection generally refers to the detection of the surface defects of the article, and the surface defect detection is to detect the defects of spots, pits, scratches, color differences, defects and the like on the surface of a workpiece by adopting an advanced computer vision detection technology.

The scheme provided by the embodiment of the invention relates to the technologies of machine learning, computer vision technology and the like of artificial intelligence, and is specifically explained by the following embodiments:

fig. 1 is a schematic diagram of an implementation environment of a defect detection and identification method provided in an embodiment of the present invention, and referring to fig. 1, the implementation environment includes: a computer device 101.

computer device 101 may be at least one of a desktop Graphics Processing Unit (GPU) computer, a GPU computing cluster, a neural network computer, and the like. Related technicians can use the computer device 101 to process product images, find out defective products and ensure the product quality. The computer device 101 may process an image input therein, illustratively, the computer device 101 is connected to the camera assembly to automatically acquire and process the image, and a related technician may input the image into the computer device to further process the image, and the image acquisition mode is not limited in the present invention. Optionally, the computer device 101 may further have at least one image database, such as a defect type database, a defect image database, or the like, for storing possible defect types and captured defect images.

fig. 2 is a schematic structural diagram of a defect detection model according to an embodiment of the present invention, and referring to fig. 2, the defect detection identification model sequentially includes a CNN, a spatial pyramid module, a segmentation processing layer, defect location, and defect identification. The computer equipment can send the target product image input into the CNN for processing to obtain the feature map of the target product image, the feature map of the target product image is used as the input of a spatial pyramid module, the feature maps with different granularities are obtained by utilizing a plurality of levels of pooling kernels in the spatial pyramid module, the dimension reduction processing of the feature maps with different granularities is realized by utilizing the convolution layer in the spatial pyramid module, then using bilinear interpolation to carry out up-sampling processing on the feature maps with different granularities after dimension reduction, connecting the feature maps subjected to the up-sampling processing in series to be used as the output of the space pyramid module to obtain a final feature map, and processing the final characteristic diagram by using a segmentation processing layer to obtain a mask diagram of the target product image, and performing defect positioning and defect type identification based on the mask diagram of the target product image.

fig. 3 is a flowchart of a defect detection and identification method according to an embodiment of the present invention, and referring to fig. 3, the method includes:

301. the computer device obtains a target product image.

It should be noted that the computer device may acquire the image of the target product through the camera assembly connected to the computer device, or a related technician may input the image of the target product into the computer device, and the specific manner of acquiring the image of the target product is not limited in the embodiments of the present invention.

302. and the computer equipment acquires a feature map of the target product image based on the target product image.

In one possible implementation, the computer device may perform feature extraction on the target product image through the feature extraction layer, the feature extraction layer may include a plurality of feature extraction layers, each of which may be provided with a corresponding weight matrix, so that the computer device can slide over the target product image based on the sliding window of the feature extraction layer, to obtain the sub-image to be processed, to multiply the pixel value of the sub-image with the weight matrix, thereby obtaining the value of a feature point, outputting a feature map of a feature extraction layer through a plurality of sliding of a sliding window, taking the feature map as an input map of the next feature extraction layer, continuously extracting features, and by analogy, the feature diagram output by the last feature extraction layer in the feature extraction model is used as the feature diagram of the target product image. The above-described process is merely an illustration of one possible implementation manner of the feature extraction process, and is not used to limit the feature extraction method adopted in the embodiment of the present invention.

in step 302, multiple layers of feature extraction layers may be used to implement feature extraction, and more layers of networks may iteratively extract more complex features from the low-level features.

303. And the computer equipment inputs the characteristic diagram into a space pyramid module to obtain the characteristic diagrams of different granularities of the target product image.

It should be noted that the main purpose of the spatial pyramid module is to integrate context information of different levels to enrich the feature representation of the image. Fig. 4 is a schematic diagram of a specific structure of a spatial pyramid module according to an embodiment of the present invention, where the schematic diagram is a part of a defect detection model used in the mask prediction process in fig. 2, and referring to fig. 4, the spatial pyramid module uses a plurality of hierarchical pooling kernels, and can obtain feature maps with different granularities.

The context information may be some or all of the information that can affect the objects in the scene and the image, and the context information is not directly obtained from the appearance of the target, but is obtained from data in the neighborhood, the label of the target, the spatial position distribution of the target, or data statistics information. In an actual process, the target can be identified and processed by capturing the interaction information between different objects and taking the interaction information between the objects and the scene as a condition.

304. And the computer equipment acquires a final feature map based on the feature maps with different granularities.

In a possible implementation manner, the computer device performs dimensionality reduction on feature maps of different granularities by using a convolution layer to obtain a dimensionality reduced feature map, performs upsampling processing on the dimensionality reduced feature map by using a bilinear interpolation value, and finally connects the upsampled feature maps in series to obtain the output of the spatial pyramid module as a final feature map.

It should be noted that the final feature map is a final feature representation, and the final feature representation includes information of local and global contexts.

305. and the computer equipment acquires a mask image of the target product image based on the final feature image.

In one possible implementation, the computer device uses the final feature map as an input to a segmentation processing layer of a defect detection model to obtain a mask map of the target product image, the mask map being a pixel-level mask prediction result.

it should be noted that the defect detection model may be a neural network model based on a depth semantic segmentation algorithm, for example, the depth semantic segmentation algorithm may be a two-class semantic segmentation algorithm, and a mask map of the target product image may be obtained through the two-class semantic segmentation algorithm, so as to realize prediction of a pixel level of the target product image.

It should be noted that, the above steps 302 to 305 may also be replaced by other methods to predict the defect mask map, and the embodiment of the present invention does not limit which method is specifically adopted, for example, the template matching method may be used to predict the defect mask map.

306. The computer device detects the spatial position distribution and the number of the connected domains in the mask image of the target product image, executes step 307 when detecting that only one connected domain exists in the mask image of the target product image, and executes step 308 when detecting that two or more connected domains exist in the mask image of the target product image.

It should be noted that the computer device may determine, according to pixel values of different pixel points in the mask image of the target product image, pixel points having the same or similar pixel values and adjacent to each other in position, and further determine positions of connected domains, where the pixel points having different or less similar pixel values may form different connected domains, and the computer device may also perform statistics on spatial position distribution and number of the determined connected domains to obtain a positional relationship between the connected domains in the mask image of the target product image, an area of the connected domains, and the like.

In the embodiment of the present invention, it may be assumed that there are n connected regions in the mask map of the target product image, and the set of n connected regions may be denoted as C ═ { C ₁, C ₂, …, C _n }, where n may be any positive integer greater than or equal to 1.

it should be noted that there may be a case where there is no connected domain in the mask map of the target product image, i.e. C is an empty set, in which case the computer device may not perform the subsequent steps.

307. the computer device performs step 314 with the localization box of the connected domain as the target localization box.

In a possible implementation manner, when the computer device detects that there is only one connected domain in the mask image of the target product image, the location frame of the connected domain may maximally include all defects in the mask image of the target product image, and therefore, the computer device may define, in advance, a location frame of any size and at any position in the mask image of the target product image as a target location frame m, and may further update the position of the target location frame m to a location frame b ₁ of the connected domain, that is, m ← b ₁.

it should be noted that the positioning frame of any connected component c _i in the mask map of the target product image may be denoted as b _i, the positioning frame of the connected component may be denoted as b ₁, and the target positioning frame may be denoted as m.

308. The computer device determines a first connected domain from the two or more connected domains, and determines a positioning frame of the first connected domain as an initial target positioning frame, wherein the first connected domain is the largest connected domain in the mask map.

In one possible implementation, the computer device detects spatial position distribution and number of pixels included in each connected domain of the two or more connected domains, determines the area of each connected domain according to the spatial position and number of the pixels in each connected domain, further finds the largest connected domain in the mask map as a first connected domain, and determines a location frame of the first connected domain as an initial location frame

here, the area max _i area (c _i) may refer to a value of i when the area (c _i) reaches a maximum value, and the area (c _i) may indicate an area of the connected domain c _i.

it should be noted that, when the computer device detects that there are two or more connected domains in the mask image of the target product image, the core idea of the method for balancing and positioning defects between the positioning frames based on the first connected domain and all the connected domains provided in the embodiment of the present invention is to use the largest connected domain as the initial solution of the defect position, and continuously absorb the neighboring connected domains until reaching a certain balance, so that the method can make the area of the defect mask in the target positioning frame more balanced with the area of the background, and can more accurately realize defect positioning.

309. The computer device determines a second connected domain closest to the first connected domain and a positioning frame of the second connected domain based on the first connected domain.

In a possible implementation manner, the computer device may detect the center and the boundary of each connected component in the mask map, determine, according to the detected result, the connected component having the smallest distance to the center and the boundary of the first connected component in combination with the spatial position distribution information of the connected components, and determine the connected component as the second connected component.

Fig. 5 is a schematic diagram of merging connected domains according to an embodiment of the present invention, and referring to fig. 5, a target location frame m and a location frame b corresponding to a connected domain closest to the target location frame are respectively represented by rectangular frames as shown in fig. 5, and masks included in the rectangular frames are respectively circular and crescent-shaped. The circular mask area is a first communication domain, and the crescent mask area is a second communication domain.

310. the computer device determines a merge box area ratio and a merge mask area ratio of the first and second connected domains, the merge box area ratio being used to represent a ratio between the sum of the areas of the localization boxes of the two or more connected domains and the area of the localization box of the merged domain after the connected domains are merged. The merge mask ratio is used to indicate a ratio between the sum of the areas of the two or more connected components and the area of the alignment box of the merged domain after merging the connected components.

it should be noted that the merge box area ratio is defined as the ratio of the area of the target location box and the location box corresponding to the connected component closest to the target location box to the area of the location box of the merge component, i.e. area (uni)/area (clo), where area () may represent the area, the connected component closest to the target location box may represent _c, the corresponding location box may represent b, uni may represent the area where b merges with the target location box m, i.e. uni ← m £ b, and clo may represent the location boxes of m and b, i.e. clo ← [ m, b ].

It should be noted that the merged mask ratio is defined as the ratio of the area of the defect mask in the target location frame and the location frame corresponding to the connected domain closest to the target location frame to the area of the location frame in the merged domain, i.e. mask (close)/area (close), where mask () may represent the area of the defect mask, area () may represent the area, the connected domain closest to the target location frame may represent c, the corresponding location frame may represent b, and close may represent the location frames of m and b.

Referring to the connected domain merging diagram shown in fig. 5, the merging box area ratio is the ratio of the area sum of the boxes m and b to the area of the positioning box in the merging domain, and the merging mask ratio is the ratio of the sum of the areas of the circular and crescent masks to the area of the positioning box in the merging domain.

It should be noted that the computer device may determine the target location box by using the defect location policy based on the spatial distribution of the defect mask map according to the calculated values of the area ratio of the merged box and the merged mask ratio, where the specific implementation method is as shown in the following steps 311 to 313:

311. When the computer device detects that the area occupation ratio of the merged frame meets the first value range, or the merged mask occupation ratio meets the second value range, step 312 is executed, otherwise, step 313 is executed.

It should be noted that two thresholds, which are respectively denoted as a merging box area ratio threshold τ ₁ and a merging mask ratio threshold τ ₂, may be preset by the computer device, where the merging box area ratio satisfies a first value range, and may be that the merging box area ratio is smaller than the merging box area ratio threshold τ ₁, and the merging mask ratio satisfies a second value range, and may be that the merging mask ratio is smaller than the merging mask ratio threshold τ ₂.

312. The computer device determines the location box of the first connected domain as the target location box, and performs step 314.

it should be noted that there may be a special case where the merged frame area ratio and the merged mask ratio are both 1, and when the computer device detects that the merged frame area ratio and the merged mask ratio are both 1, it may be determined that only one connected domain, i.e. the first connected domain, is included in the mask map, and therefore, the computer device may directly determine the location box of the first connected domain as the target location box. Fig. 6 is a schematic diagram of a positioning result of a target positioning frame according to an embodiment of the present invention, and referring to fig. 6, a rectangular frame indicated by 601 in the diagram is the target positioning frame.

313. the computer device takes the merged domain of the first connected domain and the second connected domain as the first connected domain, and continues to execute the above step 309 and subsequent steps.

In one possible implementation, the computer device merges a second connected domain closest to the first connected domain according to the comparison result, expands the representative region of the connected domain, and updates the target localization box, i.e., m ← [ m, b ], [ m, b ] can represent the localization boxes of boxes m and b.

It should be noted that two thresholds, which are respectively recorded as a merging frame area ratio threshold τ ₁ and a merging mask ratio threshold τ ₂, may be preset by the computer device, and when the computer device detects that the merging frame area ratio satisfies a first value range or the merging mask ratio satisfies a second value range, that is, the merging frame area ratio is smaller than the merging frame area ratio threshold τ ₁ or the merging mask ratio is smaller than the merging mask ratio threshold τ ₂, it is not necessary to search for a communication domain closest to a currently determined communication domain, a positioning frame of the currently determined communication domain is a target positioning frame, see fig. 6, and a rectangular frame indicated by 603 in the drawing is the target positioning frame.

When the computer device detects that both the merged frame area ratio and the merged mask ratio are 0, the computer device determines the location frame of the merged domain of the first connected domain and the second connected domain in the mask map as the target location frame, see fig. 6, where the rectangular frame indicated by 602 in the map is the target location frame.

It should be noted that, steps 308 to 313 above provide a loop processing procedure, where the target location frame can be directly determined as a location frame of a connected domain when the number of connected domains is 1, and when the number of connected domains is multiple, the target location frame can be determined through the loop processing procedure, each time the current maximum connected domain and the nearest connected domain are merged, and the determination is performed based on the loop cut-off condition, if any one of the loop cut-off conditions is met, the location frame of the connected domain obtained by current merging can be taken as the target location frame, and if the loop cut-off condition is not met, the location frame of the first connected domain at that time can be taken as the first connected domain to continue executing the above-mentioned step 309 and subsequent steps until the value of the area occupation ratio of the merged box is smaller than the area occupation ratio threshold τ ₁ of the merged box or the area occupation ratio is smaller than the mask occupation ratio threshold τ ₂ or there is no connected domain that is not merged, and the location frame of the first connected domain at that time is taken as the target location frame.

It should be noted that, in the foregoing process, the position relationship between the connected domains, the areas of the connected domains, and the like may be determined in step 308 by detecting the spatial position distribution of the connected domains, the computer device may obtain the connected domain set C ═ C ₁, C ₂, …, C _n } based on the detected connected domains, and in the process of executing a loop, each time the merge process is completed, the merged connected domains may be deleted from the connected domain set, and when the connected domain set is an empty set, it may be determined that there is no un-merged connected domain currently, the loop process may be stopped, and the implementation process of the specific algorithm is not limited by the embodiment of the present invention.

314. the computer device intercepts square target product image blocks by taking the longest edge of the target positioning frame as the side length based on the center of the target positioning frame.

It should be noted that, when processing an image block of a target product by using a segmentation processing layer of the defect detection model, the image block of the target product is required to be a square image block, and therefore, the square image block needs to be intercepted in a mask image of the target product.

315. And the computer equipment identifies the square target product image block.

In a possible implementation manner, the computer device scales the target product image block to a fixed size, determines a boundary box of a defect mask in the target product image, and implements identification of a defect type according to the detected boundary box of the defect mask and the defect type data obtained by training.

It should be noted that, the above steps 317 to 318 may also be replaced by other methods to identify the defect type, and the embodiment of the present invention does not limit which method is specifically adopted, for example, manual features such as Scale-invariant Feature Transform (SIFT), Histogram of Oriented Gradients (HOG), gray level co-occurrence matrix, wavelet features, etc. may be adopted, and then machine learning methods such as multi-classification support vector machine, random forest, etc. may be adopted, or deep learning such as convolutional neural network may be used to identify the defect type.

The method comprises the steps of obtaining a mask image by segmenting the background and the foreground in a target product image, positioning a defect target in the target product image according to the spatial position distribution and the number of the connected domains in the mask image of the target product image, and further identifying a target product image block corresponding to a target positioning frame, wherein the segmentation method converts the prediction of the defect shape and the boundary into the segmentation of the defect foreground and the background, so that the defect mask is more accurately predicted, the defect positioning block comprises a defect foreground and an image background which meet target conditions, and meanwhile, the method for positioning the defect can more accurately position the defect, is beneficial to extracting main defect characteristics, reduces the influence of mask noise and the target product image background on defect type identification, improves the accuracy of defect type identification, and supports the identification of various morphological defects, the accuracy of defect type identification is improved, high-precision identification of fine defects is achieved, and the defect classification method has good classification performance particularly for the defects which are too small and have similar appearance characteristics.

All the above-mentioned optional technical solutions can be combined arbitrarily to form the optional embodiments of the present invention, and are not described herein again.

Fig. 7 is a schematic diagram of a defect detection and identification apparatus provided in an embodiment of the present invention, and referring to fig. 7, the apparatus includes:

An obtaining module 701, configured to obtain a mask map of a target product image based on the target product image;

A determining module 702, configured to determine a target location frame in the target product image according to the spatial position distribution and the number of connected domains in the mask map of the target product image;

the identifying module 703 is configured to identify an image block of the target product corresponding to the target positioning frame in the image of the target product.

in one possible implementation, the determining module is further configured to:

in one possible implementation, the apparatus further includes:

And the segmentation extraction module is used for obtaining a mask map of the target product image based on the final feature map and the convolution layer.

in one possible implementation, the apparatus further includes:

the device obtains the mask image by segmenting the background and the foreground in the target product image, positions the defect target in the target product image according to the spatial position distribution and the number of the connected domains in the mask image of the target product image, and further identifies the target product image block corresponding to the target positioning frame.

it should be noted that: in the defect detection and identification apparatus provided in the above embodiment, when performing defect detection, only the division of the functional modules is illustrated, and in practical application, the functions may be distributed by different functional modules according to needs, that is, the internal structure of the computer device is divided into different functional modules to complete all or part of the functions described above. In addition, the embodiments of the defect detection and identification device and the defect detection and identification method provided by the above embodiments belong to the same concept, and specific implementation processes thereof are detailed in the embodiments of the methods and are not described herein again.

Fig. 8 is a schematic structural diagram of a computer device according to an embodiment of the present invention. The computer device 800 may be: a smart phone, a tablet computer, an MP3 player (Moving Picture Experts Group Audio Layer III, motion video Experts compression standard Audio Layer 3), an MP4 player (Moving Picture Experts Group Audio Layer iv, motion video Experts compression standard Audio Layer 4), a notebook computer, or a desktop computer. Computer device 800 may also be referred to by other names such as user device, portable computer device, laptop computer device, desktop computer device, and so forth.

Generally, the computer device 800 includes: one or more processors 801 and one or more memories 802.

the processor 801 may include one or more processing cores, such as a 4-core processor, an 8-core processor, and so forth. The processor 801 may be implemented in at least one hardware form of a DSP (Digital Signal Processing), an FPGA (Field-Programmable Gate Array), and a PLA (Programmable Logic Array). The processor 801 may also include a main processor and a coprocessor, where the main processor is a processor for processing data in an awake state, and is also called a Central Processing Unit (CPU); a coprocessor is a low power processor for processing data in a standby state. In some embodiments, the processor 801 may be integrated with a GPU (Graphics Processing Unit), which is responsible for rendering and drawing the content required to be displayed on the display screen. In some embodiments, the processor 801 may further include an AI (Artificial Intelligence) processor for processing computing operations related to machine learning.

Memory 802 may include one or more computer-readable storage media, which may be non-transitory. Memory 802 may also include high speed random access memory, as well as non-volatile memory, such as one or more magnetic disk storage devices, flash memory storage devices. In some embodiments, a non-transitory computer readable storage medium in memory 802 is used to store at least one program code for execution by processor 801 to implement the defect detection identification method provided by the method embodiments of the present invention.

In some embodiments, the computer device 800 may further optionally include: a peripheral interface 803 and at least one peripheral. The processor 801, memory 802 and peripheral interface 803 may be connected by bus or signal lines. Various peripheral devices may be connected to peripheral interface 803 by a bus, signal line, or circuit board. Specifically, the peripheral device includes: at least one of a radio frequency circuit 804, a display 805, a camera 806, an audio circuit 807, a positioning component 808, and a power supply 809.

The peripheral interface 803 may be used to connect at least one peripheral related to I/O (Input/Output) to the processor 801 and the memory 802. In some embodiments, the processor 801, memory 802, and peripheral interface 803 are integrated on the same chip or circuit board; in some other embodiments, any one or two of the processor 801, the memory 802, and the peripheral interface 803 may be implemented on separate chips or circuit boards, which are not limited by this embodiment.

The Radio Frequency circuit 804 is used for receiving and transmitting RF (Radio Frequency) signals, also called electromagnetic signals. The radio frequency circuitry 804 communicates with communication networks and other communication devices via electromagnetic signals. The rf circuit 804 converts an electrical signal into an electromagnetic signal to be transmitted, or converts a received electromagnetic signal into an electrical signal. Optionally, the radio frequency circuit 804 includes: an antenna system, an RF transceiver, one or more amplifiers, a tuner, an oscillator, a digital signal processor, a codec chipset, a subscriber identity module card, and so forth. The radio frequency circuit 804 may communicate with other computer devices via at least one wireless communication protocol. The wireless communication protocols include, but are not limited to: metropolitan area networks, various generation mobile communication networks (2G, 3G, 4G, and 5G), Wireless local area networks, and/or WiFi (Wireless Fidelity) networks. In some embodiments, the rf circuit 804 may further include NFC (Near Field Communication) related circuits, which are not limited in the present disclosure.

The display screen 805 is used to display a UI (User Interface). The UI may include graphics, text, icons, video, and any combination thereof. When the display 805 is a touch display, the display 805 also has the ability to capture touch signals on or above the surface of the display 805. The touch signal may be input to the processor 801 as a control signal for processing. At this point, the display 805 may also be used to provide virtual buttons and/or a virtual keyboard, also referred to as soft buttons and/or a soft keyboard. In some embodiments, the display 805 may be one, providing the front panel of the computer device 800; in other embodiments, the display 805 may be at least two, each disposed on a different surface of the computer device 800 or in a folded design; in still other embodiments, the display 805 may be a flexible display, disposed on a curved surface or on a folded surface of the computer device 800. Even further, the display 805 may be arranged in a non-rectangular irregular pattern, i.e., a shaped screen. The Display 805 can be made of LCD (Liquid Crystal Display), OLED (Organic Light-emitting diode), and the like.

The camera assembly 806 is used to capture images or video. Optionally, camera assembly 806 includes a front camera and a rear camera. Generally, a front camera is disposed on a front panel of a computer apparatus, and a rear camera is disposed on a rear surface of the computer apparatus. In some embodiments, the number of the rear cameras is at least two, and each rear camera is any one of a main camera, a depth-of-field camera, a wide-angle camera and a telephoto camera, so that the main camera and the depth-of-field camera are fused to realize a background blurring function, and the main camera and the wide-angle camera are fused to realize panoramic shooting and VR (Virtual Reality) shooting functions or other fusion shooting functions. In some embodiments, camera assembly 806 may also include a flash. The flash lamp can be a monochrome temperature flash lamp or a bicolor temperature flash lamp. The double-color-temperature flash lamp is a combination of a warm-light flash lamp and a cold-light flash lamp, and can be used for light compensation at different color temperatures.

the audio circuit 807 may include a microphone and a speaker. The microphone is used for collecting sound waves of a user and the environment, converting the sound waves into electric signals, and inputting the electric signals to the processor 801 for processing or inputting the electric signals to the radio frequency circuit 804 to realize voice communication. For stereo capture or noise reduction purposes, the microphones may be multiple and located at different locations on the computer device 800. The microphone may also be an array microphone or an omni-directional pick-up microphone. The speaker is used to convert electrical signals from the processor 801 or the radio frequency circuit 804 into sound waves. The loudspeaker can be a traditional film loudspeaker or a piezoelectric ceramic loudspeaker. When the speaker is a piezoelectric ceramic speaker, the speaker can be used for purposes such as converting an electric signal into a sound wave audible to a human being, or converting an electric signal into a sound wave inaudible to a human being to measure a distance. In some embodiments, the audio circuitry 807 may also include a headphone jack.

The Location component 808 is used to locate the current geographic Location of the computer device 800 to implement navigation or LBS (Location Based Service). The positioning component 808 may be a positioning component based on the GPS (global positioning System) in the united states, the beidou System in china, the graves System in russia, or the galileo System in the european union.

A power supply 809 is used to power the various components in the computer device 800. The power supply 809 can be ac, dc, disposable or rechargeable. When the power source 809 comprises a rechargeable battery, the rechargeable battery may support wired or wireless charging. The rechargeable battery may also be used to support fast charge technology.

In some embodiments, the computer device 800 also includes one or more sensors 810. The one or more sensors 810 include, but are not limited to: acceleration sensor 811, gyro sensor 812, pressure sensor 813, fingerprint sensor 814, optical sensor 815 and proximity sensor 816.

The acceleration sensor 811 may detect the magnitude of acceleration in three coordinate axes of a coordinate system established with the computer apparatus 800. For example, the acceleration sensor 811 may be used to detect the components of the gravitational acceleration in three coordinate axes. The processor 801 may control the display 805 to display the user interface in a landscape view or a portrait view according to the gravitational acceleration signal collected by the acceleration sensor 811. The acceleration sensor 811 may also be used for acquisition of motion data of a game or a user.

the gyro sensor 812 may detect a body direction and a rotation angle of the computer device 800, and the gyro sensor 812 may cooperate with the acceleration sensor 811 to acquire a 3D motion of the user with respect to the computer device 800. From the data collected by the gyro sensor 812, the processor 801 may implement the following functions: motion sensing (such as changing the UI according to a user's tilting operation), image stabilization at the time of photographing, game control, and inertial navigation.

pressure sensors 813 may be disposed on the side bezel of computer device 800 and/or underneath display screen 805. When the pressure sensor 813 is arranged on the side frame of the computer device 800, the holding signal of the user to the computer device 800 can be detected, and the processor 801 performs left-right hand identification or shortcut operation according to the holding signal collected by the pressure sensor 813. When the pressure sensor 813 is disposed at a lower layer of the display screen 805, the processor 801 controls the operability control on the UI interface according to the pressure operation of the user on the display screen 805. The operability control comprises at least one of a button control, a scroll bar control, an icon control and a menu control.

The fingerprint sensor 814 is used for collecting a fingerprint of the user, and the processor 801 identifies the identity of the user according to the fingerprint collected by the fingerprint sensor 814, or the fingerprint sensor 814 identifies the identity of the user according to the collected fingerprint. Upon identifying that the user's identity is a trusted identity, the processor 801 authorizes the user to perform relevant sensitive operations including unlocking a screen, viewing encrypted information, downloading software, paying for and changing settings, etc. Fingerprint sensor 814 may be disposed on the front, back, or side of computer device 800. When a physical key or vendor Logo is provided on the computer device 800, the fingerprint sensor 814 may be integrated with the physical key or vendor Logo.

The optical sensor 815 is used to collect the ambient light intensity. In one embodiment, processor 801 may control the display brightness of display 805 based on the ambient light intensity collected by optical sensor 815. Specifically, when the ambient light intensity is high, the display brightness of the display screen 805 is increased; when the ambient light intensity is low, the display brightness of the display 805 is reduced. In another embodiment, the processor 801 may also dynamically adjust the shooting parameters of the camera assembly 806 based on the ambient light intensity collected by the optical sensor 815.

A proximity sensor 816, also known as a distance sensor, is typically provided on the front panel of the computer device 800. The proximity sensor 816 is used to capture the distance between the user and the front of the computer device 800. In one embodiment, the processor 801 controls the display 805 to switch from the bright screen state to the dark screen state when the proximity sensor 816 detects that the distance between the user and the front face of the computer device 800 is gradually reduced; when the proximity sensor 816 detects that the distance between the user and the front of the computer device 800 is gradually increasing, the display screen 805 is controlled by the processor 801 to switch from a breath-screen state to a bright-screen state.

Those skilled in the art will appreciate that the configuration illustrated in FIG. 8 is not intended to be limiting of the computer device 800 and may include more or fewer components than those illustrated, or some components may be combined, or a different arrangement of components may be employed.

in an exemplary embodiment, a computer readable storage medium, such as a memory, including program code, which is executable by a processor to perform the defect detection and identification method in the above embodiments is also provided. For example, the computer-readable storage medium may be a Read-Only Memory (ROM), a Random Access Memory (RAM), a Compact Disc Read-Only Memory (CD-ROM), a magnetic tape, a floppy disk, an optical data storage device, and the like.

It will be understood by those skilled in the art that all or part of the steps for implementing the above embodiments may be implemented by hardware, or may be implemented by hardware associated with program code, and the program may be stored in a computer readable storage medium, where the above mentioned storage medium may be a read-only memory, a magnetic or optical disk, etc.

The above description is only exemplary of the present invention and should not be taken as limiting the invention, as any modification, equivalent replacement, or improvement made within the spirit and principle of the present invention should be included in the protection scope of the present invention.

Claims

1. A method for defect detection and identification, the method comprising:

Acquiring a mask image of a target product image based on the target product image;

Determining a target positioning frame in the target product image according to the spatial position distribution and the number of the connected domains in the mask image of the target product image, wherein the foreground and the background determined by the target positioning frame meet target conditions;

2. The method according to claim 1, wherein the determining a target location box in the target product image according to the spatial position distribution and the number of connected domains in the mask map of the target product image comprises:

When only one connected domain exists in the mask image of the target product image, determining the positioning frame of the connected domain as the target positioning frame;

When two or more connected domains exist in the mask image of the target product image, determining the target positioning frame according to the area ratio of the combined frame and the combined mask ratio;

The area ratio of the merging box is used for representing the ratio of the area sum value of the positioning boxes of the two or more connected domains to the area of the positioning box of the merged domain after the connected domains are merged;

the merge mask ratio is used to represent a ratio between the sum of the areas of the two or more connected domains and the area of the positioning frame of the merged domain after the connected domains are merged.

3. The method of claim 2, wherein determining the target position box based on a merge box area ratio and a merge mask ratio comprises:

When the area occupation ratio of the merging frame meets a first value range or the area occupation ratio of the merging mask meets a second value range, determining the positioning frame of the first connected domain as a target positioning frame;

And when the area ratio of the merging frame does not satisfy a first value range and the area ratio of the merging mask does not satisfy a second value range, determining one locating frame which is located in the locating frame of the first merging area and contains the locating frame of the first connected area in the mask map as the target locating frame, wherein the first merging area is obtained by merging all the connected areas.

4. the method of claim 3, wherein prior to determining the target location box based on a merge box area ratio and a merge mask ratio, the method further comprises:

Determining a positioning frame of a first connected domain in the mask map as an initial target positioning frame;

determining a positioning frame of a second merging domain based on a nearest connected domain of the first connected domain, wherein the second merging domain comprises the first merging domain and the nearest connected domain of the first connected domain;

calculating the merge frame area ratio and the merge mask ratio of the second merge domain.

5. the method of claim 1, wherein the obtaining a mask map of the target product image based on the target product image comprises:

extracting a characteristic diagram of the target product image through a convolutional neural network of the defect detection model;

inputting the feature map into a spatial pyramid module of the defect detection model to obtain feature maps with different granularities of the target product image;

Performing upsampling processing on the feature maps with different granularities through the spatial pyramid module to obtain a final feature map;

And acquiring a mask image of the target product image based on the final feature image through a segmentation extraction layer of the defect detection model.

6. The method of claim 1, wherein the identifying the target product image patch corresponding to the target positioning frame in the target product image comprises:

based on the center of the target positioning frame, taking the longest edge of the target positioning frame as the side length, and intercepting an image block of a square target product;

and identifying the square target product image block.

7. a defect detection and identification apparatus, comprising:

8. The apparatus of claim 7, wherein the positioning module is further configured to:

when only one connected domain exists in a mask image of the target product image, determining a positioning frame of the connected domain as the target positioning frame, wherein the first connected domain is the largest connected domain in the mask image;

9. The apparatus of claim 7, wherein the positioning module is further configured to:

10. The apparatus of claim 7, wherein the positioning module is further configured to determine a positioning box of a first connected component in the mask map as an initial positioning box;

The determining module is further configured to determine a location box of a second merged domain based on a nearest connected domain of the first connected domain, where the second merged domain includes the first merged domain and the nearest connected domain of the first connected domain;

A calculating module, configured to calculate the merge frame area ratio and the merge mask ratio of the second merge domain.

11. The apparatus of claim 7, further comprising:

the segmentation module is used for extracting a characteristic diagram of the target product image through a convolutional neural network of the defect detection model;

The pyramid module is used for inputting the spatial pyramid module of the defect detection model based on the feature map to obtain feature maps with different granularities of the target product image;

And the segmentation extraction module is used for realizing the extraction of the defect mask by using the convolution layer of 1x1 based on the final feature map and obtaining the mask map of the target product image.

12. The apparatus of claim 7, further comprising:

the identification module is further used for identifying the square target product image blocks.

13. A computer device comprising one or more processors and one or more memories having at least one program code stored therein, the program code loaded into and executed by the one or more processors to perform the operations of the defect detection and identification method of any one of claims 1 to 6.

14. A computer-readable storage medium having at least one program code stored therein, the program code being loaded and executed by a processor to perform the operations of the defect detection and identification method of any one of claims 1 to 6.