CN117152417A - Method for detecting defect of chip package, storage medium and detection device - Google Patents

Method for detecting defect of chip package, storage medium and detection device

Info

Publication number
CN117152417A
CN117152417A
Authority
CN
China
Prior art keywords
defect
coordinates
network
anchor
candidate
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202311180692.8A
Other languages
Chinese (zh)
Inventor
王兆广
杜文斌
祝鲁宁
何春来
王卫军
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shanghai Micro Motor Research Institute 21st Research Institute Of China Electronics Technology Corp
Original Assignee
Shanghai Micro Motor Research Institute 21st Research Institute Of China Electronics Technology Corp
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shanghai Micro Motor Research Institute 21st Research Institute Of China Electronics Technology Corp filed Critical Shanghai Micro Motor Research Institute 21st Research Institute Of China Electronics Technology Corp
Priority to CN202311180692.8A priority Critical patent/CN117152417A/en
Publication of CN117152417A publication Critical patent/CN117152417A/en


Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06V: IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 10/00: Arrangements for image or video recognition or understanding
    • G06V 10/20: Image preprocessing
    • G06V 10/25: Determination of region of interest [ROI] or a volume of interest [VOI]
    • G06V 10/70: Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V 10/764: using classification, e.g. of video objects
    • G06V 10/766: using regression, e.g. by projecting features on hyperplanes
    • G06V 10/77: Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
    • G06V 10/7715: Feature extraction, e.g. by transforming the feature space, e.g. multi-dimensional scaling [MDS]; Mappings, e.g. subspace methods
    • G06V 10/80: Fusion, i.e. combining data from various sources at the sensor level, preprocessing level, feature extraction level or classification level
    • G06V 10/806: Fusion of extracted features
    • G06V 10/82: using neural networks

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Computation (AREA)
  • General Physics & Mathematics (AREA)
  • Multimedia (AREA)
  • Physics & Mathematics (AREA)
  • Health & Medical Sciences (AREA)
  • Computing Systems (AREA)
  • Databases & Information Systems (AREA)
  • Artificial Intelligence (AREA)
  • General Health & Medical Sciences (AREA)
  • Medical Informatics (AREA)
  • Software Systems (AREA)
  • Image Analysis (AREA)

Abstract

The application relates to chip package inspection, and in particular to a defect detection method for chip packages, and aims to solve the problem that existing detection methods perform poorly on small-target defects. The detection method comprises the following steps: acquiring an original image of a chip package, and constructing an image feature extraction network with a feature pyramid; obtaining a whole feature map of the original image according to the image feature extraction network; processing the whole feature map through a region generation network to obtain defect candidate regions and coordinates; performing positive and negative sample label assignment on the defect candidate regions through normalization processing to obtain coordinates of the defect regions; mapping the coordinates to corresponding positions on the defect feature map, and predicting defect categories and defect positions through a classifier network; and consolidating the predicted defect categories and positions into the final defect categories and positions using a non-maximum suppression algorithm. The normalization processing yields a marked performance improvement on small-target defects.

Description

Method for detecting defect of chip package, storage medium and detection device
Technical Field
The application relates to the field of chip package inspection, and in particular provides a defect detection method, a storage medium, and a detection device for chip packages.
Background
The chip package is an important part of a chip product, and package quality greatly influences chip performance. Image segmentation and feature extraction algorithms are the key technologies for surface defect detection. Existing vision-based chip package defect detection techniques fall into two main categories: those based on traditional vision schemes and those based on deep learning.
Deep-learning-based defect detection has become the dominant surface defect detection scheme. The package area is typically as small as an adult's thumbnail, defects introduced by process or human factors during production are small, and a defect region can occupy as few as 10 pixels in the image captured by the imaging equipment. On the other hand, when processing image data, neural networks typically use downsampling to progressively condense image features into one or more 1024- or 2048-dimensional features. During downsampling, adjacent pixel values are repeatedly merged into single elements, so if a defect's area is too small, the salient information representing the defect is attenuated step by step, and defect information is insufficient in the final features. Existing methods therefore detect small-target defects poorly.
Accordingly, there is a need in the art for a new method, storage medium and apparatus for detecting defects in chip packages to solve the above-mentioned problems.
Disclosure of Invention
The application aims to solve the above technical problem, namely that existing detection methods perform poorly on small-target defects.
In a first aspect, the present application provides a method for detecting defects of a chip package, characterized in that the method comprises the following steps:
acquiring an original image of a chip package, and constructing an image feature extraction network with a feature pyramid;
obtaining a whole feature map of the original image according to the image feature extraction network;
processing the whole feature map through a region generation network to obtain defect candidate regions and coordinates;
performing positive and negative sample label assignment on the defect candidate regions through normalization processing to obtain coordinates of the defect regions;
mapping the coordinates to corresponding positions on the defect feature map, pooling the images at the corresponding positions to the same size, and predicting defect categories and defect positions through a classifier network;
and consolidating the predicted defect categories and positions into the final defect categories and positions using a non-maximum suppression algorithm.
In the preferred technical scheme of the detection method, a backbone network of the image feature extraction network is a convolutional neural network;
the extraction network obtains a full Zhang Tezheng map from the image features, including,
extracting features of the original image by using the convolutional neural network to obtain a feature map;
performing multi-scale fusion on the feature maps using the feature pyramid structure to obtain the whole feature map, where the whole feature map is represented as V ∈ ℝ^(7×7×1024).
In a preferred technical solution of the above detection method, processing the whole feature map through the region generation network to obtain the defect candidate regions and coordinates includes:
traversing each point on the whole feature map with a sliding window and computing the position to which the window center maps back in the original image;
generating k anchor frames centered at that original-image position according to the preset areas and aspect ratios, where the anchor frames are the defect candidate regions and the anchor frame vertices are the coordinates of the defect candidate regions;
the region generation network predicts two scores for each anchor frame, representing the probability that the anchor frame is foreground and background respectively, and predicts 4 bounding-box regression parameters for fine-tuning the anchor frame's position and size so that it fits the ground-truth label more closely.
In a preferred technical solution of the above detection method, the loss function of the region generation network is:

L({p_i}, {t_i}) = (1/N_cls) Σ_i L_cls(p_i, p_i*) + λ (1/N_reg) Σ_i p_i* · L_reg(t_i, t_i*)

The first term is the classification loss: p_i is the output of the region generation network's classification head and represents the probability that an object is present in the i-th anchor frame, i.e., the probability of foreground; p_i* is the true label, 1 if an object exists in the anchor frame and 0 otherwise; N_cls is the number of samples in the input batch; and L_cls is the classification loss function, measured here in cross-entropy form. The second term is the bounding-box regression loss: t_i is the offset of the prediction frame from the i-th anchor frame, t_i* is the offset of the ground-truth annotation frame from the i-th anchor frame, N_reg is the number of anchor frames, λ is an adjustable hyper-parameter balancing classification loss and box regression loss, and L_reg is the box regression loss function.
In a preferred embodiment of the foregoing detection method, generating k anchor frames centered at the original-image position according to the preset areas and aspect ratios includes:
presetting anchor frames of several different areas and aspect ratios, where the areas are 128×128, 256×256, and 512×512, and the aspect ratios are 1:1, 1:2, and 2:1.
In a preferred technical solution of the above detection method, performing positive and negative sample label assignment on the defect candidate regions through normalization processing to obtain the coordinates of the defect regions includes:
obtaining candidate frames by applying the regression parameters to the anchor frames, removing the large number of overlapping candidate frames with non-maximum suppression, and keeping only a limited number of candidate frames per image;
and performing positive and negative label assignment on the candidate frames using normalization processing to obtain the coordinates of the defect regions.
In a preferred technical solution of the above detection method, the retained candidate frames are bounding boxes,
and performing positive and negative label assignment on the candidate frames using normalization processing to obtain the coordinates of the defect regions includes:
modeling a bounding box as a two-dimensional Gaussian distribution, whose probability density function is:

f(x | μ, Σ) = exp(−(1/2)(x − μ)^T Σ^(−1) (x − μ)) / (2π |Σ|^(1/2))

where x is the coordinate (x, y), μ is the mean vector, and Σ is the covariance matrix;
computing the distance between the two distributions obtained from the bounding-box modeling using the normalization formula; for two Gaussian distributions N(μ₁, Σ₁) and N(μ₂, Σ₂), the second-order normalization formula satisfies:

W₂²(N₁, N₂) = ‖μ₁ − μ₂‖₂² + ‖Σ₁^(1/2) − Σ₂^(1/2)‖_F²

where W₂²(N_a, N_b) is the squared distance between bounding box A and bounding box B, and N_a and N_b are the Gaussian distributions of the two bounding boxes.
For two bounding boxes A = (cx_a, cy_a, w_a, h_a) and B = (cx_b, cy_b, w_b, h_b), substituting the coordinates of their center points and their widths and heights into μ and Σ yields:

W₂²(N_a, N_b) = ‖[cx_a, cy_a, w_a/2, h_a/2]^T − [cx_b, cy_b, w_b/2, h_b/2]^T‖₂²

where W₂²(N_a, N_b) is the squared distance between the two bounding-box distributions; the distance metric must be converted into a normalized similarity metric, namely the normalized distance; cx_a, cy_a are the x, y coordinates of the center point of bounding box A, and cx_b, cy_b are those of bounding box B:

NWD(N_a, N_b) = exp(−√(W₂²(N_a, N_b)) / C)

where C is a hyper-parameter related to the bounding-box data set and NWD is the normalized distance;
performing positive and negative label assignment according to NWD(N_a, N_b), obtaining the defect regions from the defect candidate regions, and confirming the coordinates of the defect regions.
In a preferred technical solution of the above detection method, mapping the coordinates to corresponding positions on the defect feature map, pooling the images at the corresponding positions to the same size, and predicting defect categories and positions through a classifier network includes:
the loss function of the defect category and defect position predicted by the classifier network is:

L(p, u, t^u, v) = L_cls(p, u) + λ[u ≥ 1] L_loc(t^u, v)

where p is the probability distribution output by the classifier network, u is the annotated true label, t^u is the position information predicted by the model, v is the position information of the annotation, L_cls is the classification loss in cross-entropy form, L_loc is the position regression loss, and λ is a hyper-parameter.
In a second aspect, the present application also provides a computer readable storage medium having stored therein at least one instruction or at least one program loaded and executed by a processor to implement the method of any of the above preferred embodiments.
In a third aspect, the present application also provides a detection device comprising a processor and the computer readable storage medium described above.
With the above technical solution, the detection method replaces the area intersection-over-union (IoU) metric of traditional methods with a normalized-distance calculation, obtaining a marked performance improvement on small-target defects without harming detection of large-scale defects. In addition, the detection process combines the image feature extraction network with the region generation network, ensuring accurate defect identification and precise localization; a convolutional neural network deeply extracts image features, strengthening recognition of complex defects; the sliding-window and anchor-frame strategy covers the whole image, ensuring no defect is missed; classification loss and bounding-box regression loss are introduced, balancing the model between recognition and localization; multiple anchor frames are preset, adapting to defects of different sizes and shapes; and non-maximum suppression and normalization strategies optimize the detection results, avoiding duplicates and false alarms.
Furthermore, the application models defects with a two-dimensional Gaussian distribution, improving the accuracy of sample label assignment; and mapping the coordinates onto the defect feature map and pooling them ensures feature uniformity, further improving prediction accuracy.
Drawings
Preferred embodiments of the present application are described below with reference to the accompanying drawings, in which:
FIG. 1 shows a flow chart of the main steps of the detection method of the present application;
fig. 2 shows the size and positional relationship between the anchor frames constructed in the present application.
Detailed Description
Preferred embodiments of the present application are described below with reference to the accompanying drawings. It should be understood by those skilled in the art that these embodiments are merely for explaining the technical principles of the present application, and are not intended to limit the scope of the present application. Those skilled in the art can adapt it as desired to suit a particular application.
It should be noted that, in the description of the present application, terms such as direction or positional relationship are based on the direction or positional relationship shown in the drawings, which are merely for convenience of description, and do not indicate or imply that the structure must have a specific orientation, be constructed and operated in a specific orientation, and thus should not be construed as limiting the present application. Furthermore, the terms "first," "second," and the like, are used for descriptive purposes only and are not to be construed as indicating or implying relative importance.
In addition, it should be noted that, in the description of the present application, unless explicitly specified and limited otherwise, terms such as "mounted," "connected," and "coupled" are to be construed broadly: a connection may be fixed or removable, direct or indirect through an intermediate medium, or a communication between two elements. The specific meaning of these terms in the present application can be understood by those skilled in the art according to the specific circumstances.
Furthermore, it should be noted that, in the description of the present application, although the respective steps of the control method of the present application are described in a specific order in the present application, these orders are not limitative, but a person skilled in the art may perform the steps in a different order without departing from the basic principle of the present application.
In particular, the detection device of the present application includes a processor and the computer readable storage medium, and of course, it should be noted that the present application is not limited to the specific type of the processor, and those skilled in the art can set the detection device according to the needs, so long as the processor can be guaranteed to run the program of the present application. The storage medium has at least one instruction or at least one program stored therein, and the at least one instruction or the at least one program is loaded and executed by a processor to implement the detection method of the present application. It should be noted that, the present application does not limit the specific types of the storage medium, and those skilled in the art can set the storage medium according to the needs, for example, the storage medium may be a solid state hard disk, or the storage medium may be a removable hard disk, so long as the storage medium can be ensured to store the program of the detection method.
Referring to fig. 1, the detection method of the present application includes the following steps:
S1, acquiring an original image of a chip package, and constructing an image feature extraction network with a feature pyramid;
S2, obtaining a whole feature map of the original image according to the image feature extraction network;
S3, processing the whole feature map through a region generation network to obtain defect candidate regions and coordinates;
S4, performing positive and negative sample label assignment on the defect candidate regions through normalization processing to obtain coordinates of the defect regions;
S5, mapping the coordinates to corresponding positions on the defect feature map, pooling the images at the corresponding positions to the same size, and predicting defect categories and defect positions through a classifier network;
S6, consolidating the predicted defect categories and positions into the final defect categories and positions using a non-maximum suppression algorithm.
Further, in step S2, the backbone network of the image feature extraction network is a convolutional neural network; of course, the application does not limit the specific form of the image feature extraction network, and a person skilled in the art can use other feature extraction networks as needed, as long as the feature maps required for detection can be extracted. Features of the original image are extracted with the convolutional neural network to obtain feature maps; the feature pyramid structure then performs multi-scale fusion on the feature maps to obtain the whole feature map, represented as V ∈ ℝ^(7×7×1024), where ℝ denotes the set of real numbers (all elements of this three-dimensional tensor are real) and 7×7×1024 is the tensor's shape: 7 rows, 7 columns, and a depth (third dimension) of 1024. Using a convolutional neural network (CNN) as the backbone for image feature extraction captures local features and global structure of the image more effectively, which is important for complex defect detection tasks.
Further, in step S3, each point on the whole feature map V obtained in step S2 is traversed with a sliding window, and the position to which the window center maps back in the original image is computed; the sliding window and anchor frames together cover the entire image, ensuring no region is missed. This means the method can find potential defects even in complex or dense scenes. Fig. 2 shows the sizes and positional relationship of the constructed anchor frames.
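As an illustrative sketch (not code from the patent), the window-center mapping can be expressed as follows; the stride of 16 is an assumed backbone downsampling factor, not a value given in the text:

```python
# Map each feature-map cell back to its center in the original image.
# The stride (16) is an assumption about the backbone's total downsampling.

def feature_cell_centers(feat_h, feat_w, stride=16):
    """Return (x, y) image coordinates of the center of every feature cell."""
    centers = []
    for i in range(feat_h):
        for j in range(feat_w):
            cx = j * stride + stride / 2.0  # column index -> image x
            cy = i * stride + stride / 2.0  # row index -> image y
            centers.append((cx, cy))
    return centers

# For the 7x7 feature map V this yields 49 anchor-generation centers.
centers = feature_cell_centers(7, 7)
```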
Then, taking the original-image position as the center point, k anchor boxes are generated according to the preset areas and aspect ratios. The application does not limit the specific values of the areas and aspect ratios, which a person skilled in the art can set as needed; in the preferred embodiment, the areas are 128×128, 256×256, and 512×512, and the aspect ratios are 1:1, 1:2, and 2:1, giving 9 area/aspect-ratio combinations. Presetting anchor frames of several areas and aspect ratios captures defects of different sizes and shapes, which is very useful for accurate detection in highly diverse production environments.
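The 3 × 3 = 9 combinations described above can be sketched as follows (a hypothetical helper, using the preferred-embodiment areas and ratios as defaults):

```python
import math

def make_anchors(cx, cy, scales=(128, 256, 512), ratios=(1.0, 0.5, 2.0)):
    """Generate k = len(scales) * len(ratios) anchor boxes (cx, cy, w, h)
    centered at the mapped image position; ratio r is taken as h / w,
    and each box preserves the area of its scale."""
    anchors = []
    for s in scales:
        area = s * s
        for r in ratios:
            w = math.sqrt(area / r)
            h = w * r
            anchors.append((cx, cy, w, h))
    return anchors

boxes = make_anchors(100.0, 100.0)  # 9 anchors at one sliding-window center
```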
Further, the loss function of the region generation network is:

L({p_i}, {t_i}) = (1/N_cls) Σ_i L_cls(p_i, p_i*) + λ (1/N_reg) Σ_i p_i* · L_reg(t_i, t_i*)

The first term is the classification loss: p_i is the output of the region generation network's classification head and represents the probability that an object is present in the i-th anchor frame, i.e., the probability of foreground; p_i* is the true label, 1 if an object exists in the anchor frame and 0 otherwise; N_cls is the number of samples in the input batch; and L_cls is the classification loss function, measured here in cross-entropy form. The second term is the bounding-box regression loss: t_i is the offset of the prediction frame from the i-th anchor frame, t_i* is the offset of the ground-truth annotation frame from the i-th anchor frame, N_reg is the number of anchor frames, λ is an adjustable hyper-parameter balancing classification loss and box regression loss, and L_reg is the box regression loss function. Using two different types of loss (classification loss and bounding-box regression loss) means the model considers both defect identification (present or not) and precise localization during training, increasing its generalization ability.
Further, in step S4, candidate frames (proposals) are obtained by applying the regression parameters to the anchor frames; a large number of overlapping candidate frames are removed by the NMS method, keeping only a limited number of candidate frames per image. Positive and negative labels are then assigned to the candidate frames using normalization processing to obtain the coordinates of the defect regions. The application does not limit the specific number of retained candidate frames, which a person skilled in the art can set as needed as long as computation proceeds normally; in the preferred embodiment it is 2000. Conventional methods measure with area intersection-over-union (IoU): a candidate frame whose similarity to the ground-truth frame exceeds 0.7 (under the IoU metric) is judged a positive sample, otherwise a negative sample. This approach performs poorly for small-target defect detection, so the present application uses the normalized Wasserstein distance instead.
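The NMS pruning step can be sketched as follows (the threshold is illustrative; keep_top=2000 matches the preferred embodiment):

```python
def iou(a, b):
    """Intersection-over-union of boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter) if inter else 0.0

def nms(boxes, scores, thresh=0.5, keep_top=2000):
    """Greedy non-maximum suppression: visit boxes by descending score and
    drop any box overlapping an already-kept box by more than `thresh`."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    for i in order:
        if all(iou(boxes[i], boxes[j]) <= thresh for j in keep):
            keep.append(i)
        if len(keep) == keep_top:
            break
    return keep
```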
Further, the retained candidate frames are bounding boxes; each bounding box is modeled as a two-dimensional Gaussian distribution whose weights decrease outward from the bbox center. Let the bbox center coordinates be (cx, cy), the width w, and the height h; the probability density function of the two-dimensional Gaussian distribution can be expressed as:

f(x | μ, Σ) = exp(−(1/2)(x − μ)^T Σ^(−1) (x − μ)) / (2π |Σ|^(1/2))

where x is the coordinate (x, y), μ is the mean vector over the x- and y-axes, and Σ is the covariance matrix. The Gaussian distribution is denoted N(μ, Σ). The similarity between two bboxes can then be expressed as the distance between their two Gaussian distributions.
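Under the common assumption that μ = (cx, cy) and Σ = diag(w²/4, h²/4), the inscribed-ellipse model used in normalized-Wasserstein work (the patent text does not spell Σ out), the bbox-to-Gaussian modeling is:

```python
import math

def bbox_gaussian(cx, cy, w, h):
    """Model a bbox as N(mu, Sigma): mu is the center, Sigma = diag(w^2/4, h^2/4),
    so density weights decrease outward from the bbox center."""
    mu = (cx, cy)
    sigma = ((w * w / 4.0, 0.0), (0.0, h * h / 4.0))
    return mu, sigma

def density(x, y, mu, sigma):
    """Probability density of the diagonal 2-D Gaussian at point (x, y)."""
    dx, dy = x - mu[0], y - mu[1]
    sx, sy = sigma[0][0], sigma[1][1]
    expo = -0.5 * (dx * dx / sx + dy * dy / sy)
    return math.exp(expo) / (2.0 * math.pi * math.sqrt(sx * sy))
```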
After modeling a bbox as a two-dimensional Gaussian distribution, the distance between two distributions can be calculated with the Wasserstein distance formula. For two Gaussian distributions N(μ₁, Σ₁) and N(μ₂, Σ₂), the second-order Wasserstein distance is defined as:

W₂²(N₁, N₂) = ‖μ₁ − μ₂‖₂² + Tr(Σ₁ + Σ₂ − 2(Σ₂^(1/2) Σ₁ Σ₂^(1/2))^(1/2))

which, for the commuting covariances used here, can be simplified as:

W₂²(N₁, N₂) = ‖μ₁ − μ₂‖₂² + ‖Σ₁^(1/2) − Σ₂^(1/2)‖_F²

where ‖·‖_F is the Frobenius norm: for a matrix X with m rows and n columns, ‖X‖_F = (Σ_{i=1}^{m} Σ_{j=1}^{n} |x_ij|²)^(1/2).
for two bounding boxes, a= (cx a ,cy a ,w a ,h a ) And b= (cx b ,cy b ,w b ,h b ) Substituting the coordinates of its center point, and the width and height into μ and Σ yields:
the cx is a 、cy a Representing the x, y coordinates of the center point of the bounding box A, the cx b ,cy b Representing the x and y coordinates of the center point of the bounding box B; the N is a And said N b Gaussian distribution for two of the bounding boxes;is the square of the distance of the two bbox distributions, and the distance metric needs to be converted into a normalized similarity metric, namely a normalized waserstein distance (Normalized Wasserstein Distance, NWD):
c is a super parameter related to a data set, is an adjustable super parameter, is different in the data set and is also different in the data set, and an optimal value can be tested through experiments; according to NWD (N) a ,N b ) Performing positive and negative label assignment, obtaining the defect area from the defect candidate area, and confirming coordinates of the defect area
The above procedure assigns positive and negative labels to the candidate frames. When there is no positive sample, the candidate frame with the maximum NWD must be recalled as a positive sample and the rest treated as negative samples so that model optimization can succeed. Several positive and negative samples are then selected at random for training; in the preferred embodiment 128 samples are used, and when there are fewer than 128 samples, the best candidates among the negative samples are recalled to make up the difference.
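The recall-the-best fallback described above can be sketched as follows (the 0.7 positive threshold is an assumption mirroring the IoU default mentioned earlier):

```python
def assign_labels(nwd_scores, pos_thresh=0.7):
    """Assign 1 (positive) / 0 (negative) labels by NWD score; if no candidate
    crosses the threshold, recall the best-scoring one as positive so that
    optimization always has at least one positive sample."""
    labels = [1 if s >= pos_thresh else 0 for s in nwd_scores]
    if not any(labels):
        labels[nwd_scores.index(max(nwd_scores))] = 1
    return labels
```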
Non-maximum suppression and normalization processing remove a large number of unnecessary overlapping candidate frames, which both improves computational efficiency and reduces the false-alarm rate; modeling the bounding box as a two-dimensional Gaussian distribution accurately describes the object's spatial distribution, making positive and negative sample label assignment more accurate and principled.
Further, in step S5, the loss function of the defect category and defect position predicted by the classifier network is:

L(p, u, t^u, v) = L_cls(p, u) + λ[u ≥ 1] L_loc(t^u, v)

where p is the probability distribution output by the classifier network, u is the annotated true label, t^u is the position information predicted by the model, v is the position information of the annotation, L_cls is the classification loss in cross-entropy form, L_loc is the position regression loss in L1 form, and λ is a hyper-parameter. The coordinate mapping and pooling process ensures that features of different defects are normalized to a uniform scale, so that the classifier network can make predictions more accurately.
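A toy version of this detection-head loss, with cross-entropy plus an L1 location loss gated on foreground labels u ≥ 1, where background as class 0 is an assumption consistent with the text:

```python
import math

def detect_head_loss(p, u, t_u, v, lam=1.0):
    """L(p, u, t^u, v) = L_cls(p, u) + lam * [u >= 1] * L_loc(t^u, v).
    p: predicted class probabilities; u: true class (0 = background);
    t_u: predicted box for class u; v: annotated box; lam: balance weight."""
    l_cls = -math.log(p[u])  # cross-entropy for the true class u
    l_loc = sum(abs(a - b) for a, b in zip(t_u, v)) if u >= 1 else 0.0
    return l_cls + lam * l_loc
```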
Further, in step S6, an optimized target defect network model is obtained through steps S1 to S5. Given a picture of a defective chip package, candidate boxes for each defect class are obtained with the target defect network model as in step S4, and a non-maximum suppression (NMS) algorithm is then applied to produce the final detection result.
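The final NMS step can be sketched as follows; this is an illustrative greedy implementation, and the 0.5 IoU threshold is an assumption:

```python
def iou(a, b):
    """Intersection-over-union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0

def nms(boxes, scores, iou_thr=0.5):
    """Greedy non-maximum suppression: keep the highest-scoring box and
    drop any remaining box that overlaps it by more than `iou_thr`."""
    order = sorted(range(len(boxes)), key=scores.__getitem__, reverse=True)
    keep = []
    while order:
        best = order.pop(0)
        keep.append(best)
        order = [i for i in order if iou(boxes[best], boxes[i]) <= iou_thr]
    return keep
```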
In summary, the detection method replaces the area intersection-over-union metric of traditional methods with a normalized distance computation, which yields a significant performance improvement on small target defects without degrading detection of large-scale defects. In addition, during detection the image feature extraction network and the region generation network are combined, ensuring both accurate identification of defects and precise localization of their positions; a convolutional neural network extracts deep image features, strengthening recognition of complex defects; a sliding-window and anchor-frame strategy achieves full coverage of the image so that no defect is missed; classification loss and bounding-box regression loss are introduced to balance the model between recognition and localization; multiple preset anchor frames adapt the detector to defects of different sizes and shapes; and non-maximum suppression and normalization strategies optimize the detection result, avoiding duplicates and false alarms.
Thus far, the technical solution of the present application has been described with reference to the alternative embodiments shown in the drawings; however, those skilled in the art will readily understand that the scope of protection of the present application is not limited to these specific embodiments. Those skilled in the art may make equivalent modifications and substitutions to the related technical features without departing from the principles of the present application, and such modifications and substitutions fall within the scope of protection of the present application.

Claims (10)

1. A defect detection method for a chip package, characterized by comprising the following steps:
acquiring an original image of a chip tube shell, and constructing an image feature extraction network with a feature pyramid;
acquiring a whole feature map on the original image according to the image feature extraction network;
extracting the whole feature map through a region generation network to obtain a defect candidate region and coordinates;
performing positive and negative sample label distribution on the defect candidate region through normalization processing to obtain coordinates of the defect region;
mapping the coordinates to corresponding positions on the defect feature map, pooling the images of the corresponding positions into the same size, and predicting the defect types and the defect positions through a classifier network;
and filtering the predicted defect categories and defect positions into the final defect category and defect position by using a non-maximum suppression algorithm.
2. The method of claim 1, wherein the backbone network of the image feature extraction network is a convolutional neural network;
the extraction network obtains a full Zhang Tezheng map from the image features, including,
extracting features of the original image by using the convolutional neural network to obtain a feature map;
performing multi-scale fusion on the feature maps by using the feature pyramid structure to obtain the whole feature map, wherein the whole feature map is represented as V ∈ R^(7×7×1024).
3. The detection method according to claim 1, wherein the extracting the whole feature map through the region generation network to obtain the defect candidate region and coordinates includes,
traversing each point on the whole feature map by using a sliding window on the obtained whole feature map, and calculating the position of the window center mapped back to the original image;
calculating and generating k anchor frames by taking the original image position as a center point according to the set area and the aspect ratio, wherein the anchor frames are defect candidate areas, and the points of the anchor frames are coordinates of the defect candidate areas;
the regional generation network predicts two scores for each anchor frame, respectively represents the probability that the anchor frame is foreground and background, and predicts 4 boundary frame regression parameters for fine-tuning the position and the size of the anchor frame so as to enable the anchor frame to be closer to a real label.
4. The detection method according to claim 3, wherein the loss function of the region generation network is:
L({p_i}, {t_i}) = (1/N_cls)·Σ_i L_cls(p_i, p_i*) + λ·(1/N_reg)·Σ_i p_i*·L_reg(t_i, t_i*);
wherein the first term is the classification loss: p_i is the output of the classification head of the region generation network and represents the probability that an object is present in the i-th anchor frame, i.e. the probability of foreground; p_i* represents the ground-truth label, being 1 when an object exists in the anchor frame and 0 otherwise; N_cls represents the number of samples in the input batch, and L_cls is a classification loss function, measured here in cross-entropy form. The second term is the bounding-box regression loss: t_i represents the offset of the prediction frame from the i-th anchor frame, t_i* represents the offset of the real annotation frame from the i-th anchor frame, N_reg represents the number of anchor frames, L_reg is the box regression loss function, and λ is an adjustable hyper-parameter for balancing the classification loss and the box regression loss.
5. The detection method according to claim 3, wherein the calculating and generating k anchor frames by taking the original image position as a center point according to the set area and aspect ratio includes,
presetting a plurality of different area sizes and aspect ratios for the anchor frames, wherein the areas are 128×128, 256×256 and 512×512, and the aspect ratios are 1:1, 1:2 and 2:1, respectively.
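The k = 9 anchor frames of three areas × three aspect ratios can be generated as in this sketch; the function name and the corner-coordinate convention are assumptions:

```python
import math

def make_anchors(cx, cy, areas=(128 ** 2, 256 ** 2, 512 ** 2),
                 ratios=(1.0, 0.5, 2.0)):
    """Generate k = len(areas) * len(ratios) anchor frames centred at
    (cx, cy). A ratio r = w / h gives w = sqrt(area * r), h = sqrt(area / r),
    so ratios (1.0, 0.5, 2.0) correspond to 1:1, 1:2 and 2:1.
    Returns (x1, y1, x2, y2) corner coordinates."""
    anchors = []
    for area in areas:
        for r in ratios:
            w, h = math.sqrt(area * r), math.sqrt(area / r)
            anchors.append((cx - w / 2, cy - h / 2, cx + w / 2, cy + h / 2))
    return anchors
```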
6. The detection method according to claim 5, wherein the performing positive and negative sample label distribution on the defect candidate region through normalization processing to obtain coordinates of the defect region comprises,
adding the regression parameters to the anchor frames to obtain candidate frames, removing a large number of overlapped candidate frames by a non-maximum suppression method, and keeping only a plurality of candidate frames for each picture;
and carrying out positive and negative label distribution on the candidate frames by using normalization processing to obtain coordinates of the defect candidate region.
7. The detection method according to claim 6, wherein the candidate frame is a bounding box,
and the performing positive and negative label distribution on the candidate frames by using normalization processing to obtain coordinates of the defect region comprises,
modeling a bounding box as a two-dimensional Gaussian distribution, the density function of the two-dimensional Gaussian distribution being:
f(x | μ, Σ) = exp(−½·(x − μ)^T Σ⁻¹ (x − μ)) / (2π·|Σ|^(1/2));
wherein x is a coordinate (x, y), μ is the mean vector and Σ is the covariance matrix; for a bounding box (cx, cy, w, h), μ = (cx, cy)^T and Σ = diag(w²/4, h²/4).
modeling the bounding boxes as two-dimensional Gaussian distributions in this way, the distance between the two distributions is calculated with a normalization processing formula; for two Gaussian distributions N(μ₁, Σ₁) and N(μ₂, Σ₂), the second-order (Wasserstein) distance satisfies:
W₂²(N₁, N₂) = ‖μ₁ − μ₂‖₂² + ‖Σ₁^(1/2) − Σ₂^(1/2)‖_F²;
wherein W₂² is the squared distance between bounding box A and bounding box B, N_a and N_b are the Gaussian distributions of the two bounding boxes, and ‖·‖_F denotes the Frobenius norm.
For the two bounding boxes A = (cx_a, cy_a, w_a, h_a) and B = (cx_b, cy_b, w_b, h_b), substituting the coordinates of their center points and their widths and heights into μ and Σ yields:
W₂²(N_a, N_b) = ‖(cx_a, cy_a, w_a/2, h_a/2)^T − (cx_b, cy_b, w_b/2, h_b/2)^T‖₂²;
wherein W₂²(N_a, N_b) is the squared distance between the two bounding-box distributions, and cx_a, cy_a and cx_b, cy_b represent the x, y coordinates of the center points of bounding box A and bounding box B, respectively; this distance measure is then converted into a normalized similarity measure, i.e. the normalized distance:
NWD(N_a, N_b) = exp(−√(W₂²(N_a, N_b)) / C);
wherein C is a hyper-parameter related to the bounding-box data set, and NWD is the normalized distance.
according to NWD (N) a ,N b ) Performing positive and negative label assignment to obtain the defect from the defect candidate areaAnd (3) an area and confirming coordinates of the defect area.
8. The detection method according to claim 7, wherein the mapping the coordinates to corresponding positions on the defect feature map, pooling the images of the corresponding positions into the same size, and predicting the defect categories and defect positions through a classifier network comprises,
the loss function of the defect category and defect position predicted by the classifier network being:
L(p, u, t^u, v) = L_cls(p, u) + λ·[u ≥ 1]·L_loc(t^u, v);
wherein p is the probability distribution output by the classifier network model, u is the annotated ground-truth label, t^u is the position information predicted by the model, v is the position information of the annotation, L_cls is the classification loss, in cross-entropy form, L_loc is the position regression loss, in L1 form, and λ is a hyper-parameter.
9. A computer readable storage medium having stored therein at least one instruction or at least one program, the at least one instruction or the at least one program being loaded and executed by a processor to implement the method of any one of claims 1-8.
10. A detection apparatus comprising a processor and a computer readable storage medium as claimed in claim 9.
CN202311180692.8A 2023-09-13 2023-09-13 Method for detecting defect of chip package, storage medium and detection device Pending CN117152417A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202311180692.8A CN117152417A (en) 2023-09-13 2023-09-13 Method for detecting defect of chip package, storage medium and detection device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202311180692.8A CN117152417A (en) 2023-09-13 2023-09-13 Method for detecting defect of chip package, storage medium and detection device

Publications (1)

Publication Number Publication Date
CN117152417A true CN117152417A (en) 2023-12-01

Family

ID=88902374

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202311180692.8A Pending CN117152417A (en) 2023-09-13 2023-09-13 Method for detecting defect of chip package, storage medium and detection device

Country Status (1)

Country Link
CN (1) CN117152417A (en)

Similar Documents

Publication Publication Date Title
CN109118479B (en) Capsule network-based insulator defect identification and positioning device and method
JP6681729B2 (en) Method for determining 3D pose of object and 3D location of landmark point of object, and system for determining 3D pose of object and 3D location of landmark of object
CN111080693A (en) Robot autonomous classification grabbing method based on YOLOv3
JP6091560B2 (en) Image analysis method
CN105740780B (en) Method and device for detecting living human face
CN113592845A (en) Defect detection method and device for battery coating and storage medium
CN113920107A (en) Insulator damage detection method based on improved yolov5 algorithm
JP5777367B2 (en) Pattern identification device, pattern identification method and program
CN112164115B (en) Object pose recognition method and device and computer storage medium
AU2020272936B2 (en) Methods and systems for crack detection using a fully convolutional network
CN113222982A (en) Wafer surface defect detection method and system based on improved YOLO network
CN113221956B (en) Target identification method and device based on improved multi-scale depth model
CN113496525A (en) 3D pose estimation by 2D camera
CN116229189B (en) Image processing method, device, equipment and storage medium based on fluorescence endoscope
CN114170168A (en) Display module defect detection method, system and computer readable storage medium
CN117474918A (en) Abnormality detection method and device, electronic device, and storage medium
JP2021163502A (en) Three-dimensional pose estimation by multiple two-dimensional cameras
US20210308869A1 (en) Feature detection by deep learning and vector field estimation
CN116703895A (en) Small sample 3D visual detection method and system based on generation countermeasure network
CN117152417A (en) Method for detecting defect of chip package, storage medium and detection device
CN115760695A (en) Image anomaly identification method based on depth vision model
CN112199984B (en) Target rapid detection method for large-scale remote sensing image
CN113313678A (en) Automatic sperm morphology analysis method based on multi-scale feature fusion
JP2005071125A (en) Object detector, object detection method, object data selection program and object position detection program
JP2022508434A (en) Methods and systems for processing images by determining the rotation hypothesis

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination