CN111951253B - Method, device and readable storage medium for detecting surface defects of lithium battery - Google Patents


Info

Publication number
CN111951253B
Authority
CN
China
Prior art keywords
sample, image, feature, processed, detection
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202010827980.8A
Other languages
Chinese (zh)
Other versions
CN111951253A (en
Inventor
黄罗华 (Huang Luohua)
姜涌 (Jiang Yong)
杜亚玲 (Du Yaling)
Current Assignee
Gaoshi Technology Suzhou Co ltd
Original Assignee
Gaoshi Technology Suzhou Co ltd
Priority date
Filing date
Publication date
Application filed by Gaoshi Technology Suzhou Co ltd filed Critical Gaoshi Technology Suzhou Co ltd
Publication of CN111951253A publication Critical patent/CN111951253A/en
Application granted granted Critical
Publication of CN111951253B publication Critical patent/CN111951253B/en

Classifications

    • G06T 7/0004: Industrial image inspection (G Physics > G06 Computing; calculating or counting > G06T Image data processing or generation, in general > G06T 7/00 Image analysis > G06T 7/0002 Inspection of images, e.g. flaw detection)
    • G06F 18/253: Fusion techniques of extracted features (G06F Electric digital data processing > G06F 18/00 Pattern recognition > G06F 18/20 Analysing > G06F 18/25 Fusion techniques)
    • G06T 2207/20081: Training; Learning (G06T 2207/00 Indexing scheme for image analysis or image enhancement > G06T 2207/20 Special algorithmic details)
    • G06T 2207/20084: Artificial neural networks [ANN]

Abstract

The invention relates to a method, a device and a readable storage medium for detecting surface defects of a lithium battery. The method comprises the following steps: taking a collected image of the lithium battery surface as a sample to be processed; performing one or more down-sampling operations on the sample to be processed to obtain feature maps; performing an up-sampling operation on a feature map and channel-fusing the result with the image that was input to the corresponding down-sampling operation, to obtain a fused feature map; acquiring target feature information of the sample to be processed according to at least one of the feature maps and the fused feature map; training a network model based on concatenated grouped convolution blocks with a sample data set to obtain a detection model; and running the detection model on an image to be inspected to determine whether it contains defects. The method effectively improves the detection rate and enables effective detection of defects with weak feature information and multiple scales.

Description

Method, device and readable storage medium for detecting surface defects of lithium battery
Technical Field
The present invention relates generally to the field of defect detection technology, and more particularly, to a method, an apparatus and a readable storage medium for detecting defects on a surface of a lithium battery.
Background
During production and transportation of lithium battery products, line mishandling, collisions between objects, poor packaging and similar causes can leave the product surface dirty or damaged, or deform the protective film. These problems degrade the appearance quality of lithium battery products, may damage their interior, and can even lead to safety accidents such as battery explosion. Enterprises therefore need to guarantee product quality strictly while improving lithium battery production efficiency. Product quality inspection covers performance, safety, service life, appearance quality and other aspects; testing of performance, safety and the like is already largely automated, whereas appearance quality inspection still depends mostly on manual inspection.
With the rapid development of computer technology, machine vision inspection has emerged and gradually become an important component of industrial automatic inspection systems. In machine vision inspection, a camera takes the place of the human eye, capturing images of the industrial product's appearance and transmitting them to an image processing system for analysis, so as to inspect the product's appearance quality. Traditional image processing pipelines involve steps such as preprocessing, feature extraction, segmentation, matching and recognition, and work well on defects with obvious feature information, large size and uniform shape. However, defects with weak feature information or varying size, position and shape are difficult for such methods to detect effectively, and their many processing steps make detection slow.
Disclosure of Invention
In view of the above-mentioned technical problems, the present invention provides, in various aspects, a method, an apparatus, and a readable storage medium for lithium battery surface defect detection.
According to a first aspect of the present invention, there is provided a method for lithium battery surface defect detection, comprising: taking a collected image of the lithium battery surface as a sample to be processed; performing one or more down-sampling operations on the sample to be processed to obtain one or more feature maps; performing an up-sampling operation on a feature map and channel-fusing the result with the image that was input to the down-sampling operation which produced that feature map, to obtain a fused feature map; acquiring target feature information of the sample to be processed according to at least one of the feature maps and the fused feature map; training a network model based on concatenated grouped convolution blocks with a sample data set comprising the sample to be processed and the target feature information, to obtain a detection model; and inspecting an image containing a lithium battery surface with the detection model to detect whether that image contains defects.
According to one embodiment of the invention, each downsampling operation comprises: performing convolution downsampling operation on the image to be sampled so as to extract the characteristic information of the image to be sampled and reduce the size of the image to be sampled to obtain a first characteristic mapping layer; executing a first dimension reduction operation on the first feature mapping layer to obtain a second feature mapping layer; dividing the feature mapping in the second feature mapping layer into a plurality of groups, executing continuous convolution operation on each group of feature mapping, and superposing the convolution operation results of the plurality of groups of feature mapping to obtain a third feature mapping layer; performing series fusion operation on the channels of the first feature mapping layer and the third feature mapping layer to obtain a fourth feature mapping layer; and performing a second dimension reduction operation on the fourth feature mapping layer to obtain the feature mapping map.
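To make the data flow through one such down-sampling block concrete, the sketch below tracks the (channels, height, width) shape through the five steps in plain Python. The stride of the convolution down-sampling and all channel counts here are illustrative assumptions, not values specified by the patent.

```python
def downsample_block_shapes(c_in, h, w, c1, c_mid, groups, c_out):
    """Track (channels, height, width) through one down-sampling block.

    Assumptions (not fixed by the text): the strided convolution halves H
    and W and outputs c1 channels; the 1x1 reductions set c_mid and c_out.
    """
    # Step 1: convolution down-sampling -> first feature mapping layer
    first = (c1, h // 2, w // 2)
    # Step 2: first dimension-reduction operation -> second feature mapping layer
    second = (c_mid, first[1], first[2])
    # Step 3: grouped convolution: split channels into `groups` groups,
    # convolve each group separately, then stack the results back together
    assert c_mid % groups == 0, "channels must divide evenly into groups"
    third = (c_mid, second[1], second[2])
    # Step 4: series (channel-concatenation) fusion of layers 1 and 3
    fourth = (c1 + c_mid, third[1], third[2])
    # Step 5: second dimension-reduction operation -> the block's output
    return (c_out, fourth[1], fourth[2])

print(downsample_block_shapes(3, 416, 416, c1=32, c_mid=16, groups=4, c_out=32))
# → (32, 208, 208)
```

Chaining calls, with the output of one block fed to the next, reproduces the repeated down-sampling of step 104.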
According to another embodiment of the present invention, performing a downsampling operation on the sample to be processed comprises: executing the downsampling operation once on the sample to be processed, wherein the image to be sampled is the sample to be processed; or executing the downsampling operation on the sample to be processed for multiple times, wherein the image to be sampled is a feature mapping image obtained by the last downsampling operation.
According to a further embodiment of the present invention, performing a downsampling operation on the sample to be processed to obtain a feature map comprises: performing a plurality of downsampling operations on the sample to be processed to obtain a plurality of feature maps; and the performing an upsampling operation on the feature map comprises: and respectively executing the up-sampling operation on the feature mapping images obtained by the last two down-sampling operations.
According to one embodiment of the invention, performing a plurality of down-sampling operations on the sample to be processed comprises: performing five downsampling operations on the sample to be processed; and the performing an upsampling operation on the feature map comprises: and respectively executing the up-sampling operation on the feature maps obtained by the fourth down-sampling operation and the fifth down-sampling operation.
According to another embodiment of the present invention, acquiring the target feature information of the sample to be processed according to at least one of the feature map and the fused feature map includes: outputting multi-scale target feature information, according to the size of the target features, from the fused feature map and the feature map obtained by the last down-sampling operation.
According to yet another embodiment of the present invention, the object characteristic information includes object position information and object class information, wherein the object class information includes at least one of defective object information and interfering object information.
According to one embodiment of the invention, the defect target information includes at least one of bubble expansion and surface wrinkling; the interference target information comprises at least one of code-spraying interference and electrode plate interference.
According to another embodiment of the present invention, before performing the down-sampling operation, further comprising: preprocessing the sample to be processed according to the gray information difference of the sample to be processed so as to extract the region of interest of the sample to be processed; and the one or more downsampling operations performed on the sample to be processed comprise: performing one or more of the down-sampling operations on the region of interest of the sample to be processed.
According to a further embodiment of the invention, the pre-processing of the sample to be processed comprises: performing automatic threshold segmentation on the sample to be processed to obtain a binary image of the sample to be processed, wherein the binary image comprises a foreground image and a background image; detecting an upper edge profile of the foreground image, selecting a point set of the upper edge profile to execute least square fitting, and performing angle correction on the foreground image according to a fitting result; extracting all outer edge contours of the foreground image after angle correction, and determining an outer edge contour with the maximum area and the maximum perimeter according to a point set of all the outer edge contours; and determining the position of the region of interest in the sample to be processed according to the contour point closest to the four corner points of the minimum circumscribed rectangle of the outer edge contour.
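As a hedged illustration of the two numerical pieces of this preprocessing, automatic threshold selection and the least-squares fit used for angle correction, the following self-contained Python sketch implements Otsu's method on an 8-bit histogram and recovers the tilt angle of an upper-edge point set. Contour tracing and the minimum circumscribed rectangle, typically done with an image processing library, are omitted.

```python
import math

def otsu_threshold(gray_pixels):
    """Automatic threshold (Otsu): maximize between-class variance
    over a 256-bin histogram of 8-bit gray values."""
    hist = [0] * 256
    for v in gray_pixels:
        hist[v] += 1
    total = len(gray_pixels)
    sum_all = sum(i * hist[i] for i in range(256))
    w0 = sum0 = 0
    best_t, best_var = 0, -1.0
    for t in range(256):
        w0 += hist[t]
        if w0 == 0 or w0 == total:
            continue
        sum0 += t * hist[t]
        m0 = sum0 / w0                      # mean of class below threshold
        m1 = (sum_all - sum0) / (total - w0)  # mean of class above threshold
        var = w0 * (total - w0) * (m0 - m1) ** 2
        if var > best_var:
            best_var, best_t = var, t
    return best_t  # pixels > best_t form the foreground of the binary image

def edge_tilt_degrees(points):
    """Least-squares line fit to the upper-edge point set; the returned
    angle is what the angle-correction rotation must undo."""
    n = len(points)
    mx = sum(x for x, _ in points) / n
    my = sum(y for _, y in points) / n
    slope = (sum((x - mx) * (y - my) for x, y in points)
             / sum((x - mx) ** 2 for x, _ in points))
    return math.degrees(math.atan(slope))
```

For a bimodal image (dark background, bright foreground) the returned threshold falls between the two modes, and a point set lying on a line of slope 0.5 yields a tilt of about 26.6 degrees.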
According to one embodiment of the invention, the image of the lithium battery surface comprises at least one of a left-light image, a right-light image and a front-light image (i.e., images captured under left-side, right-side and front illumination).
According to a second aspect of the present invention, there is provided an apparatus for lithium battery surface defect detection, comprising: at least one processor; a memory storing program instructions that, when executed by the at least one processor, cause the apparatus to perform the method according to any one of the first aspects of the invention.
According to a third aspect of the present invention, there is provided a computer readable storage medium storing a program for lithium battery surface defect detection, which when executed by a processor performs the method according to any one of the first aspects of the present invention.
From the above description of the technical solutions and embodiments, those skilled in the art will understand that the method for detecting lithium battery surface defects according to the present invention can extract target feature information of a sample to be processed by performing down-sampling, up-sampling and fusion operations on it, and can train a network model based on concatenated grouped convolution blocks with a sample data set comprising the sample to be processed and the target feature information, thereby obtaining a detection model that automatically detects defects in an image to be inspected. Detecting defects with this network model of concatenated grouped convolution blocks effectively improves the detection rate; through repeated processing of the feature information and fusion of shallow and deep features, defects with weak feature information and varying size, position and shape can be detected effectively, improving detection accuracy.
Drawings
The above and other objects, features and advantages of exemplary embodiments of the present invention will become readily apparent from the following detailed description read in conjunction with the accompanying drawings. In the accompanying drawings, several embodiments of the present invention are illustrated by way of example and not by way of limitation, and like reference numerals designate like or corresponding parts throughout the several views, in which:
FIG. 1 is a flow chart generally illustrating a method for lithium battery surface defect detection in accordance with the present invention;
FIG. 2 is a flow diagram illustrating a downsampling operation according to an embodiment of the present invention;
FIG. 3 is a schematic diagram illustrating a shallow network and deep network convergence, according to an embodiment of the invention;
FIG. 4 is a schematic diagram illustrating two packet convolution operations according to an embodiment of the present invention;
FIG. 5 is a schematic diagram illustrating the implementation of the multi-scale feature fusion operation of the present invention using an FPN network architecture;
FIG. 6 is a schematic diagram illustrating the structure of a network model according to an embodiment of the invention;
FIG. 7 is a flow chart illustrating a method of pre-processing a sample to be processed according to an embodiment of the present invention;
FIG. 8 is a flow chart illustrating a visualization process for region-of-interest extraction;
FIG. 9 is a diagram illustrating a distribution of a plurality of category features in a dataset according to an embodiment of the present invention;
FIGS. 10a and 10b are schematic diagrams illustrating code-spraying interference, electrode plate interference, bubble expansion and surface wrinkling;
FIG. 11 is a schematic diagram showing the intersection and union regions of the detected box and the actual labeled box;
FIGS. 12 a-12 d are graphs showing a plurality of PR curves for comparative experiments according to embodiments of the present invention;
FIG. 13 is a histogram illustrating a comparison of DIRs for multiple models in the bubble category, according to an embodiment of the invention;
FIG. 14 is a histogram illustrating a DIR comparison of multiple models in wrinkle classes according to an embodiment of the present invention; and
FIG. 15 is a graph showing a comparison of the detection results of the detection model of the present invention and the Yolo_TinyV3 model.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are some, not all, embodiments of the present invention. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
To address the shortcomings of the prior art, the invention provides a new, practical solution. Specifically, the method for detecting lithium battery surface defects can extract target feature information of a sample to be processed by sequentially performing down-sampling, up-sampling and fusion operations on it, and can train a network model based on concatenated grouped convolution blocks with a sample data set comprising the sample to be processed, the target feature information and the like, to obtain a detection model that automatically detects defects in an image to be inspected. Using this detection model for lithium battery surface defects effectively improves the detection rate, particularly for defects with weak feature information. As will be understood from the following description, the invention further includes embodiments that enhance the detection of defects with weak feature information or non-uniform scale. For example, within each down-sampling operation, embodiments of the invention may apply several processing steps, including a grouped convolution operation, to the image being sampled, enhancing defect feature information so as to ease the extraction of target feature information, particularly multi-scale target feature information. Embodiments of the invention are described in detail below with reference to the accompanying drawings.
FIG. 1 is a flow chart generally illustrating a method for lithium battery surface defect detection in accordance with the present invention. As shown in FIG. 1, in step 102, the method 100 takes the acquired image of the lithium battery surface as a sample to be processed. In one embodiment, the image of the lithium battery surface may include at least one of a left-light image, a right-light image and a front-light image (captured under left-side, right-side and front illumination, respectively). The method 100 may process the collected left-light, right-light and front-light images as separate samples to be processed, may select the one with the best imaging quality as the sample to be processed, or may merge the three collected images into a single image and process that as the sample to be processed.
In another embodiment, when acquired images are scarce, the image samples may be expanded to increase the number of samples to be processed; for example, online data augmentation may be implemented by horizontally flipping, randomly cropping, or adjusting the brightness of the acquired images. Compared with offline data augmentation, which requires calling a series of scripts, online data augmentation directly calls the relevant augmentation functions during training, which is simpler and more convenient.
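A minimal sketch of such an online augmentation pass, assuming H x W x C uint8 images; the 90% crop ratio and the ±25 brightness range are illustrative choices, not values given in the text:

```python
import random
import numpy as np

def augment(img, rng=None):
    """One online-augmentation pass: horizontal flip (p=0.5),
    random crop to 90% of each side, and a brightness shift."""
    rng = rng or random.Random(0)
    out = img
    if rng.random() < 0.5:
        out = out[:, ::-1]                       # horizontal flip
    h, w = out.shape[:2]
    ch, cw = int(h * 0.9), int(w * 0.9)          # assumed crop ratio
    top, left = rng.randrange(h - ch + 1), rng.randrange(w - cw + 1)
    out = out[top:top + ch, left:left + cw]      # random crop
    shift = rng.randint(-25, 25)                 # assumed brightness range
    return np.clip(out.astype(int) + shift, 0, 255).astype(np.uint8)
```

Calling such a function on each sample inside the training data loader, rather than pre-generating files on disk, is what makes the augmentation "online".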
Next, in step 104, the method 100 performs one or more downsampling operations on the sample to be processed to extract feature information of the sample to be processed and obtain one or more feature maps, wherein at least one feature map can be obtained in each downsampling operation.
The flow then advances to step 106, where the method 100 may perform an up-sampling operation on a feature map and channel-fuse the result with the image that was input to the down-sampling operation which produced that feature map, to obtain a fused feature map. The method 100 may perform the up-sampling operation on one or more feature maps. For example, in one embodiment, the method 100 may up-sample the feature map obtained from every down-sampling operation. In another embodiment, the method 100 may selectively up-sample only some of the feature maps obtained from multiple down-sampling operations, as needed; the selection may be based on information such as the size and number of features in the feature maps.
In yet another embodiment, in step 104 the method 100 may perform multiple down-sampling operations on the sample to be processed to obtain multiple feature maps, and in step 106 it may perform the up-sampling operation on the feature maps obtained by the last two down-sampling operations, respectively. Because the deeper the network, the richer the semantic information, in some application scenarios up-sampling only the feature maps from the last two down-sampling operations already yields sufficiently rich semantic information (including feature information) to meet the application's requirements. This reduces the number of operations while preserving detection quality, effectively improving detection speed and efficiency and lowering the network model's computational load and equipment wear.
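Under the common assumption that each down-sampling halves the spatial resolution (stride 2), the scales involved are easy to enumerate; the 416 x 416 input size below is illustrative, not specified by the patent:

```python
def feature_map_sizes(h, w, num_downsamples=5):
    """Spatial size after each assumed stride-2 down-sampling."""
    sizes = []
    for _ in range(num_downsamples):
        h, w = h // 2, w // 2
        sizes.append((h, w))
    return sizes

sizes = feature_map_sizes(416, 416)
print(sizes)  # → [(208, 208), (104, 104), (52, 52), (26, 26), (13, 13)]
# Only the two deepest maps are up-sampled and channel-fused with the
# shallower map that fed the corresponding down-sampling operation.
to_upsample = sizes[-2:]  # → [(26, 26), (13, 13)]
```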
According to an embodiment of the present invention, in step 104, the method 100 performs the down-sampling operation once on the sample to be processed, and then the image to be sampled in step 106 is the sample to be processed. According to the present embodiment, at step 106, the method 100 performs an upsampling operation on the feature map and performs channel fusion on the result obtained by the upsampling operation and the sample to be processed. According to another embodiment of the present invention, in step 104, the method 100 performs the down-sampling operation on the sample to be processed multiple times, and the image to be sampled in step 106 is a feature map obtained in the last down-sampling operation.
As further shown in fig. 1, in step 108, the method 100 may obtain target feature information of the sample to be processed according to at least one of the feature map and the fused feature map. For example, in one embodiment, since the fusion of the deep-level features and the shallow-level features is beneficial for enhancing the feature information, the method 100 may obtain the target feature information of the sample to be processed according to the feature information in the fused feature map. In another embodiment, the feature information in the feature maps may be sufficiently distinct that the method 100 may obtain the target feature information directly from the feature information in one or more feature maps. In yet another embodiment, the method 100 may obtain the target feature information of the sample to be processed according to the fused feature map and the feature map obtained by the last downsampling operation, and may output the multi-scale target feature information according to the target feature size and other information mapped to the original image (i.e., the sample to be processed).
The target feature information described above may include at least one of target position information, target category information, target size information, and the like, wherein the target position information may include position information and the like of the target feature in the sample to be processed, the target category information may include at least one of defect target information, interference target information, and the like, and the target size information may include information such as a size or a size range of the target feature.
According to an embodiment of the present invention, the defect target information may include at least one of bubble expansion, surface wrinkle and the like, and the interference target information may include at least one of code-spraying (sprayed character) interference, electrode plate interference and the like. Defect target information describes targets that make the lithium battery surface unqualified. Bubble expansion and surface wrinkles are usually caused by faulty packaging of the plastic film on the battery surface; these two defect types vary in shape and size, have a three-dimensional appearance, and exhibit uneven scale and changing position and form, which greatly challenges detection accuracy. Interference target information describes targets that are normally present on the lithium battery surface but easily disturb defect detection. Code-spraying and electrode plate regions arise from normal physical processes and are not defects, but in machine vision inspection the plate edges or sprayed characters are easily misjudged as defects. The method for detecting lithium battery surface defects according to the present invention can detect and localize such interference targets, so that their information can be filtered out when the defect detection results are output.
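The filtering step can be as simple as dropping detections whose class is an interference class before reporting; the class names below are hypothetical labels for the four categories discussed, not identifiers from the patent:

```python
# Hypothetical class labels for the four target categories in the text
DEFECT_CLASSES = {"bubble_expansion", "surface_wrinkle"}
INTERFERENCE_CLASSES = {"code_spraying", "electrode_plate"}

def report_defects(detections):
    """Keep defect targets only: interference targets are detected so that
    they can be recognized and filtered out, not reported as defects."""
    return [d for d in detections if d["cls"] in DEFECT_CLASSES]

dets = [
    {"cls": "bubble_expansion", "box": (10, 10, 40, 30)},
    {"cls": "code_spraying",    "box": (60, 5, 90, 20)},   # filtered out
]
print(report_defects(dets))  # only the bubble_expansion detection remains
```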
The flow then proceeds to step 110, where the method 100 may train a network model based on concatenated grouped convolution blocks with a sample data set comprising the sample to be processed and the target feature information, yielding a detection model. In one embodiment, the sample data set may include the sample to be processed, labeled target feature information (e.g., with annotation boxes), and the like. In another embodiment, the sample data set may further include the feature maps, the fused feature map, or the like. In yet another embodiment, training the network model may further comprise adjusting the model (e.g., parameters such as the number of down-sampling and/or up-sampling operations, the number of fusions, and the weights) according to the difference between the sample data set and the ground-truth targets marked by the annotation boxes, so as to continuously improve the model's precision and accuracy and better adapt it to the type and scale of the defects to be detected. The network model based on concatenated grouped convolution blocks, once trained on the sample data set, is referred to as the detection model.
After obtaining the inspection model, in step 112, the method 100 may inspect the image to be inspected including the surface of the lithium battery through the inspection model to detect whether the image to be inspected includes a defect. In one embodiment, the method 100 may inspect the image to be inspected, including the surface of the lithium battery, through an inspection model and output labeled defect target information.
While the method for detecting lithium battery surface defects according to the present invention has been generally described above with reference to FIG. 1, it will be understood by those skilled in the art that the above description is exemplary and not limiting. For example, in step 110, training the network model may further include optimizing the model parameters by comparing the training results with the actual defect information, continuously improving the model's accuracy to ensure reliable detection. The down-sampling operation in step 104 may take various forms; one embodiment is described below with reference to FIG. 2.
Fig. 2 is a flow diagram illustrating a downsampling operation according to an embodiment of the present invention. Those skilled in the art will appreciate from the following description that the method flow shown in fig. 2 is an embodiment of step 104 in fig. 1, and thus the description above in connection with fig. 1 also applies to fig. 2. As shown in fig. 2, the step 104 of each down-sampling operation performed on the sample to be processed may comprise:
in step 1041, the method of the present invention may perform a convolution downsampling operation on the image to be sampled to extract the feature information of the image to be sampled and reduce the size of the image to be sampled, so as to obtain a first feature mapping layer. Compared with the traditional pooling downsampling operation, the convolution downsampling operation can better retain the characteristic information and prevent useful characteristic information from being filtered, so that the detection rate of the target characteristic information can be effectively ensured.
Next, in step 1042, the method of the present invention may perform a first dimension reduction operation on the first feature mapping layer to obtain a second feature mapping layer. The first dimension reduction operation can reduce the dimension of the channel number of the first feature mapping layer, so as to facilitate the subsequent operations such as grouping convolution and series fusion.
The process proceeds to step 1043, where the method of the present invention may divide the feature maps in the second feature mapping layer into a plurality of groups, perform continuous convolution operations on each group of feature maps, and superimpose convolution operation results of the plurality of groups of feature maps to obtain a third feature mapping layer. Compared with the traditional standard convolution operation, the packet convolution operation adopted by the method can effectively reduce the parameter quantity and the calculated quantity, and has the function of lightening the network model. In particular, using a packet convolution (e.g., with a number of packets of G), the amount of parameters can be reduced to that of a standard convolution
Figure BDA0002636926570000081
This principle will be explained in detail below.
Assume that the size of the input feature map (e.g., the second feature mapping layer) is C × H × W, the number of channels of the output feature map (e.g., the third feature mapping layer) is N, and the grouped convolution is divided into G groups. Then the number of input channels per group is C/G, the number of output channels per group is N/G, each convolution kernel has size (C/G) × K × K, the total number of convolution kernels is N, and the number of convolution kernels in each group is N/G. Each convolution kernel only convolves the input feature maps of its own group, so the total parameter quantity of the convolution kernels is

N × (C/G) × K × K

and the total parameter quantity is reduced to 1/G of the original N × C × K × K of the standard convolution.
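For illustration only (not part of the patented method), the parameter-count arithmetic above can be checked with a short sketch; the dimensions C, N, K and G below are assumed values:

```python
# Illustrative sketch: kernel parameter count of a standard vs. grouped
# convolution, ignoring bias terms. Dimensions are assumed for illustration.

def conv_params(c_in, c_out, k, groups=1):
    """Total kernel parameters: c_out kernels of size (c_in/groups) x k x k."""
    return c_out * (c_in // groups) * k * k

C, N, K, G = 256, 256, 3, 32
standard = conv_params(C, N, K)             # N * C * K * K
grouped = conv_params(C, N, K, groups=G)    # N * (C/G) * K * K
print(standard // grouped)                  # -> 32, i.e. reduced to 1/G
```

As the print shows, the ratio equals the number of groups G regardless of the kernel size.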
The computation amount of a network model is usually expressed in floating point operations (FLOPs), which can be used to measure the complexity of an algorithm/model. The floating point operations and the parameter quantity have the following relation:

FLOPs = Param × H_out × W_out
The complexity of the standard convolution and the grouped convolution is measured below by calculating the floating point operations FLOPs. First, without considering the bias term, the floating point operation count of a standard convolution layer is as follows:

FLOPs = 2 × C_in × K_h × K_w × H_out × W_out × C_out

where C_in and C_out represent the numbers of channels of the input and output feature maps respectively, K_h and K_w represent the height and width of the convolution kernel, H_out and W_out represent the height and width of the output feature map, and the factor 2 indicates that a multiplication and an addition are counted as 2 operations.
With reference to the above, the parameter quantity of a grouped convolution (number of groups G) is 1/G of that of a standard convolution, so the floating point operation count of a grouped convolution operation is as follows:

FLOPs = 2 × (C_in/G) × K_h × K_w × H_out × W_out × C_out
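As a sketch (the layer dimensions below are assumed, not taken from the patent's table), the two FLOPs formulas can be evaluated numerically:

```python
# Illustrative sketch: FLOPs of a standard vs. grouped convolution layer,
# counting a multiply and an add as 2 operations, bias ignored.
# The layer dimensions are assumed values for illustration.

def conv_flops(c_in, c_out, k_h, k_w, h_out, w_out, groups=1):
    """FLOPs = 2 * (C_in/G) * K_h * K_w * H_out * W_out * C_out."""
    return 2 * (c_in // groups) * k_h * k_w * h_out * w_out * c_out

std = conv_flops(128, 256, 3, 3, 104, 128)
grp = conv_flops(128, 256, 3, 3, 104, 128, groups=8)
print(std // grp)  # -> 8: the grouped convolution uses 1/G of the FLOPs
```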
The complexity comparison of the standard convolution and the grouped convolution is shown in Table 1, where the input is the size and number of channels of the image to be sampled, and the unit of FLOPs is billions of floating point operations (BF):

Table 1: complexity comparison of the standard convolution and the grouped convolution (the per-layer values were rendered as an image in the original and are not reproduced here)
the comparison of the third layer of convolution layers can observe that floating point operands FLOPs of the grouping convolution are only 1/8 of the standard convolution, and it can be seen visually that parameters and calculated amount can be effectively reduced by using the grouping convolution instead of the standard convolution, and the effect of lightening a network model can be achieved.
As further shown in fig. 2, after the grouped convolution operation is completed, the process may proceed to step 1044, and a serial fusion operation may be performed on the channels of the first feature mapping layer and the third feature mapping layer to obtain a fourth feature mapping layer. The first feature mapping layer and the third feature mapping layer, or even more feature mapping layers, can be channel-fused in a serial fusion mode, which facilitates feature information fusion between different network layers. As a result, the result of each convolution step depends not only on the output feature map of the previous layer; the connection between the shallow network and the deep network can be effectively established, and the problem of gradient vanishing can be alleviated to a certain extent. As shown in fig. 3, the serial grouped convolution can fuse the feature information of a shallow network (e.g., the first feature mapping layer) and a deep network (e.g., the third feature mapping layer) while implementing downsampling, and combining the feature expressions of different network layers can effectively enhance the feature information of surface defects of a lithium battery.
Returning to fig. 2, in step 1045, a second dimension reduction operation may be performed on the fourth feature mapping layer to obtain the feature map. The second dimension reduction operation may reduce the number of channels of the feature map to be equal to the number of channels of the first feature map layer.
The steps performed in each downsampling operation according to the present invention are exemplarily described above with reference to fig. 2, and for facilitating understanding of the actual operation process, the following exemplary description is made with reference to a specific embodiment shown in fig. 4.
Fig. 4 is an embodiment of the steps shown in fig. 2, and thus the description of fig. 2 may also be applied to the description of the embodiment shown in fig. 4. Two grouped convolution modes are shown schematically in fig. 4, which are substantially equivalent; mode one of fig. 4 illustrates the concept and operation of grouping more intuitively.
As shown in fig. 4, x_in represents the input image to be sampled, d = 128 indicates that the number of channels of the input image to be sampled is 128, x_out represents the output third feature mapping layer, y represents the output feature map, and d = 256 indicates that the number of channels of the output feature map is 256. "128, 3x3/2, 256" in the box in the first row of the diagram represents using 256 convolution kernels of size 3x3 with stride 2 on an input of dimension 128; "256, 1x1, 4" in the second row of boxes in mode one represents using 4 convolution kernels of size 1x1 on an input of dimension 256. Similarly, the numbers marked in the other boxes have similar meanings with different values, and are not described in detail here. The specific operation process will be described below with reference to fig. 4.
First, the input image to be sampled x_in undergoes a convolution downsampling operation to obtain a first feature mapping layer (which may be denoted x_in/2) with 256 channels. The calculation formula of x_in/2 may be:

x_in/2 = C_s(x_in), s = 2

where C_s(x_in) represents performing a convolution operation on the input x_in, and s = 2 indicates a convolution stride of 2.
Then, in mode one, the convolution operations may be divided into G groups (in this embodiment, G = 32), where each group sequentially uses, from top to bottom, for example, 4 convolution kernels of size 1x1 and 8 convolution kernels of size 3x3, and the result of group i may be denoted τ_i(x_in/2); in mode two, the dimension reduction operation may first be performed on the first feature mapping layer, which is then divided into G = 32 groups for the grouped convolution operation, the result likewise being denoted τ_i(x_in/2). The 32 groups of τ_i(x_in/2) may then be superimposed, using, for example, an identity shortcut connection, resulting in a third feature mapping layer x_out whose feature map size and channels are consistent with those of the first feature mapping layer. The calculation formula of x_out may be:

x_out = x_in/2 + Σ_{i=1}^{G} τ_i(x_in/2), G = 32

where x_out represents the output third feature mapping layer and τ_i(x_in/2) represents the result of the i-th grouped convolution operation.
Finally, x_in/2 and x_out are fused in a serial manner, i.e., their channels are concatenated, to complete one serial grouped convolution block. A second dimension reduction operation using a 1x1 convolution kernel is then performed on the fourth feature mapping layer obtained by the serial operation to obtain the output feature map (which may be denoted y), whose number of channels may be 256. The formula for y may be as follows:

y = H([x_in/2, x_out])

where H([x_in/2, x_out]) represents operating the channels of the feature layers x_in/2 and x_out in series.
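The channel arithmetic of one such block can be sketched shape-wise with NumPy. This is an illustrative sketch only: the branch transforms τ_i are stand-ins (random tensors with the assumed output shape), so it demonstrates the aggregation, serial channel fusion and 1x1 reduction rather than trained convolutions:

```python
import numpy as np

rng = np.random.default_rng(0)
G, d, h, w = 32, 256, 52, 64             # assumed dims following the Fig. 4 example

x_half = rng.standard_normal((d, h, w))  # x_in/2: first feature mapping layer

# tau_i stand-ins: each grouped branch is modeled as a (d, h, w) tensor;
# the G branch outputs are summed and added to the identity shortcut x_in/2
branches = rng.standard_normal((G, d, h, w))
x_out = x_half + branches.sum(axis=0)    # third feature mapping layer

fused = np.concatenate([x_half, x_out])  # serial channel fusion: 2d channels

# second dimension reduction: a 1x1 convolution is a matrix multiply over
# the channel axis, bringing 2d channels back down to d
w_1x1 = rng.standard_normal((d, 2 * d))
y = np.tensordot(w_1x1, fused, axes=([1], [0]))
print(fused.shape, y.shape)              # -> (512, 52, 64) (256, 52, 64)
```

The shapes confirm the bookkeeping: concatenation doubles the channel count, and the 1x1 reduction restores it to 256.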
Through the above description, those skilled in the art can understand that each downsampling operation of the present invention can implement feature extraction on the sample to be processed through multiple convolution operations and a serial grouped convolution operation, and can enhance feature information through fusion of the shallow and deep networks, which is beneficial to improving the defect detection rate in subsequent steps. Furthermore, the serial grouped convolution operation of the invention can effectively reduce the parameters and calculation amount of the network model, thereby lightening the network model and improving the calculation speed. The upsampling operation and the method of acquiring target feature information by a combination of the upsampling and downsampling operations according to the present invention will be described in detail below with reference to specific embodiments.
According to an embodiment of the present invention, in step 104 shown in fig. 1, performing a plurality of down-sampling operations on the sample to be processed may include: performing five downsampling operations on the sample to be processed; and in step 106, the performing an upsampling operation on the feature map may include: respectively performing the upsampling operation on the feature maps obtained by the fourth and fifth downsampling operations. In one embodiment, the feature map obtained by the fourth downsampling is 1/2^4 the size of the original input picture (e.g., the sample to be processed), and that obtained by the fifth downsampling is 1/2^5 the size. The output result obtained by the fifth downsampling is therefore very small, and a small defect in it corresponds, after being mapped back to the original image, to a detected defect of large size; similarly, the output result obtained by the fourth downsampling is also small but larger than that of the fifth downsampling, and a defect detected in it maps back to a smaller size in the original image than one detected in the fifth downsampling. According to this principle, in some application scenarios, performing the upsampling operation and the channel fusion operation respectively on the feature maps obtained by the fourth and fifth downsampling operations can meet the requirements for detecting defects of different sizes.
According to another embodiment of the present invention, in step 108 shown in fig. 1, acquiring the target feature information of the sample to be processed according to at least one of the feature map and the fused feature map may include: and outputting multi-scale target feature information according to the size of the target feature in the feature mapping graph obtained by the fused feature graph and the last downsampling operation. In practice, it may be implemented using, for example, a feature pyramid network structure (FPN). For ease of understanding, the following exemplary operations are described in conjunction with the visualization process illustrated in fig. 5.
FIG. 5 is a schematic diagram illustrating the implementation of the multi-scale feature fusion operation of the present invention using an FPN network architecture. As shown in fig. 5, the FPN structure of the present invention may consist of two paths, the bottom-up path corresponding to the feature maps output by the third, fourth and fifth downsampling operations in the downsampling network, respectively; the top-down path corresponds to two upsampling operations in the upsampling network, two fused feature maps and three output prediction layers. Because the shallow feature semantic information is less, but the position of the target is accurate, and the deep feature semantic information is richer, but the position of the target is relatively rough, the FPN structure is favorable for fusing feature information of different scales in the shallow feature map and the deep feature map, the relevance among all dimension features is effectively expressed, and meanwhile, the operation of predicting the fusion feature layers of multiple scales is favorable for improving the small target detection effect.
Specifically, as shown in fig. 5, the feature maps obtained by the fourth and fifth downsampling operations may be respectively upsampled and fused with the feature maps obtained by the third and fourth downsampling operations, resulting in two fused feature maps, i.e., fused 1 and fused 2, as shown in the figure. Further, the method according to the present invention may output a plurality of prediction layers (or output feature maps) including multi-scale target feature information according to the sizes of the target features in the feature maps obtained by fusing 1 and 2 two fused feature maps and the fifth downsampling operation.
In an embodiment, the output prediction layer may respectively adopt feature maps of 32-fold, 16-fold and 8-fold down-sampling operations, which respectively correspond to 3 output feature maps with different scales in the FPN structure, and simultaneously adopt a Logistic classifier for multi-label classification, which may perform multi-label prediction to solve the problem of overlapping of detected objects.
Further, the method according to the present invention may also use anchor boxes to represent the size distribution of all defects in the sample data set, dividing them by size into, for example, 3 groups to constrain the predicted target ranges in the output feature maps of the 3 different scales. The feature map of each scale is rasterized into small units (cells); each cell predicts 3 bounding boxes, and each bounding box is adjusted according to its corresponding anchor box and predicts 3 kinds of results: the position information of the box (including center coordinates, width and height), the target confidence, and the predicted values of the 4 categories. The receptive fields of the 3 feature maps of different scales differ: the prediction layer of 32-fold downsampling has the largest receptive field, and one grid cell in its feature map corresponds to a larger target object in the input original image and to the anchor-box group with larger scales. Similarly, the target detected by each grid cell in the prediction layer of 8-fold downsampling is smaller, so it is suitable for detecting small targets and corresponds to the anchor-box group with smaller scales. Finally, the prediction results of the three output prediction layers are superimposed, and a non-maximum suppression operation is performed to obtain the final prediction result.
Fig. 6 is a schematic structural diagram of a network model according to an embodiment of the present invention. As shown in fig. 6, in one embodiment, the width and height of the input picture (e.g., the sample to be processed) are 1024 × 832. After feature information extraction by a serial grouped convolution block network (five downsampling operations) and feature fusion by the FPN multi-scale network (two upsampling operations and two channel fusion operations), 3 prediction layers can be obtained, where the number of channels of the convolution kernels in the serial grouped convolution block network may be, for example, 32, and the numbers of channels of the five feature maps obtained by the five downsampling operations may be 64, 128, 256, 512 and 1024 respectively, as marked in the diagram. The width and height of prediction layer 1 (shown as cube 1 in the drawing) are 32 × 26, and one grid cell can predict feature objects with a large original-image scale, such as long curved wrinkles and large bubbles. The width and height of prediction layer 2 (shown as cube 2) are 64 × 52, and one grid cell can predict feature objects of moderate original-image scale, such as short striped wrinkles. The width and height of prediction layer 3 (shown as cube 3) are 128 × 104, and one grid cell can predict feature objects of small original-image scale, such as wrinkles with shallow feature information and dot-like bubbles. Mapping the target feature information of the three prediction layers onto the input picture yields an output result picture including, for example, multi-scale defect labeling boxes.
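The grid sizes of the three prediction layers follow directly from the 1024 × 832 input and the 32/16/8-fold downsampling strides; the per-box attribute count below (4 position values, 1 confidence value, 4 category values, 3 boxes per cell) follows the description above, and the resulting channel count is an inferred illustration rather than a figure from the patent:

```python
# Illustrative sketch: grid sizes and box counts of the three prediction
# layers for a 1024 x 832 input at strides 32, 16 and 8.
W, H = 1024, 832                 # input picture width and height
boxes_per_cell = 3
box_attrs = 4 + 1 + 4            # (x, y, w, h), confidence, 4 categories

for stride in (32, 16, 8):
    gw, gh = W // stride, H // stride
    print(f"stride {stride:2d}: grid {gw} x {gh}, "
          f"{gw * gh * boxes_per_cell} boxes, "
          f"{boxes_per_cell * box_attrs} output channels")
# -> grids 32 x 26, 64 x 52 and 128 x 104, matching prediction layers 1-3
```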
While the method according to the present invention has been described above with reference to various embodiments, it will be understood by those skilled in the art that the above description is exemplary and not limiting, and for example, the upsampling operation may not be limited to the feature maps obtained for the fourth downsampling operation and the fifth downsampling operation, but may also be performed for the third downsampling operation or shallower or more downsampling operations as needed. According to the technical scheme of the invention, the method may further include preprocessing the sample to be processed to filter the irrelevant information in the sample to be processed so as to simplify the image data, which will be exemplarily described below with reference to fig. 7 and 8.
According to an embodiment of the present invention, before performing the down-sampling operation (e.g., between step 102 and step 104 of the method 100 shown in fig. 1), the method may further include: preprocessing the sample to be processed according to the gray information difference of the sample to be processed so as to extract the region of interest of the sample to be processed; and said performing one or more downsampling operations on the sample to be processed (i.e., in step 104 shown in fig. 1) comprises: performing one or more of the downsampling operations on the region of interest of the sample to be processed. The region of interest is the part of the sample to be processed that contains the foreground image of the lithium battery surface and does not contain the background image. Detecting the region of interest directly simplifies the image data to be detected, thereby improving detection efficiency, and narrows the detection range, thereby improving the reliability and accuracy of feature extraction, defect identification and the like in the subsequent detection process.
Fig. 7 is a flow chart illustrating a method of pre-processing a sample to be processed according to an embodiment of the present invention. As shown in fig. 7, in step 202, the method 200 may perform automatic threshold segmentation on a sample to be processed according to a characteristic that a lithium battery body and a background in the sample to be processed occupy different gray scale ranges, so as to obtain a binarized image of the sample to be processed, where the binarized image may include a foreground image and a background image. The foreground image includes a surface image of the lithium battery cell, and the background image includes information unrelated to the lithium battery cell. To facilitate an understanding of the specific operation of the automatic threshold segmentation, the principles thereof will be described below.
Assume that the gray levels of the sample to be processed before segmentation are K (0 ≤ K ≤ 255) and the total number of pixels is N, where the number of pixels with gray level i is n_i. The following relationship is satisfied:

N = Σ_{i=0}^{255} n_i,  P_i = n_i / N,  Σ_{i=0}^{255} P_i = 1

where P_i represents the proportion of pixels with gray level i in the total pixels. If an initial gray threshold is set to T, the image is divided into two parts: a pixel with gray level greater than or equal to T is classified into the target region O, and conversely a pixel with gray level lower than T into the background B. The following statistics can be obtained:

ω_B = Σ_{i=0}^{T-1} P_i,  ω_O = Σ_{i=T}^{255} P_i

μ_B = (1/ω_B) Σ_{i=0}^{T-1} i·P_i,  μ_O = (1/ω_O) Σ_{i=T}^{255} i·P_i

where ω_B and ω_O are the probabilities of occurrence of the background and the target region respectively, μ_B and μ_O are the gray means of the respective regions, σ_B² and σ_O² denote the gray variances of the respective regions, and σ² denotes the inter-class variance:

σ²(T) = ω_B · ω_O · (μ_B − μ_O)²

Traversing all gray levels T with a simple sequential search, the optimal gray threshold is obtained when the inter-class variance σ²(T) reaches its maximum, expressed as:

T* = argmax_{0 ≤ T ≤ 255} σ²(T)

where T* represents the optimal gray threshold, i.e., the value of T at which σ²(T) is maximal.
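A minimal sketch of this automatic threshold search (Otsu-style maximization of the inter-class variance) on synthetic data; the two gray-level modes below are assumed, standing in for the dark cell body and the bright background:

```python
import numpy as np

def otsu_threshold(img):
    """Automatic threshold: maximize the inter-class variance
    sigma^2(T) = w_B * w_O * (mu_B - mu_O)^2 over all gray levels T."""
    hist = np.bincount(img.ravel(), minlength=256).astype(float)
    p = hist / hist.sum()
    best_t, best_var = 0, -1.0
    for t in range(1, 256):
        w_b, w_o = p[:t].sum(), p[t:].sum()
        if w_b == 0 or w_o == 0:
            continue
        mu_b = (np.arange(t) * p[:t]).sum() / w_b
        mu_o = (np.arange(t, 256) * p[t:]).sum() / w_o
        var = w_b * w_o * (mu_b - mu_o) ** 2
        if var > best_var:
            best_t, best_var = t, var
    return best_t

# synthetic bimodal image: a dark mode near 60 and a bright mode near 200
rng = np.random.default_rng(1)
img = np.clip(np.concatenate([rng.normal(60, 10, 5000),
                              rng.normal(200, 10, 5000)]), 0, 255).astype(np.uint8)
t = otsu_threshold(img)
print(100 < t < 160)  # the threshold falls between the two modes
```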
By the automatic threshold segmentation operation, the image can be rapidly segmented to obtain the approximate region of the lithium battery body (namely, the foreground image). Compared with a fixed-threshold segmentation method, which can hardly adapt to influences such as illumination changes on the same scene object, the automatic threshold segmentation method of the present invention needs no manually set threshold parameter, is not limited by factors such as illumination changes, can effectively separate the target region from the background region, and has the characteristics of simple operation, high processing speed and the like.
Next, in step 204, the method 200 may detect the upper edge contour of the foreground image, select a point set of the upper edge contour for least squares fitting, and perform angle correction on the foreground image according to the fitting result. By performing least squares fitting on the point set of the upper edge contour, the expression of the straight line representing the upper edge contour can be obtained; the inclination angle of the line (namely, the rotation angle of the lithium battery body) can be obtained from the expression, and affine transformation can then be used to correct the lithium battery body to a horizontal position. The specific calculation formulas are as follows:
The sample linear regression model is:

Y_i = β_1 + β_2·X_i + e_i

where (X_i, Y_i) is the point set of the middle region of the upper edge contour, β_1 is the intercept, β_2 is the slope, and e_i is a random error. Converting the linear regression model into a residual model, the residual and the sum of squared residuals are:

e_i = Y_i − β_1 − β_2·X_i

Q = Σ_{i=1}^{n} (Y_i − β_1 − β_2·X_i)²

where Q represents the sum of squared residuals. Here, the least squares method determines β_1 and β_2 as the values that minimize the sum of squared residuals, yielding the fitted straight line L of the upper edge contour. Regarding Q as a function of the two undetermined parameters β_1 and β_2, taking the first-order partial derivatives converts this into an extremum problem:

∂Q/∂β_1 = −2 Σ (Y_i − β_1 − β_2·X_i) = 0

∂Q/∂β_2 = −2 Σ X_i·(Y_i − β_1 − β_2·X_i) = 0

Solving these first-order partial derivative equations gives:

β_2 = (n Σ X_i·Y_i − Σ X_i · Σ Y_i) / (n Σ X_i² − (Σ X_i)²)

β_1 = (Σ Y_i − β_2 Σ X_i) / n
The expression parameters of the fitted straight line L of the upper edge contour are thus determined, and the included angle θ between the line L and the x axis can be solved. According to the formula for the angle between two straight lines:

tan θ = |(k_2 − k_1) / (1 + k_1·k_2)|

Let k_2 be the slope of the line L and k_1 the slope of the x axis; with k_1 = 0, the included angle between the line L and the x axis is:

θ = arctan(k_2)
according to the formula, the inclination angle of the upper edge straight line of the lithium battery body can be obtained, and then the foreground image is corrected to the horizontal placing position.
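The fitting and angle computation above can be sketched as follows; the 5-degree tilted point set is an assumed example standing in for the upper-edge contour points:

```python
import numpy as np

def fit_line_angle(x, y):
    """Least-squares fit y = b1 + b2*x via the normal equations, then the
    tilt angle theta = arctan(b2) in degrees relative to the x axis (k1 = 0)."""
    n = len(x)
    sx, sy = x.sum(), y.sum()
    sxx, sxy = (x * x).sum(), (x * y).sum()
    b2 = (n * sxy - sx * sy) / (n * sxx - sx ** 2)   # slope
    b1 = (sy - b2 * sx) / n                          # intercept
    theta = np.degrees(np.arctan(b2))
    return b1, b2, theta

# assumed example: an upper-edge point set tilted by exactly 5 degrees
x = np.linspace(0, 100, 50)
y = 10 + np.tan(np.radians(5.0)) * x
b1, b2, theta = fit_line_angle(x, y)
print(round(theta, 3))  # -> 5.0: the recovered tilt angle
```

Rotating the foreground image by −θ (e.g., via an affine warp) would then restore the horizontal position.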
As further shown in fig. 7, in step 206, the method 200 may extract all outer edge contours of the angle-rectified foreground image and determine an outer edge contour having a maximum area and a maximum circumference from a set of points of the all outer edge contours. Specifically, the method 200 may calculate the area and perimeter of each extracted contour from the set of points for all outer edge contours and determine the outer edge contour with the largest area and largest perimeter.
Further, the flow proceeds to step 208, and the method 200 may determine the position of the region of interest in the sample to be processed according to the contour points closest to the four corner points of the minimum bounding rectangle of the outer edge contour. Specifically, the method 200 may obtain a minimum circumscribed rectangle of the outer edge contour, then traverse the contour point set to obtain contour points nearest to four corner points of the minimum circumscribed rectangle, respectively, to obtain four corner points of the lithium battery, and finally may position the lithium battery to an accurate position according to the four corner points, that is, a position of the region of interest in the sample to be processed.
In order to facilitate understanding of the above method for preprocessing a sample to be processed, the practical operation and effect of the method shown in fig. 7 will be described in detail with reference to fig. 8. The process shown in FIG. 8 is a visual representation of the method described in FIG. 7, and thus the description above with respect to the method 200 of FIG. 7 may also be applied to the description below in connection with the process shown in FIG. 8.
Fig. 8 is a visualization process flow diagram illustrating region of interest extraction. As shown in fig. 8, the sample to be processed may include a light background image and a dark foreground image, the foreground image may include an image of the lithium battery surface, and the foreground image is significantly tilted. The sample to be processed may first go through step 202 of, for example, the method 200 shown in fig. 7 to obtain a binarized image. Further, step 204 may be performed on the binarized image: the expression of the upper edge contour line is obtained by least squares fitting, and the inclination angle of the line can be obtained from the expression, so that the foreground image is corrected to restore the horizontal position, as in the corrected image shown in the figure. Finally, step 206 and step 208 may be performed on the corrected image: the position of the region of interest is determined by the four-corner-point positioning method, and a rectangular region of interest is extracted according to the positioning result.
The steps of preprocessing the sample to be processed according to the present invention are exemplarily described above with reference to fig. 7 and 8. The operations of automatic threshold segmentation, four-corner-point positioning and the like according to the present invention are simple to implement and can effectively extract the region of interest of the sample to be processed or the image to be detected; especially for a sample with a relatively large background image region, retaining only the useful information region can greatly simplify the image data.
In order to better demonstrate the advancement of the method for detecting surface defects of a lithium battery according to the present invention, the training of the network model based on serial grouped convolution blocks and the detection effect of the resulting detection model will be described below in combination with various evaluation indexes and a plurality of comparative tests.
In one embodiment, 294 groups of original data set images were collected at an industrial site, each group including a side finished-product image, a right-side finished-product image and a front finished-product image, where the defect and interference region types mainly include four categories: code-spraying interference, plate interference, bubble expansion and skin wrinkling. Because the collected original images are too large and the invalid background area occupies 30%-50% of the original image area, directly training on the original images causes insufficient memory on the graphics processing unit (GPU). Therefore, in this embodiment, the preprocessing method shown in fig. 7 is adopted to extract the region of interest of the lithium battery, and the 294 groups of data are divided into a training set and a test set at a ratio of 8:2, where the training set includes 235 samples and the test set includes 59 samples; the distribution of the specific defect and interference region types in the data sets can refer to fig. 9. In this embodiment, the network model based on serial grouped convolution blocks is trained on the 235 sample pictures of the training set according to the method of the present invention, the 59 images to be detected in the test set are detected using the obtained detection model, and the detection results are evaluated.
In addition, fig. 10a and 10b show schematic diagrams of code-spraying interference, plate interference, bubble expansion and skin wrinkling, to facilitate understanding and distinguishing between defects and interference regions. Fig. 10a shows an image of the lithium battery surface including a plate and bubble expansions of different scales; it can be seen that the plate is not a defect, but because its edges protrude and are imaged, it is easily misjudged as a defect. Further, the multiple bubble expansions shown in fig. 10a differ greatly in scale, so small bubble expansions are easily missed. Fig. 10b shows an image of the lithium battery surface including a code-sprayed pattern and wrinkles of different sizes; because the color of the code-sprayed pattern differs from the color of the lithium battery surface and is therefore imaged and displayed, it is easily misjudged as a defect. The code-sprayed content shown in fig. 10b is partially obscured because it relates to commercial information. Further, the multiple skin wrinkles shown in fig. 10b differ greatly in scale, so smaller skin wrinkles are easily missed.
Next, the evaluation of the detection model according to the present invention will be described. The detection task of the detection model is to locate the positions of the four types of target features in the input image to be detected and to classify the target features. A commonly used positioning method is to detect a circumscribed rectangular frame (also called a prediction frame) of the target feature: if the score of a prediction frame exceeds a set score threshold, the frame is determined to be a detection frame, retained in the prediction result and displayed in the image; otherwise, the frame is discarded and not displayed. Next, the intersection over union (IoU) between the detection frame and the real label information (ground truth) frame is calculated to determine the positioning effect of the detection frame, and whether the detection frame correctly predicts the position and category of the target feature is judged in combination with the classification result. IoU is the ratio between the intersection and the union of the detection frame and the real labeling frame, calculated as follows:
IoU = Area(A ∩ B) / Area(A ∪ B)
where A is the rectangular region of the detection frame and B is the rectangular region of the real labeling frame; the intersection and union of A and B are illustrated in fig. 11. As shown in fig. 11(a), the intersection of A and B is the overlapping portion of A and B. As shown in fig. 11(b), the union of A and B is the region obtained by merging A and B together.
In this embodiment, IoU can be used as an evaluation index for whether the detection frame detects the target feature, and the commonly used threshold IoU is 0.5, if the following decision formula is given:
detection judgment = { target detected,  if IoU ≥ 0.5
                     { false alarm,      if IoU < 0.5
The final detection effect can be evaluated by combining the IoU index with the classification accuracy, giving the following judgment results:
judgment result = { positive detection,  if IoU ≥ 0.5 and the classification is correct
                  { false detection,     if IoU ≥ 0.5 and the classification is wrong
                  { false alarm,         if IoU < 0.5
According to the detection frame judgment combining the IoU index and the classification accuracy, if a real labeling frame has no positively detected frame among all the detection frames, the target feature labeled by that real labeling frame is judged as a missed detection; if a real labeling frame has a detected frame, the target feature labeled by that real labeling frame is judged as recalled.
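The judgment combining the IoU threshold with the classification result may be sketched as follows (function and label names are illustrative):

```python
IOU_THRESHOLD = 0.5  # the commonly used threshold given in the text

def judge_detection(iou_value, predicted_class, true_class, threshold=IOU_THRESHOLD):
    """Combine the IoU criterion with the classification result, following
    the judgment rule described in the text."""
    if iou_value >= threshold and predicted_class == true_class:
        return "positive detection"   # correct position and correct category
    if iou_value >= threshold:
        return "false detection"      # localized, but wrong category
    return "false alarm"              # IoU below the threshold
```

A ground-truth frame with no positive detection among its matched frames would then be counted as a missed detection.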
In order to objectively evaluate the detection result of the detection model, two evaluation criteria may be used: the defect detection rate DIR (Defect Detection Rate) and the mean average precision mAP (Mean Average Precision). The defect detection rate DIR is an evaluation index proposed for actual industrial detection requirements and quantitatively evaluates the detection performance of the model. The defect detection rate DIR includes five indices: the false alarm rate FA_R, the positive detection rate TD_R, the false detection rate FD_R, the recall rate RC_R, and the missed detection rate MD_R. These indices are defined as follows:
FA_R = N_i^fa / N_i^det

TD_R = N_i^td / N_i^det

FD_R = 1 − FA_R − TD_R

RC_R = N_i^rc / N_i^gt

MD_R = 1 − RC_R

wherein:

N_i^fa denotes the number of detection frames in category i with IoU < 0.5, i.e. the number of false alarm frames.

N_i^td denotes the number of detection frames in category i that are correctly classified and have IoU ≥ 0.5, i.e. the number of positive detection frames.

N_i^rc denotes the number of real labeling frames in category i that are detected, i.e. the number of recalled frames.

N_i^det denotes the number of all detection frames in category i.

N_i^gt denotes the number of all real labeling frames in category i.
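A minimal sketch of the five DIR indices, computed from the per-category box counts defined above (function and argument names are illustrative):

```python
def defect_detection_rate(n_fa, n_td, n_rc, n_det, n_gt):
    """Compute the five DIR indices for one category from box counts.
    n_fa: false alarm frames (IoU < 0.5); n_td: positive frames (correct
    class, IoU >= 0.5); n_rc: recalled ground-truth frames; n_det: all
    detection frames; n_gt: all ground-truth frames."""
    fa_r = n_fa / n_det            # false alarm rate
    td_r = n_td / n_det            # positive detection rate
    fd_r = 1.0 - fa_r - td_r       # false detection rate
    rc_r = n_rc / n_gt             # recall rate
    md_r = 1.0 - rc_r              # missed detection rate
    return {"FA_R": fa_r, "TD_R": td_r, "FD_R": fd_r,
            "RC_R": rc_r, "MD_R": md_r}
```

By construction, FA_R + TD_R + FD_R = 1 and RC_R + MD_R = 1, matching the constraints stated later in the text.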
The mean average precision mAP described above is the most commonly used evaluation method in target detection. In the following accuracy evaluation, the average precision AP (Average Precision) of each category is calculated first, and then the mean mAP over all categories is calculated. The average precision AP describes the relationship between the precision (Precision) and recall (Recall) curve (PR curve for short), as follows:
Precision = TP / (TP + FP)

Recall = TP / (TP + FN)

AP = ∫₀¹ P(R) dR  (the area under the PR curve)
where TP (True Positive) denotes the number of detection frames that are correctly classified and have IoU ≥ 0.5, i.e. the number of positive detection frames; FP (False Positive) denotes the number of detection frames that are misclassified or have IoU < 0.5, i.e. the number of false alarm and false detection frames; and FN (False Negative) denotes the number of real labeling frames that are not detected, i.e. the number of missed frames. AP represents the mean of the highest precision at different recall values, using the standard adopted since the PASCAL VOC 2010 image recognition challenge: for each distinct recall value (including 0 and 1), the maximum precision at any recall greater than or equal to that value is selected, and the area under the resulting interpolated PR curve is computed as the AP value.
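The VOC2010-style AP calculation described above (replacing the precision at each recall level by the maximum precision at any greater or equal recall, then integrating) may be sketched as:

```python
def average_precision(recalls, precisions):
    """VOC2010-style AP: area under the interpolated PR curve. Inputs are
    parallel sequences of (recall, precision) points sorted by recall."""
    # Pad with the 0 and 1 recall endpoints mentioned in the text.
    r = [0.0] + list(recalls) + [1.0]
    p = [0.0] + list(precisions) + [0.0]
    # Sweep right-to-left so p[i] becomes the max precision at recall >= r[i].
    for i in range(len(p) - 2, -1, -1):
        p[i] = max(p[i], p[i + 1])
    # Sum rectangle areas where the recall increases.
    return sum((r[i + 1] - r[i]) * p[i + 1] for i in range(len(r) - 1))
```

A detector holding precision 1.0 up to full recall scores an AP of 1.0; mAP is then the mean of the per-category AP values.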
The detection model needs to address two aspects in lithium battery surface defect detection applications: efficiency and detection effect. Efficiency mainly concerns the storage footprint of the model and its prediction speed, while the detection effect can be measured by the two evaluation criteria described above, the defect detection rate DIR and the mean average precision mAP. In the detection embodiments described below, the Faster R-CNN, SSD and Yolo family models commonly used in the target detection field are selected for comparison with the detection model of the invention. In practical industrial application, a detection model must balance efficiency and detection effect; therefore, from a practical application point of view, the experimental results of a plurality of models including the detection model of the present invention are compared and analyzed with respect to both.
First, the efficiency of the detection models is compared and analyzed. The file size of a model is determined entirely by the number of parameters and the storage form of the parameters. The parameters of the following models are stored with 2 bytes per parameter, so the model file size is approximately equal to the total number of parameters multiplied by 2 bytes. Table two below shows the parameter calculation results of the detection model according to the present invention.
Table two:
(Table two, reproduced as an image in the original document, lists the parameter counts of the first and second partial network structures of the detection model, totaling 45,473,629 parameters.)
The first partial network structure in Table two may be the network structure described above for performing the downsampling operations, such as the concatenated grouped convolution block network shown in fig. 6; the second partial network structure may be the network structure described above for performing operations such as upsampling and channel fusion, for example the FPN multi-scale network shown in fig. 6. From the table, the file size of the detection model of the invention can be calculated as follows:
45,473,629 × 2 = 90,947,258 bytes ≈ 86.73 MB
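The file-size estimate may be reproduced with a one-line helper (reported in MiB, matching the calculation above):

```python
def model_file_size_mb(num_params, bytes_per_param=2):
    """Approximate model file size: parameter count times bytes per
    parameter (2 bytes here, i.e. 16-bit storage), reported in MiB."""
    return num_params * bytes_per_param / (1024 * 1024)
```

For the 45,473,629 parameters of Table two this yields about 86.73 MB.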
Therefore, the storage problem of a model is directly reflected in the size of the storage space occupied by its model file: the larger the model file, the higher the requirement on the storage space of the device. Table three below compares the storage space occupied and the detection times of the Faster R-CNN, SSD, and Yolo series network models and the detection model of the embodiment of the present invention.
Table three:
(Table three, reproduced as an image in the original document, compares the model file sizes and detection times of the Faster R-CNN, SSD and Yolo series models with those of the detection model of the invention.)
As shown in Table three, the file size of the detection model of the present invention is 86.88 MB, significantly smaller than that of the other compared models, indicating that, in terms of storage, the detection model of the present invention effectively reduces the network parameters and the storage space occupied by the model file, which is also reflected in Table one. The smaller the device storage space occupied by a detection model, the more advantageous it is in industrial applications, for example for mobile terminal and multi-model device deployment. In terms of detection time (i.e. detection speed), the detection time of the detection model of the present invention is within 38 ms, significantly lower than that of the other compared models. The smaller and lighter the network structure of a model, the less time detection requires, making detection faster. Further, real-time performance is often a concern in industrial detection applications. In one embodiment, with an input image size of 832 × 1024, the detection speed of the detection model of the invention is about 26 fps (frames per second, the number of images detected in 1 second), which fully meets the real-time detection requirements of industrial sites.
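The conversion from per-image detection time to frame rate quoted above is a simple reciprocal (the 38 ms figure is from the text):

```python
def fps_from_latency_ms(latency_ms):
    """Convert per-image detection time in milliseconds to throughput
    in frames per second (images detected in 1 second)."""
    return 1000.0 / latency_ms
```

A 38 ms detection time corresponds to roughly 26 fps.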
The above experimental comparison and analysis addressed the efficiency of the detection model; those skilled in the art can understand that the detection model obtained by the method of the present invention occupies little storage space, detects quickly, and can meet real-time detection requirements. The detection effect of the model is compared and analyzed in the following experiments. In order to verify the detection effect of the detection model of the invention, the following analysis evaluates the models using the two indices described above, the defect detection rate DIR and the mean average precision mAP.
First, the test set in the embodiment of the present invention was evaluated, and mAP values were calculated, using the Faster R-CNN, SSD, and Yolo series network models and the detection model of the present invention. The confidence (score) threshold of the prediction frames is preset to 0.005; each model is then called to predict results for all prediction frames, and all prediction frames are divided into 4 classes according to the predicted category and stored in 4 files respectively, with each row of each file storing one prediction frame result. Then, the functions for calculating the mAP and the precision-recall curves are called, yielding the detection results of Table four and figs. 12a to 12d below.
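The grouping of prediction frames by predicted category under the 0.005 score threshold may be sketched as follows (the tuple format of a prediction is an assumption; in the described embodiment each group would then be written to its own file):

```python
from collections import defaultdict

SCORE_THRESHOLD = 0.005  # confidence threshold preset in the text

def split_predictions_by_class(predictions):
    """Group prediction frames by predicted category, keeping only those
    at or above the score threshold. `predictions` is a list of
    (class_name, score, box) tuples (format assumed for illustration)."""
    groups = defaultdict(list)
    for cls, score, box in predictions:
        if score >= SCORE_THRESHOLD:
            groups[cls].append((score, box))
    return dict(groups)
```

With the four categories of this embodiment, the result would be up to four groups, one per file.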
Table four:
(Table four, reproduced as an image in the original document, lists the per-category AP values and the mAP of each compared model.)
As shown in Table four, the detection model of the invention achieves the best effect among the compared models on the code-spray interference and plate interference categories, with an AP value of 0.998. On the bubble defect category, the AP values of SSD_MobilenetV1, Yolo_TinyV3 and the detection model of the invention all exceed 0.8, with Yolo_TinyV3 and the detection model of the invention outperforming SSD_MobilenetV1. On the wrinkle defect category, the detection model of the invention performs best, with an AP value of 0.854. Considering the detection effects of the four categories together, with the mean average precision mAP as the reference value, the mAP of the detection model of the invention exceeds 0.92, the most remarkable result.
According to the description of the mAP above, the mAP value can be used as an evaluation criterion of the comprehensive capability of a model: the higher the mAP, the better the detection effect. The mAP value can be determined from the precision-recall curve (PR curve). In order to observe more intuitively how the precision P of the four categories varies with the recall R, the four models with the highest mAP values in Table four (i.e. the detection model of the present invention, Yolo_TinyV3, SSD_MobilenetV1 and Faster R-CNN) were selected in the following experiment to compare their PR curves; the results are shown in figs. 12a to 12d.
As shown in figs. 12a to 12d, different recall R values correspond to different precision P values. In this embodiment, three common key points (Recall = 0.4, 0.6, 0.8) are selected to observe the changes of the PR curves of each model, where fig. 12a is the PR curve of the SSD_MobilenetV1 model, fig. 12b is the PR curve of the Faster R-CNN model, fig. 12c is the PR curve of the Yolo_TinyV3 model, and fig. 12d is the PR curve of the detection model of the present invention. In figs. 12a, 12b and 12d, the precision P of the code-spray and plate categories reaches about 1.0 at all three points (0.4, 0.6, 0.8) and the curves are very stable overall, with the precision only dropping once the recall R rises above 0.9, indicating a very good detection effect. At a recall R of 0.4, the precision P of all four categories reaches above 0.9; although the P value is high, the recall is only 0.4, so further observation is needed. At a recall R of 0.6, the wrinkle detection capability of the model of fig. 12b begins to decrease greatly, and its precision P for bubbles also drops to 0.83, lower than that of the other three models. When the recall R rises to 0.8, the precision for the bubble category of the models of figs. 12d and 12c reaches about 0.78, while the precision P of the remaining models is below 0.7. For the wrinkle category, only the model of fig. 12d (i.e. the detection model of the present invention) reaches 0.74, while the remaining three models are all below 0.7.
Combining the results and analysis of the experiments of Table four and figs. 12a to 12d, it can be seen that, compared with the other models, the detection model of the present invention is more stable in terms of precision and recall, and achieves the highest value and the most outstanding performance on the mAP evaluation index.
In order to further compare the detection effects of the models, experiments were performed on the two categories that are difficult to detect, bubbles and wrinkles, using the defect detection rate DIR to verify the performance of the four models of figs. 12a to 12d on this evaluation index in actual industrial application. The calculation of the defect detection rate DIR is described above; based on the formulas for the false alarm rate FA_R, positive detection rate TD_R, false detection rate FD_R, recall rate RC_R, and missed detection rate MD_R, the five DIR indices of the bubble and wrinkle categories can be calculated for the different models. Among the five indices, the sum of the positive detection rate, false detection rate, and false alarm rate is 1, and the sum of the missed detection rate and recall rate is 1. The detection results are shown in figs. 13 and 14.
Fig. 13 is a histogram illustrating a comparison of DIR results of multiple models on the bubble category, according to an embodiment of the invention. As can be seen from comparing the bubble category detection results in fig. 13, the Yolo_TinyV3 model and the detection model of the present invention achieve similar positive detection rates, both much higher than those of the other two models, and their false alarm rates and false detection rates are both lower than those of the other two models; in particular, the false detection rate of the detection model of the present invention is almost zero. Meanwhile, the recall rate and missed detection rate of the Yolo_TinyV3 model and the detection model of the present invention are almost the same as those of the best-performing SSD_MobilenetV1 model, with a difference of no more than 3 percentage points. The positive detection rate represents the probability that the prediction frames of a model correctly detect the target object, the false alarm rate and false detection rate represent the probability of misjudging the target as background or as another category, and the recall rate reflects missed detections: the higher the recall rate, the fewer the missed detections. Therefore, the prediction frames of the Yolo_TinyV3 model and the detection model of the present invention detect the bubble category more efficiently, with a lower probability of misjudgment and a better detection effect.
Fig. 14 is a histogram illustrating a comparison of DIR results of multiple models on the wrinkle category, according to an embodiment of the present invention. As can be seen from comparing the wrinkle category detection results in fig. 14, the Yolo_TinyV3 model and the detection model of the present invention achieve similar positive detection rates, false alarm rates, and false detection rates, much better than the other two models on these three indices. On the recall index, the detection model of the present invention and the SSD_MobilenetV1 model perform best, both reaching 0.896 and exceeding the Yolo_TinyV3 model by 9 percentage points. In order to show the detection effect more intuitively, fig. 15 presents detection result images of the detection model of the present invention and the Yolo_TinyV3 model.
As shown in fig. 15, column a shows the real labeling boxes of the actual defect and interference regions of the test set, column b shows the prediction boxes of the Yolo_TinyV3 model, and column c shows the prediction boxes of the detection model of the present invention. Comparing columns b and c of fig. 15, it can be found that the Yolo_TinyV3 model partially misses the wrinkle defects and plate interference regions, and its overall recall rate is lower than that of the detection model of the present invention. Comparing columns a and c of fig. 15 further shows that the detection model of the present invention better detects defects whose feature information is not obvious and whose scale and morphology vary, and its overall detection results are closer to the real situation than those of the other models and conventional image processing methods; thus the detection model of the present invention is more accurate and its detection effect is better.
Combining the experimental comparison and analysis of the model efficiency with that of the detection effect, the detection model of the present invention achieves both a small model file and a high detection speed in terms of efficiency, and, compared with the other models, better stability and a better overall detection effect (i.e. on the DIR and mAP indices). Therefore, the detection model has a high overall detection level and can meet the practical application requirements of lithium battery surface defect detection as well as the application conditions of other defect detection fields.
Through the above description, those skilled in the art can understand that the method for detecting lithium battery surface defects according to the present invention extracts the target feature information of a sample to be processed by performing operations such as downsampling, upsampling, and fusion on the sample, and trains a network model based on concatenated grouped convolution blocks using a sample data set comprising the sample to be processed and the target feature information, so as to obtain a detection model capable of automatically detecting defects in an image to be detected. The grouped convolution operation reduces the number of model parameters and the amount of calculation, thereby effectively improving detection efficiency; meanwhile, combining shallow and deep feature information by concatenation enhances weak feature information and extracts defect features from different network layers, effectively avoiding the loss of feature information, so that weak-information and multi-scale features can be detected and the detection rate is improved.
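The parameter saving attributed to grouped convolution above may be illustrated with a small count (layer sizes are hypothetical; biases are ignored for brevity):

```python
def conv_params(in_ch, out_ch, k, groups=1):
    """Weight count of a k x k convolution. With `groups` > 1 each group
    only connects in_ch/groups input channels to out_ch/groups output
    channels, dividing the weight count by the number of groups."""
    assert in_ch % groups == 0 and out_ch % groups == 0
    return (in_ch // groups) * k * k * (out_ch // groups) * groups
```

For a hypothetical 3 × 3 layer with 64 input and 64 output channels, 4 groups cut the weight count from 36,864 to 9,216, a 4× reduction, which is the mechanism by which the grouped convolution blocks shrink the model.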
According to a second aspect of the present invention, there is provided an apparatus for lithium battery surface defect detection, which may include: at least one processor; a memory storing program instructions that, when executed by the at least one processor, cause the apparatus to perform the method according to any one of the first aspects of the invention.
According to a third aspect of the present invention, there is provided a computer readable storage medium storing a program for lithium battery surface defect detection, which when executed by a processor performs the method according to any one of the first aspects of the present invention.
Although the present invention has been described with reference to specific preferred embodiments, it should be understood that various changes, substitutions and alterations can be made herein without departing from the spirit and scope of the invention as defined by the appended claims. Therefore, the scope of protection of one or more embodiments of the present specification shall be subject to the scope of protection of the claims.

Claims (12)

1. A method for lithium battery surface defect detection, comprising:
taking the collected image of the surface of the lithium battery as a sample to be processed;
performing one or more downsampling operations on the sample to be processed to obtain one or more feature maps;
performing up-sampling operation on the feature mapping chart, and performing channel fusion on the feature mapping chart and an image to be sampled before the down-sampling operation is performed on the feature mapping chart to obtain a fusion feature chart;
acquiring target characteristic information of the sample to be processed according to at least one of the characteristic mapping chart and the fusion characteristic chart;
training a network model based on concatenated grouped convolution blocks by using a sample data set comprising the sample to be processed and the target characteristic information to obtain a detection model; and
detecting an image to be detected containing the surface of the lithium battery through the detection model so as to detect whether the image to be detected contains defects;
wherein each downsampling operation comprises:
performing convolution downsampling operation on the image to be sampled so as to extract the characteristic information of the image to be sampled and reduce the size of the image to be sampled to obtain a first characteristic mapping layer;
executing a first dimension reduction operation on the first feature mapping layer to obtain a second feature mapping layer;
dividing the feature mapping in the second feature mapping layer into a plurality of groups, executing continuous convolution operation on each group of feature mapping, and superposing the convolution operation results of the plurality of groups of feature mapping to obtain a third feature mapping layer;
performing series fusion operation on the channels of the first feature mapping layer and the third feature mapping layer to obtain a fourth feature mapping layer; and
and executing a second dimension reduction operation on the fourth feature mapping layer to obtain the feature mapping map.
2. The method of claim 1, wherein performing a downsampling operation on the sample to be processed comprises:
executing the downsampling operation once on the sample to be processed, wherein the image to be sampled is the sample to be processed; or
And executing the downsampling operation on the sample to be processed for multiple times, wherein the image to be sampled is a feature mapping image obtained by the last downsampling operation.
3. The method of claim 1, wherein performing a downsampling operation on the sample to be processed to obtain a feature map comprises:
performing a plurality of downsampling operations on the sample to be processed to obtain a plurality of feature maps; and
the performing an upsampling operation on the feature map comprises:
and respectively executing the up-sampling operation on the feature mapping images obtained by the last two down-sampling operations.
4. The method of claim 3, wherein performing a plurality of downsampling operations on the sample to be processed comprises:
performing five downsampling operations on the sample to be processed; and
the performing an upsampling operation on the feature map comprises:
and respectively executing the up-sampling operation on the feature maps obtained by the fourth down-sampling operation and the fifth down-sampling operation.
5. The method according to claim 1 or 3, wherein obtaining target feature information of the sample to be processed according to at least one of the feature map and the fused feature map comprises:
and outputting multi-scale target feature information according to the size of the target feature in the feature mapping graph obtained by the fused feature graph and the last downsampling operation.
6. The method of claim 1, wherein
The object characteristic information comprises object position information and object category information, wherein
The object class information includes at least one of defective object information and interference object information.
7. The method of claim 6, wherein
The defect target information includes at least one of bubble expansion and surface wrinkling;
the interference target information comprises at least one of code spraying interference and polar plate interference.
8. The method of claim 1, further comprising, prior to performing the downsampling operation:
preprocessing the sample to be processed according to the gray information difference of the sample to be processed so as to extract the region of interest of the sample to be processed; and
the one or more downsampling operations performed on the sample to be processed include:
performing one or more of the down-sampling operations on the region of interest of the sample to be processed.
9. The method of claim 8, wherein pre-processing the sample to be processed comprises:
performing automatic threshold segmentation on the sample to be processed to obtain a binary image of the sample to be processed, wherein the binary image comprises a foreground image and a background image;
detecting an upper edge profile of the foreground image, selecting a point set of the upper edge profile to execute least square fitting, and performing angle correction on the foreground image according to a fitting result;
extracting all outer edge contours of the foreground image after angle correction, and determining an outer edge contour with the maximum area and the maximum perimeter according to a point set of all the outer edge contours; and
and determining the position of the region of interest in the sample to be processed according to the contour point closest to the four corner points of the minimum circumscribed rectangle of the outer edge contour.
10. The method of claim 1, wherein the image of the lithium battery surface comprises at least one of a left side burnish image, a right side burnish image, and a front side burnish image.
11. An apparatus for lithium battery surface defect detection, comprising:
at least one processor;
a memory storing program instructions that, when executed by the at least one processor, cause the apparatus to perform the method of any of claims 1-10.
12. A computer-readable storage medium storing a program for lithium battery surface defect detection, which when executed by a processor performs the method according to any one of claims 1-10.
CN202010827980.8A 2020-05-19 2020-08-17 Method, device and readable storage medium for detecting surface defects of lithium battery Active CN111951253B (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202010427479 2020-05-19
CN2020104274792 2020-05-19

Publications (2)

Publication Number Publication Date
CN111951253A CN111951253A (en) 2020-11-17
CN111951253B true CN111951253B (en) 2021-08-20

Family

ID=73342674

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010827980.8A Active CN111951253B (en) 2020-05-19 2020-08-17 Method, device and readable storage medium for detecting surface defects of lithium battery

Country Status (1)

Country Link
CN (1) CN111951253B (en)

Families Citing this family (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112611757A (en) * 2020-12-23 2021-04-06 深兰人工智能芯片研究院(江苏)有限公司 AI auxiliary detection method, device, electronic equipment and computer readable storage medium
CN112948463B (en) * 2021-03-01 2022-10-14 创新奇智(重庆)科技有限公司 Rolled steel data sampling method and device, electronic equipment and storage medium
CN113129299A (en) * 2021-05-07 2021-07-16 广东拓斯达科技股份有限公司 Template determination method and device, computer equipment and storage medium
CN113269775B (en) * 2021-06-11 2022-10-28 河南理工大学 Defect detection method and device based on multi-scale feature fusion SSD
CN114022657B (en) * 2022-01-06 2022-05-24 高视科技(苏州)有限公司 Screen defect classification method, electronic equipment and storage medium
CN114749342B (en) * 2022-04-20 2023-09-26 华南理工大学 Lithium battery pole piece coating defect identification method, device and medium
CN114972258B (en) * 2022-05-27 2023-04-07 深圳先进技术研究院 Battery surface defect detection method and system based on machine vision and related equipment
CN115797351B (en) * 2023-02-08 2023-05-02 山东第一医科大学(山东省医学科学院) Abnormality detection method for photovoltaic cell panel

Citations (7)

Publication number Priority date Publication date Assignee Title
CN108107051A (en) * 2017-12-19 2018-06-01 无锡先导智能装备股份有限公司 lithium battery defect detecting system and method based on machine vision
CN109685968A (en) * 2018-12-15 2019-04-26 西安建筑科技大学 A kind of the identification model building and recognition methods of the banknote image defect based on convolutional neural networks
CN109859164A (en) * 2018-12-21 2019-06-07 苏州绿控传动科技股份有限公司 A method of by Quick-type convolutional neural networks to PCBA appearance test
CN110378222A (en) * 2019-06-14 2019-10-25 安徽南瑞继远电网技术有限公司 A kind of vibration damper on power transmission line target detection and defect identification method and device
CN110599455A (en) * 2019-08-13 2019-12-20 武汉精立电子技术有限公司 Display screen defect detection network model, method and device, electronic equipment and storage medium
CN110598729A (en) * 2019-07-24 2019-12-20 华南理工大学 Method for classifying defects on surface of lithium battery electrode
CN111105405A (en) * 2019-12-24 2020-05-05 刘甜甜 New energy lithium battery surface defect detection method based on adaptive deep learning

Family Cites Families (3)

Publication number Priority date Publication date Assignee Title
CN109472270B (en) * 2018-10-31 2021-09-24 京东方科技集团股份有限公司 Image style conversion method, device and equipment
CN109711474B (en) * 2018-12-24 2023-01-17 中山大学 Aluminum product surface defect detection algorithm based on deep learning
CN110660052B (en) * 2019-09-23 2023-04-07 武汉科技大学 Hot-rolled strip steel surface defect detection method based on deep learning

Non-Patent Citations (1)

Title
Research on a multi-scale feature-based surface defect detection method for lithium batteries; Li Ruikun; China Master's Theses Full-text Database (Electronic Journal), Information Science and Technology Series II; 2020-02-29; Vol. 2020, No. 2; p. C042-1522 *

Also Published As

Publication number Publication date
CN111951253A (en) 2020-11-17

Similar Documents

Publication Publication Date Title
CN111951253B (en) Method, device and readable storage medium for detecting surface defects of lithium battery
CN111815630B (en) Defect detection method and device for LCD screen
CN110060237B (en) Fault detection method, device, equipment and system
CN108898047B (en) Pedestrian detection method and system based on blocking and shielding perception
CN109613002B (en) Glass defect detection method and device and storage medium
CN111325713A (en) Wood defect detection method, system and storage medium based on neural network
CN114581742B (en) Linearity-based connected domain clustering fusion method, device, system and medium
CN114240821A (en) Weld defect detection method based on improved YOLOX
CN111797829A (en) License plate detection method and device, electronic equipment and storage medium
CN111242899B (en) Image-based flaw detection method and computer-readable storage medium
CN114612472B (en) SegNet improvement-based leather defect segmentation network algorithm
CN112819748B (en) Training method and device for strip steel surface defect recognition model
CN109462999B (en) Visual inspection method based on learning through data balance and visual inspection device using same
CN111126393A (en) Vehicle appearance refitting judgment method and device, computer equipment and storage medium
CN107301408A (en) Human body mask extracting method and device
CN115131283A (en) Defect detection and model training method, device, equipment and medium for target object
CN112991271A (en) Aluminum profile surface defect visual detection method based on improved yolov3
CN113780484B (en) Industrial product defect detection method and device
CN116777877A (en) Circuit board defect detection method, device, computer equipment and storage medium
CN115170804A (en) Surface defect detection method, device, system and medium based on deep learning
CN110866931A (en) Image segmentation model training method and classification-based enhanced image segmentation method
CN114612418A (en) Method, device and system for detecting surface defects of mouse shell and electronic equipment
CN116523916B (en) Product surface defect detection method and device, electronic equipment and storage medium
CN109978916B (en) Vibe moving target detection method based on gray level image feature matching
CN116188361A (en) Deep learning-based aluminum profile surface defect classification method and device

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
CB02 Change of applicant information

Address after: 215011 rooms 101, 102, 901 and 902, floor 1, building 11, 198 Jialingjiang Road, high tech Zone, Suzhou, Jiangsu Province

Applicant after: Gaoshi Technology (Suzhou) Co.,Ltd.

Address before: 516000 West Side of the 4th Floor of CD Building, No. 2 South Road, Huatai Road, Huiao Avenue, Huizhou City, Guangdong Province

Applicant before: HUIZHOU GOVION TECHNOLOGY Co.,Ltd.

EE01 Entry into force of recordation of patent licensing contract

Application publication date: 20201117

Assignee: Suzhou Gaoshi Semiconductor Technology Co.,Ltd.

Assignor: Gaoshi Technology (Suzhou) Co.,Ltd.

Contract record no.: X2021990000430

Denomination of invention: Method, device and readable storage medium for detecting surface defects of lithium battery

License type: Common License

Record date: 20210722

GR01 Patent grant
CP03 Change of name, title or address

Address after: 215129 Rooms 101, 102, 901, 902, Floor 9, Building 11, No. 198, Jialing River Road, High tech Zone, Suzhou City, Jiangsu Province

Patentee after: Gaoshi Technology (Suzhou) Co.,Ltd.

Address before: 215011 rooms 101, 102, 901 and 902, floor 1, building 11, 198 Jialingjiang Road, high tech Zone, Suzhou, Jiangsu Province

Patentee before: Gaoshi Technology (Suzhou) Co.,Ltd.
