CN116773534B - Detection method and device, electronic equipment and computer readable medium - Google Patents
Detection method and device, electronic equipment and computer readable medium
- Publication number: CN116773534B
- Application number: CN202311021779.0A
- Authority
- CN
- China
- Prior art keywords
- module
- dimensional
- layer
- feature map
- decoding
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
- G01N21/8851—Scan or image signal processing specially adapted therefor, e.g. for scan signal adjustment, for detecting different kinds of defects, for compensating for structures, markings, edges
- G01N2021/8887—Scan or image signal processing based on image processing techniques
- G06N3/0455—Auto-encoder networks; Encoder-decoder networks
- G06N3/0464—Convolutional networks [CNN, ConvNet]
- G06N3/048—Activation functions
- G06N3/08—Learning methods
- G06T3/4038—Image mosaicing, e.g. composing plane images from plane sub-images
- G06T7/0004—Industrial image inspection
- G06T2207/20081—Training; Learning
- G06T2207/20084—Artificial neural networks [ANN]
- G06V10/40—Extraction of image or video features
- G06V10/7715—Feature extraction, e.g. by transforming the feature space; Mappings, e.g. subspace methods
- G06V10/806—Fusion of extracted features at the sensor, preprocessing, feature extraction or classification level
- G06V10/82—Image or video recognition or understanding using neural networks
Abstract
The application discloses a detection method and device, electronic equipment and a computer readable medium. The detection method comprises: obtaining an appearance picture of a battery module; inputting the appearance picture into a coding layer and extracting features of the appearance picture through the coding layer; and parsing the features extracted by the coding layer through a decoding layer and outputting the result. A three-dimensional feature extraction method is provided to extract spatial and spectral features more comprehensively and effectively: by reshaping the images to create structures similar to three-dimensional medical images, features of the spatial and spectral information in the images are obtained during 3D convolution. Meanwhile, the multi-dimensional spatial feature extraction module based on depth separable convolution and the deep encoder-decoder network can simply and effectively detect defects in product appearance.
Description
Technical Field
The present disclosure relates to the field of image recognition technologies, and in particular, to a detection method and apparatus, an electronic device, and a computer readable medium.
Background
With the development of the new energy industry, the welding process of lithium battery modules has gradually been automated, but errors easily arise during welding, and such errors can lead to unqualified product quality; how to find these errors in an intelligent factory has become a problem to be solved urgently. Currently, researchers use appearance defect detection technology as a method for finding such errors, and appearance defect detection is an important part of quality control for products such as lithium battery modules in new energy factories. In traditional manufacturing, appearance defect detection is usually performed manually; however, owing to human fatigue, distraction and similar conditions, the cost of manual detection is high and its accuracy is low. With the development of artificial intelligence, deep learning methods have been introduced into intelligent factories to detect appearance defects of lithium battery modules with high precision during the welding process. Appearance defect detection faces three fundamental problems. First, the types of appearance defects arising in lithium battery manufacturing are highly diverse; second, appearance defects have no specific measurement standard, and defect sizes vary greatly; third, appearance defects may resemble the factory background, resulting in poor detection.
Disclosure of Invention
(I) Purpose of the application
In view of the foregoing, an object of the present application is to provide a detection method and apparatus, an electronic device, and a computer readable medium that solve the above technical problems in the prior art by designing a depth separable convolution layer and a three-dimensional feature extraction module in the encoder portion and a multi-input attention module in the decoder portion, combined with the depth separable convolution layer, thereby realizing effective feature extraction from the input image and defect detection for product appearances such as lithium battery modules.
(II) Technical solution
According to the present application, a multi-dimensional feature extraction module, a three-dimensional feature extraction module, a multi-input attention module and a transposed convolution decoding module are designed to effectively achieve product appearance defect detection in appearance images. The multi-dimensional feature extraction module is provided with depth separable convolution layers to reduce convolution parameters, so that features in a two-dimensional image are extracted more simply and effectively. The three-dimensional feature extraction module converts the two-dimensional image into a three-dimensional image, creating a structure similar to a three-dimensional medical image, and obtains the spectral and spatial features of the image through three-dimensional convolution. The multi-input attention module learns the weight information in the feature maps by fusing the feature maps obtained by the multi-dimensional feature extraction module and the three-dimensional feature extraction module. The transposed convolution decoding module further learns information in the appearance image using transposed convolution and the designed depth separable convolution layer. Finally, the proposed method can realize appearance defect detection for products such as lithium battery modules in intelligent factories.
The detection method disclosed by the application comprises the following steps: obtaining an appearance picture of a battery module; inputting the appearance picture into a coding layer and extracting features of the appearance picture through the coding layer, the coding layer comprising at least one three-dimensional feature extraction module arranged in sequence; and parsing the features extracted by the coding layer through a decoding layer and outputting a result. The decoding layer comprises a transposed convolution decoding module and an output module, wherein the transposed convolution decoding module comprises a first decoding module and at least one second decoding module arranged in sequence; the first decoding module comprises a separable convolution layer and a transposed convolution layer, and the second decoding module comprises a multi-input attention module, the separable convolution layer and the transposed convolution layer. The output module analyzes what the decoding layer has learned through a convolution layer and a softmax function, outputs the result, and judges whether defects exist.
In one embodiment of the application, the coding layer comprises a data preprocessing module and a multi-dimensional feature extraction module. The data preprocessing module is used for normalizing the appearance picture into an input vector. The multi-dimensional feature extraction module is used for extracting a feature map from the input vector, wherein the feature map comprises a low-level feature map and/or a high-level feature map. The three-dimensional feature extraction module is used for converting the appearance picture into a three-dimensional image and extracting a three-dimensional feature map from the three-dimensional image, wherein the three-dimensional feature map comprises three-dimensional spectral features and/or three-dimensional spatial features.
in one embodiment of the application, the multi-dimensional feature extraction module is configured as a first extraction module and at least one second extraction module arranged in sequence, and the first extraction module inputs the extracted features to the first-ordered second extraction module; the second extraction module inputs the extracted features to the second extraction module of the next order.
In one embodiment of the present application, the first decoding module inputs the extracted features to the first-ordered second decoding module, and each second decoding module inputs the extracted features to the second decoding module of the next order.
In one embodiment of the present application, the built-in modules of the multi-dimensional feature extraction module, the three-dimensional feature extraction module, and the transposed convolution decoding module are all ordered according to the size of the feature maps they process; the features extracted by the built-in modules of the multi-dimensional feature extraction module and the three-dimensional feature extraction module are input to the built-in modules of the transposed convolution decoding module matching that size.
In one embodiment of the present application, the multi-input attention module fuses the low-level feature map, the three-dimensional feature map, and the high-level feature map as follows:

The low-level feature map, the three-dimensional feature map and the high-level feature map are spliced:

$F_{cat} = C(F_{low}, F_{3D}, F_{high})$

where $C$ represents the splicing operation; $F_{cat} \in \mathbb{R}^{W \times H \times (S_1 + S_2 + S_3)}$ is the output after splicing, with $W$, $H$ and $S$ corresponding to the width, height and depth of the feature map; $F_{low}$ represents the low-level feature map obtained by the multi-dimensional feature extraction module and $S_1$ the depth of the low-level feature map; $F_{3D}$ represents the three-dimensional feature map obtained by the three-dimensional feature extraction module and $S_2$ the depth of the three-dimensional feature map; and $F_{high}$ represents the decoded high-level feature map and $S_3$ the depth of the high-level feature map.

The mean of all feature maps is obtained based on a global pooling layer, $g = G(F_{cat})$, where $G$ represents the global pooling layer.

After $g$ is obtained, a weight vector coefficient $w$ is obtained through two fully connected layers and a sigmoid function:

$w = \sigma(FC_2(FC_1(g)))$

where $FC$ represents a fully connected layer and $\sigma$ represents the activation function of the fully connected layer.

The weight matrix $W_m$ of the feature map is obtained through a copy operation:

$W_m = \mathrm{Repeat}(w)$

where $\mathrm{Repeat}$ represents the function of the copy operation.

Finally, the weight matrix $W_m$ of the feature map is multiplied by the spliced output $F_{cat}$ to obtain the output $F_{out}$ of the multi-input attention module:

$F_{out} = W_m \otimes F_{cat}$

where $W_m$ is the weight matrix of the feature map and $\otimes$ denotes element-wise multiplication.
In one embodiment of the present application, there is provided a detection apparatus including: an input module configured to acquire an appearance picture of a battery module; a coding layer module configured to extract features of the appearance picture to obtain image features, the coding layer comprising at least one three-dimensional feature extraction module arranged in sequence; and a decoding layer module configured to parse the image features and output a result. The decoding layer module comprises a transposed convolution decoding module and an output module, wherein the transposed convolution decoding module comprises a first decoding module and at least one second decoding module arranged in sequence; the first decoding module comprises a separable convolution layer and a transposed convolution layer, and the second decoding module comprises a multi-input attention module, the separable convolution layer and the transposed convolution layer. The output module analyzes what the decoding layer has learned through the convolution layer and the softmax function, outputs the result, and judges whether defects exist.
In one embodiment of the present application, there is provided an electronic device including: one or more processors; and a storage means for storing one or more programs which, when executed by the one or more processors, cause the electronic device to implement the detection method as described above.
In one embodiment of the present application, a computer readable medium having computer readable instructions stored thereon, which when executed by a processor of a computer, cause the computer to perform the detection method as described above is provided.
(III) Beneficial effects
The method comprises the steps of reshaping an image to create a structure similar to a three-dimensional medical image, then obtaining features of the spatial and spectral information in the image during 3D convolution, and simply and effectively detecting product appearance defects through a multi-dimensional spatial feature extraction module based on depth separable convolution and a deep encoder-decoder network.
It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory only and are not restrictive of the application.
Drawings
The embodiments described below with reference to the drawings are exemplary and intended for the purpose of illustrating and explaining the present application and are not to be construed as limiting the scope of protection of the present application.
FIG. 1 is a method step diagram of the present application;
FIG. 2 is a system block diagram corresponding to the method of the present application;
FIG. 3 is a block diagram of a three-dimensional feature extraction module of the present application;
FIG. 4 is a method step diagram of the three-dimensional feature extraction module of the present application;
FIG. 5 is a block diagram of a multiple input attention module of the present application;
FIG. 6 is a method step diagram of the multiple input attention module of the present application;
FIG. 7 is a block diagram of the detection device of the present application;
fig. 8 is a schematic diagram of a computer system suitable for use in implementing embodiments of the present application.
Detailed Description
Reference will now be made in detail to exemplary embodiments, examples of which are illustrated in the accompanying drawings. When the following description refers to the accompanying drawings, the same numbers in different drawings refer to the same or similar elements, unless otherwise indicated. The implementations described in the following exemplary examples are not representative of all implementations consistent with the present application. Rather, they are merely examples of apparatus and methods consistent with some aspects of the present application as detailed in the accompanying claims.
The block diagrams depicted in the figures are merely functional entities and do not necessarily correspond to physically separate entities. That is, the functional entities may be implemented in software, or in one or more hardware modules or integrated circuits, or in different networks and/or processor devices and/or microcontroller devices.
The flow diagrams depicted in the figures are exemplary only, and do not necessarily include all of the elements and operations/steps, nor must they be performed in the order described. For example, some operations/steps may be decomposed, and some operations/steps may be combined or partially combined, so that the order of actual execution may be changed according to actual situations.
Reference to "a plurality" in this application means two or more than two. "and/or" describes an association relationship of an association object, meaning that there may be three relationships, e.g., a and/or B may represent: a exists alone, A and B exist together, and B exists alone. The character "/" generally indicates that the context-dependent object is an "or" relationship.
The scheme provided by the embodiment of the application relates to the technical field of image recognition. As shown in fig. 1, the present application discloses a detection method, including: s100, obtaining an appearance picture of the battery module; s200, inputting the appearance picture into a coding layer, and extracting the characteristics of the appearance picture through the coding layer; s300, analyzing the characteristics extracted by the coding layer through the decoding layer, and outputting a result.
In one embodiment of the present application, as shown in fig. 2, the encoding layer includes a data preprocessing module, a multi-dimensional feature extraction module, and at least one three-dimensional feature extraction module arranged in sequence. The main function of the data preprocessing module is to read the image input and normalize it into a vector input. The multi-dimensional feature extraction module is used for extracting feature maps from the input vector; this multi-dimensional spatial feature extraction module is designed as depth separable convolution layers, and two types of feature maps are obtained from the input image using a normalization function and a ReLU function.

The two types of feature maps are a low-level feature map, comprising low-level spatial features such as texture and gray scale, and a high-level feature map, comprising deep semantic information. The three-dimensional feature extraction module is used for converting the appearance picture into a three-dimensional image and extracting a three-dimensional feature map from the three-dimensional image using three-dimensional convolution, wherein the three-dimensional feature map comprises three-dimensional spectral features and/or three-dimensional spatial features.
the purpose of the three-dimensional feature extraction module is to obtain efficient spectral and spatial features by reshaping the input image and correlating the far and near pixels, as shown in fig. 3.
The three-dimensional feature extraction module converts the input image of size $W \times H$ into a three-dimensional image, specifically as shown in fig. 4:

S210, the input image is divided into $N \times N$ blocks, and a third dimension is added to obtain a three-dimensional image of size $(W/N) \times (H/N) \times N^2$. In this way, unlike with standard two-dimensional convolution, spectral and spatial features can be obtained from the image without destroying the original image structure.

S220, three-dimensional convolution is applied to the reconstructed $(W/N) \times (H/N) \times N^2$ three-dimensional image, with a convolution kernel of size 3x3x3 and a stride of 1x1x1, to obtain a four-dimensional feature matrix of size $(W/N) \times (H/N) \times N^2 \times K$, where K is the number of filters.

S230, the third and fourth dimensions of the four-dimensional feature matrix are merged, reconstructing it into $(W/N) \times (H/N) \times (N^2 \cdot K)$, where K is the number of filters.
The three-dimensional feature extraction module is used multiple times throughout the method, in parallel with the multi-dimensional feature extraction module, with 4 layers in total. The N value of each layer is set so that the width and height of its output match those of the feature map output by the corresponding multi-dimensional feature extraction module layer (see Table 1), while the number of filters in each layer is inversely proportional to the N value, to avoid the high computational cost of three-dimensional convolution.
Layer number | Input | Layer name | N value | Filters | Output dimension
1 | 96 x 96 x 4 | Three-dimensional convolution | 2 | 8 | 96 x 96 x 32 |
2 | 48 x 48 x 16 | Three-dimensional convolution | 4 | 4 | 48 x 48 x 64 |
3 | 24 x 24 x 64 | Three-dimensional convolution | 8 | 2 | 24 x 24 x 128 |
4 | 12 x 12 x 256 | Three-dimensional convolution | 16 | 1 | 12 x 12 x 256 |
TABLE 1 three-dimensional feature extraction Module parameters
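As a concrete illustration of steps S210-S230 and the parameters of Table 1, a minimal PyTorch sketch of the three-dimensional feature extraction module follows. The class name is illustrative, and the single-channel 192x192 input size is an assumption inferred from Tables 1 and 3 rather than stated explicitly.

```python
import torch
import torch.nn as nn

class ThreeDFeatureExtraction(nn.Module):
    """Sketch of the three-dimensional feature extraction module (S210-S230)."""

    def __init__(self, n: int, num_filters: int):
        super().__init__()
        self.n = n
        # S220: 3x3x3 kernel, stride 1x1x1; padding 1 preserves all three sizes.
        self.conv3d = nn.Conv3d(1, num_filters, kernel_size=3, stride=1, padding=1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, _, h, w = x.shape          # x: (B, 1, H, W) two-dimensional image
        n = self.n
        # S210: split the image into N x N blocks and stack the N^2 pixels of
        # each block along a new third dimension -> (B, 1, N^2, H/N, W/N).
        x = x.view(b, 1, h // n, n, w // n, n)
        x = x.permute(0, 1, 3, 5, 2, 4).reshape(b, 1, n * n, h // n, w // n)
        # S220: three-dimensional convolution -> (B, K, N^2, H/N, W/N).
        x = self.conv3d(x)
        # S230: merge the third and fourth dimensions -> (B, K * N^2, H/N, W/N).
        b, k, d, hh, ww = x.shape
        return x.reshape(b, k * d, hh, ww)

# Row 1 of Table 1: a 192x192 image with N=2 and 8 filters gives a 96x96
# feature map with 2*2*8 = 32 channels.
out = ThreeDFeatureExtraction(n=2, num_filters=8)(torch.randn(1, 1, 192, 192))
print(out.shape)  # torch.Size([1, 32, 96, 96])
```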
In one embodiment of the application, the multi-dimensional feature extraction module is configured as a first extraction module and at least one second extraction module arranged in sequence; the first extraction module inputs the extracted features to the first-ordered second extraction module, and each second extraction module inputs the extracted features to the second extraction module of the next order.
The multi-dimensional feature extraction module comprises a 5-layer neural network; the first layer consists of a 3x3 convolution layer and a separable convolution layer, and the second to fifth layers each comprise two separable convolution layers, as shown in the multi-dimensional feature extraction module of fig. 2. The parameters of each layer (convolution stride, number of filters, output dimension) are shown in Table 2.
Coding layer number | Layer name | Step size | Filter | Output dimension |
1 | Convolution + normalization + ReLU | 2 x 2 | 32 | 96 x 96 x 32 |
1 | Depth separable convolution | 1 x 1 | 32 | 96 x 96 x 32 |
2 | Depth separable convolution | 2 x 2 | 64 | 48 x 48 x 64 |
2 | Depth separable convolution | 1 x 1 | 64 | 48 x 48 x 64 |
3 | Depth separable convolution | 2 x 2 | 128 | 24 x 24 x 128 |
3 | Depth separable convolution | 1 x 1 | 128 | 24 x 24 x 128 |
4 | Depth separable convolution | 2 x 2 | 256 | 12 x 12 x 256 |
4 | Depth separable convolution | 1 x 1 | 256 | 12 x 12 x 256 |
5 | Depth separable convolution | 2 x 2 | 512 | 6 x 6 x 512 |
5 | Depth separable convolution | 1 x 1 | 512 | 6 x 6 x 512 |
Table 2 multidimensional feature extraction module parameters
In particular, the depth separable convolution layer includes two steps, depth convolution and point convolution, as shown in fig. 3.
The depth convolution function is shown in equation 1:

$\hat{G}_{k,l,m} = \sum_{i,j} \hat{K}_{i,j,m} \cdot F_{k+i-1,\, l+j-1,\, m}$  (1)

where $\hat{G}$ is the result of the depth convolution, $F_m$ represents the $m$-th channel of the input image, and $\hat{K}_m$ represents the filter of the $m$-th channel.

In the second step of the depth separable convolution layer, the point convolution uses the depth convolution result $\hat{G}$, as shown in equation 2:

$G_{k,l,n} = \sum_{m} \Lambda_{m,n} \cdot \hat{G}_{k,l,m}$  (2)

where $G$ is the point convolution result and $\Lambda$ is the point convolution kernel of size 1x1.
The depth separable convolution layer is used multiple times throughout the method; except in the first layer, the stride is set to 2x2 in the convolution operation in order to reduce the size of the feature map.
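A minimal PyTorch sketch of the depth separable convolution layer of equations 1 and 2 follows, assuming a 3x3 depthwise kernel and appending the normalization and ReLU used elsewhere in the encoder; these additions and the class name are assumptions, not details stated with the equations.

```python
import torch
import torch.nn as nn

class DepthSeparableConv(nn.Module):
    """Sketch of the depth separable convolution layer (equations 1 and 2)."""

    def __init__(self, in_ch: int, out_ch: int, stride: int = 1):
        super().__init__()
        # Equation 1: depth convolution, one 3x3 filter per input channel.
        self.depthwise = nn.Conv2d(in_ch, in_ch, kernel_size=3, stride=stride,
                                   padding=1, groups=in_ch)
        # Equation 2: point convolution with a 1x1 kernel mixes the channels.
        self.pointwise = nn.Conv2d(in_ch, out_ch, kernel_size=1)
        self.bn = nn.BatchNorm2d(out_ch)
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.relu(self.bn(self.pointwise(self.depthwise(x))))

# A 2x2 stride halves the feature map, as in the first row of each encoding
# layer of Table 2.
print(DepthSeparableConv(32, 64, stride=2)(torch.randn(1, 32, 96, 96)).shape)
# torch.Size([1, 64, 48, 48])
```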
In one embodiment of the present application, as shown in fig. 2, the decoding layer includes a transposed convolution decoding module and an output module. The transposed convolution decoding module comprises a first decoding module and at least one second decoding module arranged in sequence; the first decoding module comprises a separable convolution layer and a transposed convolution layer, and the second decoding module comprises a multi-input attention module, a separable convolution layer and a transposed convolution layer, as shown in fig. 5. The output module analyzes what the decoding layer has learned through a convolution layer and a softmax function, outputs the result, and judges whether defects exist.
The transposed convolution decoding module includes a total of 5 layers of neural networks; each layer is composed of a two-dimensional transposed convolution layer, a normalization layer, a ReLU function, a depth separable convolution layer, and a multi-input attention module, as shown in fig. 2.
Specifically, each layer of the neural network first increases the width and height of the feature map through a two-dimensional transposed convolution operation, then obtains its output through a depth separable convolution layer, and fuses the different feature maps (low-level features, high-level features, and three-dimensional spectral and spatial features) using the multi-input attention module; the parameters of each layer are shown in Table 3.
Decoding layer number | Layer name | Step size | Filter | Output dimension |
1 | Two-dimensional transpose convolution + normalization + ReLU | 2 x 2 | 256 | 12 x 12 x 256 |
1 | Multiple input attention + depth separable convolution | 1 x 1 | 256 | 12 x 12 x 256 |
2 | Two-dimensional transpose convolution + normalization + ReLU | 2 x 2 | 128 | 24 x 24 x 128 |
2 | Multiple input attention + depth separable convolution | 1 x 1 | 128 | 24 x 24 x 128 |
3 | Two-dimensional transpose convolution + normalization + ReLU | 2 x 2 | 64 | 48 x 48 x 64 |
3 | Multiple input attention + depth separable convolution | 1 x 1 | 64 | 48 x 48 x 64 |
4 | Two-dimensional transpose convolution + normalization + ReLU | 2 x 2 | 32 | 96 x 96 x 32 |
4 | Multiple input attention + depth separable convolution | 1 x 1 | 32 | 96 x 96 x 32 |
5 | Two-dimensional transpose convolution + normalization + ReLU | 2 x 2 | 16 | 192 x 192 x 16 |
5 | Multiple input attention + depth separable convolution | 1 x 1 | 16 | 192 x 192 x 16 |
Table 3 transposed convolutional decoding module parameters
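A minimal PyTorch sketch of one such decoding layer follows; it composes the DepthSeparableConv sketch above with the MultiInputAttention sketch given after the fusion steps below, and the channel bookkeeping (three equally sized maps concatenated before the separable convolution) is an assumption consistent with Tables 1-3.

```python
import torch
import torch.nn as nn

class DecodingLayer(nn.Module):
    """Sketch of one transposed convolution decoding layer (Table 3)."""

    def __init__(self, in_ch: int, out_ch: int, skip_ch: int):
        super().__init__()
        # Two-dimensional transposed convolution + normalization + ReLU,
        # stride 2x2, doubling the width and height of the high-level map.
        self.upsample = nn.Sequential(
            nn.ConvTranspose2d(in_ch, out_ch, kernel_size=2, stride=2),
            nn.BatchNorm2d(out_ch),
            nn.ReLU(inplace=True),
        )
        # Multi-input attention weights the concatenated low-level,
        # three-dimensional and upsampled high-level feature maps.
        self.attention = MultiInputAttention(out_ch + skip_ch)
        # Depth separable convolution reduces the fused maps to out_ch channels.
        self.separable = DepthSeparableConv(out_ch + skip_ch, out_ch)

    def forward(self, high, low, three_d):
        high = self.upsample(high)
        return self.separable(self.attention(low, three_d, high))

# Decoding layer 1 of Table 3: the 6x6x512 encoder output is upsampled to
# 12x12x256 and fused with 256-channel low-level and three-dimensional maps.
layer1 = DecodingLayer(in_ch=512, out_ch=256, skip_ch=512)
out = layer1(torch.randn(1, 512, 6, 6),
             torch.randn(1, 256, 12, 12), torch.randn(1, 256, 12, 12))
print(out.shape)  # torch.Size([1, 256, 12, 12])
```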
The output module uses the output of the transposed convolution decoding module to determine whether the image is defective. Specifically, the output is first passed through a convolution layer with a 1x1 convolution kernel, and the result is then obtained using a softmax function, as shown in equation 8:

$P = \mathrm{Softmax}(C_{1 \times 1}(F_d))$  (8)

where $P$ is the final defect detection result, $C_{1 \times 1}$ denotes the convolution, and $F_d$ is the output of the transposed convolution decoding module.
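As a sketch of equation 8, assuming a per-pixel two-class output (defect / no defect); the class count is an assumption, since the text only states that a softmax follows a 1x1 convolution.

```python
import torch
import torch.nn as nn

f_d = torch.randn(1, 16, 192, 192)          # decoder output (Table 3, layer 5)
conv1x1 = nn.Conv2d(16, 2, kernel_size=1)   # C_1x1 of equation 8
p = torch.softmax(conv1x1(f_d), dim=1)      # P: per-pixel class probabilities
print(p.shape)  # torch.Size([1, 2, 192, 192])
```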
In one embodiment of the present application, the first decoding module inputs the extracted features to a first-ordered second decoding module; the second decoding module inputs the extracted features to the second decoding module of the next order.
In one embodiment of the present application, the multidimensional feature extraction module, the three-dimensional feature extraction module, and the built-in module of the transposed convolutional decoding module are all ordered according to a processed size standard; the size standard is width x height x channel; and the characteristics extracted by the multidimensional characteristic extraction module and the built-in module of the three-dimensional characteristic extraction module are input to the built-in module of the transposed convolution decoding module according to the size standard.
In one embodiment of the present application, as shown in fig. 6, the multi-input attention module fuses the different feature maps in the following steps:

S310, the low-level feature map, the three-dimensional feature map and the high-level feature map are spliced:

$F_{cat} = C(F_{low}, F_{3D}, F_{high})$

where $C$ represents the splicing operation; $F_{cat} \in \mathbb{R}^{W \times H \times (S_1 + S_2 + S_3)}$ is the output after splicing, with $W$, $H$ and $S$ corresponding to the width, height and depth of the feature map; $F_{low}$ represents the low-level feature map obtained by the multi-dimensional feature extraction module and $S_1$ the depth of the low-level feature map; $F_{3D}$ represents the three-dimensional feature map obtained by the three-dimensional feature extraction module and $S_2$ the depth of the three-dimensional feature map; and $F_{high}$ represents the decoded high-level feature map and $S_3$ the depth of the high-level feature map.

S320, the mean of all feature maps is obtained based on the global pooling layer, $g = G(F_{cat})$, where $G$ represents the global pooling layer.

S330, after $g$ is obtained, a weight vector coefficient $w$ is obtained through two fully connected layers and a sigmoid function:

$w = \sigma(FC_2(FC_1(g)))$

where $FC$ represents a fully connected layer and $\sigma$ represents the activation function of the fully connected layer.

S340, the weight matrix $W_m$ of the feature map is obtained through a copy operation:

$W_m = \mathrm{Repeat}(w)$

where $\mathrm{Repeat}$ represents the function of the copy operation.

S350, finally, the weight matrix $W_m$ of the feature map is multiplied by the spliced output $F_{cat}$ to obtain the output $F_{out}$ of the multi-input attention module:

$F_{out} = W_m \otimes F_{cat}$

where $F_{cat}$ is the output after splicing and $W_m$ is the weight matrix of the feature map.
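A minimal PyTorch sketch of steps S310-S350 follows; the class name and the channel-reduction ratio between the two fully connected layers are illustrative assumptions.

```python
import torch
import torch.nn as nn

class MultiInputAttention(nn.Module):
    """Sketch of the multi-input attention module (S310-S350)."""

    def __init__(self, total_ch: int, reduction: int = 4):
        super().__init__()
        self.pool = nn.AdaptiveAvgPool2d(1)            # G: global pooling layer
        self.fc = nn.Sequential(                       # two fully connected layers
            nn.Linear(total_ch, total_ch // reduction),
            nn.ReLU(inplace=True),
            nn.Linear(total_ch // reduction, total_ch),
            nn.Sigmoid(),                              # sigma: sigmoid function
        )

    def forward(self, low, three_d, high):
        # S310: splice the three feature maps along the channel dimension.
        f_cat = torch.cat([low, three_d, high], dim=1)
        b, c, _, _ = f_cat.shape
        # S320: mean of all feature maps via global pooling -> (B, C).
        g = self.pool(f_cat).view(b, c)
        # S330: weight vector coefficient through two FC layers and a sigmoid.
        w = self.fc(g)
        # S340: Repeat copies the weight vector over the spatial grid
        # (implemented here by broadcasting).
        w_m = w.view(b, c, 1, 1)
        # S350: multiply the weight matrix by the spliced output.
        return f_cat * w_m

# Decoding layer 1: three 256-channel maps at 12x12 give a 768-channel output.
att = MultiInputAttention(total_ch=768)
out = att(torch.randn(1, 256, 12, 12), torch.randn(1, 256, 12, 12),
          torch.randn(1, 256, 12, 12))
print(out.shape)  # torch.Size([1, 768, 12, 12])
```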
In one embodiment of the present application, there is provided a detection apparatus including: the input module is configured to acquire an appearance picture of the battery module; the coding layer module is configured to extract the characteristics of the appearance pictures to obtain image characteristics; and the decoding layer module is configured to analyze the image characteristics and output a result.
In one embodiment of the present application, there is provided an electronic device including: one or more processors; and a storage means for storing one or more programs which, when executed by the one or more processors, cause the electronic device to implement the detection method as described above.
In one embodiment of the present application, a computer readable medium having computer readable instructions stored thereon, which when executed by a processor of a computer, cause the computer to perform the detection method as described above is provided.
In an exemplary embodiment, as shown in fig. 7, the detection apparatus further includes:
an input module 710 configured to obtain an appearance picture of the battery module;
the coding layer module 720 is configured to perform feature extraction on the appearance picture to obtain image features;
the coding layer comprises at least one three-dimensional feature extraction module which is arranged in sequence;
the decoding layer module 730 is configured to parse the image feature and output a result; the decoding layer module comprises a transposed convolution decoding module, wherein the transposed convolution decoding module comprises a first decoding module and at least one second decoding module which are arranged in sequence; the first decoding module comprises a separable convolution layer and a transposed convolution layer, and the second decoding module comprises a multi-input attention module, the separable convolution layer and the transposed convolution layer; the output module analyzes the learning of the decoding layer through the convolution layer and the softmax function, outputs a result and judges whether defects exist or not.
It should be noted that, the detection apparatus provided in the foregoing embodiment and the detection method provided in the foregoing embodiment belong to the same concept, and a specific manner in which each module and unit perform an operation has been described in detail in the method embodiment, which is not described herein again. In practical application, the detection device provided in the above embodiment may distribute the functions to different functional modules according to needs, that is, the internal structure of the device is divided into different functional modules to complete all or part of the functions described above, which is not limited herein.
The embodiment of the application also provides electronic equipment, which comprises: one or more processors; and a storage means for storing one or more programs which, when executed by the one or more processors, cause the electronic device to implement the detection methods provided in the respective embodiments described above.
Fig. 8 shows a schematic diagram of a computer system suitable for use in implementing the electronic device of the embodiments of the present application. It should be noted that, the computer system 900 of the electronic device shown in fig. 8 is only an example, and should not impose any limitation on the functions and the application scope of the embodiments of the present application.
As shown in fig. 8, the computer system 900 includes a central processing unit (Central Processing Unit, CPU) 901 which can perform various appropriate actions and processes according to a program stored in a Read-Only Memory (ROM) 902 or a program loaded from a storage portion 908 into a random access Memory (Random Access Memory, RAM) 903, for example, performing the method described in the above embodiment. In the random access memory (Random Access Memory, RAM) 903, various programs and data required for system operation are also stored. A central processing unit (Central Processing Unit, CPU) 901, a Read-Only Memory (ROM) 902, and a random access Memory (Random Access Memory, RAM) 903 are connected to each other through a bus 904. An Input/Output (I/O) interface 905 is also connected to bus 904.
The following components are connected to the I/O interface 905: an input section 906 including a keyboard, a mouse, and the like; an output section 907 including a speaker and the like, such as a Cathode Ray Tube (CRT), a liquid crystal display (Liquid Crystal Display, LCD), and the like; a storage portion 908 including a hard disk or the like; and a communication section 909 including a network interface card such as a LAN (Local Area Network ) card, a modem, or the like. The communication section 909 performs communication processing via a network such as the internet. The drive 910 is also connected to the I/O interface 905 as needed. Removable media 911 such as a magnetic disk, an optical disk, a magneto-optical disk, a semiconductor memory, or the like is installed as needed on the drive 910 so that a computer program read out therefrom is installed as needed into the storage section 908.
In particular, according to embodiments of the present application, the processes described above with reference to flowcharts may be implemented as computer software programs. For example, embodiments of the present application include a computer program product comprising a computer program embodied on a computer readable medium, the computer program comprising a computer program for performing the method shown in the flowchart. In such an embodiment, the computer program may be downloaded and installed from the network via the communication portion 909 and/or installed from the removable medium 911. When the computer program is executed by the central processing unit (Central Processing Unit, CPU) 901, various functions defined in the system of the present application are performed.
It should be noted that, the computer readable medium shown in the embodiments of the present application may be a computer readable signal medium or a computer readable storage medium, or any combination of the two. The computer readable storage medium may be, for example, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination thereof. More specific examples of the computer-readable storage medium may include, but are not limited to: an electrical connection having one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-Only Memory (ROM), an erasable programmable read-Only Memory (Erasable Programmable Read Only Memory, EPROM), flash Memory, an optical fiber, a portable compact disc read-Only Memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In the present application, a computer-readable signal medium may include a data signal propagated in baseband or as part of a carrier wave, with a computer-readable computer program embodied therein. Such a propagated data signal may take any of a variety of forms, including, but not limited to, electro-magnetic, optical, or any suitable combination of the foregoing. A computer readable signal medium may also be any computer readable medium that is not a computer readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device. A computer program embodied on a computer readable medium may be transmitted using any appropriate medium, including but not limited to: wireless, wired, etc., or any suitable combination of the foregoing.
The flowcharts and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various embodiments of the present application. Where each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams or flowchart illustration, and combinations of blocks in the block diagrams or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.
The units involved in the embodiments of the present application may be implemented by means of software, or may be implemented by means of hardware, and the described units may also be provided in a processor. Wherein the names of the units do not constitute a limitation of the units themselves in some cases.
Another aspect of the present application also provides a computer readable storage medium having stored thereon a computer program which, when executed by a processor, implements the detection method as described above. The computer-readable storage medium may be included in the electronic device described in the above embodiment or may exist alone without being incorporated in the electronic device.
The foregoing is merely a preferred exemplary embodiment of the present application and is not intended to limit the embodiments of the present application, and those skilled in the art may make various changes and modifications according to the main concept and spirit of the present application, so that the protection scope of the present application shall be subject to the protection scope of the claims.
Claims (8)
1. A method of detection comprising:
obtaining an appearance picture of the battery module;
inputting the appearance picture into a coding layer, and extracting the characteristics of the appearance picture through the coding layer;
the coding layer comprises at least one three-dimensional feature extraction module which is arranged in sequence; the coding layer also comprises a data preprocessing module and a multidimensional feature extraction module;
the data preprocessing module normalizes the appearance picture into an input vector;
the multi-dimensional feature extraction module extracts a feature map from the input vector, the feature map comprising a low-level feature map and/or a high-level feature map, the low-level feature map comprising low-level spatial features and the high-level feature map comprising deep semantic information; the multi-dimensional feature extraction module is designed as a depth separable convolution layer, and two types of feature maps are obtained from an input image by using a normalization function and a ReLU function;
the three-dimensional feature extraction module converts the appearance picture into a three-dimensional image, and extracts a three-dimensional feature map from the three-dimensional image, wherein the three-dimensional feature map comprises three-dimensional spectral features and/or three-dimensional spatial features;
analyzing the characteristics extracted by the coding layer through the decoding layer, and outputting a result;
the decoding layer comprises a transposed convolution decoding module and an output module, wherein the transposed convolution decoding module comprises a first decoding module and at least one second decoding module arranged in sequence; the first decoding module comprises a separable convolution layer and a transposed convolution layer, and the second decoding module comprises a multi-input attention module, the separable convolution layer and the transposed convolution layer; the output module analyzes what the decoding layer has learned through the convolution layer and the softmax function, outputs the result, and judges whether defects exist.
2. A detection method according to claim 1, wherein the multi-dimensional feature extraction modules are arranged as a first extraction module and at least one second extraction module arranged in sequence,
the first extraction module inputs the extracted features to the first-ordered second extraction module;
the second extraction module inputs the extracted features to the second extraction module of the next order.
3. A detection method according to claim 2, wherein the first decoding module inputs the extracted features to a first-ranked second decoding module;
the second decoding module inputs the extracted features to the second decoding module of the next order.
4. A detection method according to claim 3, wherein the multi-dimensional feature extraction module, the three-dimensional feature extraction module and the built-in module of the transposed convolutional decoding module are all ordered according to a processed size criterion;
and the characteristics extracted by the multidimensional characteristic extraction module and the built-in module of the three-dimensional characteristic extraction module are input to the built-in module of the transposed convolution decoding module according to the size standard.
5. The detection method according to claim 4, wherein the multi-input attention module fuses the low-level feature map, the three-dimensional feature map and the high-level feature map in the following steps:

splicing the low-level feature map, the three-dimensional feature map and the high-level feature map:

$F_{cat} = C(F_{low}, F_{3D}, F_{high})$;

wherein $C$ represents a splicing operation, $F_{cat} \in \mathbb{R}^{W \times H \times (S_1 + S_2 + S_3)}$ is the output after splicing, $W$, $H$ and $S$ correspond to the width, height and depth of the feature map, $F_{low}$ represents the low-level feature map obtained by the multi-dimensional feature extraction module, $S_1$ represents the depth of the low-level feature map, $F_{3D}$ represents the three-dimensional feature map obtained by the three-dimensional feature extraction module, $S_2$ represents the depth of the three-dimensional feature map, $F_{high}$ represents the decoded high-level feature map, and $S_3$ represents the depth of the high-level feature map;

obtaining the mean of all feature maps based on a global pooling layer, $g = G(F_{cat})$, wherein $G$ represents the global pooling layer;

after obtaining $g$, obtaining a weight vector coefficient $w$ through two fully connected layers and a sigmoid function:

$w = \sigma(FC_2(FC_1(g)))$;

wherein $FC$ represents a fully connected layer and $\sigma$ represents the activation function of the fully connected layer;

obtaining the weight matrix $W_m$ of the feature map through a copy operation:

$W_m = \mathrm{Repeat}(w)$;

wherein $\mathrm{Repeat}$ represents a function of the copy operation;

finally, multiplying the weight matrix $W_m$ of the feature map by the spliced output $F_{cat}$ to obtain the output $F_{out}$ of the multi-input attention module:

$F_{out} = W_m \otimes F_{cat}$;

wherein $W_m$ is the weight matrix of the feature map.
6. A detection apparatus, characterized by comprising:
the input module is configured to acquire an appearance picture of the battery module;
the coding layer module is configured to extract the characteristics of the appearance pictures to obtain image characteristics;
the coding layer module comprises at least one three-dimensional feature extraction module which is arranged in sequence; the coding layer module further comprises a data preprocessing module and a multidimensional feature extraction module;
the data preprocessing module is used for normalizing the appearance picture into an input vector;
the multi-dimensional feature extraction module is used for extracting a feature map from the input vector, wherein the feature map comprises a low-level feature map and/or a high-level feature map; the low-level feature map comprises low-level spatial features; the high-level feature map comprises deep semantic information; the multi-dimensional feature extraction module is designed as a depth separable convolution layer, and two types of feature maps are obtained from an input image by using a normalization function and a ReLU function;
the three-dimensional feature extraction module is used for converting the appearance picture into a three-dimensional image and extracting a three-dimensional feature image from the three-dimensional image, wherein the three-dimensional feature image comprises three-dimensional spectral features and/or three-dimensional spatial features;
the decoding layer module is configured to parse the image features and output a result; the decoding layer module comprises a transposed convolution decoding module and an output module, wherein the transposed convolution decoding module comprises a first decoding module and at least one second decoding module arranged in sequence; the first decoding module comprises a separable convolution layer and a transposed convolution layer, and the second decoding module comprises a multi-input attention module, the separable convolution layer and the transposed convolution layer; the output module analyzes what the decoding layer has learned through the convolution layer and the softmax function, outputs the result, and judges whether defects exist.
7. An electronic device, comprising:
one or more processors; storage means for storing one or more programs which, when executed by the one or more processors, cause the electronic device to implement the detection method of any of claims 1 to 5.
8. A computer readable medium having stored thereon computer readable instructions which, when executed by a processor of a computer, cause the computer to perform the detection method of any of claims 1 to 5.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202311021779.0A CN116773534B (en) | 2023-08-15 | 2023-08-15 | Detection method and device, electronic equipment and computer readable medium |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202311021779.0A CN116773534B (en) | 2023-08-15 | 2023-08-15 | Detection method and device, electronic equipment and computer readable medium |
Publications (2)
Publication Number | Publication Date |
---|---|
CN116773534A CN116773534A (en) | 2023-09-19 |
CN116773534B true CN116773534B (en) | 2024-03-05 |
Family ID: 88006647
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202311021779.0A Active CN116773534B (en) | 2023-08-15 | 2023-08-15 | Detection method and device, electronic equipment and computer readable medium |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN116773534B (en) |
Families Citing this family (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN117173182B (en) * | 2023-11-03 | 2024-03-19 | 厦门微亚智能科技股份有限公司 | Defect detection method, system, equipment and medium based on coding and decoding network |
Family Cites Families (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN112215223B (en) * | 2020-10-16 | 2024-03-19 | 清华大学 | Multidirectional scene character recognition method and system based on multi-element attention mechanism |
US20230035475A1 (en) * | 2021-07-16 | 2023-02-02 | Huawei Technologies Co., Ltd. | Methods and systems for semantic segmentation of a point cloud |
Patent Citations (17)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110490863A (en) * | 2019-08-22 | 2019-11-22 | 北京红云智胜科技有限公司 | Deep-learning-based system for detecting whether coronary angiography shows total occlusion |
CN111145186A (en) * | 2019-12-17 | 2020-05-12 | 中国科学院深圳先进技术研究院 | Neural network structure, image segmentation method, device and storage medium |
WO2022012459A1 (en) * | 2020-07-17 | 2022-01-20 | 中国人民解放军军事科学院军事医学研究院 | Parasite detection method and system based on artificial intelligence, and terminal device |
US11222217B1 (en) * | 2020-08-14 | 2022-01-11 | Tsinghua University | Detection method using fusion network based on attention mechanism, and terminal device |
CN113393582A (en) * | 2021-05-24 | 2021-09-14 | 电子科技大学 | Three-dimensional object reconstruction algorithm based on deep learning |
WO2023015409A1 (en) * | 2021-08-09 | 2023-02-16 | 百果园技术(新加坡)有限公司 | Object pose detection method and apparatus, computer device, and storage medium |
CN113822207A (en) * | 2021-09-27 | 2021-12-21 | 海南长光卫星信息技术有限公司 | Hyperspectral remote sensing image identification method and device, electronic equipment and storage medium |
CN113902611A (en) * | 2021-10-09 | 2022-01-07 | Oppo广东移动通信有限公司 | Image beautifying processing method and device, storage medium and electronic equipment |
CN114037833A (en) * | 2021-11-18 | 2022-02-11 | 桂林电子科技大学 | Semantic segmentation method for Miao-nationality clothing image |
CN114066913A (en) * | 2022-01-12 | 2022-02-18 | 广东工业大学 | Heart image segmentation method and system |
CN114491083A (en) * | 2022-04-01 | 2022-05-13 | 江苏智云天工科技有限公司 | Knowledge graph library construction method and knowledge graph library construction device in industrial detection |
CN115170502A (en) * | 2022-06-30 | 2022-10-11 | 安徽大学 | Femoral pulley width measuring method based on deep learning |
CN115546570A (en) * | 2022-08-25 | 2022-12-30 | 西安交通大学医学院第二附属医院 | Blood vessel image segmentation method and system based on three-dimensional depth network |
CN115526863A (en) * | 2022-09-30 | 2022-12-27 | 华侨大学 | Cylindrical lithium battery surface defect detection method and device |
CN115619738A (en) * | 2022-10-18 | 2023-01-17 | 宁德思客琦智能装备有限公司 | Method for detecting module side-seam welds after welding |
CN115690013A (en) * | 2022-10-18 | 2023-02-03 | 福建工程学院 | Method and equipment for detecting surface defects of power battery module |
CN116416244A (en) * | 2023-04-26 | 2023-07-11 | 山东大学 | Crack detection method and system based on deep learning |
Non-Patent Citations (2)
Title |
---|
Bearing fault diagnosis based on a deep autoencoder network; Yuan Wenjun et al.; Noise and Vibration Control, No. 5; full text *
Application of machine learning in designing high-performance lithium battery cathode materials and electrolytes; Liu Zhendong et al.; Progress in Chemistry, Vol. 35, No. 4; full text *
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN110188765B (en) | Image semantic segmentation model generation method, device, equipment and storage medium | |
US20240233313A1 (en) | Model training method, image processing method, computing and processing device and non-transient computer-readable medium | |
CN116773534B (en) | Detection method and device, electronic equipment and computer readable medium | |
CN111914654B (en) | Text layout analysis method, device, equipment and medium | |
CN111444807B (en) | Target detection method, device, electronic equipment and computer readable medium | |
US20210056353A1 (en) | Joint representation learning from images and text | |
CN112446869A (en) | Unsupervised industrial product defect detection method and device based on deep learning | |
CN116958131B (en) | Image processing method, device, equipment and storage medium | |
CN114841974B (en) | Nondestructive testing method, nondestructive testing system, nondestructive testing electronic equipment and nondestructive testing medium for internal structure of fruit | |
CN113920208A (en) | Image processing method and device, computer readable storage medium and electronic device | |
US20230351558A1 (en) | Generating an inpainted image from a masked image using a patch-based encoder | |
CN117557775A (en) | Substation power equipment detection method and system based on infrared and visible light fusion | |
CN118097089A (en) | Night warehousing robot target detection method and system based on integral network | |
CN112465050B (en) | Image template selection method, device, equipment and storage medium | |
CN117173182B (en) | Defect detection method, system, equipment and medium based on coding and decoding network | |
CN112800851B (en) | Water body contour automatic extraction method and system based on full convolution neuron network | |
Li et al. | Multi-Scale Fusion Siamese Network Based on Three-Branch Attention Mechanism for High-Resolution Remote Sensing Image Change Detection | |
CN111145178A (en) | High-resolution remote sensing image multi-scale segmentation method | |
CN117710295A (en) | Image processing method, device, apparatus, medium, and program product | |
CN116309612A (en) | Semiconductor silicon wafer detection method, device and medium based on frequency decoupling supervision | |
CN116935240A (en) | Surface coverage classification system and method for multi-scale perception pyramid | |
CN115205650B (en) | Unsupervised abnormal positioning and detecting method and unsupervised abnormal positioning and detecting device based on multi-scale standardized flow | |
Mire et al. | Automated approach for splicing detection using first digit probability distribution features | |
CN116257852A (en) | Chip hardware Trojan horse detection method based on differential curvature | |
CN115761268A (en) | Pole tower key part defect identification method based on local texture enhancement network |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | |
SE01 | Entry into force of request for substantive examination | |
GR01 | Patent grant | |