CN116977260A - Target defect detection method and device, electronic equipment and storage medium

Info

Publication number
CN116977260A
CN116977260A
Authority
CN
China
Prior art keywords
image
target
result
feature
defect
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202310284013.5A
Other languages
Chinese (zh)
Inventor
赖锦祥
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Tencent Technology Shenzhen Co Ltd
Original Assignee
Tencent Technology Shenzhen Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Tencent Technology Shenzhen Co Ltd filed Critical Tencent Technology Shenzhen Co Ltd
Priority to CN202310284013.5A priority Critical patent/CN116977260A/en
Publication of CN116977260A publication Critical patent/CN116977260A/en
Pending legal-status Critical Current

Classifications

    • G06T 7/0004 Image analysis; inspection of images, e.g. flaw detection; industrial image inspection
    • G06V 10/806 Image or video recognition using pattern recognition or machine learning; fusion of extracted features
    • G06V 10/82 Image or video recognition using pattern recognition or machine learning; using neural networks
    • G06T 2207/10024 Image acquisition modality; color image
    • G06T 2207/20081 Special algorithmic details; training, learning
    • G06T 2207/20084 Special algorithmic details; artificial neural networks [ANN]
    • G06T 2207/30108 Subject of image; industrial image inspection
    • G06V 2201/06 Recognition of objects for industrial automation
    • G06V 2201/07 Target detection
    • Y02P 90/30 Computing systems specially adapted for manufacturing

Abstract

The application discloses a defect detection method and apparatus for a target object, an electronic device, and a storage medium, relating to technical fields such as artificial intelligence, machine learning, and cloud technology. The method comprises the following steps: acquiring a first image feature corresponding to a color image of a target object and a second image feature corresponding to a normal vector diagram of the target object; determining a first mask weight corresponding to the first image feature and a second mask weight corresponding to the second image feature; fusing the first image feature and the second image feature according to the first mask weight and the second mask weight to obtain a target fusion feature; and determining a defect detection result of the target object according to the target fusion feature. Because the target fusion feature combines the image features of both the color image and the normal vector diagram, it represents the three-dimensional structure of the target object more accurately, so the defect detection result obtained from the target fusion feature is also more accurate.

Description

Target defect detection method and device, electronic equipment and storage medium
Technical Field
The present application relates to the field of electronic information technology, and in particular, to a method and apparatus for detecting a defect of a target object, an electronic device, and a storage medium.
Background
Tangible articles can develop various defects during production, and defective articles that flow into the market can cause irreparable loss. For example, a defective battery placed on the market may charge and discharge unstably or leak electrolyte, and may even cause serious safety accidents such as explosion. Improving the accuracy of article defect detection is therefore very important.
Disclosure of Invention
In view of the above, the embodiments of the present application provide a method, an apparatus, an electronic device, and a storage medium for detecting a defect of a target object.
In a first aspect, an embodiment of the present application provides a method for detecting a defect of a target object, including: acquiring a color image and a normal vector diagram of the target object; extracting image features of the color image as a first image feature, and extracting image features of the normal vector diagram as a second image feature; determining a first mask weight corresponding to the first image feature and a second mask weight corresponding to the second image feature, wherein the first mask weight characterizes the importance of the first image feature and the second mask weight characterizes the importance of the second image feature; fusing the first image feature and the second image feature according to the first mask weight and the second mask weight to obtain a target fusion feature; and determining a defect detection result of the target object according to the target fusion feature.
In a second aspect, an embodiment of the present application provides a defect detection apparatus for a target object, including: an acquisition module for acquiring a color image and a normal vector diagram of the target object; an extraction module for extracting image features of the color image as a first image feature and image features of the normal vector diagram as a second image feature; a weight determining module for determining a first mask weight corresponding to the first image feature and a second mask weight corresponding to the second image feature, wherein the first mask weight characterizes the importance of the first image feature and the second mask weight characterizes the importance of the second image feature; a fusion module for fusing the first image feature and the second image feature according to the first mask weight and the second mask weight to obtain a target fusion feature; and a result determining module for determining a defect detection result of the target object according to the target fusion feature.
Optionally, the result determining module is further configured to: input the target fusion feature into a target detection model to obtain a target defect class of the target object and a target detection frame for that class, where the target detection frame is a selection frame framing a defect of the target object, and the target detection model is obtained by training an initial detection model on sample images, the defect class corresponding to each sample image, and a labeled truth frame corresponding to each sample image (a selection frame framing a defect of the object in the sample image); acquire the features of the target area selected by the target detection frame in the normal vector diagram as the small sample feature; and determine the defect detection result of the target object according to the small sample feature.
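A minimal sketch of the small-sample-feature step above: crop the normal vector diagram's feature map to the region framed by the target detection frame. The function name, the plain crop (rather than, say, ROI pooling), and the stride parameter are illustrative assumptions, not taken from the patent:

```python
import torch

def extract_small_sample_feature(normal_feat: torch.Tensor, box, stride: int = 1):
    # normal_feat: (C, H, W) features of the normal vector diagram
    # box: (x1, y1, x2, y2) target detection frame in image coordinates
    x1, y1, x2, y2 = (int(round(v / stride)) for v in box)
    return normal_feat[:, y1:y2, x1:x2]  # small sample feature of the framed target area
```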
Optionally, the result determining module is further configured to: obtain, through a backbone network, a support feature for each of a plurality of support sets, where each support set corresponds to one of a plurality of defect levels of the target defect class and includes at least one sample normal vector diagram; perform feature alignment between each support feature and the small sample feature to obtain a target alignment result for the defect level corresponding to that support feature; and determine the defect detection result of the target object according to the target alignment results of the defect levels.
Optionally, the support features of each support set include respective third image features of the respective sample normal vector diagrams in the support set; the result determining module is further used for determining the correlation degree between each third image feature in the support features and the small sample feature as the respective target weight of each third image feature in the support features; and according to the respective target weights of the third image features in the support features, carrying out weighted summation on the third image features in the support features to obtain target alignment results of defect levels corresponding to the support features.
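One way to read the alignment step above is as attention over the support set: each third image feature is weighted by its correlation with the small sample feature, and the weighted features are summed. A sketch, assuming cosine similarity as the correlation measure and softmax normalization of the target weights (the text fixes neither):

```python
import torch
import torch.nn.functional as F

def align_support_set(support_feats: torch.Tensor, small_sample_feat: torch.Tensor):
    # support_feats: (N, D) third image features of one support set
    # small_sample_feat: (D,) feature of the region framed by the target detection frame
    sims = F.cosine_similarity(support_feats, small_sample_feat.unsqueeze(0), dim=1)  # (N,)
    weights = torch.softmax(sims, dim=0)  # target weight of each third image feature
    return (weights.unsqueeze(1) * support_feats).sum(dim=0)  # target alignment result: (D,)
```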
Optionally, the result determining module is further configured to determine, by using a classifier, a similarity between the target alignment result of each defect level and the small sample feature, as a predicted target value of each defect level; and determining a defect detection result of the target object according to the predicted target values of the defect levels.
Optionally, the result determining module is further configured to take the defect level with the maximum predicted target value as the selected defect level; and if the selected defect level does not reach the level threshold for the target defect class, obtain a defect detection result indicating that the target object passes quality inspection.
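The last two paragraphs amount to a simple decision rule: take the defect level with the maximum predicted target value, and pass quality inspection if that level falls below the level threshold. A sketch under the assumption that levels are comparable numbers:

```python
def decide_defect_level(predicted_values: dict, level_threshold: int):
    # predicted_values: defect level -> predicted target value (classifier similarity)
    selected_level = max(predicted_values, key=predicted_values.get)
    passed = selected_level < level_threshold  # below the threshold: quality inspection passes
    return selected_level, passed
```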
Optionally, the fusion module is further configured to: calculate the sum of the first mask weight and a preset first value as a first sum value; multiply the first sum value and the first image feature element by element to obtain a first product, where the first image feature comprises a plurality of elements and each element is the feature value of the corresponding pixel in the color image; calculate the sum of the second mask weight and a preset second value as a second sum value; multiply the second sum value and the second image feature element by element to obtain a second product, where the second image feature comprises a plurality of elements and each element is the feature value of the corresponding pixel in the normal vector diagram; and calculate the sum of the first product and the second product as the target fusion feature.
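As a formula, this fusion computes (W_X + c_1) ⊙ X + (W_{X_N} + c_2) ⊙ X_N, where ⊙ is the element-wise product. A PyTorch sketch, assuming both preset values are 1 and broadcasting the (H, W) mask weights over the C feature channels:

```python
import torch

def fuse(x_feat, n_feat, w_x, w_n, c1=1.0, c2=1.0):
    # x_feat, n_feat: (C, H, W) first and second image features
    # w_x, w_n: (H, W) first and second mask weights
    first_sum = w_x + c1    # first sum value
    second_sum = w_n + c2   # second sum value
    first_product = first_sum.unsqueeze(0) * x_feat    # element-wise, broadcast over channels
    second_product = second_sum.unsqueeze(0) * n_feat
    return first_product + second_product              # target fusion feature
```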
Optionally, the weight determining module is further configured to convolve the first image feature through a first convolution network in the weight predictor to obtain a first convolution result; performing batch standardization processing on the first convolution result to obtain a first standardization result; activating the first standardized result to obtain a first activation result; convolving the first activation result through a second convolution network in the weight predictor to obtain a second convolution result; performing batch standardization processing on the second convolution result to obtain a second standardization result; and activating the second standardized result to obtain the first mask weight.
Optionally, the weight determining module is further configured to convolve the second image feature through the first convolution network in the weight predictor to obtain a third convolution result; perform batch standardization processing on the third convolution result to obtain a third standardization result; activate the third standardization result to obtain a second activation result; convolve the second activation result through the second convolution network in the weight predictor to obtain a fourth convolution result; perform batch standardization processing on the fourth convolution result to obtain a fourth standardization result; and activate the fourth standardization result to obtain the second mask weight.
Optionally, the extracting module is further configured to convolve the color image through a third convolution network in the feature extractor to obtain a fifth convolution result; performing batch standardization processing on the fifth convolution result to obtain a fifth standardization result; activating the fifth standardized result to obtain a third activation result; convolving the third activation result through a fourth convolution network in the feature extractor to obtain a sixth convolution result; performing batch standardization processing on the sixth convolution result to obtain a sixth standardization result; and activating the sixth standardized result to obtain the first image feature.
Optionally, the acquiring module is further configured to acquire polarized light images in respective preset directions, where each polarized light image in the preset direction is an image obtained by shooting the target object when the target object is irradiated by a light source disposed in the preset direction; and obtaining a normal vector diagram according to the photometric stereo synthesis algorithm and the polarized light images of each of the plurality of preset directions.
Optionally, the color image is an image obtained by photographing the target object while it is simultaneously irradiated by light sources arranged in a plurality of preset directions.
Optionally, the first mask weight and the second mask weight respectively include mask weights corresponding to each local area in the target object; mask weights corresponding to each local region are associated with the respective morphology structures of each local region.
In a third aspect, an embodiment of the present application provides an electronic device, including a processor and a memory; one or more programs are stored in the memory and configured to be executed by the processor to implement the methods described above.
In a fourth aspect, embodiments of the present application provide a computer readable storage medium having program code stored therein, wherein the program code, when executed by a processor, performs the method described above.
In a fifth aspect, embodiments of the present application provide a computer program product or computer program comprising computer instructions stored in a computer readable storage medium. The processor of the electronic device reads the computer instructions from the computer readable storage medium and executes the computer instructions to cause the electronic device to perform the method described above.
According to the defect detection method and apparatus, electronic device, and storage medium for a target object provided by the application, a first mask weight corresponding to the first image feature and a second mask weight corresponding to the second image feature are determined, and the first image feature and the second image feature are fused according to these mask weights to obtain the target fusion feature. Because the target fusion feature combines the image features of both the color image and the normal vector diagram, it represents the three-dimensional structure of the target object more accurately, so the defect detection result obtained from the target fusion feature has higher accuracy.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present application, the drawings that are needed in the description of the embodiments will be briefly described below, it being obvious that the drawings in the following description are only some embodiments of the present application, and that other drawings may be obtained according to these drawings without inventive effort for a person skilled in the art.
FIG. 1 is a schematic diagram of an application scenario proposed by an embodiment of the present application;
FIG. 2 is a flow chart of a method for detecting defects of an object according to an embodiment of the present application;
FIG. 3 is a schematic diagram of an image capturing device according to an embodiment of the present application;
FIG. 4 is a schematic view of a color image in an embodiment of the application;
FIG. 5 is a schematic representation of a normal vector diagram in an embodiment of the application;
FIG. 6 is a flowchart of a method for detecting defects of an object according to another embodiment of the present application;
FIG. 7 is a flowchart of a method for detecting defects of an object according to still another embodiment of the present application;
FIG. 8 is a flowchart of a method for detecting defects of an object according to yet another embodiment of the present application;
FIG. 9 is a schematic view of a defect detection process of a battery in an embodiment of the present application;
FIG. 10 is a block diagram of a defect detection apparatus for an object according to an embodiment of the present application;
FIG. 11 is a block diagram of an electronic device for performing a defect detection method of an object according to an embodiment of the present application.
Detailed Description
The following describes the technical solutions in the embodiments of the present application clearly and completely with reference to the accompanying drawings. The described embodiments are only some, not all, of the embodiments of the present application. All other embodiments obtained by a person of ordinary skill in the art based on the embodiments of the present application without creative effort fall within the protection scope of the application.
In the following description, the terms "first", "second", and the like are merely used to distinguish between similar objects and do not imply a particular ordering. Where permitted, "first", "second", and the like may be interchanged, so that the embodiments of the application described herein can be practiced in orders other than those illustrated or described herein.
Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this application belongs. The terminology used herein is for the purpose of describing embodiments of the application only and is not intended to be limiting of the application.
It should be noted that: "a plurality" herein means two or more. "And/or" describes an association relationship between associated objects and indicates that three relationships are possible; for example, "A and/or B" may mean: A exists alone, both A and B exist, or B exists alone. The character "/" generally indicates that the associated objects before and after it are in an "or" relationship.
Regarding the technical problems mentioned in the background, the inventor found that although a deep network can be trained on color images of defective articles to obtain a defect detection model capable of identifying article defects, and the color image of an article under test can then be run through that model to obtain its defect detection result, for soft-package materials such as lithium batteries the accuracy of results obtained this way from ordinary RGB color imaging is low, and defects are easily missed. The inventor therefore proposes the defect detection method for a target object described in this application.
The application discloses a defect detection method and apparatus for a target object, an electronic device, and a storage medium, and relates to artificial intelligence, machine learning, cloud technology, and the like.
Artificial intelligence (Artificial Intelligence, AI) is the theory, method, technique and application system that uses a digital computer or a machine controlled by a digital computer to simulate, extend and extend human intelligence, sense the environment, acquire knowledge and use the knowledge to obtain optimal results. In other words, artificial intelligence is an integrated technology of computer science that attempts to understand the essence of intelligence and to produce a new intelligent machine that can react in a similar way to human intelligence. Artificial intelligence, i.e. research on design principles and implementation methods of various intelligent machines, enables the machines to have functions of sensing, reasoning and decision.
The artificial intelligence technology is a comprehensive subject, and relates to the technology with wide fields, namely the technology with a hardware level and the technology with a software level. Artificial intelligence infrastructure technologies generally include technologies such as sensors, dedicated artificial intelligence chips, cloud computing, distributed storage, big data processing technologies, operation/interaction systems, mechatronics, and the like. The artificial intelligence software technology mainly comprises a computer vision technology, a voice processing technology, a natural language processing technology, machine learning/deep learning and other directions.
Machine Learning (ML) is a multi-domain interdisciplinary, involving multiple disciplines such as probability theory, statistics, approximation theory, convex analysis, algorithm complexity theory, etc. It is specially studied how a computer simulates or implements learning behavior of a human to acquire new knowledge or skills, and reorganizes existing knowledge structures to continuously improve own performance. Machine learning is the core of artificial intelligence, a fundamental approach to letting computers have intelligence, which is applied throughout various areas of artificial intelligence. Machine learning and deep learning typically include techniques such as artificial neural networks, confidence networks, reinforcement learning, transfer learning, induction learning, teaching learning, and the like.
Cloud technology (Cloud technology) refers to a hosting technology for integrating hardware, software, network and other series resources in a wide area network or a local area network to realize calculation, storage, processing and sharing of data.
Cloud technology is a general term for the network, information, integration, management-platform, and application technologies applied under the cloud computing business model; it can form a resource pool to be used on demand, flexibly and conveniently. Cloud computing technology will become an important support. Background services of technical network systems require large amounts of computing and storage resources, for example video websites, picture websites, and portal websites. With the development of the internet industry, every article may have its own identification mark in the future, which will need to be transmitted to a background system for logical processing; data at different levels will be processed separately, and all kinds of industry data need strong backing from the system, which can only be achieved through cloud computing.
Cloud storage (cloud storage) is a new concept that extends and develops in the concept of cloud computing, and a distributed cloud storage system (hereinafter referred to as a storage system for short) refers to a storage system that integrates a large number of storage devices (storage devices are also referred to as storage nodes) of various types in a network to work cooperatively through application software or application interfaces through functions such as cluster application, grid technology, and a distributed storage file system, so as to provide data storage and service access functions for the outside.
At present, the storage method of the storage system is as follows: when creating logical volumes, each logical volume is allocated a physical storage space, which may be a disk composition of a certain storage device or of several storage devices. The client stores data on a certain logical volume, that is, the data is stored on a file system, the file system divides the data into a plurality of parts, each part is an object, the object not only contains the data but also contains additional information such as a data Identification (ID) and the like, the file system writes each object into a physical storage space of the logical volume, and the file system records storage position information of each object, so that when the client requests to access the data, the file system can enable the client to access the data according to the storage position information of each object.
The process by which the storage system allocates physical storage space for a logical volume is as follows: physical storage space is divided into stripes in advance according to an estimate of the capacity of the objects to be stored on the logical volume (an estimate that usually leaves a large margin relative to the capacity actually needed) and the configuration of the redundant array of independent disks (RAID); one logical volume can be understood as one stripe, whereby physical storage space is allocated to the logical volume.
As shown in fig. 1, an application scenario to which the embodiment of the present application is applicable includes a terminal 20 and a server 10, where the terminal 20 and the server 10 are connected through a wired or wireless network. The terminal 20 may be a smart phone, tablet computer, notebook computer, desktop computer, smart home appliance, vehicle-mounted terminal, aircraft, wearable device, virtual reality device, or another terminal device capable of page presentation, or may run other applications that can invoke page-presentation applications (e.g., instant messaging, shopping, search, game, forum, and map/traffic applications).
The server 10 may be an independent physical server, a server cluster or a distributed system formed by a plurality of physical servers, or a cloud server providing cloud services, cloud databases, cloud computing, cloud functions, cloud storage, network services, cloud communication, middleware services, domain name services, security services, CDNs (Content Delivery Network, content delivery networks), basic cloud computing services such as big data and artificial intelligent platforms, and the like. The server 10 may be used to provide services for applications running at the terminal 20.
The terminal 20 may send the color image and the normal vector image of the target object to the server 10, and the server 10 may determine a defect detection result of the target object according to the color image and the normal vector image of the target object, and then feed back the defect detection result of the target object to the terminal 20.
In the present application, the target may refer to an object to be detected in a certain target category, and the target category may refer to at least one of a battery (e.g., a lithium battery), a metal part, and wooden furniture.
In some possible implementations, the server 10 may obtain a weight predictor, a target detection model (a model for detecting defects of an article), a backbone network, a classifier, a feature extractor, and the like according to training of training samples, and store the weight predictor, the target detection model, the backbone network, the classifier, and the feature extractor obtained by training in a local storage space, so that after the terminal 20 sends a color image and a normal vector image of the target object to the server 10, the server 10 obtains a defect detection result of the target object according to the color image and the normal vector image of the target object.
In another embodiment, the server 10 may send the weight predictor, the target detection model, the backbone network, the classifier, and the feature extractor obtained by training to the terminal 20, where the terminal 20 stores the weight predictor, the target detection model, the backbone network, the classifier, and the feature extractor obtained by training locally, and obtains the defect detection result of the target according to the color image and the normal vector image of the target after the terminal 20 obtains the color image and the normal vector image of the target.
Alternatively, the server 10 may store the weight predictor, the target detection model, the backbone network, the classifier, and the feature extractor obtained by training in the cloud storage system, and when the defect detection method of the target object of the present application is executed, the terminal 20 obtains the weight predictor, the target detection model, the backbone network, the classifier, and the feature extractor of the object from the cloud storage system.
For convenience of description, in the following embodiments, description will be made by taking an example in which defect detection of a target object is performed by an electronic apparatus.
Referring to fig. 2, fig. 2 is a flowchart illustrating a method for detecting defects of an object according to an embodiment of the present application, where the method may be applied to an electronic device, and the electronic device may be the terminal 20 or the server 10 in fig. 1, and the method includes:
s110, acquiring a color image and a normal vector diagram of the target object.
As described above, the target object is an object of the target class that is to be detected; for example, if the target class is lithium batteries, the target object is a lithium battery to be detected, and if the target class is screws, the target object is a screw to be detected. The target object of the application may be a physical article, with or without packaging.
In the application, the target object can be photographed by a high-definition camera built into the electronic device to obtain the color image and the normal vector diagram. Alternatively, the target object can be photographed by an external camera, which then sends the resulting color image and normal vector diagram to the electronic device.
As one possible implementation, acquiring the color image of the target object may include: photographing the target object while it is simultaneously irradiated by light sources arranged in a plurality of preset directions. The plurality of preset directions may be the up, down, left, and right directions of the target object as seen from above.
The color image may be acquired by an image acquisition device, as shown in fig. 3, which may be located in a dedicated laboratory unaffected by external ambient light. The target object may be placed on a horizontal console 31, a camera 32 may be arranged directly above the console, and light sources may be arranged in a plurality of preset directions around the console, including four light sources on the upper side 33, lower side 34, left side 35, and right side 36 of the target object (viewed from the camera direction).
After the target object is placed on the console 31, the light sources in the plurality of preset directions are turned on simultaneously (33, 34, 35, and 36 all on), and while they are on, the target object is photographed by the camera 32 to obtain its color image. The color image is then transmitted by the image acquisition device to the electronic device.
For example, when the target object is a battery to be detected, a color image acquired by the image acquisition device shown in fig. 3 is shown in fig. 4.
As still another embodiment, a method for obtaining a normal vector diagram of a target object includes: acquiring polarized light images of each of a plurality of preset directions, wherein the polarized light image of each preset direction is an image obtained by shooting a target object under the condition that the target object is irradiated by a light source arranged in the preset direction; and obtaining a normal vector diagram according to the photometric stereo synthesis algorithm and the polarized light images of each of the plurality of preset directions.
The polarized images may be acquired by the image acquisition device shown in fig. 3: to obtain the polarized image for a given preset direction, only the light source in that direction is turned on. For example, turning on only light source 33 and turning off light sources 34, 35, and 36 yields the corresponding polarized image x1; turning on only light source 34 yields polarized image x2; turning on only light source 35 yields polarized image x3; and turning on only light source 36 yields polarized image x4.
After the polarized light images of the plurality of preset directions are obtained, the normal vector diagram of the target object is determined according to a photometric stereo synthesis algorithm (the photometric stereo method computes the orientation gradient of an object's surface from the light intensities of multiple images captured under different light source directions, thereby recovering three-dimensional information of the imaged object).
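For illustration only, a rough sketch of the classical photometric stereo computation: under a Lambertian assumption, the albedo-scaled normal at each pixel solves a least-squares system built from the K light directions and the K observed intensities. The names and the grayscale-input simplification are ours, not the patent's specific algorithm:

```python
import numpy as np

def photometric_stereo(images, light_dirs):
    # images: list of K grayscale images, each (H, W), e.g. the polarized images x1..x4
    # light_dirs: (K, 3) unit vectors pointing toward the K light sources
    h, w = images[0].shape
    I = np.stack([im.reshape(-1) for im in images])  # (K, H*W) observed intensities
    L = np.asarray(light_dirs, dtype=np.float64)     # (K, 3)
    G, *_ = np.linalg.lstsq(L, I, rcond=None)        # solve L @ G ≈ I; G: (3, H*W)
    norms = np.linalg.norm(G, axis=0, keepdims=True)
    N = G / np.clip(norms, 1e-8, None)               # unit surface normals
    return N.reshape(3, h, w)                        # normal vector diagram
```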
For example, when the target object is a battery to be detected, the image acquisition device shown in fig. 3 acquires polarized images x1, x2, x3 and x4, and determines a normal vector diagram according to the acquired polarized images x1, x2, x3 and x4, where the normal vector diagram is shown in fig. 5.
If the sizes of the obtained polarized image and the photographed color image are different, the normal vector image is obtained from the polarized image, and then the normal vector image and the color image are subjected to clipping and/or scaling processing to obtain the normal vector image and the color image with the same size.
For example, the size of a directly photographed color image is 1000×800, the size of a normal vector image obtained by photographing a polarized image is 900×800, and the color image of 1000×800 may be scaled to obtain a color image of 900×800, the color image of 900×800 being the color image acquired in S110, and the normal vector image of 900×800 being the normal vector image acquired in S110.
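For the resizing in the 1000×800 to 900×800 example, a plausible implementation is to interpolate the color image down to the normal vector diagram's resolution (bilinear interpolation is our choice; the text allows clipping and/or scaling):

```python
import torch.nn.functional as F

def match_to_normal_map(color, normal):
    # color: (B, 3, Hc, Wc) color image; normal: (B, 3, Hn, Wn) normal vector diagram
    color = F.interpolate(color, size=normal.shape[-2:],
                          mode="bilinear", align_corners=False)
    return color, normal  # both now share the normal vector diagram's size
```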
S120, extracting image features of the color image as the first image feature, and extracting image features of the normal vector diagram as the second image feature.
The features of the color image are extracted by a trained feature extraction model to obtain the first image feature, and the image features of the normal vector diagram are extracted by the same feature extraction model as the second image feature.
The feature extraction model may be a model, obtained by training on training samples, that can extract image features of the target object. A feature extraction model in the initial state (with its parameters initialized) can be trained with a first sample image of a first sample object belonging to the same class as the target object (a first sample object is an object that can serve as a training sample), which may include a color image and a normal vector diagram of the first sample object, together with the image features of the first sample image, to obtain the feature extraction model. The feature extraction model of the initial state may include a plurality of convolutional networks and processing steps; for example, it may include two convolutional networks connected in series (the output of the previous convolutional network serving as the input of the next), with batch normalization (BN) and activation applied to the result of each convolutional network.
In the present application, for the same class of articles, feature extraction models for extracting image features of the articles may be common, and feature extraction models for different classes of articles may be different. Feature extraction models of different types of articles are obtained through corresponding sample training.
For example, when the target object is a battery, the feature extraction model of the initial state can be trained through a sample image of the battery with the defect and the image features of the sample image to obtain the feature extraction model for the battery; for another example, when the target object is a screw, the feature extraction model of the initial state can be trained through the sample image of the screw with the defect and the image features of the sample image, so as to obtain the feature extraction model for the screw.
S130, determining a first mask weight corresponding to the first image feature and a second mask weight corresponding to the second image feature.
The first mask weight characterizes a degree of importance of the first image feature and the second mask weight characterizes a degree of importance of the second image feature. The first mask weight and the second mask weight respectively comprise mask weights corresponding to all local areas in the target object; mask weights corresponding to each local region are associated with the respective morphology structures of each local region. The target object may be divided into a plurality of different local areas according to the morphological structure of the target object, and the morphological structure of the same local area may be similar or identical.
Local areas with different morphological structures have different corresponding mask weights in the first mask weight, and likewise different corresponding mask weights in the second mask weight. For the same local area, the imaging characteristics (for example, exposure or sharpness) of the color image and of the normal vector diagram differ, so that area's mask weight in the first mask weight differs from its mask weight in the second mask weight.
For example, when the target object is a lithium battery, the tab area (the tab is the metal conductor that leads the positive and negative electrodes out of the battery cell; it is the contact point of the positive and negative electrodes during charging and discharging, and is not the copper sheet visible on the battery's exterior but a connection inside the battery) has a large degree of bending, which easily causes overexposure or underexposure of the tab area in the color image, so that tab-area defects are imaged inaccurately in color, whereas the normal vector diagram can represent the tab area stably and clearly. Therefore, the weight of tab-area defects in the color image can be reduced through the corresponding mask weight in the first mask weight, and their weight in the normal vector diagram can be increased through the corresponding mask weight in the second mask weight.
As one implementation, the morphological structures of the different local areas can be determined first; the imaging characteristics of these areas in the color image and in the normal vector diagram are then determined from their morphological structures; the mask weight of each area for the color image is determined from its imaging characteristics in the color image, and the mask weight of each area for the normal vector diagram from its imaging characteristics in the normal vector diagram; finally, the first mask weight is assembled from the areas' mask weights for the color image, and the second mask weight from their mask weights for the normal vector diagram.
In another possible implementation manner, the first image feature may be input to the weight predictor to obtain the first mask weight output by the weight predictor, and similarly, the second image feature may be input to the weight predictor to obtain the second mask weight output by the weight predictor. The weight predictor obtained through training can learn the importance degree of the local areas of different forms of structures in the color image and the normal vector diagram, so that the effect of giving different weight masks to the local areas of different forms of structures is achieved.
The weight predictor is used to predict the weights of the image features of the color image and of the normal vector diagram. It can be trained as follows: acquire the image features of a color image of a second sample object (which may differ from the first sample object) belonging to the same class as the target object, as first sample image features, and the image features of a normal vector diagram of such an object as second sample image features; acquire preset weights for the first and second sample image features (values that may be set by a user according to the actual situation and that accurately represent the importance of the respective features); and train a weight predictor in the initial state (with parameters initialized) on the first sample image features, the second sample image features, and their preset weights to obtain the weight predictor.
The weight predictor of the initial state may include a plurality of convolution networks and processing steps; for example, it may include two convolution networks connected in series (the output of the previous convolution network serving as the input of the next), with batch normalization (BN) and activation applied in turn to the result of each convolution network.
In this embodiment, the first mask weight and the first image feature may be both characterized by a vector, the first mask weight output by the weight predictor may be a mask corresponding to the first image feature, each of the first mask weight and the first image feature includes a plurality of elements, each element in the first mask weight characterizes a weight of a pixel point corresponding to the element in the color image, and each element in the first image feature is a feature value of the pixel point corresponding to the element in the color image.
For example, for an H×W color image, the corresponding first image feature is X ∈ R^{C×H×W} and the corresponding first mask weight is W_X ∈ R^{H×W}, where C is the number of feature channels, H is the height of the color image, and W is its width. That is, each pixel of the color image corresponds to C elements in the first image feature, and the C elements corresponding to each pixel map to one element of the first mask weight.
Similarly, in this embodiment, the second mask weight and the second image feature may be both characterized by a vector, the second mask weight output by the weight predictor may be a mask corresponding to the second image feature, where each of the second mask weight and the second image feature includes a plurality of elements, each of the elements in the second mask weight characterizes a weight of a pixel point corresponding to the element in the normal vector diagram, and each of the elements in the second image feature is a feature value of the pixel point corresponding to the element in the normal vector diagram.
For example, for an H×W normal vector diagram, the corresponding second image feature is X_N ∈ R^{C×H×W} and the corresponding second mask weight is W_{X_N} ∈ R^{H×W}, where C is the number of feature channels, H is the height of the normal vector diagram, and W is its width. That is, each pixel of the normal vector diagram corresponds to C elements in the second image feature, and the C elements corresponding to each pixel map to one element of the second mask weight.
And S140, fusing the first image features and the second image features according to the first mask weights and the second mask weights to obtain target fusion features.
After the first mask weight and the second mask weight are obtained, the first image feature and the second image feature are fused through the first mask weight and the second mask weight, and the target fusion feature is obtained, wherein the target fusion feature not only comprises feature information which can be represented by a color image, but also comprises feature information which can be represented by a normal vector diagram.
The first image feature can be adjusted by the weight-characterizing elements of the first mask weight to obtain an adjusted first image feature; the second image feature is likewise adjusted by the second mask weight to obtain an adjusted second image feature; and the adjusted first and second image features are summed or superimposed to obtain the target fusion feature.
S150, determining a defect detection result of the target object according to the target fusion characteristics.
After the target fusion feature is obtained, it is processed by a defect detection model, and the model's output serves as the defect detection result of the target object. The defect detection result may include the probability of a defect class of the target object (the probability that the target object contains a defect of that class) and a selection frame framing the defect; the selection frame may be rectangular. The defect detection model is directed at a plurality of defect classes of articles of the target class.
The defect detection model can be trained as follows: obtain the image features of a color image of a third sample object (which may differ from the first and second sample objects) belonging to the same class as the target object, as third sample image features; obtain the image features of a normal vector diagram of that sample object as fourth sample image features; determine, by the weight predictor, a first sample mask weight corresponding to the third sample image features and a second sample mask weight corresponding to the fourth sample image features; fuse the third and fourth sample image features via the two sample mask weights to obtain a first sample fusion feature; and train an initial-state defect detection model (a parameter-initialized model) with the first sample fusion feature, the defect class of the sample object, and a sample selection frame of the sample object (the sample selection frame may be a frame selecting the defect in the color image and/or the normal vector diagram of the sample object).
As a possible implementation, a threshold may be set for each defect class, and the defect detection result of the target object is obtained by comparing the probability output by the defect detection model for a class against that class's threshold: when the probability of a defect class reaches the threshold, the defect detection result is that the target object has a defect of that class.
For example, the target object is a battery, and the result output by the defect detection model is that the probability of the tab area (defined above) being bent is 0.7; the threshold for this defect class is 0.65, so the defect detection result of the target object is that the battery's tab area is bent.
In this embodiment, a first mask weight corresponding to the first image feature and a second mask weight corresponding to the second image feature are determined, and the first image feature and the second image feature are fused according to the first mask weight and the second mask weight to obtain a target fusion feature, and the target fusion feature combines the image features of the color image and the normal vector image, so that the target fusion feature can accurately represent the three-dimensional structure of the target object, and the accuracy of the defect detection result of the target object obtained according to the target fusion feature is higher.
Referring to fig. 6, fig. 6 is a flowchart illustrating a defect detection method for an object according to another embodiment of the present application, where the method may be applied to an electronic device, and the electronic device may be the terminal 20 or the server 10 in fig. 1, and the method includes:
s210, determining a first image feature corresponding to the color image and a second image feature corresponding to the normal vector diagram through a feature extractor.
In this embodiment, the feature extractor has a two-layer structure in which the output of the first layer serves as the input of the second layer. Each layer comprises three operations: convolution (the first layer's convolution is implemented by the third convolution network and the second layer's by the fourth convolution network), batch normalization, and activation, so each layer can be denoted ReLU(BN(Conv(·))), where Conv is the convolution network, BN is batch normalization, and ReLU is the activation function.
Extracting, by a feature extractor, a first image feature of a color image, comprising: convolving the color image through a third convolution network in the feature extractor to obtain a fifth convolution result; performing batch standardization processing on the fifth convolution result to obtain a fifth standardization result; activating the fifth standardized result to obtain a third activation result; convolving the third activation result through a fourth convolution network in the feature extractor to obtain a sixth convolution result; performing batch standardization processing on the sixth convolution result to obtain a sixth standardization result; and activating the sixth standardized result to obtain the first image feature.
The color image x is input into the feature extractor; the first layer processes the color image to obtain the third activation result, and the third activation result serves as the input of the second layer, whose output is the first image feature f_α(x). The whole process can be expressed as f_α(x) ∈ R^{C×H×W}, where f_α denotes the operation of the feature extractor.
Similarly, extracting the second image feature of the normal vector diagram by the feature extractor specifically includes: convolving the normal vector diagram through the third convolution network in the feature extractor to obtain a seventh convolution result; performing batch standardization processing on the seventh convolution result to obtain a seventh standardization result; activating the seventh standardization result to obtain a fourth activation result; convolving the fourth activation result through the fourth convolution network in the feature extractor to obtain an eighth convolution result; performing batch standardization processing on the eighth convolution result to obtain an eighth standardization result; and activating the eighth standardization result to obtain the second image feature.
The normal vector diagram x_N is input into the feature extractor; the first layer processes the normal vector diagram to obtain the fourth activation result, which is taken as the input of the second layer, and the output of the second layer is the second image feature f_α(x_N). The whole process can be expressed as f_α(x_N) ∈ R^(C×H×W).
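For illustration, the two-layer feature extractor described above can be sketched in PyTorch as follows; the kernel sizes, channel counts, and input resolution are assumptions not fixed by this embodiment:

```python
import torch
import torch.nn as nn

class FeatureExtractor(nn.Module):
    """Two-layer Conv-BN-ReLU extractor, i.e. f_alpha; each layer computes
    ReLU(Bn(Conv(.))), with the first layer's output feeding the second."""
    def __init__(self, in_channels=3, out_channels=64):
        super().__init__()
        self.layer1 = nn.Sequential(          # third convolution network + Bn + ReLU
            nn.Conv2d(in_channels, out_channels, kernel_size=3, padding=1),
            nn.BatchNorm2d(out_channels),
            nn.ReLU(inplace=True),
        )
        self.layer2 = nn.Sequential(          # fourth convolution network + Bn + ReLU
            nn.Conv2d(out_channels, out_channels, kernel_size=3, padding=1),
            nn.BatchNorm2d(out_channels),
            nn.ReLU(inplace=True),
        )

    def forward(self, x):                     # x: color image or normal vector diagram
        return self.layer2(self.layer1(x))    # f_alpha(x) in R^(C x H x W)

f_alpha = FeatureExtractor()
x = torch.randn(1, 3, 224, 224)     # color image x
x_n = torch.randn(1, 3, 224, 224)   # normal vector diagram x_N
first_feature, second_feature = f_alpha(x), f_alpha(x_n)
```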
S220, determining a first mask weight corresponding to the first image feature and a second mask weight corresponding to the second image feature through a weight predictor.
In this embodiment, the weight predictor includes a two-layer structure, where the output of the first layer is used as the input of the second layer. Each layer includes three processes: convolution processing (the convolution of the first layer is implemented by the first convolution network, and the convolution of the second layer by the second convolution network), batch normalization processing, and activation processing, so each layer can be denoted as Sigmoid(Bn(Conv(·))), where Conv is the convolution network, Bn is batch normalization, and Sigmoid is the activation function.
The specific process of determining the first mask weight may include: convolving the first image feature through a first convolution network to obtain a first convolution result; performing batch standardization processing on the first convolution result to obtain a first standardization result; activating the first standardized result to obtain a first activation result; convolving the first activation result through a second convolution network to obtain a second convolution result; performing batch standardization processing on the second convolution result to obtain a second standardization result; and activating the second standardized result to obtain the first mask weight.
The first image feature f_α(x) is input into the weight predictor; the first layer of the weight predictor processes the first image feature to obtain the first activation result, which is taken as the input of the second layer, and the output of the second layer is the first mask weight f_β(f_α(x)). The whole process can be expressed as f_β(f_α(x)) ∈ R^(H×W), where f_β denotes the operation of the weight predictor.
Similarly, the specific process of determining the second mask weight may include: convolving the second image feature through the first convolution network to obtain a third convolution result; performing batch standardization processing on the third convolution result to obtain a third standardization result; activating the third standardization result to obtain a second activation result; convolving the second activation result through the second convolution network to obtain a fourth convolution result; performing batch standardization processing on the fourth convolution result to obtain a fourth standardization result; and activating the fourth standardization result to obtain the second mask weight.
The second image feature f_α(x_N) is input into the weight predictor; the first layer of the weight predictor processes the second image feature to obtain the second activation result, which is taken as the input of the second layer, and the output of the second layer is the second mask weight f_β(f_α(x_N)). The whole process can be expressed as f_β(f_α(x_N)) ∈ R^(H×W).
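A corresponding sketch of the weight predictor, under the assumption that the second convolution network reduces the feature to a single channel so that the mask weight lies in R^(H×W):

```python
import torch
import torch.nn as nn

class WeightPredictor(nn.Module):
    """Two-layer Conv-BN-Sigmoid predictor, i.e. f_beta; assumes the second
    convolution network outputs one channel so the mask weight is H x W."""
    def __init__(self, in_channels=64, hidden_channels=64):
        super().__init__()
        self.layer1 = nn.Sequential(                     # first convolution network
            nn.Conv2d(in_channels, hidden_channels, kernel_size=3, padding=1),
            nn.BatchNorm2d(hidden_channels),
            nn.Sigmoid(),
        )
        self.layer2 = nn.Sequential(                     # second convolution network
            nn.Conv2d(hidden_channels, 1, kernel_size=3, padding=1),
            nn.BatchNorm2d(1),
            nn.Sigmoid(),
        )

    def forward(self, feat):                             # feat: f_alpha(x), (B, C, H, W)
        return self.layer2(self.layer1(feat)).squeeze(1) # f_beta(f_alpha(x)), (B, H, W)

f_beta = WeightPredictor()
feat = torch.randn(1, 64, 224, 224)   # a first or second image feature
mask = f_beta(feat)                   # mask weight in R^(H x W)
```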
And S230, fusing the first image features and the second image features according to the first mask weights and the second mask weights to obtain target fusion features.
In this embodiment, the sum of the first mask weight and a preset first value may be calculated as a first sum value; the first sum value is multiplied element by element with the first image feature to obtain a first product, where the first image feature includes a plurality of elements and each element is the feature value of the corresponding pixel point in the color image; the sum of the second mask weight and a preset second value is calculated as a second sum value; the second sum value is multiplied element by element with the second image feature to obtain a second product, where the second image feature includes a plurality of elements and each element is the feature value of the corresponding pixel point in the normal vector diagram; and the sum of the first product and the second product is calculated as the target fusion feature.
The fusion process can be expressed as formula one, which is as follows:
X = f_F(x, x_N) = f_α(x) ⊙ (1 + f_β(f_α(x))) + f_α(x_N) ⊙ (1 + f_β(f_α(x_N)))    (1)
Where X is the target fusion feature and ⊙ is the element-wise (Hadamard) product symbol. The preset first value and the preset second value may both be 1; in formula one, adding 1 to f_β(f_α(x)) and f_β(f_α(x_N)) respectively prevents the feature values of f_α(x) and f_α(x_N) from becoming too small during the fusion processing described above and affecting subsequent feature propagation.
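Formula one can be sketched as follows, assuming the mask weights are broadcast over the channel dimension of the image features:

```python
import torch

def fuse(feat_rgb, feat_normal, mask_rgb, mask_normal):
    """Formula one: X = f_a(x) * (1 + f_b(f_a(x))) + f_a(x_N) * (1 + f_b(f_a(x_N))),
    where * is the element-wise product; the preset first and second values are 1."""
    return (feat_rgb * (1.0 + mask_rgb.unsqueeze(1))           # first sum value, first product
            + feat_normal * (1.0 + mask_normal.unsqueeze(1)))  # second sum value, second product

feat_rgb = torch.randn(1, 64, 224, 224)     # first image feature
feat_normal = torch.randn(1, 64, 224, 224)  # second image feature
mask_rgb = torch.rand(1, 224, 224)          # first mask weight
mask_normal = torch.rand(1, 224, 224)       # second mask weight
X = fuse(feat_rgb, feat_normal, mask_rgb, mask_normal)        # target fusion feature
```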
S240, inputting the target fusion characteristics into a target detection model to obtain a target defect type of a target object and a target detection frame of the target defect type; and obtaining a defect detection result according to the target defect type and a target detection frame of the target defect type.
In this embodiment, the target detection frame refers to a selection frame for selecting a defect of the target object. The target detection model is obtained by training the initial detection model based on the sample image, the defect type corresponding to the sample image and the truth box corresponding to the sample image, wherein the truth box refers to a selection box for selecting the defect of an article (the article and the target object belong to the same type of article) in the sample image, and the truth box is obtained by labeling (can be a manually labeled selection box). The initial detection model may be Faster RCNN, FCOS, VFNet, and the like.
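As a hypothetical illustration of instantiating an initial detection model, torchvision's Faster R-CNN can be used; the class count and input format here are assumptions (the embodiment feeds the fused feature X rather than raw images), and older torchvision versions use pretrained=False instead of weights=None:

```python
import torch
import torchvision

# Hypothetical initial detection model: Faster R-CNN with 10 assumed defect
# categories plus background (num_classes = 11).
model = torchvision.models.detection.fasterrcnn_resnet50_fpn(weights=None, num_classes=11)

images = [torch.randn(3, 224, 224)]                              # stand-in input
targets = [{"boxes": torch.tensor([[10.0, 10.0, 50.0, 50.0]]),   # truth box
            "labels": torch.tensor([1])}]                        # defect category label
model.train()
loss_dict = model(images, targets)     # classification and box-regression losses
loss = sum(loss_dict.values())         # scalar loss for gradient descent
```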
The target fusion feature X can be input into the target detection model to obtain the target defect type and the target detection frame output by the target detection model. The process can be expressed as: (target defect type, target detection frame) = f_D(X), where f_D is the processing procedure of the target detection model.
After the target defect type and the target detection frame of the target defect type output by the target detection model are obtained, the target defect type and the target detection frame of the target defect type can be obtained as defect detection results.
In the present application, a parameter-initialized feature extractor may be obtained as the initial feature extractor, a parameter-initialized weight predictor as the initial weight predictor, and a parameter-initialized detection model (for example, an initialized Faster RCNN, FCOS, or VFNet) as the initial detection model.
A color sample image of a fourth sample article with defects, a normal vector sample image of the fourth sample article, the true defect type of the fourth sample article, and the truth box of the fourth sample article are obtained. A fifth sample image feature corresponding to the color sample image and a sixth sample image feature corresponding to the normal vector sample image are extracted through the initial feature extractor; the fifth sample image feature is input into the initial weight predictor to obtain a third sample mask weight, and the sixth sample image feature is input into the initial weight predictor to obtain a fourth sample mask weight; the fifth and sixth sample image features are fused through the third and fourth sample mask weights (according to formula one) to obtain a second sample fusion feature; and the second sample fusion feature is processed through the initial detection model to obtain the predicted defect type and predicted selection box of the fourth sample article.
And then, calculating a cross entropy loss value as a first loss value through the true defect type of the fourth sample article, the truth box of the fourth sample article, the predicted defect type of the fourth sample article and the predicted selection box, and training an initial feature extractor, an initial weight predictor and an initial detection model through the first loss value to obtain the feature extractor, the weight predictor and the target detection model.
It should be noted that articles belonging to the same type as the target object may have a plurality of defect types. For each defect type, a corresponding color sample image, normal vector sample image, true defect type, and truth box of the fourth sample article are obtained, and the initial detection model is trained with them; all defect types are traversed in this way to obtain the target detection model.
For example, the target object is a lithium battery with 10 defect types in total. Sample lithium batteries under one defect type are obtained, and the initial detection model is trained through the color sample images, normal vector sample images, true defect types, and truth boxes of the sample lithium batteries under that type. After training under that type is completed, another type is taken from the remaining 9 defect types and the training process is repeated, until all 10 defect types have been traversed, so as to obtain the target detection model.
In this embodiment, the feature extractor obtained through training may extract the first image feature and the second image feature with higher accuracy, and determine the first mask weight and the second mask weight with higher accuracy through the weight predictor, so that the target fusion feature obtained according to the first image feature, the second image feature, the first mask weight and the second mask weight may more accurately represent the spatial shape of the target object, thereby improving the accuracy of the target fusion feature, and further improving the accuracy of the defect detection result.
Referring to fig. 7, fig. 7 is a flowchart illustrating a defect detection method for an object according to another embodiment of the present application, where the method may be applied to an electronic device, and the electronic device may be the terminal 20 or the server 10 in fig. 1, and the method includes:
S310, acquiring a first image feature corresponding to a color image of a target object and a second image feature corresponding to a normal vector diagram; and determining a target fusion feature according to the first image feature and the second image feature.
The description of S310 refers to the descriptions of S210 to S230, and will not be repeated here.
S320, inputting the target fusion characteristics into a target detection model to obtain a target defect type of the target object and a target detection frame of the target defect type.
The description of S320 refers to the description of S240 above, and will not be repeated here.
S330, acquiring the characteristics of a target area selected by a target detection frame in the normal vector diagram as small sample characteristics.
After the target defect type and the target detection frame of the target defect type output by the target detection model are obtained, the local area selected by the target detection frame in the normal vector diagram can be determined as the target area, and feature extraction (for example, by the feature extractor described above) is performed on the target area to obtain the small sample feature.
As one embodiment, the small sample feature may be extracted through networks of the ResNet series (various ResNet variants), the NAS series (e.g., RegNet), the MobileNet series, the DarkNet series, the HRNet series, the Transformer series, ConvNeXt, and the like.
S340, determining a defect detection result of the target object according to the small sample characteristics.
After the small sample characteristics are obtained, further defect analysis is carried out on the small sample characteristics, and a defect detection result of the target object is obtained. The classification of the defect grade in the target area can be performed through the small sample characteristics, a grade classification result is obtained, and the grade classification result is used as a defect detection result of the target object.
For example, the target object is a battery, and the target defect class and the target detection frame are determined according to the above process, and the small sample characteristics of the battery are determined through the target detection frame, wherein the target defect class is classified into 3 grades: level 1, level 2 and level 3. And further processing the small sample characteristics to obtain a defect grade of 2, wherein the obtained defect detection result is that the battery has the defect of the target defect type, and the defect grade is 2.
As one embodiment, the small sample feature may be further processed through a small sample detection model to obtain the defect level of the target object as the defect detection result of the target object. The small sample detection model is obtained through training with sample images of different defect levels under the target defect type and the defect levels corresponding to those sample images.
In this embodiment, the target area is determined by the target detection frame, and the small sample features of the target area are further analyzed to determine the final defect detection result, so that the accuracy of the defect detection result is further improved.
Referring to fig. 8, fig. 8 is a flowchart illustrating a defect detection method for an object according to another embodiment of the present application, where the method may be applied to an electronic device, and the electronic device may be the terminal 20 or the server 10 in fig. 1, and the method includes:
S410, acquiring a first image feature corresponding to a color image of a target object and a second image feature corresponding to a normal vector diagram; determining a target fusion feature according to the first image feature and the second image feature; inputting the target fusion characteristics into a target detection model to obtain a target defect type of a target object and a target detection frame of the target defect type; and acquiring the characteristics of the target area selected by the target detection frame in the normal vector diagram as small sample characteristics.
The description of S410 refers to the descriptions of S310 to S330 above, and will not be repeated here.
S420, acquiring respective support characteristics of each support set in a plurality of support sets through a backbone network.
Wherein one support set corresponds to one of a plurality of defect levels, each support set including at least one normal vector diagram, the plurality of defect levels being for a target defect class. Wherein the number of sample normal vector maps in each support set may be the same. The sample normal vector map may refer to a normal vector map as a sample. The process of obtaining the normal vector diagram of the sample refers to the description of the process of obtaining the normal vector diagram, and will not be repeated.
For example, the target defect class is a, the target defect class includes 3 defect levels a1, a2, and a3, the defect level a1 corresponds to a support set including 10 sample normal vector diagrams, the defect level a2 corresponds to a support set including 10 sample normal vector diagrams, and the defect level a3 corresponds to a support set including 10 sample normal vector diagrams.
The backbone network may be used to extract the features of the target area and of the sample normal vector diagrams in the support sets, and may be of the ResNet series (various ResNet variants), the NAS series (e.g., RegNet), the MobileNet series, the DarkNet series, the HRNet series, the Transformer series, ConvNeXt, and the like.
Extracting features of the target area through a backbone network to obtain small sample features, extracting features of each sample normal vector diagram in the support set through the backbone network to obtain image features of each sample normal vector diagram, and taking the image features as third image features of each sample normal vector diagram; the respective support features of each support set refer to the set of third image features of the respective sample normal vector diagram within that support set.
For example, the support set includes 20 sample normal vector images, and the support features of the support set refer to a set of 20 third image features corresponding to the 20 sample normal vector images.
S430, carrying out alignment processing on each support feature and the small sample feature to obtain a target alignment result of the defect level corresponding to each support feature.
Feature alignment processing is performed on the support feature of each support set and the small sample feature, and the aligned result is taken as the target alignment result of the defect level corresponding to that support feature. For example, for the defect level b1 of defect category B, feature alignment processing is performed on the support feature of the support set corresponding to b1 and the small sample feature, and the resulting alignment result is taken as the target alignment result of defect level b1.
As an embodiment, S430 may include: determining a degree of correlation between each third image feature of the support features and the small sample feature as a respective target weight for each third image feature of the support features; and according to the respective target weights of the third image features in the support features, carrying out weighted summation on the third image features in the support features to obtain target alignment results of defect levels corresponding to the support features.
The support feature corresponding to support set k can be expressed as S_k = {s_k^i}, where s_k^i ∈ R^(c×h×w) is the third image feature corresponding to the i-th sample normal vector diagram in support set k, c is the number of feature channels, h is the height, w is the width, and k = 1, …, K with K the total number of support sets. The small sample feature is denoted Q ∈ R^(c×h×w). The correlation between Q and each s_k^i (which may be cosine similarity or the like, computed using the matrix transpose (·)^T to form inner products between the flattened features) is calculated; using these correlations as weights, the features in S_k are recombined to obtain the aligned feature P_k, which is the target alignment result corresponding to support set k.
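The alignment can be sketched as follows; normalizing the correlations with a softmax is an assumption, since the embodiment only specifies that the correlations are used as weights:

```python
import torch
import torch.nn.functional as F

def align(support, query):
    """Recombine the third image features of one support set into the aligned
    feature P_k. support: (M, c, h, w); query (small sample feature Q): (c, h, w)."""
    M = support.shape[0]
    s = support.reshape(M, -1)                  # flatten each third image feature
    q = query.reshape(1, -1)
    r = F.cosine_similarity(s, q)               # correlation between Q and each s_k^i
    w = torch.softmax(r, dim=0)                 # correlations used as (normalized) weights
    return (w.view(M, 1, 1, 1) * support).sum(dim=0)   # weighted summation -> P_k

support_k = torch.randn(10, 64, 7, 7)   # features of 10 sample normal vector diagrams
Q = torch.randn(64, 7, 7)               # small sample feature
P_k = align(support_k, Q)               # target alignment result of support set k
```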
S440, determining the similarity between the target alignment result of each defect level and the small sample characteristic through a classifier, and taking the similarity as a prediction target value of each defect level.
The target alignment result of each defect level and the small sample feature are input into a classifier, and the similarity between the target alignment result of each defect level and the small sample feature is determined through the classifier (which may be a cosine similarity classifier) as the predicted target value of that defect level.
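A minimal sketch of the cosine similarity classifier, computing one predicted target value per defect level:

```python
import torch
import torch.nn.functional as F

def predicted_target_value(p_k, q):
    """Cosine similarity between a target alignment result P_k and the small
    sample feature Q, used as the predicted target value of defect level k."""
    return F.cosine_similarity(p_k.reshape(1, -1), q.reshape(1, -1)).item()

Q = torch.randn(64, 7, 7)
alignments = [torch.randn(64, 7, 7) for _ in range(3)]   # one P_k per defect level
values = [predicted_target_value(p, Q) for p in alignments]
```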
In the application, the backbone network initialized by the parameters can be obtained as an initial backbone network, and the classifier initialized by the parameters can be obtained as an initial classifier. For each preset defect category, a plurality of defect levels of the defect category are determined, and a plurality of normal vector diagrams of samples are acquired for each defect level as a sample set of the defect levels.
The training process for each preset defect class includes:
The image block x_b located at y_b (y_b refers to the labeled truth box) can be obtained from the sample normal vector diagram and taken as a target sample; the data set consisting of target samples is denoted D_total and includes J defect levels in total. Random data sampling is performed on D_total, 80% of the data being divided into a training set D_train and the remaining 20% being used as a test set D_test.
J-way M-shot data are randomly extracted from D_train as a SupportSet (J×M samples in total), and N samples are then extracted from each of the J defect levels of D_test as a QuerySet (J×N samples in total); this extraction process is repeated t times to obtain t batches of samples.
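The episodic (J-way M-shot) sampling can be sketched as follows; the container types and sample identifiers are illustrative assumptions:

```python
import random

def sample_episode(d_train, d_test, J, M, N):
    """One batch: J-way M-shot SupportSet from D_train (J*M samples) and
    N query samples per defect level from D_test (J*N samples)."""
    support, query = [], []
    for level in range(J):
        support += random.sample(d_train[level], M)
        query += random.sample(d_test[level], N)
    return support, query

# d_train / d_test map each of the J defect levels to its image blocks x_b
d_train = {j: [f"train_{j}_{i}" for i in range(80)] for j in range(3)}
d_test = {j: [f"test_{j}_{i}" for i in range(20)] for j in range(3)}
batches = [sample_episode(d_train, d_test, J=3, M=5, N=5) for _ in range(10)]  # t = 10
```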
For each batch of samples, the image features of the SupportSet are extracted through the initial backbone network to obtain fourth image features, and the image features of the QuerySet are extracted through the initial backbone network to obtain fifth image features. For each defect level, the correlation between each fourth image feature at that defect level and the fifth image feature is determined as the target weight of that fourth image feature, and the fourth image features are weighted and summed according to their respective target weights to obtain the sample alignment result of that defect level.
The similarity between the sample alignment result of each defect level and the fifth image feature is determined through the initial classifier as the sample prediction value ŷ_c of that defect level. A cross entropy loss is calculated from the sample prediction values of the defect levels and the true values y_c of the QuerySet to obtain the corresponding classifier loss, and gradient descent training is performed to update the whole model (the initial backbone network and the initial classifier). The loss formula is expressed as L_M = CE(ŷ_c, y_c), where L_M is the cross entropy loss of the classification model, CE(·) is the cross entropy loss function, ŷ_c is the predicted result of the defect level, and y_c is the true value of the corresponding class label.
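A sketch of the loss computation, under the assumption that the per-level sample prediction values are treated as logits of the cross entropy:

```python
import torch
import torch.nn.functional as F

# Sample prediction values of one query sample over J = 3 defect levels,
# treated as logits of the cross entropy loss L_M = CE(y_hat, y_c).
y_hat = torch.tensor([[0.2, 0.8, 0.3]])   # predicted result per defect level
y_c = torch.tensor([1])                   # true class label of the QuerySet sample
loss_m = F.cross_entropy(y_hat, y_c)      # classifier loss used for gradient descent
```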
In this embodiment, the trained backbone network, classifier, and alignment process may be used as a small sample detection model. The small sample detection model is used for obtaining the prediction target value of each of the plurality of defect levels.
S450, determining a defect detection result of the target object according to the predicted target values of the defect levels.
The defect detection result of the target object can be determined according to the magnitude relation of the predicted target values of the defect levels. For example, a defect level at which the predicted target value is the largest is determined as the defect detection result of the target object.
In some embodiments, a level threshold may be set for each defect level (the level thresholds of the defect levels may be different or the same), and the defect detection result of the target object is obtained through the predicted target values and level thresholds of the plurality of defect levels: the defect level with the largest predicted target value is obtained as the selected defect level; if the predicted target value of the selected defect level does not reach its level threshold, a defect detection result that the target object passes quality inspection is obtained; if it reaches the level threshold, a defect detection result that the target object fails quality inspection is obtained.
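The threshold-based judgment can be sketched as follows, reproducing the two numerical examples given below:

```python
def quality_decision(pred_values, level_thresholds):
    """Select the defect level with the largest predicted target value; the
    quality inspection passes (OK) only if that value stays below its threshold."""
    selected = max(pred_values, key=pred_values.get)
    if pred_values[selected] < level_thresholds[selected]:
        return "OK"
    return f"NG at {selected}"

thresholds = {"level 1": 0.6, "level 2": 0.6, "level 3": 0.6}
print(quality_decision({"level 1": 0.2, "level 2": 0.5, "level 3": 0.3}, thresholds))  # OK
print(quality_decision({"level 1": 0.2, "level 2": 0.8, "level 3": 0.3}, thresholds))  # NG at level 2
```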
For example, the defect levels of the target object include level 1, level 2, and level 3, and the obtained predicted target values are 0.2 for level 1, 0.5 for level 2, and 0.3 for level 3, with a level threshold of 0.6 for each defect level. Level 2, corresponding to the largest predicted target value 0.5, is the selected defect level; since 0.5 does not exceed the corresponding level threshold 0.6, a defect detection result that the target object passes quality inspection is determined.
For another example, the defect levels of the target object include level 1, level 2, and level 3, and the obtained predicted target values are 0.2 for level 1, 0.8 for level 2, and 0.3 for level 3, with a level threshold of 0.6 for each defect level. Level 2, corresponding to the largest predicted target value 0.8, is the selected defect level; since 0.8 exceeds the corresponding level threshold 0.6, a defect detection result that the target object fails quality inspection is determined.
In this embodiment, the support set is aligned with the small sample feature to obtain a target alignment result, where the target alignment result can effectively highlight the feature of the target area, so that the accuracy of feature expression of the target area is improved, the predicted target value obtained according to the target alignment result can more accurately represent the defect level of the target area, and the accuracy of the defect detection result is improved.
Meanwhile, only a small amount of data needs to be collected, and the small sample detection model can effectively identify the severity of defects, thereby filtering out slight defects on OK products and reducing algorithm overkill (overkill refers to falsely detecting an OK product as an NG defective product; an OK product is a good product with no defect or a tolerable slight defect, and an NG product is a defective product).
In order to explain the technical solution of the present application more clearly, the defect detection method of the present application is described below in conjunction with an exemplary scenario. The target object is a battery whose defect categories include D, E, F, and G, each comprising 3 defect levels: D comprises d1, d2, and d3; E comprises e1, e2, and e3; F comprises f1, f2, and f3; and G comprises g1, g2, and g3.
1. Model training
An initial feature extractor, an initial weight predictor, an initial detection model, an initial backbone network, and an initial classifier are obtained.
For defect category D, training samples of each of the three defect levels under category D are obtained, where the training samples of each defect level include: a color image of the battery, a normal vector diagram of the battery, the true defect category of the battery, and the truth box of the battery (the selection box for defects in the normal vector diagram of the battery).
Extracting image features of a color image of the battery by an initial feature extractor to serve as first battery image features, and extracting image features of a normal vector diagram of the battery by the initial feature extractor to serve as second battery image features; and determining a first battery mask weight of the first battery image feature and a second battery mask weight of the second battery image feature through the initial weight predictor, and fusing the first battery image feature and the second battery image feature through the first battery mask weight and the second battery mask weight to obtain a fused battery feature.
The fused battery feature is input into the initial detection model to obtain the prediction probabilities of the four defect categories output by the initial detection model and the prediction frames corresponding to the defect categories; a cross entropy loss value is determined according to the true defect category of the battery, the truth box of the battery, the prediction probabilities of the four defect categories, and the prediction frames corresponding to the defect categories; and parameter adjustment is performed on the initial feature extractor, the initial weight predictor, and the initial detection model through the determined loss value.
The defect classes E, F and G are traversed in this way, resulting in a trained feature extractor, weight predictor, and target detection model.
Aiming at the defect class D, acquiring a sample normal vector diagram of the battery, marking a truth box on the sample normal vector diagram by a manual marking method, and intercepting an area in the truth box as a target sample; and carrying out defect grade division on each target sample to obtain respective sample sets of three defect grades d1, d2 and d 3.
And determining a part of samples from the sample set of d1 as a sample support set of d1, determining a part of samples different from the sample support set of d1 from the sample set of d1 as a to-be-classified set of d1, traversing the sample sets of d2 and d3, and obtaining the sample support sets and the to-be-classified sets of d1, d2 and d 3.
For the defect level D1, extracting features of a target sample in a sample supporting set and a target sample in a to-be-classified set under the defect level D1 through an initial backbone network to obtain supporting features corresponding to the sample supporting set (including image features of each target sample in the sample supporting set) and to-be-classified features of the to-be-classified set (including image features of each target sample in the to-be-classified set), performing feature alignment processing on the supporting features and to-be-classified features under the defect level D1 to obtain sample alignment results corresponding to the defect level D1, and traversing D1, D2 and D3 under the defect level D to obtain sample alignment results of D1, D2 and D3.
The sample alignment results of d1, d2, and d3 are input into the initial classifier to obtain the predicted values of d1, d2, and d3; a cross entropy loss value is calculated according to the predicted values of d1, d2, and d3 and the true values of the to-be-classified set; and the initial backbone network and the initial classifier are trained through the calculated loss value to obtain the backbone network and the classifier.
The defect categories E, F, and G are traversed in this way to obtain the backbone networks and classifiers of defect categories D, E, F, and G respectively, i.e., each defect category corresponds to its own backbone network and classifier.
2. Defect detection
(1) The processing procedure corresponding to multi-modal adaptive feature fusion and the target detection model includes the following steps:
the method comprises the steps of obtaining a battery Za to be detected as a target object, determining a color image and a normal vector image of the battery Za through an image acquisition device shown in fig. 3, extracting features of the color image of the battery Za through a feature extractor to obtain first image features, and extracting features of the normal vector image of the battery Za through the feature extractor to obtain second image features.
As shown in fig. 9, the first image feature is input into the weight predictor to obtain a corresponding first mask weight, the second image feature is input into the weight predictor to obtain a second mask weight, and the first image feature and the second image feature are fused through the first mask weight and the second mask weight to obtain a target fusion feature.
Inputting the target fusion characteristics into a target detection model to obtain a target defect type of a target object and a target detection frame of the target defect type, wherein the target defect type is D.
(2) The processing procedure of the small sample detection model:
The area selected by the target detection frame in the normal vector diagram of Za is taken as the target area, and features of the target area are extracted through the backbone network corresponding to D to obtain the small sample feature Q.
Support sets corresponding to the defect levels d1, d2, and d3 are obtained, each support set including 10 sample normal vector diagrams. Features of each sample normal vector diagram in each support set are extracted through the backbone network corresponding to D to obtain the support feature of each support set; the support feature of a support set comprises the respective third image features of the sample normal vector diagrams in that support set.
Determining a degree of correlation between each third image feature in the support features of d1 and the small sample feature as a respective target weight for each third image feature in the support features of d 1;
As shown in fig. 9, the alignment processing is continued: according to the respective target weights of the third image features in the support feature of d1, the third image features in the support feature of d1 are weighted and summed to obtain the target alignment result of d1. The support features of d2 and d3 are traversed in the same way, yielding the target alignment result P_1 of d1, the target alignment result P_2 of d2, and the target alignment result P_3 of d3.
The target alignment result P_1 of d1, the target alignment result P_2 of d2, the target alignment result P_3 of d3, and the small sample feature Q are respectively input into the classifier corresponding to D to obtain the predicted target values of d1, d2, and d3, where the level thresholds of d1, d2, and d3 are all 0.7.
After the predicted target value of each defect level is determined, defect level judgment is performed according to the predicted target values and the level thresholds: the defect level d3, corresponding to the highest predicted target value 0.6, is determined as the selected defect level; since the predicted target value 0.6 of the selected defect level does not reach the corresponding level threshold 0.7, it is determined that battery Za has at most a slight category-D defect, and further that battery Za passes quality inspection as an OK product.
In this scenario, photometric stereo imaging is adopted to collect the color image and the Normal Map (normal vector diagram) with three-dimensional information, which can highlight the three-dimensional appearance of defects, improve the accuracy of identifying various defects on the lithium battery, and reduce the missed detection of NG products by the algorithm.
Referring to fig. 10, fig. 10 is a block diagram illustrating a defect detecting apparatus for an object according to an embodiment of the application, where the apparatus 1000 includes:
An acquiring module 1010, configured to acquire a color image and a normal vector diagram of a target object;
an extraction module 1020, configured to extract image features of the color image as first image features and image features of the normal vector diagram as second image features;
the weight determining module 1030 is configured to determine a first mask weight corresponding to the first image feature and a second mask weight corresponding to the second image feature, where the first mask weight represents an importance level of the first image feature, and the second mask weight represents an importance level of the second image feature;
the fusion module 1040 is configured to fuse the first image feature and the second image feature according to the first mask weight and the second mask weight, so as to obtain a target fusion feature;
the result determining module 1050 is configured to determine a defect detection result of the target object according to the target fusion feature.
Optionally, the result determining module 1050 is further configured to input the target fusion feature into a target detection model, to obtain a target defect type of the target object and a target detection frame of the target defect type, where the target detection frame is a selection frame for selecting a defect of the target object, the target detection model is obtained by training the initial detection model based on the sample image, the defect type corresponding to the sample image, and a truth frame corresponding to the sample image, the truth frame is a selection frame for selecting a defect of an object in the sample image, and the truth frame is obtained by labeling; acquiring characteristics of a target area selected by a target detection frame in a normal vector diagram as small sample characteristics; and determining a defect detection result of the target object according to the characteristics of the small sample.
Optionally, the result determining module 1050 is further configured to obtain, through the backbone network, a respective support characteristic of each of the plurality of support sets; one support set corresponding to one of a plurality of defect levels, each support set including at least one normal vector diagram, the plurality of defect levels being for a target defect class; performing feature alignment processing on each support feature and the small sample feature to obtain a target alignment result of the defect level corresponding to each support feature; and determining a defect detection result of the target object according to the target alignment results of the defect levels.
Optionally, the support features of each support set include respective third image features of the respective sample normal vector diagrams in the support set; the result determining module 1050 is further configured to determine a correlation between each third image feature of the support features and the small sample feature as a respective target weight for each third image feature of the support features; and according to the respective target weights of the third image features in the support features, carrying out weighted summation on the third image features in the support features to obtain target alignment results of defect levels corresponding to the support features.
Optionally, the result determining module 1050 is further configured to determine, by using a classifier, a similarity between the target alignment result and the small sample feature of each defect level as a predicted target value of each defect level; and determining a defect detection result of the target object according to the predicted target values of the defect levels.
Optionally, the result determining module 1050 is further configured to obtain, as the selected defect level, the defect level with the largest predicted target value; and if the selected defect grade does not reach the grade threshold value aiming at the target defect, obtaining a defect detection result of passing the target object quality inspection.
Optionally, the fusion module 1040 is further configured to calculate the sum of the first mask weight and a preset first value as a first sum value; multiply the first sum value element by element with the first image feature to obtain a first product, where the first image feature includes a plurality of elements and each element is the feature value of the corresponding pixel point in the color image; calculate the sum of the second mask weight and a preset second value as a second sum value; multiply the second sum value element by element with the second image feature to obtain a second product, where the second image feature includes a plurality of elements and each element is the feature value of the corresponding pixel point in the normal vector diagram; and calculate the sum of the first product and the second product as the target fusion feature.
Optionally, the weight determining module 1030 is further configured to convolve the first image feature with a first convolution network in the weight predictor to obtain a first convolution result; performing batch standardization processing on the first convolution result to obtain a first standardization result; activating the first standardized result to obtain a first activation result; convolving the first activation result through a second convolution network in the weight predictor to obtain a second convolution result; performing batch standardization processing on the second convolution result to obtain a second standardization result; and activating the second standardized result to obtain the first mask weight.
Optionally, the weight determining module 1030 is further configured to convolve the second image feature through the first convolution network in the weight predictor to obtain a third convolution result; perform batch standardization processing on the third convolution result to obtain a third standardization result; activate the third standardization result to obtain a second activation result; convolve the second activation result through the second convolution network in the weight predictor to obtain a fourth convolution result; perform batch standardization processing on the fourth convolution result to obtain a fourth standardization result; and activate the fourth standardization result to obtain the second mask weight.
Optionally, the extracting module 1020 is further configured to convolve the color image with a third convolution network in the feature extractor to obtain a fifth convolution result; performing batch standardization processing on the fifth convolution result to obtain a fifth standardization result; activating the fifth standardized result to obtain a third activation result; convolving the third activation result through a fourth convolution network in the feature extractor to obtain a sixth convolution result; performing batch standardization processing on the sixth convolution result to obtain a sixth standardization result; and activating the sixth standardized result to obtain the first image feature.
Optionally, the acquiring module 1010 is further configured to acquire polarized images in respective preset directions, where each of the polarized images in the preset directions is an image obtained by photographing the target object when the target object is irradiated by a light source disposed in the preset direction; and obtaining a normal vector diagram according to the photometric stereo synthesis algorithm and the polarized light images of each of the plurality of preset directions.
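As an illustrative assumption, the photometric stereo synthesis algorithm may be the classical Lambertian least-squares formulation, sketched here with NumPy (the light directions and image sizes are placeholders):

```python
import numpy as np

def photometric_stereo(images, light_dirs):
    """Lambertian photometric stereo: per pixel, intensities satisfy I = L @ g with
    g = albedo * n; solve for g by least squares and normalize to unit normals.
    images: (J, H, W) shots under J light directions; light_dirs: (J, 3)."""
    J, H, W = images.shape
    I = images.reshape(J, -1)                              # (J, H*W)
    g, *_ = np.linalg.lstsq(light_dirs, I, rcond=None)     # (3, H*W)
    norms = np.linalg.norm(g, axis=0, keepdims=True)
    n = g / np.clip(norms, 1e-8, None)                     # unit normal per pixel
    return n.reshape(3, H, W)                              # normal vector diagram

lights = np.array([[0.0, 0.0, 1.0], [0.5, 0.0, 0.87],
                   [0.0, 0.5, 0.87], [-0.5, 0.0, 0.87]])   # placeholder directions
shots = np.random.rand(4, 64, 64)                          # stand-ins for captures
normal_map = photometric_stereo(shots, lights)
```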
Alternatively, the color image is an image obtained by photographing the object in the case where the object is irradiated simultaneously by the light sources provided in a plurality of preset directions.
Optionally, the first mask weight and the second mask weight respectively include mask weights corresponding to each local area in the target object; mask weights corresponding to each local region are associated with the respective morphology structures of each local region.
It should be noted that, in the present application, the device embodiment and the foregoing method embodiment correspond to each other, and specific principles in the device embodiment may refer to the content in the foregoing method embodiment, which is not described herein again.
Fig. 11 shows a block diagram of an electronic device for performing a defect detection method of an object according to an embodiment of the present application. The electronic device may be the terminal 20 or the server 10 in fig. 1, and it should be noted that, the computer system 1200 of the electronic device shown in fig. 11 is only an example, and should not impose any limitation on the functions and the application scope of the embodiments of the present application.
As shown in fig. 11, the computer system 1200 includes a central processing unit (Central Processing Unit, CPU) 1201 which can perform various appropriate actions and processes, such as performing the methods in the above-described embodiments, according to a program stored in a Read-Only Memory (ROM) 1202 or a program loaded from a storage section 1208 into a random access Memory (Random Access Memory, RAM) 1203. In the RAM 1203, various programs and data required for the system operation are also stored. The CPU1201, ROM1202, and RAM 1203 are connected to each other through a bus 1204. An Input/Output (I/O) interface 1205 is also connected to bus 1204.
The following components are connected to the I/O interface 1205: an input section 1206 including a keyboard, a mouse, and the like; an output portion 1207 including a Cathode Ray Tube (CRT), a liquid crystal display (Liquid Crystal Display, LCD), and a speaker, etc.; a storage section 1208 including a hard disk or the like; and a communication section 1209 including a network interface card such as a LAN (Local Area Network ) card, a modem, or the like. The communication section 1209 performs communication processing via a network such as the internet. The drive 1210 is also connected to the I/O interface 1205 as needed. A removable medium 1211 such as a magnetic disk, an optical disk, a magneto-optical disk, a semiconductor memory, or the like is installed on the drive 1210 as needed, so that a computer program read out therefrom is installed into the storage section 1208 as needed.
In particular, according to embodiments of the present application, the processes described above with reference to flowcharts may be implemented as computer software programs. For example, embodiments of the present application include a computer program product comprising a computer program embodied on a computer readable medium, the computer program comprising program code for performing the method shown in the flowcharts. In such an embodiment, the computer program can be downloaded and installed from a network via the communication portion 1209, and/or installed from the removable media 1211. When executed by a Central Processing Unit (CPU) 1201, performs the various functions defined in the system of the present application.
It should be noted that, the computer readable medium shown in the embodiments of the present application may be a computer readable signal medium or a computer readable storage medium, or any combination of the two. The computer readable storage medium can be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or a combination of any of the foregoing. More specific examples of the computer-readable storage medium may include, but are not limited to: an electrical connection having one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-Only Memory (ROM), an erasable programmable read-Only Memory (Erasable Programmable Read Only Memory, EPROM), flash Memory, an optical fiber, a portable compact disc read-Only Memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In the context of this document, a computer readable storage medium may be any tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. In the present application, however, the computer-readable signal medium may include a data signal propagated in baseband or as part of a carrier wave, with the computer-readable program code embodied therein. Such a propagated data signal may take any of a variety of forms, including, but not limited to, electro-magnetic, optical, or any suitable combination of the foregoing. A computer readable signal medium may also be any computer readable medium that is not a computer readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device. Program code embodied on a computer readable medium may be transmitted using any appropriate medium, including but not limited to: wireless, wired, etc., or any suitable combination of the foregoing.
The flowcharts and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various embodiments of the present application. Where each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams or flowchart illustration, and combinations of blocks in the block diagrams or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.
The units involved in the embodiments of the present application may be implemented by software, or may be implemented by hardware, and the described units may also be provided in a processor. Wherein the names of the units do not constitute a limitation of the units themselves in some cases.
As another aspect, the present application also provides a computer-readable storage medium that may be contained in the electronic device described in the above embodiment; or may exist alone without being incorporated into the electronic device. The computer readable storage medium carries computer readable instructions which, when executed by a processor, implement the method of any of the above embodiments.
According to an aspect of embodiments of the present application, there is provided a computer program product or computer program comprising computer instructions stored in a computer readable storage medium. The processor of the electronic device reads the computer instructions from the computer-readable storage medium, and the processor executes the computer instructions to cause the electronic device to perform the method of any of the embodiments described above.
It should be noted that although in the above detailed description several modules or units of a device for action execution are mentioned, such a division is not mandatory. Indeed, the features and functions of two or more modules or units described above may be embodied in one module or unit in accordance with embodiments of the application. Conversely, the features and functions of one module or unit described above may be further divided into a plurality of modules or units to be embodied.
From the above description of embodiments, those skilled in the art will readily appreciate that the example embodiments described herein may be implemented in software, or may be implemented in software in combination with the necessary hardware. Thus, the technical solution according to the embodiments of the present application may be embodied in the form of a software product, which may be stored in a non-volatile storage medium (may be a CD-ROM, a usb disk, a mobile hard disk, etc.) or on a network, and includes several instructions to cause an electronic device (may be a personal computer, a server, a touch terminal, or a network device, etc.) to perform the method according to the embodiments of the present application.
Other embodiments of the application will be apparent to those skilled in the art from consideration of the specification and practice of the embodiments disclosed herein. This application is intended to cover any variations, uses, or adaptations of the application following, in general, the principles of the application and including such departures from the present disclosure as come within known or customary practice within the art to which the application pertains. It is to be understood that the application is not limited to the precise arrangements and instrumentalities shown in the drawings, which have been described above, and that various modifications and changes may be effected without departing from the scope thereof. The scope of the application is limited only by the appended claims.
Finally, it should be noted that: the above embodiments are only for illustrating the technical solution of the present application, and are not limiting; although the application has been described in detail with reference to the foregoing embodiments, it will be appreciated by those of ordinary skill in the art that: the technical scheme described in the foregoing embodiments can be modified or some technical features thereof can be replaced by equivalents; such modifications and substitutions do not drive the essence of the corresponding technical solutions to depart from the spirit and scope of the technical solutions of the embodiments of the present application.

Claims (16)

1. A method for detecting defects in an object, the method comprising:
acquiring a color image and a normal vector diagram of a target object;
extracting image features of the color image as first image features, and extracting image features of the normal vector image as second image features;
determining a first mask weight corresponding to the first image feature and a second mask weight corresponding to the second image feature, wherein the first mask weight represents the importance degree of the first image feature, and the second mask weight represents the importance degree of the second image feature;
fusing the first image features and the second image features according to the first mask weights and the second mask weights to obtain target fusion features;
And determining a defect detection result of the target object according to the target fusion characteristic.
2. The method of claim 1, wherein determining a defect detection result for the target object based on the target fusion feature comprises:
inputting the target fusion characteristics into a target detection model to obtain a target defect type of the target object and a target detection frame of the target defect type, wherein the target detection frame refers to a selection frame for selecting defects of the target object, the target detection model is obtained by training an initial detection model based on a sample image, the defect type corresponding to the sample image and a truth frame corresponding to the sample image, the truth frame refers to a selection frame for selecting defects of articles in the sample image, and the truth frame is obtained by labeling;
acquiring characteristics of a target area selected by the target detection frame in the normal vector diagram as small sample characteristics;
and determining a defect detection result of the target object according to the small sample characteristics.
3. The method of claim 2, wherein determining the defect detection result of the target based on the small sample feature comprises:
Acquiring respective support characteristics of each support set in a plurality of support sets through a backbone network; a support set corresponding to one of a plurality of defect levels, each of said support sets including at least one normal vector diagram, said plurality of defect levels being for said target defect class;
performing feature alignment processing on each support feature and the small sample feature to obtain a target alignment result of a defect level corresponding to each support feature;
and determining a defect detection result of the target object according to the target alignment result of each of the defect levels.
4. The method of claim 3, wherein the support feature of each support set comprises a respective third image feature of each sample normal vector map in the support set;
and wherein performing feature alignment processing on each support feature and the small sample feature to obtain the target alignment result of the defect level corresponding to each support feature comprises:
determining a degree of correlation between each of the third image features and the small sample feature as a respective target weight for each of the third image features;
and performing a weighted summation of the third image features in a support feature according to their respective target weights, to obtain the target alignment result of the defect level corresponding to that support feature.
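For illustration only: a sketch of the claim-4 alignment. Cosine similarity as the degree of correlation and softmax normalization of the target weights before the weighted summation are both assumptions; the claim only requires a correlation-derived weight per third image feature.

```python
import torch
import torch.nn.functional as F

def align(support_feat: torch.Tensor, small_sample_feat: torch.Tensor) -> torch.Tensor:
    # support_feat: [K, ...] third image features; small_sample_feat matches one entry's shape
    q = small_sample_feat.flatten()                       # [D] small sample feature
    s = support_feat.flatten(1)                           # [K, D] third image features
    corr = F.cosine_similarity(s, q.unsqueeze(0), dim=1)  # [K] degrees of correlation
    w = corr.softmax(dim=0)                               # target weights
    return (w.unsqueeze(1) * s).sum(dim=0)                # target alignment result [D]
```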
5. The method of claim 3, wherein determining the defect detection result of the target object according to the target alignment result of each of the plurality of defect levels comprises:
determining, through a classifier, the similarity between the target alignment result of each defect level and the small sample feature as the predicted target value of that defect level;
and determining a defect detection result of the target object according to the predicted target values of the defect levels.
6. The method of claim 5, wherein determining the defect detection result of the target object according to the predicted target values of the plurality of defect levels comprises:
obtaining the defect level with the largest predicted target value as the selected defect level;
and if the selected defect level does not reach the level threshold for the target defect, obtaining a defect detection result indicating that the target object passes inspection.
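For illustration only: a sketch of the claims-5/6 decision rule. The classifier interface, the use of the level index for ordering, and the returned strings are assumptions; the claims only fix that the level with the largest predicted target value is selected and that a selected level below the threshold yields a passing result.

```python
import torch

def decide(alignment_results: list[torch.Tensor],
           small_sample_feat: torch.Tensor,
           classifier,               # hypothetical: (result, query) -> 0-dim tensor
           level_threshold: int) -> str:
    scores = torch.stack([classifier(r, small_sample_feat)
                          for r in alignment_results])  # predicted target values
    selected = int(scores.argmax())                     # selected defect level
    if selected < level_threshold:                      # threshold not reached
        return "target object passes inspection"
    return f"defect level {selected}"
```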
7. The method of claim 1, wherein the fusing the first image feature and the second image feature according to the first mask weight and the second mask weight to obtain a target fusion feature comprises:
calculating the sum of the first mask weight and a preset first value to be used as a first sum value;
multiplying the first sum value and the first image feature element by element to obtain a first product, wherein the first image feature comprises a plurality of elements, and each element in the first image feature is a feature value of a pixel point corresponding to the element in the color image;
calculating the sum of the second mask weight and a preset second value to be used as a second sum value;
multiplying the second sum value and the second image feature element by element to obtain a second product, wherein the second image feature comprises a plurality of elements, and each element in the second image feature is a feature value of the pixel point corresponding to that element in the normal vector map;
and calculating the sum of the first product and the second product as the target fusion feature.
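For illustration only: the claim-7 arithmetic written out element by element. The preset first and second values are left as parameters, since the claim does not fix them.

```python
import torch

def fuse(f1: torch.Tensor, f2: torch.Tensor,
         m1: torch.Tensor, m2: torch.Tensor,
         v1: float = 1.0, v2: float = 1.0) -> torch.Tensor:
    first_sum = m1 + v1                 # first mask weight + preset first value
    second_sum = m2 + v2                # second mask weight + preset second value
    first_product = first_sum * f1      # element-wise over per-pixel feature values
    second_product = second_sum * f2
    return first_product + second_product   # target fusion feature
```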
8. The method according to claim 1, wherein the method for obtaining the first mask weight includes:
convolving the first image feature through a first convolution network in a weight predictor to obtain a first convolution result;
performing batch normalization on the first convolution result to obtain a first normalization result;
activating the first normalization result to obtain a first activation result;
convolving the first activation result through a second convolution network in the weight predictor to obtain a second convolution result;
performing batch normalization on the second convolution result to obtain a second normalization result;
and activating the second normalization result to obtain the first mask weight.
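For illustration only: a weight predictor following the claim-8 layer order (convolution, batch normalization, activation, twice over). Channel sizes, kernel sizes, and the ReLU/Sigmoid activation choices are assumptions; a Sigmoid at the end keeps the mask weight in [0, 1].

```python
import torch.nn as nn

weight_predictor = nn.Sequential(
    nn.Conv2d(256, 64, kernel_size=3, padding=1),  # first convolution network
    nn.BatchNorm2d(64),                            # first normalization result
    nn.ReLU(inplace=True),                         # first activation result
    nn.Conv2d(64, 1, kernel_size=3, padding=1),    # second convolution network
    nn.BatchNorm2d(1),                             # second normalization result
    nn.Sigmoid(),                                  # -> first mask weight
)
```

Claim 9 runs the second image feature through the same first and second convolution networks, so the same module can produce both mask weights.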
9. The method according to claim 1, wherein the method for obtaining the second mask weight includes:
convolving the second image feature through the first convolution network in the weight predictor to obtain a third convolution result;
performing batch normalization on the third convolution result to obtain a third normalization result;
activating the third normalization result to obtain a second activation result;
convolving the second activation result through the second convolution network in the weight predictor to obtain a fourth convolution result;
performing batch normalization on the fourth convolution result to obtain a fourth normalization result;
and activating the fourth normalization result to obtain the second mask weight.
10. The method of claim 1, wherein the method of acquiring the first image feature comprises:
convolving the color image through a third convolution network in a feature extractor to obtain a fifth convolution result;
performing batch normalization on the fifth convolution result to obtain a fifth normalization result;
activating the fifth normalization result to obtain a third activation result;
convolving the third activation result through a fourth convolution network in the feature extractor to obtain a sixth convolution result;
performing batch normalization on the sixth convolution result to obtain a sixth normalization result;
and activating the sixth normalization result to obtain the first image feature.
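For illustration only: the claim-10 feature extractor mirrors the claim-8 stack but maps the 3-channel color image to a multi-channel feature map. Channel sizes and ReLU activations are assumptions.

```python
import torch.nn as nn

feature_extractor = nn.Sequential(
    nn.Conv2d(3, 64, kernel_size=3, padding=1),    # third convolution network
    nn.BatchNorm2d(64),                            # fifth normalization result
    nn.ReLU(inplace=True),                         # third activation result
    nn.Conv2d(64, 256, kernel_size=3, padding=1),  # fourth convolution network
    nn.BatchNorm2d(256),                           # sixth normalization result
    nn.ReLU(inplace=True),                         # -> first image feature
)
```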
11. The method of claim 1, wherein the method of obtaining the normal vector map comprises:
acquiring a polarized-light image for each of a plurality of preset directions, wherein the polarized-light image for a preset direction is an image captured of the target object while the target object is illuminated by a light source arranged in that preset direction;
and obtaining the normal vector map according to a photometric stereo synthesis algorithm and the polarized-light images of the plurality of preset directions.
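For illustration only: claim 11 does not spell out the synthesis step, but a standard least-squares photometric stereo reconstruction fits its inputs (one image per preset light direction). The sketch below is that classical algorithm, not necessarily the patent's variant; grayscale inputs and known unit light directions are assumptions.

```python
import torch

def normal_vector_map(images: torch.Tensor, light_dirs: torch.Tensor) -> torch.Tensor:
    """images: [K, H, W] shots, one per preset direction; light_dirs: [K, 3]."""
    K, H, W = images.shape
    intensities = images.reshape(K, -1)                        # [K, H*W]
    # Lambertian model: light_dirs @ G = intensities, solved per pixel
    G = torch.linalg.lstsq(light_dirs, intensities).solution   # [3, H*W]
    G = G.T.reshape(H, W, 3)                                   # albedo-scaled normals
    return G / (G.norm(dim=2, keepdim=True) + 1e-8)            # unit normal per pixel
```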
12. The method according to claim 1, wherein the color image is an image captured of the target object while the target object is simultaneously illuminated by light sources arranged in a plurality of preset directions.
13. The method of claim 1, wherein the first mask weight and the second mask weight each comprise a mask weight corresponding to each local region of the target object, and the mask weight corresponding to each local region is related to the morphological structure of that local region.
14. A defect detection apparatus for a target object, the apparatus comprising:
an acquisition module configured to acquire a color image and a normal vector map of the target object;
an extraction module configured to extract an image feature of the color image as a first image feature, and an image feature of the normal vector map as a second image feature;
a weight determining module configured to determine a first mask weight corresponding to the first image feature and a second mask weight corresponding to the second image feature, wherein the first mask weight represents the degree of importance of the first image feature, and the second mask weight represents the degree of importance of the second image feature;
a fusion module configured to fuse the first image feature and the second image feature according to the first mask weight and the second mask weight to obtain a target fusion feature;
and a result determining module configured to determine a defect detection result of the target object according to the target fusion feature.
15. An electronic device, comprising:
one or more processors;
a memory;
one or more applications, wherein the one or more applications are stored in the memory and configured to be executed by the one or more processors, the one or more applications configured to perform the method of any of claims 1-13.
16. A computer-readable storage medium, characterized in that the computer-readable storage medium stores program code that is callable by a processor to perform the method according to any one of claims 1-13.

Priority Applications (1)

Application Number | Priority Date | Filing Date | Title
CN202310284013.5A (published as CN116977260A) | 2023-03-15 | 2023-03-15 | Target defect detection method and device, electronic equipment and storage medium

Publications (1)

Publication Number | Publication Date
CN116977260A | 2023-10-31

Family

ID=88470071

Family Applications (1)

Application Number | Priority Date | Filing Date | Title | Status
CN202310284013.5A (CN116977260A) | 2023-03-15 | 2023-03-15 | Target defect detection method and device, electronic equipment and storage medium | Pending

Country Status (1)

Country | Link
CN | CN116977260A

Cited By (2)

* Cited by examiner, † Cited by third party
Publication Number | Priority Date | Publication Date | Assignee | Title
CN117934478A * | 2024-03-22 | 2024-04-26 | Tencent Technology (Shenzhen) Co., Ltd. | Defect detection method, device, equipment and medium
CN117934478B * | 2024-03-22 | 2024-06-25 | Tencent Technology (Shenzhen) Co., Ltd. | Defect detection method, device, equipment and medium

Legal Events

Code | Title
PB01 | Publication
SE01 | Entry into force of request for substantive examination