CN116977261A - Image processing method, image processing apparatus, electronic device, storage medium, and program product - Google Patents

Image processing method, image processing apparatus, electronic device, storage medium, and program product

Info

Publication number
CN116977261A
CN116977261A (Application CN202310294347.0A)
Authority
CN
China
Prior art keywords
image
correlation
input image
information
lens element
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202310294347.0A
Other languages
Chinese (zh)
Inventor
王昌安
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Tencent Technology Shenzhen Co Ltd
Original Assignee
Tencent Technology Shenzhen Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Tencent Technology Shenzhen Co Ltd filed Critical Tencent Technology Shenzhen Co Ltd
Priority to CN202310294347.0A priority Critical patent/CN116977261A/en
Publication of CN116977261A publication Critical patent/CN116977261A/en
Pending legal-status Critical Current

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00: Computing arrangements based on biological models
    • G06N 3/02: Neural networks
    • G06N 3/08: Learning methods
    • G06T: IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 7/00: Image analysis
    • G06T 7/0002: Inspection of images, e.g. flaw detection
    • G06T 7/0004: Industrial image inspection
    • G06T 7/10: Segmentation; Edge detection
    • G06T 7/11: Region-based segmentation
    • G06T 2207/00: Indexing scheme for image analysis or image enhancement
    • G06T 2207/20: Special algorithmic details
    • G06T 2207/20081: Training; Learning
    • G06T 2207/20084: Artificial neural networks [ANN]
    • G06T 2207/30: Subject of image; Context of image processing
    • G06T 2207/30108: Industrial image inspection
    • G06T 2207/30164: Workpiece; Machine component
    • G06V: IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 10/00: Arrangements for image or video recognition or understanding
    • G06V 10/70: Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V 10/77: Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
    • G06V 10/774: Generating sets of training patterns; Bootstrap methods, e.g. bagging or boosting
    • G06V 10/82: Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
    • Y: GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02: TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02P: CLIMATE CHANGE MITIGATION TECHNOLOGIES IN THE PRODUCTION OR PROCESSING OF GOODS
    • Y02P 90/00: Enabling technologies with a potential contribution to greenhouse gas [GHG] emissions mitigation
    • Y02P 90/30: Computing systems specially adapted for manufacturing

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Computation (AREA)
  • General Health & Medical Sciences (AREA)
  • Software Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Computing Systems (AREA)
  • Medical Informatics (AREA)
  • Databases & Information Systems (AREA)
  • Multimedia (AREA)
  • Quality & Reliability (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Molecular Biology (AREA)
  • General Engineering & Computer Science (AREA)
  • Mathematical Physics (AREA)
  • Image Analysis (AREA)

Abstract

The application provides an image processing method, an image processing apparatus, an electronic device, a storage medium, and a program product, relating to the fields of artificial intelligence, computer vision, and the like. The method includes: acquiring an input image corresponding to a target lens element, the input image including at least one piece of first annotation information related to a defect of the target lens element; performing feature extraction on the input image through a first neural network model and constructing correlations between at least two image blocks of the input image to obtain a feature extraction result; generating, based on the correlations and the first annotation information, second annotation information corresponding to the input image and related to the defect of the target lens element; training a segmentation network model based on the feature extraction result and the second annotation information; and segmenting lens element defects based on the trained segmentation network model so that the segmentation result can be used for defect judgment. This enables automatic detection of lens element defects and effectively improves the efficiency and accuracy of quality screening.

Description

Image processing method, image processing apparatus, electronic device, storage medium, and program product
Technical Field
The present application relates to the field of image processing technology, and in particular, to an image processing method, an image processing apparatus, an electronic device, a storage medium, and a program product.
Background
With the development of industries such as 5G (5th-Generation Mobile Communication Technology) and autonomous driving, consumer electronics, automotive electronics, and related industries have encountered new development opportunities. The optical module, the basic visual component of intelligent machines, plays the role of the human eye and is key to environmental perception, automation intelligence, and improved user experience.
In general, the lens element is an important component of the optical module. Taking a camera lens as an example, a common camera module is assembled from a number of different discrete components, and the lens is the first threshold through which external light enters the camera for imaging, so the cleanliness of the lens element greatly influences imaging quality. In actual production, however, the lens element very easily picks up contamination such as fingerprints and floating dust, which is where product defects concentrate, posing a great challenge to production-line yield.
At present, lens elements are screened by manual visual inspection. Because optical modules are generally miniaturized, judging lens element defects by eye requires a microscope, which greatly reduces production-line efficiency.
Disclosure of Invention
The embodiments of the application aim to solve the technical problem of low efficiency in judging lens element defects.
According to an aspect of an embodiment of the present application, there is provided an image processing method including:
acquiring an input image corresponding to the target lens element, wherein the input image comprises at least one first annotation information related to the defect of the target lens element;
performing feature extraction on an input image through a first neural network model, and constructing correlation between at least two image blocks of the input image to obtain a feature extraction result containing the correlation;
performing annotation propagation processing based on the correlation and the first annotation information to generate second annotation information corresponding to the input image and related to the defect of the target lens element;
training the segmentation network model based on the feature extraction result and the second labeling information to obtain a trained segmentation network model;
and carrying out segmentation processing on the defects of the lens element based on the trained segmentation network model so as to use the segmentation result for judging the defects of the lens element.
According to another aspect of an embodiment of the present application, there is provided an image processing apparatus including:
the acquisition module is used for acquiring an input image corresponding to the target lens element, wherein the input image comprises at least one first annotation information related to the defect of the target lens element;
the processing module is used for extracting the characteristics of the input image through the first neural network model, constructing the correlation between at least two image blocks of the input image and obtaining a characteristic extraction result containing the correlation;
the generating module is used for carrying out annotation propagation processing based on the correlation and the first annotation information and generating second annotation information corresponding to the input image and related to the defect of the target lens element;
the training module is used for training the segmentation network model based on the feature extraction result and the second labeling information to obtain a trained segmentation network model;
a segmentation module, configured to perform segmentation processing on the defects of the lens element based on the trained segmentation network model, so as to use the segmentation result for judging the defects of the lens element.
According to still another aspect of the embodiment of the present application, there is provided an electronic device including a memory, a processor, and a computer program stored on the memory, the processor executing the computer program to implement the image processing method provided by the embodiment of the present application.
According to still another aspect of the embodiments of the present application, there is provided a computer-readable storage medium having stored thereon a computer program which, when executed by a processor, implements the image processing method provided by the embodiments of the present application.
According to still another aspect of the embodiments of the present application, there is provided a computer program product, including a computer program, which when executed by a processor implements the image processing method provided by the embodiments of the present application.
The embodiments of the application provide an image processing method, an image processing apparatus, an electronic device, a storage medium, and a program product. An input image corresponding to a target lens element is acquired, the input image including at least one piece of first annotation information related to a defect of the target lens element. Feature extraction is performed on the input image through a first neural network model, and correlations between at least two image blocks of the input image are constructed to obtain a feature extraction result containing the correlations. Annotation propagation is performed based on the correlations and the first annotation information to generate second annotation information corresponding to the input image and related to the defect of the target lens element. A segmentation network model is trained based on the feature extraction result and the second annotation information, and the defects of lens elements are segmented based on the trained segmentation network model so that the segmentation result can be used for defect judgment. Defect detection of lens elements can thus be carried out automatically through computer vision technology, effectively improving the efficiency and accuracy of quality screening.
Drawings
In order to more clearly illustrate the technical solutions in the embodiments of the present application, the drawings that are required to be used in the description of the embodiments of the present application will be briefly described below.
FIG. 1a is a schematic diagram of a contamination defect on a lens according to an embodiment of the present application;
FIG. 1b is a diagram illustrating an example of a lens defect labeling operation according to an embodiment of the present application;
fig. 2 is a schematic flow chart of an image processing method according to an embodiment of the present application;
FIG. 3 is a schematic diagram of a tag propagation method according to an embodiment of the present application;
FIG. 4 is a schematic diagram of a lens element defect segmentation frame according to an embodiment of the present application;
FIG. 5a is a schematic diagram of labeling information according to an embodiment of the present application;
FIG. 5b is a schematic diagram of another labeling information provided by an embodiment of the present application;
FIG. 5c is a schematic diagram of yet another labeling information provided by an embodiment of the present application;
fig. 6 is a schematic diagram of different areas at a lens point location according to an embodiment of the present application;
fig. 7 is a schematic diagram of a lens defect determining method according to an embodiment of the present application;
fig. 8 is a flowchart of a lens defect detection method according to an embodiment of the present application;
fig. 9 is a schematic structural diagram of an image processing apparatus according to an embodiment of the present application;
Fig. 10 is a schematic structural diagram of an electronic device according to an embodiment of the present application.
Detailed Description
Embodiments of the present application are described below with reference to the drawings in the present application. It should be understood that the embodiments described below with reference to the drawings are exemplary descriptions for explaining the technical solutions of the embodiments of the present application, and the technical solutions of the embodiments of the present application are not limited.
As used herein, the singular forms "a", "an" and "the" are intended to include the plural forms as well, unless expressly stated otherwise, as understood by those skilled in the art. It will be further understood that the terms "comprises" and "comprising," when used in this specification, specify the presence of stated features, information, data, steps, operations, elements, and/or components, but do not preclude the presence or addition of other features, information, data, steps, operations, elements, components, and/or groups thereof, all of which may be included in the present specification. It will be understood that when an element is referred to as being "connected" or "coupled" to another element, it can be directly connected or coupled to the other element or intervening elements may be present. Further, "connected" or "coupled" as used herein may include wirelessly connected or wirelessly coupled. The term "and/or" as used herein indicates that at least one of the items defined by the term, e.g., "a and/or B" may be implemented as "a", or as "B", or as "a and B".
For the purpose of making the objects, technical solutions and advantages of the present application more apparent, the embodiments of the present application will be described in further detail with reference to the accompanying drawings.
In the image acquisition process of a common camera, external light passes through the lens and generates an electric signal on the CMOS (Complementary Metal-Oxide-Semiconductor) photosensitive element, which is transmitted to the image processing center over a circuit using a specific transmission protocol. The lens is thus the first threshold through which external light enters the camera for imaging, and its cleanliness is critical to the quality of the whole module. However, the lens of a camera module very easily picks up dust and dirt, as shown in fig. 1a, so to improve component yield, major manufacturers usually perform lens contamination detection and quality screening.
The embodiment of the application proposes that this contamination detection and quality screening can be performed by a visual AI (Artificial Intelligence) quality inspection system in place of part of the manual visual inspection, thereby reducing cost and improving efficiency.
However, for lens defects, the camera module is small and foreign matter such as dust and dirt is generally tiny, yet large numbers of potential defects typically appear after imaging by an industrial camera and need to be detected. These potential defects vary in size, are numerous and widely scattered, and appear at relatively random positions, which makes the data annotation work required by a visual AI quality inspection system very difficult, as shown in fig. 1b.
Aiming at at least one of the above technical problems or areas needing improvement, the application provides a defect detection method, based on weakly supervised learning, for lens elements (such as camera module lenses, as well as plane mirrors, concave lenses, convex lenses, etc. that other optical modules may adopt), which can improve data annotation efficiency and thereby improve model iteration efficiency.
The technical solutions of the embodiments of the present application and technical effects produced by the technical solutions of the present application are described below by describing several exemplary embodiments. It should be noted that the following embodiments may be referred to, or combined with each other, and the description will not be repeated for the same terms, similar features, similar implementation steps, and the like in different embodiments.
An embodiment of the present application provides an image processing method, as shown in fig. 2, where the method includes:
step S201: acquiring an input image corresponding to the target lens element, wherein the input image comprises at least one first annotation information related to the defect of the target lens element;
in the embodiment of the present application, the target lens element refers to a lens element that has undergone at least one defect detection (for example, but not limited to, manual detection), and has been determined to have at least one defect. The input image corresponding to the target lens element refers to a captured image for the target lens element or an image obtained by processing the captured image. At least one defect of all defects of the target lens element displayed in the input image is marked, i.e. the input image comprises at least one first marking information related to the defect of the target lens element, which first marking information is also understood as the actual marking information of the defect of the lens element.
In practical applications, the number of target lens elements may be one or more, and the number of input images corresponding to each target lens element may also be one or more. The processing of each input image is similar, so the same processing procedure will not be repeated.
In the embodiment of the application, the input image including at least one piece of first annotation information is used as a training sample for training the model. This can be understood as a model training method based on weakly supervised annotation: the provided annotation information need not cover all defect areas, which can greatly reduce the manual annotation burden.
Step S202: performing feature extraction on an input image through a first neural network model, and constructing correlation between at least two image blocks of the input image to obtain a feature extraction result containing the correlation;
In the embodiment of the present application, the first neural network model is a neural network capable of performing feature extraction on an input image and modeling the relationship between image blocks; for example, a graph convolutional network such as a ViG (Vision GNN: Vision Graph Neural Network) or a Pyramid ViG may be used, but it is not limited thereto.
The correlation between image blocks may refer to correlation of high-level semantic layers, correlation of texture, detail, or any combination thereof.
Specifically, for a given input image, the first neural network model uniformly divides the input image into a plurality of image blocks of equal size, performs feature extraction based on each image block, constructs a correlation between at least two image blocks, and outputs a feature extraction result.
Step S203: performing annotation propagation processing based on the correlation and the first annotation information to generate second annotation information corresponding to the input image and related to the defect of the target lens element;
in the embodiment of the present application, the correlation between the image blocks acquired in step S202 is utilized to predict the unlabeled data of the input image based on the first labeling information, so as to complete the generation of the second labeling information related to the defect of the target lens element, thereby expanding the region of the labeling information. The second labeling information may be understood as predictive labeling information of a lens element defect or a pseudo tag.
Step S204: training the segmentation network model based on the feature extraction result and the second labeling information to obtain a trained segmentation network model;
In the defect detection pipeline for lens elements, depth-model-based defect segmentation has the greatest influence on the detection effect, so the embodiment of the application focuses on training the deep-learning-based segmentation network model. Existing training methods, however, rely on large-scale fully annotated data, which imposes a high defect annotation cost for the lens element area. The embodiment of the application can complete training of the segmentation network model with weak annotations at a much smaller annotation cost.
The network type adopted by the split network model can be set by a person skilled in the art according to actual situations, for example, a common convolutional network can be adopted, and other types of neural networks can also be adopted.
In the embodiment of the application, the second labeling information is used for training the segmentation network model, so that the segmentation network model with higher robustness and better generalization can be obtained.
Alternatively, the segmented network model may be trained based on the feature extraction result, the first annotation information, and the second annotation information.
As an example, so that the initial segmentation network model has some defect discrimination capability, it may first be pre-trained with a small amount of annotated point data (i.e., the first annotation information). The second annotation information is then generated using the above steps, and once it is obtained the model is trained again. After a training end condition is met (e.g., a predetermined number of rounds or a predetermined effect is reached), the above steps may be used once more to generate new second annotation information and the training repeated.
That is, the processes of steps S201 to S204 may be iterated until the finally obtained second annotation information becomes stable and the accuracy of the final segmentation network model stabilizes.
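A minimal sketch of this iterate-until-stable loop is shown below in Python; extract_fn, propagate_fn, and fit_fn are hypothetical stand-ins for the feature extraction (step S202), annotation propagation (step S203), and segmentation training (step S204) components described above:

```python
import numpy as np

def train_weakly_supervised(images, point_labels, extract_fn, propagate_fn,
                            fit_fn, max_rounds=5, tol=1e-3):
    """Iterate S202-S204 until the pseudo-labels (second annotation
    information) stop changing. All three callables are hypothetical."""
    prev = None
    for _ in range(max_rounds):
        feats, H = extract_fn(images)            # features + correlation statistics
        pseudo = propagate_fn(H, point_labels)   # second annotation information
        fit_fn(feats, pseudo)                    # update the segmentation model
        # stop once the pseudo-labels barely change between rounds
        if prev is not None and np.mean(prev != pseudo) < tol:
            break
        prev = pseudo
    return pseudo
```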
Step S205: performing segmentation processing on the defects of the lens element based on the trained segmentation network model, so as to use the segmentation result for judging the defects of the lens element.
In the embodiment of the application, the trained segmentation network model serves as the main defect segmentation model for segmenting defects of the lens element; this step mainly finds all potential defect areas and therefore emphasizes a high recall rate. Further, based on the model prediction output (i.e., the segmentation result), lens element defects are judged by a rule-based post-processing module: for example, according to the defect standard requirements, further post-processing judgment is performed in combination with the appearance characteristics of each suspected defect, and a conclusion is finally given as to whether it is a defect and/or which defect type it belongs to, so that NG (not good, i.e., defective) items can be identified.
The embodiment of the application provides an image processing method that uses the above steps to automatically detect defects of lens elements, effectively improving the efficiency and accuracy of quality screening.
In addition, the embodiment of the application models the relationship between image blocks through the first neural network model, so existing point annotation information can be exploited more fully for annotation propagation to generate the second annotation information and train the segmentation network model to update its parameters. This greatly reduces the amount of annotated data required in the training stage, enabling efficient and fast online model iteration and better meeting the rapid-iteration demands of products such as consumer electronics.
In the embodiment of the present application, a feasible implementation manner is provided for the step S203, which specifically may include the steps of:
step S2031: based on the processing process of the first neural network model on the input image, counting the correlation to obtain a correlation counting result;
In the processing of the input image by the first neural network model, i.e., while the given input image is uniformly divided into several image blocks of equal size and the correlation between at least two image blocks is constructed, the correlation between image block features is reflected in the processing data of the first neural network. In the embodiment of the application, the processing of the whole first neural network model is analyzed and this data is synthesized to comprehensively evaluate (measure) the correlation between image blocks, obtaining the correlation statistical result.
Taking a ViG network as the first neural network model as an example: at each layer of the ViG network, each image block is treated as a graph node, several nearest-neighbor nodes are computed for it, and a graph-structured network is constructed accordingly; the graph structure of each layer reflects the correlation between image block features at that layer. The correlation of low-level ViG features is usually related to image texture, color, and similar information, while the correlation of high-level features is closely related to the semantic information corresponding to the image blocks. Following this rule, the embodiment of the application can analyze the whole ViG backbone network from top to bottom and synthesize the graph structures of all layers to comprehensively evaluate (measure) the correlation between image blocks, obtaining the correlation statistical result.
In the embodiment of the application, the correlation statistical result can be understood as correlation evidence: it reflects the correlation between image blocks and makes that correlation more intuitive.
Step S2032: and performing annotation propagation processing based on the correlation statistics result and the first annotation information to generate second annotation information corresponding to the input image and related to the defect of the target lens element.
In the embodiment of the present application, the correlation statistics obtained in step S2031 is used to predict, based on the first labeling information, on the unlabeled data of the input image, to complete the generation of the second labeling information related to the defect of the target lens element, thereby expanding the region of the labeling information.
In the embodiment of the present application, a feasible implementation manner is provided for step S2031, which may specifically include:
step SA: acquiring a two-dimensional histogram consistent with the number of the image blocks;
step SB: based on the processing procedure of the first neural network model on the input image, every two image blocks with correlation meeting the first condition are counted to the corresponding positions of the two-dimensional histogram, and a correlation counting result can be obtained.
In the processing of the input image by the first neural network model, the correlation of two image blocks satisfying the first condition may mean that one image block belongs to the predetermined number of image blocks closest to the other at the feature level, or that the feature-level distance between the two image blocks is less than or equal to a predetermined distance, etc. Those skilled in the art may set the first condition and the manner of computing the distance between image blocks according to the actual situation. For ease of description, two image blocks whose correlation satisfies the first condition may be referred to below as neighbor nodes.
In the embodiment of the present application, a two-dimensional histogram H whose size matches the total number of image blocks may be maintained. If the i-th and j-th image blocks are neighbor nodes, the entry in row i, column j of the two-dimensional histogram is counted, for example set to 1, but it is not limited thereto. After all image blocks are traversed, the correlation statistical result, which can be understood as the H matrix, is obtained.
In the embodiment of the present application, if the first neural network model includes multiple network layers, the histograms may be accumulated across layers. Specifically, for each layer, if the i-th and j-th image blocks are neighbor nodes, the value in row i, column j of the two-dimensional histogram is incremented by 1, i.e., H[i, j] = H[i, j] + 1. After all layers of the first neural network model are traversed, complete statistics of the correlation between image blocks are obtained. Further, the two-dimensional histogram may be normalized (for example, but not limited to, row by row; other normalization criteria may also be adopted) to prevent a few strongly discriminative nodes from distorting the subsequent generation of the second annotation information. Through the above process, the correlation statistical result between image blocks is obtained.
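As an illustration, a Python sketch of this accumulation and row normalization is given below; the input representation (one node-to-neighbors mapping per network layer) is an assumption made for the example:

```python
import numpy as np

def correlation_histogram(knn_per_layer, num_blocks):
    """Accumulate neighbor relations from all layers into the two-dimensional
    histogram H, then row-normalize it. knn_per_layer is a list with one
    {node_index: neighbor_indices} mapping per layer (assumed format)."""
    H = np.zeros((num_blocks, num_blocks))
    for knn in knn_per_layer:                  # traverse all network layers
        for i, neighbors in knn.items():
            for j in neighbors:
                H[i, j] += 1                   # H[i, j] = H[i, j] + 1
    row_sums = H.sum(axis=1, keepdims=True)    # row-by-row normalization
    return np.divide(H, row_sums, out=np.zeros_like(H), where=row_sums > 0)
```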
In the embodiment of the present application, another alternative implementation is provided for step SB, which may specifically include: dividing the processing of the input image at each layer of the first neural network model into at least two sub-processes and, for each sub-process, counting every pair of image blocks whose correlation satisfies the first condition into the corresponding position of a sub-two-dimensional histogram for that sub-process; and counting the values in each layer's sub-two-dimensional histogram that satisfy a second condition into the corresponding positions of the two-dimensional histogram to obtain the correlation statistical result.
Specifically, if the first neural network model includes multiple network layers and each network layer divides the processing of the input image into multiple sub-processes (multiple heads), the heads within the same layer can be normalized jointly when accumulating the histograms across layers, since the way a given layer expresses correlation between image blocks (e.g., color/texture similarity at low layers, semantic similarity at high layers) can be considered consistent within that layer. That is, the multiple heads of each layer are counted separately and accumulated into one sub-two-dimensional histogram; if an accumulated value in the sub-histogram satisfies the second condition, for example exceeds 1/3 of the total number of heads (but not limited thereto; the second condition can be set according to the actual situation and the embodiment of the application is not limited here), the corresponding position of the overall two-dimensional histogram is counted. This process makes the obtained H matrix more stable, especially when a K-Nearest Neighbor (KNN) criterion is adopted as the first condition on the correlation of two image blocks.
In the embodiment of the application, if the first neural network model includes multiple network layers, the tendency toward different types of similarity can be adjusted by weighting the layers differently when accumulating the histograms, adapting the method to different practical application scenarios. As an example, assuming the feature correlation at lower layers is mainly based on image block texture information while that at higher layers is mainly based on the semantic information corresponding to the image blocks, the lower layers may be given more weight if the image is relatively simple (e.g., only texture gray levels change), and vice versa. In practice, those skilled in the art may assign the weight of each layer according to the actual situation; the embodiment of the application is not limited here.
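The multi-head voting and per-layer weighting described above might be sketched as follows; the input format, the 1/3 vote threshold, and the uniform default weights are illustrative assumptions:

```python
import numpy as np

def correlation_histogram_multihead(heads_per_layer, num_blocks,
                                    layer_weights=None, vote_ratio=1 / 3):
    """heads_per_layer: per layer, a list of per-head {node: neighbors}
    mappings (assumed format). A pair (i, j) contributes to H only when
    more than vote_ratio of that layer's heads agree on it."""
    if layer_weights is None:
        layer_weights = [1.0] * len(heads_per_layer)
    H = np.zeros((num_blocks, num_blocks))
    for heads, w in zip(heads_per_layer, layer_weights):
        sub = np.zeros((num_blocks, num_blocks))   # sub-histogram for this layer
        for head in heads:
            for i, neighbors in head.items():
                for j in neighbors:
                    sub[i, j] += 1
        H += w * (sub > vote_ratio * len(heads))   # second condition + layer weight
    row_sums = H.sum(axis=1, keepdims=True)
    return np.divide(H, row_sums, out=np.zeros_like(H), where=row_sums > 0)
```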
In the embodiment of the present application, a feasible implementation of step S2032 may specifically include: taking the first annotation information as the annotated information and repeatedly executing the following annotation propagation process: keep the annotated information unchanged and propagate it to related image blocks according to the correlation statistical result, where during propagation each image block updates its own annotation information from the annotation information of its related image blocks based on the correlation statistical result, and the updated annotation information of each image block serves as the new annotated information for the next propagation round; and, after the propagation process converges, obtaining the second annotation information corresponding to the input image and related to the defect of the target lens element from the finally updated annotation information of each image block.
In particular, since the input image carries at least one piece of first annotation information related to a defect of the target lens element, effectively spreading this small amount of annotation information over a larger extent, and thereby increasing the amount of effective labels (i.e., annotation data usable for training, including the annotated first annotation information and the generated second annotation information), is critical to improving the accuracy of the segmentation model.
In the embodiment of the application, generation of the predicted annotation information is completed using the acquired correlation between image blocks. Specifically, this can be accomplished by annotation propagation (also called label propagation): different image blocks act as different nodes, labels are propagated between the nodes, and the similarity between nodes can be taken from the H matrix (the correlation statistical result).
In this process, the labels of the marked data (first labeling information) are kept unchanged, so that they are transferred to the unmarked data (other image blocks) according to the H matrix.
In each iteration of the propagation process, each node updates its own node's labeling information based on the H-matrix, based on the labeling information of the relevant node (e.g., the neighboring node described above).
As an example, as shown in fig. 3, for each image block, assume that a five-pointed star is the current image block (the present node), and surrounding circles and triangles are related image blocks (e.g., the above-described neighboring nodes), where a circle corresponds to one type of label and a triangle corresponds to another type of label. If the node label is updated using a weighted sum, the node label can be updated to be circular since the circular label weight is 0.2+0.8+0.1=1.1 and the triangle label weight is 0.4+0.6=1. If the node labels are updated using a weighted average, the circular label weight is (0.2+0.8+0.1)/3=0.367 and the triangle label weight is (0.4+0.6)/2=0.5, so the node label can be updated to a triangle.
Eventually, when the iteration ends, each node will be assigned to a certain class, while similar nodes will be classified into the same class, i.e. similar nodes will have a common label.
The label propagation process is based primarily on the manifold assumption, i.e., that high-dimensional data actually lies on a low-dimensional manifold embedded in the high-dimensional space. In the high-dimensional space, similar data points cluster together. In the embodiment of the application, the relationship between different image blocks in the high-dimensional space is evaluated through the H matrix, and label propagation is performed on that basis, so the existing annotation information can be exploited more fully.
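A minimal sketch of the propagation loop described above, assuming a row-normalized H matrix and integer class labels in which -1 marks unlabeled image blocks:

```python
import numpy as np

def propagate_labels(H, labels, num_classes, max_iter=100, tol=1e-4):
    """Label propagation: annotated blocks stay clamped to their first
    annotation; unlabeled blocks repeatedly take the H-weighted sum of
    their related blocks' soft labels until convergence."""
    n = H.shape[0]
    Y = np.zeros((n, num_classes))
    labeled = labels >= 0
    Y[labeled, labels[labeled]] = 1.0       # one-hot for annotated blocks
    for _ in range(max_iter):
        Y_new = H @ Y                       # weighted sum from related blocks
        Y_new[labeled] = Y[labeled]         # keep annotated labels unchanged
        if np.abs(Y_new - Y).max() < tol:   # converged
            Y = Y_new
            break
        Y = Y_new
    return Y.argmax(axis=1)                 # hard pseudo-labels per block
```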
In the embodiment of the present application, a feasible implementation manner is provided for the step S202, which specifically may include the steps of:
step S2021: for each image block of an input image, determining a neighbor set formed by a preset number of neighbor image blocks closest to the image block at a characteristic level, and constructing a path between the image block and each neighbor image block in the neighbor set to obtain a modeling diagram;
specifically, for a given input image, assuming that its size is H×W×3, it is uniformly divided into M (M is a natural number of ≡2) image blocks of equal size, for example, 16×16 image blocks (the partial division may be filled with 0 at the image edge), and the pixel value or feature of each image block may be expressed as X= [ X ] 1 ,x 2 ,…,x n ]。
Each image block is treated as an independent node in the first neural network model, each node denoted v_i, so all image blocks form a node set V = {v_1, v_2, …, v_M}.
For each node v_i, the K image blocks nearest to it at the feature level are computed (K, i.e., the predetermined number, can be set by those skilled in the art according to the actual situation; the embodiment of the application is not limited here), forming a set N(v_i), which may be referred to simply as the neighbor set. For each neighbor image block in the neighbor set (for convenience of description, hereinafter called a neighbor node) v_j ∈ N(v_i), an edge (path) e_ji from v_i to v_j is constructed, so that all image blocks can be represented as a graph (i.e., the modeling graph) G = (V, E), where V denotes all nodes and E denotes all edges.
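A compact PyTorch sketch of this K-nearest-neighbor graph construction; k = 9 is an arbitrary example value:

```python
import torch

def build_knn_graph(x: torch.Tensor, k: int = 9) -> torch.Tensor:
    """x: (n, d) image-block features. Returns an (n, k) tensor of neighbor
    indices, i.e. N(v_i) for every node, which is the edge set E in
    adjacency-list form."""
    dist = torch.cdist(x, x)                    # pairwise feature-level distances
    dist.fill_diagonal_(float("inf"))           # exclude self-loops
    return dist.topk(k, largest=False).indices  # k nearest neighbors per node
```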
Step S2022: and carrying out graph convolution operation on the modeling graph to obtain a feature extraction result containing correlation.
After the graph construction is complete, a graph convolution (Graph convolution), also known as a graph convolution transformation operation, may be used on the modeled graph G, through which information exchange between nodes can be accomplished.
In the embodiment of the application, the correlation between image blocks is better expressed by means of the excellent structural modeling capability of the graph structure.
In the embodiment of the present application, a feasible implementation manner is provided for the step S2022, which specifically may include the steps of: for each image block in the modeling diagram, aggregating the context information of each neighbor image block in the neighbor set of the image block to obtain an aggregate feature, and updating the image block based on the aggregate feature; and obtaining a feature extraction result containing correlation based on each updated image block.
That is, context information is acquired from the neighbor nodes through a graph convolution operation and the node's own features are updated accordingly, so that information exchange between nodes can be realized. Specifically, this process takes the following form:
G′ = F(G, W) = Update(Aggregate(G, W_agg), W_update)
where F(·) denotes the graph convolution operation, and F(G, W) denotes performing graph convolution on the modeling graph G based on the learnable weights W.
Aggregate denotes the aggregation process: for each image block (graph node) of G, the context information of each neighbor image block (neighbor node) in its neighbor set is aggregated to obtain an aggregate feature; W_agg denotes the learnable weights for acquiring context information from neighbor nodes.
Update denotes the update process: the image block is updated based on the aggregate feature Aggregate(G, W_agg); W_update denotes the learnable weights for updating the node's own features.
For a specific image block (graph node) x_i, the process may be expressed as follows:
x′_i = h(x_i, g(x_i, N(x_i), W_agg), W_update)
where N(x_i) denotes the neighbor image blocks (neighbor nodes) in the neighbor set of x_i; W_agg denotes the learnable weights for acquiring context information from neighbor nodes; and g(·) denotes the aggregation process, i.e., for image block x_i, the context information of each neighbor image block in its neighbor set is aggregated with weights W_agg to obtain the aggregate feature x″_i.
W_update denotes the learnable weights for updating the node's own features, and h(·) denotes the update process, i.e., image block x_i is updated based on the aggregate feature x″_i combined with weights W_update to obtain the updated image block x′_i.
In an alternative embodiment, if the first neural network model employs a ViG network, max-relative graph convolution may be used as the graph convolution operation, in the following form:
g(·): x″_i = max({x_i − x_j | j ∈ N(x_i)})
h(·): x′_i = x″_i W_update
where the meaning of each symbol is as described above and is not repeated here.
In other alternative embodiments, the graph convolution may take other forms. It will be appreciated that the choice of the specific form of graph convolution does not affect the implementation of this solution; suitable modifications based on the above examples are also applicable to the present application and are therefore intended to be within its scope.
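Following the max-relative form given above, a sketch implementation might look like the following; the bias-free linear layer standing in for W_update is an assumption:

```python
import torch
import torch.nn as nn

class MaxRelativeGraphConv(nn.Module):
    """Max-relative graph convolution: g(.) takes the elementwise max of
    (x_i - x_j) over the neighbors j in N(x_i); h(.) applies W_update."""
    def __init__(self, dim: int):
        super().__init__()
        self.w_update = nn.Linear(dim, dim, bias=False)  # W_update

    def forward(self, x, nbr_idx):
        # x: (n, d) block features; nbr_idx: (n, k) neighbor indices
        neighbors = x[nbr_idx]              # (n, k, d) features of N(x_i)
        rel = x.unsqueeze(1) - neighbors    # x_i - x_j for every neighbor j
        agg = rel.max(dim=1).values         # g(.): x''_i
        return self.w_update(agg)           # h(.): x'_i = x''_i W_update
```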
In the embodiment of the present application, a feasible implementation manner is provided for the step of updating the image block based on the aggregation feature in the step S2022, which specifically may include the steps of: dividing the aggregate feature into at least two sub-aggregate features; based on at least two sub-aggregation features, updating the image blocks in parallel; and splicing the parallel updating results.
Specifically, this process can be understood as a multi-head update operation, i.e., a multi-head network is used in the update process: the aggregate feature x″_i is divided into at least two sub-aggregate features [head_1, head_2, …, head_h], the sub-aggregate features are transformed with at least two corresponding groups of different weights so as to update the image block in parallel, and the parallel update results are spliced as the final output, namely:
x′_i = [head_1 W_update^1, head_2 W_update^2, …, head_h W_update^h]
in the embodiment of the application, the efficiency of updating the image block can be improved through multi-head updating operation.
In the embodiment of the present application, if the first neural network model includes a plurality of network layers, step S2021 may specifically include: at least one network layer of the first neural network model, aiming at each image block of the input image, determining a neighbor set formed by a preset number of neighbor image blocks closest to the image block at a characteristic layer, and constructing a path between the image block and each neighbor image block in the neighbor set to obtain a modeling diagram respectively corresponding to the at least one network layer.
I.e., each layer in the first neural network model, calculates for each image block (i.e., each graph node) a plurality of nearest neighbor nodes based on the layer characteristics, and constructs a modeling graph accordingly. Wherein the graph structure of each layer can reflect the correlation between image block features on that layer.
Taking a ViG network as the first neural network model as an example, the low-level feature correlations of the ViG network are usually related to image texture, color, and similar information, while the high-level feature correlations are closely related to the semantic information corresponding to the image blocks. Following this rule, each layer of the whole ViG network can be analyzed separately and the graph structures of all layers synthesized to comprehensively evaluate the correlation between image blocks.
In the embodiment of the present application, each layer of the first neural network model may be constructed by using the flow of step S2021, and the manner in which each layer is constructed by using the flow of step S2021 may be referred to the description of step S2021, which is not repeated herein.
Further, step S2022 may specifically include: performing a graph convolution operation on the modeling graphs corresponding respectively to the at least one network layer; and splicing the graph convolution results corresponding respectively to the at least one network layer to obtain a feature extraction result containing the correlation.
In other words, in the embodiment of the application, each layer of the first neural network model may perform the graph convolution operation using the method of step S2022; for the specific method, refer to the description of step S2022, which is not repeated here.
In the embodiment of the application, the graph convolution operation of each layer can be abstractly expressed as:
X′=GraphConv(X)
Wherein GraphConv represents a graph convolution operation of a layer, X represents a modeling graph corresponding to a network layer, and X' represents a graph convolution result corresponding to the network layer.
In the embodiment of the application, to avoid over-smoothing of the feature extraction result, which would make nodes of different categories indistinguishable, additional feature transformations and nonlinear activation functions can be introduced to maintain the distinctiveness of node features.
As an example, linear layers may be applied before and after the graph convolution operation to map the node features and increase feature diversity, and a nonlinear activation function may be used after the graph convolution, which can be abstractly expressed as:
Y = σ(GraphConv(X W_in)) W_out + X
where σ is an activation function such as ReLU or GELU, W_in and W_out are the weights of fully connected (FC) layers, and the +X term is a residual connection used to avoid overfitting.
Also by way of example, a feed-forward network (FFN) may be applied to further guarantee the capability of feature transformation and mitigate over-smoothing, which can be abstractly expressed as:
Z = σ(Y W_1) W_2 + Y
where σ is the activation function and W_1 and W_2 are the weights of fully connected layers.
In this example, Z represents the output of the FFN layer, which updates the current node features after the graph convolution operation is completed and helps alleviate over-smoothing in the graph convolution.
In other implementations, other feature transformations and/or nonlinear activation functions are also possible. It will be appreciated by those skilled in the art that the above-described several feature transformations and non-linear activations are merely illustrative of and not limiting on the embodiments of the present application, and that appropriate modifications based on these examples may be applied to the present application and are intended to be included within the scope of the present application.
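Putting the pieces together, one backbone block implementing Y = σ(GraphConv(X W_in)) W_out + X followed by Z = σ(Y W_1) W_2 + Y might be sketched as below; the GELU choice and the FFN expansion ratio are assumptions:

```python
import torch.nn as nn

class GrapherBlock(nn.Module):
    """Linear maps around a graph convolution with a residual connection,
    followed by an FFN, to keep node features distinct (anti-over-smoothing).
    graph_conv is any module with the (x, nbr_idx) interface sketched above."""
    def __init__(self, dim: int, graph_conv: nn.Module, ffn_ratio: int = 4):
        super().__init__()
        self.w_in, self.w_out = nn.Linear(dim, dim), nn.Linear(dim, dim)
        self.graph_conv, self.act = graph_conv, nn.GELU()
        self.ffn = nn.Sequential(nn.Linear(dim, dim * ffn_ratio), nn.GELU(),
                                 nn.Linear(dim * ffn_ratio, dim))

    def forward(self, x, nbr_idx):
        y = self.w_out(self.act(self.graph_conv(self.w_in(x), nbr_idx))) + x
        return self.ffn(y) + y                 # Z = sigma(Y W_1) W_2 + Y
```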
Based on at least one of the above embodiments, fig. 4 is a schematic diagram of the segmentation framework provided by the embodiment of the application. The framework includes a backbone network, i.e., the first neural network model (for example, but not limited to, a ViG network), which performs feature extraction on the input image and models the relationship between image blocks. Optionally, the input image is uniformly divided into several image blocks of equal size; for each image block, a neighbor set formed by a predetermined number of the nearest neighbor image blocks at the feature level is determined, and paths between the image block and each neighbor image block in the neighbor set are constructed to obtain the modeling graph. Graph convolution and feature transformation are performed on the modeling graph at each of the L network layers of the first neural network model, yielding graph convolution results corresponding respectively to the L network layers.
Since the L layers of the graph convolutional network, from low to high, differ greatly in their ability to express semantic or detail features, the graph convolution output of at least one layer can be used as the feature input for segmentation prediction. For example, fig. 4 shows an embodiment in which the outputs of the 4 layers numbered 6, 12, 18, and 24 (from low to high) are used to obtain the feature input for segmentation prediction: each output is first transformed back to the original two-dimensional space so that all outputs correspond to the same spatial size, and the transformed features are then spliced along the channel dimension.
This is followed by the segmentation network used as the final defect segmentation model. The spliced features are fed into the segmentation network model for defect segmentation prediction, and the segmentation network model is trained in combination with the generated second annotation information.
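A sketch of the multi-layer feature fusion just described, assuming a square image-block grid; which layer outputs to fuse (e.g., 6, 12, 18, 24) is left to the caller:

```python
import torch
import torch.nn.functional as F

def fuse_pyramid_features(layer_feats, out_hw):
    """layer_feats: list of (n_blocks, d) tensors taken from chosen depths.
    Each is reshaped back to its 2-D block grid, resized to a common spatial
    size out_hw, and spliced along the channel dimension."""
    maps = []
    for feat in layer_feats:
        n, d = feat.shape
        side = int(n ** 0.5)                        # assumes a square grid
        fmap = feat.t().reshape(1, d, side, side)   # back to 2-D layout
        maps.append(F.interpolate(fmap, size=out_hw, mode="bilinear",
                                  align_corners=False))
    return torch.cat(maps, dim=1)                   # channel-wise splice
```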
In the embodiment of the application, a large amount of dirt, both larger and smaller, is easily distributed over a lens element (such as a camera module lens), so the workload of full pixel-level manual annotation is very large; the gray-level features of the lens element, however, are relatively simple. This solution therefore provides a model training method based on weakly supervised annotation, which can greatly reduce the manual annotation burden.
Specifically, the labeling information includes at least one of the following:
(1) Point annotation information, for example as shown in fig. 5a, i.e., annotation is made for at least one point;
(2) Graffiti (scribble) annotation information, for example as shown in fig. 5b; optionally, annotation is made along trunks and/or edges;
(3) Block annotation information, for example as shown in fig. 5c, i.e., annotation is made for at least one block region.
In view of defects such as the high annotation cost of full-annotation training schemes, the application adopts a segmentation model training method based on weak annotation, which can greatly reduce the data annotation cost. It will be appreciated that weak annotation saves a great deal of annotation cost compared with full annotation. Point annotation, for example, is one of the more challenging forms of weakly supervised annotation; since it can save 90% of the annotation cost compared with full annotation, its practical value is very high. In practice, the more annotations there are, the higher the model accuracy; the fewer the annotations, the lower the annotation cost.
In the embodiment of the application, a small amount of weak annotation information can be exploited more fully for label propagation. After label propagation, the segmentation network is trained on the generated pseudo-labels to update the network parameters, until the finally obtained pseudo-labels become stable and the prediction accuracy of the segmentation model reaches an ideal level.
In the embodiment of the present application, an optional implementation is provided for step S201, which may specifically include: acquiring a captured image of the target lens element; and mapping the captured image onto a preset template image to obtain the input image.
Taking a camera lens as the lens element as an example: after the camera is mounted on a terminal device, the outer ring portion is shielded by the device's light-entry channel. For example, as shown in fig. 6, the annular area between the outer dashed circle 1 and the middle dashed circle 2 is invisible to the user on the terminal device, so the defect standard there is relatively loose; taking a dust defect as an example, the standard may be that no more than 3 dust particles of 0.5 mm are allowed, and so on. The annular area between the middle dashed circle 2 and the inner dashed circle 3, however, is visible on the terminal device, so a stricter standard applies; for a dust defect, for example, no more than 1 dust particle of 0.1 mm may be allowed, and so on.
In order to accurately locate the areas subject to different standards on the lens element, the captured images first need to be registered and uniformly transformed into the template map. Optionally, for a given captured image and template image, each image is first bilaterally filtered to reduce sources of error when extracting feature points. Feature points of each image are then extracted, such as, but not limited to, SURF (Speeded Up Robust Features) feature points. A mapping matrix is computed with the RANSAC (Random Sample Consensus) algorithm, and the captured image is mapped onto the template image based on this mapping matrix to obtain the input image, for example, but not limited to, the area enclosed by the outer dashed circle 1 in fig. 6.
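A minimal OpenCV sketch of this registration step follows, assuming grayscale inputs; note that SURF ships only in the opencv-contrib non-free build, so a free detector such as ORB could stand in, and all parameter values here are illustrative.

```python
import cv2
import numpy as np

def register_to_template(shot, template):
    # Bilateral filtering to suppress noise before feature extraction.
    shot_f = cv2.bilateralFilter(shot, 9, 75, 75)
    tmpl_f = cv2.bilateralFilter(template, 9, 75, 75)

    # SURF feature points (requires the opencv-contrib non-free build).
    detector = cv2.xfeatures2d.SURF_create(hessianThreshold=400)
    kp1, des1 = detector.detectAndCompute(shot_f, None)
    kp2, des2 = detector.detectAndCompute(tmpl_f, None)

    matches = cv2.BFMatcher(cv2.NORM_L2).match(des1, des2)
    src = np.float32([kp1[m.queryIdx].pt for m in matches]).reshape(-1, 1, 2)
    dst = np.float32([kp2[m.trainIdx].pt for m in matches]).reshape(-1, 1, 2)

    # Mapping matrix estimated with RANSAC to reject mismatched pairs.
    M, _ = cv2.findHomography(src, dst, cv2.RANSAC, 5.0)
    h, w = template.shape[:2]
    return cv2.warpPerspective(shot, M, (w, h))  # registered input image
```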
In the embodiment of the present application, after obtaining the segmentation result, the method may further include the steps of: acquiring a predefined mask corresponding to the template image; each sub-region of the segmentation result is extracted based on the predefined mask to use the segmentation result of each sub-region for defect determination based on a post-processing rule corresponding to the predefined mask, respectively.
Similarly, when the trained segmentation network model is applied to detect a captured image of a lens element under test, the captured image can first be registered by the same method to obtain the lens area to be detected, after which the predefined mask of the template image can be used directly when applying the post-processing rules for judgment.
Specifically, different sub-regions may be extracted in combination with the predefined mask of the template image, so that different processing rules are applied subsequently. For example, if fig. 6 shows the captured image of a lens under test, a first processing rule is applied to the ring region between the outer dashed circle 1 and the middle dashed circle 2, a second processing rule is applied to the ring region between the middle dashed circle 2 and the inner dashed circle 3, and so on.
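By way of illustration only, the sketch below applies region-specific rules through such a mask; the mask label values, the thresholds (taken from the dust example above) and the helper `count_dust` are all hypothetical.

```python
import numpy as np

OUTER_RING, INNER_RING = 1, 2   # assumed label values in the template mask

def judge_defects(seg_result, region_mask):
    """Apply a loose rule to the occluded outer ring and a strict rule to
    the user-visible inner ring; seg_result is a binary defect mask."""
    verdicts = {}
    outer = seg_result * (region_mask == OUTER_RING)
    verdicts["outer_ok"] = count_dust(outer, min_size_mm=0.5) <= 3
    inner = seg_result * (region_mask == INNER_RING)
    verdicts["inner_ok"] = count_dust(inner, min_size_mm=0.1) <= 1
    return verdicts

# count_dust is a hypothetical helper that counts connected components
# larger than the given physical size.
```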
In the embodiment of the present application, as shown in fig. 7, taking a camera lens as the lens element: for a captured image of a given point location of the lens portion, image registration is performed first, a potential defect area is then extracted by a deep-learning-based image segmentation method, and for the segmentation result, methods based on logic rules, appearance and shape judgment, and the like are used to determine whether foreign matter such as dirt or scratches meeting a defect standard exists. As examples, the appearance features that may be relied upon include, but are not limited to, length and width, area, roundness, brightness, contrast, and the like.
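As one possible way to compute such appearance features from the segmentation result, the sketch below derives them per connected component with OpenCV; the exact feature set and any thresholds applied to them are assumptions.

```python
import cv2
import numpy as np

def appearance_features(mask, gray):
    """Per-component appearance features of a binary segmentation mask,
    measured on the corresponding grayscale image."""
    feats = []
    contours, _ = cv2.findContours(mask.astype(np.uint8),
                                   cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
    for c in contours:
        area = cv2.contourArea(c)
        perimeter = cv2.arcLength(c, True)
        x, y, w, h = cv2.boundingRect(c)
        region = gray[y:y + h, x:x + w]
        feats.append({
            "length": max(w, h), "width": min(w, h), "area": area,
            # 4*pi*A/P^2 equals 1 for a perfect circle, less otherwise.
            "roundness": 4 * np.pi * area / (perimeter ** 2 + 1e-6),
            "brightness": float(region.mean()),
            "contrast": float(region.std()),
        })
    return feats
```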
The technical scheme provided by the embodiment of the application can serve as part of a 3C quality inspection capability, applied to camera module quality inspection on an industrial AI quality inspection platform, where it analyzes and judges suspected defects in the lens element area and, together with the analysis results of other inspection points, produces a sample-level defect judgment.
Building on at least one of the above embodiments, an embodiment of the present disclosure takes the first neural network model to be a ViG network, the annotation manner to be pixel-level point annotation, and the lens element to be a lens, and provides a complete implementation of lens defect detection. As shown in fig. 8, the implementation mainly includes the following steps:
step S801: acquiring a first shooting image aiming at a target lens;
step S802: registering the first shot image to a template image to obtain an input image;
step S803: acquiring point marking information (namely first marking information related to the defect of the target lens) corresponding to the input image;
step S804: dividing an input image into a plurality of image blocks;
step S805: each layer of the ViG network carries out graph construction to obtain modeling graphs corresponding to each layer respectively, and each layer carries out graph convolution operation on the corresponding modeling graphs to extract characteristics of an input image and construct correlation among image blocks;
step S806: transforming and splicing the features output by the 6th, 12th, 18th and 24th layers of the ViG network to obtain a feature extraction result;
step S807: acquiring a two-dimensional histogram consistent with the number of image blocks of the input image, and generating the correlation statistical result (H matrix) based on the correlations constructed in step S805;
step S808: based on the point annotation information obtained in step S803, performing label propagation in combination with the H matrix to generate pseudo labels;
step S809: iteratively training the segmentation network model with the feature extraction result and the pseudo labels obtained in the preceding steps, to obtain a trained segmentation network model;
step S810: acquiring a second shooting image aiming at a lens to be detected;
step S811: registering the second shot image to the template image to obtain an image to be processed;
step S812: detecting the image to be processed by using the segmentation network model trained in step S809, to obtain a defect segmentation result;
step S813: acquiring a predefined mask corresponding to the template image;
step S814: extracting each sub-region of the segmentation result based on a predefined mask;
step S815: the segmentation result of each sub-region is used for judging the defect based on the post-processing rule corresponding to the predefined mask.
Steps that are not detailed above can be understood with reference to the foregoing description and are not repeated here; a compressed sketch of how these steps chain together is given below.
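Purely for orientation, the following sketch chains steps S801-S815 together, reusing the sketch helpers above where available; every other function named here is a hypothetical placeholder for the corresponding steps, not an API from the disclosure.

```python
def detect_lens_defects(shot_train, point_labels, shot_test, template, mask):
    img = register_to_template(shot_train, template)       # S801-S802
    feats, H = vig_extract_features_and_H(img)             # S804-S807
    pseudo = propagate_labels(point_labels, H)             # S803, S808
    seg_net = train_segmentation(feats, pseudo)            # S809
    test_img = register_to_template(shot_test, template)   # S810-S811
    seg = seg_net(test_img)                                # S812
    return judge_defects(seg, mask)                        # S813-S815
```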
According to the graph-convolution-network-based camera module lens defect detection method provided by the embodiment of the application, in view of the drawbacks of the fully annotated training scheme, a segmentation model training method based on point annotation is provided, which can greatly reduce the data annotation cost. In order to make full use of the small amount of existing annotation information, a graph-convolution-based depth model is used as the backbone network: the relationships between image blocks are modeled through the graph convolution network ViG, the correlations between image blocks contained in each layer are mined layer by layer, and correlation matrices between image blocks are obtained by exploiting the different expressive capacities of different layers, so that the correlation between different image blocks is comprehensively evaluated. Label propagation is then performed on the basis of these correlation matrices to obtain pixel-level pseudo labels, so that the existing point annotation information can be utilized more fully and the area of effective annotation is enlarged. After label propagation, the segmentation network is trained on the pseudo labels to update the network parameters. This way of iteratively training the segmentation network model with pseudo labels allows the final segmentation network model to converge to an ideal precision.
Moreover, because the excellent ability of the graph convolution network to model the correlations between image blocks is fully exploited, end-to-end training of the whole network can be achieved, and label propagation can be performed dynamically at the same time, which greatly improves model iteration efficiency.
Meanwhile, since the background of the lens surface is relatively simple, a sufficiently high prediction precision can be obtained.
Extensive experiments show that, for components such as camera module lenses, the technical scheme provided by the embodiment of the application achieves a training effect comparable to fully supervised training while effectively saving labor cost.
The embodiment of the application can be applied to various scenes, and relates to the fields of artificial intelligence (such as computer vision technology, machine learning and the like), cloud technology (such as model training or online prediction can be performed by adopting cloud computing and the like), intelligent transportation (which can involve the use of a camera module) and the like.
Among these, artificial intelligence (AI) is the theory, method, technique and application system that uses a digital computer, or a machine controlled by a digital computer, to simulate, extend and expand human intelligence, perceive the environment, acquire knowledge and use knowledge to obtain optimal results. In other words, artificial intelligence is a comprehensive technology of computer science that attempts to understand the essence of intelligence and to produce new intelligent machines that can react in a way similar to human intelligence. Artificial intelligence studies the design principles and implementation methods of various intelligent machines, so that machines have the functions of perception, reasoning and decision-making.
Artificial intelligence technology is a comprehensive discipline covering a wide range of fields, involving both hardware-level and software-level technologies. Basic artificial intelligence technologies generally include sensors, dedicated AI chips, cloud computing, distributed storage, big data processing, operation/interaction systems, mechatronics and the like. Artificial intelligence software technologies mainly include computer vision, speech processing, natural language processing, and machine learning/deep learning.
Computer Vision (CV) is the science of how to make machines "see": using cameras and computers instead of human eyes to recognize and measure targets, and further performing graphics processing so that the result becomes an image better suited for human observation or for transmission to instruments for detection. As a scientific discipline, computer vision studies the theories and technologies needed to build artificial intelligence systems that can acquire information from images or multidimensional data. Computer vision techniques typically include image processing, image recognition, image semantic understanding, image retrieval, OCR, video processing, video semantic understanding, video content/behavior recognition, three-dimensional object reconstruction, 3D techniques, virtual reality, augmented reality, and simultaneous localization and mapping, as well as common biometric techniques such as face recognition and fingerprint recognition.
Machine Learning (ML) is a multi-field interdisciplinary subject involving probability theory, statistics, approximation theory, convex analysis, algorithmic complexity theory and other disciplines. It specifically studies how computers can simulate or implement human learning behavior to acquire new knowledge or skills, and reorganize existing knowledge structures to continuously improve their own performance. Machine learning is the core of artificial intelligence and the fundamental way to make computers intelligent; it is applied throughout all areas of artificial intelligence. Machine learning and deep learning typically include techniques such as artificial neural networks, belief networks, reinforcement learning, transfer learning, inductive learning and learning from demonstration.
With the research and progress of artificial intelligence technology, it has been studied and applied in many fields, such as smart homes, smart wearable devices, virtual assistants, smart speakers, smart marketing, unmanned driving, autonomous driving, unmanned aerial vehicles, robots, intelligent customer service, internet of vehicles and intelligent transportation. It is believed that, as the technology develops, artificial intelligence will be applied in ever more fields and deliver increasingly important value.
The Intelligent Traffic System (ITS), also called the Intelligent Transportation System, is a comprehensive transportation system that applies advanced science and technology (information technology, computer technology, data communication technology, sensor technology, electronic control technology, automatic control theory, operations research, artificial intelligence, etc.) effectively and comprehensively to transportation, service control and vehicle manufacturing, strengthening the connections among vehicles, roads and users, thereby ensuring safety, improving efficiency, improving the environment and saving energy.
Cloud computing refers to a delivery and usage mode of IT infrastructure, in which required resources are obtained over a network in an on-demand, easily scalable manner; in a broader sense, cloud computing refers to a delivery and usage mode of services, in which required services are obtained over a network in an on-demand, easily scalable manner. Such services may be IT, software or internet related, or other services. Cloud computing is a product of the fusion of traditional computing and network technologies such as grid computing, distributed computing, parallel computing, utility computing, network storage, virtualization and load balancing.
With the development of the internet, real-time data streams, the diversification of connected devices, and the growing demand for search services, social networks, mobile commerce and open collaboration, cloud computing has developed rapidly. Unlike earlier parallel distributed computing, the emergence of cloud computing will, in concept, drive a revolutionary transformation of the whole internet model and of enterprise management models.
An embodiment of the present application provides an image processing apparatus, as shown in fig. 9, the image processing apparatus 90 may include: acquisition module 901, processing module 902, generation module 903, training module 904, and segmentation module 905, wherein,
the acquiring module 901 is configured to acquire an input image corresponding to a target lens element, where the input image includes at least one first labeling information related to a defect of the target lens element;
the processing module 902 is configured to perform feature extraction on an input image through a first neural network model, and construct a correlation between at least two image blocks of the input image, so as to obtain a feature extraction result including the correlation;
the generating module 903 is configured to perform label propagation processing based on the correlation and the first label information, and generate second label information corresponding to the input image and related to the defect of the target lens element;
The training module 904 is configured to train the segmentation network model based on the feature extraction result and the second labeling information, to obtain a trained segmentation network model;
the segmentation module 905 is configured to perform a segmentation process on the defect of the lens element based on the trained segmentation network model, so as to use the segmentation result for determining the defect of the lens element.
In an alternative embodiment, the generating module 903, when configured to perform the label propagation process based on the correlation and the first annotation information to generate second annotation information corresponding to the input image and related to the defect of the target lens element, is specifically configured to:
based on the processing process of the first neural network model on the input image, counting the correlation to obtain a correlation counting result;
based on the correlation statistics and the first annotation information, second annotation information corresponding to the input image and related to the defect of the target lens element is generated.
In an alternative embodiment, the generating module 903, when configured to count the correlation based on the processing procedure of the first neural network model on the input image to obtain a correlation statistical result, is specifically configured to:
acquiring a two-dimensional histogram consistent with the number of the image blocks;
and based on the processing process of the first neural network model on the input image, counting every two image blocks with correlation meeting a first condition to corresponding positions of the two-dimensional histogram, and obtaining a correlation counting result.
In an alternative embodiment, the generating module 903, when configured to count, during the processing procedure of the input image by the first neural network model, every two image blocks whose correlation satisfies the first condition to corresponding positions of the two-dimensional histogram to obtain the correlation statistics result, is specifically configured to:
the processing procedure of each layer of the first neural network model on the input image is divided into at least two sub-processing procedures, and based on each sub-processing procedure, every two image blocks with correlation meeting a first condition are counted to corresponding positions of a sub-two-dimensional histogram corresponding to each sub-processing procedure;
based on the numerical value meeting the second condition in the sub two-dimensional histogram of each layer, counting the corresponding position of the two-dimensional histogram to obtain a correlation statistical result.
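For intuition, the hedged numpy sketch below builds such a correlation statistical result: each layer contributes a sub-histogram counting correlated patch pairs (a stand-in for "correlation satisfying the first condition"), and entries passing a minimum-count threshold (a stand-in for the "second condition") are accumulated into the final H matrix.

```python
import numpy as np

def correlation_histogram(neighbor_sets_per_layer, num_patches, min_count=1):
    """neighbor_sets_per_layer: for each layer, a list where entry i holds
    the indices of the patches correlated with patch i."""
    H = np.zeros((num_patches, num_patches), dtype=np.int64)
    for neighbor_sets in neighbor_sets_per_layer:
        sub = np.zeros_like(H)                   # per-layer sub-histogram
        for i, neighbors in enumerate(neighbor_sets):
            for j in neighbors:                  # pair (i, j) passed condition 1
                sub[i, j] += 1
                sub[j, i] += 1
        H += np.where(sub >= min_count, sub, 0)  # condition-2 filter
    return H
```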
In an alternative embodiment, the generating module 903, when configured to perform label propagation processing based on the correlation statistics and the first annotation information to generate second annotation information corresponding to the input image and related to the defect of the target lens element, is specifically configured to:
taking the first labeling information as labeled information, and repeatedly executing the following labeling propagation process: maintaining the marked information unchanged, and transmitting the marked information to the related image blocks according to the correlation statistics result, wherein in the marking transmission process, each current image block updates the marked information of the current image block according to the marked information of the related image block based on the correlation statistics result, and the marked information updated by each image block is used as new marked information to execute the next marking transmission process;
And after the label propagation process is converged, obtaining second label information corresponding to the input image and related to the defect of the target lens element based on the label information finally updated by each image block.
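A minimal sketch of this propagation loop, assuming patch-level labels and a simple averaged update with clamped seeds (the weighting scheme and convergence test are assumptions):

```python
import numpy as np

def propagate_labels(seed, H, max_iters=100, tol=1e-4):
    """seed: (N,) array with 1 = defect, 0 = background, -1 = unlabeled;
    H: (N, N) correlation statistical result."""
    W = H / (H.sum(axis=1, keepdims=True) + 1e-8)  # row-normalized affinity
    labels = np.where(seed >= 0, seed.astype(float), 0.5)
    clamp = seed >= 0                              # annotated info stays fixed
    for _ in range(max_iters):
        new = W @ labels                           # pull from correlated patches
        new[clamp] = labels[clamp]
        if np.abs(new - labels).max() < tol:       # propagation converged
            break
        labels = new
    return (labels > 0.5).astype(np.int64)         # pseudo (second) labels
```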
In an alternative embodiment, the processing module 902, when configured to perform feature extraction on the input image and construct the correlation between at least two image blocks of the input image, is specifically configured to:
for each image block of an input image, determining a neighbor set formed by a preset number of neighbor image blocks closest to the image block at a characteristic level, and constructing a path between the image block and each neighbor image block in the neighbor set to obtain a modeling diagram;
and carrying out graph convolution operation on the modeling graph to obtain a feature extraction result containing correlation.
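The neighbor-set construction can be sketched as a k-nearest-neighbor search in feature space; the value of k and the distance metric below are assumptions.

```python
import torch

def build_knn_graph(x, k=9):
    """x: (N, C) image-block features; returns (N, k) neighbor indices,
    i.e. the paths of the modeling graph."""
    dist = torch.cdist(x, x)              # pairwise feature-level distances
    dist.fill_diagonal_(float("inf"))     # a block is not its own neighbor
    return dist.topk(k, largest=False).indices
```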
In an alternative embodiment, the processing module 902 is specifically configured to, when configured to perform a graph convolution operation on the modeling graph to obtain a feature extraction result including correlation:
for each image block in the modeling diagram, aggregating the context information of each neighbor image block in the neighbor set of the image block to obtain an aggregate feature, and updating the image block based on the aggregate feature;
and obtaining a feature extraction result containing correlation based on each updated image block.
In an alternative embodiment, the processing module 902, when configured to update the image block based on the aggregate characteristics, is specifically configured to:
dividing the aggregate feature into at least two sub-aggregate features;
based on at least two sub-aggregation features, updating the image blocks in parallel;
and splicing the parallel updating results.
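A hedged PyTorch sketch of this split-update-splice step is shown below; the head count and the per-head linear transform are illustrative assumptions.

```python
import torch
import torch.nn as nn

class MultiHeadUpdate(nn.Module):
    """Divide the aggregated feature into sub-aggregate features, update
    them in parallel with independent heads, then splice the results."""
    def __init__(self, dim, heads=4):
        super().__init__()
        assert dim % heads == 0
        self.heads = heads
        self.fcs = nn.ModuleList(
            nn.Linear(dim // heads, dim // heads) for _ in range(heads))

    def forward(self, agg):                        # agg: (N, dim)
        chunks = agg.chunk(self.heads, dim=-1)     # sub-aggregate features
        out = [fc(c) for fc, c in zip(self.fcs, chunks)]  # parallel updates
        return torch.cat(out, dim=-1)              # splice the results
```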
In an alternative embodiment, the processing module 902 is configured to determine, for each image block of the input image, a neighbor set formed by a predetermined number of neighbor image blocks closest to the image block at a feature level, and construct a path between the image block and each neighbor image block in the neighbor set, so as to obtain a modeling graph, where the processing module is specifically configured to:
determining a neighbor set formed by a preset number of neighbor image blocks closest to the image block at a characteristic layer aiming at each image block of an input image at least one network layer of a first neural network model, and constructing a path between the image block and each neighbor image block in the neighbor set to obtain a modeling diagram respectively corresponding to at least one network layer;
in an alternative embodiment, the processing module 902 is specifically configured to, when configured to perform a graph convolution operation on the modeling graph to obtain a feature extraction result including correlation:
Carrying out graph convolution operation on the modeling graphs respectively corresponding to the at least one network layer;
and splicing the graph convolution results respectively corresponding to at least one network layer to obtain a feature extraction result containing correlation.
In an alternative embodiment, the annotation information comprises at least one of:
point marking information;
graffiti labeling information;
and block marking information.
In an alternative embodiment, the acquiring module 901, when configured to acquire an input image corresponding to the target lens element, is specifically configured to:
acquiring a photographed image for a target lens element;
mapping the shooting image to a template image based on a preset template image to obtain an input image;
in an alternative embodiment, the apparatus further comprises a determination module for:
acquiring a predefined mask corresponding to the template image;
each sub-region of the segmentation result is extracted based on the predefined mask to use the segmentation result of each sub-region for defect determination based on a post-processing rule corresponding to the predefined mask, respectively.
The device of the embodiment of the present application may perform the method provided by the embodiment of the present application, and its implementation principle is similar, and actions performed by each module in the device of the embodiment of the present application correspond to steps in the method of the embodiment of the present application, and detailed functional descriptions and resulting beneficial effects of each module of the device may be specifically referred to the descriptions in the corresponding methods shown in the foregoing, which are not repeated herein.
An embodiment of the present application provides an electronic device, including a memory, a processor, and a computer program stored on the memory, where the processor executes the computer program to implement the steps of the foregoing method embodiments.
Optionally, the electronic device may be a server, which may be an independent physical server, a server cluster formed by a plurality of physical servers, or a cloud server providing basic cloud computing services such as cloud services, cloud databases, cloud computing, cloud functions, cloud storage, network services, cloud communication, middleware services, domain name services, security services, CDN (Content Delivery Network), big data and artificial intelligence platforms.
Alternatively, the electronic device may refer to a terminal device, where the terminal (may also be referred to as a user terminal or user device) may be, but is not limited to, a smart phone, a tablet computer, a notebook computer, a desktop computer, a smart voice interaction device (e.g., a smart speaker), a wearable electronic device (e.g., a smart watch), a vehicle-mounted terminal, a smart home appliance (e.g., a smart television), an AR (Augmented Reality )/VR (Virtual Reality) device, and the like.
Alternatively, the electronic device may refer to a combination of a server and a terminal device, e.g. the server and the terminal cooperate to train the model and/or to apply the model. The terminal and the server may be directly or indirectly connected through wired or wireless communication, which is not limited herein.
In an alternative embodiment, an electronic device is provided, as shown in fig. 10, the electronic device 1000 shown in fig. 10 includes: a processor 1001 and a memory 1003. The processor 1001 is coupled to the memory 1003, such as via a bus 1002. Optionally, the electronic device 1000 may further include a transceiver 1004, where the transceiver 1004 may be used for data interaction between the electronic device and other electronic devices, such as transmission of data and/or reception of data, etc. It should be noted that, in practical applications, the transceiver 1004 is not limited to one, and the structure of the electronic device 1000 is not limited to the embodiment of the present application.
The processor 1001 may be a CPU (Central Processing Unit), a general-purpose processor, a DSP (Digital Signal Processor), an ASIC (Application Specific Integrated Circuit), an FPGA (Field Programmable Gate Array) or other programmable logic device, a transistor logic device, a hardware component, or any combination thereof. It may implement or execute the various exemplary logic blocks, modules and circuits described in connection with this disclosure. The processor 1001 may also be a combination that implements computing functionality, such as a combination of one or more microprocessors, or a combination of a DSP and a microprocessor.
Bus 1002 may include a path for transferring information between the above components. Bus 1002 may be a PCI (Peripheral Component Interconnect) bus, an EISA (Extended Industry Standard Architecture) bus, or the like. The bus 1002 may be divided into an address bus, a data bus, a control bus, and so on. For ease of illustration, only one thick line is shown in fig. 10, but this does not mean there is only one bus or one type of bus.
The memory 1003 may be, but is not limited to, a ROM (Read Only Memory) or other type of static storage device capable of storing static information and instructions, a RAM (Random Access Memory) or other type of dynamic storage device capable of storing information and instructions, an EEPROM (Electrically Erasable Programmable Read Only Memory), a CD-ROM (Compact Disc Read Only Memory) or other optical disc storage (including compact discs, laser discs, digital versatile discs, Blu-ray discs, etc.), magnetic disk storage media, other magnetic storage devices, or any other medium that can be used to carry or store a computer program and that can be read by a computer.
The memory 1003 is used to store a computer program for executing an embodiment of the present application, and is controlled to be executed by the processor 1001. The processor 1001 is arranged to execute a computer program stored in the memory 1003 to implement the steps shown in the foregoing method embodiments.
Embodiments of the present application provide a computer readable storage medium having a computer program stored thereon, which when executed by a processor, implements the steps of the foregoing method embodiments and corresponding content.
The embodiment of the application also provides a computer program product, which comprises a computer program, wherein the computer program can realize the steps and corresponding contents of the embodiment of the method when being executed by a processor.
The terms "first," "second," "1," "2," and the like in the description and in the claims and drawings are used for distinguishing between similar objects and not necessarily for describing a particular sequential or chronological order. It is to be understood that the data so used may be interchanged where appropriate, such that the embodiments of the application described herein may be implemented in other sequences than those illustrated or otherwise described.
It should be understood that, although various operation steps are indicated by arrows in the flowcharts of the embodiments of the present application, the order in which these steps are implemented is not limited to the order indicated by the arrows. In some implementations of embodiments of the application, the implementation steps in the flowcharts may be performed in other orders as desired, unless explicitly stated herein. Furthermore, some or all of the steps in the flowcharts may include multiple sub-steps or multiple stages based on the actual implementation scenario. Some or all of these sub-steps or phases may be performed at the same time, or each of these sub-steps or phases may be performed at different times, respectively. In the case of different execution time, the execution sequence of the sub-steps or stages can be flexibly configured according to the requirement, which is not limited by the embodiment of the present application.
The foregoing is only an optional implementation of some application scenarios of the present application. It should be noted that, for those skilled in the art, other similar implementations adopted on the basis of the technical ideas of the present application, without departing from those ideas, also fall within the protection scope of the embodiments of the present application.

Claims (15)

1. An image processing method, comprising:
acquiring an input image corresponding to a target lens element, wherein the input image comprises at least one first annotation information related to a defect of the target lens element;
performing feature extraction on the input image through a first neural network model, and constructing correlation between at least two image blocks of the input image to obtain a feature extraction result containing the correlation;
performing annotation propagation processing based on the correlation and the first annotation information to generate second annotation information corresponding to the input image and related to the defect of the target lens element;
training a segmentation network model based on the feature extraction result and the second labeling information to obtain a trained segmentation network model;
and carrying out segmentation processing on the defects of the lens element based on the trained segmentation network model so as to use the segmentation result for judging the defects of the lens element.
2. The image processing method according to claim 1, wherein the performing of the annotation propagation process based on the correlation and the first annotation information to generate second annotation information related to the defect of the target lens element corresponding to the input image comprises:
based on the processing process of the first neural network model on the input image, counting the correlation to obtain a correlation counting result;
and performing annotation propagation processing based on the correlation statistics result and the first annotation information, and generating second annotation information corresponding to the input image and related to the defect of the target lens element.
3. The image processing method according to claim 2, wherein the counting of the correlation based on the processing procedure of the first neural network model on the input image to obtain a correlation statistic result comprises:
acquiring a two-dimensional histogram consistent with the number of the image blocks;
and counting every two image blocks with correlation meeting a first condition to corresponding positions of the two-dimensional histogram based on the processing process of the first neural network model on the input image, and obtaining a correlation statistical result.
4. The image processing method according to claim 3, wherein the counting, based on the processing procedure of the first neural network model on the input image, of every two image blocks whose correlation satisfies the first condition to corresponding positions of the two-dimensional histogram to obtain the correlation statistic result comprises:
dividing the processing process of each layer of the first neural network model on the input image into at least two sub-processing processes, and counting every two image blocks with correlation meeting a first condition to corresponding positions of sub-two-dimensional histograms corresponding to the sub-processing processes based on each sub-processing process;
and counting the corresponding positions of the two-dimensional histograms based on the numerical values meeting the second condition in the sub-two-dimensional histograms of each layer, and obtaining the correlation statistical result.
5. The image processing method according to claim 2, wherein the performing of the annotation propagation process based on the correlation statistic and the first annotation information to generate second annotation information related to the defect of the target lens element corresponding to the input image comprises:
taking the first labeling information as labeled information, and repeatedly executing the following labeling propagation process: maintaining the marked information unchanged, and transmitting the marked information to related image blocks according to the correlation statistics result, wherein in the marking transmission process, each current image block updates the marked information of the current image block according to the marked information of the related image block based on the correlation statistics result, and the marked information updated by each image block is used as new marked information to execute the next marking transmission process;
And after the label propagation process is converged, obtaining second label information corresponding to the input image and related to the defect of the target lens element based on the label information finally updated by each image block.
6. The image processing method according to any one of claims 1 to 5, wherein the feature extraction of the input image and the construction of the correlation between at least two image blocks of the input image includes:
determining a neighbor set formed by a preset number of neighbor image blocks closest to the image block at a characteristic level aiming at each image block of the input image, and constructing a path between the image block and each neighbor image block in the neighbor set to obtain a modeling diagram;
and carrying out graph convolution operation on the modeling graph to obtain a feature extraction result containing the correlation.
7. The image processing method according to claim 6, wherein the performing a graph convolution operation on the modeling graph to obtain a feature extraction result including the correlation includes:
for each image block in the modeling diagram, aggregating the context information of each neighbor image block in the neighbor set of the image block to obtain an aggregate feature, and updating the image block based on the aggregate feature;
And obtaining a feature extraction result containing the correlation based on each updated image block.
8. The image processing method according to claim 7, wherein updating the image block based on the aggregation feature comprises:
dividing the aggregated features into at least two sub-aggregated features;
based on the at least two sub-aggregation features, updating the image block in parallel;
and splicing the parallel updating results.
9. The image processing method according to claim 6, wherein for each image block of the input image, determining a neighbor set composed of a predetermined number of neighbor image blocks closest to the image block at a feature level, and constructing a path between the image block and each neighbor image block in the neighbor set, to obtain a modeling map, comprising:
determining a neighbor set formed by a preset number of neighbor image blocks closest to the image block at a characteristic layer aiming at each image block of the input image at least one network layer of the first neural network model, and constructing a path between the image block and each neighbor image block in the neighbor set to obtain a modeling diagram respectively corresponding to the at least one network layer;
Performing a graph convolution operation on the modeling graph to obtain a feature extraction result containing the correlation, including:
carrying out graph convolution operation on the modeling graphs respectively corresponding to the at least one network layer;
and splicing the graph convolution results respectively corresponding to at least one network layer to obtain a feature extraction result containing the correlation.
10. The image processing method according to any one of claims 1 to 5, wherein the annotation information includes at least one of:
point marking information;
graffiti labeling information;
and block marking information.
11. The method according to any one of claims 1 to 5, wherein the acquiring the input image corresponding to the target lens element includes:
acquiring a captured image for the target lens element;
mapping the shooting image to a template image based on a preset template image to obtain the input image;
the method further comprises the steps of:
acquiring a predefined mask corresponding to the template image;
and extracting each subarea of the segmentation result based on the predefined mask, so as to respectively use the segmentation result of each subarea for judging defects based on a post-processing rule corresponding to the predefined mask.
12. An image processing apparatus, comprising:
an acquisition module, configured to acquire an input image corresponding to a target lens element, where the input image includes at least one first annotation information related to a defect of the target lens element;
the processing module is used for extracting the characteristics of the input image through a first neural network model, constructing the correlation between at least two image blocks of the input image and obtaining a characteristic extraction result containing the correlation;
the generation module is used for carrying out annotation propagation processing based on the correlation and the first annotation information and generating second annotation information corresponding to the input image and related to the defect of the target lens element;
the training module is used for training the segmentation network model based on the feature extraction result and the second labeling information to obtain a trained segmentation network model;
and the segmentation module is used for carrying out segmentation processing on the defects of the lens element based on the trained segmentation network model so as to use the segmentation result for judging the defects of the lens element.
13. An electronic device comprising a memory, a processor and a computer program stored on the memory, characterized in that the processor executes the computer program to implement the method of any one of claims 1-11.
14. A computer readable storage medium, on which a computer program is stored, characterized in that the computer program, when being executed by a processor, implements the method of any of claims 1-11.
15. A computer program product comprising a computer program, characterized in that the computer program, when executed by a processor, implements the method of any of claims 1-11.