CN113205474A - Screen detection and screen detection model training method, device and equipment

Screen detection and screen detection model training method, device and equipment

Info

Publication number
CN113205474A
Authority
CN
China
Prior art keywords
image
sample
feature
pixel point
pixel
Prior art date
Legal status
Pending
Application number
CN202010042468.2A
Other languages
Chinese (zh)
Inventor
叶磊
王靓伟
Current Assignee
Huawei Technologies Co Ltd
Original Assignee
Huawei Technologies Co Ltd
Priority date
Filing date
Publication date
Application filed by Huawei Technologies Co Ltd
Priority to CN202010042468.2A
Publication of CN113205474A

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 7/00 Image analysis
    • G06T 7/0002 Inspection of images, e.g. flaw detection
    • G06T 7/0004 Industrial image inspection
    • G PHYSICS
    • G02 OPTICS
    • G02F OPTICAL DEVICES OR ARRANGEMENTS FOR THE CONTROL OF LIGHT BY MODIFICATION OF THE OPTICAL PROPERTIES OF THE MEDIA OF THE ELEMENTS INVOLVED THEREIN; NON-LINEAR OPTICS; FREQUENCY-CHANGING OF LIGHT; OPTICAL LOGIC ELEMENTS; OPTICAL ANALOGUE/DIGITAL CONVERTERS
    • G02F 1/00 Devices or arrangements for the control of the intensity, colour, phase, polarisation or direction of light arriving from an independent light source, e.g. switching, gating or modulating; Non-linear optics
    • G02F 1/01 Devices or arrangements for the control of the intensity, colour, phase, polarisation or direction of light arriving from an independent light source, e.g. switching, gating or modulating; Non-linear optics for the control of the intensity, phase, polarisation or colour
    • G02F 1/13 Devices or arrangements for the control of the intensity, colour, phase, polarisation or direction of light arriving from an independent light source, e.g. switching, gating or modulating; Non-linear optics for the control of the intensity, phase, polarisation or colour based on liquid crystals, e.g. single liquid crystal display cells
    • G02F 1/1306 Details
    • G02F 1/1309 Repairing; Testing
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 18/00 Pattern recognition
    • G06F 18/20 Analysing
    • G06F 18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F 18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 18/00 Pattern recognition
    • G06F 18/20 Analysing
    • G06F 18/24 Classification techniques
    • G06F 18/241 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G PHYSICS
    • G09 EDUCATION; CRYPTOGRAPHY; DISPLAY; ADVERTISING; SEALS
    • G09G ARRANGEMENTS OR CIRCUITS FOR CONTROL OF INDICATING DEVICES USING STATIC MEANS TO PRESENT VARIABLE INFORMATION
    • G09G 3/00 Control arrangements or circuits, of interest only in connection with visual indicators other than cathode-ray tubes
    • G09G 3/006 Electronic inspection or testing of displays and display drivers, e.g. of LED or LCD displays
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 2207/00 Indexing scheme for image analysis or image enhancement
    • G06T 2207/30 Subject of image; Context of image processing
    • G06T 2207/30108 Industrial image inspection
    • G06T 2207/30121 CRT, LCD or plasma display

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • General Engineering & Computer Science (AREA)
  • Nonlinear Science (AREA)
  • Evolutionary Computation (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Artificial Intelligence (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Evolutionary Biology (AREA)
  • Chemical & Material Sciences (AREA)
  • Crystallography & Structural Chemistry (AREA)
  • Computer Hardware Design (AREA)
  • Optics & Photonics (AREA)
  • Quality & Reliability (AREA)
  • Image Analysis (AREA)

Abstract

The embodiments of the application provide a screen detection method, a screen detection model training method, and corresponding apparatuses and devices. The screen detection method includes: acquiring a feature vector of each pixel point in a first image, where the first image is an image obtained by shooting a screen to be detected; classifying the pixel points in the first image according to the feature vector of each pixel point in the first image, and obtaining a detection result of the screen to be detected according to the classification result. The scheme provided by the embodiments of the application can be used for defect detection and quality control of screens during screen manufacturing, or before and after the screen assembly stage of devices equipped with screens, and improves the accuracy of screen detection.

Description

Screen detection and screen detection model training method, device and equipment
Technical Field
The present application relates to the technical field of image processing, and in particular to screen detection and screen detection model training methods, apparatuses and devices.
Background
During the production of electronic device screens, a screen may be affected by the assembly process or other factors, so that defects appear on the screen.
At present, screen defects are mainly found by manual inspection or by various screen detection algorithms. However, both manual inspection and existing screen detection algorithms have low accuracy for smaller screen defects, and they can hardly detect tiny defects that span only a few pixel points.
Disclosure of Invention
The embodiments of the application provide a screen detection method, a screen detection model training method, and corresponding apparatuses and devices, which improve the accuracy of detecting tiny defects in a screen.
In a first aspect, an embodiment of the present application provides a screen detection method: when a screen needs to be detected, a feature vector of each pixel point in a first image is first acquired, the pixel points in the first image are classified according to the feature vector of each pixel point in the first image, and a detection result of the screen to be detected is obtained according to the classification result, where the first image is an image obtained by shooting the screen to be detected.
In this process, after the first image is obtained by shooting the screen to be detected, the feature vector of each pixel point in the first image is acquired, and the pixel points are then classified according to these feature vectors to obtain the detection result of the screen to be detected. Because the detection operates on individual pixel points of the first image, with an independent judgment based on each pixel point's feature vector, pixel-level segmentation of the first image can be achieved. Tiny, pixel-level defects in the screen to be detected can therefore be detected, which improves the accuracy of screen defect detection for tiny defects.
In a possible implementation manner, the feature vector of each pixel point in the first image may be obtained as follows: determining a plurality of three-dimensional feature images according to the first image, where the first image comprises M pixel points in the transverse direction and N pixel points in the longitudinal direction, each three-dimensional feature image likewise comprises M pixel points in the transverse direction and N pixel points in the longitudinal direction, and the number of channels of each three-dimensional feature image is C, C being a preset number of categories; and acquiring the feature vector of each pixel point in the first image according to the plurality of three-dimensional feature images.
In one possible implementation, the plurality of three-dimensional feature images may be determined by performing multiple feature extraction processings on the first image, where the number of feature extraction operations differs between any two feature extraction processings, and a feature extraction operation includes a convolution operation and a sampling operation.
In this process, the plurality of three-dimensional feature images are obtained by performing feature extraction processing on the first image several times, each time with a different number of feature extraction operations. The resulting three-dimensional feature images therefore reflect features of the first image at different levels, and the feature vectors of the pixel points in the first image obtained from these three-dimensional feature images are more useful for the subsequent classification of the pixel points.
In a possible implementation manner, for any one feature extraction processing, performing the feature extraction processing on the first image to obtain a three-dimensional feature image includes: performing a convolution operation and downsampling operations on the first image to obtain K downsampled feature images, where the size of the i-th downsampled feature image is M/2^(i-1) x N/2^(i-1), with i taking 1, 2, …, K in sequence; and performing convolution operations and upsampling operations according to the K downsampled feature images to obtain the three-dimensional feature image.
In one possible implementation, the K downsampled feature images may be obtained by: performing a convolution operation on the first image to obtain the first downsampled feature image; and performing the first operation on the first downsampled feature image i times in sequence to obtain the (i+1)-th downsampled feature image, with i taking 1, 2, …, K-1 in sequence, where the first operation comprises a downsampling operation and a convolution operation.
In the above process, for any one feature extraction processing, each additional convolution and downsampling of the first image yields a downsampled feature image at a new scale. After the convolution operation on the first image produces the first downsampled feature image, applying the first operation to it a different number of times yields downsampled feature images at different scales, so that features of the first image at different scales are extracted.
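For illustration only, the following Python (PyTorch) sketch shows one plausible reading of this down-sampling branch; the layer width, the 3x3 kernels, the ReLU activations and the 2x2 max pooling are assumptions of the sketch, not details taken from the application.

import torch.nn as nn

class DownsamplingBranch(nn.Module):
    """One reading of the down-sampling path: a convolution gives the first
    downsampled feature image, and each further "first operation"
    (downsampling + convolution) gives the next one."""

    def __init__(self, in_channels=1, width=16, K=4):
        super().__init__()
        self.first_conv = nn.Conv2d(in_channels, width, kernel_size=3, padding=1)
        self.first_ops = nn.ModuleList([
            nn.Sequential(
                nn.MaxPool2d(2),                                    # downsampling operation
                nn.Conv2d(width, width, kernel_size=3, padding=1),  # convolution operation
                nn.ReLU(inplace=True),
            )
            for _ in range(K - 1)
        ])

    def forward(self, first_image):
        feats = [self.first_conv(first_image)]   # first downsampled feature image
        for op in self.first_ops:                # i applications give the (i+1)-th image
            feats.append(op(feats[-1]))
        return feats                             # K downsampled feature images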
In one possible implementation, the three-dimensional feature image may be obtained by: performing a downsampling operation, a convolution operation and an upsampling operation on the K-th downsampled feature image to obtain the K-th upsampled feature image; performing a merging operation, a convolution operation and an upsampling operation on the i-th upsampled feature image and the i-th downsampled feature image in sequence to obtain the (i-1)-th upsampled feature image, with i taking K, K-1, …, 2 in sequence; and performing a convolution operation on the first upsampled feature image to obtain the three-dimensional feature image.
In the above process, the merging operation combines the i-th downsampled feature image with the i-th upsampled feature image before each upsampling operation, so that the features in the i-th downsampled feature image are carried forward to the next layer as much as possible.
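A matching sketch of the up-sampling path follows, again only as one plausible reading of the passage: channel concatenation stands in for the merging operation, bilinear interpolation for the upsampling operation, and the layer width is the same assumption as above.

import torch
import torch.nn as nn
import torch.nn.functional as F

class UpsamplingBranch(nn.Module):
    """Turns the K downsampled feature images into one C-channel feature image."""

    def __init__(self, width=16, num_classes=3, K=4):
        super().__init__()
        self.bottom = nn.Sequential(             # downsample + convolve the K-th image
            nn.MaxPool2d(2),
            nn.Conv2d(width, width, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
        )
        self.merge_convs = nn.ModuleList([
            nn.Conv2d(2 * width, width, kernel_size=3, padding=1) for _ in range(K - 1)
        ])
        self.final_conv = nn.Conv2d(width, num_classes, kernel_size=1)  # C channels out

    def forward(self, down_feats):
        K = len(down_feats)
        # K-th upsampled feature image: downsample, convolve, then upsample back
        up = F.interpolate(self.bottom(down_feats[-1]), scale_factor=2,
                           mode="bilinear", align_corners=False)
        # merge + convolve + upsample for i = K, K-1, ..., 2 gives the (i-1)-th image
        for j in range(K - 1, 0, -1):             # j is the 0-based index of the i-th image
            merged = torch.cat([up, down_feats[j]], dim=1)          # merging operation
            up = F.interpolate(F.relu(self.merge_convs[j - 1](merged)),
                               scale_factor=2, mode="bilinear", align_corners=False)
        return self.final_conv(up)                # convolve the first upsampled image

Running several such pairs of branches with different K, i.e. with different numbers of feature extraction operations, would then yield the plurality of three-dimensional feature images of identical M x N x C size described above (the sketch assumes M and N are divisible by 2^K so that the sizes line up).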
In a possible implementation manner, the feature vector of each pixel point in the first image may be obtained as follows: determining a target three-dimensional feature image according to the pixel values of the pixel points in each three-dimensional feature image, where the target three-dimensional feature image comprises M pixel points in the transverse direction and N pixel points in the longitudinal direction, and the number of channels of the target three-dimensional feature image is C; and determining the feature vector of each pixel point in the first image according to the pixel values of that pixel point in the target three-dimensional feature image.
In one possible implementation, the target three-dimensional feature image may be determined by determining the pixel values of the M x N pixel points of the x-th channel of the target three-dimensional feature image according to the pixel values of the M x N pixel points of the x-th channel of each three-dimensional feature image, with x taking 1, 2, …, C in sequence. The pixel value of the (a, b)-th pixel point in the x-th channel of the target three-dimensional feature image is the maximum of the pixel values of the (a, b)-th pixel points in the x-th channels of the plurality of three-dimensional feature images, where a is a positive integer less than or equal to M, and b is a positive integer less than or equal to N.
In a possible implementation manner, for the (a, b)-th pixel point in the first image, its feature vector may be determined according to the values of the (a, b)-th pixel point in the C channels of the target three-dimensional feature image.
In this process, the feature vector of each pixel point in the first image is obtained from the pixel values of that pixel point in the three-dimensional feature images. A target three-dimensional feature image is first obtained from the plurality of three-dimensional feature images: the size of every three-dimensional feature image and of the target three-dimensional feature image is M x N, the number of channels of each is C, and the pixel value of a pixel point on any channel of the target three-dimensional feature image is the maximum of the pixel values at the same position on the corresponding channel of each three-dimensional feature image. The feature vector of each pixel point in the first image is then formed from that pixel point's values on the C channels of the target three-dimensional feature image.
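The per-channel, per-pixel maximum and the resulting per-pixel feature vectors can be sketched in a few lines of Python (NumPy); the (C, N, M) array layout and the 0-based indices are conventions chosen here only for illustration.

import numpy as np

def target_feature_image(feature_images):
    """Element-wise maximum over the plurality of three-dimensional feature images.

    feature_images: list of arrays, each of shape (C, N, M).
    Returns the target three-dimensional feature image, also of shape (C, N, M)."""
    stacked = np.stack(feature_images, axis=0)   # (number of images, C, N, M)
    return stacked.max(axis=0)                   # per-channel, per-pixel maximum

def pixel_feature_vector(target, a, b):
    """Feature vector of the (a, b)-th pixel point: its values across the C channels."""
    return target[:, b, a]                       # shape (C,), 0-based indices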
In a possible implementation manner, the feature vector of each pixel point in the first image may be obtained in another manner as follows: inputting the first image into a feature extraction model to obtain a feature vector of each pixel point in the first image, wherein the feature extraction model is obtained by learning multiple groups of first samples, each group of first samples comprises the first sample image and the feature vector of the pixel point on the first sample image, and the first sample image is an image obtained by shooting a first sample screen.
Through training on the plurality of groups of first samples, the feature extraction model acquires the ability to extract the feature vectors of the pixel points in an image. The first image is then input into the feature extraction model, and the feature vector of each pixel point in the first image output by the feature extraction model is obtained.
In a possible implementation manner, the pixel points in the first image may be classified, and the detection result of the screen to be detected obtained from the classification result, as follows: inputting the feature vectors of the pixel points in the first image into a preset model to obtain the category of each pixel point in the first image, where the preset model is obtained by learning from multiple groups of second samples, each group of second samples comprises the feature vectors and labeling information of the pixel points on a second sample image, the second sample image is an image obtained by shooting a second sample screen, the labeling information labels the category of the pixel points on the second sample image, the category of any pixel point in the first image is one of the preset categories, and the number of preset categories is C; and obtaining the detection result of the screen to be detected according to the category of each pixel point in the first image.
In the above process, the preset model is obtained by training in advance: it is learned from multiple groups of second samples, each group of second samples comprises the feature vectors and labeling information of the pixel points on a second sample image, the second sample image is an image obtained by shooting a second sample screen, the labeling information labels the category of the pixel points on the second sample image, and the category of a pixel point is one of the C preset categories. Having learned from the second samples, the preset model is able to classify the pixel points in an image; the feature vectors of the pixel points in the first image are input into the preset model, and the category of each pixel point in the first image output by the preset model is obtained. Further, if the feature vector of each pixel point in the first image is obtained with the feature extraction model, the feature extraction model and the preset model can be treated as two parts of one overall model and trained together. After training is completed, the first image is input into the overall model, which first extracts the feature vector of each pixel point in the first image and then classifies the pixel points according to their feature vectors, thereby realizing defect detection in the screen to be detected.
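A minimal sketch of this composition, with the feature extraction model and the preset model as two placeholder sub-modules of one overall model, might look as follows; the module and variable names are illustrative and not taken from the application.

import torch.nn as nn

class ScreenDetector(nn.Module):
    """Overall model: feature extraction followed by per-pixel classification."""

    def __init__(self, feature_extractor: nn.Module, classifier: nn.Module):
        super().__init__()
        self.feature_extractor = feature_extractor   # first image -> per-pixel feature vectors
        self.classifier = classifier                 # feature vectors -> C score maps

    def forward(self, first_image):
        features = self.feature_extractor(first_image)   # e.g. (B, C_feat, N, M)
        return self.classifier(features)                 # (B, C, N, M), one map per category

# At inference time the category of each pixel point is the channel with the
# highest score, e.g.: categories = model(first_image).argmax(dim=1)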
In a possible implementation manner, the category of each pixel point in the first image may be obtained as follows: inputting the feature vectors of the pixel points in the first image into the preset model to obtain C-1 first output images and one second output image, where each first output image indicates one defect type, the gray value of the (c, d)-th pixel point in a first output image i indicates the probability that the (c, d)-th pixel point in the first image has the defect type indicated by that first output image i, the gray value of the (c, d)-th pixel point in the second output image indicates the probability that the (c, d)-th pixel point in the first image is normal, c is a positive integer less than or equal to M, and d is a positive integer less than or equal to N; and obtaining the category of each pixel point in the first image according to the gray values of the pixel points on the C-1 first output images and on the second output image.
In a possible implementation manner, the category of each pixel point in the first image may be obtained as follows: for the (c, d)-th pixel point in the first image, acquiring the gray value of the (c, d)-th pixel point on each of the C-1 first output images and on the second output image; determining the pixel point with the maximum gray value among them; if the image containing the pixel point with the maximum gray value is the second output image, determining the category of the (c, d)-th pixel point in the first image to be normal; otherwise, determining the defect type indicated by the first output image containing the pixel point with the maximum gray value as the category of the (c, d)-th pixel point in the first image.
The above process shows a way of determining the category of each pixel point in the first image. The preset model outputs C images, each of size M x N. Each first output image indicates one defect type, and the gray value of each of its pixel points indicates the probability that the pixel point at the same position in the first image has that defect type; the second output image corresponds to normal pixels, and the gray value of each of its pixel points indicates the probability that the pixel point at the same position in the first image is normal. After the C-1 first output images and the one second output image are obtained, the category of each pixel point in the first image can be determined.
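The per-pixel decision rule of this paragraph amounts to an argmax over the C output images; a small NumPy sketch, with illustrative function and parameter names, is given below.

import numpy as np

def classify_pixels(first_outputs, second_output, defect_names):
    """Per-pixel category from C-1 defect maps and one 'normal' map.

    first_outputs: array of shape (C-1, N, M) with the gray values of the first output images.
    second_output: array of shape (N, M) with the gray values of the second output image.
    defect_names:  list of C-1 defect-type names, one per first output image.
    Returns an (N, M) array of strings: 'normal' or a defect-type name."""
    all_maps = np.concatenate([second_output[None], first_outputs], axis=0)  # (C, N, M)
    winner = all_maps.argmax(axis=0)             # index of the image with the largest gray value
    labels = np.array(["normal"] + list(defect_names))
    return labels[winner]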
In a second aspect, an embodiment of the present application provides a screen detection model training method, including: acquiring a training sample, where the training sample comprises a sample image and labeling information of the pixel points in the sample image, the labeling information labeling the category of each pixel point in the sample image; inputting the sample image into a screen detection model to obtain the training output category of the pixel points in the sample image; and adjusting the parameters of the screen detection model according to the training output categories and the labeling information of the pixel points in the sample image until the error between them is less than or equal to a preset error, thereby obtaining the trained screen detection model.
In this process, the screen detection model is trained with the training samples: a sample image is processed by the screen detection model to obtain the training output category of each pixel point in the sample image, and the parameters of the screen detection model are adjusted according to the training output categories and the actual, labeled categories of the pixel points until the error between them is small enough. Training is then complete, and the trained screen detection model can classify the pixel points of an input image.
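One way to realize this training procedure is sketched below; the cross-entropy loss, the Adam optimizer and the per-pixel error measure are assumptions of the sketch, since the text only requires that the parameters are adjusted until the error is less than or equal to a preset error.

import torch
import torch.nn as nn

def train_screen_detection_model(model, loader, preset_error=0.05, max_epochs=100):
    """loader yields (sample_image, annotation) pairs, where annotation is a
    LongTensor of shape (B, N, M) holding the category index of every pixel point."""
    criterion = nn.CrossEntropyLoss()
    optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
    for _ in range(max_epochs):
        epoch_error = 0.0
        for sample_image, annotation in loader:
            scores = model(sample_image)          # (B, C, N, M) training output
            loss = criterion(scores, annotation)  # compare with the per-pixel labels
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()
            # fraction of pixel points whose training output category is wrong
            epoch_error += (scores.argmax(dim=1) != annotation).float().mean().item()
        if epoch_error / len(loader) <= preset_error:
            break                                 # error small enough: training is done
    return model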
In one possible implementation, the screen detection model includes a feature extraction network and a classification network; the training output category of the pixel points in the sample image can be obtained by the following method: carrying out feature extraction processing on a sample image according to a feature extraction network to obtain a plurality of sample three-dimensional feature images, wherein the sample image comprises M pixel points in the transverse direction, the sample image comprises N pixel points in the longitudinal direction, each sample three-dimensional feature image comprises M pixel points in the transverse direction, each sample three-dimensional feature image comprises N pixel points in the longitudinal direction, the number of channels of each sample three-dimensional feature image is C, and C is the number of categories for classifying the pixel points in the sample image; and processing the plurality of sample three-dimensional characteristic images according to the classification network to obtain the training output categories of the pixel points in the sample images.
In one possible implementation, the plurality of sample three-dimensional feature images may be obtained by: carrying out multiple times of feature extraction processing on the sample images according to the feature extraction network to obtain a plurality of sample three-dimensional feature images; the feature extraction operations in each two times of feature extraction processing are different in number, and the feature extraction operations include convolution operations and sampling operations.
In the above process, a plurality of sample three-dimensional feature images are obtained by performing feature extraction processing on the sample images, wherein the feature extraction operation times in each feature extraction processing are different, so that features of different levels in the sample images are extracted.
In one possible implementation, the feature extraction network includes a convolutional layer, a pooling layer, and an upsampling layer. For any one feature extraction processing, a sample three-dimensional feature image may be obtained by: performing a convolution operation and downsampling operations on the sample image according to the convolutional layer and the pooling layer to obtain K sample downsampled feature images, where the size of the i-th sample downsampled feature image is M/2^(i-1) x N/2^(i-1), with i taking 1, 2, …, K in sequence; and performing convolution operations and upsampling operations on the K sample downsampled feature images according to the convolutional layer and the upsampling layer to obtain the sample three-dimensional feature image.
In one possible implementation, the K sample downsampled feature images may be obtained by: performing a convolution operation on the sample image according to the convolutional layer to obtain the first sample downsampled feature image; and performing the first operation on the first sample downsampled feature image i times in sequence according to the convolutional layer and the pooling layer to obtain the (i+1)-th sample downsampled feature image, with i taking 1, 2, …, K-1 in sequence, where the first operation comprises a downsampling operation and a convolution operation.
In the above process, for any one feature extraction processing, each additional convolution and downsampling of the sample image yields a sample downsampled feature image at a new scale. After the convolution operation on the sample image produces the first sample downsampled feature image, applying the first operation to it a different number of times yields sample downsampled feature images at different scales, so that features of the sample image at different scales are extracted.
In one possible implementation, the sample three-dimensional feature image may be obtained by: performing a downsampling operation, a convolution operation and an upsampling operation on the K-th sample downsampled feature image according to the pooling layer, the convolutional layer and the upsampling layer to obtain the K-th sample upsampled feature image; performing a merging operation, a convolution operation and an upsampling operation on the i-th sample upsampled feature image and the i-th sample downsampled feature image in sequence according to the convolutional layer and the upsampling layer to obtain the (i-1)-th sample upsampled feature image, with i taking K, K-1, …, 2 in sequence; and performing a convolution operation on the first sample upsampled feature image according to the convolutional layer to obtain the sample three-dimensional feature image.
In the above process, the merging operation combines the i-th sample downsampled feature image with the i-th sample upsampled feature image before each upsampling operation, so that the features in the i-th sample downsampled feature image are carried forward to the next layer as much as possible.
In a third aspect, an embodiment of the present application provides a screen detecting apparatus, including:
the device comprises an acquisition module, a detection module and a processing module, wherein the acquisition module is used for acquiring a feature vector of each pixel point in a first image, and the first image is an image obtained by shooting a screen to be detected;
and the classification module is used for classifying the pixel points in the first image according to the feature vector of each pixel point in the first image and obtaining the detection result of the screen to be detected according to the classification result.
In a possible implementation manner, the obtaining module is specifically configured to:
determining a plurality of three-dimensional characteristic images according to the first image, wherein the first image comprises M pixel points in the transverse direction, the first image comprises N pixel points in the longitudinal direction, each three-dimensional characteristic image comprises M pixel points in the transverse direction, each three-dimensional characteristic image comprises N pixel points in the longitudinal direction, the number of channels of each three-dimensional characteristic image is C, and C is a preset category number;
and acquiring a feature vector of each pixel point in the first image according to the plurality of three-dimensional feature images.
In a possible implementation manner, the obtaining module is specifically configured to:
performing multiple feature extraction processing on the first image to obtain multiple three-dimensional feature images;
the times of feature extraction operations included in each two times of feature extraction processing are different, and the feature extraction operations include convolution operations and sampling operations.
In a possible implementation manner, for any one time of feature extraction processing, the obtaining module is specifically configured to:
performing convolution operation and downsampling operations on the first image to obtain K downsampled feature images, where the size of the i-th downsampled feature image is M/2^(i-1) x N/2^(i-1), with i taking 1, 2, …, K in sequence;
and performing convolution operation and up-sampling operation according to the K down-sampling feature images to obtain the three-dimensional feature image.
In a possible implementation manner, the obtaining module is specifically configured to:
performing convolution operation on the first image to obtain a first downsampling characteristic image;
and sequentially performing i times of the first operation on the first downsampling feature image to obtain the (i+1)-th downsampling feature image, where i takes 1, 2, …, K-1 in sequence, and the first operation comprises a downsampling operation and a convolution operation.
In a possible implementation manner, the obtaining module is specifically configured to:
performing downsampling operation, convolution operation and upsampling operation on the Kth downsampling feature image to obtain a Kth upsampling feature image;
sequentially carrying out a merging operation, a convolution operation and an upsampling operation on the i-th upsampled feature image and the i-th downsampled feature image to obtain the (i-1)-th upsampled feature image, where i takes K, K-1, …, 2 in sequence;
and performing convolution operation on the first up-sampling image to obtain the three-dimensional characteristic image.
In a possible implementation manner, the obtaining module is specifically configured to:
determining a target three-dimensional characteristic image according to pixel values of pixels in each three-dimensional characteristic image, wherein the target three-dimensional characteristic image comprises M pixels in the transverse direction, the target three-dimensional image comprises N pixels in the longitudinal direction, and the number of channels of the target three-dimensional characteristic image is C;
and determining the characteristic vector of each pixel point in the first image according to the pixel value of each pixel point in the target three-dimensional characteristic image.
In a possible implementation manner, the obtaining module is specifically configured to:
determining the pixel values of the M x N pixel points of the x-th channel of the target three-dimensional feature image according to the pixel values of the M x N pixel points of the x-th channel in each three-dimensional feature image, where x takes 1, 2, …, C in sequence;
wherein, the pixel value of the (a, b) th pixel point in the x-th channel of the target three-dimensional characteristic image is: the maximum value of pixel values of (a, b) th pixel points in the x-th channel of the three-dimensional feature images is obtained, wherein a is a positive integer smaller than or equal to M, and b is a positive integer smaller than or equal to N.
In a possible implementation manner, for the (a, b) th pixel point in the first image, the obtaining module is specifically configured to:
and determining the characteristic vector of the (a, b) th pixel point in the first image according to the value of the (a, b) th pixel point in the C channels of the target three-dimensional characteristic image.
In a possible implementation manner, the obtaining module is specifically configured to:
inputting the first image into a feature extraction model to obtain a feature vector of each pixel point in the first image, wherein the feature extraction model is obtained by learning multiple groups of first samples, each group of first samples comprises a first sample image and the feature vector of the pixel point on the first sample image, and the first sample image is an image obtained by shooting a first sample screen.
In a possible implementation manner, the classification module is specifically configured to:
inputting the feature vectors of the pixel points in the first image into a preset model to obtain the category of each pixel point in the first image, wherein the preset model is obtained by learning multiple groups of second samples, each group of second samples comprises the feature vectors and labeling information of the pixel points on the second sample image, the second sample image is an image obtained by shooting a second sample screen, the labeling information is information labeling the categories of the pixel points on the second sample image, the category of any pixel point in the first image is one of preset categories, and the number of the preset categories is C;
and obtaining a detection result of the screen to be detected according to the category of each pixel point in the first image.
In a possible implementation manner, the classification module is specifically configured to:
inputting the feature vectors of the pixel points in the first image into the preset model to obtain C-1 first output images and a second output image, where each first output image indicates one defect type, the gray value of the (c, d)-th pixel point in a first output image i is used to indicate the probability that the (c, d)-th pixel point in the first image has the defect type indicated by that first output image i, the gray value of the (c, d)-th pixel point in the second output image is used to indicate the probability that the (c, d)-th pixel point in the first image is normal, c is a positive integer less than or equal to M, and d is a positive integer less than or equal to N;
and obtaining the category of each pixel point in the first image according to the gray value of each pixel point on the C-1 first output images and the gray value of each pixel point on the second output image.
In a possible implementation manner, the classification module is specifically configured to:
for the (c, d)-th pixel point in the first image, acquiring the gray value of the (c, d)-th pixel point on each of the C-1 first output images and the gray value of the (c, d)-th pixel point on the second output image;
determining a pixel point with the maximum gray value according to the gray value of the (c, d) th pixel point on each first output image and the gray value of the (c, d) th pixel point on the second output image;
if the image where the pixel point with the maximum gray value is located is the second output image, determining that the category of the (c, d) th pixel point in the first image is normal;
otherwise, determining the defect type indicated by the first output image where the pixel point with the maximum gray value is located as the type of the (c, d) th pixel point in the first image.
In a fourth aspect, an embodiment of the present application provides a screen detection model training apparatus, including:
the training module is used for acquiring a training sample, wherein the training sample comprises a sample image and marking information of pixel points in the sample image, and the marking information of the pixel points in the sample image is information marking the types of the pixel points in the sample image;
the processing module is used for inputting the sample image into a screen detection model to obtain the training output category of the pixel points in the sample image;
and the adjusting module is used for adjusting the parameters of the screen detection model according to the training output categories of the pixels in the sample image and the labeling information of the pixels in the sample image until the error between the training output categories of the pixels in the sample image and the labeling information of the pixels in the sample image is less than or equal to a preset error, so that the trained screen detection model is obtained.
In one possible implementation, the screen detection model includes a feature extraction network and a classification network; the processing module is specifically configured to:
carrying out feature extraction processing on the sample image according to the feature extraction network to obtain a plurality of sample three-dimensional feature images, wherein the sample image comprises M pixel points in the transverse direction, the sample image comprises N pixel points in the longitudinal direction, each sample three-dimensional feature image comprises M pixel points in the transverse direction, each sample three-dimensional feature image comprises N pixel points in the longitudinal direction, the number of channels of each sample three-dimensional feature image is C, and C is the number of categories for classifying the pixel points in the sample image;
and processing the plurality of sample three-dimensional characteristic images according to the classification network to obtain the training output categories of the pixel points in the sample images.
In a possible implementation manner, the processing module is specifically configured to:
carrying out multiple times of feature extraction processing on the sample images according to the feature extraction network to obtain a plurality of sample three-dimensional feature images;
the times of feature extraction operations included in each two times of feature extraction processing are different, and the feature extraction operations include convolution operations and sampling operations.
In one possible implementation, the feature extraction network includes a convolutional layer, a pooling layer, and an upsampling layer; for any one time of feature extraction processing, the processing module is specifically configured to:
performing convolution operation and down-sampling operations on the sample image according to the convolutional layer and the pooling layer to obtain K sample downsampled feature images, where the size of the i-th sample downsampled feature image is M/2^(i-1) x N/2^(i-1), with i taking 1, 2, …, K in sequence;
and performing convolution operation and up-sampling operation on the K sample down-sampling feature images according to the convolution layer and the up-sampling layer to obtain the sample three-dimensional feature images.
In a possible implementation manner, the processing module is specifically configured to:
performing convolution operation on the sample image according to the convolution layer to obtain a first sample downsampling characteristic image;
and sequentially executing i times of the first operation on the first sample downsampling feature image according to the convolutional layer and the pooling layer to obtain the (i+1)-th sample downsampling feature image, where i takes 1, 2, …, K-1 in sequence, and the first operation comprises a downsampling operation and a convolution operation.
In a possible implementation manner, the processing module is specifically configured to:
performing down-sampling operation, convolution operation and up-sampling operation on the K sample down-sampling feature image according to the pooling layer, the convolution layer and the up-sampling layer to obtain a K sample up-sampling feature image;
sequentially carrying out a merging operation, a convolution operation and an upsampling operation on the i-th sample upsampled feature image and the i-th sample downsampled feature image according to the convolutional layer and the upsampling layer to obtain the (i-1)-th sample upsampled feature image, where i takes K, K-1, …, 2 in sequence;
and performing convolution operation on the first sample up-sampling image according to the convolution layer to obtain the sample three-dimensional characteristic image.
In a fifth aspect, an embodiment of the present application provides a screen detecting apparatus, including: a memory storing a computer program and a processor running the computer program to perform the screen detection method according to any one of the first aspect.
In a sixth aspect, an embodiment of the present application provides a screen detection model training apparatus, including: a memory storing a computer program and a processor running the computer program to perform the screen detection model training method according to any one of the second aspect.
In a seventh aspect, an embodiment of the present application provides a screen detecting system, including an image capturing device and a screen detecting device, where:
the image acquisition equipment is used for shooting a screen to be detected to obtain a first image and sending the first image to the screen detection equipment;
the screen detection device is configured to process the first image according to the method of any one of the first aspect to obtain a detection result of the screen to be detected.
In an eighth aspect, embodiments of the present application provide a computer-readable storage medium comprising a computer program which, when executed by one or more processors, implements the screen detection method of any one of the first aspects, or implements the screen detection model training method of any one of the second aspects.
According to the screen detection and screen detection model training methods, apparatuses and devices provided by the embodiments of the application, the feature vector of each pixel point in the first image is first acquired, and the pixel points in the first image are then classified according to these feature vectors to obtain the classification result. Because the first image is an image obtained by shooting the screen to be detected, the detection result of the screen to be detected can be obtained from the classification result. The detection operates on individual pixel points of the first image, with an independent judgment based on each pixel point's feature vector, so pixel-level segmentation of the first image can be achieved; pixel-level tiny screen defects in the screen to be detected can therefore be detected, and the accuracy of screen defect detection is improved.
Drawings
FIG. 1 is a schematic diagram of a hybrid pixel provided in an embodiment of the present application;
FIG. 2 is a first diagram illustrating a screen inspection method provided by the prior art;
FIG. 3 is a diagram illustrating a second screen inspection method provided in the prior art;
fig. 4 is a schematic view of an application scenario of a screen detection method according to an embodiment of the present application;
fig. 5 is a schematic flowchart of a method for obtaining feature vectors of pixel points according to an embodiment of the present disclosure;
fig. 6 is a schematic diagram of feature extraction processing provided in the embodiment of the present application;
FIG. 7 is a schematic diagram of a convolution operation provided in an embodiment of the present application;
FIG. 8 is a schematic diagram of a downsampling operation provided by an embodiment of the present application;
fig. 9 is a schematic diagram of an upsampling operation provided in an embodiment of the present application;
FIG. 10 is a schematic diagram of determining a target three-dimensional feature image according to an embodiment of the present application;
fig. 11 is a schematic flowchart of another method for obtaining feature vectors of pixel points according to an embodiment of the present disclosure;
fig. 12 is a schematic flowchart of classifying pixel points in a first image according to an embodiment of the present disclosure;
FIG. 13 is a schematic diagram of an output image provided by an embodiment of the present application;
fig. 14 is a first schematic view illustrating defect type detection of a pixel point according to an embodiment of the present disclosure;
fig. 15 is a schematic diagram illustrating defect type detection of a pixel point according to an embodiment of the present application;
fig. 16 is a schematic flowchart of a screen detection method according to an embodiment of the present application;
FIG. 17 is a diagram illustrating a screen detection module according to an embodiment of the present application;
FIG. 18 is a flowchart illustrating a screen inspection model training method according to an embodiment of the present disclosure;
FIG. 19 is a schematic diagram of a training sample provided in an embodiment of the present application;
FIG. 20 is a schematic structural diagram of a screen inspection model according to an embodiment of the present application;
FIG. 21 is a schematic diagram of a primary feature extraction provided in an embodiment of the present application;
FIG. 22 is a first image provided by an embodiment of the present application;
FIG. 23A is a schematic view of line defect detection provided in the embodiments of the present application;
FIG. 23B is a first schematic view illustrating point defect detection according to an embodiment of the present disclosure;
FIG. 23C is a schematic diagram illustrating a point defect detection according to an embodiment of the present application;
fig. 24 is a schematic structural diagram of a screen detecting device according to an embodiment of the present application;
FIG. 25 is a schematic structural diagram of a screen test model training apparatus according to an embodiment of the present disclosure;
fig. 26 is a schematic structural diagram of a screen inspection system according to an embodiment of the present application;
fig. 27 is a schematic hardware structure diagram of a screen detecting device according to an embodiment of the present application;
fig. 28 is a schematic hardware structure diagram of a screen inspection model training apparatus according to an embodiment of the present application.
Detailed Description
First, concepts related to the present application are explained.
TFT: Thin Film Transistor.
LCD: Liquid Crystal Display; here, an active-matrix liquid crystal display driven by TFTs.
OLED: Organic Light-Emitting Diode.
Screen Defect: Display screens are typically composed of multiple layers of material and substrates bonded together. It is almost impossible to bond all of these layers with absolute precision every time, so seams, misalignments, contaminants, bubbles or other imperfections may be introduced. For example, defects on LCDs may include: impurities or impurity particles in the liquid crystal matrix, uneven distribution of the LCD matrix during manufacturing, uneven TFT thickness, uneven spacing between the substrates, uneven brightness distribution of the backlight source, LCD panel defects, and the like. These factors can cause light to pass through the display unevenly, which degrades the user experience; this phenomenon is a screen defect.
Image Segmentation: The process of subdividing a digital image into a plurality of image sub-regions (sets of pixel points).
Defect Detection (Defect Inspection): Locating and classifying the defects of an article; the defect detection in this application is aimed at defects on a screen.
mura defect: A common visual defect in TFT-LCDs, referring to traces of various shapes caused by uneven brightness of the display; it appears as a low-contrast, unevenly bright region with fuzzy edges, and its area is usually larger than one pixel point.
Pixel: Also called a picture element or pel, the smallest unit that makes up a digitized image.
Mixed pixel (mixed picture element): The image signals obtained by a sensor are recorded in units of picture elements. If a picture element contains only one substance type, it is called a pure picture element; in most cases, however, a picture element contains several substance types and is then a mixed picture element. For a liquid crystal display, for example, the image is formed by activating liquid crystals of different brightness, each liquid crystal corresponding to one picture element of the display. When a camera photographs the liquid crystal display, each pixel point of the captured image covers several liquid crystals of the display, i.e. corresponds to several pixel points of the display, and those display pixel points are the multiple substance types contained in one picture element of the camera image. The mixed picture element is described below with reference to FIG. 1.
Fig. 1 is a schematic diagram of a mixed picture element provided in an embodiment of the present application. As shown in Fig. 1, it includes a first imaging 11 of the screen to be detected and a second imaging 12 of the camera. In the first imaging 11, each box is one picture element, i.e. one pixel point, of the screen to be detected. For ease of illustration, each picture element in the first imaging 11 is filled differently in Fig. 1.
In the second imaging 12, each box is one picture element of the camera, i.e. one pixel point. Because the screen to be detected has its own resolution and the camera has its own resolution, when the two resolutions differ, the number of pixel points of the screen and the number of pixel points of the camera imaging differ for the same imaging size.
The screen to be detected is photographed with the camera to obtain the image 13. It can be seen that, when the resolution of the screen to be detected and the resolution of the camera differ, one pixel point in the image 13 corresponds to several pixel points of the first imaging 11. Taking the pixel point 14 in the center of the image 13 as an example, the pixel point 14 corresponds to the imaging of four pixel points of the first imaging 11; that is, one pixel point in the image 13 contains several substance types, which is the mixed-pixel problem.
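A small numeric illustration of the mixed-pixel situation (the resolutions below are made up for the example, not taken from the application):

# If a screen area of 4000 x 2000 screen pixel points is imaged onto
# 2000 x 1000 camera pixel points, each camera pixel point mixes a 2 x 2 block
# of screen pixel points, which is the situation shown in Fig. 1.
screen_w, screen_h = 4000, 2000   # screen pixel points covered by the shot (assumed)
camera_w, camera_h = 2000, 1000   # camera pixel points covering the same area (assumed)
print((screen_w / camera_w) * (screen_h / camera_h))   # 4.0 screen pixel points per camera pixel point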
Precision: Precision measures the proportion of detected screen defects that are real defects.
Recall: Recall measures the proportion of real screen defects that are detected.
Halcon: An algorithm library that provides various screen detection algorithms.
The screen production process is complex and production volumes are extremely large, so it is difficult to avoid defective products flowing into mobile phones, televisions and other devices. Meanwhile, when devices with screens, such as mobile phones and televisions, are assembled, the screen may be accidentally damaged because of the assembly process. Therefore, during the manufacture of such devices, defect detection must be performed on the screen both before and after the screen assembly step, and only screens that pass the test flow into the next process, so that defective products are kept out of subsequent processes. If problem screens are not detected in time, defective screens reach the market inside finished mobile phones, televisions and other devices, which greatly affects the use of these devices.
Fig. 2 is a schematic diagram of a screen inspection method provided in the prior art. As shown in Fig. 2, because the gray levels of a defective pixel point and its surrounding pixel points are uneven, gradients are computed to segment defective pixel points. Specifically, an image is input and preprocessed, a gradient image is computed, connected domains and their areas are found, and the result is finally compared with a preset threshold to judge whether each pixel point is a defective pixel point.
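A rough sketch of such a gradient-based pipeline, written with OpenCV and illustrative threshold values, is given below; it only illustrates the steps named above and is not the exact prior-art algorithm.

import cv2
import numpy as np

def gradient_based_defects(image_gray, grad_thresh=30, min_area=2, max_area=5000):
    """Fig. 2 style pipeline: gradient image, connected regions, area thresholds."""
    blurred = cv2.GaussianBlur(image_gray, (5, 5), 0)            # preprocessing
    gx = cv2.Sobel(blurred, cv2.CV_32F, 1, 0)
    gy = cv2.Sobel(blurred, cv2.CV_32F, 0, 1)
    grad = cv2.magnitude(gx, gy)                                  # gradient image
    mask = (grad > grad_thresh).astype(np.uint8)
    n, labels, stats, _ = cv2.connectedComponentsWithStats(mask)  # connected domains
    defects = []
    for k in range(1, n):                                         # label 0 is the background
        area = stats[k, cv2.CC_STAT_AREA]
        if min_area <= area <= max_area:                          # compare with preset thresholds
            defects.append(stats[k])                              # x, y, w, h, area of the region
    return defects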
The main drawbacks of the scheme illustrated in Fig. 2 are as follows. First, the difference between a defective pixel point and its surrounding pixel points is sometimes small, so it is difficult to find a threshold that ensures both high precision and high recall. Second, the connected-region areas of different defect types differ enormously: defects range from 1-2 pixel points to thousands of pixel points, and even defects of the same type (such as scratches or line defects) have no fixed area, so a suitable threshold giving high precision and recall is hard to find.
Fig. 3 is a second schematic diagram of a screen inspection method provided in the prior art. As shown in Fig. 3, the screen is inspected by comparing a normal image with the defective image. Specifically, a picture is input and converted into the frequency domain by a Fourier transform. Defects appear as high-frequency noise, which can be removed by frequency-domain filtering; a defect-free image is then reconstructed by an inverse Fourier transform, and defects, including point mura, area mura and line mura defects, are found by comparing the input image with the reconstructed defect-free image.
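The Fourier-reconstruction idea can likewise be sketched with NumPy; the low-pass radius and the difference threshold are illustrative assumptions.

import numpy as np

def fourier_reconstruction_defects(image_gray, keep_radius=40, diff_thresh=15):
    """Fig. 3 style pipeline: low-pass the spectrum, rebuild a defect-free image,
    and flag pixel points that differ strongly from it."""
    f = np.fft.fftshift(np.fft.fft2(image_gray.astype(np.float32)))
    rows, cols = image_gray.shape
    y, x = np.ogrid[:rows, :cols]
    low_pass = (y - rows / 2) ** 2 + (x - cols / 2) ** 2 <= keep_radius ** 2
    filtered = f * low_pass                                        # drop high-frequency "noise"
    defect_free = np.abs(np.fft.ifft2(np.fft.ifftshift(filtered)))  # reconstructed image
    return np.abs(image_gray - defect_free) > diff_thresh          # candidate defect pixel points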
In recent years, screen resolutions have become higher and higher and the liquid crystal arrangement of LCD panels more and more complex. The pixel pitch of the inspection camera, projected onto the screen, is generally larger than the liquid crystal cells, which easily causes the mixed-pixel problem during imaging, so the same screen shows obvious texture features. Defective pixels then appear only as local non-uniformity, their gray level may differ little from other, normal regions, and the defects are difficult to separate in the frequency domain. The screen detection accuracy of this scheme is therefore low and cannot meet production requirements.
Due to the production process and the pursuit of ultra-high resolution and ultra-high image quality, screen quality is controlled strictly. The existing screen detection schemes still suffer from low accuracy, with high false detection and missed detection rates, so manual inspection has to be added in order not to affect product quality. Manual inspection not only reduces production efficiency but also suffers from strong subjectivity, inconsistent standards, increased cost and other disadvantages.
In order to overcome the above disadvantages and improve the accuracy and recall of automatic inspection, an embodiment of the present application provides a screen detection method that automatically detects screen defects and locates, classifies and labels the positions of the defects.
An application scenario of the embodiment of the present application is described below with reference to fig. 4.
Fig. 4 is a schematic view of an application scenario of the screen detection method according to the embodiment of the present application. As shown in fig. 4, the scenario includes a conveyor belt 41, a screen 42, a vision camera 43, a robot arm 44 and a client 45, where the screen 42 is placed on the conveyor belt 41 and moves along with the movement of the conveyor belt 41.
The vision camera 43 is used to photograph the screen 42, and when the screen 42 moves to a predetermined position along with the conveyor belt 41, the vision camera 43 photographs an image of the screen 42 and then transmits the photographed image to the client 45. The client 45 detects defects on the screen 42 from the image sent by the vision camera 43. When a defect is detected on the screen 42, the screen is determined to be defective, and the robot arm 44 is controlled to intercept the defective screen and prevent the defective screen from entering the next process. If the client detects no defects on screen 42, no control instructions are sent to robotic arm 44 to intercept screen 42.
In the scenario illustrated in fig. 4, the vision camera 43 and the client 45 are two independent devices. In some scenarios, the vision camera 43 and the client 45 may be integrated into a single device that has both an image capturing function and sufficient processing and computing capability.
Since the screen 42 needs to be detected for defects, the vision camera 43 needs to take a corresponding image of the screen 42. When the image captured by the vision camera 43 includes other regions besides the region corresponding to the screen 42, the image needs to be preprocessed to remove the other regions and only reserve the region corresponding to the screen 42, and then the preprocessed image is analyzed to determine whether a screen defect exists on the screen 42.
The application scenario illustrated in fig. 4 can be applied during the screen manufacturing process, or before and after the screen assembly process of a device with a screen, in order to control screen quality. Detecting screen defects before assembly catches defective screens caused by process problems: the detected defective screens are intercepted before assembly, only screens without detected defects are assembled, and defect detection is performed again after assembly is finished. A screen in which no defect was detected before assembly but a defect is detected after assembly can be determined to have been damaged during screen assembly.
When defect detection is performed on the screen 42, the screen 42 may be in a screen-off state or a screen-on state. When the screen 42 is in the screen-off state, the liquid crystal molecules in the screen 42 are not activated, and the detection mainly checks whether there are scratches on the screen 42. When the screen 42 is in the screen-on state, the liquid crystal molecules in the screen 42 are activated; at this time, an image of the screen 42 can be captured by the vision camera 43, and point defects, line defects, light leakage defects and the like in the screen 42 can be detected. The following embodiments of the present application are described by taking as an example the case where the vision camera 43 photographs the screen 42 while the screen 42 is in the screen-on state.
The technical means shown in the present application will be described in detail below with reference to specific examples. It should be noted that the following embodiments may exist independently or may be combined with each other, and description of the same or similar contents is not repeated in different embodiments.
For ease of understanding, two ways of obtaining the feature vector of each pixel point in the first image are first introduced. Figs. 5-10 illustrate one embodiment of obtaining the feature vector of a pixel point, and fig. 11 illustrates another embodiment.
Fig. 5 is a schematic flowchart of a method for obtaining a feature vector of a pixel according to an embodiment of the present application, and as shown in fig. 5, the method may include:
s51, determining a plurality of three-dimensional feature images from the first image.
The first image is an image obtained by shooting a screen to be detected. Optionally, when other areas except for the screen to be detected are shot in the shot first image, the shot first image may be preprocessed, and only the image area related to the screen to be detected is reserved.
The size of the first image is M × N, that is, the first image comprises M pixel points in the transverse direction and N pixel points in the longitudinal direction, where M and N are positive integers greater than 0. In the plurality of three-dimensional feature images, each three-dimensional feature image also comprises M pixel points in the transverse direction and N pixel points in the longitudinal direction, and the number of channels of each three-dimensional feature image is C, where C is the preset number of categories. The preset category number C is a preset value; after the pixel points in the first image are subsequently classified, the category of any pixel point in the first image is one of the C preset categories.
Optionally, the first image may be subjected to multiple feature extraction processes to obtain the plurality of three-dimensional feature images, where any two feature extraction processes include different numbers of feature extraction operations, and the feature extraction operations include convolution operations and sampling operations.
Performing feature extraction processing on the first image different numbers of times extracts features of the first image at different scales, and the number of times the feature extraction processing is performed on the first image can be set according to actual needs, which is not particularly limited in the present application. A single feature extraction process is explained below.
The number of times the feature extraction processing is performed on the first image may be determined according to actual needs; for example, feature extraction may be performed on the first image 1 time, 2 times or 3 times. For any one feature extraction process, a convolution operation and a down-sampling operation are first performed on the first image to obtain K down-sampled feature images, where the size of the i-th down-sampled feature image is (M/2^(i-1)) × (N/2^(i-1)), with i taking 1, 2, ..., K in sequence. Then, a convolution operation and an up-sampling operation are performed according to the K down-sampled feature images to obtain one three-dimensional feature image, where K is a positive integer whose value may be preset.
The feature extraction process will be exemplified below with reference to fig. 6.
Fig. 6 is a schematic diagram of feature extraction processing according to an embodiment of the present application, and as shown in fig. 6, a convolution operation is first performed on a first image to obtain a first downsampled feature image.
Then, a first operation is performed i times on the first down-sampled feature image to obtain the (i+1)-th down-sampled feature image, with i taking 1, 2, ..., K-1 in sequence, where the first operation includes a down-sampling operation and a convolution operation. In fig. 6, one down-sampling operation is first performed on the first down-sampled feature image to obtain a first scale image, and a convolution operation is then performed on the first scale image to obtain the second down-sampled feature image; after that, one down-sampling operation is performed on the second down-sampled feature image to obtain a second scale image, and one convolution operation is performed on the second scale image to obtain the third down-sampled feature image. In fig. 6, K is 3; the size of the first down-sampled feature image equals the size of the first image, i.e. M × N, the size of the second down-sampled feature image is (M/2) × (N/2), and the size of the third down-sampled feature image is (M/4) × (N/4).
After K down-sampling feature images are obtained, performing down-sampling operation, convolution operation and up-sampling operation on the K down-sampling feature image to obtain a K up-sampling feature image. As shown in fig. 6, a downsampling operation is performed on the third downsampled feature image to obtain a third scale image, a convolution operation is performed on the third scale image to obtain a fourth downsampled feature image, and an upsampling operation is performed on the fourth downsampled feature image to obtain a third upsampled feature image.
A merging operation, a convolution operation and an up-sampling operation are then performed in sequence on the i-th up-sampled feature image and the i-th down-sampled feature image to obtain the (i-1)-th up-sampled feature image, with i taking K, K-1, ..., 2 in sequence. As shown in fig. 6, the third up-sampled feature image and the third down-sampled feature image are subjected to a merging operation and a convolution operation to obtain a second convolution-merged image, and the second convolution-merged image is then subjected to an up-sampling operation to obtain the second up-sampled feature image. Performing a merging operation on the third down-sampled feature image and the third up-sampled feature image means splicing the two images along the channel dimension. For example, if the size of the third down-sampled feature image is X × Y × C1 and the size of the third up-sampled feature image is X × Y × C2, the size of the merged image is X × Y × (C1+C2).
And then carrying out merging operation and convolution operation on the second down-sampling characteristic image and the second up-sampling characteristic image to obtain a first convolution merged image, and then carrying out up-sampling operation on the first convolution merged image to obtain a first up-sampling characteristic image.
A convolution operation is then performed according to the first up-sampled feature image to obtain a three-dimensional feature image. As shown in fig. 6, the first down-sampled feature image and the first up-sampled feature image are finally subjected to a merging operation and a convolution operation to obtain the three-dimensional feature image.
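The feature extraction flow of fig. 6 (convolution, K rounds of down-sampling, then up-sampling with merging of same-scale feature images) can be illustrated with a short sketch. The following PyTorch code is a minimal sketch assuming K = 3, a 3-channel input and C = 4 preset categories; the layer widths, ReLU activations, max pooling and nearest-neighbour interpolation are illustrative choices, not details specified by this embodiment.

```python
# A minimal sketch of the fig. 6 feature extraction flow (K = 3).
import torch
import torch.nn as nn
import torch.nn.functional as F

def conv_block(in_ch, out_ch):
    return nn.Sequential(nn.Conv2d(in_ch, out_ch, 3, padding=1), nn.ReLU())

class FeatureExtractor(nn.Module):
    def __init__(self, in_ch=3, c_classes=4, width=16):
        super().__init__()
        w = width
        self.down1 = conv_block(in_ch, w)            # 1st down-sampled feature image (M x N)
        self.down2 = conv_block(w, 2 * w)            # 2nd (M/2 x N/2)
        self.down3 = conv_block(2 * w, 4 * w)        # 3rd (M/4 x N/4)
        self.bottom = conv_block(4 * w, 8 * w)       # after a further down-sampling
        self.up2 = conv_block(8 * w + 4 * w, 4 * w)  # merged with 3rd down-sampled image
        self.up1 = conv_block(4 * w + 2 * w, 2 * w)  # merged with 2nd down-sampled image
        self.out = nn.Conv2d(2 * w + w, c_classes, 3, padding=1)  # merged with 1st

    def forward(self, x):
        d1 = self.down1(x)                                   # M x N
        d2 = self.down2(F.max_pool2d(d1, 2))                 # M/2 x N/2
        d3 = self.down3(F.max_pool2d(d2, 2))                 # M/4 x N/4
        d4 = self.bottom(F.max_pool2d(d3, 2))                # M/8 x N/8
        u3 = F.interpolate(d4, scale_factor=2)               # 3rd up-sampled feature image
        u2 = F.interpolate(self.up2(torch.cat([u3, d3], 1)), scale_factor=2)
        u1 = F.interpolate(self.up1(torch.cat([u2, d2], 1)), scale_factor=2)
        return self.out(torch.cat([u1, d1], 1))              # C-channel feature image, M x N

# Usage: a 3-channel 64 x 64 image yields a C-channel feature image of the same size.
features = FeatureExtractor()(torch.randn(1, 3, 64, 64))
print(features.shape)  # torch.Size([1, 4, 64, 64])
```

The torch.cat calls correspond to the merging operations of fig. 6, which splice same-scale down-sampled and up-sampled feature images along the channel dimension.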
In fig. 6, the convolution operation and the sampling operation on the image are involved, and the convolution operation and the sampling operation on the image will be described below with reference to fig. 7 to 9, respectively.
Fig. 7 is a schematic diagram of a convolution operation provided in an embodiment of the present application. As shown in fig. 7, the convolution operation involves the first image on the left and a convolution kernel on the right. The first image in the example of fig. 7 is a three-channel image comprising the three RGB channels; its size is 8 × 8 × 3, where 8 × 8 is the length and width of the first image, indicating that the first image includes 8 pixel points in both the transverse and longitudinal directions, and 3 indicates the three channels of the first image. Each pixel point of the first image has a corresponding pixel value on each of the three channels, and fig. 7 only shows the 9 pixel values at the upper left corner of one channel of the first image.
The size of the convolution kernel in the example of fig. 7 is 3 × 3 × 3, meaning that the length and width of the convolution kernel are both 3 and its depth is also 3; the depth of a convolution kernel used to convolve the first image must equal the number of channels of the first image.
The first image is convolved according to the convolution kernel, which comprises three 3 × 3 weight matrices, one of which is shown in fig. 7. Because the pixel point at the upper left corner of the first image has no pixel points around it on two sides, pixel points with pixel value 0 can be filled around it; then, with the center of the weight matrix in the convolution kernel aligned with the upper-left pixel point of the first image, the elements at corresponding positions are multiplied and added to obtain the value of the first image after convolution processing on that channel. For example, in fig. 7, the weight matrix whose rows are (1, 0, -1), (1, 0, -1) and (1, 0, -1) (as can be read off from the computation below) is applied to the pixels in the black frame in the first image:
0*1+0*0+0*(-1)+0*1+2*0+3*(-1)+0*1+3*0+1*(-1)=-4。
fig. 7 illustrates processing performed on a pixel in the first image according to a weight matrix in the convolution kernel, and when the first image includes multiple channels, similar processing may be performed on the pixel of each channel in the first image according to the weight matrix in the convolution kernel to obtain a first downsampled feature image. It should be noted that, in fig. 7, the pixel value of the pixel point in the first image and the weight matrix in the convolution kernel are both examples, and do not constitute a limitation to the pixel value of the pixel point in the first image and the convolution kernel.
Optionally, the convolution operation may use one or more convolution kernels to convolve the first image. If there is one convolution kernel, the obtained first down-sampled feature image has 1 channel; if there are multiple convolution kernels, the first down-sampled feature image has multiple channels, and its number of channels equals the number of convolution kernels.
The process of convolving a first image to obtain a first downsampled feature image is illustrated in fig. 7, and is similar to the example of fig. 7 when convolving other images. When convolution operations are performed on different images, the selected convolution kernels may be different, and the number of the convolution kernels may also be different.
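As a check on the arithmetic above, the following NumPy sketch applies the weight matrix with rows (1, 0, -1), (1, 0, -1), (1, 0, -1) to the zero-padded upper-left corner of one channel by multiplying and adding elements at corresponding positions; only the four pixel values that appear in the computation above are used, and the rest of the channel is omitted for brevity.

```python
# A small check of the fig. 7 convolution step on one channel with zero padding.
import numpy as np

def conv2d_single_channel(channel, weights):
    """Apply a 3 x 3 weight matrix to one channel using zero padding."""
    padded = np.pad(channel, 1)              # fill surrounding pixels with 0
    h, w = channel.shape
    out = np.zeros((h, w))
    for y in range(h):
        for x in range(w):
            out[y, x] = np.sum(padded[y:y + 3, x:x + 3] * weights)
    return out

# The 2 x 2 patch at the upper-left corner of the channel used in the text.
channel = np.array([[2.0, 3.0], [3.0, 1.0]])
weights = np.array([[1, 0, -1], [1, 0, -1], [1, 0, -1]])
print(conv2d_single_channel(channel, weights)[0, 0])  # -4.0, matching the computation above
```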
Fig. 8 is a schematic diagram of a downsampling operation according to an embodiment of the present application, and as shown in fig. 8, the downsampling operation includes a first downsampled feature image 81, where the first downsampled feature image 81 is obtained by performing a convolution operation on a first image, and fig. 8 is described by taking only the downsampling operation of the first downsampled feature image 81 as an example.
The first down-sampled feature image 81 is an 8 × 8 image, and includes 64 pixels, and the pixel value of each pixel is labeled in fig. 8. In fig. 8, only the pixel values of one channel per pixel point in the first downsampled feature image 81 are indicated, and if the first downsampled feature image 81 includes a plurality of channels, the pixel values of each channel may be processed in the same manner.
A downsampling operation is performed once on the first downsampled feature image 81, and in fig. 8, every 4 pixels of the first downsampled feature image 81 are converted into one pixel in the first scale image 82. In the downsampling operation, the pixel values of the pixels in the first scale image 82 can be obtained from the pixel values of every four pixels in the first downsampled feature image 81. For example, an average value may be obtained for every four pixel points in the first down-sampling feature image 81, so as to obtain a pixel value of one pixel point in the first scale image 82; the maximum value of every four pixel points in the first down-sampled feature image 81 may also be used as the pixel value of the corresponding pixel point in the first scale image 82. Fig. 8 illustrates that the maximum value of every four pixel points in the first down-sampling feature image 81 is used as the pixel value of the corresponding pixel point in the first scale image 82, so as to obtain the first scale image 82.
For example, in fig. 8, the pixel values of four pixel points at the upper left corner of the first downsampled feature image 81 are sequentially 100, 120, 210, and 110, and the maximum value among the pixel values of the four pixel points is 210, then the pixel value of the corresponding pixel point at the upper left corner in the first scale image 82 at this time is 210.
According to the above method, the pixel value of each pixel point in the first scale image 82 is obtained and marked in fig. 8. It can be seen that the first down-sampled feature image 81 is an 8 × 8 image, and the first scale image 82 obtained by the above conversion is a 4 × 4 image.
Fig. 8 illustrates a process of performing a down-sampling operation on a first down-sampled feature image to obtain a first scale image. After the first scale image is obtained, a second downsampled feature image can be obtained by performing convolution processing on the first scale image, and the convolution process is similar to the convolution operation illustrated in fig. 7. After the second downsampling feature image is obtained, a process of obtaining a third downsampling feature image according to the second downsampling feature image until K downsampling feature images are obtained is similar to a process of obtaining the second downsampling feature image according to the first downsampling feature image, and details are not repeated here.
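A minimal NumPy sketch of this down-sampling operation, assuming the maximum of every four pixel points is kept (the averaging variant mentioned above would use the mean instead), is given below.

```python
# 2 x 2 max-pooling down-sampling as in fig. 8.
import numpy as np

def downsample_max(image):
    """Convert every non-overlapping 2 x 2 block into its maximum value."""
    h, w = image.shape
    blocks = image.reshape(h // 2, 2, w // 2, 2)
    return blocks.max(axis=(1, 3))

# The four upper-left pixels of the first down-sampled feature image in fig. 8
# are 100, 120, 210 and 110; the pooled pixel therefore takes the value 210.
corner = np.array([[100, 120], [210, 110]])
print(downsample_max(corner))  # [[210]]
```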
Fig. 9 is a schematic diagram of an upsampling operation provided in the embodiment of the present application, as shown in fig. 9, including a first convolution combined image 91 and a first upsampled feature image 92, where the first upsampled feature image 92 is obtained by performing an upsampling operation on the first convolution combined image 91. The first convolution merged image 91 is a2 x 2 image, each small box represents a pixel, and the pixel values of the pixels are shown in fig. 9. Taking the pixel point at the upper left corner of the first convolution merged image 91 as an example, the pixel point has a pixel value of 5.
After the upsampling operation is performed on the first convolution merged image 91, the pixel point at the upper left corner of the first convolution merged image 91 corresponds to the four pixel points at the upper left corner of the first upsampling feature image 92, and the pixel values of the four pixel points at the upper left corner of the first upsampling feature image 92 are related to the pixel values of the pixel point at the upper left corner of the first convolution merged image 91.
For example, a possible implementation manner is that the pixel values of four pixel points at the upper left corner of the first upsampled feature image 92 are all equal to the pixel values of pixel points at the upper left corner of the first convolution combined image 91, and are all 5, as shown in fig. 9, according to this manner, the pixel values of each pixel point on the first upsampled feature image 92 can be obtained.
It should be noted that the way of obtaining the pixel value of the pixel point after the upsampling operation in the example in fig. 9 is merely an example, and the actual way of obtaining the pixel value may be determined as needed. Fig. 9 illustrates a process of performing an upsampling operation on the first convolution combined image to obtain a first upsampled feature image, and the process of performing the upsampling operation on the ith convolution combined image to obtain the ith upsampled feature image is similar to the example of fig. 9 and is not described herein again.
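A minimal NumPy sketch of this up-sampling operation, assuming each pixel value is simply copied into a 2 × 2 block (the pixel values other than the upper-left value 5 are illustrative), is given below.

```python
# Nearest-neighbour up-sampling as in the fig. 9 example.
import numpy as np

def upsample_nearest(image):
    """Repeat every pixel twice along both axes."""
    return np.repeat(np.repeat(image, 2, axis=0), 2, axis=1)

merged = np.array([[5, 7], [3, 9]])   # illustrative 2 x 2 convolution-merged image
print(upsample_nearest(merged))
# [[5 5 7 7]
#  [5 5 7 7]
#  [3 3 9 9]
#  [3 3 9 9]]
```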
And S52, acquiring a feature vector of each pixel point in the first image according to the plurality of three-dimensional feature images.
Specifically, a target three-dimensional feature image is determined according to pixel values of pixel points in each three-dimensional feature image, the target three-dimensional feature image comprises M pixel points in the transverse direction, the target three-dimensional image comprises N pixel points in the longitudinal direction, and the number of channels of the target three-dimensional feature image is C. And then, determining a feature vector of each pixel point in the first image according to the pixel value of each pixel point in the target three-dimensional feature image.
Fig. 10 is a schematic diagram of a three-dimensional feature image for determining a target provided in the embodiment of the present application, and as shown in fig. 10, the three-dimensional feature image includes 3 three-dimensional feature images, which are a three-dimensional feature image 101, a three-dimensional feature image 102, and a three-dimensional feature image 103, where the number of channels of each three-dimensional feature image is 3, and each three-dimensional feature image includes 5 pixel points in the horizontal direction and 4 pixel points in the vertical direction.
The pixel values of the M × N pixel points of the x-th channel of the target three-dimensional feature image are determined according to the pixel values of the M × N pixel points of the x-th channel in each three-dimensional feature image, with x taking 1, 2, ..., C in sequence.
Specifically, the pixel value of the (a, b)-th pixel point in the x-th channel of the target three-dimensional feature image is the maximum of the pixel values of the (a, b)-th pixel points in the x-th channels of the plurality of three-dimensional feature images, where a is a positive integer less than or equal to M and b is a positive integer less than or equal to N.
As shown in fig. 10, taking the (1,1) th pixel point on the first channel as an example, the pixel value of the (1,1) th pixel point on the first channel of the three-dimensional feature image 101 is 2, the pixel value of the (1,1) th pixel point on the first channel of the three-dimensional feature image 102 is 73, the pixel value of the (1,1) th pixel point on the first channel of the three-dimensional feature image 103 is 33, and the maximum value among the three pixel values is 73, then the pixel value of the (1,1) th pixel point on the first channel in the target three-dimensional feature image 104 is 73. According to the example of fig. 10, the pixel value of each pixel point on each channel of the target three-dimensional feature image 104 is obtained, so that the target three-dimensional feature image is determined.
After the target three-dimensional feature image is obtained, the feature vector of each pixel point in the first image can be determined according to the pixel value of each pixel point in the target three-dimensional feature image. Specifically, for the (a, b) th pixel point in the first image, the feature vector of the (a, b) th pixel point in the first image is determined according to the value of the (a, b) th pixel point in the C channels of the target three-dimensional feature image. For example, to obtain the feature vector of the (1,1) th pixel point in the first image, the pixel value of the (1,1) th pixel point in each channel of the target three-dimensional feature image is obtained. For example, the target three-dimensional feature image includes 3 channels, the pixel values of the (1,1) th pixel points in the 3 channels are 73, 86, and 46 in sequence, and the feature vector of the (1,1) th pixel point in the first image is (73, 86, 46).
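A minimal NumPy sketch of S52, assuming each three-dimensional feature image is stored as an (M, N, C) array, is shown below; the array contents are illustrative, with the (1, 1)-th pixel point of the first channel set to the values 2, 73 and 33 used in the fig. 10 example.

```python
# Element-wise maximum fusion of the three-dimensional feature images and
# extraction of a per-pixel feature vector.
import numpy as np

M, N, C = 4, 5, 3
rng = np.random.default_rng(0)
feature_images = [rng.random((M, N, C)) for _ in range(3)]

# Mirror the fig. 10 example: the (1, 1)-th pixel of channel 1 takes the
# values 2, 73 and 33 in the three feature images, so the target keeps 73.
for img, value in zip(feature_images, (2.0, 73.0, 33.0)):
    img[0, 0, 0] = value

target = np.maximum.reduce(feature_images)   # target three-dimensional feature image
print(target[0, 0, 0])                       # 73.0
print(target[0, 0, :])                       # feature vector of the (1, 1)-th pixel point
```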
Fig. 5-10 illustrate one method of obtaining feature vectors for pixel points, and another method will be described below.
Fig. 11 is a schematic flowchart of another method for obtaining feature vectors of pixel points according to an embodiment of the present application, and as shown in fig. 11, the method may include:
and S111, acquiring a feature extraction model.
The feature extraction model in the embodiment of the application is obtained by learning multiple groups of first samples, and each group of first samples comprises a first sample image and feature vectors of pixel points in the first sample image. The first sample image is an image obtained by shooting the first sample screen, and if the shot image includes other areas besides the area corresponding to the first sample screen, the first sample image needs to be preprocessed, so that the first sample image only includes the area corresponding to the first sample screen.
And S112, inputting the first image into the feature extraction model to obtain a feature vector of each pixel point in the first image.
After the feature extraction model is trained, the feature extraction model has the function of obtaining feature vectors of pixel points of the image, at the moment, the first image is input into the feature extraction model, and the feature vector of each pixel point in the first image output by the feature extraction model can be obtained.
After the feature vector of each pixel point in the first image is obtained, the pixel points in the first image need to be classified according to the feature vector of the pixel points in the first image, and the process will be described with reference to fig. 12.
Fig. 12 is a schematic flowchart of a process of classifying pixel points in a first image according to an embodiment of the present application, and as shown in fig. 12, the method may include:
s121, inputting the feature vectors of the pixel points in the first image into a preset model to obtain the category of each pixel point in the first image.
The preset model is obtained by learning a plurality of groups of second samples, each group of second samples comprises a second sample image and marking information, the second sample images are images obtained by shooting a second sample screen, and the marking information is information marking the types of pixel points on the second sample images. The second sample image and the first sample image may be the same image or different images.
In the embodiment of the present application, the types of the pixel points are various, for example, the types may be normal types, or may be a point defect type, a line defect type, a light leakage defect type, and the like. The point defect refers to a defect that a region on a screen is small and relates to a plurality of pixel points; the line defect refers to a defect of a long and narrow shape on the screen, and the like.
In the process of training the preset model, in order to enable the preset model to have the function of identifying various different defect types on the screen to be detected, samples containing different defect types need to be adopted for training. The second sample image needs to include different types of pixel points such as point defects and line defects, and also needs to include normal types of pixel points. The category number of the pixel points in the preset model is C, and C is the preset category number.
After the plurality of groups of second samples are obtained, they may be input into the preset model, and the preset model learns from them. The multiple groups of second samples include multiple second sample images, which contain both normal pixel points and abnormal pixel points of different defect types, so the preset model can learn the characteristics of normal pixel points and the characteristics of pixel points of various defect types. Meanwhile, because the samples include labeling information for the different defect types, the pixel points can be classified according to the learned characteristics of normal pixel points and of pixel points of the various defect types.
After the preset model learns the plurality of groups of second samples, the preset model has a function of classifying the pixel points. The classification of the pixel points is realized according to the characteristics of different pixel points. After the preset model is trained, the feature vectors of the pixel points in the first image are input into the preset model, and the category of each pixel point in the first image output by the preset model can be obtained, wherein the category of each pixel point is one of C preset categories.
Specifically, the feature vectors of the pixel points in the first image are input into the preset model to obtain C-1 first output images and one second output image, where each first output image indicates one defect category. The gray value of the (c, d)-th pixel point in the i-th first output image indicates the probability that the defect category of the (c, d)-th pixel point of the first image is the defect category indicated by that first output image, and the gray value of the (c, d)-th pixel point in the second output image indicates the probability that the category of the (c, d)-th pixel point of the first image is normal, where c is a positive integer less than or equal to M and d is a positive integer less than or equal to N. Then, the category of each pixel point in the first image is obtained according to the gray values of the pixel points on the C-1 first output images and on the second output image.
In the embodiment of the application, after the feature vectors of the pixel points in the first image are input into the preset model, C-1 first output images and a second output image are obtained, where C is a preset category number, each first output image in the C-1 first output images indicates a defect category, the gray value of the pixel point in each first output image is used for indicating that the defect category of the pixel point at the corresponding position on the first image is the probability of the defect category indicated by the first output image, and the gray value of the pixel point in the second output image is used for indicating that the category of the pixel point at the corresponding position on the first image is the normal probability. And then, obtaining the category of each pixel point in the first image according to the gray value of each pixel point on the C-1 first output images and the gray value of each pixel point on the second output image.
The defect categories of pixel points in a screen include, for example, point defects, line defects and light leakage defects, and point defects further include black point defects, white point defects, and so on. For the defect categories to be detected, the different defect categories on the sample images are labeled differently when the preset model is trained. After the preset model is trained, the feature vector of each pixel point in the first image is input into the preset model to obtain C-1 first output images and one second output image, where C is the preset number of categories. For example, if only line defects in the first image are detected according to the preset model, the pixel point categories include normal and line defect, and C equals 2; if white point defects, black point defects, line defects and light leakage defects in the first image are detected, the pixel point categories include normal, white point defect, black point defect, line defect and light leakage defect, and C equals 5. After the training of the preset model is finished, C is a fixed value.
Each pixel point in each first output image corresponds to each pixel point in the first image one by one, and each pixel point in the second output image corresponds to each pixel point in the first image one by one. In the embodiment of the application, the first output image and the second output image are processed to a certain extent, the gray value of each pixel point in each first output image reflects the probability that the defect type of the pixel point is the defect type indicated by the first output image, and the gray value of each pixel point in the second output image reflects the probability that the pixel point is a normal pixel point.
This process will be described below with reference to fig. 13.
Fig. 13 is a schematic diagram of an output image provided in an embodiment of the present application, as shown in fig. 13, including three first output images and one second output image, that is, a first output image 131, a first output image 132, a first output image 133, and a second output image 134. In the example of fig. 13, the line defect, the point defect, and the light leakage defect are exemplified. The line defect is a defect of an elongated region on the screen to be detected, and may relate to an abnormality of a plurality of pixels, for example, if there is an abnormality of pixels in a 5 × 200 region on the screen, the region is a line defect. The point defect is a defect of a small area on a screen to be detected, and may relate to an abnormality of several or dozens of pixel points. Light leakage defects generally occur at edge portions of the screen, wherein one possible cause of the light leakage defects is that the edge portions are not compressed when the screen is assembled, causing the light leakage defects to occur. If other types of defects are included in practice, corresponding treatment can be carried out.
In fig. 13, the first output image 131 is an output image obtained by detecting a line defect, and the gray value of each pixel point on the first output image 131 is used to indicate the probability that the defect type of the pixel point is the line defect; the first output image 132 is an output image obtained by detecting point defects, and the gray value of each pixel point on the first output image 132 is used for indicating the probability that the defect type of the pixel point is a point defect; the first output image 133 is an output image obtained by detecting the light leakage defect, and the gray value of each pixel point on the first output image 133 is used for indicating the probability that the defect type of the pixel point is the light leakage defect; the second output image 134 is an output image obtained by detecting normal pixel points, and the gray value of each pixel point on the second output image 134 is used for indicating the probability that the pixel point is a normal pixel point.
Since the range of the gray value of the pixel point is between 0 and 255, a possible implementation manner is to determine the gray value of the pixel point on the first output image or the second output image according to the probability of the category to which the pixel point belongs on the first output image or the second output image. For example, if the probability of a certain pixel being a line defect is 0.8, the probability of a point defect is 0.06, the probability of a light leakage defect is 0.06, and the probability of a normal pixel is 0.08, the gray value of the pixel on the first output image 131 is:
0.8*255=204;
the gray scale value of the pixel point on the first output image 132 is:
0.06*255=15.3;
the gray scale value of the pixel point on the first output image 133 is:
0.06*255=15.3;
the gray scale value of the pixel point on the second output image 134 is:
0.08*255=20.4。
optionally, rounding the gray value obtained by the calculation may be performed to obtain the gray value on the actual output image.
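A small NumPy illustration of this mapping from class probabilities to gray values, using the probabilities 0.8, 0.06, 0.06 and 0.08 from the example above and rounding as just described, is shown below.

```python
# Gray values on the C-1 first output images and the second output image
# for one pixel, obtained by scaling the class probabilities to 0-255.
import numpy as np

probabilities = np.array([0.8, 0.06, 0.06, 0.08])   # line, point, light leakage, normal
gray_values = np.rint(probabilities * 255).astype(int)
print(gray_values)    # [204  15  15  20]
```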
Fig. 14 is a schematic view illustrating defect type detection of a pixel according to an embodiment of the present application, as shown in fig. 14, the defect type detection includes a first image 140, a first output image 131, a first output image 132, a first output image 133, and a second output image 134, where for convenience of description, each square in fig. 14 represents a pixel, a color of a pixel in the first image 140 does not represent a gray scale of a pixel in the first image, and colors of pixels in the first output image 131, the first output image 132, the first output image 133, and the second output image 134 represent a gray scale of a pixel.
Specifically, for the (C, d) th pixel point in the first image, the gray value of the (C, d) th pixel point on each first output image in the C-1 first output images and the gray value of the (C, d) th pixel point on the second output image are obtained. And then, determining the pixel point with the maximum gray value according to the gray value of the (c, d) th pixel point on each first output image and the gray value of the (c, d) th pixel point on the second output image.
Taking four pixel points in the first image 140 as an example, to determine the category of each pixel point on the first image 140, the three corresponding pixel points in the first output images and the one corresponding pixel point in the second output image are found. As shown in fig. 14, the defect categories of the pixel points A, B, C and D in the first image 140 are now to be detected. First, according to the positions of A, B, C and D in the first image 140, the corresponding pixel points A1, B1, C1 and D1 in the first output image 131, the corresponding pixel points A2, B2, C2 and D2 in the first output image 132, the corresponding pixel points A3, B3, C3 and D3 in the first output image 133, and the corresponding pixel points A4, B4, C4 and D4 in the second output image 134 are determined, respectively.
If the image where the pixel point with the maximum gray value is located is the second output image, determining that the category of the (c, d) th pixel point in the first image is normal; otherwise, determining the defect type indicated by the first output image where the pixel point with the maximum gray value is located as the type of the (c, d) th pixel point in the first image.
Fig. 15 is a schematic diagram illustrating defect category detection for pixel point A provided in the embodiment of the present application. As shown in fig. 15, the four pixel points corresponding to pixel point A in the output images are A1, A2, A3 and A4, whose gray values are 200, 12, 15 and 28 in sequence. Among these four gray values, that of A1 is the highest; A1 is a pixel point on the first output image 131, and the first output image 131 is the output image obtained by detecting line defects, so the probability that pixel point A is a line defect is the largest, and the category of pixel point A is determined to be line defect.
Similarly, for the pixel point B, the four corresponding pixel points in the output image are respectively B1, B2, B3 and B4, and the gray values of the four pixel points are 20, 196, 10 and 24 in sequence. Among the gray values of the four pixels, the gray value of B2 is the highest, B2 is a pixel on the first output image 132, and the first output image 132 is an output image obtained by detecting a point defect, so that the probability that the pixel is a point defect is the highest, and the category of the pixel is determined to be the point defect.
For the pixel point C, the four corresponding pixel points in the output image are respectively C1, C2, C3 and C4, and the gray values of the four pixel points are 5, 10, 216 and 24 in sequence. Of the gray values of the four pixels, the gray value of C3 is the highest, C3 is a pixel on the first output image 133, and the first output image 133 is an output image obtained by detecting the light leakage defect, so that the probability that the pixel is the light leakage defect is the largest, and the category of the pixel is determined to be the light leakage defect.
For the pixel point D, four pixel points corresponding to the pixel point D in the output image are D1, D2, D3 and D4, and the gray values of the four pixel points are 5, 10, 20 and 220 in sequence. Of the gray values of the four pixels, the gray value of D4 is the highest, D4 is a pixel on the second output image 134, and the second output image 134 is an output image obtained by detecting a normal type pixel, so that the probability that the pixel is of the normal type is the highest, and the type of the pixel is determined to be normal.
For the defect types of the pixel points, the examples in fig. 13 to fig. 15 are described by taking the line defect, the point defect, and the light leakage defect as examples, and in an actual situation, if other types of defects are included, such as a black point defect and a white point defect in the point defects, the processing manner is similar to this, and details are not described here again.
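A minimal NumPy sketch of this per-pixel decision rule, using the gray values of pixel points A-D from figs. 14 and 15 (category names follow the example; the arrays are otherwise illustrative), is shown below.

```python
# Each pixel takes the category whose output image has the largest gray value
# at the corresponding position.
import numpy as np

categories = ["line defect", "point defect", "light leakage defect", "normal"]

# Gray values of pixels A, B, C and D on the three first output images and
# the second output image (rows: pixel, columns: output image).
gray = np.array([
    [200, 12, 15, 28],    # A -> line defect
    [20, 196, 10, 24],    # B -> point defect
    [5, 10, 216, 24],     # C -> light leakage defect
    [5, 10, 20, 220],     # D -> normal
])

for name, row in zip("ABCD", gray):
    print(name, "->", categories[int(np.argmax(row))])
```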
The feature vector of each pixel point in the first image indicated in S121 may be obtained by the method in the embodiments illustrated in fig. 5 to 10, or may be obtained by the method in the embodiment illustrated in fig. 11.
It should be noted that, if the method in the embodiment illustrated in fig. 11 is selected to obtain the feature vector of each pixel point in the first image, the feature vector of each pixel point in the first image is obtained through the feature extraction model, and then the category of each pixel point in the first image is obtained through the preset model. Optionally, in this embodiment of the application, the feature extraction model and the preset model may be trained as one model.
And S122, obtaining a detection result of the screen to be detected according to the category of each pixel point in the first image.
After the category of each pixel point in the first image is obtained according to the above manner, the specific defect and the corresponding defect category in the first image can be obtained, and then the detection result of the screen to be detected is correspondingly obtained, including the number of defects existing on the screen to be detected, the defect category of each defect, the specific position of each defect on the screen to be detected, and the like.
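The embodiment does not specify how the per-pixel categories are aggregated into the screen-level detection result; one possible illustration, assuming defect pixel points of the same category are grouped into connected regions (using scipy.ndimage, an assumption of this sketch) and each region is reported with its category, size and position, is given below.

```python
# A possible (not patent-specified) aggregation of per-pixel categories into
# a list of defects with category, pixel count and bounding box.
import numpy as np
from scipy import ndimage

def summarize_defects(category_map, normal_label=0):
    """category_map: (M, N) integer array of per-pixel categories."""
    report = []
    for category in np.unique(category_map):
        if category == normal_label:
            continue
        labeled, _ = ndimage.label(category_map == category)
        for idx, region in enumerate(ndimage.find_objects(labeled), start=1):
            report.append({
                "category": int(category),
                "pixels": int((labeled[region] == idx).sum()),
                "bounding_box": tuple((s.start, s.stop) for s in region),
            })
    return report

# Usage: a 6 x 6 map with one 1 x 3 region of category 1 yields one defect entry.
demo = np.zeros((6, 6), dtype=int)
demo[2, 1:4] = 1
print(summarize_defects(demo))
```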
The screen detection method provided by the embodiment of the application comprises the steps of firstly obtaining the feature vector of each pixel point in a first image, and then classifying the pixel points in the first image according to the feature vector of each pixel point in the first image to obtain a classification result. Because the first image is an image obtained by shooting the screen to be detected, the detection result of the screen to be detected can be obtained according to the classification result. The detection process is carried out on the pixels in the first image, independent judgment is carried out on the basis of the characteristic vector of each pixel, and pixel-level segmentation of the first image can be achieved, so that pixel-level tiny screen defects in a screen to be detected can be detected, and the accuracy of screen defect detection is improved. The scheme can be used for carrying out automatic defect detection on the screen, does not need manual preset threshold values, does not need manual intervention, does not need resampling on the image, and is not limited by the size of the image and the image acquisition equipment.
Next, a screen inspection method will be described with reference to fig. 16.
Fig. 16 is a schematic flowchart of a screen detection method provided in an embodiment of the present application, and as shown in fig. 16, the method may include:
s161, obtaining a feature vector of each pixel point in a first image, wherein the first image is an image obtained by shooting a screen to be detected.
It should be noted that the feature vector of each pixel point in the first image may be obtained by the method shown in the embodiments of fig. 5 to fig. 10, or the feature vector of each pixel point in the first image may be obtained by the method shown in fig. 11, which is not described herein again.
S162, classifying the pixel points in the first image according to the feature vector of each pixel point in the first image, and obtaining the detection result of the screen to be detected according to the classification result.
It should be noted that the method shown in the embodiments of fig. 12 to 15 may be used to classify the pixel points in the first image to obtain the detection result of the screen to be detected, and details are not repeated here.
Optionally, the scheme provided in the embodiment of the present application mainly includes four modules, and fig. 17 is a schematic diagram of a screen detection module provided in the embodiment of the present application, and as shown in fig. 17, the scheme includes a data module, a feature extraction module, a training module, and a prediction module, where,
the data module is used for acquiring data, specifically, the data module acquires an image of a sample screen and provides annotation information of the image, and the annotation information can be, for example, a mask image with the same size as the image, and marks corresponding defect types at corresponding defect positions according to specified colors.
The feature extraction module is configured to extract a feature vector of each pixel point on the first image, where the manner of extracting the feature vector of the pixel point is as described in the above embodiment, and each pixel point corresponds to one feature vector.
The training module is used for training a preset model according to the feature vector of each pixel point on the image and the mask map labeled with the defect type, so that the multi-scale feature vector of each pixel point can be output as a correct defect type through the preset model.
The prediction module is used for predicting according to the characteristic vectors of the pixel points and the trained preset model.
Using the preset model trained by the training module, the feature vectors of the pixel points in the first image are input into the preset model to obtain the category of each pixel point in the first image, and the defect categories of the screen to be detected are obtained according to the categories of the pixel points. If the feature vectors of the pixel points are obtained through the feature extraction model, the models together perform both the extraction of the pixel-point feature vectors and the classification of the pixel points according to those feature vectors. In this case, the first image is input into the feature extraction model to obtain the feature vector of each pixel point in the first image; the feature vectors are then classified by the preset model, and the output category information is used to predict the defect categories of the first image.
Fig. 18 is a schematic flowchart of a screen inspection model training method provided in the embodiment of the present application, and as shown in fig. 18, the method includes:
s181, a training sample is obtained, wherein the training sample comprises a sample image and labeling information of pixel points in the sample image, and the labeling information of the pixel points in the sample image is information labeling the category of the pixel points in the sample image.
Before training the screen test model, training samples are first obtained. In the embodiment of the application, the training samples comprise one or more groups, each group of training samples comprises a sample image and marking information of pixel points in the sample image, and the marking information marks the category of the pixel points in the sample image.
Fig. 19 is a schematic diagram of a training sample provided in an embodiment of the present application, as shown in fig. 19, the left side is a sample image 191, the sample image 191 is an image captured on a sample screen, the right side is a defect mask 192 of the sample image 191, the defect mask 192 is labeled with defect information on the sample image 191, and the defect information is labeled in different ways for different defects. The sample image 191 and the defect mask map 192 constitute a set of training samples.
And S182, inputting the sample image into a screen detection model to obtain the training output category of the pixel points in the sample image.
After the training samples are obtained, the sample images are input into a screen detection model aiming at a group of training samples, the screen detection model can process the sample images, such as convolution processing and sampling processing, and then training output categories of pixel points in the sample images are obtained.
And S183, adjusting parameters of the screen detection model according to the training output categories of the pixels in the sample image and the labeling information of the pixels in the sample image until the error between the training output categories of the pixels in the sample image and the labeling information of the pixels in the sample image is less than or equal to a preset error, and obtaining the trained screen detection model.
The training output category is a category obtained by classifying the pixel points in the sample image by the screen detection model, and the training output category may have an error with the category of the pixel points in the sample image labeled in the labeling information, so that the parameters of the screen detection model are adjusted according to the training output category of the pixel points in the sample image and the labeling information of the pixel points in the sample image. And adjusting parameters in the screen detection model for multiple times, so that the trained screen detection model is obtained when the error between the training output category of the pixel points in the sample image and the labeling information of the pixel points in the sample image is less than or equal to a preset error.
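A minimal PyTorch sketch of this training procedure, assuming the screen detection model outputs a C-channel score map per pixel and the labeling information is a per-pixel category-index map, is given below; the optimizer, learning rate and stopping error are illustrative assumptions, not values from this embodiment.

```python
# Per-pixel classification training loop sketch for S181-S183.
import torch
import torch.nn as nn

def train(model, sample_images, label_maps, target_error=0.05, max_steps=1000):
    """sample_images: (B, 3, M, N) tensor; label_maps: (B, M, N) category indices."""
    optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
    loss_fn = nn.CrossEntropyLoss()      # per-pixel classification loss
    for _ in range(max_steps):
        optimizer.zero_grad()
        scores = model(sample_images)    # (B, C, M, N) training output
        loss = loss_fn(scores, label_maps)
        loss.backward()                  # adjust model parameters
        optimizer.step()
        if loss.item() <= target_error:  # stop once the error is small enough
            break
    return model

# Usage with the FeatureExtractor sketched earlier standing in for the model:
# images = torch.randn(2, 3, 64, 64)
# labels = torch.randint(0, 4, (2, 64, 64))
# train(FeatureExtractor(), images, labels)
```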
Optionally, the screen detection model in the embodiment of the present application includes a feature extraction network and a classification network, where the feature extraction network is a feature extraction model of a multi-scale U-Net structure. Fig. 20 is a schematic structural diagram of a screen detection model provided in an embodiment of the present application, and as shown in fig. 20, a feature extraction network is formed by a multi-scale U-Net network structure, in which a sample image is subjected to multiple feature extraction processes to obtain a plurality of sample three-dimensional feature images, where the sample image includes M pixel points in a horizontal direction, the sample image includes N pixel points in a vertical direction, each sample three-dimensional feature image includes M pixel points in the horizontal direction, each sample three-dimensional feature image includes N pixel points in the vertical direction, the number of channels of each sample three-dimensional feature image is C, and C is the number of categories for classifying the pixel points in the sample image. Each branch in fig. 20 is a feature extraction process. And after obtaining a plurality of sample three-dimensional characteristic images, processing the plurality of sample three-dimensional characteristic images according to the classification network to obtain the training output categories of the pixel points in the sample images.
Optionally, feature extraction processing may be performed on the sample image multiple times according to the feature extraction network to obtain multiple sample three-dimensional feature images, where the feature extraction operations included in each two times of feature extraction processing are different in number, and the feature extraction operations include convolution operation and sampling operation.
The feature extraction network comprises convolution layers, pooling layers and up-sampling layers. For any one feature extraction process, a convolution operation and a down-sampling operation may be performed on the sample image according to the convolution layers and the pooling layers to obtain K sample down-sampled feature images, where the size of the i-th sample down-sampled feature image is (M/2^(i-1)) × (N/2^(i-1)), with i taking 1, 2, ..., K in sequence. Then, a convolution operation and an up-sampling operation are performed on the K sample down-sampled feature images according to the convolution layers and the up-sampling layers to obtain one sample three-dimensional feature image.
An optional implementation manner for obtaining the K sample downsampling feature images is to perform convolution operation on the sample images according to the convolution layer to obtain a first sample downsampling feature image; and sequentially performing i times of first operation on the first sample downsampling feature image according to the convolutional layer and the pooling layer to obtain an i +1 th sample downsampling feature image, wherein i is 1,2, … … and K-1 in sequence, and the first operation comprises downsampling operation and convolution operation.
The first column of boxes in fig. 20 represents 1 × 1 convolutions used to raise the channel dimension and add non-linearity. After feature extraction has been performed multiple times, global pooling is applied, the results of the individual feature extractions are fused and output as the final feature map, and f(z) in fig. 20 is the classification output for the pixel points.
After K down-sampling feature images are obtained, optionally, down-sampling operation, convolution operation and up-sampling operation can be performed on the K sample down-sampling feature image according to the pooling layer, the convolution layer and the up-sampling layer to obtain a K sample up-sampling feature image; then, carrying out combination operation, convolution operation and up-sampling operation on the ith sample up-sampling feature sample image and the ith sample down-sampling feature image in sequence according to the convolution layer and the up-sampling layer to obtain an i-1 sample up-sampling feature image, and sequentially taking K, K-1, … …,2 for i; and performing convolution operation on the first sample up-sampling image according to the convolution layer to obtain a sample three-dimensional characteristic image.
Although U-Net has the capability of extracting partial multi-scale features, the single U-Net structure has difficulty in achieving good effect on the problem of large scale difference such as screen defects. Therefore, the embodiment of the application designs a multi-scale U-Net combined feature extraction network. The characteristic extraction network is utilized to carry out characteristic extraction processing on the sample image for multiple times, and the characteristic vectors of the pixel points of the sample image under different receptive fields are extracted to distinguish defect types, so that the problem of extracting the characteristic vectors of various defect types can be solved. And finally, performing global pooling on the plurality of sample three-dimensional characteristic images, and classifying pixel points in the sample images by using a classification network.
Fig. 21 is a schematic diagram of a feature extraction process provided in the embodiment of the present application, and as shown in fig. 21, the feature extraction process is composed of two parts, the first part (left side) is a feature extraction part of U-Net, and is composed of an underlying convolution network, and this part has a new scale every time it passes through a pooling layer. The second part (the right side) is an up-sampling part of the U-Net, and each time up-sampling is carried out, the up-sampling part and the U-Net feature extraction part are fused and spliced in the same scale as the number of channels corresponding to the U-Net feature extraction part, so that features of the scale can be reserved to the next layer as much as possible. Through multi-level feature extraction, the information of the original sample image can be kept as much as possible, so that the subsequent classification is more accurate.
To verify the effect of the solution of the present application, the following test was performed.
Table 1 is a statistical table of the defects detected using the screen detection scheme provided in the embodiment of the present application. As shown in Table 1, through data testing on a production line, the statistical results obtained are as follows:
Table 1 (the table content is provided as an image in the original publication and is not reproduced here)
In Table 1, the conventional visual inspection algorithm Halcon detects four defect types poorly: black dots, white dots, lines and light leakage. For black dots and light leakage, Halcon has no detection capability at all; for white dot and line defects, Halcon detected only 3 and 5 defects respectively, far fewer than the scheme of the present application. Since light leakage is a subjective defect and is influenced by the labelled samples, the precision figure is the more important indicator.
Fig. 22 is a first image obtained by shooting a screen to be detected with a camera according to an embodiment of the present application; the first image is processed to obtain the defect detection result for the screen to be detected. Fig. 23A is a schematic view of line defect detection provided in the embodiment of the present application, fig. 23B is a schematic view of point defect detection provided in the embodiment of the present application, and fig. 23C is a schematic view of point defect detection provided in the embodiment of the present application.
Fig. 24 is a schematic structural diagram of a screen detecting apparatus according to an embodiment of the present application, and as shown in fig. 24, the screen detecting apparatus 24 may include an obtaining module 241 and a classifying module 242, where:
the obtaining module 241 is configured to obtain a feature vector of each pixel point in a first image, where the first image is an image obtained by shooting a screen to be detected;
the classification module 242 is configured to classify the pixel points in the first image according to the feature vector of each pixel point in the first image, and obtain a detection result of the screen to be detected according to the classification result.
In a possible implementation manner, the obtaining module 241 is specifically configured to:
determining a plurality of three-dimensional characteristic images according to the first image, wherein the first image comprises M pixel points in the transverse direction, the first image comprises N pixel points in the longitudinal direction, each three-dimensional characteristic image comprises M pixel points in the transverse direction, each three-dimensional characteristic image comprises N pixel points in the longitudinal direction, the number of channels of each three-dimensional characteristic image is C, and C is a preset category number;
and acquiring a feature vector of each pixel point in the first image according to the plurality of three-dimensional feature images.
In a possible implementation manner, the obtaining module 241 is specifically configured to:
performing multiple feature extraction processing on the first image to obtain multiple three-dimensional feature images;
the times of feature extraction operations included in each two times of feature extraction processing are different, and the feature extraction operations include convolution operations and sampling operations.
In a possible implementation manner, for any feature extraction process, the obtaining module 241 is specifically configured to:
performing convolution operation and downsampling operation on the first image to obtain K downsampled feature images, wherein the size of the ith downsampled feature image is (M/2^(i-1)) × (N/2^(i-1)), and i takes 1, 2, …, K in sequence;
and performing convolution operation and up-sampling operation according to the K down-sampling feature images to obtain the three-dimensional feature image.
In a possible implementation manner, the obtaining module 241 is specifically configured to:
performing convolution operation on the first image to obtain a first downsampling characteristic image;
and sequentially performing the first operation i times on the first downsampling feature image to obtain the (i+1)th downsampling feature image, wherein i takes 1, 2, …, K-1 in sequence, and the first operation comprises a downsampling operation and a convolution operation.
In a possible implementation manner, the obtaining module 241 is specifically configured to:
performing downsampling operation, convolution operation and upsampling operation on the Kth downsampling feature image to obtain a Kth upsampling feature image;
sequentially carrying out merging operation, convolution operation and upsampling operation on the ith up-sampling feature image and the ith down-sampling feature image to obtain the (i-1)th up-sampling feature image, wherein i takes K, K-1, …, 2 in sequence;
and performing convolution operation on the first up-sampling image to obtain the three-dimensional characteristic image.
In a possible implementation manner, the obtaining module 241 is specifically configured to:
determining a target three-dimensional characteristic image according to pixel values of pixels in each three-dimensional characteristic image, wherein the target three-dimensional characteristic image comprises M pixels in the transverse direction, the target three-dimensional image comprises N pixels in the longitudinal direction, and the number of channels of the target three-dimensional characteristic image is C;
and determining the characteristic vector of each pixel point in the first image according to the pixel value of each pixel point in the target three-dimensional characteristic image.
In a possible implementation manner, the obtaining module 241 is specifically configured to:
determining the pixel values of the M x N pixel points of the xth channel of the target three-dimensional feature image according to the pixel values of the M x N pixel points of the xth channel in each three-dimensional feature image, wherein x takes 1, 2, …, C in sequence;
wherein the pixel value of the (a, b)th pixel point in the xth channel of the target three-dimensional feature image is the maximum of the pixel values of the (a, b)th pixel points in the xth channels of the plurality of three-dimensional feature images, where a is a positive integer less than or equal to M and b is a positive integer less than or equal to N.
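By way of illustration, the fusion rule just described can be written in a few lines: the target three-dimensional feature image takes, at every channel x and position (a, b), the maximum pixel value over the individual three-dimensional feature images, and the feature vector of pixel (a, b) is then read out across the C channels. The C x M x N tensor layout and the example sizes below are assumptions.

```python
import torch

feature_images = [torch.randn(4, 128, 96) for _ in range(3)]   # e.g. 3 feature images, C=4, M=128, N=96
target, _ = torch.stack(feature_images, dim=0).max(dim=0)      # per-channel, per-pixel maximum -> C x M x N

a, b = 10, 20
feature_vector = target[:, a, b]   # the C values at (a, b): the feature vector of that pixel point
```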
In a possible implementation manner, for the (a, b) th pixel point in the first image, the obtaining module 241 is specifically configured to:
and determining the characteristic vector of the (a, b) th pixel point in the first image according to the value of the (a, b) th pixel point in the C channels of the target three-dimensional characteristic image.
In a possible implementation manner, the obtaining module 241 is specifically configured to:
inputting the first image into a feature extraction model to obtain a feature vector of each pixel point in the first image, wherein the feature extraction model is obtained by learning multiple groups of first samples, each group of first samples comprises a first sample image and the feature vector of the pixel point on the first sample image, and the first sample image is an image obtained by shooting a first sample screen.
In a possible implementation manner, the classification module 242 is specifically configured to:
inputting the feature vectors of the pixel points in the first image into a preset model to obtain the category of each pixel point in the first image, wherein the preset model is obtained by learning multiple groups of second samples, each group of second samples comprises the feature vectors and labeling information of the pixel points on the second sample image, the second sample image is an image obtained by shooting a second sample screen, the labeling information is information labeling the categories of the pixel points on the second sample image, the category of any pixel point in the first image is one of preset categories, and the number of the preset categories is C;
and obtaining a detection result of the screen to be detected according to the category of each pixel point in the first image.
In a possible implementation manner, the classification module 242 is specifically configured to:
inputting the feature vectors of the pixel points in the first image into the preset model to obtain C-1 first output images and a second output image, wherein each first output image indicates a defect type, the gray value of the (c, d)th pixel point in each first output image i is used for indicating the probability that the defect type of the (c, d)th pixel point on the first image is the defect type indicated by the first output image i, the gray value of the (c, d)th pixel point in the second output image is used for indicating the probability that the type of the (c, d)th pixel point on the first image is normal, c is a positive integer smaller than or equal to M, and d is a positive integer smaller than or equal to N;
and obtaining the category of each pixel point in the first image according to the gray value of each pixel point on the C-1 first output images and the gray value of each pixel point on the second output image.
In a possible implementation manner, the classification module 242 is specifically configured to:
aiming at the (c, d)th pixel point in the first image, acquiring the gray value of the (c, d)th pixel point on each of the C-1 first output images and the gray value of the (c, d)th pixel point on the second output image;
determining a pixel point with the maximum gray value according to the gray value of the (c, d) th pixel point on each first output image and the gray value of the (c, d) th pixel point on the second output image;
if the image where the pixel point with the maximum gray value is located is the second output image, determining that the category of the (c, d) th pixel point in the first image is normal;
otherwise, determining the defect type indicated by the first output image where the pixel point with the maximum gray value is located as the type of the (c, d) th pixel point in the first image.
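A hedged sketch of this decision rule follows: the C-1 defect grey-value maps and the one "normal" map are compared pixel by pixel, and the category whose map has the largest grey value at (c, d) is assigned to that pixel. The channel ordering (defect maps first, "normal" map last) is an assumption for illustration only.

```python
import torch

def classify_pixels(first_outputs: torch.Tensor, second_output: torch.Tensor) -> torch.Tensor:
    """first_outputs: (C-1) x M x N defect maps; second_output: M x N normal map.

    Returns an M x N tensor of category indices, where index C-1 means "normal"
    and indices 0 .. C-2 are the defect types indicated by the first output images.
    """
    all_maps = torch.cat([first_outputs, second_output.unsqueeze(0)], dim=0)  # C x M x N
    return all_maps.argmax(dim=0)

# Example with C = 4 (three defect types plus normal) on a 6 x 8 image.
defect_maps = torch.rand(3, 6, 8)
normal_map = torch.rand(6, 8)
categories = classify_pixels(defect_maps, normal_map)
defective_pixels = (categories != 3).nonzero()   # coordinates of pixels not classified as normal
```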
It should be noted that the screen detection apparatus shown in the embodiment of the present application can execute the screen detection method shown in the above method embodiment, and the implementation principle and the beneficial effect thereof are similar, and are not described herein again.
Fig. 25 is a schematic structural diagram of a screen test model training apparatus according to an embodiment of the present application, and as shown in fig. 25, the screen test model training apparatus 25 includes a training module 251, a processing module 252, and an adjusting module 253, where:
the training module 251 is configured to obtain a training sample, where the training sample includes a sample image and labeling information of a pixel point in the sample image, and the labeling information of the pixel point in the sample image is information labeling a category of the pixel point in the sample image;
the processing module 252 is configured to input the sample image to a screen detection model, and obtain a training output category of a pixel point in the sample image;
the adjusting module 253 is configured to adjust parameters of the screen detection model according to the training output categories of the pixels in the sample image and the labeling information of the pixels in the sample image, until an error between the training output categories of the pixels in the sample image and the labeling information of the pixels in the sample image is less than or equal to a preset error, to obtain a trained screen detection model.
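The following is a minimal training-loop sketch of the procedure performed by these modules: the sample image is forwarded through the screen detection model, the training output is compared against the per-pixel labelling information, and the parameters are adjusted until the error is at or below the preset error. Cross-entropy loss, the Adam optimizer, the learning rate and the 0.05 threshold are assumptions; the embodiment only fixes the stopping condition.

```python
import torch
import torch.nn as nn

def train_screen_model(model, sample_image, label_map, preset_error=0.05, max_steps=1000):
    # sample_image: 1 x 1 x N x M tensor; label_map: 1 x N x M tensor of category indices (long).
    criterion = nn.CrossEntropyLoss()
    optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
    for _ in range(max_steps):
        scores = model(sample_image)          # training output categories (per-pixel scores)
        loss = criterion(scores, label_map)   # error against the labelling information
        if loss.item() <= preset_error:
            break                             # trained screen detection model obtained
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
    return model
```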
In one possible implementation, the screen detection model includes a feature extraction network and a classification network; the processing module 252 is specifically configured to:
carrying out feature extraction processing on the sample image according to the feature extraction network to obtain a plurality of sample three-dimensional feature images, wherein the sample image comprises M pixel points in the transverse direction, the sample image comprises N pixel points in the longitudinal direction, each sample three-dimensional feature image comprises M pixel points in the transverse direction, each sample three-dimensional feature image comprises N pixel points in the longitudinal direction, the number of channels of each sample three-dimensional feature image is C, and C is the number of categories for classifying the pixel points in the sample image;
and processing the plurality of sample three-dimensional characteristic images according to the classification network to obtain the training output categories of the pixel points in the sample images.
In a possible implementation manner, the processing module 252 is specifically configured to:
carrying out multiple times of feature extraction processing on the sample images according to the feature extraction network to obtain a plurality of sample three-dimensional feature images;
the times of feature extraction operations included in each two times of feature extraction processing are different, and the feature extraction operations include convolution operations and sampling operations.
In one possible implementation, the feature extraction network includes a convolutional layer, a pooling layer, and an upsampling layer; for any feature extraction process, the processing module 252 is specifically configured to:
performing convolution operation and down-sampling operation on the sample image according to the convolution layer and the pooling layer to obtain K sample down-sampling feature images, wherein the size of the ith sample down-sampling feature image is (M/2^(i-1)) × (N/2^(i-1)), and i takes 1, 2, …, K in sequence;
and performing convolution operation and up-sampling operation on the K sample down-sampling feature images according to the convolution layer and the up-sampling layer to obtain the sample three-dimensional feature images.
In a possible implementation manner, the processing module 252 is specifically configured to:
performing convolution operation on the sample image according to the convolution layer to obtain a first sample downsampling characteristic image;
and sequentially executing the first operation i times on the first sample downsampling feature image according to the convolution layer and the pooling layer to obtain the (i+1)th sample downsampling feature image, wherein i takes 1, 2, …, K-1 in sequence, and the first operation comprises a downsampling operation and a convolution operation.
In a possible implementation manner, the processing module 252 is specifically configured to:
performing down-sampling operation, convolution operation and up-sampling operation on the K sample down-sampling feature image according to the pooling layer, the convolution layer and the up-sampling layer to obtain a K sample up-sampling feature image;
sequentially carrying out merging operation, convolution operation and up-sampling operation on the ith sample up-sampling feature image and the ith sample down-sampling feature image according to the convolution layer and the up-sampling layer to obtain the (i-1)th sample up-sampling feature image, wherein i takes K, K-1, …, 2 in sequence;
and performing convolution operation on the first sample up-sampling image according to the convolution layer to obtain the sample three-dimensional characteristic image.
The screen detection model training device shown in the embodiment of the application can execute the screen detection model training method shown in the embodiment of the method, the realization principle and the beneficial effect are similar, and the details are not repeated here.
Fig. 26 is a schematic structural diagram of a screen detection system provided in an embodiment of the present application, and as shown in fig. 26, the screen detection system includes an image capturing device 261 and a screen detection device 262, where:
the image acquisition device 261 is configured to shoot a screen to be detected, obtain a first image, and send the first image to the screen detection device 262;
the screen detecting device 262 is configured to process the first image according to the screen detecting method in the foregoing embodiment, so as to obtain a detecting result of the screen to be detected.
The image capturing device 261 is a device for capturing the first image of the screen to be detected, and may be a camera or another device with image capturing capability. The screen detection device 262 is used to perform the actions performed by the obtaining module and the classification module in the example of fig. 24. In some embodiments, the image capturing device 261 and the screen detection device 262 are two independent devices; in other embodiments, the image capturing device 261 and the screen detection device 262 may also be integrated into one device, which is not limited in this embodiment.
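A minimal sketch of this wiring is given below: an acquisition step produces the first image and hands it to a detection step. The use of OpenCV as the capture backend, the grayscale conversion, and the crop to a multiple of 16 are assumptions made only so the example runs end to end; `model` stands for any trained screen detection model such as the MultiScaleUNet sketched earlier.

```python
import cv2
import torch

def capture_first_image(camera_index=0):
    cap = cv2.VideoCapture(camera_index)
    ok, frame = cap.read()
    cap.release()
    if not ok:
        raise RuntimeError("failed to capture the screen to be detected")
    return cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)

def detect_screen(model, first_image):
    h, w = first_image.shape
    first_image = first_image[: h - h % 16, : w - w % 16]           # keep sizes pooling-friendly (assumed)
    x = torch.from_numpy(first_image).float()[None, None] / 255.0   # 1 x 1 x H x W tensor
    with torch.no_grad():
        scores = model(x)                                           # 1 x C x H x W feature image
    return scores.argmax(dim=1)[0]                                  # per-pixel category map
```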
Fig. 27 is a schematic hardware structure diagram of a screen detecting device according to an embodiment of the present application. Referring to fig. 27, the screen detecting apparatus 270 includes: a memory 271 and a processor 272, wherein the memory 271 and the processor 272 communicate; illustratively, the memory 271 and the processor 272 may communicate via the communication bus 273, the memory 271 being used for storing a computer program, the processor 272 executing the computer program to implement the steps of the screen detection method in the above embodiments. Reference may be made in particular to the description relating to the method embodiments described above.
Alternatively, the memory 271 may be separate or integrated with the processor 272.
The screen detection device provided in this embodiment may be used to execute the screen detection method in the foregoing embodiments, and the implementation principle and technical effect are similar, which are not described herein again.
Optionally, the processor may be a Central Processing Unit (CPU), or may be another general-purpose processor, a Digital Signal Processor (DSP), an Application Specific Integrated Circuit (ASIC), or the like. A general-purpose processor may be a microprocessor, or the processor may be any conventional processor. The steps of the method disclosed in connection with the present application may be directly embodied in a hardware processor for execution, or may be executed by a combination of hardware and software modules in the processor.
Fig. 28 is a schematic hardware structure diagram of a screen inspection model training apparatus according to an embodiment of the present application. Referring to fig. 28, the screen test model training apparatus 280 includes: memory 281 and processor 282, wherein memory 281 and processor 282 are in communication; illustratively, the memory 281 and the processor 282 may be in communication via a communication bus 283, the memory 281 being configured to store computer programs, and the processor 282 executing the computer programs to implement the steps of the screen detection model training method in the above-described embodiments. Reference may be made in particular to the description relating to the method embodiments described above.
Alternatively, the memory 281 may be separate or integrated with the processor 282.
The screen test model training device provided in this embodiment may be used to execute the screen test model training method in the foregoing embodiments, and the implementation principle and technical effect are similar, which are not described herein again.
Optionally, the processor may be a Central Processing Unit (CPU), or may be another general-purpose processor, a Digital Signal Processor (DSP), an Application Specific Integrated Circuit (ASIC), or the like. A general-purpose processor may be a microprocessor, or the processor may be any conventional processor. The steps of the method disclosed in connection with the present application may be directly embodied in a hardware processor for execution, or may be executed by a combination of hardware and software modules in the processor.
The present application provides a computer-readable storage medium for storing a computer program for implementing the screen detection method described in the above embodiment, or implementing the screen detection model training method described in the above embodiment.
An embodiment of the present application further provides a chip or an integrated circuit, including: a memory and a processor;
the memory is configured to store program instructions and, in some cases, intermediate data;
the processor is configured to call the program instructions stored in the memory to implement the screen detection method or the screen detection model training method as described above.
Alternatively, the memory may be separate or integrated with the processor. In some embodiments, the memory may also be located outside of the chip or integrated circuit.
An embodiment of the present application further provides a program product, where the program product includes a computer program, where the computer program is stored in a storage medium, and the computer program is used to implement the screen detection method or the screen detection model training method described above.
All or part of the steps for implementing the above method embodiments may be completed by hardware related to program instructions. The aforementioned program may be stored in a readable memory. When executed, the program performs the steps of the above method embodiments; and the aforementioned memory (storage medium) includes: read-only memory (ROM), RAM, flash memory, hard disk, solid state disk, magnetic tape, floppy disk, optical disc, and any combination thereof.
Embodiments of the present application are described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems), and computer program products according to embodiments of the application. It will be understood that each flow and/or block of the flow diagrams and/or block diagrams, and combinations of flows and/or blocks in the flow diagrams and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processing unit of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processing unit of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
It will be apparent to those skilled in the art that various changes and modifications may be made in the embodiments of the present application without departing from the spirit and scope of the application. Thus, if such modifications and variations of the embodiments of the present application fall within the scope of the claims of the present application and their equivalents, the present application is also intended to encompass such modifications and variations.
In the present application, the terms "include" and variations thereof may refer to non-limiting inclusions; the term "or" and variations thereof may mean "and/or". The terms "first," "second," and the like in this application are used for distinguishing between similar elements and not necessarily for describing a particular sequential or chronological order. In the present application, "a plurality" means two or more. "and/or" describes the association relationship of the associated objects, meaning that there may be three relationships, e.g., a and/or B, which may mean: a exists alone, A and B exist simultaneously, and B exists alone. The character "/" generally indicates that the former and latter associated objects are in an "or" relationship.

Claims (42)

1. A screen detection method, comprising:
acquiring a feature vector of each pixel point in a first image, wherein the first image is an image obtained by shooting a screen to be detected;
classifying the pixel points in the first image according to the feature vector of each pixel point in the first image, and obtaining the detection result of the screen to be detected according to the classification result.
2. The method of claim 1, wherein obtaining the feature vector of each pixel point in the first image comprises:
determining a plurality of three-dimensional characteristic images according to the first image, wherein the first image comprises M pixel points in the transverse direction, the first image comprises N pixel points in the longitudinal direction, each three-dimensional characteristic image comprises M pixel points in the transverse direction, each three-dimensional characteristic image comprises N pixel points in the longitudinal direction, the number of channels of each three-dimensional characteristic image is C, and C is a preset category number;
and acquiring a feature vector of each pixel point in the first image according to the plurality of three-dimensional feature images.
3. The method of claim 2, wherein determining a plurality of three-dimensional feature images from the first image comprises:
performing multiple feature extraction processing on the first image to obtain multiple three-dimensional feature images;
the times of feature extraction operations included in each two times of feature extraction processing are different, and the feature extraction operations include convolution operations and sampling operations.
4. The method according to claim 3, wherein performing feature extraction processing on the first image to obtain the three-dimensional feature image for any one time of feature extraction processing comprises:
performing convolution operation and downsampling operation on the first image to obtain K downsampled feature images, wherein the size of the ith downsampled feature image is (M/2^(i-1)) × (N/2^(i-1)), and i takes 1, 2, …, K in sequence;
and performing convolution operation and up-sampling operation according to the K down-sampling feature images to obtain the three-dimensional feature image.
5. The method of claim 4, wherein performing a convolution operation and a downsampling operation on the first image to obtain K downsampled feature images comprises:
performing convolution operation on the first image to obtain a first downsampling characteristic image;
and sequentially performing i times of first operation on the first downsampling feature image to obtain an i +1 th downsampling feature image, wherein i is 1,2, … … and K-1 in sequence, and the first operation comprises downsampling operation and convolution operation.
6. The method according to claim 4 or 5, wherein performing a convolution operation and an upsampling operation on the K downsampled feature images to obtain the three-dimensional feature image comprises:
performing downsampling operation, convolution operation and upsampling operation on the Kth downsampling feature image to obtain a Kth upsampling feature image;
sequentially carrying out merging operation, convolution operation and upsampling operation on the ith up-sampling feature image and the ith down-sampling feature image to obtain the (i-1)th up-sampling feature image, wherein i takes K, K-1, …, 2 in sequence;
and performing convolution operation on the first up-sampling image to obtain the three-dimensional characteristic image.
7. The method according to any one of claims 2 to 6, wherein obtaining the feature vector of each pixel point in the first image according to the plurality of three-dimensional feature images comprises:
determining a target three-dimensional characteristic image according to pixel values of pixels in each three-dimensional characteristic image, wherein the target three-dimensional characteristic image comprises M pixels in the transverse direction, the target three-dimensional image comprises N pixels in the longitudinal direction, and the number of channels of the target three-dimensional characteristic image is C;
and determining the characteristic vector of each pixel point in the first image according to the pixel value of each pixel point in the target three-dimensional characteristic image.
8. The method of claim 7, wherein determining the target three-dimensional feature image according to pixel values of pixel points in each three-dimensional feature image comprises:
determining the pixel values of M x N pixels of the x channel of the target three-dimensional characteristic image according to the pixel values of M x N pixels of the x channel in each three-dimensional characteristic image, wherein x is 1,2, … … and C in sequence;
wherein, the pixel value of the (a, b) th pixel point in the x-th channel of the target three-dimensional characteristic image is: the maximum value of pixel values of (a, b) th pixel points in the x-th channel of the three-dimensional feature images is obtained, wherein a is a positive integer smaller than or equal to M, and b is a positive integer smaller than or equal to N.
9. The method of claim 8, wherein determining the feature vector of the (a, b) th pixel point in the first image according to the pixel value of each pixel point in the target three-dimensional feature image for the (a, b) th pixel point in the first image comprises:
and determining the characteristic vector of the (a, b) th pixel point in the first image according to the value of the (a, b) th pixel point in the C channels of the target three-dimensional characteristic image.
10. The method of claim 1, wherein obtaining the feature vector of each pixel point in the first image comprises:
inputting the first image into a feature extraction model to obtain a feature vector of each pixel point in the first image, wherein the feature extraction model is obtained by learning multiple groups of first samples, each group of first samples comprises a first sample image and the feature vector of the pixel point in the first sample image, and the first sample image is an image obtained by shooting a first sample screen.
11. The method according to any one of claims 2 to 10, wherein classifying the pixel points in the first image according to the feature vector of each pixel point in the first image, and obtaining the detection result of the screen to be detected according to the classification result comprises:
inputting the feature vectors of the pixel points in the first image into a preset model to obtain the category of each pixel point in the first image, wherein the preset model is obtained by learning multiple groups of second samples, each group of second samples comprises the feature vectors and labeling information of the pixel points on the second sample image, the second sample image is an image obtained by shooting a second sample screen, the labeling information is information labeling the categories of the pixel points on the second sample image, the category of any pixel point in the first image is one of preset categories, and the number of the preset categories is C;
and obtaining a detection result of the screen to be detected according to the category of each pixel point in the first image.
12. The method of claim 11, wherein inputting the feature vectors of the pixels in the first image into a preset model to obtain the category of each pixel in the first image comprises:
inputting the feature vectors of the pixel points in the first image into the preset model to obtain C-1 first output images and a second output image, wherein each first output image indicates a defect type, the gray value of the (C, d) th pixel point in each first output image i is used for indicating the probability that the defect type of the (C, d) th pixel point on the first image is the defect type indicated by the first output image i, the gray value of the (C, d) th pixel point in the second output image is used for indicating the probability that the type of the (C, d) th pixel point on the first image is normal, C is a positive integer smaller than or equal to M, and d is a positive integer smaller than or equal to N;
and obtaining the category of each pixel point in the first image according to the gray value of each pixel point on the C-1 first output images and the gray value of each pixel point on the second output image.
13. The method of claim 12, wherein obtaining the category of each pixel point in the first image according to the gray-level value of each pixel point in the C-1 first output images and the gray-level value of each pixel point in the second output image comprises:
aiming at the (C, d) th pixel point in the first image, acquiring the gray value of the (C, d) th pixel point on each first output image in C-1 first output images and the gray value of the (C, d) th pixel point on the second output image;
determining a pixel point with the maximum gray value according to the gray value of the (c, d) th pixel point on each first output image and the gray value of the (c, d) th pixel point on the second output image;
if the image where the pixel point with the maximum gray value is located is the second output image, determining that the category of the (c, d) th pixel point in the first image is normal;
otherwise, determining the defect type indicated by the first output image where the pixel point with the maximum gray value is located as the type of the (c, d) th pixel point in the first image.
14. A screen detection model training method is characterized by comprising the following steps:
acquiring a training sample, wherein the training sample comprises a sample image and marking information of pixel points in the sample image, and the marking information of the pixel points in the sample image is information marking the category of the pixel points in the sample image;
inputting the sample image into a screen detection model to obtain the training output category of the pixel points in the sample image;
and adjusting parameters of the screen detection model according to the training output categories of the pixel points in the sample image and the labeling information of the pixel points in the sample image until the error between the training output categories of the pixel points in the sample image and the labeling information of the pixel points in the sample image is less than or equal to a preset error, and obtaining the trained screen detection model.
15. The method of claim 14, wherein the screen detection model comprises a feature extraction network and a classification network; inputting the sample image into a screen detection model to obtain the training output category of the pixel points in the sample image, wherein the training output category comprises the following steps:
carrying out feature extraction processing on the sample image according to the feature extraction network to obtain a plurality of sample three-dimensional feature images, wherein the sample image comprises M pixel points in the transverse direction, the sample image comprises N pixel points in the longitudinal direction, each sample three-dimensional feature image comprises M pixel points in the transverse direction, each sample three-dimensional feature image comprises N pixel points in the longitudinal direction, the number of channels of each sample three-dimensional feature image is C, and C is the number of categories for classifying the pixel points in the sample image;
and processing the plurality of sample three-dimensional characteristic images according to the classification network to obtain the training output categories of the pixel points in the sample images.
16. The method of claim 15, wherein performing feature extraction on the sample image according to the feature extraction network to obtain a plurality of sample three-dimensional feature images comprises:
carrying out multiple times of feature extraction processing on the sample images according to the feature extraction network to obtain a plurality of sample three-dimensional feature images;
the times of feature extraction operations included in each two times of feature extraction processing are different, and the feature extraction operations include convolution operations and sampling operations.
17. The method of claim 16, wherein the feature extraction network comprises a convolutional layer, a pooling layer, and an upsampling layer; for any one time of feature extraction processing, performing feature extraction processing on the sample image according to the feature extraction network to obtain the sample three-dimensional feature image, including:
performing convolution operation and down-sampling operation on the sample image according to the convolution layer and the pooling layer to obtain K sample down-sampling feature images, wherein the size of the ith sample down-sampling feature image is (M/2^(i-1)) × (N/2^(i-1)), and i takes 1, 2, …, K in sequence;
and performing convolution operation and up-sampling operation on the K sample down-sampling feature images according to the convolution layer and the up-sampling layer to obtain the sample three-dimensional feature images.
18. The method of claim 17, wherein performing a convolution operation and a downsampling operation on the sample image according to the convolution layer and the pooling layer to obtain K sample downsampled feature images comprises:
performing convolution operation on the sample image according to the convolution layer to obtain a first sample downsampling characteristic image;
and sequentially executing i times of first operation on the first sample downsampling feature image according to the convolution layer and the pooling layer to obtain an i +1 th sample downsampling feature image, wherein i is 1,2, … … and K-1 in sequence, and the first operation comprises downsampling operation and convolution operation.
19. The method of claim 17 or 18, wherein performing a convolution operation and an upsampling operation on the K sample downsampled feature images according to the convolution layer and the upsampling layer to obtain the sample three-dimensional feature image comprises:
performing down-sampling operation, convolution operation and up-sampling operation on the K sample down-sampling feature image according to the pooling layer, the convolution layer and the up-sampling layer to obtain a K sample up-sampling feature image;
sequentially carrying out merging operation, convolution operation and up-sampling operation on the ith sample up-sampling feature image and the ith sample down-sampling feature image according to the convolution layer and the up-sampling layer to obtain the (i-1)th sample up-sampling feature image, wherein i takes K, K-1, …, 2 in sequence;
and performing convolution operation on the first sample up-sampling image according to the convolution layer to obtain the sample three-dimensional characteristic image.
20. A screen detecting apparatus, comprising:
the device comprises an acquisition module, a detection module and a processing module, wherein the acquisition module is used for acquiring a feature vector of each pixel point in a first image, and the first image is an image obtained by shooting a screen to be detected;
and the classification module is used for classifying the pixel points in the first image according to the feature vector of each pixel point in the first image and obtaining the detection result of the screen to be detected according to the classification result.
21. The apparatus of claim 20, wherein the obtaining module is specifically configured to:
determining a plurality of three-dimensional characteristic images according to the first image, wherein the first image comprises M pixel points in the transverse direction, the first image comprises N pixel points in the longitudinal direction, each three-dimensional characteristic image comprises M pixel points in the transverse direction, each three-dimensional characteristic image comprises N pixel points in the longitudinal direction, the number of channels of each three-dimensional characteristic image is C, and C is a preset category number;
and acquiring a feature vector of each pixel point in the first image according to the plurality of three-dimensional feature images.
22. The apparatus of claim 21, wherein the obtaining module is specifically configured to:
performing multiple feature extraction processing on the first image to obtain multiple three-dimensional feature images;
the times of feature extraction operations included in each two times of feature extraction processing are different, and the feature extraction operations include convolution operations and sampling operations.
23. The apparatus of claim 22, wherein for any one feature extraction process, the obtaining module is specifically configured to:
performing convolution operation and downsampling operation on the first image to obtain K downsampled feature images, wherein the size of the ith downsampled feature image is (M/2^(i-1)) × (N/2^(i-1)), and i takes 1, 2, …, K in sequence;
and performing convolution operation and up-sampling operation according to the K down-sampling feature images to obtain the three-dimensional feature image.
24. The apparatus of claim 23, wherein the obtaining module is specifically configured to:
performing convolution operation on the first image to obtain a first downsampling characteristic image;
and sequentially performing i times of first operation on the first downsampling feature image to obtain an i +1 th downsampling feature image, wherein i is 1,2, … … and K-1 in sequence, and the first operation comprises downsampling operation and convolution operation.
25. The apparatus according to claim 23 or 24, wherein the obtaining module is specifically configured to:
performing downsampling operation, convolution operation and upsampling operation on the Kth downsampling feature image to obtain a Kth upsampling feature image;
sequentially carrying out merging operation, convolution operation and upsampling operation on the ith up-sampling feature image and the ith down-sampling feature image to obtain the (i-1)th up-sampling feature image, wherein i takes K, K-1, …, 2 in sequence;
and performing convolution operation on the first up-sampling image to obtain the three-dimensional characteristic image.
26. The apparatus according to any one of claims 21 to 25, wherein the obtaining module is specifically configured to:
determining a target three-dimensional characteristic image according to pixel values of pixels in each three-dimensional characteristic image, wherein the target three-dimensional characteristic image comprises M pixels in the transverse direction, the target three-dimensional image comprises N pixels in the longitudinal direction, and the number of channels of the target three-dimensional characteristic image is C;
and determining the characteristic vector of each pixel point in the first image according to the pixel value of each pixel point in the target three-dimensional characteristic image.
27. The apparatus of claim 26, wherein the obtaining module is specifically configured to:
determining the pixel values of M x N pixels of the x channel of the target three-dimensional characteristic image according to the pixel values of M x N pixels of the x channel in each three-dimensional characteristic image, wherein x is 1,2, … … and C in sequence;
wherein, the pixel value of the (a, b) th pixel point in the x-th channel of the target three-dimensional characteristic image is: the maximum value of pixel values of (a, b) th pixel points in the x-th channel of the three-dimensional feature images is obtained, wherein a is a positive integer smaller than or equal to M, and b is a positive integer smaller than or equal to N.
28. The apparatus of claim 27, wherein for an (a, b) th pixel point in the first image, the obtaining module is specifically configured to:
and determining the characteristic vector of the (a, b) th pixel point in the first image according to the value of the (a, b) th pixel point in the C channels of the target three-dimensional characteristic image.
29. The apparatus of claim 20, wherein the obtaining module is specifically configured to:
inputting the first image into a feature extraction model to obtain a feature vector of each pixel point in the first image, wherein the feature extraction model is obtained by learning multiple groups of first samples, each group of first samples comprises a first sample image and the feature vector of the pixel point on the first sample image, and the first sample image is an image obtained by shooting a first sample screen.
30. The apparatus according to any of claims 21-29, wherein the classification module is specifically configured to:
inputting the feature vectors of the pixel points in the first image into a preset model to obtain the category of each pixel point in the first image, wherein the preset model is obtained by learning multiple groups of second samples, each group of second samples comprises the feature vectors and labeling information of the pixel points on the second sample image, the second sample image is an image obtained by shooting a second sample screen, the labeling information is information labeling the categories of the pixel points on the second sample image, the category of any pixel point in the first image is one of preset categories, and the number of the preset categories is C;
and obtaining a detection result of the screen to be detected according to the category of each pixel point in the first image.
31. The apparatus of claim 30, wherein the classification module is specifically configured to:
inputting the feature vectors of the pixel points in the first image into the preset model to obtain C-1 first output images and a second output image, wherein each first output image indicates a defect type, the gray value of the (C, d) th pixel point in each first output image i is used for indicating the probability that the defect type of the (C, d) th pixel point on the first image is the defect type indicated by the first output image i, the gray value of the (C, d) th pixel point in the second output image is used for indicating the probability that the type of the (C, d) th pixel point on the first image is normal, C is a positive integer smaller than or equal to M, and d is a positive integer smaller than or equal to N;
and obtaining the category of each pixel point in the first image according to the gray value of each pixel point on the C-1 first output images and the gray value of each pixel point on the second output image.
32. The apparatus of claim 31, wherein the classification module is specifically configured to:
aiming at the (C, d) th pixel point in the first image, acquiring the gray value of the (C, d) th pixel point on each first output image in C-1 first output images and the gray value of the (C, d) th pixel point on the second output image;
determining a pixel point with the maximum gray value according to the gray value of the (c, d) th pixel point on each first output image and the gray value of the (c, d) th pixel point on the second output image;
if the image where the pixel point with the maximum gray value is located is the second output image, determining that the category of the (c, d) th pixel point in the first image is normal;
otherwise, determining the defect type indicated by the first output image where the pixel point with the maximum gray value is located as the type of the (c, d) th pixel point in the first image.
33. A screen test model training device, comprising:
the training module is used for acquiring a training sample, wherein the training sample comprises a sample image and marking information of pixel points in the sample image, and the marking information of the pixel points in the sample image is information marking the types of the pixel points in the sample image;
the processing module is used for inputting the sample image into a screen detection model to obtain the training output category of the pixel points in the sample image;
and the adjusting module is used for adjusting the parameters of the screen detection model according to the training output categories of the pixels in the sample image and the labeling information of the pixels in the sample image until the error between the training output categories of the pixels in the sample image and the labeling information of the pixels in the sample image is less than or equal to a preset error, so that the trained screen detection model is obtained.
34. The apparatus of claim 33, wherein the screen detection model comprises a feature extraction network and a classification network; the processing module is specifically configured to:
carrying out feature extraction processing on the sample image according to the feature extraction network to obtain a plurality of sample three-dimensional feature images, wherein the sample image comprises M pixel points in the transverse direction, the sample image comprises N pixel points in the longitudinal direction, each sample three-dimensional feature image comprises M pixel points in the transverse direction, each sample three-dimensional feature image comprises N pixel points in the longitudinal direction, the number of channels of each sample three-dimensional feature image is C, and C is the number of categories for classifying the pixel points in the sample image;
and processing the plurality of sample three-dimensional characteristic images according to the classification network to obtain the training output categories of the pixel points in the sample images.
35. The apparatus of claim 34, wherein the processing module is specifically configured to:
carrying out multiple times of feature extraction processing on the sample images according to the feature extraction network to obtain a plurality of sample three-dimensional feature images;
the times of feature extraction operations included in each two times of feature extraction processing are different, and the feature extraction operations include convolution operations and sampling operations.
36. The apparatus of claim 35, wherein the feature extraction network comprises a convolutional layer, a pooling layer, and an upsampling layer; for any one time of feature extraction processing, the processing module is specifically configured to:
performing convolution operation and down-sampling operation on the sample image according to the convolution layer and the pooling layer to obtain K sample down-sampling feature images, wherein the size of the ith sample down-sampling feature image is (M/2^(i-1)) × (N/2^(i-1)), and i takes 1, 2, …, K in sequence;
and performing convolution operation and up-sampling operation on the K sample down-sampling feature images according to the convolution layer and the up-sampling layer to obtain the sample three-dimensional feature images.
37. The apparatus of claim 36, wherein the processing module is specifically configured to:
performing convolution operation on the sample image according to the convolution layer to obtain a first sample downsampling characteristic image;
and sequentially executing i times of first operation on the first sample downsampling feature image according to the convolution layer and the pooling layer to obtain an i +1 th sample downsampling feature image, wherein i is 1,2, … … and K-1 in sequence, and the first operation comprises downsampling operation and convolution operation.
38. The apparatus according to claim 36 or 37, wherein the processing module is specifically configured to:
performing down-sampling operation, convolution operation and up-sampling operation on the K sample down-sampling feature image according to the pooling layer, the convolution layer and the up-sampling layer to obtain a K sample up-sampling feature image;
sequentially carrying out merging operation, convolution operation and up-sampling operation on the ith sample up-sampling feature image and the ith sample down-sampling feature image according to the convolution layer and the up-sampling layer to obtain the (i-1)th sample up-sampling feature image, wherein i takes K, K-1, …, 2 in sequence;
and performing convolution operation on the first sample up-sampling image according to the convolution layer to obtain the sample three-dimensional characteristic image.
39. A screen detecting apparatus, comprising: a memory storing a computer program and a processor running the computer program to perform the screen detecting method according to any one of claims 1 to 13.
40. A screen test model training apparatus, comprising: a memory storing a computer program and a processor running the computer program to perform the screen test model training method of any one of claims 14-19.
41. A screen inspection system comprising an image capture device and a screen inspection device, wherein:
the image acquisition equipment is used for shooting a screen to be detected to obtain a first image and sending the first image to the screen detection equipment;
the screen detection device is used for processing the first image according to the method of any one of claims 1 to 13 to obtain a detection result of the screen to be detected.
42. A computer-readable storage medium, characterized in that the computer-readable storage medium comprises a computer program which, when executed by one or more processors, implements the screen detection method of any one of claims 1-13, or which, when executed by one or more processors, implements the screen detection model training method of any one of claims 14-19.
CN202010042468.2A 2020-01-15 2020-01-15 Screen detection and screen detection model training method, device and equipment Pending CN113205474A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010042468.2A CN113205474A (en) 2020-01-15 2020-01-15 Screen detection and screen detection model training method, device and equipment

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010042468.2A CN113205474A (en) 2020-01-15 2020-01-15 Screen detection and screen detection model training method, device and equipment

Publications (1)

Publication Number Publication Date
CN113205474A true CN113205474A (en) 2021-08-03

Family

ID=77024823

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010042468.2A Pending CN113205474A (en) 2020-01-15 2020-01-15 Screen detection and screen detection model training method, device and equipment

Country Status (1)

Country Link
CN (1) CN113205474A (en)

Cited By (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115222653A (en) * 2021-12-17 2022-10-21 荣耀终端有限公司 Test method and device
CN115222653B (en) * 2021-12-17 2023-08-18 荣耀终端有限公司 Test method and device
WO2023125671A1 (en) * 2021-12-31 2023-07-06 中兴通讯股份有限公司 Image processing method, electronic device, storage medium and program product
CN114677377A (en) * 2022-05-30 2022-06-28 成都数之联科技股份有限公司 Display screen defect detection method, training method, device, equipment and medium
CN115131327A (en) * 2022-07-14 2022-09-30 电子科技大学 Color feature fused display screen color line defect detection method
CN115131327B (en) * 2022-07-14 2024-04-30 电子科技大学 Color line defect detection method for display screen with fused color features

Similar Documents

Publication Publication Date Title
CN113205474A (en) Screen detection and screen detection model training method, device and equipment
CN110349145B (en) Defect detection method, defect detection device, electronic equipment and storage medium
KR102166458B1 (en) Defect inspection method and apparatus using image segmentation based on artificial neural network
US20160364849A1 (en) Defect detection method for display panel based on histogram of oriented gradient
CN109671058B (en) Defect detection method and system for large-resolution image
CN112150460B (en) Detection method, detection system, device and medium
CN111709948A (en) Method and device for detecting defects of container
CN110264447A (en) A kind of detection method of surface flaw of moulding, device, equipment and storage medium
CN115775236B (en) Visual detection method and system for surface micro defects based on multi-scale feature fusion
US20220164943A1 (en) Circuit board detection method and electronic device
CN115965816B (en) Glass defect classification and detection method and system based on deep learning
CN112200790B (en) Cloth defect detection method, device and medium
CN113781393A (en) Screen defect detection method, device, equipment and storage medium
CN116071315A (en) Product visual defect detection method and system based on machine vision
CN113008524B (en) Screen light leakage detection method and device
CN117392042A (en) Defect detection method, defect detection apparatus, and storage medium
CN116342474A (en) Wafer surface defect detection method
KR20230116847A (en) Image Augmentation Technology for Automated Visual Inspection
US5881164A (en) Image data processing method and image data processing apparatus
CN113808099A (en) Aluminum product surface defect detection device and method
US20220067514A1 (en) Inference apparatus, method, non-transitory computer readable medium and learning apparatus
CN115131355B (en) Intelligent method for detecting waterproof cloth abnormity by using electronic equipment data
US20220044379A1 (en) Streamlining an automatic visual inspection process
CN115861259A (en) Lead frame surface defect detection method and device based on template matching
CN114219758A (en) Defect detection method, system, electronic device and computer readable storage medium

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination