CN117764998A - Image segmentation model with multichannel parallel input - Google Patents

Image segmentation model with multichannel parallel input

Info

Publication number
CN117764998A
Authority
CN
China
Prior art keywords
image
channel
semantic
features
segmentation model
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202311749684.0A
Other languages
Chinese (zh)
Inventor
刁晓淳
王文瑞
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shanghai Corelli Software Co ltd
Original Assignee
Shanghai Corelli Software Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shanghai Corelli Software Co ltd filed Critical Shanghai Corelli Software Co ltd
Priority to CN202311749684.0A priority Critical patent/CN117764998A/en
Publication of CN117764998A publication Critical patent/CN117764998A/en
Pending legal-status Critical Current

Landscapes

  • Image Analysis (AREA)

Abstract

The invention relates to a multi-channel parallel-input image segmentation model comprising an input image group, a network architecture and an output image, wherein the input image group is formed by a plurality of different channel images, the output image is a segmentation result with the same resolution as the input images, and the network architecture comprises a feature analysis module, a multi-channel semantic synthesis module and a comprehensive analysis module. The multi-channel parallel-input image segmentation model can combine multiple types of images, so that defects are identified effectively and comprehensively.

Description

Image segmentation model with multichannel parallel input
Technical Field
The invention relates to the field of image detection, in particular to a multi-channel parallel input image segmentation model.
Background
In the field of industrial automation quality inspection based on image detection algorithms, particularly in the manufacture of liquid crystal panels, it is often necessary to switch between different types of light sources when imaging the same product. The same position of a product must also be imaged multiple times across different processes and process sections, and the multiple different image states are finally combined to jointly confirm whether a defect exists and whether the product is a good one.
Defects that can only be judged by combining several different states are often the defects that matter most within a process section and that affect the structural parameters and basic functions of the whole product. Such defects are typically difficult to locate, difficult to detect, and difficult to judge jointly, and their detection directly affects product yield.
The image detection algorithms commonly used in the industry are designed for a single station, a single shooting scene, and a single image. They can hardly process multiple images of the same position in parallel; at best, a separate method processes each different image once and the results are then summarized logically.
Existing detection modes fall into two types. One is manual detection: images from multiple fields of view are displayed simultaneously on a manual inspection software interface, and an operator surveys all images to judge the defect position through comprehensive analysis. The other uses an image algorithm: a defect result is computed independently for each image, the defect results of all images are then collected, and a single defect verdict is synthesized according to a formula or to priority logic.
The disadvantage of the manual scheme is that an operator must be assigned to the inspection station to respond to defect identification requests in real time, which wastes labor cost, while personnel fatigue, poor positioning and similar factors introduce errors. Multi-channel defects are judged largely from the experience of the image reviewer, so the judgment is strongly affected by personnel factors: for the same set of images, different operators judging from personal experience often reach inconsistent results. In current practice an experienced reviewer usually has to act as a review-group leader, spot-checking and rechecking the manual results, which again wastes considerable labor. For defects judged inconsistently, yield problems such as over-kill and missed detection also arise.
Vision detection based on a single image has two major disadvantages. First, many defects can only be judged by synthesizing multiple different images, and only on that basis can an algorithm be designed; an algorithm based on a single image cannot realize this function, since it finds only the defect features visible in that one image, and those features are limited. Second, in the final defect synthesis step the synthesis logic is preset and hard-coded in the program; it cannot be adjusted to the actual situation, and the correctness of the preset rules and of the logic priorities is questionable. In general, single-image detection still suffers the corresponding yield problems of over-kill and missed detection.
Therefore, how to effectively combine the images from each photographing position, each light source, and each process section to judge defects comprehensively is one of the key difficulties in the image quality inspection industry.
Disclosure of Invention
In order to solve the above technical problems in the prior art, the invention provides a multi-channel parallel-input image segmentation model which can combine multiple types of images to identify defects effectively and comprehensively.
In order to achieve the above purpose, the technical solution of the present invention is as follows:
the image segmentation model with multi-channel parallel input comprises an input image group, a network architecture and an output image, wherein the input image group is formed by a plurality of different channel images, the output image is a segmentation result with the same resolution as the input images, and the network architecture comprises a feature analysis module, a multi-channel semantic synthesis module and a comprehensive analysis module.
As a preferred technical solution, the feature analysis module performs an independent calculation on each single-channel image: each single-channel image is fed into its own sub-network models for high-low dimensional semantic features and for image features, so that each single-channel image finally outputs two types of features, one being the high-low dimensional semantic feature and the other the image feature.
As a preferred technical solution, the high-low dimensional semantic feature is essentially a long-column feature vector, which contains high-low dimensional information representative of images, and the image feature is a multi-channel tensor.
As a preferred technical solution, the multi-channel semantic synthesis module adopts twenty parallel semantic synthesis modules with the same function; in each module, three of the six channel images are taken as image features and the other three channels as high-low dimensional semantic features for calculation, and the resulting twenty semantic synthesis results are input to the comprehensive analysis module.
As a preferred technical solution, each single semantic synthesis module consists of a plurality of SPADE sub-modules, and the SPADE sub-modules fuse the high-low dimensional semantic features with the image features through multiple rounds of convolution and addition.
As a preferred technical solution, the comprehensive analysis module splices the features of all channels into one large tensor and applies dimension-reducing dilated (atrous) convolution to it until a single-channel mask image is formed, which is output as the result.
As a preferred technical solution, the training process of the comprehensive analysis module is as follows: a data set is prepared containing multiple sets of data, each set comprising the six measured-object images of one field of view together with a fully labeled semantic segmentation mask; 70% of the data set is randomly selected as the training set, 20% as the test set, and 10% as the validation set; in a single training run, the loss function on the training set serves as the training direction, the loss function on the test set as the basis for parameter adjustment, and the loss function on the validation set characterizes the final effect.
As a preferred technical solution, an end-to-end semantic segmentation training mode is adopted, and the objective function differentiated during training is expressed as a difference loss function over the semantic segmentation mask.
Compared with the prior art, the invention has the beneficial effects that:
the invention provides a multi-channel parallel input image segmentation model, the number of pictures input by the model can be freely configured, and the model is not limited to a single piece or fixed pieces. The model thus has algorithmic capabilities that integrate multiple channels of different images. Meanwhile, as the image segmentation model based on the deep neural network is used, in the end-to-end labeling link, the model can automatically learn and acquire the characteristics of the same position in different images only by labeling the defect position in the image of one channel. And establishing correlations among multichannel images, high-dimensional features in current pixels of the images and the adjacent areas of the images through a convolutional neural network, and finally realizing correct judgment.
Drawings
FIG. 1 is a schematic diagram of an image segmentation model of a multi-channel parallel input of the present invention;
FIG. 2 is a schematic diagram of a feature analysis module in an image segmentation model of the multi-channel parallel input of the present invention;
FIG. 3 is a schematic diagram of a multi-channel semantic synthesis module in a multi-channel parallel input image segmentation model of the present invention;
FIG. 4 is a schematic diagram of the SPADE ResBlock module of FIG. 3;
FIG. 5 is a schematic diagram of the SPADE sub-module of FIG. 4;
FIG. 6 is a schematic diagram of a comprehensive analysis module in an image segmentation model of the multi-channel parallel input of the present invention.
Detailed Description
The technical scheme of the invention is further described below with reference to the specific embodiments:
as shown in FIG. 1, the image segmentation model with parallel multi-channel input comprises an input image group, a network architecture and an output image, wherein the input image group is an image group formed by a plurality of different channel images, the output image is a segmentation result with the same resolution as the input image, and the network architecture comprises a feature analysis module, a multi-channel semantic synthesis module and a comprehensive analysis module.
As shown in fig. 2, the feature analysis module performs an independent calculation for each single-channel image, feeding each single-channel image into its own sub-network models for high-low dimensional semantic features and for image features, so that each single-channel image finally outputs two types of features, one being the high-low dimensional semantic feature and the other the image feature. The high-low dimensional semantic feature is essentially a long column feature vector containing representative high- and low-dimensional information of the image, while the image feature is a multi-channel tensor.
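The two outputs per channel can be sketched as follows. The patent does not disclose the sub-network internals, so the projection used here is an illustrative stand-in, and the dimensions `vec_dim` and `feat_channels` are assumptions:

```python
import numpy as np

def feature_analysis(channel_image, vec_dim=128, feat_channels=16):
    """Hypothetical per-channel feature extractor sketch.

    The patent only states that each single-channel image yields
    (a) a long column vector of high-low dimensional semantics and
    (b) a multi-channel image-feature tensor; the stacking and random
    projection below are illustrative stand-ins, not the real model.
    """
    h, w = channel_image.shape
    # "Image feature": a multi-channel tensor at the input resolution.
    image_feature = np.stack([channel_image] * feat_channels, axis=0)
    # "High-low dimensional semantic feature": a long column vector,
    # here a fixed-size random projection of the flattened image.
    rng = np.random.default_rng(0)
    proj = rng.standard_normal((vec_dim, h * w)) / np.sqrt(h * w)
    semantic_vector = proj @ channel_image.reshape(-1)
    return semantic_vector, image_feature
```

In the described six-channel setup, this function would be applied once per channel, yielding six semantic vectors and six image-feature tensors.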
As shown in fig. 3, taking a six-channel input image group as an example, after feature analysis of the six single-channel images, six corresponding high-low dimensional semantic features and six image features are obtained, and these two groups of features are selectively input to the subsequent multi-channel semantic synthesis module.
The multi-channel semantic synthesis module adopts twenty parallel semantic synthesis modules with the same function. The core function of a semantic synthesis module is to use the high-low dimensional semantic features of three of the six channel images to semantically fuse the image features of the other three images, obtaining a feature tensor that fuses the six images and their different-dimensional information.
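The count of twenty parallel modules matches the number of ways to choose which three of the six channels serve as image features, with the complementary three supplying semantics (C(6,3) = 20). This reading is an assumption consistent with the description, and can be checked with the standard library:

```python
import math
from itertools import combinations

channels = set(range(6))  # the six-channel input image group
# hypothetically, each parallel module takes 3 channels as image
# features and the complementary 3 as high-low dimensional semantics
splits = [(set(img_feats), channels - set(img_feats))
          for img_feats in combinations(sorted(channels), 3)]
assert len(splits) == math.comb(6, 3) == 20  # twenty parallel modules
```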
As shown in fig. 4, each single semantic synthesis module consists of a plurality of SPADE sub-modules, and as shown in fig. 5, the SPADE sub-modules fuse the high-low dimensional semantic features with the image features through multiple rounds of convolution and addition.
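A minimal NumPy sketch of one SPADE-style fusion step, under the assumption that "convolution and addition" follows the original SPADE formulation: the image feature is normalized, then modulated by a scale (gamma) and shift (beta) derived from the semantic input. The linear maps below stand in for the actual convolution layers, and all weights are illustrative:

```python
import numpy as np

def spade_fuse(image_feature, semantic_vector):
    """Illustrative SPADE-style sub-module (assumed shapes/weights).

    image_feature: (C, H, W) tensor; semantic_vector: (D,) vector.
    Returns a modulated (C, H, W) tensor.
    """
    c, h, w = image_feature.shape
    # per-channel normalization of the image feature
    mu = image_feature.mean(axis=(1, 2), keepdims=True)
    sigma = image_feature.std(axis=(1, 2), keepdims=True) + 1e-5
    normed = (image_feature - mu) / sigma
    # gamma/beta from the semantic vector (stand-in for conv layers)
    rng = np.random.default_rng(1)
    w_gamma = rng.standard_normal((c, semantic_vector.size)) * 0.01
    w_beta = rng.standard_normal((c, semantic_vector.size)) * 0.01
    gamma = (w_gamma @ semantic_vector)[:, None, None]
    beta = (w_beta @ semantic_vector)[:, None, None]
    # "convolution and addition": modulate, then shift
    return normed * (1 + gamma) + beta
```

In the patent's architecture several such sub-modules would be chained inside each of the twenty parallel synthesis modules.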
The twenty semantic synthesis results are then input to the final comprehensive analysis stage.
As shown in fig. 6, the comprehensive analysis module first splices the features of all channels into one large tensor and then applies dimension-reducing dilated (atrous) convolution to it until a single-channel mask image is formed, which is output as the result.
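Assuming "hole convolution" refers to dilated (atrous) convolution (空洞卷积), one layer of the channel-collapsing step can be sketched as below; the 3×3 kernel size and dilation rate are assumptions, since the patent does not specify them:

```python
import numpy as np

def dilated_conv_to_mask(x, kernel, dilation=2):
    """Single dilated (atrous) 3x3 convolution collapsing all input
    channels of x (C, H, W) into one (H, W) map, with zero padding
    chosen so the output keeps the input resolution."""
    c, h, w = x.shape
    pad = dilation  # (kernel_size - 1) // 2 * dilation for 3x3
    xp = np.pad(x, ((0, 0), (pad, pad), (pad, pad)))
    out = np.zeros((h, w))
    for i in range(3):          # kernel rows
        for j in range(3):      # kernel cols
            di, dj = i * dilation, j * dilation
            # accumulate each shifted view weighted by its kernel tap
            out += np.einsum('chw,c->hw',
                             xp[:, di:di + h, dj:dj + w], kernel[:, i, j])
    return out
```

Stacking several such layers with decreasing channel counts would realize the "dimension-reducing" behavior described above, ending in the single-channel mask.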
The training process of the comprehensive analysis module is as follows: a data set is prepared containing multiple sets of data, each set comprising the six measured-object images of one field of view together with a fully labeled semantic segmentation mask; 70% of the data set is randomly selected as the training set, 20% as the test set, and 10% as the validation set. In a single training run, the loss function on the training set serves as the training direction, the loss function on the test set as the basis for parameter adjustment, and the loss function on the validation set characterizes the final effect. Because an end-to-end semantic segmentation training mode is adopted, the loss function serves as the objective function differentiated during training and is expressed as a difference loss function over the semantic segmentation mask.
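The 70/20/10 split described above can be sketched with the standard library; the helper name and the use of a fixed seed are illustrative:

```python
import random

def split_dataset(samples, seed=0):
    """Random 70/20/10 train/test/validation split, as described
    in the training process above (proportions from the patent)."""
    idx = list(range(len(samples)))
    random.Random(seed).shuffle(idx)
    n_train = round(0.7 * len(samples))
    n_test = round(0.2 * len(samples))
    train = [samples[i] for i in idx[:n_train]]
    test = [samples[i] for i in idx[n_train:n_train + n_test]]
    val = [samples[i] for i in idx[n_train + n_test:]]
    return train, test, val
```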
This embodiment further illustrates the present invention and is not to be construed as limiting it; after reading this specification, those skilled in the art may make non-inventive modifications to this embodiment as required, which remain within the scope of protection of the claims of the present invention.

Claims (8)

1. An image segmentation model with multi-channel parallel input, characterized by comprising an input image group, a network architecture and an output image, wherein the input image group is formed by a plurality of different channel images, the output image is a segmentation result with the same resolution as the input images, and the network architecture comprises a feature analysis module, a multi-channel semantic synthesis module and a comprehensive analysis module.
2. The multi-channel parallel input image segmentation model of claim 1, wherein the feature analysis module performs an independent calculation for each single-channel image, feeding each single-channel image into its own sub-network models for high-low dimensional semantic features and for image features, so that each single-channel image finally outputs two types of features, one being the high-low dimensional semantic feature and the other the image feature.
3. The multi-channel parallel input image segmentation model of claim 2, wherein the high-low dimensional semantic feature is essentially a long column feature vector containing representative high- and low-dimensional information of the image, and the image feature is a multi-channel tensor.
4. The image segmentation model of multi-channel parallel input according to claim 2, wherein the multi-channel semantic synthesis module adopts twenty parallel semantic synthesis modules with the same function, three images are selected from six-channel images as image features, three channels are selected as high-low dimensional semantic features for calculation, and the twenty obtained semantic synthesis results are input to the comprehensive analysis module.
5. The multi-channel parallel input image segmentation model according to claim 4, wherein the single semantic synthesis module is composed of a plurality of SPADE sub-modules, and the SPADE sub-modules fuse high-low dimensional semantic features with image features in a mode of multiple convolution and addition.
6. The multi-channel parallel input image segmentation model according to claim 1, wherein the comprehensive analysis module splices the features of all channels into one large tensor and applies dimension-reducing dilated (atrous) convolution to it until a single-channel mask image is formed, which is output as the result.
7. The multi-channel parallel input image segmentation model of claim 6, wherein the training process of the comprehensive analysis module is as follows: a data set is prepared containing multiple sets of data, each set comprising the six measured-object images of one field of view together with a fully labeled semantic segmentation mask; 70% of the data set is randomly selected as the training set, 20% as the test set, and 10% as the validation set; in a single training run, the loss function on the training set serves as the training direction, the loss function on the test set as the basis for parameter adjustment, and the loss function on the validation set characterizes the final effect.
8. The multi-channel parallel input image segmentation model of claim 7, wherein an end-to-end semantic segmentation training mode is adopted, and the objective function differentiated during training is expressed as a difference loss function over the semantic segmentation mask.
CN202311749684.0A 2023-12-19 2023-12-19 Image segmentation model with multichannel parallel input Pending CN117764998A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202311749684.0A CN117764998A (en) 2023-12-19 2023-12-19 Image segmentation model with multichannel parallel input


Publications (1)

Publication Number Publication Date
CN117764998A 2024-03-26

Family

ID=90321482

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202311749684.0A Pending CN117764998A (en) 2023-12-19 2023-12-19 Image segmentation model with multichannel parallel input

Country Status (1)

Country Link
CN (1) CN117764998A (en)


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination